How do we make ethically-data-sourced, explainable AI like organic food? Or maybe better?

Alison Doucette
Oct 16, 2023
Comparison of two Kraft Mac and Cheese boxes which, while both labelled organic, are not the same
Image by Author

If you wander about your local supermarket in the U.S., you will find food products labelled "organic" and/or "natural". To be labelled "organic", a product must follow strict standards and be certified to contain only organic ingredients (though the certification has some exclusions: non-organic cornstarch, for example, can be included in "organic" food). To be labelled "natural", a product must not contain artificial ingredients and must be only "minimally" processed; in other words, it still looks like the original. So all fresh meat in a market is "natural", no matter what happened to the animal in its lifetime. Regulations, after all, are written by government regulators who often come from the regulated industries (or from their lobbyists).

In Artificial Intelligence, it seems we are now at a point where, as democratic countries, we may need to think carefully about what we call "Good AI" versus "Bad AI", and whether those ratings should be voluntarily self-assigned by the AI creators or, as with food, require certification from an outside agency. When we do propose such certification, we need to be ready for the inevitable questions: "What about global competition?" or "What about China?"

Image of a logo with USTS certifying AI as Ethical
Image by Author

In a recent recorded presentation, Maximilian Kasy (an Economics Professor at Oxford) clearly and logically explained how responsible, ethical AI makes economic sense in democracies. Autocracies have already made it clear that they will prioritize the use of AI for controlling their populations. China, for example, has increased its expenditure on surveillance by 300%, to about 192 billion dollars, over the last decade or so. This year, the IMF puts China's GDP at 20 trillion dollars, so China is spending roughly 0.96%, or close to 1%, of GDP on surveillance. In 2017, the United States spent 0.59% of GDP on all state and local government police protection. As a generalization, based on these figures, China spends roughly 1.6 times as much (relative to GDP) solely on surveillance as the U.S. spends on all non-federal crime prevention and law enforcement.
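For readers who like to check the arithmetic, here is a quick back-of-envelope version of that comparison in Python. The figures are the rounded ones cited above, not precise budget data, so treat the result as an order-of-magnitude sanity check.

```python
# Back-of-envelope comparison: surveillance spend vs. policing spend as a share of GDP.
# Figures are the approximate ones cited in the paragraph above.

china_surveillance_spend = 192e9   # ~$192 billion per year on surveillance
china_gdp = 20e12                  # ~$20 trillion GDP (IMF estimate)

us_police_share = 0.0059           # 0.59% of GDP on state and local police protection (2017)

china_share = china_surveillance_spend / china_gdp
print(f"China surveillance spend: {china_share:.2%} of GDP")                         # -> 0.96% of GDP
print(f"Ratio vs. U.S. state/local policing: {china_share / us_police_share:.1f}x")  # -> ~1.6x
```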

Image of surveillance cameras
Photo by Isaac Chou on Unsplash

As Paul Scharre explained in his book The Four Battlegrounds: Power in the Age of Artificial Intelligence (2023), while autocracies focus on using AI to control their populations, democracies are currently more focused on using AI to get people to buy more stuff. This seems harmless, right? Focusing technological advancement on manipulating behaviors that may not benefit the individual or society, for profit or for power? Eat more Cheetos? Tell the chatbot your entire personal medical history because it seems so friendly? To quote Scharre: "The future of humanity will be determined in large part by the shape of AI technology as it unfolds in the world and who determines its destiny. AI can be used to strengthen democratic societies or authoritarian ones, to bolster individual freedom or to crush it." The book is a dense read and focused on national defense, but it is certainly worth reading and digging into the details if you have the time.

Image of Book Cover with Title — Four Battlegrounds
https://wwnorton.com/books/9780393866865

How, as democratic societies, can we best develop a responsible, ethical AI culture? What entities would be required to create responsible and ethical standards, educate people on those standards, and enforce them? And what role can individuals, technically trained or not, play as watchdogs?

Artificial Intelligence requires people to create the models, computer hardware to run the models, clean, high-quality data to feed the models, and organizational structures that leverage those skills, hardware, and data. To compete globally, the U.S. needs to excel in these areas; however, we also need to lead in "Good AI". All democratic societies need to think carefully about how they develop talent and skills, leverage existing software and technology, structure and ethically source data, foster public and private organizations to develop and enforce standards, and encourage individuals to understand, with transparency, how AI affects their lives.

People

Let's start with people. Ideally, democracies like the U.S. would produce their own AI talent, but as of 2019 around two-thirds of graduate students in AI-related programs were international. Pandemic restrictions and the current anti-immigration stance in the U.S. House of Representatives do not forecast a positive outlook for maintaining the U.S. as a talent magnet, nor do the U.S.'s global trends in math education achievement forecast more internal talent development. Non-U.S. citizens need H-1B visas to work in their preferred technology fields, and the House cannot even agree on an exemption to H-1B visa limits for those with PhDs in STEM fields.

Hardware

Moving on to computer hardware, or "compute". On October 5, 2023, the NY Times published an article titled "How the Big Chip Makers Are Pushing Back on Biden's China Agenda". These are the same companies benefitting from the $50 billion in promised investment from the CHIPS Act. On August 23, NVIDIA announced revenue up 101% from a year ago and up 88% from the prior quarter. This week, they pleaded that the inability to sell to China would result in cuts to technology development and jobs (it might also slow the acceleration of stock buybacks). One might posit that U.S. chip makers are more than a little afraid of competition when they say the "risk is spurring the development of an ecosystem that's led by competitors".

Data

Anyone who has worked in machine learning or AI over the past decades knows that in the end it all comes down to quality data. In building and training ChatGPT, OpenAI used a dataset called the Common Crawl. The Common Crawl contains copyrighted material under fair use claims: "fair use permits a party to use a copyrighted work without the copyright owner's permission for purposes such as criticism, comment, news reporting, teaching, scholarship, or research." If someone pirates a creative work, the Common Crawl gobbles it up, and ChatGPT uses it, is that good data? No need here to go further into the biases built into data generated on the internet: it is generated by humans. What about the authors of these 191,000 books or other writers whose content is used by ChatGPT? Do we decide that the benefits of copyright violation in AI model data sourcing outweigh the legal costs, or do we place some higher value on ethics? What value would we put, as democracies, on accurate, unbiased, and ethically sourced data?

Organizations

When we think of organizations and A.I., we think of governments, businesses, and educational institutions. While educational institutions can have a real impact by including AI ethics as a requirement for their data science and AI students, colleges and universities are also dependent on corporate donations for economic survival and research funding. Taking the food analogy, would we be where we are now if there were a Department of AI Ethics and Safety? Would individual states like Wyoming be taking the initiative to ban TikTok and trigger a Global Trade Alert if we had a coherent federal approach? Finally, public and private corporations are obligated to serve their stockholders and stakeholders. Pressure from investors or from employees can pivot a focus towards responsible and ethical AI, but not past the requirement to be profitable. In the end, it may come down to technically aware individuals acting in ways that the experts cannot.

Individuals

As individuals, we can make efforts to support the proposed transparent reporting of "Red Team results", whether as volunteers for OpenAI or by becoming employees of Anthropic's Red Team (not an advertisement for Anthropic, just an example of what a paid red team member might do) or Google's Red Team. We can call for transparency. We can consider it a civic responsibility to assiduously search for, read, and write or speak about the voluntary efforts being made by companies to develop, test, and internally certify that their AI products are in line with AI Risk Management Frameworks adopted in non-autocratic countries. Much like we read the labels on packaged food, we need to ask for labels on AI and then read the "ingredients" before we consume the product. Finally, as members of society who use AI-enabled products (hard not to be one), we need to ask whether the organizations that will profit from advancements in AI might focus a bit more of AI's manipulative powers on persuading a student that learning math is as much fun as playing Rocket League.
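For the technically inclined, "reading the label" can even be done programmatically. Here is a minimal sketch using the huggingface_hub library; the model ID is only an illustrative example, and not every model publishes the same metadata, so treat this as one possible way to peek at the "ingredients" rather than a standard procedure.

```python
# Minimal sketch: read the "label" (model card) of a published model before consuming it.
# Assumes: pip install huggingface_hub, and that the model repo publishes a card.
from huggingface_hub import ModelCard

card = ModelCard.load("bigscience/bloom")   # illustrative public model ID

metadata = card.data.to_dict()              # structured "ingredients": license, datasets, etc.
print("License:", metadata.get("license"))
print("Training datasets:", metadata.get("datasets"))

# The free-text portion usually documents intended use, limitations, and known biases.
print(card.text[:500])
```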

Image of Student writing wearing a tan sweater
Photo by Unseen Studio on Unsplash

As the post-ChatGPT hype cycle settles down, and corporations move past initial efforts to determine how best to leverage the latest LLM and manage internal challenges with data, model scaling, and deployment, those of us with some technical understanding of AI (readers of https://towardsdatascience.com/, for example) have a role to play in helping our fellow citizens understand what is at stake, and what role they can play in ensuring our democratic societies will thrive in an AI-embedded world.


Alison Doucette

Love mentoring founders, learning and writing about AI/ML