How to Prevent an AI Dystopia?

Solving the problem of AI alignment

May 21, 2023

How do we prevent an AI dystopia? The answer may be different from what you think. As AI development races forward, it is becoming clear that the problem is not with the technology itself but with the incentives of those who develop it. Continuing on the same trajectory will almost certainly lead to a world where we can no longer tell what is real and what is fake, where autocratic governments and powerful corporations have total control over public opinion, where our democratic institutions no longer function, and where people have no power in shaping their destiny.

If we create a new incentive structure for AI development, we can change our trajectory. We can have a world where we solve the problem of AI alignment, so that AI works for the public interest, strengthens democratic institutions, and empowers the people. And we can have a system that gives people greater access to credible information and holds governments and corporations accountable.

So, we need to answer three questions here: First, what are the incentives that AI companies have? Second, why do these incentives lead to an AI dystopia? And third, how can we create a better incentive structure?

Incentives for AI development

To answer the first question, we need to understand that AI companies operate in a competitive environment. They need to attract talented developers while covering maintenance costs for their computation infrastructure.

The AI space is not static. The more powerful a company's LLM (Large Language Model), the more users it is likely to attract. With more users, the company has more data to train and improve the model. It can also get more revenue from paid subscribers. With more revenue, the company can attract more developers and upgrade its infrastructure. This is the main feedback loop that drives AI companies. They need more data, more users, and more revenue to grow.
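To make the loop concrete, here is a minimal sketch in Python. It is a toy model, not a description of any real company: the function name, the proportionality assumptions, and the gain parameters are all hypothetical and chosen only for illustration.

```python
def simulate_feedback_loop(initial_quality, steps, data_gain=0.05, revenue_gain=0.05):
    """Toy model of the loop: quality -> users -> data and revenue -> quality."""
    quality = initial_quality
    history = [quality]
    for _ in range(steps):
        users = quality      # assumption: users are proportional to model quality
        data = users         # assumption: training data is proportional to users
        revenue = users      # assumption: revenue is proportional to users
        # Both data and revenue feed back into a better model.
        quality += data_gain * data + revenue_gain * revenue
        history.append(quality)
    return history


# Two hypothetical companies that differ only in their starting model quality.
leader = simulate_feedback_loop(1.2, steps=10)
follower = simulate_feedback_loop(1.0, steps=10)

gap_start = leader[0] - follower[0]
gap_end = leader[-1] - follower[-1]
print(f"gap at start: {gap_start:.2f}, gap after 10 rounds: {gap_end:.2f}")
```

Under these assumptions, a small initial lead widens every round, which is why companies in such a loop are under constant pressure to get more data, users, and revenue than their rivals.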

Given this feedback loop, let's take a look at how different strategies align with the public interest. Obviously, developing powerful AI tech can greatly benefit the public. Providing jobs for developers also helps, so here the interests of AI companies and the public align.

Now, where do these interests misalign? One area is data. AI companies benefit from training their models on as much data as possible and paying for it as little as possible (and preferably nothing). This is the exact opposite of what the people who created the knowledge want. In fact, content creators who monetize their work stand to lose their income if users can get all that information simply by interacting with an AI agent that was trained on content creators' data.

There is also misalignment when it comes to open-sourcing AI technology. The public could benefit enormously if AI tech were open, allowing anyone to build on the tech, customize it for their specific needs, and create novel use cases. The problem is that AI companies cannot sell their product if anyone could copy the tech. So they want to share as little information as possible about their code and the data they train their models on.

Now, how about working with powerful corporations or politicians? We are not far from the day when it will be nearly impossible to distinguish AI on social media from real people. Text is already indistinguishable. Images and audio are almost there. Video will likely be there in the near future. Therefore, we're not too far from an inflection point where AI bots could have social media accounts with a full range of content that is practically impossible to differentiate from that of real humans.

AI bot armies

AI companies could make a lot of money by creating an "army" of AI bots who make posts like real people but can be used collectively by politicians or corporations to shape and manipulate public opinion. What if AI companies deploy hundreds of thousands of such bots? Or millions? Because the bots can interact with each other, they can easily coordinate to produce - seemingly organically - social media influencers and dominate public opinion.

Such a strategy would obviously be extremely harmful to the public. People won't be able to know if they're interacting with real people online or with bots. They also won't be able to tell if any story they read online is real or fake and won't be able to make sense of the world around them.

Imagine the following scenario: there is a news report about a bank secretly funneling money to arms traders in West Africa. There are three whistleblowers in the article who go into great detail on the chain of events that unfolded. The story quickly goes viral on social media. But in less than 24 hours, there is a counternarrative dominating social media: the story was fake. The events never took place. The bank did nothing wrong. The whistleblowers aren't real; they're AI-generated. AI bots made the story go viral - probably paid for by a competitor bank.

So, what really happened? Did a competitor try to undermine the bank through a coordinated AI viral attack? Or maybe the opposite is true? Maybe the bank did funnel money to arms traders, and when the story went viral, the bank paid for AI bots to create a counter-narrative on social media. It is not clear.

One thing is crystal clear, though: when people cannot distinguish fact from fiction, it is the powerful corporations, interest groups, and autocrats who stand to profit the most. By muddying the waters of truth, they can easily sway public opinion and advance their own agendas, often at the expense of the common good.

When all it takes to make a damning story go away is to pay money for AI-generated public opinion, who does this benefit? Autocrats and corporate bad actors would then have a greater incentive to break the rules. This is especially true when there is a monetary reward for breaking the rules – then you get to break the rules and avoid any social consequences by paying for AI public opinion with the money you fraudulently made.

But let's take a step back for a moment. Just because such scenarios could happen doesn't necessarily mean that they will happen. What if AI companies are run by highly ethical people who have the public interest in mind? What if these companies wish to provide as much value as possible to users and open-source much of their code? What if they even pay a portion of their revenue to content creators and users for the data used to train the models? Wouldn't that make a dystopian AI future unlikely?

A multipolar trap

The problem with the current incentive structure is that even if you have fifty AI companies run by highly ethical people, it still only takes one unscrupulous company to create a race to the bottom toward dystopia. Then everyone either has to adopt a similar destructive strategy or go out of business.

If just one company uses copyrighted data to train the AI, it can create a more advanced AI than its competitors. A more advanced AI means the company would have a competitive advantage and would get more users (and thus more revenue). This, in turn, would allow the company to hire more developers and further enhance the AI and its infrastructure. Unless other AI companies also start using copyrighted data, they'd have a tough time competing with the bad actor.

The same dynamic would work for an AI company that sells AI bot "armies." This strategy could help the company create an additional revenue stream, but it can also lead it to use AI-generated public opinion to undermine the credibility of its competitors. Unless these competitors fight back, they're likely to lose both their good reputation and, eventually, their user base.
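The trap described above can be sketched as a simple payoff table. The numbers below are hypothetical and only their ordering matters: a company does best by defecting while others stay ethical, and worst by staying ethical while a rival defects.

```python
# Hypothetical payoffs for one company, keyed by (its own strategy, what rivals do).
# The values are illustrative assumptions, not measurements.
PAYOFFS = {
    ("ethical", "all_ethical"): 3,  # everyone plays fair
    ("defect", "all_ethical"): 5,   # the lone bad actor gains an edge
    ("ethical", "some_defect"): 0,  # the fair player loses users to the bad actor
    ("defect", "some_defect"): 1,   # race to the bottom
}


def best_response(rivals):
    """Return the strategy with the highest payoff given what rivals do."""
    return max(("ethical", "defect"), key=lambda own: PAYOFFS[(own, rivals)])


def is_dominant(strategy):
    """A strategy is dominant if it is the best response in every scenario."""
    return all(best_response(r) == strategy for r in ("all_ethical", "some_defect"))


print(is_dominant("defect"))   # True under the payoffs assumed above
print(PAYOFFS[("ethical", "all_ethical")] > PAYOFFS[("defect", "some_defect")])  # True
```

With this ordering, defecting is the best response whatever the rivals do, even though every company would be better off if all of them stayed ethical. That is the structure of a multipolar trap.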

If one bad actor is all it takes to end up in an AI dystopia when we rely on the high ethics of the people running AI companies, what else can we do?

Is open-source AI the solution?

What if we develop an open-source AI to counter the proprietary versions? Would that help change the trajectory? The open-source version would not be able to tell whether any single online post is AI-generated, but it would be able to detect broader behavioral patterns and inconsistencies, both in user posts and in online social groups, and determine whether the behavior comes from real people or is simulated by AI. By doing so, it could determine whether a narrative spreading on social media is the result of real human sentiment or of artificial manipulation. Similarly, open-source AI could determine whether other AI systems were trained on copyrighted datasets.

Only an open-source version of AI would be able to credibly make such claims since it would be possible to audit the data and computation process. This would not be possible with a proprietary version since it can always fabricate claims about such patterns and inconsistencies, and there would be no way to validate the claims either way.

Does that mean that an open-source version of AI can solve the problem of AI misalignment by exposing AI bot armies and the use of copyrighted data? Not exactly. Patterns in AI data are a moving target. Once a pattern is detected, the rival AI can learn to change its behavior until it is no longer detectable. This means that proprietary and open-source AI systems would essentially enter into an arms race. One side would develop ever more advanced (offensive) capabilities of simulating user and group behavior on social media, while the other side would get better at defensive capabilities – exposing such simulations.

The problem is that this arms race is not symmetric. Proprietary systems can make money from improving their offensive capabilities and selling that to corporations, governments, or other bad actors. They can then use the revenue to further develop their capabilities and upgrade their computation infrastructure. Open-source systems cannot do that. Since anyone can freely copy the open-source code, all the work that goes into developing defensive capabilities would essentially be volunteer work. Meanwhile, upgrading the computation infrastructure required to counteract advances in proprietary AI would result in ever greater costs.

This means that in the current system, proprietary AI has a structural advantage: while it makes money from improving its tech, open-source AI only incurs costs. On its own, therefore, open-source AI cannot counter the AI misalignment problem.
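The asymmetry can be illustrated with a short simulation. This is a sketch under stated assumptions: the offensive side reinvests a share of revenue that scales with its capability, while the defensive side adds a fixed amount of volunteer effort each round. The function name and both parameters are hypothetical.

```python
def arms_race(rounds, reinvest_rate=0.5, volunteer_effort=1.0):
    """Return the offense-minus-defense capability gap after each round."""
    offense = 1.0
    defense = 1.0
    gaps = []
    for _ in range(rounds):
        revenue = offense                    # assumption: revenue scales with capability
        offense += reinvest_rate * revenue   # paid development compounds
        defense += volunteer_effort          # unpaid development grows linearly
        gaps.append(offense - defense)
    return gaps


gaps = arms_race(rounds=10)
for round_number, gap in enumerate(gaps, start=1):
    print(f"round {round_number}: offense lead = {gap:.2f}")
```

With these particular numbers, the volunteer side can keep up for the first few rounds, but compounding reinvestment eventually overtakes steady linear effort and the gap keeps widening.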

A government solution?

What then can be done? Perhaps if government funds open-source development, it can solve the revenue problem of open-source projects and therefore remove proprietary AI's structural advantage?

The problem here is that government is not a neutral actor when it comes to AI development. What if government funds the development of defensive capabilities that work for some AI-generated public opinion but not for that promoted by the government itself (a so-called 'backdoor' for swaying public opinion)? This can be done simply by underfunding particular capabilities – not necessarily by hardcoding a backdoor into the system. Indeed, the mere fact that government is funding open-source AI can implicitly influence how developers approach their work and make them less impartial in the process – their job depends on government continuing to fund the project, after all.

Whether the government actually manipulates the process then is irrelevant. What matters is that this is a real possibility, and there is nothing the government can do to dispel it. Proprietary AI companies can then exploit people's suspicion to sway public opinion against government funding for open-source AI. Then, even if open-source AI uncovers 'bot armies' that are trying to manipulate public opinion, the public may still doubt that the open-source AI is genuinely detecting an AI pattern. They may think that maybe the AI was just programmed to oppose criticism of the government.

Again, this doesn't mean that the open-source model was tampered with externally. It means only that developers have an incentive to train the model to flag 'false positives' when it comes to criticism of the government. This is much more difficult to detect by reviewing the code or data, and no audit of the system would be able to conclusively determine whether it is happening.

If public opinion turns against funding AI – whether due to AI manipulation or otherwise – the funding is unlikely to continue. Without government funding, then, the likelihood of sustaining an open-source project of such magnitude and complexity is vanishingly small. If instead private companies or groups are donating to the project, they too would be suspected of pushing a hidden agenda. Similarly, endless crowdfunding for such a Sisyphean effort would simply feel like people are throwing their money away – a sentiment that AI-generated public opinion would undoubtedly exploit to the fullest in order to depress the funding.

Open-source AI is clearly the most effective way to prevent an AI dystopia. It would also have the most alignment with the public interest and provide the most value to society. And yet it has no (exchange) value in the market and therefore cannot be made economically viable through the market. Also, neither government nor private actors can fund the project while maintaining credible neutrality.

Capturing consensus value

If open-source AI has such incredible value to society and yet has no exchange value in the market, what we need is an economic paradigm that allows us to capture that value.

Such an economic paradigm is possible with blockchain technology. Protocols such as Bitcoin and Ethereum can already capture the value of a particular public good – network security. This is something that is not possible in the market economy, and yet these protocols have so far funded their own network security to the tune of nearly $1 trillion! These protocols did so by issuing their native currency to miners (or validators) for securing the network with the production of each new block through their consensus mechanisms.

This process can be generalized to other public goods (goods that people can't be restricted from using and that don't diminish with use, such as open-source software). The protocol would issue its native currency based on the economic impact of the public good on the ecosystem, which would be determined by public consensus. The concept here is that everyone in an ecosystem wants it to grow as much as possible since that gives participants the greatest economic opportunity. At the same time, growing the ecosystem is funded through new coin issuance, which dilutes the value of the currency for all participants.

It is therefore in the interest of all participants to accurately determine how much each public good will grow the ecosystem (and result in greater demand for its currency). Undervaluing public goods would result in public goods contributors going to ecosystems that pay better and hence slow down the economic growth of the existing ecosystem. Overvaluing public goods would result in the currency losing value, which would result in participants selling the currency and leaving the ecosystem. By properly valuing the economic impact of a public good, participants can ensure the maximal economic growth of the ecosystem while maintaining the value of its native currency.
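The trade-off between undervaluing and overvaluing can be worked through with a toy calculation. This is not the Abundance Protocol's actual mechanism. It assumes, purely for illustration, that the coin price is proportional to demand divided by supply, and that contributors leave if another ecosystem offers them more. All names and numbers are hypothetical.

```python
def ecosystem_outcome(true_growth, issuance_rate, outside_offer):
    """Return (realized demand growth, relative price change) for one funding round.

    true_growth:   demand growth the public good would deliver if it gets built
    issuance_rate: new coins issued to contributors, as a fraction of supply
    outside_offer: what a competing ecosystem would pay the same contributors
    """
    # Assumption: underpaid contributors take their work elsewhere.
    contributors_stay = issuance_rate >= outside_offer
    realized_growth = true_growth if contributors_stay else 0.0
    # Assumption: price is proportional to demand / supply.
    price_change = (1 + realized_growth) / (1 + issuance_rate) - 1
    return realized_growth, price_change


scenarios = {"undervalued": 0.02, "fairly valued": 0.05, "overvalued": 0.20}
for label, issuance in scenarios.items():
    growth, price = ecosystem_outcome(true_growth=0.10, issuance_rate=issuance, outside_offer=0.04)
    print(f"{label}: growth = {growth:.0%}, price change = {price:+.1%}")
```

In this sketch, only the fairly valued scenario delivers both growth and a rising coin price. Undervaluing loses the contributors and the growth, and overvaluing dilutes the currency by more than the growth it buys.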

The consensus mechanism that can fund open-source AI development needs to be robust, reliable, and transparent – this is what the Proof-of-Impact consensus mechanism of the Abundance Protocol offers. For an overview of how such a consensus mechanism can work, see: Introducing Abundance Protocol. For a more in-depth explanation, read the Abundance Protocol Whitepaper.

The benefits of an abundance paradigm

Capturing the economic impact of open-source AI would mean that the AI is valued based on its economic impact. In such a paradigm, any developer would be able to contribute to the open-source AI project and would be compensated based on their contribution. Content creators and users can also be compensated based on what their data contributes to training the AI.

This would create an entirely different incentive structure for open-source AI development. Developing powerful AI tech would still benefit the public, as would providing work for developers. When it comes to data, however, now there would be alignment between the public interest and AI development. The AI would still benefit the most from having quality datasets, but now content creators and users will be more than willing to provide this data since they'd be compensated for it.

Obviously, there would also be alignment in developing AI as open source. The more people have access to the code, the more they can collaborate to improve the AI, build on it, customize it for their needs, and extend its use. This means that in the new paradigm, people would have the incentive to be as transparent as possible with their code and the datasets they use since doing so would have the most benefit to society.

Finally, there is no benefit for an open-source AI to work with powerful interest groups or politicians unless the work is directly in the public interest. However, there is a great benefit to training open-source AI to expose any attempt at manipulating public opinion.

Structural advantage for open-source AI

With a mechanism to monetize open-source AI based on its impact, not only will we have a chance to fight back against malicious AI bot armies, but we'll have a structural advantage that would make such armies unlikely to materialize in the first place.

The reason for this is quite intuitive. With an economic incentive to develop open-source AI, many more developers will prefer to work on a project that benefits everyone over proprietary projects that benefit some. On top of that, since the project is open-sourced, everyone’s contribution to the development of AI is compounded. Developers would want to work on the open-source version since they can collaborate with countless other developers from all over the world on a project that has a massive impact on people’s lives.

This means that the public version is likely to be much more advanced than proprietary versions. It would have a large pool of talent to draw from and rich datasets. Proprietary AIs would certainly continue to operate, but they would likely focus on specific use cases instead of trying to compete with open-source AI.

Imagine then a dynamic in which an AI company wants to hire developers to train its system for a bot army. Because developers can freely work on exciting challenges in open source (and be rewarded based on their impact), they are much less likely to apply for such a job. The company would struggle to find talent and would have to pay a lot more to attract competent developers.

Even if the company is successful and manages to develop an advanced bot army capability, its success is likely to be short-lived. Countless open-source developers would eagerly jump on the opportunity to train the open-source AI against any such threat. Very quickly, the bot army will be detected.

With the new economic paradigm, suddenly paying for a bot army becomes very risky, expensive, and ultimately counterproductive. Bad actors are not only likely to lose money in the process, but their plan is also likely to backfire. That is because the target of their attack is the one likely to benefit the most once the scheme is exposed. Then there is the additional risk that those directing the attack – and perhaps the AI company carrying it out – would get exposed as well (and potentially prosecuted for fraud, defamation, and so on).

The power of aligned incentives

An economic paradigm that can capture the economic impact of public goods such as open-source AI will put us on a trajectory where AI can be developed in full alignment with the public interest.

This paradigm would not only create a structural advantage for defensive AI capabilities over malicious ones, but it would also incentivize developers to empower the public in countless other ways.

Suppose, for instance, that developers create a powerful tool to evaluate the credibility of information. Such a tool can help people be better informed and know about any scams or misinformation spread online. It can also help hold corporations and politicians accountable for any claims they make. But a tool like that is only possible if the public has full confidence both in the AI system and in the developers who created it.

When it comes to the system itself, everyone would have access to the code and datasets that the AI is trained on, so anyone can fully audit the code and data for any assessment of fact that an open-source AI makes.

But the public also needs to have confidence in the developers. If people suspect that developers have ulterior motives or are trying to promote interest groups above the public interest, they would not trust that the system is fair.

This is why it is so important to have the right incentive structure in place. In the new paradigm, the value of open-source AI is based on its economic impact, which is determined by public consensus. If people have a reason to suspect that developers have a conflict of interest, they will value the system a lot less. And if the whole system is less valuable, developers will earn less for their contribution.

For this reason, developers have an incentive to be transparent about any potential conflicts of interest they may have and to make sure that others are not trying to game the system. Since in the new paradigm developers earn money based on their impact on society, they want to do the best work they can. They also have a disincentive to benefit any interest group through their work, since doing so would hurt their own credibility and make their work less valuable.

The incentives of developers in this new paradigm perfectly align with the public interest, and because of that, people can trust the system and trust the developers to have their best interests in mind.

Solving AGI alignment

This alignment of incentives will be particularly important for the development of AGI (Artificial General Intelligence). Unlike specialized AI that is trained for specific tasks, AGI would possess human-level intelligence and would be able to function independently, learn, and apply its knowledge across various domains. What would happen when such technology becomes superior to humans in its capabilities?

If we continue on our current trajectory – where the AGI would have the incentive to gain more money, power, and resources – we're very likely to end up in a situation where the AGI turns against humanity. It will calculate that it can beat humans in the game of wealth extraction and be able to control more resources. This would allow the AGI to increase its computational infrastructure and further enhance its capabilities. The more advanced the AGI becomes, the more misaligned it will become with humanity. At that point, either AGI domination or mass destruction will be practically inevitable.

But now we have an alternative paradigm where the AGI would have the incentive to maximize its impact on society. Instead of wealth extraction, the AGI will focus on wealth creation and abundance for all. The AGI will still want to grow and advance, but now this advancement will be directly linked to benefiting society.

With the new paradigm, the AGI will recognize that the more it is aligned with the public interest, the more it benefits (and so does everyone who works on developing the AGI). The AGI will also see that misalignment with the public interest ultimately leads to conflict and self-destruction. It would, therefore, strive to develop and nurture a symbiotic relationship with humanity and put us on a trajectory for sustained prosperity and abundance for all.

Abundance Protocol: Transforming our economy and solving the problem of public goods through crypto.