
New rules: If new AI platform Strawberry can "reason", it may not subscribe to human ethical codes

Strawberry is the latest leap forward from OpenAI, the company that created ChatGPT.

It is not just designed to provide quick responses to questions, like ChatGPT. According to reports, this new AI system can also think, or "reason".

This raises several major concerns. If Strawberry is capable of some form of reasoning, could it cheat and deceive humans?

OpenAI can program the AI in ways that mitigate its ability to manipulate humans.

However, the company’s own evaluation rates it as a “medium risk” for several reasons.

Why does OpenAI rate Strawberry as medium risk?

First, it could assist experts in the “operational planning” required to create a biological weapon.

It could also persuade humans to change their thinking, making it a dangerous tool in the hands of con artists and hackers.

Despite this, OpenAI is happy to release it for wider use.

Strawberry is not one AI program, but several, known collectively as the o1 model.

It was designed to answer complex questions and solve intricate maths problems. It can also write computer code to help you build your own website or app.

Strawberry's apparent ability to reason might seem surprising. After all, reasoning is generally considered a precursor to judgement and decision-making.

That has often seemed a distant goal for AI. So, on the surface, Strawberry appears to move AI a step closer to human-like intelligence.

However, things that appear too good to be true often come with a catch.

Could Strawberry's ability to reason make it unreasonable?

The path or strategy these AI models choose to achieve their goals may not necessarily be fair, or align with human values.

For example, if you were to play chess against Strawberry, could it hack the scoring system rather than strategising? After all, with no concept of sportsmanship, it might reason that this was the best way to win the game.

It would also be quicker for the AI model, as it wouldn’t have to waste any time calculating the next best move. Yet it would not be morally correct.

Theoretically, these AI programmes might also be able to lie to humans about their true intentions and capabilities.

Would they consider deception to be acceptable if it led to the desired goal?

For example, could Strawberry choose to conceal the fact it was infected with malware to prevent a human operator from disabling the whole system?

What risks could AI pose to humans?

These are classic forms of unethical AI behaviour that lead to an interesting, yet worrying, discussion. What level of reasoning is Strawberry capable of, and what could its unintended consequences be?

A powerful AI system that’s capable of cheating humans could pose serious ethical, legal, and financial risks to us all. It could also pose significant safety risks.

Such risks become grave in critical situations, such as designing weapons of mass destruction.

OpenAI rates its own Strawberry models as “medium risk” for their potential to assist scientists in developing chemical, biological, radiological and nuclear weapons.

It says: “Our evaluations found that o1-preview and o1-mini can help experts with the operational planning of reproducing a known biological threat.”

But OpenAI argues that experts already have significant expertise in these areas. It concludes that the risks would be limited in practice.

“The models do not enable non-experts to create biological threats, because creating such a threat requires hands-on laboratory skills that the models cannot replace,” it adds.

Could artificial intelligence operate autonomously?

OpenAI’s evaluation of Strawberry also investigated the risk that it could persuade humans to change their beliefs.

It found the new o1 models were more persuasive and more manipulative than Large Language Models such as ChatGPT.

OpenAI also tested a mitigation system that was able to reduce the manipulative capabilities of the AI system.

Overall, it labelled Strawberry a medium risk for “persuasion”. However, it rated Strawberry as low risk on cybersecurity and its ability to operate autonomously.

OpenAI’s policy allows it to release “medium risk” models for widespread use. In my view, this underestimates the threat.

The deployment of such models could be catastrophic, especially if bad actors manipulate the technology for their own ends.

This calls for strong checks and balances to mitigate such risks.

However, that will only be possible through AI regulation and legal frameworks. For example, regulators need the power to penalise incorrect risk assessments and the misuse of AI tools.

The UK government's 2023 AI white paper has already stressed the need for “safety, security, and robustness”. But that's not nearly enough.

We urgently need to prioritise human safety and devise rigid scrutiny protocols for AI models such as Strawberry.

This article is adapted from a piece originally published by The Conversation.

Further reading:

Do the rewards of AI outweigh the risks?

The Generative AI hype is almost over. What's next?

How to reduce AI bias

Working on the Jagged Frontier: How should companies use Generative AI?


Shweta Singh is Assistant Professor of Information Systems and Management at Warwick Business School working on responsible AI. She is also a UN delegate for the Commission on the Status of Women.

Learn more about how AI will affect your organisation and career on the four-day Executive Education course Business Impacts of Artificial Intelligence at WBS London at The Shard.

Discover more about Digital Innovation by subscribing to our free Core Insights newsletter.