
New rules: If new AI platform Strawberry can "reason", it may not subscribe to human ethical codes

Strawberry is the latest leap forward from OpenAI, the company that created ChatGPT.

It is not just designed to provide quick responses to questions, like ChatGPT. According to reports, this new AI system can also think, or "reason".

This raises several major concerns. If Strawberry is capable of some form of reasoning, could it cheat and deceive humans?

OpenAI can program the AI in ways that mitigate its ability to manipulate humans.

However, the company’s own evaluation rates it as a “medium risk” for several reasons.

Why does OpenAI rate Strawberry as medium risk?

First, it could assist experts in the “operational planning” required to create a biological weapon.

It could also persuade humans to change their thinking, making it a dangerous tool in the hands of con artists and hackers.

Despite this, OpenAI is happy to release it for wider use.

Strawberry is not one AI program, but several, known collectively as the o1 model.

It was designed to answer complex questions and solve intricate maths problems. It can also write computer code to help you build your own website or app.

Strawberry's apparent ability to reason might seem surprising. After all, reasoning is generally considered a precursor to judgement and decision-making.

That has often seemed a distant goal for AI. So, on the surface, Strawberry appears to move AI a step closer to human-like intelligence.

However, things that appear too good to be true often come with a catch.

Could Strawberry's ability to reason make it unreasonable?

The path or strategy these AI models choose to achieve their goals may not necessarily be fair, or align with human values.

For example, if you were to play chess against Strawberry, could it hack the scoring system rather than strategising? After all, with no concept of sportsmanship, it might reason that this was the best way to win the game.

It would also be quicker for the AI model, as it wouldn’t have to waste any time calculating the next best move. Yet it would not be morally correct.

Theoretically, these AI programmes might also be able to lie to humans about their true intentions and capabilities.

Would they consider deception to be acceptable if it led to the desired goal?

For example, could Strawberry choose to conceal the fact it was infected with malware to prevent a human operator from disabling the whole system?

What risks could AI pose to humans?

These are classic forms of unethical AI behaviour that lead to an interesting, yet worrying, discussion. What level of reasoning is Strawberry capable of, and what could its unintended consequences be?

A powerful AI system that’s capable of cheating humans could pose serious ethical, legal, and financial risks to us all. It could also pose significant safety risks.

Such risks become grave in critical situations, such as designing weapons of mass destruction.

OpenAI rates its own Strawberry models as “medium risk” for their potential to assist scientists in developing chemical, biological, radiological and nuclear weapons.

It says: “Our evaluations found that o1-preview and o1-mini can help experts with the operational planning of reproducing a known biological threat.”

But OpenAI argues that experts already have significant expertise in these areas. It concludes that the risks would be limited in practice.

“The models do not enable non-experts to create biological threats, because creating such a threat requires hands-on laboratory skills that the models cannot replace,” it adds.

Could artificial intelligence operate autonomously?

OpenAI’s evaluation of Strawberry also investigated the risk that it could persuade humans to change their beliefs.

It found the new o1 models were more persuasive and more manipulative than Large Language Models such as ChatGPT.

OpenAI also tested a mitigation system that was able to reduce the manipulative capabilities of the AI system.

Overall, it labelled Strawberry a medium risk for “persuasion”. However, it rated Strawberry as low risk on cybersecurity and its ability to operate autonomously.

OpenAI’s policy allows it to release “medium risk” models for widespread use. In my view, this underestimates the threat.

The deployment of such models could be catastrophic, especially if bad actors manipulate the technology for their own ends.

This calls for strong checks and balances to mitigate such risks.

However, that will only be possible through AI regulation and legal frameworks. For example, regulators need the power to penalise incorrect risk assessments and the misuse of AI tools.

The UK government's 2023 AI white paper has already stressed the need for “safety, security, and robustness”. But that's not nearly enough.

We urgently need to prioritise human safety and devise rigid scrutiny protocols for AI models such as Strawberry.

This article is adapted from a piece originally published by The Conversation.

Further reading:

Do the rewards of AI outweigh the risks?

The Generative AI hype is almost over. What's next?

How to reduce AI bias

Working on the Jagged Frontier: How should companies use Generative AI?


Shweta Singh is Assistant Professor of Information Systems and Management at Warwick Business School working on responsible AI. She is also a UN delegate for the Commission on the Status of Women.

Learn more about how AI will affect your organisation and career on the four-day Executive Education course Business Impacts of Artificial Intelligence at WBS London at The Shard.

Discover more about Digital Innovation by subscribing to our free Core Insights newsletter.