Was OpenAI’s newest discovery, Q*, the reason for firing Sam Altman?

Published in Predict · 2 min read · Nov 24, 2023

Q*: the supposed new AI breakthrough from OpenAI

In the last few days, several reports have claimed that OpenAI made a major breakthrough with a project called Q* (pronounced Q-star). The breakthrough was reportedly so big that several leading researchers felt compelled to warn the board and management about it, seeing it as a threat to humanity. This was just days before Sam Altman’s firing last Friday.

But what is Q*? Let me try to demystify it. It may sound complex, but it is worth understanding, because its impact could be far-reaching across industries, including robotics, healthcare, finance, and more.

So, what is Q*?

There are basically three ways to train an AI: supervised learning, unsupervised learning, and reinforcement learning. ChatGPT is trained in part with reinforcement learning. Think of traditional reinforcement learning like teaching a child to solve a puzzle by rewarding them for each correct piece they place. The child learns step by step, focusing on immediate rewards. But if the puzzle changes, they must start from scratch.
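
To make the reward idea concrete, here is a minimal sketch of textbook tabular Q-learning, one classic form of reinforcement learning. It is only an illustration: the five-cell corridor, the reward of 1.0 at the goal, and every name and parameter value are made up for this example, and nothing here describes how OpenAI's Q* or ChatGPT actually works.

```python
import random

# Hypothetical toy world: a corridor of 5 cells, with the goal in the last cell.
N_STATES = 5
ACTIONS = (-1, +1)                      # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

def step(state, action):
    """Move inside the corridor; the only reward is 1.0 for reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(q_row, rng):
    """Explore at random sometimes (and on ties); otherwise take the best-known action."""
    if rng.random() < EPSILON or q_row[0] == q_row[1]:
        return rng.randrange(len(ACTIONS))
    return 0 if q_row[0] > q_row[1] else 1

def train(episodes=500, seed=0):
    """Learn a table q[state][action] of expected future reward by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = choose_action(q[state], rng)
            next_state, reward = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward reward + discounted best next value.
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Best-known action (-1 or +1) for every non-goal cell."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]

if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

After training, the learned policy should step right in every cell. Note that the table is tied to this one corridor: move the goal and the learned values are no longer valid, which is the "start from scratch" problem described above.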

Enter Q*-based reinforcement learning.

Imagine you’re overseeing the construction of a bridge. In traditional reinforcement learning, your AI assistant is like a worker who learns by trial and error, focusing on small tasks: laying a brick, or setting a beam. Each task completed earns a ‘reward’. However, if the bridge design changes, the worker has to relearn these tasks in the new context.

Now, envision Q* at work. Instead of just learning tasks, the AI will learn to understand the concept of bridge-building. It anticipates future steps and adapts. If the design changes, the AI doesn’t start from zero. It already knows the principles of construction and adapts to the new design, efficiently continuing its work.

This is the power of Q*: learning general principles, not just specific tasks.

Q* could mark a significant leap in AI learning, getting us much closer to Artificial General Intelligence (AGI). It’s about strategic, long-term thinking, not just short-term gains. This approach could rapidly improve efficiency and adaptability and have a huge impact on several industries.

But with great power comes great responsibility. The former OpenAI board’s concern could have been about potential misuse, or simply about the fact that we are developing technology so fast that we don’t have time to consider the consequences. Just imagine AI with strategic long-term planning in the wrong hands…

It’s a call for ethical guidelines and responsible usage, which might be why the board reached out to Anthropic about a potential merger, since Anthropic is known for a more cautious approach to AI.

Whether this was the real reason behind Altman’s firing, we might never know, but Q* could be a paradigm shift in AI learning, promising to revolutionize how we solve complex problems.

Written by Tim Frank Andersen