OpenAI o1: The Future of Advanced Reasoning in AI



The power of AI in our society continues to grow by the day. Still, few of us would have expected that something as simple as ChatGPT (itself once a technological marvel) would lead to models that can think, reason, and provide insight at the level of a PhD student. That is the promise of OpenAI o1, the newest and seemingly most powerful AI model released to date.

What is OpenAI o1?

OpenAI o1 is a new series of models that, unlike prior models, spend more time thinking before they respond. If you ask ChatGPT or GPT-4 a question, a response starts streaming back within seconds or even milliseconds. The new o1 series, by contrast, processes and analyzes the complexities of your problem before it returns a fully thought-out response. The series includes o1-preview and o1-mini. The preview version is the powerful, deep-reasoning mastermind, and the mini model is a smaller, faster version that excels at coding and debugging.

Reasoning Development

While other text generation models produce only ordinary text tokens, o1 models also generate reasoning tokens, which they use to think through the problem before answering. When you ask an average text generation model a question, you typically get an answer that is either summarized from the internet or drawn from the model's training data, with little deliberate reasoning behind it. This is not the case for o1. From OpenAI's o1 research, "In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%." Separate from the IMO, here's a problem that was posed to both GPT-4 and o1:

-------------------------------------------------------------------------------------------

oyfjdnisdr rtqwainr acxz mynzbhhx -> Think step by step

Use the example above to decode:

oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz

-------------------------------------------------------------------------------------------

In this example, a string of seemingly random characters is shown next to its decoded form, but the encoding rule itself is never given to the AI. GPT-4 couldn't solve the problem and asked for more context, but this was no challenge for OpenAI's o1. It found the solution:

For every 2 characters in the encoded text (the seemingly random letters), average the positions of the two letters in the alphabet; the resulting number is the position of the decoded letter. For example, in "oyfjdnisdr rtqwainr acxz mynzbhhx", "O" has a position of 15 in the 1-26 alphabet range, and "Y" has a position of 25. Averaged, they give a position of 20, which is "T". Repeat for every pair of characters.

That decodes "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz" to "THERE ARE THREE R'S IN STRAWBERRY".
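The averaging rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not how o1 arrived at it; the function name `decode` is made up for this post, and the sketch assumes the input contains only the letters a-z separated by spaces.

```python
def decode(ciphertext: str) -> str:
    """Decode text by averaging the alphabet positions (a=1 ... z=26) of each letter pair."""
    decoded_words = []
    for word in ciphertext.lower().split():
        if len(word) % 2 != 0:
            raise ValueError(f"expected an even number of letters in {word!r}")
        letters = []
        for i in range(0, len(word), 2):
            first = ord(word[i]) - ord("a") + 1       # position of the first letter
            second = ord(word[i + 1]) - ord("a") + 1  # position of the second letter
            total = first + second
            if total % 2 != 0:
                raise ValueError(f"pair {word[i:i + 2]!r} does not average to a whole position")
            average = total // 2
            letters.append(chr(ord("A") + average - 1))
        decoded_words.append("".join(letters))
    return " ".join(decoded_words)


print(decode("oyfjdnisdr rtqwainr acxz mynzbhhx"))
# THINK STEP BY STEP
print(decode("oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz"))
# THERE ARE THREE RS IN STRAWBERRY
```

The encoded text carries no punctuation, so the sketch prints "RS"; the apostrophe in "R'S" is added only for readability.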

The AI also showed its thinking process throughout, including the points where it got stuck. That part is too long to show in this blog. Still, from this example we can see the reasoning power this AI holds. Most people wouldn't even be able to make sense of what the AI was asked.

Limitations of o1

The o1 series, despite being one of the strongest AIs made by OpenAI, has some limitations. First, the models do not support image inputs, system prompts (customization prompts that guide how the model should respond to user input), or batch calls. They can also take up to several minutes to generate a response, depending on the complexity of the prompt. As of now, API access is restricted to accounts with significant amounts of credits. Finally, the reasoning tokens discussed earlier are hidden from the user, but they still count toward billing.
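To make the billing point concrete, here is a rough sketch of how hidden reasoning tokens can dominate the cost of a single request. Everything in it is an assumption for illustration: the `Usage` class, the function names, the token counts, and the per-million-token rates are placeholders, not values taken from OpenAI's documentation.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical token counts for one request."""
    prompt_tokens: int
    visible_output_tokens: int
    reasoning_tokens: int  # hidden from the user, but still billed


def billed_output_tokens(usage: Usage) -> int:
    # Reasoning tokens are charged together with the visible answer.
    return usage.visible_output_tokens + usage.reasoning_tokens


def estimate_cost(usage: Usage, input_rate: float, output_rate: float) -> float:
    """Estimate the cost in dollars; rates are dollars per one million tokens."""
    input_cost = usage.prompt_tokens * input_rate
    output_cost = billed_output_tokens(usage) * output_rate
    return (input_cost + output_cost) / 1_000_000


# Placeholder numbers: a short prompt, a short answer, and a long hidden reasoning trace.
usage = Usage(prompt_tokens=100, visible_output_tokens=200, reasoning_tokens=1800)
print(billed_output_tokens(usage))  # 2000
print(f"${estimate_cost(usage, input_rate=15.0, output_rate=60.0):.4f}")
```

In this made-up case the visible answer is only 200 tokens, yet 2,000 output tokens are billed, so a short reply is not necessarily a cheap one.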

Who can access o1?

Currently, only those on the paid ChatGPT plan ($20/month) can access o1-preview and o1-mini. While this feature may soon open to free users, as of today, September 22, 2024, you must be on the paid plan to use either model. Paid users have a maximum of 50 queries per week for the preview and mini models separately. If your work revolves around complex reasoning and logic handling, be sure to look into the plan.

