We Want to Make Advanced Technology Available to Free Users: Sam Altman
Speculation that the San Francisco-based company will introduce GPT-5 this summer is strong. Meanwhile, it has quietly released GPT-4o.
You're reading Entrepreneur India, an international franchise of Entrepreneur Media.
Over the past year and a half, OpenAI has launched more than a one-hit wonder. With GPT-4, DALL·E 3, and Sora, it is pushing its own limits, and the technology's, toward capabilities that currently exist only in theory.
GPT-4, a multimodal system that accepts both image and text inputs to produce textual outputs, is among the largest language models, reportedly with around one trillion parameters. Speculation that the San Francisco-based company will introduce GPT-5 this summer is strong.
"We take our time on releases of major models. It will be great when we do it, and I think we may release it in a different way than previous models. I don't even know if we will call it GPT-5," said Sam Altman, CEO of OpenAI, during an All-In podcast episode. Deemed a 'state-of-the-art language model,' GPT-5 is expected to make users feel they are communicating with a person rather than a machine.
In April, OpenAI made GPT-4 available to all paying API customers. But Altman wants to go further.
"One of the things we want to do is figure out how to make more advanced technology available to free users. That's a super important part," he shares.
"It makes me sad that we have not figured out how to make GPT-4-level technology available to free users."
Why hasn't that been done yet? It's pretty expensive.
On OpenAI's business decision to keep its models closed-source, he shares that speed and cost have been important considerations. While he doesn't put a date on it, he is confident they will be able to bring advanced models to free users.
For the uninitiated, open source and closed source refer to the availability of code to the public. The former is readily available to the general public, while the latter is restricted to a limited audience and kept private.
Altman is keen on, and looking forward to, an open-source model that runs on phones: "That seems like a really important thing to do at some point in time." Asked whether something similar can be expected from the team at OpenAI, he earnestly says, "I don't know if we will or someone will." Could Meta's LLM Llama be a possibility? "That should be fittable on a phone. But I haven't played with it. I don't know if it is good enough to do the thing here."
Only an inventor appreciates an inventor. Altman shares his love and appreciation for Apple, "iPhone is the greatest technology humanity has ever made."
Just as Tony Stark created and was aided by his AI system JARVIS, how will advanced agents change the way we interface with apps? Today, the interface is of utmost value. In August 2023, food delivery platform DoorDash introduced voice ordering solutions. This cost-efficient innovation was adopted to increase sales while providing an excellent end-to-end customer experience. "Customers expect more from restaurateurs, and in return, restaurateurs expect even more technology-forward solutions from us – including support for phone channels to meet customers where they're ordering," said Rajat Shroff, Head of Product and Design at DoorDash, in the announcement.
Notably, Apple's Siri has been able to order an Uber for you since 2016.
While AI assistance for ordering is an impressive step, Altman finds it hard to picture a world where one only ever says, "Hey ChatGPT, order me sushi." The visual interface is good for a lot of things, but "it's hard to imagine a world where you never look at the screen and just use voice mode only."
"There is something about designing a world that is usable equally by humans and AIs. That's why I am more excited about humanoids as compared to robots (in other shapes). The world is designed for humans and we should keep it that way," he concludes.
While we wait for GPT-5, OpenAI quietly launched GPT-4o ("o" for "omni"). The latest version can accept any combination of text, audio, and image inputs and generate any combination of text, audio, and image outputs. In its official blog, the company shares that the model's response time to audio inputs is as little as 232 milliseconds, with an average of 320 milliseconds, similar to average human response time in a conversation.
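For developers, GPT-4o is reachable through the same Chat Completions endpoint as earlier GPT-4 models. As a minimal sketch of what a mixed text-and-image request body looks like (the `messages` shape with `text` and `image_url` content parts follows OpenAI's published API format; the image URL here is a placeholder, not a real resource):

```python
import json

# Build a Chat Completions-style payload whose single user message mixes
# a text part and an image part, as GPT-4o's multimodal input allows.
payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
}

# Serialised, this is the JSON body an HTTP POST to
# https://api.openai.com/v1/chat/completions would carry
# (with an Authorization: Bearer <API key> header).
body = json.dumps(payload, indent=2)
print(body)
```

Audio input and output are not exposed through this plain-text payload; at launch OpenAI limited API access to GPT-4o's text and vision capabilities.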
For language tokenization, 20 languages were chosen as representatives across different language families including Gujarati, Telugu, Tamil, Urdu, Hindi, and Marathi. GPT-4o's text and image capabilities will be available in ChatGPT.
As a step towards his vision of free advanced technology, OpenAI is "making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits."
The company will roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks.
"Talking to a computer has never felt natural for me; now it does. As we add (optional) personalization, access to your information, the ability to take actions on your behalf, and more, I can see an exciting future where we can use computers to do much more than ever before," Altman posted on his personal blog.