Nov 15, 2023
Our goal is to let language models control computers.
But computers weren't designed to be operated from your terminal. They were designed to be operated via a keyboard, mouse, and bitmap display.
They were designed visually.
So, in addition to "interfaces" like Python and Shell, we need to build a multimodal OS interface that LLMs can use to operate the computer.
Where do we start?
Vision I introduces visual inputs, which let Open Interpreter design components from a sketch or screenshot.
It also introduces visual feedback, which feeds visual outputs (HTML/CSS designs, charts from Python, etc.) back into the model so it can iterate on its code.
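To make the visual feedback idea concrete, here is a minimal sketch of such a loop. It is not Open Interpreter's actual implementation: `iterate_with_visual_feedback`, `Attempt`, and the three callbacks (`generate`, `render`, `is_good_enough`) are hypothetical names chosen for illustration, and the model and renderer are replaced with stand-in functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    code: str         # code the model wrote in this round
    rendering: bytes  # what that code looked like when rendered (e.g. image bytes)


def iterate_with_visual_feedback(
    task: str,
    generate: Callable[[str, List[Attempt]], str],
    render: Callable[[str], bytes],
    is_good_enough: Callable[[bytes], bool],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Hypothetical loop: generate code, render it, show the result to the model, repeat."""
    history: List[Attempt] = []
    for _ in range(max_rounds):
        # The model sees the task plus every earlier attempt and its rendering.
        code = generate(task, history)
        rendering = render(code)
        history.append(Attempt(code, rendering))
        if is_good_enough(rendering):
            break
    return history


# Stand-ins for a real multimodal model and a real renderer (assumptions for the demo).
def fake_generate(task: str, history: List[Attempt]) -> str:
    return f"<!-- {task} --><h1>draft {len(history) + 1}</h1>"


def fake_render(code: str) -> bytes:
    return code.encode("utf-8")


def fake_check(rendering: bytes) -> bool:
    return b"draft 2" in rendering


attempts = iterate_with_visual_feedback(
    "landing page", fake_generate, fake_render, fake_check
)
print(len(attempts))
```

In a real setup, `generate` would call a vision-capable model with the previous renderings attached as images, and `render` would take a screenshot of the HTML or save the chart produced by the Python code.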