Nov 15, 2023
Vision I
Vision I introduced multimodal input and feedback.
Our goal is to let language models control computers.
But computers weren't designed to be operated from your terminal. They were designed to be operated via a keyboard, mouse, and bitmap display.
They were designed visually.
So, in addition to "interfaces" like Python and Shell, we need to build a multimodal OS interface that LLMs can operate.
Where do we start?
Vision I introduces visual inputs, which let Open Interpreter design components from a sketch or screenshot.
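Conceptually, a visual input is just an image attached to the model's message. Here's a minimal sketch of the idea, assuming an OpenAI-style multimodal chat endpoint; the model name, file path, and prompt are illustrative placeholders, not Open Interpreter's internal API.

```python
import base64
from openai import OpenAI

client = OpenAI()

# Encode a UI sketch so it can travel inside a chat message.
with open("sketch.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Ask a multimodal model to turn the sketch into a working component.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # illustrative model name
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Write the HTML/CSS for this component."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```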
We also introduced visual feedback, which feeds visual outputs (HTML/CSS designs, charts from Python, etc.) back into the model so it can iterate on its code.
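Visual feedback closes the loop: run the model's code, capture what it renders, and hand that rendering back as an image so the model can critique and revise its own work. Below is a minimal sketch of that loop, again assuming an OpenAI-style multimodal endpoint; the filenames, prompts, and model name are placeholders, not the exact mechanism Open Interpreter uses.

```python
import base64
from openai import OpenAI

client = OpenAI()

def png_as_data_url(path: str) -> str:
    """Encode a rendered image so it can be sent back to the model."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

# Earlier turns: the model wrote plotting code and we executed it,
# producing chart.png (execution step omitted here).
messages = [
    {"role": "user", "content": "Plot monthly revenue as a bar chart saved to chart.png."},
    {"role": "assistant", "content": "<the code the model wrote>"},
    # Visual feedback: show the model what its code actually rendered.
    {"role": "user", "content": [
        {"type": "text", "text": "Here is the chart your code produced. "
                                 "Fix anything that looks off and return revised code."},
        {"type": "image_url", "image_url": {"url": png_as_data_url("chart.png")}},
    ]},
]

revision = client.chat.completions.create(
    model="gpt-4-vision-preview",  # illustrative model name
    messages=messages,
    max_tokens=1024,
)
print(revision.choices[0].message.content)
```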