I think modern operating systems should support something I call the outerframe. An outerframe is like a web view, but it runs compiled machine code and uses the underlying operating system’s APIs to create UI.

A fundamental reason web views are useful is that they let the user experience be driven by something external, rather than confining the app to rigid, built-in native logic. The web view serves as a safeguard, preventing the external code from doing something dumb or nefarious.

The problem with the conventional web is its inefficiency: it sacrifices the quality of the final product to make life easier for developers. For example, in my benchmark last year, web apps were roughly 6x less efficient than their equivalent outerframe apps. Until recently, these costs were tolerable, because high-end phones and laptops kept getting more powerful without a clear use case for all that power. But now there’s a renewed need for efficiency, on both high-end laptops and low-power devices. At the high end, there is a race to build useful local AI into the device, and that local AI can use every transistor and watt we can free up. Meanwhile, at the low end, there’s a race to build a compelling AR / VR headset, which could conceivably move much of our work off desktops and laptops and into the physical space of our rooms. It’s time for conventional software to get out of the way and fit into a smaller footprint, making room for local AI and new classes of devices. Luckily, the simultaneous rise of AI code generation makes this practical.

With this blog post, I am open-sourcing an outerframe for macOS. The outerframe is a key part of Outer Loop, which uses it for SSH-based apps. The outerframe repo also contains “Outer Frame”, a simple web browser that you can build and launch from Xcode on a Mac. Try vibe-coding your own outerframe content, or try putting an outerframe in your own app.

Top: The first outerframe app

Because the first app to use the outerframe is an SSH client, the first outerframe app that I’m shipping is a modern top app. I call it Top (with a capital ‘T’).

Here’s a video walkthrough.

The Top backend can be run on any Linux or Mac device, and Outer Loop can install it for you with one click. See more here.

How it works

Broadly: The browser downloads a dynamically loaded library (a .dylib / .dll / .so file) and loads it into a sandboxed process. The sandboxed process renders the window contents. The browser and the sandboxed process send messages back and forth for various user interaction events and APIs.

I designed the outerframe for an ambitious possible future where it becomes part of the open web. The philosophy of the web would shift to being multi-platform, rather than cross-platform, while still using HTML where it makes sense. Webservers could choose to serve multiple implementations of their web apps, one per platform. Here I’ll describe how a browser and webserver negotiate which code to run. Then I’ll describe how the macOS-specific outerframe works.

The HTTP requests and the file formats

We need a cross-platform protocol that hands off to platform-specific behavior. Here’s the current spec for browsers and web views supporting the outerframe.

On navigation, the browser sends the additional HTTP header Outerframe-Accept: application/vnd.outerframe. (Ideally it would instead do this in the Accept header, but Apple’s web view doesn’t expose a way to modify Accept on navigation requests, so I chose to add a new Outerframe-Accept header.) If a server wants to respond with outerframe content, its response HTTP headers should include Content-Type: application/vnd.outerframe. When the browser sees this header, it parses the response body as:
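A server-side sketch of this negotiation might look like the following. The header name and MIME type come from the spec above; the `negotiate` function and its framework-agnostic shape are illustrative, not part of the spec:

```python
OUTERFRAME_MIME = "application/vnd.outerframe"

def negotiate(request_headers: dict, outer_blob: bytes, html_body: bytes):
    # Serve the binary .outer payload only to clients that advertise support
    # via the Outerframe-Accept request header; otherwise fall back to HTML.
    if request_headers.get("Outerframe-Accept") == OUTERFRAME_MIME:
        return {"Content-Type": OUTERFRAME_MIME}, outer_blob
    return {"Content-Type": "text/html"}, html_body
```

A browser that doesn’t send the header simply gets the HTML version, so the same URL stays usable everywhere.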

bytes 0..3    The ASCII string "OUTR", as a "magic" sanity-check
bytes 4..7    UInt32 little-endian format version, currently 1
bytes 8..15   UInt64 little-endian offset to beginning of path
bytes 16..23  UInt64 little-endian length of path
bytes 24..31  UInt64 little-endian offset to beginning of data
bytes 32..39  UInt64 little-endian length of data
remaining     Variable length region, where path and data live

This is the outerframe’s analog of an “.html” file, and I call it the “.outer” file. Its contents are simply a format version, a path, and a data blob. Right up front, you see how this philosophy is different from the conventional web. This is not just plaintext, instead it’s an extremely-fast-to-parse binary format. You’ll generate this blob programmatically (e.g. using a simple Python script), or edit it with a hex editor. Making this format binary is planting a cultural flag: we put users, not developers, first. HTML already exists, so this platform gets to focus on the other extreme.
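As a sketch of that programmatic generation, here is a minimal Python builder and parser for the layout above (the function names are illustrative, not part of any spec):

```python
import struct

MAGIC = b"OUTR"
HEADER_SIZE = 40  # 4 magic + 4 version + four 8-byte offset/length fields

def build_outer(path: bytes, data: bytes) -> bytes:
    # Lay out path, then data, immediately after the fixed-size header.
    path_offset = HEADER_SIZE
    data_offset = path_offset + len(path)
    header = MAGIC + struct.pack(
        "<IQQQQ",              # little-endian: u32 version, then four u64s
        1,                     # format version
        path_offset, len(path),
        data_offset, len(data),
    )
    return header + path + data

def parse_outer(blob: bytes):
    if blob[:4] != MAGIC:
        raise ValueError("not an .outer file")
    version, p_off, p_len, d_off, d_len = struct.unpack_from("<IQQQQ", blob, 4)
    return version, blob[p_off:p_off + p_len], blob[d_off:d_off + d_len]
```

Parsing is a magic check plus one `unpack` call, which is the point of the format.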

The browser parses the path and appends a platform string; for example, this macOS implementation appends “/macos-arm” (or “/macos-x86” on Intel Macs). Other implementations might append strings like “/windows-x86” or “/linux-wayland-arm”. The browser then fetches that path to get the compiled code blob. The remaining data is an opaque blob intended for the compiled code to interpret. (In the “Top” app, I don’t use this blob, but document-based outerframe apps use it heavily.)
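The path construction is trivial, but for concreteness (the “/apps/top” path below is a made-up example, and the helper name is mine):

```python
def platform_code_path(outer_path: str, platform: str) -> str:
    # The browser appends its platform string to the path from the
    # .outer file, then fetches that URL for the compiled code blob.
    return outer_path.rstrip("/") + "/" + platform
```

So one .outer file can point every platform at its own build of the same app.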

This is where we move to the platform-specific part. Each implementation gets to choose its own format for this compiled code blob. For this macOS implementation, I made it as Apple-native as possible: the “compiled code” is a “.bundle.aar” file, i.e. a compressed Apple Archive containing a loadable NSBundle. The outerframe code extracts this archive, and a heavily sandboxed non-UI process, “OuterframeContent”, loads the bundle and asks it to start rendering.

For this particular “Top” app, this compressed compiled code download is 356 KB, so it’s fairly small compared to typical web apps. (It’s written in Swift. Anecdotally I’ve found with another app that the Objective-C version is ~100 KB smaller than the Swift version, but I haven’t studied this further.)

Rendering

The philosophy of the outerframe is to lean on the operating system’s UI APIs, but not all UI frameworks are compatible with rendering in a separate sandboxed process. Neither SwiftUI nor AppKit’s NSViews can be used. Outerframe content still uses the underlying operating system, but it must use lower-level primitives.

In my original implementation, the outerframe content code received a framebuffer (a.k.a. graphics context or IOSurface), and populated it. To use an analogy from the current web, this is similar to making the page a single <canvas/> and rendering everything with WebGL.

That approach is perfectly valid, but on macOS your potential is limited if you use a framebuffer. The Firefox blog discusses this. One fundamental issue is that you can’t tell the macOS compositor (WindowServer) about small updates, you can only invalidate the entire framebuffer.

A core UI component of Apple’s platforms is the CALayer, a higher-level abstraction that gives you access to framebuffers when you want them. CALayers form a tree/hierarchy. By embracing the CALayer, you solve the problem above, plus you get a bunch of animation functionality that runs completely in the operating system’s compositor, not even waking up your threads. For example, in the Top video above, when the process rows animate up and down, Outer Loop and OuterframeContent don’t need to do frame-by-frame updates; the threads simply sleep through the entire animation. And, of course, embracing this higher level primitive gives us a lot more functionality for free, which is convenient. There are a few kinds of CALayer, including a CAMetalLayer if you want to use Metal GPU shaders.

To use a remote-rendered CALayer, the outerframe uses the CALayerHost, a private API. I hope Apple someday makes a public API analogous to CALayerHost, as it is vital to having efficient sandboxed UI (Safari and Chrome also use it). For the reasons above, the public API IOSurface is less efficient and much less powerful.

So the macOS outerframe rendering model is: your binary registers a CALayer (which is typically the root of a tree of CALayers) and keeps it up-to-date. You can read more about the binary interface here.

Events, operating system UI

Of course, the rendering layer is just part of an app platform. Other parts include mouse events, keyboard input, interacting with the IME (e.g. the operating system’s overlay UI that converts Pinyin to Chinese characters), text editing hotkeys, accessibility, native blinking carets, Dark Mode, context menus, copy/paste, and so on. For each of these, the outerframe mirrors the underlying operating system, passing events from the browser down to the content and letting the content code send messages back. All of the above is implemented, but there are more things in this category that are not yet implemented. The current set of messages is documented here.
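As a rough illustration of this message passing (the message names and the JSON encoding here are hypothetical, not the documented wire format):

```python
import json

# Hypothetical message shapes -- the real set is defined by the outerframe's
# own documentation; the names below are illustrative only.
def make_event(kind: str, **payload) -> bytes:
    # Browser -> content: e.g. a mouse click at a point in the layer.
    return json.dumps({"kind": kind, "payload": payload}).encode("utf-8")

def handle_event(raw: bytes):
    # Content side: decode the message and dispatch on its kind.
    msg = json.loads(raw.decode("utf-8"))
    return msg["kind"], msg["payload"]
```

The key property is that everything crossing the process boundary is a serialized message, never a shared object.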

This is an area where Apple would be able to create a nicer API than I can, since they could modify the operating system.

Networking

The outerframe content code has access to macOS’s built-in networking APIs, but the sandbox denies it direct access to the network. Instead, it connects through a local proxy provided by the outerframe. That proxy enforces same-origin policies, so the content doesn’t have unfettered access to the network. It also gives the hosting app the ability to connect to other things: for example, Outer Loop uses this proxy to let the outerframe connect to servers over SSH, and to local Unix socket files.
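A simplified sketch of the same-origin check such a proxy performs (real origin rules also handle default ports and other edge cases):

```python
from urllib.parse import urlsplit

def same_origin(page_url: str, target_url: str) -> bool:
    # An origin is the (scheme, host, port) triple; the proxy refuses
    # requests whose target origin differs from the page's origin.
    a, b = urlsplit(page_url), urlsplit(target_url)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)
```

Because the check lives in the proxy rather than in the content process, compromised content code can’t bypass it.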

Vibe-coding already works

A challenge with launching new platforms nowadays is that you might have to wait for the large language models to pre-train on example code for that platform, which could take years. Conveniently, coding agents are already pretty good at writing for this platform, because the platform is macOS.

There are example projects for creating outerframe content in Swift or in C. Feel free to clone one of these, rename it, and ask your favorite coding agent to build something.

Not compatible with phones or headsets

There’s an elephant in the room. The outerframe is a coherent idea on macOS, Windows, and Linux. But on phone operating systems like iOS and Android, and on headset operating systems like visionOS, app stores won’t allow apps that load external machine code. So we’re in an ironic situation where an eventual iPhone or Vision Pro version of Outer Loop will have to use a conventional web view to run “Top”, while the MacBook Pro runs the lightweight, efficient native version.

I think it’s time to rethink these policies. Compiled machine code is not obviously dangerous, as long as it is sandboxed and only has “safe” APIs available to it.

Maybe the scariest thing about running untrusted machine code is side-channel attacks, like the classic Spectre and Meltdown, and the more recent proof-of-concept Apple M1 GoFetch attack. It’s questionable whether side-channel attacks are prevented by WASM / JavaScript JITs, but let’s suppose they are. Side-channel attacks broadly occur because, for a long time, it has been the job of chip designers to take mediocre code and make it run fast. As AI code generation gets better, I think chip designers can finally plan for a world where code is written with the hardware in mind, and it will actually become faster to use simpler chips with fewer performance hacks. (This is analogous to how GPUs are much simpler than CPUs, because they rely on the code to be better.) I’m betting that side-channel attacks can become a thing of the past, without sacrificing performance, but I confess that this is just a guess.

Apple could still choose to implement restrictions. They could require outerframe content on the iPhone or Vision Pro to be notarized, the same way they require non-App-Store macOS apps to be notarized. Then they would have the ability to instantly block everything from developers flagged as “bad”. Or they could just go all in, treating outerframe content the same way they treat JavaScript (which doesn’t need to be signed).

This isn’t about replacing the web, but letting it grow to fill new niches

I’m not saying that your favorite news website or blog should embrace machine code. The HTML web is good for that kind of thing.

The web has always been awkward as an app platform. It has succeeded anyway, because the browser is such a natural place for certain experiences, but I think not being a first-class app platform has prevented useful tools from arising. For example, with Outer Loop, I’m trying to use the web as the UI layer for remote servers and edge devices, and using actual native binaries makes the experience much better (like with Top, in the video above). With the HTML web, I don’t think I could get people to switch from command line apps to modern GUIs, but with native outerframe, I think I might.

I think some existing web experiences should move to native web apps, but the bigger growth area would be a set of new experiences/apps. The rise of AI brings two broad areas for this growth: the first is conventional software, now that so many more people can build stuff, and the second is dynamic generative UI, with AI creating experiences in real time. Combine all of this with the race to build a compelling AR / VR headset, in addition to the race to repurpose local compute for local LLMs, and I think machine code web content becomes the natural next step.

Maybe native apps will become web-like before the web becomes native

I like the idea of the open web embracing native web apps. It would change the focus of the web from being lowest-common-denominator cross-platform code to being heterogeneous and multi-platform. Each platform would then be free to innovate. Webservers could choose to serve N different native apps to N different platforms, with one of those platforms being the conventional HTML web. All of this would become especially practical because AI could translate code to other platforms and test it.

An alternate path is that a set of native apps will adopt a dynamic browser-like architecture, and the “native web” ends up existing in a collection of apps using outerframes, or something similar.

Either path sounds good to me.

FAQ

Why not WebAssembly? WebAssembly (WASM) is cool, but it’s still a lowest-common-denominator virtual machine, and it just improves the JavaScript part of “the web”. You still have to combine it with one of the web’s UI models (either treating an app as a “document” via the DOM, or using GPU shaders on WebGL / WebGPU). With the outerframe, you have full control over how your code renders different experiences to the screen. This path certainly has greater potential, and my hunch is that the payoff is big enough to be worth it. With the outerframe, you get to really take ownership of your actual assembly code. It’s a lot of fun. Broadly, I think a homogeneous platform will hold us back, and a heterogeneous world where each platform gets to innovate will lead to better things.

Is this just ActiveX all over again? It is similar to ActiveX, but with two important differences. First, the outerframe embraces heavy sandboxing, so the security concerns are greatly reduced compared to early ActiveX. Sandboxing also means we don’t need the “Do you want to allow this ActiveX control?” UI friction. In fact, ActiveX gradually approached an outerframe-like security model from 2007 to 2012, culminating in IE’s eventual “Enhanced Protected Mode” running ActiveX in a true sandbox, but by that point ActiveX was on its way out, because the web community had chosen HTML5. Second, with AI code generation, the world has changed around us. It is becoming practical to ship different native binaries to different platforms without a lock-in effect. Much of the web community hated ActiveX, even after its security issues were arguably solved, but I think they could come around to this. So I guess I am saying, “ActiveX, circa 2012, actually might be a good idea in 2026.”

(Thanks to Rosanne Liu for reading drafts of this post.)