Think of all the client-side code that runs on your devices. Most technical people would say that it falls into two categories:

  • Native apps, which are written for a specific platform and compiled to machine code.
  • The web, which is written in cross-platform interpreted code.

This mental model rests on a misconception. The categories are real, but nothing I mentioned about them is fundamental. What actually distinguishes them is whether the code can render a top-level window. Native apps can do this, while the web is confined to a “visually sandboxed” area. All those other details are orthogonal.

Web-like platforms will remain relevant for the foreseeable future, because we will always need a safe space to run other people’s code. Curated app stores are good for certain classes of code, like apps or games, but they aren’t appropriate for other use cases like scientific visualizations.

Today, a core feature of the web is that it lets you “write once, run everywhere”, but this could change. As AI code generation becomes cheaper and better, future web-like platforms could be based on actual machine code, compiled from any language. Each visualization would include compiled code for multiple device platforms. This would free us from the current “lowest common denominator” web, and instead would let visualizations use the full power of the device, which will be essential for making the leap to low-power devices like AR glasses.

Rather than waiting for the web to change, today we can create new analogs of the web view that support machine code. I call these outerframes. They could show up in lots of apps, new and old:


Four example apps that could use outerframes. The blue dotted line shows which part of the app could be an outerframe. The first two apps are Jupyter and Claude, which use the web. The third is ChatGPT, which is native and uses only static visualizations. The fourth is an actual app I've built using outerframes.

An outerframe is a safe space for running untrusted code within an app. As you would expect, the untrusted code is sandboxed in its own process, unable to access your data. Importantly, the untrusted code is also restricted from taking over the screen, and instead is “visually sandboxed” by the app, similar to how a website is visually sandboxed to a browser frame. Unlike in a conventional web browser, this code can include any compiled machine code, targeting both the CPU and the GPU, with full support for threading. Thus, the outerframe eliminates the “lowest common denominator” overhead of the web.

I built an outerframe for macOS. To demo it, I wrote a toy browser that, instead of using HTML and JavaScript, uses a cross-platform binary file format as its “page” and platform-specific machine code as its “script”.

This app is open source; you can build and run it yourself.

In this post, I try to quantify the efficiency gains of moving to a machine code web on today’s devices.

Benchmark

In a previous post, I used a set of pragmatic model visualizations:

When I wrote that post, I found that these visualizations were far too expensive to animate, at least when using a standard 2D canvas. Now I use this task as my benchmark: plotting large distributions of machine learning model parameters at 120 frames per second. (120 Hz is the refresh rate of modern phone and laptop screens.)

For the web, I used JavaScript and the browsers’ GPU shader languages, testing both WebGL and WebGPU. For the outerframe, I used a mix of Swift, C, and Metal Shading Language.
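To give a sense of the outerframe side, here is a minimal sketch of a 120 Hz Metal render loop in Swift. It is not the actual benchmark code: the pipeline state, the shaders, and the buffer of parameter samples are assumed to be set up elsewhere.

    import MetalKit

    // Minimal sketch of a 120 Hz Metal render loop (not the actual benchmark code).
    // Assumes the parameter samples were already uploaded to `vertexBuffer` and
    // that `pipelineState` was built from a vertex/fragment shader pair.
    final class DistributionRenderer: NSObject, MTKViewDelegate {
        let device: MTLDevice
        let commandQueue: MTLCommandQueue
        var pipelineState: MTLRenderPipelineState?   // built elsewhere from .metal shaders
        var vertexBuffer: MTLBuffer?                 // packed model-parameter samples
        var vertexCount: Int = 0

        init(view: MTKView) {
            device = MTLCreateSystemDefaultDevice()!
            commandQueue = device.makeCommandQueue()!
            super.init()
            view.device = device
            view.preferredFramesPerSecond = 120      // match the display's refresh rate
            view.delegate = self
        }

        func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

        func draw(in view: MTKView) {
            guard let pipelineState = pipelineState,
                  let vertexBuffer = vertexBuffer,
                  let descriptor = view.currentRenderPassDescriptor,
                  let drawable = view.currentDrawable,
                  let commandBuffer = commandQueue.makeCommandBuffer(),
                  let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)
            else { return }

            encoder.setRenderPipelineState(pipelineState)
            encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
            // One point per parameter sample; the shader positions and shades it.
            encoder.drawPrimitives(type: .point, vertexStart: 0, vertexCount: vertexCount)
            encoder.endEncoding()

            commandBuffer.present(drawable)
            commandBuffer.commit()
        }
    }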

This benchmark plays to the web’s performance strengths in a couple of ways. First, I measure the visualization’s render loop, not its initial loading and display, where I think the outerframe will have a larger advantage. Second, this task is GPU-heavy, so a lot of the work is being done by compiled shader code, whereas in CPU-heavy tasks I think the outerframe would again have a larger advantage. For these reasons, I think it’s okay that I relied on JavaScript and didn’t test WebAssembly (Wasm), which would be more important for benchmarks of initialization or of CPU-heavy tasks.

I measure CPU utilization, GPU utilization, and macOS Activity Monitor’s “Energy Impact”. I use this Energy Impact as the measure of efficiency. (CPU utilization is a very noisy measure of efficiency, because “100% utilization” only sometimes means high power consumption. For example, when coordinating with the GPU, a CPU will often run a special “spin loop” that consumes little energy but still registers 100% utilization for a non-negligible fraction of every frame.)

Result 1: Machine code can be more web-like than the web

The web is built on documents, and I first approached this visualization as a document. I implemented it using text with inline canvas elements. In the outerframe with machine code, this worked great.

In a conventional web browser, it was terribly inefficient.

Platform              Frames per second   CPU    GPU    Energy Impact (relative)
Firefox (WebGL)       5                   67%    10%    1550 (86x)
Chrome (WebGL)        61                  101%   37%    1610 (90x)
Safari (WebGL)        120                 76%    56%    1005 (56x)
Safari (WebGPU)       120                 55%    16%    710 (39x)
Chrome (WebGPU)       120                 55%    20%    435 (24x)
outerframe (macOS)    120                 16%    14%    18 (baseline)

Even in the best case, rendering on the web consumed roughly 24x more power than rendering with machine code.

The issue here is that WebGL and WebGPU are not designed to be used for figures within a document. When you use them, you are supposed to create very few canvases, maybe 1-2. This is especially true of WebGL, where using more than 16 canvases isn’t even possible, but it remains very expensive with WebGL’s successor, WebGPU. Meanwhile, in an outerframe I embrace the native platform, and macOS will happily let you draw to as many canvases (a.k.a. CALayers) as you want.

So my first result: building this visualization for the web requires moving away from the natural approach, and instead using tricks like treating the canvas as a background behind the text and carefully drawing to the correct locations. To be fair, when building native document apps, this can be a performant way of doing things, and I also see some efficiency wins when I adopt this approach in an outerframe. But it was striking how machine code and the native platform let me build the thing in the way that felt natural and web-like, whereas the web did not.

Result 2: Machine code was about 6x more efficient

I changed to a single-canvas approach, and the web was finally able to handle it.

Platform              Frames per second   CPU    GPU    Energy Impact (relative)
Safari (WebGL)        120                 55%    18%    63 (8x)
Firefox (WebGL)       120                 44%    15%    53 (7x)
Chrome (WebGL)        120                 52%    22%    57 (7x)
Safari (WebGPU)       120                 35%    13%    49 (6x)
Chrome (WebGPU)       120                 51%    13%    47 (6x)
outerframe            120                 13%    13%    8 (baseline)

With this approach, the outerframe was 6x more efficient than the web. The outerframe version improved on its Result 1 numbers because rendering one canvas saves us from lots of copying, and macOS likes it when you draw to one CALayer rather than to 20 CALayers.

I suspect Results 1 and 2 tell a representative story: there’s a set of scenarios that the web does a pretty good job of supporting, and sometimes you need to tweak your design to fit within that set. But even then, it’s about 6x less efficient. (With error bars, of course. The actual number might be 3x, or it might be 10x.)

Result 3: The operating system’s native text view is not good at fast dynamic text updates

From that same previous blog post, here’s a variant of the visualization which plots parameters in one dimension:

In that visualization, the min / max number labels are actual text in the page, not text rendered into the canvas. When I implemented this visualization in an outerframe, I used the operating system’s native text view and batch-updated each label’s subrange of the text, rather than rendering the labels myself. I found something disappointing: the performance was abysmal. In fact, even just the text updates are terrible:

Platform      CPU utilization   GPU utilization   Energy Impact
outerframe    60%               0%                1550

Not surprisingly, the operating system’s text view was not designed for dynamically updating text many times per second. I’m sure this is one reason Safari’s WebKit has a different text stack.

So, I moved the text into the canvas, rendering it myself. With this change, the text view’s text is now static, with all changes occurring inside of the canvas. Now it’s much better:

Platform      CPU utilization   GPU utilization   Energy Impact
outerframe    21%               5%                17

Takeaway: If you’re going to create an outerframe, you probably shouldn’t base it on the operating system’s native text view.
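For concreteness, here is a rough Swift sketch of the two labeling strategies. It is not the benchmark code; the ranges, fonts, and the CGContext standing in for the canvas are illustrative.

    import AppKit
    import CoreText

    // (a) What I tried first: batch-updating subranges of the native text view's
    //     storage. Correct, but the layout and rendering cost is huge when it
    //     happens 120 times per second.
    func updateLabel(in textView: NSTextView, labelRange: NSRange, newLabel: String) {
        guard let storage = textView.textStorage else { return }
        storage.beginEditing()
        storage.replaceCharacters(in: labelRange, with: newLabel)
        storage.endEditing()
    }

    // (b) What I switched to: keep the page's text static and draw the changing
    //     labels inside the canvas myself. Here the "canvas" is sketched as a
    //     plain CGContext; in the real app it is the same surface the plots use.
    func drawLabel(_ label: String, at point: CGPoint, in context: CGContext) {
        let attributed = NSAttributedString(
            string: label,
            attributes: [.font: NSFont.monospacedDigitSystemFont(ofSize: 11, weight: .regular),
                         .foregroundColor: NSColor.labelColor])
        let line = CTLineCreateWithAttributedString(attributed as CFAttributedString)
        context.textPosition = point
        CTLineDraw(line, context)
    }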

How outerframes work

The point of the outerframe (and the point of the web) is to enable playful sharing. To support this, outerframes must perform “visual sandboxing”. Without that, a rogue actor could take over your whole screen, show you a pixel-perfect replica of any app, and trick you into doing anything.

There is no single correct way to implement an outerframe. There are two natural ways of building it:

  1. Create a viewer for a data structure. Let the untrusted code modify that data structure, while your trusted viewer displays it.
  2. Let the untrusted code write directly to a framebuffer / canvas / graphics context that the parent app controls.

The web uses both of these approaches, using the DOM and canvas elements, respectively.
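As an illustration of the second approach on macOS: the parent app could allocate the pixel buffer (an IOSurface), hand it to the sandboxed process over XPC, and display it in a layer whose bounds it controls. This is a sketch under those assumptions, not the demo’s actual interface; it also assumes macOS’s ability to use an IOSurface directly as a CALayer’s contents.

    import AppKit
    import IOSurface
    import CoreVideo

    // Trusted side: allocate the framebuffer the guest will draw into.
    // In a real implementation, this surface would be handed to the sandboxed
    // helper over XPC (e.g. IOSurfaceCreateXPCObject / IOSurfaceLookupFromXPCObject).
    func makeOuterframeSurface(width: Int, height: Int) -> IOSurface? {
        IOSurface(properties: [
            .width: width,
            .height: height,
            .bytesPerElement: 4,                     // BGRA8
            .pixelFormat: kCVPixelFormatType_32BGRA,
        ])
    }

    // Trusted side: show the surface in a layer whose bounds the app controls.
    // The guest's pixels are clipped to this layer -- the "visual sandbox".
    func embed(_ surface: IOSurface, in hostView: NSView, frame: CGRect) {
        hostView.wantsLayer = true
        let frameLayer = CALayer()
        frameLayer.frame = frame
        frameLayer.masksToBounds = true
        frameLayer.contents = surface                // IOSurface as layer contents (macOS)
        hostView.layer?.addSublayer(frameLayer)
    }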

This blog post’s demo outerframe is very web-like, in that it offers untrusted code analogous ways to display stuff to the user:

  1. Create and update a rich text “attributed string” document, which the app displays.
  2. Render to a “canvas” that is inline in the document, editing pixels directly.

Using this trusted document viewer approach, I got basic functionality like text selection for free. I hosted it in Apple’s native text view, using text attachments to embed the canvases. You could imagine hosting it in another text stack, e.g. Zed’s GPUI, or a custom stack.
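A simplified sketch of how the trusted viewer might assemble such a document: rich text plus text attachments that reserve space for the inline canvases. In the demo, each attachment is backed by a live canvas; here the attachment only reserves its bounds.

    import AppKit

    // Sketch of the trusted viewer's document assembly (simplified from the demo).
    func makeDocument(canvasSize: NSSize) -> NSAttributedString {
        let document = NSMutableAttributedString(string: "Parameter distributions per layer:\n")

        let canvasSlot = NSTextAttachment()
        canvasSlot.bounds = CGRect(origin: .zero, size: canvasSize)
        document.append(NSAttributedString(attachment: canvasSlot))
        document.append(NSAttributedString(string: "\nEach figure updates at 120 Hz.\n"))
        return document
    }

    // The host text view displays the document; the untrusted code never touches it directly:
    // textView.textStorage?.setAttributedString(makeDocument(canvasSize: NSSize(width: 400, height: 120)))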

Since the native text stack is inefficient at dynamic text updates, I might simplify the interface and make the outerframe hold nothing but one big canvas. For outerframe content that wants to be document-like, it can implement the document affordances itself, via the canvas. This means implementing basic features like text selection, but it might not be a crazy idea, especially when you consider that it could become a reusable component, analogous to how JavaScript libraries are reused today.
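If the outerframe held nothing but one big canvas, the guest-facing interface could be tiny. Here is a purely hypothetical sketch; these names are invented for illustration and are not the demo app’s API.

    import CoreGraphics

    // Hypothetical, minimal guest-facing interface for a canvas-only outerframe.
    protocol OuterframeGuest {
        /// Called once with the canvas size and pixel density.
        func start(canvasSize: CGSize, scale: CGFloat)

        /// Called every display refresh. The guest writes BGRA pixels into `framebuffer`,
        /// which is owned (and visually clipped) by the host app.
        func render(into framebuffer: UnsafeMutableRawBufferPointer, bytesPerRow: Int)

        /// Input forwarded by the host; the guest never sees real OS events.
        func handlePointer(at point: CGPoint, pressed: Bool)
    }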

Operating system vendors could choose to provide their own frameworks for hosting “visually sandboxed” UI in background processes. That would be great.

Classically, browsers allowed machine code plugins to write to canvas-like elements using a scheme called NPAPI. Browsers moved away from that, with the Chrome team describing it as a “90s-era architecture”. I like that. Now that we have LLM code generation, I think a few 90s-era architectures are worth trying again. This time around, from the user’s perspective it wouldn’t be a plugin model, but would be much lower friction, like JavaScript.

The whole point here is to lean in and embrace the machine, so I’m not trying to provide any standard interface. The outerframe is the opposite of a platform; I’m trying to remove layers, not add them. The platform is the platform.

How would an “outerdoc” work?

The outerframe solves the problem of running untrusted code on a particular platform. We still need a good shareable document file format. How can we write documents, presentations, blog posts, etc., that can be viewed from any device, while using the full power of each device?

I designed a format that enables this, but I’m still working on it. Two useful tricks:

  1. Let documents provide preference-ordered implementations. This would be similar to how the HTML <video> element lets you provide preference-ordered <source> tags. Documents would provide a list of platform-implementation pairs, and they would typically include an HTML/JavaScript implementation as a fallback.
  2. Provide “content types” like com.probablymarcus.ScalarDistributionList, letting viewer apps provide implementations for any content types that they want, either directly or via extensions. This would let apps provide first-party “trusted” implementations of certain content types, which would be useful if you want to bridge to the operating system’s native UI controls (e.g. NSTableView on macOS), and it could enable native apps to render a limited set of outerframe visualizations on platforms like iOS that don’t currently allow unvetted machine code.
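To make the two tricks concrete, here is a hypothetical sketch of what a single piece of outerdoc content might carry. The type names are invented for illustration; the actual format is still unpublished.

    import Foundation

    // Hypothetical sketch combining the two tricks; not the real format.
    struct OuterdocContent {
        /// e.g. "com.probablymarcus.ScalarDistributionList" -- a viewer app may
        /// supply its own trusted implementation for this type and skip the rest.
        var contentType: String

        /// Preference-ordered, like <source> tags inside an HTML <video> element.
        /// The viewer picks the first entry whose platform it can run.
        var implementations: [Implementation]

        struct Implementation {
            var platform: Platform
            var payload: Data                 // machine code or an HTML/JS bundle
        }

        enum Platform {
            case macOS_arm64
            case linux_x86_64
            case web                          // HTML/JavaScript fallback
        }
    }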

Rebalancing trade-offs

With AI code generation, I think table stakes are changing. The “optimal way to build something” is always a function of development cost, and we’re experiencing a step change downward in development cost. The “write once, run everywhere” web made sense in the previous era, but now it seems extravagant.

Chip designers and chip manufacturers are pushing against the laws of physics to eke out every last bit of efficiency, and there’s a wave of new low-power devices that are within a couple of years of being viable. A machine code web would give us an immediate ~6x software efficiency boost, and that’s on today’s hardware, which has compromised its design to be good at JavaScript. Larger gains would come when hardware designers are freed from this constraint. I doubt we’re going to leave all of these gains on the table.

There is a good path forward for a machine code web. Today, anyone can design their own outerframe and use it in their apps. If you do this, you won’t be able to publish your app in most app stores, since they don’t allow untrusted machine code. But that’s okay! On those platforms, just fall back to a conventional web view. The experience won’t be as good, but that’s the platform’s choice, and they will have to compete with other platforms that do have outerframes. They’ll come around eventually.

(Thanks to Rosanne Liu, Jason Yosinski, Adam Zethraeus, and Mirko Klukas for reading drafts of this post.)