<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Marcus Lewis</title>
    <description>Marcus Lewis</description>
    <link>https://probablymarcus.com/</link>
    <atom:link href="https://probablymarcus.com/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sun, 10 May 2026 09:35:06 -0700</pubDate>
    <lastBuildDate>Sun, 10 May 2026 09:35:06 -0700</lastBuildDate>
    <generator>Jekyll v4.3.2</generator>
    
      <item>
        <title>It&apos;s like a web view, but native</title>
        <description>&lt;style&gt;
article &gt; h1, article &gt; h2, article &gt; h3, article &gt; h4 {
  margin-top:30px;
}

pre, code {
  font-size: 13px;
  border: 1px solid #e8e8e8;
  color: #900;
  border-radius: 3px;
  background-color: inherit;
}
&lt;/style&gt;

&lt;p&gt;I think modern operating systems should support something I call the &lt;em&gt;outerframe&lt;/em&gt;. An outerframe is like a web view, but it runs compiled machine code and uses the underlying operating system’s APIs to create UI.&lt;/p&gt;

&lt;p&gt;A fundamental reason web views are useful is that they let the user experience be driven by something external, rather than confining the app to only use rigid built-in native app logic. The web view serves as a safeguard, preventing the external code from doing something dumb or nefarious.&lt;/p&gt;

&lt;p&gt;The problem with the conventional web is its inefficiency, sacrificing the quality of the final product to make life easier for developers. For example, in my &lt;a href=&quot;/blocks/2025/06/08/the-web-could-use-machine-code.html&quot;&gt;benchmark&lt;/a&gt; last year, web apps had a ~6x reduction in efficiency compared to their equivalent outerframe app. Up until recently, these costs have actually been fine, because high-end phones and laptops have been getting more powerful without a clear use case for all that power. But now there’s a renewed need for efficiency, both on high-end laptops and low-power devices. At the high end, there is a race to build useful local AI into the device, and this local AI can use every transistor and watt that we can free up. Meanwhile at the low end there’s a race to build a compelling AR / VR headset, which could conceivably move a lot of our work out of desktops and laptops and into the physical space of our rooms. It’s time for conventional software to get out of the way and fit into a smaller footprint, making room for local AI and new classes of devices. And, luckily, we simultaneously have a rise of AI code generation, which makes this practical.&lt;/p&gt;

&lt;p&gt;With this blog post, I am open-sourcing an &lt;a href=&quot;https://github.com/outergroup/outerframe&quot;&gt;outerframe for macOS&lt;/a&gt;. The outerframe is a key part of &lt;a href=&quot;https://outerloop.sh&quot;&gt;Outer Loop&lt;/a&gt;, which uses it for SSH-based apps. The outerframe repo also contains “Outer Frame”, a simple web browser that you can build and launch from Xcode on a Mac. Try vibe-coding your own outerframe content, or try putting an outerframe in your own app.&lt;/p&gt;

&lt;h2 id=&quot;top-the-first-outerframe-app&quot;&gt;Top: The first outerframe app&lt;/h2&gt;

&lt;p&gt;Because the first app to use the outerframe is an &lt;a href=&quot;https://outerloop.sh/&quot;&gt;SSH client&lt;/a&gt;, the first outerframe app that I’m shipping is a modern &lt;em&gt;top&lt;/em&gt; app. I call it &lt;em&gt;Top&lt;/em&gt; (with a capital ‘T’).&lt;/p&gt;

&lt;p&gt;Here’s a video walkthrough.&lt;/p&gt;

&lt;video src=&quot;/images/2026-04-27-Top-and-outerframe.mp4&quot; poster=&quot;/images/2026-04-27-Top-and-outerframe.png&quot; controls=&quot;&quot; playsinline=&quot;&quot; style=&quot;width: 100%; margin-bottom:20px;&quot; preload=&quot;metadata&quot;&gt;&lt;/video&gt;

&lt;p&gt;The Top backend can be run on any Linux or Mac device, and Outer Loop can install it for you with one click. See more &lt;a href=&quot;https://github.com/outergroup/top&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;how-it-works&quot;&gt;How it works&lt;/h2&gt;

&lt;p&gt;Broadly: The browser downloads a dynamically loaded library (a .dylib / .dll / .so file) and loads it into a sandboxed process. The sandboxed process renders the window contents. The browser and the sandboxed process send messages back and forth for various user interaction events and APIs.&lt;/p&gt;

&lt;p&gt;I designed the outerframe for an ambitious possible future where it becomes part of the open web. The philosophy of the web would shift to being multi-platform, rather than cross-platform, while still using HTML where it makes sense. Webservers could choose to serve multiple implementations of their web apps, one per platform. Here I’ll describe how a browser and webserver negotiate which code to run. Then I’ll describe how the macOS-specific outerframe works.&lt;/p&gt;

&lt;h4 id=&quot;the-http-requests-and-the-file-formats&quot;&gt;The HTTP requests and the file formats&lt;/h4&gt;

&lt;p&gt;We need a cross-platform protocol that hands off to platform-specific behavior. Here’s the current spec for browsers and web views supporting the outerframe.&lt;/p&gt;

&lt;p&gt;On navigation, the browser sends the additional HTTP header &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Outerframe-Accept: application/vnd.outerframe&lt;/code&gt;. (Ideally it would instead do this in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Accept&lt;/code&gt; header, but Apple’s web view doesn’t expose a way to modify &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Accept&lt;/code&gt; on navigation requests, so I chose to add a new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Outerframe-Accept&lt;/code&gt; header.) If a server wants to respond with outerframe content, its response HTTP headers should include &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Content-Type: application/vnd.outerframe&lt;/code&gt;. When the browser sees this header, it parses the response body as:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;bytes 0..3    The ASCII string &quot;OUTR&quot;, as a &quot;magic&quot; sanity-check
bytes 4..7    UInt32 little-endian format version, currently 1
bytes 8..15   UInt64 little-endian offset to beginning of path
bytes 16..23  UInt64 little-endian length of path
bytes 24..31  UInt64 little-endian offset to beginning of data
bytes 32..39  UInt64 little-endian length of data
remaining     Variable length region, where path and data live
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is the outerframe’s analog of an “.html” file, and I call it the “.outer” file. Its contents are simply a format version, a path, and a data blob. Right up front, you see how this philosophy is different from the conventional web. This is not plaintext; instead, it’s an extremely-fast-to-parse binary format. You’ll generate this blob programmatically (e.g. using a simple Python script), or edit it with a hex editor. Making this format binary is planting a cultural flag: we put users, not developers, first. HTML already exists, so this platform gets to focus on the other extreme.&lt;/p&gt;
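
&lt;p&gt;As a rough illustration of that “simple Python script”, here is a sketch of the server side: it checks the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Outerframe-Accept&lt;/code&gt; request header and serializes the 40-byte header laid out above. The function names, the UTF-8 path encoding, the choice to measure offsets from byte 0 of the file, and the plain-text fallback are all assumptions made for the example, not part of the spec.&lt;/p&gt;

```python
MAGIC = b"OUTR"
FORMAT_VERSION = 1
HEADER_SIZE = 40  # 4 (magic) + 4 (version) + 4 fields of 8 bytes each
OUTERFRAME_TYPE = "application/vnd.outerframe"


def build_outer(path: str, data: bytes = b"") -> bytes:
    """Serialize a path and an opaque data blob into the ".outer" layout."""
    path_bytes = path.encode("utf-8")  # assumption: the path is UTF-8 text
    path_offset = HEADER_SIZE          # assumption: offsets count from byte 0
    data_offset = path_offset + len(path_bytes)
    header = b"".join([
        MAGIC,
        FORMAT_VERSION.to_bytes(4, "little"),
        path_offset.to_bytes(8, "little"),
        len(path_bytes).to_bytes(8, "little"),
        data_offset.to_bytes(8, "little"),
        len(data).to_bytes(8, "little"),
    ])
    return header + path_bytes + data


def respond(request_headers: dict, path: str, data: bytes = b""):
    """Return (content_type, body) for one navigation request."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    if OUTERFRAME_TYPE in headers.get("outerframe-accept", ""):
        return OUTERFRAME_TYPE, build_outer(path, data)
    # Hypothetical fallback for browsers that do not send Outerframe-Accept.
    return "text/plain", b"This page needs an outerframe-capable browser."
```

&lt;p&gt;Placing the path directly after the header and the data directly after the path is just the simplest layout. Because the header stores explicit offsets and lengths, other layouts of the variable-length region would parse equally well.&lt;/p&gt;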

&lt;p&gt;The browser parses the path and appends a platform string; for example, this macOS implementation appends “/macos-arm” (or “/macos-x86” on Intel Macs). Other implementations might append strings like “/windows-x86” or “/linux-wayland-arm”. The browser then fetches that path to get the compiled code blob. The remaining data is an opaque data blob intended for the compiled code to interpret. (In the “Top” app, I don’t use this blob, but document-based outerframe apps use it heavily.)&lt;/p&gt;
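
&lt;p&gt;The browser side of this step can be sketched in a few lines of Python. This is only an illustration, not the actual macOS implementation (which is not written in Python): the function names are hypothetical, and I am assuming offsets are measured from the start of the file and that the path is UTF-8. The two macOS suffixes come from the text above, keyed by the values that Python’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;platform.system()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;platform.machine()&lt;/code&gt; report on a Mac.&lt;/p&gt;

```python
MAGIC = b"OUTR"
HEADER_SIZE = 40

# Suffixes named in the post, keyed by (platform.system(), platform.machine()).
PLATFORM_SUFFIXES = {
    ("Darwin", "arm64"): "/macos-arm",
    ("Darwin", "x86_64"): "/macos-x86",
}


def read_uint(blob: bytes, start: int, size: int) -> int:
    """Decode one little-endian unsigned integer from the header."""
    return int.from_bytes(blob[start:start + size], "little")


def parse_outer(blob: bytes):
    """Return (version, path, data) from an ".outer" response body."""
    if HEADER_SIZE > len(blob) or blob[:4] != MAGIC:
        raise ValueError("not an .outer file")
    version = read_uint(blob, 4, 4)
    path_offset, path_length = read_uint(blob, 8, 8), read_uint(blob, 16, 8)
    data_offset, data_length = read_uint(blob, 24, 8), read_uint(blob, 32, 8)
    # Reject offsets that point past the end of the body.
    if path_offset + path_length > len(blob):
        raise ValueError("path region is out of bounds")
    if data_offset + data_length > len(blob):
        raise ValueError("data region is out of bounds")
    path = blob[path_offset:path_offset + path_length].decode("utf-8")
    data = blob[data_offset:data_offset + data_length]
    return version, path, data


def code_path(path: str, system: str, machine: str) -> str:
    """Append the platform string to get the path of the compiled code blob."""
    return path + PLATFORM_SUFFIXES[(system, machine)]
```

&lt;p&gt;On a real machine you would call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;code_path(path, platform.system(), platform.machine())&lt;/code&gt; after importing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;platform&lt;/code&gt;; passing the values explicitly keeps the sketch easy to test.&lt;/p&gt;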

&lt;p&gt;This is where we move to the platform-specific part. The implementation gets to choose its own format for this compiled code blob. For this macOS implementation, I made it as Apple-native as possible; the “compiled code” is a “.bundle.aar” file, i.e. a compressed Apple Archive containing a loadable NSBundle. The outerframe code extracts this archive and a heavily sandboxed non-UI process “OuterframeContent” loads the bundle and asks it to start rendering.&lt;/p&gt;

&lt;p&gt;For this particular “Top” app, this compressed compiled code download is 356 KB, so it’s fairly small compared to typical web apps. (It’s written in Swift. Anecdotally I’ve found with &lt;a href=&quot;https://github.com/outergroup/outerframe-cookbook&quot;&gt;another app&lt;/a&gt; that the Objective-C version is ~100 KB smaller than the Swift version, but I haven’t studied this further.)&lt;/p&gt;

&lt;h4 id=&quot;rendering&quot;&gt;Rendering&lt;/h4&gt;

&lt;p&gt;The philosophy of the outerframe is to lean on the operating system’s UI APIs, but it’s important to note that not all UI frameworks are compatible with rendering in a sandboxed separate process. Neither SwiftUI nor AppKit’s NSViews can be used. Outerframe content still uses the underlying operating system, but it must use lower level primitives.&lt;/p&gt;

&lt;p&gt;In my original implementation, the outerframe content code received a framebuffer (a.k.a. graphics context or IOSurface), and populated it. To use an analogy from the current web, this is similar to making the page a single &amp;lt;canvas/&amp;gt; and rendering everything with WebGL.&lt;/p&gt;

&lt;p&gt;That approach is perfectly valid, but on macOS your potential is limited if you use a framebuffer. The Firefox blog &lt;a href=&quot;https://mozillagfx.wordpress.com/2019/10/22/dramatically-reduced-power-usage-in-firefox-70-on-macos-with-core-animation/&quot;&gt;discusses this&lt;/a&gt;. One fundamental issue is that you can’t tell the macOS compositor (WindowServer) about small updates, you can only invalidate the entire framebuffer.&lt;/p&gt;

&lt;p&gt;A core UI component of Apple’s platforms is the &lt;a href=&quot;https://developer.apple.com/documentation/quartzcore/calayer&quot;&gt;CALayer&lt;/a&gt;, a higher-level abstraction that gives you access to framebuffers when you want them. CALayers form a tree/hierarchy. By embracing the CALayer, you solve the problem above, plus you get a bunch of animation functionality that runs completely in the operating system’s compositor, not even waking up your threads. For example, in the Top video above, when the process rows animate up and down, Outer Loop and OuterframeContent don’t need to do frame-by-frame updates; the threads simply sleep through the entire animation. And, of course, embracing this higher level primitive gives us a lot more functionality for free, which is convenient. There are a few kinds of CALayer, including a CAMetalLayer if you want to use Metal GPU shaders.&lt;/p&gt;

&lt;p&gt;To use a remote-rendered CALayer, the outerframe uses the CALayerHost, a private API. I hope Apple someday makes a public API analogous to CALayerHost, as it is vital to having efficient sandboxed UI (Safari and Chrome also use it). For the reasons above, the public API IOSurface is less efficient and much less powerful.&lt;/p&gt;

&lt;p&gt;So the macOS outerframe rendering model is: your binary registers a CALayer (which is typically the root of a tree of CALayers) and keeps it up-to-date. You can read more about the binary interface &lt;a href=&quot;https://github.com/outergroup/outerframe/blob/main/macOS/README.md&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;events-operating-system-ui&quot;&gt;Events, operating system UI&lt;/h4&gt;

&lt;p&gt;Of course, the rendering layer is just part of an app platform. Other parts include mouse events, keyboard input, interacting with the IME (e.g. the operating system’s overlay UI that converts Pinyin to Chinese characters), text editing hotkeys, accessibility, native blinking carets, Dark Mode, context menus, copy/paste, and so on. For each of these, the outerframe mirrors the underlying operating system, passing events from the browser down to the content, and letting the content code send messages back. All of the above is implemented, but there are more things in this category that are not implemented yet. The current set of messages is documented &lt;a href=&quot;https://github.com/outergroup/outerframe/blob/main/macOS/outerframe-socket-messages.md&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is an area where Apple would be able to create a nicer API than I can, since they could modify the operating system.&lt;/p&gt;

&lt;h4 id=&quot;networking&quot;&gt;Networking&lt;/h4&gt;

&lt;p&gt;The outerframe content code has access to macOS’s built-in networking APIs, but it is sandboxed to have no direct access to the network. Instead, it connects through a local proxy provided by the outerframe. That proxy enforces same-origin policies, so the content doesn’t have unfettered access to the network. It also gives the hosting app the ability to connect to other things. For example, Outer Loop uses this proxy to let the outerframe connect to servers over SSH, and to connect to local Unix socket files.&lt;/p&gt;
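
&lt;p&gt;To make “same-origin” concrete, here is a small Python sketch of the kind of check such a proxy could apply. It uses the textbook definition (scheme, host, and port must all match); the function names are hypothetical, and the real proxy’s rules may well differ.&lt;/p&gt;

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Reduce a URL to its (scheme, host, port) origin triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def proxy_allows(page_url: str, request_url: str) -> bool:
    """Allow a request only if it targets the origin the content came from."""
    return origin(page_url) == origin(request_url)
```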

&lt;h2 id=&quot;vibe-coding-already-works&quot;&gt;Vibe-coding already works&lt;/h2&gt;

&lt;p&gt;A challenge with launching new platforms nowadays is that you might have to wait for the large language models to pre-train on example code for that platform, which could take years. Conveniently, coding agents are already pretty good at writing for this platform, because the platform &lt;em&gt;is&lt;/em&gt; macOS.&lt;/p&gt;

&lt;p&gt;There are example projects for creating outerframe content in &lt;a href=&quot;https://github.com/outergroup/hello-outerframe-macOS-swift&quot;&gt;Swift&lt;/a&gt; or in &lt;a href=&quot;https://github.com/outergroup/hello-outerframe-macOS-objc&quot;&gt;C&lt;/a&gt;. Feel free to clone one of these, rename it, and ask your favorite coding agent to build something.&lt;/p&gt;

&lt;h2 id=&quot;not-compatible-with-phones-or-headsets&quot;&gt;Not compatible with phones or headsets&lt;/h2&gt;

&lt;p&gt;There’s an elephant in the room. The outerframe is a coherent idea on macOS, Windows, and Linux. But on phone operating systems like iOS and Android, and on headset operating systems like visionOS, app stores won’t allow apps that load external machine code. So we’re in an ironic situation where an eventual iPhone or Vision Pro version of Outer Loop will have to use a conventional web view to run “Top”, while the MacBook Pro will run the lightweight efficient native version.&lt;/p&gt;

&lt;p&gt;I think it’s time to rethink these policies. Compiled machine code is not obviously dangerous, as long as it is sandboxed and only has “safe” APIs available to it.&lt;/p&gt;

&lt;p&gt;Maybe the scariest thing about running untrusted machine code is side-channel attacks, like the classic Spectre and Meltdown, and the more recent proof-of-concept Apple M1 &lt;a href=&quot;https://www.computerenhance.com/p/the-apple-m-series-gofetch-attack&quot;&gt;GoFetch&lt;/a&gt; attack. It’s questionable whether side-channel attacks are prevented by WASM / JavaScript JITs, but let’s suppose they are. Side-channel attacks broadly occur because, for a long time, it has been the job of chip designers to take mediocre code and make it run fast. As AI code generation gets better, I think chip designers can finally plan for a world where code is written with the hardware in mind, and it will actually become faster to use simpler chips with fewer performance hacks. (This is analogous to how GPUs are much simpler than CPUs, because they rely on the code to be better.) I’m betting that side-channel attacks can become a thing of the past, without sacrificing performance, but I confess that this is just a guess.&lt;/p&gt;

&lt;p&gt;Apple could still choose to implement restrictions. They could require outerframe content on the iPhone or Vision Pro to be notarized, the same way they require non-App-Store macOS apps to be notarized. Then they would have the ability to instantly block everything from developers flagged as “bad”. Or they could just go all in, treating outerframe content the same way they treat JavaScript (which doesn’t need to be signed).&lt;/p&gt;

&lt;h2 id=&quot;this-isnt-about-replacing-the-web-but-letting-it-grow-to-fill-new-niches&quot;&gt;This isn’t about replacing the web, but letting it grow to fill new niches&lt;/h2&gt;

&lt;p&gt;I’m not saying that your favorite news website or blog should embrace machine code. The HTML web is good for that kind of thing.&lt;/p&gt;

&lt;p&gt;The web has always been awkward as an app platform. It has succeeded anyway, because the browser is such a natural place for certain experiences, but I think not being a first-class app platform has prevented useful tools from arising. For example, with Outer Loop, I’m trying to use the web as the UI layer for remote servers and edge devices, and using actual native binaries makes the experience much better (like with Top, in the video above). With the HTML web, I don’t think I could get people to switch from command line apps to modern GUIs, but with native outerframe, I think I might.&lt;/p&gt;

&lt;p&gt;I think &lt;em&gt;some&lt;/em&gt; existing web experiences should move to native web apps, but the bigger growth area would be a set of new experiences/apps. The rise of AI brings two broad areas for this growth: the first is conventional software, now that so many more people can build stuff, and the second is dynamic generative UI, with AI creating experiences in real time. Combine all of this with the race to build a compelling AR / VR headset, in addition to the race to repurpose local compute for local LLMs, and I think machine code web content becomes the natural next step.&lt;/p&gt;

&lt;h2 id=&quot;maybe-native-apps-will-become-web-like-before-the-web-becomes-native&quot;&gt;Maybe native apps will become web-like before the web becomes native&lt;/h2&gt;

&lt;p&gt;I like the idea of the open web embracing native web apps. It would change the focus of the web from being lowest-common-denominator cross-platform code to being heterogeneous and multi-platform. Each platform would then be free to innovate. Webservers could choose to serve N different native apps to N different platforms, with one of those platforms being the conventional HTML web. All of this would become especially practical because AI could translate code to other platforms and test it.&lt;/p&gt;

&lt;p&gt;An alternate path is that a set of native apps will adopt a dynamic browser-like architecture, and the “native web” ends up existing in a collection of apps using outerframes, or something similar.&lt;/p&gt;

&lt;p&gt;Either path sounds good to me.&lt;/p&gt;

&lt;h2 id=&quot;faq&quot;&gt;FAQ&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why not WebAssembly?&lt;/strong&gt; WebAssembly (WASM) is cool, but it’s still a lowest-common-denominator virtual machine, and it just improves the JavaScript part of “the web”. You still have to combine it with one of the web’s UI models (either treating an app as a “document” via the DOM, or using GPU shaders on WebGL / WebGPU). With the outerframe, you have full control over how your code renders different experiences to the screen. This path certainly has greater potential, and my hunch is that the payoff is big enough to be worth it. With the outerframe, you get to really take ownership of your &lt;em&gt;actual&lt;/em&gt; assembly code. It’s a lot of fun. Broadly, I think a homogeneous platform will hold us back, and a heterogeneous world where each platform gets to innovate will lead to better things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this just ActiveX all over again?&lt;/strong&gt; It is similar to ActiveX, but with two important differences. First, the outerframe embraces heavy sandboxing, so the security concerns are greatly reduced compared to early ActiveX. Sandboxing also means we don’t need the “Do you want to allow this ActiveX control?” UI friction. In fact, ActiveX gradually approached an outerframe-like security model from 2007 to 2012, culminating in IE’s eventual “Enhanced Protected Mode” running ActiveX in a true sandbox, but by that point ActiveX was on its way out, because the web community had chosen HTML5. Second, with AI code generation, the world has changed around us. It is becoming practical to ship different native binaries to different platforms without a lock-in effect. Much of the web community hated ActiveX, even after its security issues were arguably solved, but I think they could come around to this. So I guess I am saying, “ActiveX, circa 2012, actually might be a good idea in 2026.”&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Thanks to Rosanne Liu for reading drafts of this post.)&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Sun, 10 May 2026 00:00:00 -0700</pubDate>
        <link>https://probablymarcus.com/blocks/2026/05/10/like-a-web-view-but-native.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2026/05/10/like-a-web-view-but-native.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
      <item>
        <title>Tip: Use services, not the terminal, to run local backends</title>
        <description>&lt;style&gt;
article &gt; h1, article &gt; h2, article &gt; h3 {
  margin-top:30px;
}
&lt;/style&gt;

&lt;p&gt;I’ve embraced a fun way of working with Jupyter, Tensorboard, and other local webservers.&lt;/p&gt;

&lt;h2 id=&quot;example-launch-jupyter-lab&quot;&gt;Example: Launch Jupyter Lab&lt;/h2&gt;

&lt;video src=&quot;/images/2026-04-06-outer-loop-service-list-6.mp4&quot; poster=&quot;/images/2026-04-06-outer-loop-service-list-6-poster.png&quot; controls=&quot;&quot; playsinline=&quot;&quot; style=&quot;width: 100%; margin-bottom:20px;&quot;&gt;&lt;/video&gt;

&lt;p&gt;In this video, I clicked a new “services” item in my macOS menu bar:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2026-04-09-menu-zoom-in.png&quot; alt=&quot;Menu bar listing backends, as described below&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the current backends on my laptop are Jupyter Lab, two websites, and a “Top” utility that I built. I only had to create these entries once, and now I have them forever. Now I’m a few clicks away from any of my backends, instead of needing to use a terminal to access them.&lt;/p&gt;

&lt;h2 id=&quot;how&quot;&gt;How?&lt;/h2&gt;

&lt;p&gt;I built a lightweight user interface that uses the operating system’s built-in “services” feature.&lt;/p&gt;

&lt;p&gt;Most operating systems have built-in service management:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;macOS has “launchd”. (You can see the services by running &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;launchctl list&lt;/code&gt;.)&lt;/li&gt;
  &lt;li&gt;Most Linux distributions have “systemd”, which was inspired by launchd. (Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemctl&lt;/code&gt;.)&lt;/li&gt;
  &lt;li&gt;Windows has offered services longer than either of those. (Open Task Manager.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Services are heavily used by the operating system and apps, but most people don’t use them directly. Meanwhile, many scientific computing tools make you open a terminal and run a command to launch an app backend. Services are a perfect match for this use case; all that’s missing is good UI.&lt;/p&gt;
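
&lt;p&gt;For a sense of how little is involved on macOS, here is a sketch that uses Python’s standard &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plistlib&lt;/code&gt; module to write a launchd job definition for Jupyter Lab. The label, the paths, and the function name are hypothetical placeholders, and this is not necessarily how Outer Loop creates its services.&lt;/p&gt;

```python
import plistlib


def launch_agent_plist(label, program_arguments, working_directory):
    """Build the bytes of a launchd job definition for one local backend."""
    job = {
        "Label": label,                            # unique name for the service
        "ProgramArguments": list(program_arguments),
        "WorkingDirectory": working_directory,     # replaces the manual "cd"
        "RunAtLoad": False,                        # start on demand, not at login
        "KeepAlive": False,                        # do not relaunch after exit
    }
    return plistlib.dumps(job)


# Hypothetical values; substitute your own jupyter path and notebook folder.
plist_bytes = launch_agent_plist(
    "com.example.jupyter-lab",
    ["/usr/local/bin/jupyter", "lab", "--no-browser"],
    "/Users/me/notebooks",
)
```

&lt;p&gt;Saved as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.plist&lt;/code&gt; file under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/Library/LaunchAgents&lt;/code&gt;, a job like this can then be loaded and started with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;launchctl&lt;/code&gt;.&lt;/p&gt;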

&lt;p&gt;You could vibe-code your own version of this experience, or use &lt;a href=&quot;https://outerloop.sh&quot;&gt;Outer Loop&lt;/a&gt;. It’s a pretty lightweight set of features, since the operating system does the service management heavy-lifting.&lt;/p&gt;

&lt;p&gt;To me, this flow just feels obviously better. Some overlapping reasons:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Requiring a terminal filters out a subset of people, and it breaks my creative flow. For many people who need to use computers for data analysis, it takes a lot of cognitive overhead to find the terminal app, “cd” to the correct folder, and remember which command to run.&lt;/li&gt;
  &lt;li&gt;In terms of information input, this workflow takes on the same shape as projects. You type the command once into Outer Loop (or your own service creation app), then proceed to reuse it for the next few months, rather than having to repeat the command again and again.&lt;/li&gt;
  &lt;li&gt;With a terminal, I always seem to have Jupyter / Tensorboard / local webservers hiding in a sea of tabs. I’m finally done fishing through tabs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Hope you like it! Download the Mac app &lt;a href=&quot;https://outerloop.sh&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;p&gt;For Jupyter specifically, VS Code has done a good job with their file-viewer approach, where the backend and frontend are just one thing. But that’s pretty specific to local file browsing. One nice thing about preserving the separation of frontends and backends is that it scales up naturally to running apps like Jupyter on remote servers.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Thanks to Rosanne Liu for giving feedback on earlier drafts of this post.)&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Thu, 09 Apr 2026 00:00:00 -0700</pubDate>
        <link>https://probablymarcus.com/blocks/2026/04/09/use-os-services-for-your-local-backends.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2026/04/09/use-os-services-for-your-local-backends.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
      <item>
        <title>Web apps over SSH can be surprisingly good</title>
        <description>&lt;style&gt;
article &gt; h1, article &gt; h2, article &gt; h3 {
  margin-top:30px;
}

.mono {
  font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, &quot;Liberation Mono&quot;, &quot;Courier New&quot;, monospace;
  font-weight: 500;
  letter-spacing: -0.01em;
}
&lt;/style&gt;

&lt;p&gt;When you’re working directly with a server, web apps are useful. A few examples: Jupyter lets you run Python in a visual environment, Tensorboard lets you observe deep learning training runs, and tools like phpMyAdmin help you manage websites. The problem is, these apps typically aren’t directly accessible to web browsers, because exposing them to the web puts your server at risk.&lt;/p&gt;

&lt;p&gt;The standard solution to this is SSH port forwarding. You create an SSH connection to the server, and then you tunnel arbitrary TCP connections over that encrypted SSH connection. This works, but it introduces friction and is often janky. Entire “software-as-a-service” companies are devoted to providing smoother alternatives. For example, some companies give you a convenient way to log data to &lt;em&gt;their&lt;/em&gt; servers, and they provide a safe web interface so that you never need to learn how to forward a port.&lt;/p&gt;

&lt;p&gt;In this blog post I share two key improvements to SSH port forwarding that make it much more fun and reliable:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Solve the scenario: &lt;em&gt;“I closed my laptop, then reopened it, and the connection stopped working”&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;Incorporate SSH directly into the browser&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To demonstrate this, I’m launching &lt;a href=&quot;https://outerloop.sh/&quot;&gt;Outer Loop&lt;/a&gt;, a specialized web browser for Mac.&lt;/p&gt;

&lt;h2 id=&quot;change-1-let-me-close-my-laptop&quot;&gt;Change 1: Let me close my laptop&lt;/h2&gt;

&lt;p&gt;In terms of the information flow, port forwarding and SSH don’t really have the same shape. If we aren’t intentional, we will get a “round-peg-in-square-hole” situation.&lt;/p&gt;

&lt;p&gt;When you use SSH more generally, the SSH session is stateful. It has a current working directory, it has running processes, it has environment variables. If you abandon an SSH session, you lose all of this.&lt;/p&gt;

&lt;p&gt;When you use SSH for port forwarding, you aren’t relying on any of that. For you, an SSH session can just be a short-lived thing that may even get created and discarded multiple times while you interact with your web app.&lt;/p&gt;

&lt;p&gt;So a first-class port forwarding library should treat SSH sessions as &lt;em&gt;transient&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2025-10-11-port-forwarding-diagram.svg?v3&quot; style=&quot;margin-bottom:10px;&quot; alt=&quot;A figure showing a transient connection between your device&apos;s &apos;Port Forwarder&apos; and the remote &apos;SSH Server&apos;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When you perform port forwarding, there doesn’t always need to be a live connection with the server. Rather, the only thing constant is that you have a process that listens to a local port on your system. This process (I’ll call it the “port forwarder”) is responsible for creating an SSH session with the remote server, but that session can be created opportunistically. The port forwarder may wait and create it &lt;em&gt;after&lt;/em&gt; a browser connects to the port. Then, after the browser navigation, the port forwarder might disconnect the session immediately, or it might keep it around. The mindset shift here is: reusing an SSH session is not essential, it’s just a performance heuristic.&lt;/p&gt;

&lt;p&gt;Of course, it is more efficient to reuse sessions when possible. In fact, it’s worth going further than that: there is a big performance win if we pre-allocate a set of SSH &lt;em&gt;channels&lt;/em&gt;, each representing a speculatively-created fresh TCP connection between the remote SSH server and the remote web app backend. This lets us quickly forward HTTP requests when the browser sends them. (Remember: when a page loads, there is one HTTP request per page, image, and script.)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2025-10-16-port-forwarding-idle-state.svg&quot; style=&quot;margin-bottom:10px;&quot; alt=&quot;The same figure as above, now showing multiple TCP connections within the server, and a corresponding channel pool&quot; /&gt;&lt;/p&gt;
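
&lt;p&gt;Here is a toy Python model of that idea: the session is created only when first needed, and a small pool of speculative channels is kept topped up. The class and method names are hypothetical, and &lt;span class=&quot;mono&quot;&gt;FakeSession&lt;/span&gt; is a stand-in for a real SSH session object; this is a sketch of the shape of the logic, not Outer Loop’s implementation.&lt;/p&gt;

```python
from collections import deque


class FakeSession:
    """Stand-in for an SSH session; a real one would open forwarded channels."""

    def __init__(self):
        self.opened = 0

    def open_channel(self):
        self.opened += 1
        return f"channel-{self.opened}"


class ChannelPool:
    """Keeps up to `size` ready channels on a lazily created session."""

    def __init__(self, open_session, size=4):
        self._open_session = open_session  # callable that creates a session
        self._size = size
        self._session = None               # nothing is connected up front
        self._idle = deque()

    def refill(self):
        """Create the session if needed, then top the pool up to `size`."""
        if self._session is None:
            self._session = self._open_session()
        while self._size > len(self._idle):
            self._idle.append(self._session.open_channel())

    def take(self):
        """Hand out a ready channel for one browser connection."""
        if not self._idle:
            self.refill()
        channel = self._idle.popleft()
        self.refill()  # replace the channel we just handed out
        return channel

    def discard_session(self):
        """Forget the session and its channels, e.g. after a failed ping."""
        self._idle.clear()
        self._session = None
```

&lt;p&gt;Because the session is just a cached resource here, throwing it away is cheap: the next call to &lt;span class=&quot;mono&quot;&gt;take()&lt;/span&gt; simply builds a new one.&lt;/p&gt;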

&lt;p&gt;But these performance heuristics make fault tolerance more complicated. When there is some kind of network fault (e.g. &lt;em&gt;“I close my laptop for 10 minutes”&lt;/em&gt;), we need to avoid sending browser HTTP requests into a channel of a closed session. Keeping sessions around, and in particular, keeping a pool of speculative channels, increases the chance that this will happen. If we send a request into a failed channel, we will quickly learn that the session has failed, but we will not know with 100% certainty that the HTTP request &lt;em&gt;didn’t&lt;/em&gt; reach the web app backend. Perhaps the request reached the backend, and the fault occurred immediately afterward. So we can’t safely just re-send the request over a new session, because we risk doing a double-send, which in rare cases may be destructive. Thus, after sending an HTTP request into a closed session, we are forced to surface the broken connection to the user’s browser.&lt;/p&gt;

&lt;p&gt;Fortunately, it is quite easy to solve this with simple pings, sending &lt;span class=&quot;mono&quot;&gt;SSH_MSG_GLOBAL_REQUEST&lt;/span&gt; packets to the SSH server. The port forwarder can use a simple rule like, &lt;em&gt;&quot;If N minutes have passed since the last traffic, perform a ping and wait for the reply before using this session again.&quot;&lt;/em&gt; On my home internet connection to a GCP server, this ping introduces about 70 milliseconds of latency. If the session is found to be invalid, the server is still fast to respond, and we quickly set up a new SSH session and forward the TCP connection over that new session instead.&lt;/p&gt;
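
&lt;p&gt;The rule above fits in a few lines. In this Python sketch the class name, the default idle limit, and the injectable clock are assumptions made for illustration; in a real port forwarder, the &lt;span class=&quot;mono&quot;&gt;ping&lt;/span&gt; callable would send the &lt;span class=&quot;mono&quot;&gt;SSH_MSG_GLOBAL_REQUEST&lt;/span&gt; packet and wait for the server’s reply.&lt;/p&gt;

```python
import time


class SessionGate:
    """Decides whether a session may be reused without pinging first."""

    def __init__(self, ping, idle_limit_seconds=120.0, clock=time.monotonic):
        self._ping = ping                   # returns True if the server replied
        self._idle_limit = idle_limit_seconds
        self._clock = clock                 # injectable so the rule is testable
        self._last_traffic = clock()

    def note_traffic(self):
        """Record that bytes just moved over the session."""
        self._last_traffic = self._clock()

    def usable(self):
        """Return True if the session may carry the next request."""
        idle = self._clock() - self._last_traffic
        if idle >= self._idle_limit and not self._ping():
            return False  # caller should set up a fresh session instead
        # Either the session was recently active or the ping just succeeded.
        self.note_traffic()
        return True
```

&lt;p&gt;The key property is that the check happens &lt;em&gt;before&lt;/em&gt; the browser’s request is written into a channel, so a stale session is detected without risking a double-send.&lt;/p&gt;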

&lt;p&gt;Independent of SSH, apps with long-running TCP connections will still have their connections broken by long laptop closes. Long-running TCP connections are inherently fragile, and any app that uses them is responsible for building in its own fault tolerance. The point here is to prevent the port-forwarding layer from introducing new faults or hangs, particularly in web app scenarios (which typically involve many very short TCP connections, a.k.a. HTTP requests).&lt;/p&gt;

&lt;h2 id=&quot;change-2-create-the-obvious-intuitive-product&quot;&gt;Change 2: Create the obvious intuitive product&lt;/h2&gt;

&lt;p&gt;Today, using web apps over SSH requires you to open two windows. You have to open a new local terminal window and copy-paste a command like:&lt;br /&gt;&amp;nbsp;&lt;span class=&quot;mono&quot;&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ssh -L 24601:localhost:8889 mrcslws@lambda4.mycompany.com&lt;/span&gt;&lt;br /&gt;&lt;br /&gt; Then you go to your browser window, and type &lt;span class=&quot;mono&quot;&gt;&quot;localhost:24601&quot;&lt;/span&gt; in the address bar.&lt;/p&gt;

&lt;p&gt;What if, instead, the browser itself is SSH-aware?&lt;/p&gt;

&lt;style&gt;

.inline-photo {
  margin: clamp(28px, 5vw, 60px) auto;
  position: relative;
  left: 50%;
  transform: translateX(-50%);
  width: min(100vw - 48px, 1340px);
  text-align: center;
}

.inline-photo picture {
  display: block;
}

.inline-photo img {
  display: block;
  width: 100%;
  height: auto;
}
&lt;/style&gt;

&lt;figure class=&quot;inline-photo&quot; style=&quot;margin-top:-20px;margin-bottom:-10px;&quot;&gt;
&lt;img src=&quot;/images/2025-10-10-outer-loop-screenshot.png&quot; alt=&quot;A browser window with server info in the upper-left corner. It&apos;s showing two tabs: Tensorboard and a Start Page. The Start Page shows a set of web apps running on the server. The image also shows a context menu, revealing that the app is performing port forwarding for these web apps.&quot; /&gt;
&lt;/figure&gt;

&lt;p&gt;This window connects to the server over SSH. Using this window, you can easily navigate to any private web apps via that server. You never have to think about the fact that you’re connecting to your own localhost. In this app, the term “localhost” actually refers to the server’s localhost, which means you can copy-paste URLs from an SSH terminal, for example when Jupyter or Tensorboard prints a launch URL to the terminal. You no longer have to choose your own arbitrary local port numbers; the app remembers the web apps you’ve connected to previously, and it carefully uses the same local port for that app over time, so the web app’s browser cookies are preserved. Port forwarding runs in multiple sandboxed processes, one for each server login. And you are free to use your own preferred web browser, using this app as a standalone port forwarder – just keep the app window open (or minimized).&lt;/p&gt;
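&lt;p&gt;The “same local port over time” behavior can be pictured as a small lookup table. The sketch below is only an illustration under simplifying assumptions: the class name &lt;span class=&quot;mono&quot;&gt;PortRegistry&lt;/span&gt; is made up, the table lives in memory, and nothing checks whether a port is actually free. A real app would need to persist the table and handle port conflicts.&lt;/p&gt;

```javascript
// Hypothetical sketch: give each (server, remote port) pair a stable local port,
// so the browser always sees the same origin for the same web app.
class PortRegistry {
  constructor(firstPort = 24601) {
    this.nextPort = firstPort; // next unassigned local port (assumed to be free)
    this.ports = new Map();    // "server:remotePort" -> local port
  }

  localPortFor(server, remotePort) {
    const key = `${server}:${remotePort}`;
    if (!this.ports.has(key)) {
      this.ports.set(key, this.nextPort);
      this.nextPort += 1;
    }
    return this.ports.get(key);
  }
}

const registry = new PortRegistry();
const first = registry.localPortFor("lambda4.mycompany.com", 8889);
const again = registry.localPortFor("lambda4.mycompany.com", 8889);
console.log(first === again); // true: the same app keeps the same local port
```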

&lt;p&gt;Feel free to &lt;a href=&quot;https://outerloop.sh/&quot;&gt;try it yourself&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;surprises&quot;&gt;Surprises&lt;/h2&gt;

&lt;p&gt;Going into this project, I assumed that connecting over SSH would sacrifice some performance, compared to connecting directly (and insecurely) via the web. I was surprised to see that, in practice, SSH is often faster. There are a couple of interesting discussion points here. First, with SSH, you typically already have an underlying TCP connection to the server, rather than having to create fresh ones, so that gives you a head start. Second, when we connect over SSH, we aren’t simply adding an extra SSH “layer of indirection”. We are replacing HTTPS (or rather, TLS) with SSH. We’re swapping out our authentication / encryption protocol; it’s a replacement, not an addition. If you’re doing big downloads or uploads, going over SSH will likely be slower, but I was surprised to see that in many scenarios it is actually faster.&lt;/p&gt;

&lt;p&gt;There is a lot of low-hanging fruit in this space. We can make the experience of “working with a server” a lot more fun and accessible.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Thanks to Rosanne Liu and Charlie Liu Lewis for reading and listening to drafts of this post.)&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Fri, 10 Oct 2025 00:00:00 -0700</pubDate>
        <link>https://probablymarcus.com/blocks/2025/10/10/web-apps-over-ssh-surprisingly-good.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2025/10/10/web-apps-over-ssh-surprisingly-good.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
      <item>
        <title>Smooth Pursuit, and how Screens Don&apos;t Mimic Reality</title>
        <description>&lt;!-- Hello, curious traveler! I vibe-coded this blog post&apos;s figures. I haven&apos;t read most of the code. --&gt;

&lt;style&gt;
article &gt; h1, article &gt; h2, article &gt; h3 {
  margin-top:30px;
}
&lt;/style&gt;

&lt;p&gt;Your brain’s visual system is economical. Rather than capturing one big high-resolution image, your eyes capture fine detail only in a small portion of the retina, where light receptors are packed together closely enough. To construct a big useful picture of your surroundings, your brain relies heavily on eye movement.&lt;/p&gt;

&lt;p&gt;One nice trick your eyes and brain use is &lt;a href=&quot;https://en.wikipedia.org/wiki/Smooth_pursuit&quot;&gt;smooth pursuit&lt;/a&gt;. You can lock your eyes on a moving object and move them continuously, not as a set of discrete changes, but by actually giving your eyes and head an &lt;em&gt;angular velocity&lt;/em&gt;. Your visual system uses low-level details to perform quick error-corrections to stay locked on the object. This lets you give the most powerful part of your retina a stable input image for the moving object, allowing you to see fine detail even for fast-moving objects.&lt;/p&gt;

&lt;p&gt;This is an elegant design. But this design breaks down if you’re looking at a screen.&lt;/p&gt;

&lt;h2 id=&quot;moving-text-is-easy-to-read-unless-its-on-a-screen&quot;&gt;Moving text is easy to read, unless it’s on a screen&lt;/h2&gt;

&lt;p&gt;Watch this animation.&lt;/p&gt;

&lt;div class=&quot;text-animation-container&quot;&gt;
  &lt;div class=&quot;animation-controls&quot;&gt;
    &lt;label for=&quot;speedControl&quot;&gt;Text movement speed (pixels/second): &lt;/label&gt;
    &lt;input type=&quot;range&quot; id=&quot;speedControl&quot; min=&quot;0&quot; max=&quot;8&quot; value=&quot;6&quot; step=&quot;1&quot; list=&quot;speedValues&quot; /&gt;
    &lt;datalist id=&quot;speedValues&quot;&gt;
      &lt;option value=&quot;0&quot; label=&quot;0&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;1&quot; label=&quot;60&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;2&quot; label=&quot;120&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;3&quot; label=&quot;300&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;4&quot; label=&quot;600&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;5&quot; label=&quot;900&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;6&quot; label=&quot;1200&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;7&quot; label=&quot;1500&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;8&quot; label=&quot;2000&quot;&gt;&lt;/option&gt;
    &lt;/datalist&gt;
    &lt;span id=&quot;speedDisplay&quot;&gt;1200&lt;/span&gt;

    &lt;button id=&quot;pauseButton1&quot; class=&quot;control-button pause-button paused&quot;&gt;Play animations&lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;text-animation-fullscreen&quot;&gt;
    &lt;div class=&quot;animation-viewport&quot;&gt;
      &lt;div class=&quot;animated-text&quot; id=&quot;movingText&quot;&gt;The quick brown fox jumps over the lazy dog&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;animation-info&quot;&gt;
    &lt;p id=&quot;playPrompt&quot;&gt;Press play and try to read the text.&lt;/p&gt;
    &lt;p id=&quot;landscapeHint&quot; class=&quot;landscape-hint&quot;&gt;(Turn your phone, it&apos;s better in landscape.)&lt;/p&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;style&gt;
.text-animation-container {
  margin: 2em 0;
  padding: 1.5em;
  background: #f9f9f9;
  border: 1px solid #ddd;
  border-radius: 8px;
}

.text-animation-fullscreen {
  position: relative;
  left: 50%;
  right: 50%;
  margin-left: -50vw;
  margin-right: -50vw;
  width: 100vw;
  margin-top: 1.5em;
  margin-bottom: 0;
}

.animation-viewport {
  height: 80px;
  position: relative;
  overflow: hidden;
  background: white;
  width: 100vw;
  border-top: 1px solid #e0e0e0;
  border-bottom: 1px solid #e0e0e0;
}

.animated-text {
  position: absolute;
  top: 50%;
  transform: translateY(-50%);
  white-space: nowrap;
  font-size: 24px;
  font-family: Georgia, serif;
  color: #333;
  will-change: transform;
  left: 0;
}

@media (max-width: 768px) {
  .animated-text {
    font-size: 18px;
  }
}

.animation-controls {
  display: flex;
  align-items: center;
  gap: 1em;
  flex-wrap: wrap;
  justify-content: center;
  margin-bottom: 1.5em;
}

/* Unified control layout for all figures */
.animation-controls,
.discrete-controls,
.speed-selector {
  display: flex;
  align-items: center;
  gap: 1em;
  flex-wrap: nowrap;
}

.animation-controls input[type=&quot;range&quot;],
.discrete-controls input[type=&quot;range&quot;],
.speed-selector input[type=&quot;range&quot;] {
  width: 300px;
}

@media (max-width: 768px) {
  .animation-controls,
  .discrete-controls,
  .speed-selector {
    flex-wrap: wrap;
    justify-content: center;
  }

  .animation-controls label,
  .discrete-controls label,
  .speed-selector label {
    flex: 0 0 auto;
  }

  .animation-controls input[type=&quot;range&quot;],
  .discrete-controls input[type=&quot;range&quot;],
  .speed-selector input[type=&quot;range&quot;] {
    flex: 1 1 auto;
    min-width: 150px;
    max-width: 200px;
  }

  .animation-controls span,
  .discrete-controls span,
  .speed-selector span {
    flex: 0 0 auto;
    margin-left: -0.5em;
  }

  .animation-controls .control-button,
  .discrete-controls .control-button,
  .speed-selector .control-button {
    flex: 0 0 100%;
    margin-top: 0.5em;
  }
}

.animation-controls label {
  font-weight: bold;
}


#speedDisplay {
  min-width: 40px;
  font-weight: bold;
}

.control-button {
  padding: 0.5em 1em;
  background: #e74c3c;
  color: white;
  border: none;
  border-radius: 4px;
  cursor: pointer;
  font-size: 14px;
  min-width: 150px;
}

.control-button:hover {
  background: #c0392b;
}

.pause-button {
  background: #3498db;
}

.pause-button:hover {
  background: #2980b9;
}

.pause-button.paused {
  background: #27ae60;
}

.pause-button.paused:hover {
  background: #229954;
}

@keyframes slideText {
  from {
    transform: translateY(-50%) translateX(var(--end-position));
  }
  to {
    transform: translateY(-50%) translateX(var(--start-position));
  }
}

.animated-text.animating {
  animation: slideText var(--animation-duration) linear infinite;
}

.animated-text.stopped {
  left: 50% !important;
  transform: translateY(-50%) translateX(-50%) !important;
}
&lt;/style&gt;

&lt;script&gt;
// Detect if on phone and update computerType text
(function() {
  const isPhone = /Android|webOS|iPhone|iPad|iPod|BlackBerry|IEMobile|Opera Mini/i.test(navigator.userAgent);
  if (isPhone) {
    document.addEventListener(&apos;DOMContentLoaded&apos;, function() {
      document.querySelectorAll(&apos;.computerType&apos;).forEach(el =&gt; {
        el.textContent = &apos;phone&apos;;
      });
    });
  }
})();

// Global animation state
window.globalAnimationsPlaying = false;

// Speed values array
const speedValues = [0, 60, 120, 300, 600, 900, 1200, 1500, 2000];

// Global refresh rate
window.globalRefreshRate = 120;

// Get slowdown factor based on refresh rate
window.getSlowdownFactor = function() {
  switch(window.globalRefreshRate) {
    case 30: return 2;
    case 60: return 4;
    case 120: return 8;
    default: return 8;
  }
};

// Unified function to update all animations
window.updateAllAnimations = function(playing) {
  window.globalAnimationsPlaying = playing;

  // Update all pause buttons
  document.querySelectorAll(&apos;.pause-button&apos;).forEach(btn =&gt; {
    btn.textContent = playing ? &apos;Pause animations&apos; : &apos;Play animations&apos;;
    btn.classList.toggle(&apos;paused&apos;, !playing);
  });

  // Update main animation
  const mainText = document.getElementById(&apos;movingText&apos;);
  const mainSpeedControl = document.getElementById(&apos;speedControl&apos;);
  if (mainText &amp;&amp; mainSpeedControl) {
    const mainSpeedIndex = parseInt(mainSpeedControl.value);
    if (playing &amp;&amp; speedValues[mainSpeedIndex] &gt; 0) {
      // Make sure animation is properly set up by calling updateAnimation
      if (window.updateMainAnimation) {
        window.updateMainAnimation();
      }
      mainText.classList.remove(&apos;stopped&apos;);
      mainText.classList.add(&apos;animating&apos;);
    } else {
      mainText.classList.remove(&apos;animating&apos;);
      mainText.classList.add(&apos;stopped&apos;);
    }
  }

  // Update frame animation and paused overlay
  const frameText = document.getElementById(&apos;frameText&apos;);
  const framePausedOverlay = document.getElementById(&apos;framePausedOverlay&apos;);
  const frameSpeedControl = document.getElementById(&apos;frameSpeedControl&apos;);
  if (frameText &amp;&amp; frameSpeedControl) {
    const frameSpeedIndex = parseInt(frameSpeedControl.value);
    if (playing &amp;&amp; speedValues[frameSpeedIndex] &gt; 0) {
      frameText.classList.add(&apos;animating&apos;);
      if (framePausedOverlay) framePausedOverlay.classList.remove(&apos;visible&apos;);
    } else {
      frameText.classList.remove(&apos;animating&apos;);
      if (framePausedOverlay &amp;&amp; speedValues[frameSpeedIndex] &gt; 0) {
        framePausedOverlay.classList.add(&apos;visible&apos;);
      }
    }
  }

  // Update discrete animation
  const discreteSpeedControl = document.getElementById(&apos;discreteSpeedControl&apos;);
  if (discreteSpeedControl) {
    const discreteSpeedIndex = parseInt(discreteSpeedControl.value);
    if (playing &amp;&amp; speedValues[discreteSpeedIndex] &gt; 0 &amp;&amp; window.updateDiscreteVisualization) {
      window.updateDiscreteVisualization();
    }
  }
};

(function() {
  const movingText = document.getElementById(&apos;movingText&apos;);
  const speedControl = document.getElementById(&apos;speedControl&apos;);
  const speedDisplay = document.getElementById(&apos;speedDisplay&apos;);
  const pauseButton = document.getElementById(&apos;pauseButton1&apos;);
  const viewport = movingText.parentElement;

  let currentSpeed = 1200;
  let previousSpeed = 1200;

  function updateAnimation() {
    const speedIndex = parseInt(speedControl.value);
    currentSpeed = speedValues[speedIndex];
    speedDisplay.textContent = currentSpeed;

    // Remove animation class
    movingText.classList.remove(&apos;animating&apos;);

    if (currentSpeed === 0) {
      // Center the text when stopped
      movingText.classList.add(&apos;stopped&apos;);
      return;
    }

    // Calculate positions and duration
    const textWidth = movingText.offsetWidth;
    const viewportWidth = window.innerWidth;
    const totalDistance = viewportWidth + textWidth;
    const duration = (totalDistance / currentSpeed) * 1000; // Convert to milliseconds

    // Set CSS custom properties (reversed: right to left)
    movingText.style.setProperty(&apos;--start-position&apos;, `-${textWidth}px`);
    movingText.style.setProperty(&apos;--end-position&apos;, `${viewportWidth}px`);
    movingText.style.setProperty(&apos;--animation-duration&apos;, `${duration}ms`);

    // Only update positioning if we&apos;re playing
    if (window.globalAnimationsPlaying) {
      movingText.classList.remove(&apos;stopped&apos;);
      // Force reflow to restart animation
      void movingText.offsetWidth;
      movingText.classList.add(&apos;animating&apos;);
    } else {
      // Keep centered when paused
      movingText.classList.add(&apos;stopped&apos;);
    }
  }

  // Expose update function globally for sync
  window.updateMainAnimation = updateAnimation;

  // Pause button functionality
  pauseButton.addEventListener(&apos;click&apos;, function() {
    window.updateAllAnimations(!window.globalAnimationsPlaying);
  });

  // Handle speed slider changes
  speedControl.addEventListener(&apos;input&apos;, function() {
    updateAnimation();
    // Sync with the other controls (don&apos;t trigger event to avoid loop)
    const frameSpeedControl = document.getElementById(&apos;frameSpeedControl&apos;);
    const frameSpeedDisplay = document.getElementById(&apos;frameSpeedDisplay&apos;);
    if (frameSpeedControl &amp;&amp; frameSpeedControl.value !== speedControl.value) {
      frameSpeedControl.value = speedControl.value;
      frameSpeedDisplay.textContent = currentSpeed;
      // Call the update function directly instead of triggering event
      if (window.updateFrameVisualization) {
        window.updateFrameVisualization();
      }
    }

    // Sync with discrete visualization
    const discreteSpeedControl = document.getElementById(&apos;discreteSpeedControl&apos;);
    const discreteSpeedDisplay = document.getElementById(&apos;discreteSpeedDisplay&apos;);
    if (discreteSpeedControl &amp;&amp; discreteSpeedControl.value !== speedControl.value) {
      discreteSpeedControl.value = speedControl.value;
      discreteSpeedDisplay.textContent = currentSpeed;
      if (window.updateDiscreteVisualization) {
        window.updateDiscreteVisualization();
      }
    }
  });

  // Start with default speed (but paused)
  updateAnimation();
  // Ensure text is centered when starting paused
  movingText.classList.add(&apos;stopped&apos;);

  // Check for mobile portrait mode
  function checkOrientation() {
    const landscapeHint = document.getElementById(&apos;landscapeHint&apos;);
    if (landscapeHint) {
      const isMobilePortrait = window.innerWidth &lt; 768 &amp;&amp; window.innerHeight &gt; window.innerWidth;
      landscapeHint.style.display = isMobilePortrait ? &apos;block&apos; : &apos;none&apos;;
    }
  }

  // Check orientation on load
  checkOrientation();

  // Handle window resize with debouncing to avoid animation restarts on mobile scroll
  let resizeTimeout;
  window.addEventListener(&apos;resize&apos;, function() {
    checkOrientation();
    
    // Clear any existing timeout
    clearTimeout(resizeTimeout);
    
    // Debounce the animation update
    resizeTimeout = setTimeout(function() {
      if (currentSpeed &gt; 0 &amp;&amp; window.globalAnimationsPlaying) {
        // Only update if viewport width actually changed significantly
        const newViewportWidth = window.innerWidth;
        const oldEndPosition = parseFloat(movingText.style.getPropertyValue(&apos;--end-position&apos;));
        
        // Only restart if the width changed by more than 50px
        if (Math.abs(newViewportWidth - oldEndPosition) &gt; 50) {
          updateAnimation();
        }
      }
    }, 300); // Wait 300ms after resize stops
  });
})();
&lt;/script&gt;

&lt;p&gt;Painful, right?&lt;/p&gt;

&lt;p&gt;Now pick up a piece of paper, a book, or another device. Move it in front of your screen at the same speed as this animation. It looks &lt;em&gt;amazing&lt;/em&gt; compared to the animation on this screen, and you’ll see that “1200 pixels/second” is not actually very fast.&lt;/p&gt;

&lt;h2 id=&quot;the-text-isnt-moving-its-teleporting&quot;&gt;The text isn’t moving, it’s teleporting&lt;/h2&gt;

&lt;p&gt;This may be obvious to some of you.&lt;/p&gt;

&lt;p&gt;On a screen refreshing at &lt;span id=&quot;descriptionFps&quot;&gt;120&lt;/span&gt; frames per second (fps), the text isn’t moving smoothly—it’s teleporting between positions every &lt;span id=&quot;descriptionMs&quot;&gt;8.3&lt;/span&gt; milliseconds. Here are &lt;span id=&quot;descriptionNumFrames&quot;&gt;12&lt;/span&gt; frames of the animation:&lt;/p&gt;

&lt;div class=&quot;discrete-frames-container&quot;&gt;
  &lt;div class=&quot;discrete-controls&quot;&gt;
    &lt;label for=&quot;discreteSpeedControl&quot;&gt;Text movement speed (pixels/second): &lt;/label&gt;
    &lt;input type=&quot;range&quot; id=&quot;discreteSpeedControl&quot; min=&quot;0&quot; max=&quot;8&quot; value=&quot;6&quot; step=&quot;1&quot; list=&quot;discreteSpeedValues&quot; /&gt;
    &lt;datalist id=&quot;discreteSpeedValues&quot;&gt;
      &lt;option value=&quot;0&quot; label=&quot;0&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;1&quot; label=&quot;60&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;2&quot; label=&quot;120&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;3&quot; label=&quot;300&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;4&quot; label=&quot;600&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;5&quot; label=&quot;900&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;6&quot; label=&quot;1200&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;7&quot; label=&quot;1500&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;8&quot; label=&quot;2000&quot;&gt;&lt;/option&gt;
    &lt;/datalist&gt;
    &lt;span id=&quot;discreteSpeedDisplay&quot;&gt;1200&lt;/span&gt;
    &lt;button id=&quot;pauseButton2&quot; class=&quot;control-button pause-button paused&quot;&gt;Play animations&lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;fps-selector&quot;&gt;
    &lt;span class=&quot;fps-label&quot;&gt;Simulated display refresh rate:&lt;/span&gt;
    &lt;button class=&quot;fps-button&quot; data-fps=&quot;30&quot;&gt;30 fps&lt;/button&gt;
    &lt;button class=&quot;fps-button&quot; data-fps=&quot;60&quot;&gt;60 fps&lt;/button&gt;
    &lt;button class=&quot;fps-button active&quot; data-fps=&quot;120&quot;&gt;120 fps&lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;discrete-frames-viewport&quot; id=&quot;discreteViewport&quot;&gt;
    &lt;!-- Frame elements will be dynamically inserted here --&gt;
  &lt;/div&gt;

  &lt;div class=&quot;discrete-info&quot;&gt;
    &lt;span&gt;At &lt;span id=&quot;discreteFps&quot;&gt;120&lt;/span&gt; frames per second, the text shifts &lt;span id=&quot;discretePixelShift&quot;&gt;10&lt;/span&gt; logical pixels per frame.&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;style&gt;
.discrete-frames-container {
  margin: 2em 0;
  padding: 1.5em;
  background: #f9f9f9;
  border: 1px solid #ddd;
  border-radius: 8px;
}

.discrete-controls {
  margin-bottom: 1.5em;
}

.discrete-controls label {
  font-weight: bold;
}


#discreteSpeedDisplay {
  min-width: 40px;
  font-weight: bold;
}


.discrete-info {
  margin-top: 1em;
  font-size: 14px;
  color: #666;
  text-align: center;
}

.animation-info {
  margin-top: 1.5em;
  text-align: center;
  position: relative;
  z-index: 1;
}

.animation-info p {
  margin: 0.5em 0;
  font-size: 14px;
  color: #666;
}

.animation-info .landscape-hint {
  display: none;
  color: #e74c3c;
  font-style: italic;
}

/* FPS selector styles */
.fps-selector {
  display: flex;
  align-items: center;
  gap: 0.5em;
  margin-top: 1em;
  margin-bottom: 0.5em;
  justify-content: center;
}

.fps-label {
  font-size: 14px;
  color: #666;
  margin-right: 0.5em;
}

&lt;/style&gt;

&lt;script&gt;
(function() {
  const discreteSpeedControl = document.getElementById(&apos;discreteSpeedControl&apos;);
  const discreteSpeedDisplay = document.getElementById(&apos;discreteSpeedDisplay&apos;);
  const discretePixelShift = document.getElementById(&apos;discretePixelShift&apos;);
  const discreteFps = document.getElementById(&apos;discreteFps&apos;);
  const discreteViewport = document.getElementById(&apos;discreteViewport&apos;);
  const pauseButton = document.getElementById(&apos;pauseButton2&apos;);

  let animationInterval = null;
  let currentFrame = 0;
  let frameElements = [];

  function getNumFrames() {
    // Different number of frames based on refresh rate
    switch(window.globalRefreshRate) {
      case 30: return 3;
      case 60: return 6;
      case 120: return 12;
      default: return 12;
    }
  }

  function createFrameElements() {
    // Clear existing elements
    discreteViewport.innerHTML = &apos;&apos;;
    frameElements = [];

    const numFrames = getNumFrames();

    // Create frame elements
    for (let i = 0; i &lt; numFrames; i++) {
      const frame = document.createElement(&apos;div&apos;);
      frame.className = &apos;discrete-frame&apos;;
      frame.textContent = &apos;The quick brown fox&apos;;
      discreteViewport.appendChild(frame);
      frameElements.push(frame);
    }
  }

  function updateDiscreteVisualization() {
    const speedIndex = parseInt(discreteSpeedControl.value);
    const speed = speedValues[speedIndex];
    discreteSpeedDisplay.textContent = speed;

    const pixelsPerFrame = speed / window.globalRefreshRate;
    discretePixelShift.textContent = pixelsPerFrame.toFixed(1);
    discreteFps.textContent = window.globalRefreshRate;

    // Stop any existing animation
    if (animationInterval) {
      clearInterval(animationInterval);
      animationInterval = null;
    }

    // Get number of frames for current refresh rate
    const numFrames = getNumFrames();
    
    // Create frame elements if needed or if count changed
    if (frameElements.length !== numFrames) {
      createFrameElements();
    }

    // Position all frames
    // Calculate total offset to center the group
    const totalOffset = (numFrames - 1) * pixelsPerFrame / 2;

    frameElements.forEach((frame, i) =&gt; {
      const offset = i * pixelsPerFrame - totalOffset;
      frame.style.transform = `translateY(-50%) translateX(calc(-50% + ${offset}px))`;
      frame.classList.remove(&apos;active&apos;);
    });

    if (speed === 0) {
      // Show only one frame when stopped
      frameElements[0].classList.add(&apos;active&apos;);
      frameElements.slice(1).forEach(frame =&gt; frame.style.display = &apos;none&apos;);
      return;
    }

    // Show all frames
    frameElements.forEach(frame =&gt; frame.style.display = &apos;block&apos;);

    // Start animation from the last frame (rightmost position) to animate leftward
    currentFrame = numFrames - 1;
    frameElements[currentFrame].classList.add(&apos;active&apos;);

    // Only animate if globally playing
    if (window.globalAnimationsPlaying) {
      // Calculate timing to make all refresh rates complete their cycle in the same time
      // Target: complete full cycle in 1 second (1000ms)
      const cycleDuration = 1000; // milliseconds for one complete cycle
      const msPerFrame = cycleDuration / numFrames;

      animationInterval = setInterval(() =&gt; {
        if (!window.globalAnimationsPlaying) {
          return; // Skip frame if paused
        }

        // Remove active from current frame
        frameElements[currentFrame].classList.remove(&apos;active&apos;);

        // Move to previous frame (lower index = further left visually, creating leftward motion)
        currentFrame = currentFrame - 1;
        if (currentFrame &lt; 0) currentFrame = getNumFrames() - 1;

        // Add active to new frame
        frameElements[currentFrame].classList.add(&apos;active&apos;);
      }, msPerFrame);
    }
  }

  // Sync with other controls
  discreteSpeedControl.addEventListener(&apos;input&apos;, function() {
    updateDiscreteVisualization();

    // Sync with main animation
    const mainSpeedControl = document.getElementById(&apos;speedControl&apos;);
    const mainSpeedDisplay = document.getElementById(&apos;speedDisplay&apos;);
    if (mainSpeedControl &amp;&amp; mainSpeedControl.value !== discreteSpeedControl.value) {
      mainSpeedControl.value = discreteSpeedControl.value;
      const speedIndex = parseInt(discreteSpeedControl.value);
      mainSpeedDisplay.textContent = speedValues[speedIndex];
      if (window.updateMainAnimation) {
        window.updateMainAnimation();
      }
    }

    // Sync with frame visualization
    const frameSpeedControl = document.getElementById(&apos;frameSpeedControl&apos;);
    const frameSpeedDisplay = document.getElementById(&apos;frameSpeedDisplay&apos;);
    if (frameSpeedControl &amp;&amp; frameSpeedControl.value !== discreteSpeedControl.value) {
      frameSpeedControl.value = discreteSpeedControl.value;
      const speedIndex = parseInt(discreteSpeedControl.value);
      frameSpeedDisplay.textContent = speedValues[speedIndex];
      if (window.updateFrameVisualization) {
        window.updateFrameVisualization();
      }
    }
  });

  // Expose update function
  window.updateDiscreteVisualization = updateDiscreteVisualization;

  // Pause button functionality
  pauseButton.addEventListener(&apos;click&apos;, function() {
    window.updateAllAnimations(!window.globalAnimationsPlaying);
  });

  // Initialize (animations start paused)
  updateDiscreteVisualization();

  // Handle FPS toggle buttons
  const fpsButtons = discreteViewport.parentElement.querySelectorAll(&apos;.fps-button&apos;);
  fpsButtons.forEach(button =&gt; {
    button.addEventListener(&apos;click&apos;, function() {
      const newFps = parseInt(this.dataset.fps);
      window.globalRefreshRate = newFps;

      // Update active state for all FPS buttons
      document.querySelectorAll(&apos;.fps-button&apos;).forEach(btn =&gt; {
        btn.classList.toggle(&apos;active&apos;, btn.dataset.fps === this.dataset.fps);
      });

      // Update visualizations
      updateDiscreteVisualization();
      if (window.updateFrameVisualization) {
        window.updateFrameVisualization();
      }

      // Update description text
      const descFps = document.getElementById(&apos;descriptionFps&apos;);
      const descMs = document.getElementById(&apos;descriptionMs&apos;);
      const descSlowFps = document.getElementById(&apos;descriptionSlowFps&apos;);
      if (descFps) descFps.textContent = newFps; // the surrounding sentence already says "frames per second (fps)"
      if (descMs) descMs.textContent = (1000 / newFps).toFixed(1);
      const slowdownFactor = window.getSlowdownFactor();
      if (descSlowFps) descSlowFps.textContent = Math.round(newFps / slowdownFactor);

      // Update all slowdown factors by class
      document.querySelectorAll(&apos;.slowdown-factor&apos;).forEach(el =&gt; {
        el.textContent = slowdownFactor;
      });
      
      // Update number of frames in discrete visualization description
      const descNumFrames = document.getElementById(&apos;descriptionNumFrames&apos;);
      if (descNumFrames) {
        // globalRefreshRate was updated above, so getNumFrames() matches newFps
        descNumFrames.textContent = getNumFrames();
      }

      // Update highway blur if it exists
      if (window.renderBlurredSign) {
        window.renderBlurredSign();
      }
    });
  });

  // Cleanup on page unload
  window.addEventListener(&apos;beforeunload&apos;, () =&gt; {
    if (animationInterval) {
      clearInterval(animationInterval);
    }
  });
})();
&lt;/script&gt;
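&lt;p&gt;The arithmetic behind the figure above is short enough to write out. Here is a minimal sketch, assuming a constant speed and a fixed refresh rate. The helper name &lt;span class=&quot;mono&quot;&gt;frameStep&lt;/span&gt; is made up for illustration and is not part of this page’s scripts.&lt;/p&gt;

```javascript
// Illustrative helper: how far the text jumps, and how long it sits still,
// for a given speed and refresh rate.
function frameStep(pixelsPerSecond, framesPerSecond) {
  return {
    pixelsPerFrame: pixelsPerSecond / framesPerSecond, // size of each teleport
    msPerFrame: 1000 / framesPerSecond,                // time spent frozen in place
  };
}

const step = frameStep(1200, 120);
console.log(step.pixelsPerFrame);        // 10
console.log(step.msPerFrame.toFixed(1)); // 8.3
```

&lt;p&gt;At the same 1200 pixels/second, a 60 fps screen makes 20-pixel jumps and a 30 fps screen makes 40-pixel jumps.&lt;/p&gt;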

&lt;h2 id=&quot;what-your-moving-eyes-see&quot;&gt;What your moving eyes see&lt;/h2&gt;

&lt;p&gt;When your eyes track moving text on a screen, the actual image passing into your eyes is the text &lt;em&gt;oscillating&lt;/em&gt; repeatedly.&lt;/p&gt;

&lt;div class=&quot;frame-demo-container&quot;&gt;
  &lt;div class=&quot;speed-selector&quot;&gt;
    &lt;label for=&quot;frameSpeedControl&quot;&gt;Text movement speed (pixels/second): &lt;/label&gt;
    &lt;input type=&quot;range&quot; id=&quot;frameSpeedControl&quot; min=&quot;0&quot; max=&quot;8&quot; value=&quot;6&quot; step=&quot;1&quot; list=&quot;frameSpeedValues&quot; /&gt;
    &lt;datalist id=&quot;frameSpeedValues&quot;&gt;
      &lt;option value=&quot;0&quot; label=&quot;0&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;1&quot; label=&quot;60&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;2&quot; label=&quot;120&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;3&quot; label=&quot;300&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;4&quot; label=&quot;600&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;5&quot; label=&quot;900&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;6&quot; label=&quot;1200&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;7&quot; label=&quot;1500&quot;&gt;&lt;/option&gt;
      &lt;option value=&quot;8&quot; label=&quot;2000&quot;&gt;&lt;/option&gt;
    &lt;/datalist&gt;
    &lt;span id=&quot;frameSpeedDisplay&quot;&gt;1200&lt;/span&gt;
    &lt;button id=&quot;pauseButton3&quot; class=&quot;control-button pause-button paused&quot;&gt;Play animations&lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;fps-selector&quot;&gt;
    &lt;span class=&quot;fps-label&quot;&gt;Simulated display refresh rate:&lt;/span&gt;
    &lt;button class=&quot;fps-button&quot; data-fps=&quot;30&quot;&gt;30 fps&lt;/button&gt;
    &lt;button class=&quot;fps-button&quot; data-fps=&quot;60&quot;&gt;60 fps&lt;/button&gt;
    &lt;button class=&quot;fps-button active&quot; data-fps=&quot;120&quot;&gt;120 fps&lt;/button&gt;
  &lt;/div&gt;

  &lt;div class=&quot;visualizations-grid&quot;&gt;
    &lt;div class=&quot;visualization-section&quot;&gt;
      &lt;p&gt;Actual time-varying input to your retina during &quot;smooth pursuit&quot; (slowed &lt;span class=&quot;slowdown-factor&quot;&gt;8&lt;/span&gt;x)&lt;/p&gt;
      &lt;div class=&quot;frame-viewport&quot;&gt;
        &lt;div class=&quot;frame-text&quot; id=&quot;frameText&quot;&gt;The quick brown fox&lt;/div&gt;
        &lt;div class=&quot;paused-overlay&quot; id=&quot;framePausedOverlay&quot;&gt;
          &lt;button class=&quot;play-overlay-button&quot; id=&quot;playOverlayButton&quot;&gt;
            &lt;svg width=&quot;48&quot; height=&quot;48&quot; viewBox=&quot;0 0 48 48&quot; fill=&quot;none&quot; xmlns=&quot;http://www.w3.org/2000/svg&quot;&gt;
              &lt;circle cx=&quot;24&quot; cy=&quot;24&quot; r=&quot;23&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; /&gt;
              &lt;path d=&quot;M19 16L32 24L19 32V16Z&quot; fill=&quot;currentColor&quot; stroke=&quot;currentColor&quot; stroke-width=&quot;2&quot; stroke-linejoin=&quot;round&quot; /&gt;
            &lt;/svg&gt;
          &lt;/button&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;

    &lt;div class=&quot;visualization-section&quot;&gt;
      &lt;p&gt;Actual &quot;static&quot; input to retina during &quot;smooth pursuit&quot; (average pixel at each position)&lt;/p&gt;
      &lt;div class=&quot;blur-viewport&quot; id=&quot;blurViewport&quot;&gt;
        &lt;canvas id=&quot;blurCanvas&quot; width=&quot;800&quot; height=&quot;80&quot;&gt;&lt;/canvas&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;style&gt;
.frame-demo-container {
  margin: 2em 0;
  padding: 1.5em;
  background: #f9f9f9;
  border: 1px solid #ddd;
  border-radius: 8px;
}

.speed-selector {
  margin-bottom: 1.5em;
}

.speed-selector label {
  font-weight: bold;
}


#frameSpeedDisplay {
  min-width: 40px;
  font-weight: bold;
}


.visualizations-grid {
  display: grid;
  grid-template-columns: 1fr 1fr;
  gap: 2em;
}

.visualization-section h4 {
  margin-bottom: 1em;
  font-size: 16px;
  color: #333;
}

@media (max-width: 1024px) {
  .visualizations-grid {
    grid-template-columns: 1fr;
    gap: 1em;
  }
}

.frame-viewport {
  position: relative;
  height: 80px;
  background: white;
  border: 1px solid #e0e0e0;
  overflow: hidden;
  margin-bottom: 1em;
  cursor: pointer;
}

.frame-text {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translateY(-50%) translateX(-50%);
  font-size: 24px;
  font-family: Georgia, serif;
  color: #333;
  white-space: nowrap;
  transition: none;
}

@media (max-width: 768px) {
  .frame-text {
    font-size: 18px;
  }
}

.blur-info, .discrete-info {
  font-size: 14px;
  color: #666;
}

.discrete-frames-viewport {
  position: relative;
  height: 80px;
  background: white;
  border: 1px solid #e0e0e0;
  overflow: hidden;
}

.discrete-frame {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translateY(-50%) translateX(-50%);
  font-size: 24px;
  font-family: Georgia, serif;
  white-space: nowrap;
  padding: 8px 12px;
  border: 2px solid rgba(255, 0, 0, 0.2);
  border-radius: 4px;
  color: transparent;
  transition: none;
  z-index: 1;
}

@media (max-width: 768px) {
  .discrete-frame {
    font-size: 18px;
    padding: 6px 10px;
  }
}

.discrete-frame.active {
  color: #333;
  background: rgba(255, 255, 255, 0.95);
  border-color: rgba(255, 0, 0, 0.8);
  border-width: 3px;
  box-shadow: 0 0 0 1px rgba(255, 0, 0, 0.3);
  z-index: 10;
}


.blur-viewport {
  position: relative;
  height: 80px;
  background: white;
  border: 1px solid #e0e0e0;
  overflow: hidden;
}

#blurCanvas {
  display: block;
  width: 100%;
  height: 100%;
}

@keyframes singleFrame {
  0% {
    transform: translateY(-50%) translateX(calc(-50% - var(--pixel-shift) / 2));
  }
  90% {
    transform: translateY(-50%) translateX(calc(-50% + var(--pixel-shift) / 2));
  }
  90.01% {
    transform: translateY(-50%) translateX(calc(-50% - var(--pixel-shift) / 2));
  }
  100% {
    transform: translateY(-50%) translateX(calc(-50% - var(--pixel-shift) / 2));
  }
}

.frame-text.animating {
  animation: singleFrame var(--frame-duration) linear infinite;
}

.paused-overlay {
  position: absolute;
  top: 0;
  left: 0;
  right: 0;
  bottom: 0;
  background: rgba(255, 255, 255, 0.85);
  display: none;
  align-items: center;
  justify-content: center;
  z-index: 20;
}

.paused-overlay.visible {
  display: flex;
}

.play-overlay-button {
  background: white;
  border: none;
  padding: 1em;
  border-radius: 50%;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.15);
  cursor: pointer;
  color: #3498db;
  transition: all 0.2s;
}

.play-overlay-button:hover {
  transform: scale(1.1);
  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);
  color: #2980b9;
}

.play-overlay-button svg {
  display: block;
}
&lt;/style&gt;

&lt;script&gt;
(function() {
  const frameSpeedControl = document.getElementById(&apos;frameSpeedControl&apos;);
  const frameSpeedDisplay = document.getElementById(&apos;frameSpeedDisplay&apos;);
  const frameText = document.getElementById(&apos;frameText&apos;);
  const blurCanvas = document.getElementById(&apos;blurCanvas&apos;);
  const pauseButton = document.getElementById(&apos;pauseButton3&apos;);
  const pausedOverlay = document.getElementById(&apos;framePausedOverlay&apos;);
  const playOverlayButton = document.getElementById(&apos;playOverlayButton&apos;);

  function calculatePixelsPerFrame(speed) {
    // Logical pixels per second / refresh rate = pixels per frame
    return speed / window.globalRefreshRate;
  }

  function renderBlurredText(speed) {
    const pixelsPerFrame = calculatePixelsPerFrame(speed);

    // Set canvas size to match container
    const rect = blurCanvas.getBoundingClientRect();
    const scale = window.devicePixelRatio || 1;
    blurCanvas.width = rect.width * scale;
    blurCanvas.height = rect.height * scale;
    blurCanvas.style.width = rect.width + &apos;px&apos;;
    blurCanvas.style.height = rect.height + &apos;px&apos;;

    const ctx = blurCanvas.getContext(&apos;2d&apos;);
    ctx.scale(scale, scale);

    // Clear canvas with white
    ctx.fillStyle = &apos;white&apos;;
    ctx.fillRect(0, 0, rect.width, rect.height);

    // Set up text rendering
    const fontSize = window.innerWidth &lt;= 768 ? 18 : 24;
    ctx.font = `${fontSize}px Georgia, serif`;
    ctx.textAlign = &apos;center&apos;;
    ctx.textBaseline = &apos;middle&apos;;
    ctx.fillStyle = &apos;#333&apos;;

    if (speed === 0 || pixelsPerFrame &lt; 0.1) {
      // Just draw the text centered when stopped or barely moving
      ctx.fillText(&apos;The quick brown fox&apos;, rect.width / 2, rect.height / 2);
      return;
    }

    // For true pixel averaging, we need to render each position and average the results
    const samples = Math.max(2, Math.min(64, Math.ceil(pixelsPerFrame * 4)));

    // Create an offscreen canvas for accumulating pixel data
    const offscreenCanvas = document.createElement(&apos;canvas&apos;);
    offscreenCanvas.width = blurCanvas.width;
    offscreenCanvas.height = blurCanvas.height;
    const offscreenCtx = offscreenCanvas.getContext(&apos;2d&apos;);
    offscreenCtx.scale(scale, scale);

    // Set up accumulator for pixel data
    const width = blurCanvas.width;
    const height = blurCanvas.height;
    const accumulator = new Float32Array(width * height * 4); // RGBA channels

    // Render text at each position and accumulate pixel values
    for (let i = 0; i &lt; samples; i++) {
      // Clear offscreen canvas
      offscreenCtx.fillStyle = &apos;white&apos;;
      offscreenCtx.fillRect(0, 0, rect.width, rect.height);

      // Draw text at this position
      offscreenCtx.font = `${fontSize}px Georgia, serif`;
      offscreenCtx.textAlign = &apos;center&apos;;
      offscreenCtx.textBaseline = &apos;middle&apos;;
      offscreenCtx.fillStyle = &apos;#333&apos;;

      const offset = (i / (samples - 1)) * pixelsPerFrame - pixelsPerFrame / 2;
      const x = rect.width / 2 + offset;
      offscreenCtx.fillText(&apos;The quick brown fox&apos;, x, rect.height / 2);

      // Get pixel data
      const imageData = offscreenCtx.getImageData(0, 0, width, height);
      const pixels = imageData.data;

      // Accumulate pixel values
      for (let j = 0; j &lt; pixels.length; j++) {
        accumulator[j] += pixels[j];
      }
    }

    // Create averaged image data
    const averagedImageData = ctx.createImageData(width, height);
    const averagedPixels = averagedImageData.data;

    // Average the accumulated values
    for (let i = 0; i &lt; accumulator.length; i++) {
      averagedPixels[i] = Math.round(accumulator[i] / samples);
    }

    // Put the averaged image back on the main canvas
    ctx.putImageData(averagedImageData, 0, 0);
  }

  function updateVisualization() {
    const speedIndex = parseInt(frameSpeedControl.value);
    const speed = speedValues[speedIndex];
    frameSpeedDisplay.textContent = speed;

    const pixelsPerFrame = calculatePixelsPerFrame(speed);

    // Reset animation
    frameText.classList.remove(&apos;animating&apos;);

    // Render the blurred text
    renderBlurredText(speed);

    if (speed === 0) {
      // No animation when speed is 0
      pausedOverlay.classList.remove(&apos;visible&apos;);
      return;
    }

    // Set pixel shift for single frame animation
    frameText.style.setProperty(&apos;--pixel-shift&apos;, `${pixelsPerFrame}px`);
    // Set animation duration based on refresh rate (dynamic slowdown)
    const slowdownFactor = window.getSlowdownFactor();
    const frameDuration = (1000 / window.globalRefreshRate) * slowdownFactor;
    frameText.style.setProperty(&apos;--frame-duration&apos;, `${frameDuration}ms`);

    // Restart animation only if globally playing
    void frameText.offsetWidth; // Force reflow
    if (window.globalAnimationsPlaying) {
      frameText.classList.add(&apos;animating&apos;);
      pausedOverlay.classList.remove(&apos;visible&apos;);
    } else {
      pausedOverlay.classList.add(&apos;visible&apos;);
    }
  }

  // Expose update function globally for sync
  window.updateFrameVisualization = updateVisualization;

  // Pause button functionality
  pauseButton.addEventListener(&apos;click&apos;, function() {
    window.updateAllAnimations(!window.globalAnimationsPlaying);
  });

  // Handle speed control changes
  frameSpeedControl.addEventListener(&apos;input&apos;, function() {
    updateVisualization();
    // Sync with the main animation control (don&apos;t trigger event to avoid loop)
    const mainSpeedControl = document.getElementById(&apos;speedControl&apos;);
    const mainSpeedDisplay = document.getElementById(&apos;speedDisplay&apos;);
    if (mainSpeedControl &amp;&amp; mainSpeedControl.value !== frameSpeedControl.value) {
      mainSpeedControl.value = frameSpeedControl.value;
      const speedIndex = parseInt(frameSpeedControl.value);
      mainSpeedDisplay.textContent = speedValues[speedIndex];
      // Call the update function directly
      if (window.updateMainAnimation) {
        window.updateMainAnimation();
      }
    }

    // Sync with discrete visualization
    const discreteSpeedControl = document.getElementById(&apos;discreteSpeedControl&apos;);
    const discreteSpeedDisplay = document.getElementById(&apos;discreteSpeedDisplay&apos;);
    if (discreteSpeedControl &amp;&amp; discreteSpeedControl.value !== frameSpeedControl.value) {
      discreteSpeedControl.value = frameSpeedControl.value;
      const speedIndex = parseInt(frameSpeedControl.value);
      discreteSpeedDisplay.textContent = speedValues[speedIndex];
      if (window.updateDiscreteVisualization) {
        window.updateDiscreteVisualization();
      }
    }
  });

  // Initialize (animations start paused)
  updateVisualization();
  // Show paused overlay if speed &gt; 0
  const initialSpeedIndex = parseInt(frameSpeedControl.value);
  if (speedValues[initialSpeedIndex] &gt; 0) {
    pausedOverlay.classList.add(&apos;visible&apos;);
  }

  // Handle play overlay button
  if (playOverlayButton) {
    playOverlayButton.addEventListener(&apos;click&apos;, function(e) {
      e.stopPropagation();
      window.updateAllAnimations(true);
    });
  }

  // Click viewport to pause
  const frameViewport = document.querySelector(&apos;.frame-viewport&apos;);
  if (frameViewport) {
    frameViewport.addEventListener(&apos;click&apos;, function(e) {
      // Only pause if animations are playing and we&apos;re not clicking the play button
      if (window.globalAnimationsPlaying &amp;&amp; !e.target.closest(&apos;.play-overlay-button&apos;)) {
        window.updateAllAnimations(false);
      }
    });
  }

  // Handle FPS toggle buttons
  const frameContainer = document.querySelector(&apos;.frame-demo-container&apos;);
  const fpsButtons = frameContainer.querySelectorAll(&apos;.fps-button&apos;);
  fpsButtons.forEach(button =&gt; {
    button.addEventListener(&apos;click&apos;, function() {
      const newFps = parseInt(this.dataset.fps);
      window.globalRefreshRate = newFps;

      // Update active state for all FPS buttons
      document.querySelectorAll(&apos;.fps-button&apos;).forEach(btn =&gt; {
        btn.classList.toggle(&apos;active&apos;, btn.dataset.fps === this.dataset.fps);
      });

      // Update visualizations
      updateVisualization();
      if (window.updateDiscreteVisualization) {
        window.updateDiscreteVisualization();
      }

      // Update description text
      const descFps = document.getElementById(&apos;descriptionFps&apos;);
      const descMs = document.getElementById(&apos;descriptionMs&apos;);
      const descSlowFps = document.getElementById(&apos;descriptionSlowFps&apos;);
      if (descFps) descFps.textContent = newFps + &apos; fps&apos;;
      if (descMs) descMs.textContent = (1000 / newFps).toFixed(1);
      const slowdownFactor = window.getSlowdownFactor();
      if (descSlowFps) descSlowFps.textContent = Math.round(newFps / slowdownFactor);

      // Update all slowdown factors by class
      document.querySelectorAll(&apos;.slowdown-factor&apos;).forEach(el =&gt; {
        el.textContent = slowdownFactor;
      });
      
      // Update number of frames in discrete visualization description
      const descNumFrames = document.getElementById(&apos;descriptionNumFrames&apos;);
      if (descNumFrames) {
        let numFrames;
        switch(newFps) {
          case 30: numFrames = 3; break;
          case 60: numFrames = 6; break;
          case 120: numFrames = 12; break;
          default: numFrames = 12;
        }
        descNumFrames.textContent = numFrames;
      }

      // Update highway blur if it exists
      if (window.renderBlurredSign) {
        window.renderBlurredSign();
      }
    });
  });

  // Handle window resize
  window.addEventListener(&apos;resize&apos;, function() {
    const speedIndex = parseInt(frameSpeedControl.value);
    renderBlurredText(speedValues[speedIndex]);
  });
})();
&lt;/script&gt;

&lt;p&gt;This phenomenon is called &lt;a href=&quot;https://en.wikipedia.org/wiki/Display_motion_blur&quot;&gt;display motion blur&lt;/a&gt;. Motion is much more pleasant in the real world than on screens.&lt;/p&gt;

&lt;h2 id=&quot;designing-for-this&quot;&gt;Designing for this&lt;/h2&gt;

&lt;p&gt;In the age of 120 frames-per-second ProMotion™ Retina™ displays, you may have the impression that screens are now good at showing motion. Nope.&lt;/p&gt;

&lt;p&gt;As we try to build “buttery-smooth” animations, we need to be aware that it’s a fool’s errand. Every motion animation that is much faster than 1 pixel per frame is going to look terrible if you actually pay attention to it. The upgrade from 60 fps to 120 fps doesn’t make motion animations good; it just makes them less bad. For designers, this is an important constraint: we should avoid creating situations where the user tracks a moving object. These animations can still happen, but users shouldn’t be paying much attention to the moving objects, or they will become unhappy. Until we invent screens that move with your eyes, motion is going to look ugly on screens.&lt;/p&gt;
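
&lt;p&gt;As a rough sketch of the arithmetic behind that claim, the distance an object jumps between frames is its on-screen speed divided by the refresh rate. The helper name &lt;code&gt;pixelsPerFrame&lt;/code&gt; is made up for this illustration, and 1200 pixels per second is simply the default of the speed slider in the demo above.&lt;/p&gt;

```javascript
// Illustrative helper (hypothetical name, not part of the demo scripts):
// how far an object jumps between consecutive frames when it moves at a
// constant on-screen speed.
function pixelsPerFrame(pixelsPerSecond, refreshRate) {
  return pixelsPerSecond / refreshRate;
}

// 1200 px/s is the demo slider default; the refresh rates match its buttons.
for (const fps of [30, 60, 120]) {
  console.log(`${fps} fps: ${pixelsPerFrame(1200, fps)} px per frame`);
}
```

&lt;p&gt;This logs 40, 20, and 10 pixels per frame at 30, 60, and 120 fps. Even at 120 fps, that is ten times the 1 pixel per frame mark.&lt;/p&gt;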

&lt;h2 id=&quot;motion-in-movies&quot;&gt;Motion in movies&lt;/h2&gt;

&lt;p&gt;In these animations, your &lt;span class=&quot;computerType&quot;&gt;computer&lt;/span&gt; is generating individual discrete frames. In video recordings, the camera stores an aggregate blurred image for each frame. So, videos have pre-blurred images; they essentially render the blurry version of “The quick brown fox” above. Like software designers, filmmakers have to design around this; they need to avoid situations where viewers are tracking fast-moving objects.&lt;/p&gt;

&lt;p&gt;Pre-blurring the images obviously doesn’t solve the problem of making the image look realistic and readable. It does solve a different problem: making the animation look smooth and blurry, rather than choppy. Game developers intentionally include “motion blur” in games, rendering multiple positions of an object into a single frame to simulate the film effect. This blurs the image, but it also makes it more obvious what motion occurred. That matters in fast-moving video games, where seeing where an object went is more important than seeing its fine detail in individual frames.&lt;/p&gt;
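
&lt;p&gt;Here is a minimal sketch of that idea on a single row of pixels, assuming the blur is a plain average of evenly spaced positions. The function name &lt;code&gt;motionBlurRow&lt;/code&gt; and the 8-pixel row are invented for illustration; real game engines use more elaborate techniques.&lt;/p&gt;

```javascript
// Illustrative sketch (hypothetical names): blur a 1-D row of pixels by
// averaging several copies of it, each shifted one pixel further along.
function motionBlurRow(row, shiftPixels) {
  const samples = shiftPixels + 1; // one copy per integer offset 0..shiftPixels
  return row.map(function (_, x) {
    let sum = 0;
    for (const s of Array(samples).keys()) {
      const value = row[x - s];
      // Positions shifted in from outside the row count as background (0).
      sum += value === undefined ? 0 : value;
    }
    return sum / samples;
  });
}

// One bright pixel (1) on a dark background (0), moving 3 px within a frame.
console.log(motionBlurRow([0, 0, 1, 0, 0, 0, 0, 0], 3));
```

&lt;p&gt;The single bright pixel becomes a four-pixel streak at a quarter of its brightness: the path of the motion is easy to see, but the fine detail is gone.&lt;/p&gt;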

&lt;p&gt;One interesting consequence of this blog post: even in a video game with motion blur turned off, you’ll still perceive moving objects as blurry when your eyes track them, simply because you’re viewing them on a screen.&lt;/p&gt;

&lt;h2 id=&quot;attaching-a-screen-to-your-face&quot;&gt;Attaching a screen to your face&lt;/h2&gt;

&lt;p&gt;Screens are incompatible with smooth pursuit, and this has implications for VR / AR / “spatial computing”, since anything that gets rendered into the world will be blurry while you move.&lt;/p&gt;

&lt;p&gt;What would happen if you wore an Apple Vision Pro while driving? Let’s assume it works perfectly, with a perfect resolution screen and perfect passthrough.&lt;/p&gt;

&lt;div class=&quot;highway-sign-container&quot;&gt;
  &lt;p class=&quot;highway-intro&quot;&gt;Each of these frames is a clear picture of a sign, and yet you perceive a blurry one, because the screen is not behaving like reality.&lt;/p&gt;
  &lt;div class=&quot;highway-fullscreen&quot;&gt;
    &lt;div class=&quot;highway-viewport&quot;&gt;
      &lt;div class=&quot;highway-perspective&quot;&gt;
        &lt;img src=&quot;/images/2025-08-03-highway-sign.png&quot; alt=&quot;Highway sign&quot; class=&quot;highway-sign&quot; id=&quot;highwaySign&quot; /&gt;
      &lt;/div&gt;
      &lt;div class=&quot;animation-overlay&quot; id=&quot;highwayOverlay&quot;&gt;
        &lt;button class=&quot;play-button&quot; aria-label=&quot;Play animation&quot;&gt;
          &lt;svg width=&quot;80&quot; height=&quot;80&quot; viewBox=&quot;0 0 80 80&quot;&gt;
            &lt;circle cx=&quot;40&quot; cy=&quot;40&quot; r=&quot;39&quot; fill=&quot;rgba(255,255,255,0.9)&quot; stroke=&quot;#333&quot; stroke-width=&quot;2&quot; /&gt;
            &lt;path d=&quot;M30 25 L30 55 L55 40 Z&quot; fill=&quot;#333&quot; /&gt;
          &lt;/svg&gt;
        &lt;/button&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;

  &lt;div class=&quot;highway-visualizations&quot;&gt;
    &lt;h4&gt;Actual input images to your retina when you track the sign&lt;/h4&gt;

    &lt;div class=&quot;highway-visualizations-grid&quot;&gt;
      &lt;div class=&quot;highway-visualization-section&quot;&gt;
        &lt;p&gt;Actual time-varying input to your retina (slowed &lt;span class=&quot;slowdown-factor&quot;&gt;8&lt;/span&gt;x)&lt;/p&gt;
        &lt;div class=&quot;highway-animated-viewport&quot;&gt;
          &lt;img src=&quot;/images/2025-08-03-highway-sign.png&quot; alt=&quot;Highway sign&quot; class=&quot;highway-animated-sign&quot; id=&quot;highwayAnimatedSign&quot; /&gt;
          &lt;div class=&quot;animation-overlay&quot; id=&quot;highwayAnimatedOverlay&quot;&gt;
            &lt;button class=&quot;play-button&quot; aria-label=&quot;Play animation&quot;&gt;
              &lt;svg width=&quot;60&quot; height=&quot;60&quot; viewBox=&quot;0 0 60 60&quot;&gt;
                &lt;circle cx=&quot;30&quot; cy=&quot;30&quot; r=&quot;29&quot; fill=&quot;rgba(255,255,255,0.9)&quot; stroke=&quot;#333&quot; stroke-width=&quot;2&quot; /&gt;
                &lt;path d=&quot;M24 20 L24 40 L40 30 Z&quot; fill=&quot;#333&quot; /&gt;
              &lt;/svg&gt;
            &lt;/button&gt;
          &lt;/div&gt;
        &lt;/div&gt;
      &lt;/div&gt;

      &lt;div class=&quot;highway-visualization-section&quot;&gt;
        &lt;p&gt;Average pixel at each position&lt;/p&gt;
        &lt;div class=&quot;highway-blur-viewport&quot;&gt;
          &lt;canvas id=&quot;highwayBlurCanvas&quot; width=&quot;400&quot; height=&quot;200&quot;&gt;&lt;/canvas&gt;
        &lt;/div&gt;
      &lt;/div&gt;

      &lt;div class=&quot;highway-visualization-section&quot;&gt;
        &lt;p&gt;What you would see if you took the VR headset off&lt;/p&gt;
        &lt;div class=&quot;highway-clear-viewport&quot;&gt;
          &lt;img src=&quot;/images/2025-08-03-highway-sign.png&quot; alt=&quot;Highway sign&quot; class=&quot;highway-clear-sign&quot; id=&quot;highwayClearSign&quot; /&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;

  &lt;div class=&quot;highway-controls-group&quot;&gt;
    &lt;table style=&quot;width: 100%; border-spacing: 0 0.5em;&quot;&gt;
      &lt;tr&gt;
        &lt;td style=&quot;text-align: right; padding-right: 1em; font-weight: bold; color: #555; white-space: nowrap;&quot;&gt;Sign distance:&lt;/td&gt;
        &lt;td&gt;
          &lt;button class=&quot;distance-button&quot; data-distance=&quot;far&quot;&gt;Far&lt;/button&gt;
          &lt;button class=&quot;distance-button&quot; data-distance=&quot;medium&quot;&gt;Medium&lt;/button&gt;
          &lt;button class=&quot;distance-button active&quot; data-distance=&quot;near&quot;&gt;Near&lt;/button&gt;
        &lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;text-align: right; padding-right: 1em; font-weight: bold; color: #555; white-space: nowrap;&quot;&gt;Simulated display refresh rate:&lt;/td&gt;
        &lt;td&gt;
          &lt;button class=&quot;fps-button&quot; data-fps=&quot;30&quot;&gt;30 fps&lt;/button&gt;
          &lt;button class=&quot;fps-button&quot; data-fps=&quot;60&quot;&gt;60 fps&lt;/button&gt;
          &lt;button class=&quot;fps-button active&quot; data-fps=&quot;120&quot;&gt;120 fps&lt;/button&gt;
        &lt;/td&gt;
      &lt;/tr&gt;
    &lt;/table&gt;
  &lt;/div&gt;

&lt;/div&gt;

&lt;style&gt;
.highway-sign-container {
  margin: 2em 0;
  padding: 1.5em;
  background: #f9f9f9;
  border: 1px solid #ddd;
  border-radius: 8px;
}

.highway-intro {
  margin: 0 0 1.5em 0;
  color: #555;
  text-align: center;
}

.highway-controls {
  margin-bottom: 1.5em;
  display: flex;
  align-items: center;
  gap: 1em;
}

.highway-controls label {
  font-weight: bold;
}

.highway-fullscreen {
  position: relative;
  left: 50%;
  right: 50%;
  margin-left: -50vw;
  margin-right: -50vw;
  width: 100vw;
  margin-top: 1.5em;
  margin-bottom: 1.5em;
}

.highway-viewport {
  position: relative;
  height: 400px;
  background: linear-gradient(to bottom, #87CEEB 0%, #87CEEB 50%, #888888 50%, #888888 100%);
  overflow: hidden;
  border-top: 1px solid #e0e0e0;
  border-bottom: 1px solid #e0e0e0;
  perspective: 800px;
  perspective-origin: 50% 50%;
  width: 100vw;
  cursor: pointer;
}

.highway-perspective {
  position: absolute;
  width: 100%;
  height: 100%;
  transform-style: preserve-3d;
}

.highway-sign {
  position: absolute;
  width: 200px; /* Smaller sign */
  height: auto;
  top: 20%;
  left: 70%; /* Further to the right */
  transform: translateX(-50%) translateZ(0);
  transform-origin: center center;
}

.highway-sign.animating {
  animation: approachSign var(--sign-duration) linear infinite;
}

@keyframes approachSign {
  0% {
    transform: translateX(-50%) translateZ(-2000px) scale(0.1);
    opacity: 0.5;
  }
  80% {
    transform: translateX(50%) translateZ(0px) scale(1);
    opacity: 1;
  }
  100% {
    transform: translateX(150%) translateZ(200px) scale(1.5);
    opacity: 0;
  }
}

.highway-viewport .animation-overlay {
  position: absolute;
  top: 0;
  left: 0;
  right: 0;
  bottom: 0;
  background: rgba(255, 255, 255, 0.7);
  display: flex;
  align-items: center;
  justify-content: center;
  cursor: pointer;
  z-index: 10;
  transition: opacity 0.3s ease;
}

.highway-viewport .animation-overlay.playing {
  opacity: 0;
  pointer-events: none;
}

.highway-viewport .play-button {
  background: none;
  border: none;
  cursor: pointer;
  padding: 0;
  transition: transform 0.2s ease;
}

.highway-viewport .play-button:hover {
  transform: scale(1.1);
}

.highway-visualizations {
  margin-top: 2em;
  text-align: center;
}

.highway-visualizations h4 {
  margin-bottom: 1em;
  font-size: 1.1em;
  color: #333;
}

.highway-controls-group {
  margin: 1.5em auto;
  max-width: 700px;
}

.distance-button,
.fps-button {
  padding: 0.4em 1em;
  border: 1px solid #ddd;
  background: white;
  border-radius: 4px;
  cursor: pointer;
  transition: all 0.2s;
  font-size: 14px;
}

.distance-button:hover,
.fps-button:hover {
  background: #f0f0f0;
}

.distance-button.active,
.fps-button.active {
  background: #3498db;
  color: white;
  border-color: #3498db;
}

.distance-button.active:hover,
.fps-button.active:hover {
  background: #2980b9;
  border-color: #2980b9;
}

.highway-visualizations-grid {
  display: grid;
  grid-template-columns: 1fr 1fr 1fr;
  gap: 1.5em;
  margin-bottom: 1em;
}

.highway-visualization-section {
  text-align: center;
}

.highway-visualization-section p {
  margin: 0 0 0.5em 0;
  font-weight: bold;
  color: #555;
}

.highway-animated-viewport {
  position: relative;
  height: 200px;
  background: white;
  border: 1px solid #e0e0e0;
  overflow: hidden;
  display: flex;
  align-items: center;
  justify-content: center;
  cursor: pointer;
}

.highway-animated-viewport .animation-overlay {
  position: absolute;
  top: 0;
  left: 0;
  right: 0;
  bottom: 0;
  background: rgba(255, 255, 255, 0.7);
  display: flex;
  align-items: center;
  justify-content: center;
  cursor: pointer;
  z-index: 10;
  transition: opacity 0.3s ease;
}

.highway-animated-viewport .animation-overlay.playing {
  opacity: 0;
  pointer-events: none;
}

.highway-animated-viewport .play-button {
  background: none;
  border: none;
  cursor: pointer;
  padding: 0;
  transition: transform 0.2s ease;
}

.highway-animated-viewport .play-button:hover {
  transform: scale(1.1);
}

.highway-animated-sign {
  position: absolute;
  height: 80px;
  width: auto;
}

.highway-animated-sign.animating {
  animation: signOscillate var(--sign-oscillate-duration) linear infinite;
}

@keyframes signOscillate {
  0% {
    /* Start slightly off center to show motion */
    transform: translateX(calc(var(--sign-oscillate-distance) * 0.1)) translateY(calc(var(--sign-oscillate-distance) * 0.3 * -0.1));
  }
  90% {
    /* End slightly off center in opposite direction */
    transform: translateX(calc(var(--sign-oscillate-distance) * -0.1)) translateY(calc(var(--sign-oscillate-distance) * 0.3 * 0.1));
  }
  90.01% {
    /* Jump back to start */
    transform: translateX(calc(var(--sign-oscillate-distance) * 0.1)) translateY(calc(var(--sign-oscillate-distance) * 0.3 * -0.1));
  }
  100% {
    /* Stay at start */
    transform: translateX(calc(var(--sign-oscillate-distance) * 0.1)) translateY(calc(var(--sign-oscillate-distance) * 0.3 * -0.1));
  }
}

.highway-blur-viewport {
  position: relative;
  height: 200px;
  background: white;
  border: 1px solid #e0e0e0;
  overflow: hidden;
}

.highway-clear-viewport {
  position: relative;
  height: 200px;
  background: white;
  border: 1px solid #e0e0e0;
  overflow: hidden;
  display: flex;
  align-items: center;
  justify-content: center;
}

.highway-clear-sign {
  height: 80px;
  width: auto;
}

#highwayBlurCanvas {
  display: block;
  width: 100%;
  height: 100%;
}

.highway-blur-info {
  font-size: 14px;
  color: #666;
  margin: 0;
}

.highway-info {
  margin-top: 1.5em;
  text-align: center;
}

.highway-info p {
  margin: 0.5em 0;
  font-size: 14px;
  color: #666;
}

.highway-warning {
  color: #e74c3c;
  font-weight: bold;
  font-size: 16px !important;
}

@media (max-width: 768px) {
  .highway-controls {
    flex-wrap: wrap;
    justify-content: center;
  }

  .highway-controls label {
    flex: 0 0 auto;
  }

  .highway-controls .control-button {
    flex: 0 0 auto;
  }

  .highway-viewport {
    height: 300px;
  }

  .highway-sign {
    width: 200px;
  }

  .highway-visualizations-grid {
    grid-template-columns: 1fr;
    gap: 1.5em;
  }

  .highway-visualization-section p {
    font-size: 14px;
  }

  .highway-controls-group {
    padding: 0 1em;
  }

  .highway-controls-group table {
    display: block;
  }

  .highway-controls-group tr {
    display: flex;
    flex-direction: column;
    margin-bottom: 0.5em;
  }

  .highway-controls-group td:first-child {
    text-align: left !important;
    margin-bottom: 0.3em;
    font-size: 14px;
  }

  .highway-controls-group td:last-child {
    display: flex;
    justify-content: center;
    flex-wrap: wrap;
    gap: 0.3em;
  }

  .distance-button,
  .fps-button {
    padding: 0.3em 0.8em;
    font-size: 13px;
  }
}
&lt;/style&gt;

&lt;script&gt;
(function() {
  const highwaySign = document.getElementById(&apos;highwaySign&apos;);
  const highwayViewport = document.querySelector(&apos;.highway-viewport&apos;);
  const highwayAnimatedSign = document.getElementById(&apos;highwayAnimatedSign&apos;);
  const highwayClearSign = document.getElementById(&apos;highwayClearSign&apos;);
  const highwayAnimatedOverlay = document.getElementById(&apos;highwayAnimatedOverlay&apos;);

  const speedMph = 180; // Fixed driving speed in mph
  const signDistance = 400; // feet to travel
  const blurCanvas = document.getElementById(&apos;highwayBlurCanvas&apos;);

  // Distance configurations
  // At 180 mph (264 ft/s), calculate angular velocity at different distances
  // Assuming sign is 10 feet wide and we&apos;re looking perpendicular to the road
  // Shared by every config below; `this` is the config object it is called on.
  function getPixelsPerSecond(viewportWidth) {
    // Angular velocity = speed / distance (in radians/second)
    // Convert to pixels based on field of view (~50 degrees = viewport width)
    const angularVelocity = (speedMph * 5280 / 3600) / this.distance; // rad/s
    const pixelsPerRadian = viewportWidth / (50 * Math.PI / 180); // pixels per radian
    return angularVelocity * pixelsPerRadian;
  }
  const distanceConfigs = {
    far: {
      scale: 0.3,
      distance: 500, // feet from road
      getPixelsPerSecond
    },
    medium: {
      scale: 0.6,
      distance: 200, // feet from road
      getPixelsPerSecond
    },
    near: {
      scale: 1.0,
      distance: 100, // feet from road
      getPixelsPerSecond
    }
  };
  let currentDistance = &apos;near&apos;;

  function calculatePixelsPerFrame(speedMph) {
    // Calculate based on the actual animation
    // The sign moves from -50% to 150% of the viewport width (200% total)
    // Animation duration is signDistance / speedFps
    const speedFps = speedMph * 5280 / 3600; // feet per second
    const duration = signDistance / speedFps; // seconds

    // Get viewport width
    const viewportWidth = window.innerWidth;
    const totalPixelsToMove = viewportWidth * 2; // 200% of viewport

    const pixelsPerSecond = totalPixelsToMove / duration;
    return pixelsPerSecond / window.globalRefreshRate;
  }

  function renderBlurredSign() {
    const config = distanceConfigs[currentDistance];

    // Set canvas size to match container
    const rect = blurCanvas.getBoundingClientRect();
    const scale = window.devicePixelRatio || 1;

    // Calculate pixels per frame based on angular velocity and refresh rate
    const pixelsPerSecond = config.getPixelsPerSecond(rect.width);
    const pixelsPerFrame = pixelsPerSecond / window.globalRefreshRate;
    blurCanvas.width = rect.width * scale;
    blurCanvas.height = rect.height * scale;
    blurCanvas.style.width = rect.width + &apos;px&apos;;
    blurCanvas.style.height = rect.height + &apos;px&apos;;

    const ctx = blurCanvas.getContext(&apos;2d&apos;);
    ctx.scale(scale, scale);

    // Clear canvas
    ctx.fillStyle = &apos;white&apos;;
    ctx.fillRect(0, 0, rect.width, rect.height);

    // Since speed is fixed at 180 mph, we always show motion blur

    // For motion blur, accumulate multiple positions
    const samples = Math.max(2, Math.min(64, Math.ceil(pixelsPerFrame)));

    // Create offscreen canvas for accumulation
    const offscreenCanvas = document.createElement(&apos;canvas&apos;);
    offscreenCanvas.width = blurCanvas.width;
    offscreenCanvas.height = blurCanvas.height;
    const offscreenCtx = offscreenCanvas.getContext(&apos;2d&apos;);
    offscreenCtx.scale(scale, scale);

    // Accumulator for pixel data
    const width = blurCanvas.width;
    const height = blurCanvas.height;
    const accumulator = new Float32Array(width * height * 4);

    // Load and render the sign at multiple positions
    const img = new Image();
    img.onload = function() {
      const baseSignWidth = 150;
      const signWidth = baseSignWidth * config.scale;
      const signHeight = signWidth * (img.height / img.width);
      const centerY = rect.height / 2 - signHeight / 2;

      for (let i = 0; i &lt; samples; i++) {
        // Clear offscreen canvas
        offscreenCtx.fillStyle = &apos;white&apos;;
        offscreenCtx.fillRect(0, 0, rect.width, rect.height);

        // Draw sign at this position
        // The eye tracks rightward and upward, so image moves leftward and downward
        const progress = i / (samples - 1);
        const horizontalOffset = (0.5 - progress) * pixelsPerFrame; // Reversed: right to left
        const verticalOffset = (progress - 0.5) * pixelsPerFrame * 0.3; // Down is positive Y

        const x = rect.width / 2 - signWidth / 2 + horizontalOffset;
        const y = centerY + verticalOffset;
        offscreenCtx.drawImage(img, x, y, signWidth, signHeight);

        // Get pixel data
        const imageData = offscreenCtx.getImageData(0, 0, width, height);
        const pixels = imageData.data;

        // Accumulate pixel values
        for (let j = 0; j &lt; pixels.length; j++) {
          accumulator[j] += pixels[j];
        }
      }

      // Create averaged image data
      const averagedImageData = ctx.createImageData(width, height);
      const averagedPixels = averagedImageData.data;

      // Average the accumulated values
      for (let i = 0; i &lt; accumulator.length; i++) {
        averagedPixels[i] = Math.round(accumulator[i] / samples);
      }

      // Put the averaged image back on the main canvas
      ctx.putImageData(averagedImageData, 0, 0);
    };
    img.src = &apos;/images/2025-08-03-highway-sign.png&apos;;
  }

  function updateHighwayAnimation() {
    // Update blur visualization
    renderBlurredSign();

    // Remove animation
    highwaySign.classList.remove(&apos;animating&apos;);

    // Calculate animation duration
    // Convert mph to feet per second: mph * 5280 / 3600
    const speedFps = speedMph * 5280 / 3600;
    const duration = signDistance / speedFps * 1000; // milliseconds

    // Set CSS custom property
    highwaySign.style.setProperty(&apos;--sign-duration&apos;, `${duration}ms`);

    // Restart animation if playing
    void highwaySign.offsetWidth; // Force reflow
    if (window.globalAnimationsPlaying) {
      highwaySign.classList.add(&apos;animating&apos;);
    }

    // Update animated sign visualization
    updateAnimatedSign();
  }

  function updateAnimatedSign() {
    const config = distanceConfigs[currentDistance];

    // Update sign sizes
    highwayAnimatedSign.style.height = (80 * config.scale) + &apos;px&apos;;
    if (highwayClearSign) {
      highwayClearSign.style.height = (80 * config.scale) + &apos;px&apos;;
    }

    // Set oscillation parameters
    const slowdownFactor = window.getSlowdownFactor();
    const oscillateDuration = (1000 / window.globalRefreshRate) * slowdownFactor;

    // Calculate pixels per frame based on angular velocity
    const viewportWidth = window.innerWidth;
    const pixelsPerSecond = config.getPixelsPerSecond(viewportWidth);
    const oscillateDistance = pixelsPerSecond / window.globalRefreshRate;

    highwayAnimatedSign.style.setProperty(&apos;--sign-oscillate-duration&apos;, oscillateDuration + &apos;ms&apos;);
    highwayAnimatedSign.style.setProperty(&apos;--sign-oscillate-distance&apos;, oscillateDistance + &apos;px&apos;);

    // Force animation restart to pick up new duration
    highwayAnimatedSign.classList.remove(&apos;animating&apos;);
    void highwayAnimatedSign.offsetWidth; // Force reflow

    // Update animation state
    if (window.globalAnimationsPlaying) {
      highwayAnimatedSign.classList.add(&apos;animating&apos;);
      if (highwayAnimatedOverlay) {
        highwayAnimatedOverlay.classList.add(&apos;playing&apos;);
      }
    } else {
      if (highwayAnimatedOverlay) {
        highwayAnimatedOverlay.classList.remove(&apos;playing&apos;);
      }
    }
  }

  // Handle play overlay
  const highwayOverlay = document.getElementById(&apos;highwayOverlay&apos;);
  if (highwayOverlay) {
    highwayOverlay.addEventListener(&apos;click&apos;, function(e) {
      e.stopPropagation();
      window.updateAllAnimations(true);
    });
  }

  // Click viewport to pause
  if (highwayViewport) {
    highwayViewport.addEventListener(&apos;click&apos;, function(e) {
      // Only pause if animations are playing and we&apos;re not clicking the play button
      if (window.globalAnimationsPlaying &amp;&amp; !e.target.closest(&apos;.play-button&apos;)) {
        window.updateAllAnimations(false);
      }
    });
  }

  // Handle animated viewport play/pause
  const highwayAnimatedViewport = document.querySelector(&apos;.highway-animated-viewport&apos;);
  if (highwayAnimatedViewport) {
    highwayAnimatedViewport.addEventListener(&apos;click&apos;, function(e) {
      // Toggle play/pause
      if (e.target.closest(&apos;.play-button&apos;) || e.target.closest(&apos;.animation-overlay&apos;)) {
        // If clicking play button, always play
        window.updateAllAnimations(true);
      } else {
        // If clicking elsewhere, toggle
        window.updateAllAnimations(!window.globalAnimationsPlaying);
      }
    });
  }

  // Handle distance toggle buttons
  const distanceButtons = document.querySelectorAll(&apos;.distance-button&apos;);
  distanceButtons.forEach(button =&gt; {
    button.addEventListener(&apos;click&apos;, function() {
      currentDistance = this.dataset.distance;

      // Update active state
      distanceButtons.forEach(btn =&gt; {
        btn.classList.toggle(&apos;active&apos;, btn === this);
      });

      // Update visualizations
      updateHighwayAnimation();
    });
  });

  // Handle FPS toggle buttons
  const highwayContainer = document.querySelector(&apos;.highway-sign-container&apos;);
  const fpsButtons = highwayContainer.querySelectorAll(&apos;.fps-button&apos;);
  fpsButtons.forEach(button =&gt; {
    button.addEventListener(&apos;click&apos;, function() {
      const newFps = parseInt(this.dataset.fps, 10);
      window.globalRefreshRate = newFps;

      // Update active state for all FPS buttons
      document.querySelectorAll(&apos;.fps-button&apos;).forEach(btn =&gt; {
        btn.classList.toggle(&apos;active&apos;, btn.dataset.fps === this.dataset.fps);
      });


      // Update blur visualization
      renderBlurredSign();
      updateAnimatedSign();

      // Update other visualizations
      if (window.updateDiscreteVisualization) {
        window.updateDiscreteVisualization();
      }
      if (window.updateFrameVisualization) {
        window.updateFrameVisualization();
      }

      // Update description text in other figures
      const descFps = document.getElementById(&apos;descriptionFps&apos;);
      const descMs = document.getElementById(&apos;descriptionMs&apos;);
      const descSlowFps = document.getElementById(&apos;descriptionSlowFps&apos;);
      if (descFps) descFps.textContent = newFps + &apos; fps&apos;;
      if (descMs) descMs.textContent = (1000 / newFps).toFixed(1);
      const slowdownFactor = window.getSlowdownFactor();
      if (descSlowFps) descSlowFps.textContent = Math.round(newFps / slowdownFactor);

      // Update all slowdown factors by class
      document.querySelectorAll(&apos;.slowdown-factor&apos;).forEach(el =&gt; {
        el.textContent = slowdownFactor;
      });
      
      // Update number of frames in discrete visualization description
      const descNumFrames = document.getElementById(&apos;descriptionNumFrames&apos;);
      if (descNumFrames) {
        let numFrames;
        switch(newFps) {
          case 30: numFrames = 3; break;
          case 60: numFrames = 6; break;
          case 120: numFrames = 12; break;
          default: numFrames = 12;
        }
        descNumFrames.textContent = numFrames;
      }
    });
  });

  // Expose blur render function globally
  window.renderBlurredSign = renderBlurredSign;

  // Initialize
  updateHighwayAnimation();

  // Ensure animated sign starts paused with overlay visible
  if (highwayAnimatedSign) {
    highwayAnimatedSign.classList.remove(&apos;animating&apos;);
  }
  if (highwayAnimatedOverlay) {
    highwayAnimatedOverlay.classList.remove(&apos;playing&apos;);
  }

  // Update when global animation state changes
  const originalUpdateAll = window.updateAllAnimations;
  window.updateAllAnimations = function(playing) {
    originalUpdateAll(playing);

    // Update overlay
    if (highwayOverlay) {
      highwayOverlay.classList.toggle(&apos;playing&apos;, playing);
    }

    if (playing) {
      highwaySign.classList.add(&apos;animating&apos;);
      if (highwayAnimatedSign) {
        highwayAnimatedSign.classList.add(&apos;animating&apos;);
      }
      if (highwayAnimatedOverlay) {
        highwayAnimatedOverlay.classList.add(&apos;playing&apos;);
      }
    } else {
      highwaySign.classList.remove(&apos;animating&apos;);
      if (highwayAnimatedSign) {
        highwayAnimatedSign.classList.remove(&apos;animating&apos;);
      }
      if (highwayAnimatedOverlay) {
        highwayAnimatedOverlay.classList.remove(&apos;playing&apos;);
      }
    }
  };
})();
&lt;/script&gt;

&lt;p&gt;This might mean that the soldier or police officer of the future will not use VR with video passthrough.&lt;/p&gt;

&lt;p&gt;What about AR glasses? Passthrough VR can create amazing experiences, but it has the above problem. Glasses let the real world pass directly through, then add their own photons.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2025-08-04-meta-orion.jpeg&quot; alt=&quot;AR scene, showing actual food ingredients with AR text overlays on them&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Source: &lt;a href=&quot;https://about.fb.com/news/2024/09/introducing-orion-our-first-true-augmented-reality-glasses/&quot;&gt;Meta&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As you’re working in the kitchen, these text overlays will be blurry. You’ll be forced to pause more often. AR devices have to render motion that otherwise isn’t there, because &lt;em&gt;you&lt;/em&gt; are always moving. A screen in the kitchen can actually be useful, because it sits still. But with AR glasses, everything that “sits still” is actually constantly moving on your AR screen.&lt;/p&gt;
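
&lt;p&gt;To get a rough sense of scale, here is a minimal sketch of how far a world-locked overlay has to shift on the display between frames while your head turns. The function name and the numbers (a 30 degrees-per-second head turn, a display with 40 pixels per degree) are illustrative assumptions, not measurements of any real headset.&lt;/p&gt;

```javascript
// Hypothetical helper: how many display pixels a "stationary" overlay must
// jump between consecutive frames while the head rotates at a steady rate.
function pixelsPerFrame(headDegreesPerSecond, pixelsPerDegree, refreshRateHz) {
  return (headDegreesPerSecond * pixelsPerDegree) / refreshRateHz;
}

// Assumed values: 30 deg/s head turn, 40 px/deg display.
console.log(pixelsPerFrame(30, 40, 120)); // 10
console.log(pixelsPerFrame(30, 40, 60));  // 20
```

&lt;p&gt;Under these assumptions, the overlay jumps 10 pixels per frame at 120 Hz, and twice that at 60 Hz.&lt;/p&gt;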

&lt;p&gt;I wonder what the AR / VR people will come up with. Do their hardware labs have secret rotating screens that move with your eyes? Or will they treat this as a design constraint, and carefully design experiences around the limitations of screens?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Thanks to Rosanne Liu and Charlie Liu Lewis for reading and listening to drafts of this post.)&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Mon, 04 Aug 2025 00:00:00 -0700</pubDate>
        <link>https://probablymarcus.com/blocks/2025/08/04/screens-break-your-smooth-pursuit.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2025/08/04/screens-break-your-smooth-pursuit.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
      <item>
        <title>The web could use machine code</title>
        <description>&lt;style&gt;
article &gt; h1, article &gt; h2, article &gt; h3 {
  margin-top:30px;
}
&lt;/style&gt;

&lt;p&gt;Think of all the client-side code that runs on your devices. Most technical people would say that it falls into two categories:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Native apps, which are written for a specific platform and compiled to machine code.&lt;/li&gt;
  &lt;li&gt;The web, which is written in cross-platform interpreted code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mental model is a misconception. These categories are real, but nothing I mentioned about them is fundamental. What actually distinguishes these categories is whether the code can render a top-level window. Native apps can do this, while the web is confined to a “visually sandboxed” area. All those other details are orthogonal.&lt;/p&gt;

&lt;p&gt;Web-like platforms will remain relevant for the foreseeable future, because we will always need a safe space to run other people’s code. Curated app stores are good for certain classes of code, like apps or games, but they aren’t appropriate for other use cases like scientific visualizations.&lt;/p&gt;

&lt;p&gt;Today, a core feature of the web is that it lets you &lt;em&gt;“write once, run everywhere”&lt;/em&gt;, but this could change. As AI code generation becomes cheaper and better, future web-like platforms could be based on actual machine code, compiled from any language. Each visualization would include compiled code for multiple device platforms. This would free us from the current “lowest common denominator” web, and instead would let visualizations use the full power of the device, which will be essential for making the leap to low-power devices like AR glasses.&lt;/p&gt;

&lt;p&gt;Rather than waiting for the web to change, today we can create new analogs of the &lt;em&gt;web view&lt;/em&gt; that support machine code. I call these &lt;strong&gt;outerframes&lt;/strong&gt;. They could show up in lots of apps, new and old:&lt;/p&gt;

&lt;div style=&quot;display: grid; grid-template-columns: 1fr 1fr; gap: 10px; margin-bottom:12px;&quot;&gt;
  &lt;img src=&quot;/images/2025-06-01-example-jupyter.png&quot; title=&quot;Jupyter, which currently uses web views&quot; alt=&quot;Jupyter, with a box showing that the generated figures would display in this web-like-view&quot; style=&quot;width: 100%;&quot; /&gt;
  &lt;img src=&quot;/images/2025-06-01-example-claude.png&quot; title=&quot;Claude, which currently uses web views&quot; alt=&quot;Claude, with a box showing that the generated figures would display in this web-like-view&quot; style=&quot;width: 100%;&quot; /&gt;
  &lt;img src=&quot;/images/2025-06-01-example-chatgpt.png&quot; title=&quot;ChatGPT, which currently only shows static images&quot; alt=&quot;ChatGPT, with a box showing that the generated figures would display in this web-like-view&quot; style=&quot;width: 100%;&quot; /&gt;
  &lt;img src=&quot;/images/2025-06-01-example-models.png&quot; title=&quot;An actual custom app that I&apos;ve built for machine learning model comparison, using machine code&quot; alt=&quot;An actual custom app that I&apos;ve built for machine learning model comparison, using machine code&quot; style=&quot;width: 100%;&quot; /&gt;
&lt;/div&gt;
&lt;p style=&quot;margin-top:-10px;&quot;&gt;
&lt;em&gt;Four example apps that could use outerframes. The blue dotted line shows which part of the app could be an outerframe. The first two apps are Jupyter and Claude, which use the web. The third is ChatGPT, which is native and uses only static visualizations. The fourth is an actual app I&apos;ve built using outerframes.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;An outerframe is a safe space for running untrusted code within an app. As you would expect, the untrusted code is sandboxed in its own process, unable to access your data. Importantly, the untrusted code is also restricted from taking over the screen, and instead is “visually sandboxed” by the app, similar to how a website is visually sandboxed to a browser frame. Unlike in a conventional web browser, this code can include any compiled machine code, targeting both the CPU and the GPU, with full support for threading. Thus, the outerframe eliminates the “lowest common denominator” overhead of the web.&lt;/p&gt;

&lt;p&gt;I built an outerframe for macOS. To demo it, I built a toy browser that, instead of using HTML and JavaScript, uses a cross-platform binary file format as its “page” and platform-specific machine code as its “script”.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2025-06-08-machine-code-web.png?v2&quot; alt=&quot;&quot; style=&quot;max-width:100%; margin-bottom:-20px;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This app is open source; you can &lt;a href=&quot;https://github.com/outergroup/outerframe&quot;&gt;build and run it yourself&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, I try to quantify the efficiency gains of moving to a machine code web on today’s devices.&lt;/p&gt;

&lt;h2 id=&quot;benchmark&quot;&gt;Benchmark&lt;/h2&gt;

&lt;p&gt;In a previous post, I used a set of pragmatic model visualizations:&lt;/p&gt;

&lt;div style=&quot;text-align: center; margin-bottom: 16px; max-width:100%; overflow: hidden;&quot;&gt;
  &lt;img src=&quot;/images/2025-06-01-figure-screenshot.png&quot; alt=&quot;&quot; style=&quot;max-width: 400px; width: 100%; border:1px solid lightgray; padding:12px; display:block; margin:0 auto; box-sizing: border-box;&quot; /&gt;
  &lt;div style=&quot;margin-top: 10px;&quot;&gt;
    &lt;a href=&quot;/blocks/2024/01/10/expressions-are-pragmatic-visualizations.html&quot;&gt;See old post for full version&lt;/a&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;When I wrote that post, I found that these visualizations were far too expensive to animate, at least when using a standard 2D canvas. Now, I use this task as my benchmark. I plot large distributions of machine learning model parameters at 120 frames per second. (120 Hz is the refresh rate of many modern phone and laptop screens.)&lt;/p&gt;
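
&lt;p&gt;For reference, the time budget this implies per frame can be worked out directly. The helper name below is hypothetical and exists only for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: time available to produce one frame, in milliseconds.
function frameBudgetMs(framesPerSecond) {
  return 1000 / framesPerSecond;
}

console.log(frameBudgetMs(120).toFixed(1)); // 8.3
console.log(frameBudgetMs(60).toFixed(1));  // 16.7
```

&lt;p&gt;At 120 frames per second, all CPU and GPU work for a frame has to fit in roughly 8.3 ms.&lt;/p&gt;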

&lt;p&gt;For the web, I tested using JavaScript and the browser’s GPU shader language, testing both WebGL and WebGPU. For the outerframe, I used a mix of Swift, C, and Metal Shading Language.&lt;/p&gt;

&lt;p&gt;This benchmark plays to the web’s performance strengths in a couple of ways. First, I measure the visualization’s render loop, not its initial loading and display, where I think the outerframe will have a larger advantage. Second, this task is GPU-heavy, so a lot of the work is being done by compiled shader code, whereas in CPU-heavy tasks I think the outerframe would again have a larger advantage. For these reasons, I think it’s okay that I relied on JavaScript and didn’t test WebAssembly (Wasm), which would be more important for benchmarks of initialization or of CPU-heavy tasks.&lt;/p&gt;

&lt;p&gt;I measure CPU utilization, GPU utilization, and macOS Activity Monitor’s “Energy Impact”. I use this Energy Impact as the measure of efficiency. (CPU utilization is a very noisy measure of efficiency, because “100% utilization” only sometimes means high power consumption. For example, when coordinating with the GPU, a CPU will often run a special “spin loop” that uses little energy but still uses the CPU at 100% for a non-negligible fraction of every frame.)&lt;/p&gt;

&lt;h2 id=&quot;result-1-machine-code-can-be-more-web-like-than-the-web&quot;&gt;Result 1: Machine code can be more web-like than the web&lt;/h2&gt;

&lt;p&gt;The web is built on &lt;em&gt;documents&lt;/em&gt;, and I first approached this visualization as a document. I implemented it using text with inline canvas elements. In the outerframe with machine code, this worked great.&lt;/p&gt;

&lt;video src=&quot;/images/2025-06-08-result-1.mp4&quot; poster=&quot;/images/2025-06-08-result-1-poster.png&quot; controls=&quot;&quot; loop=&quot;&quot; playsinline=&quot;&quot; style=&quot;width: 100%;&quot;&gt;&lt;/video&gt;

&lt;p&gt;In a conventional web browser, it was terribly inefficient.&lt;/p&gt;

&lt;style&gt;

.warning-message {
  margin-top: 10px;
  font-size: 14px;
  color: #cc6600;
}
&lt;/style&gt;

&lt;div style=&quot;margin: 20px 0;&quot;&gt;
  &lt;div style=&quot;display: grid; grid-template-columns: 1fr 1fr; gap: 20px;&quot;&gt;
    &lt;div style=&quot;border: 1px solid #ccc; border-radius: 5px; padding: 15px; text-align: center;&quot;&gt;
      &lt;h4 style=&quot;margin-top: 0; margin-bottom: 10px;&quot;&gt;WebGL&lt;/h4&gt;
      &lt;a href=&quot;/stuff/ml-params-at-120hz/ScalarDistributionList-WebGL.html&quot; target=&quot;_blank&quot; style=&quot;color: #0066cc; text-decoration: none; font-size: 16px;&quot;&gt;
        View in browser &lt;span style=&quot;font-size: 12px;&quot;&gt;↗&lt;/span&gt;
      &lt;/a&gt;
      &lt;div class=&quot;frame-rate-message warning-message&quot; style=&quot;display: none;&quot;&gt;&lt;/div&gt;
    &lt;/div&gt;
    &lt;div style=&quot;border: 1px solid #ccc; border-radius: 5px; padding: 15px; text-align: center;&quot;&gt;
      &lt;h4 style=&quot;margin-top: 0; margin-bottom: 10px;&quot;&gt;WebGPU&lt;/h4&gt;
      &lt;a href=&quot;/stuff/ml-params-at-120hz/ScalarDistributionList-WebGPU.html&quot; target=&quot;_blank&quot; style=&quot;color: #0066cc; text-decoration: none; font-size: 16px;&quot;&gt;
        View in browser &lt;span style=&quot;font-size: 12px;&quot;&gt;↗&lt;/span&gt;
      &lt;/a&gt;
      &lt;div class=&quot;frame-rate-message warning-message&quot; style=&quot;display: none;&quot;&gt;&lt;/div&gt;
      &lt;div class=&quot;webgpu-status warning-message&quot;&gt;&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div style=&quot;margin: 20px 0;&quot;&gt;
  &lt;table style=&quot;border-collapse: collapse; width: 100%; border: 1px solid #ccc; font-size: 14px;&quot;&gt;
    &lt;thead&gt;
      &lt;tr style=&quot;background-color: #f5f5f5;&quot;&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: left;&quot;&gt;Platform&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; width: 80px;&quot;&gt;Frames per second&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;CPU&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;GPU&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;Energy Impact&lt;br /&gt;(relative)&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Firefox (WebGL)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; color:red;&quot;&gt;5&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;67%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;10%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;1550&lt;br /&gt;(86x)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Chrome (WebGL)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; color:red;&quot;&gt;61&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;101%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;37%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;1610&lt;br /&gt;(90x)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Safari (WebGL)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;76%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;56%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;1005&lt;br /&gt;(56x)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Safari (WebGPU)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;55%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;16%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;710&lt;br /&gt;(39x)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Chrome (WebGPU)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;55%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;20%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;435&lt;br /&gt;(24x)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;outerframe (macOS)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;16%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;14%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #d4edda; border: 2px solid #28a745;&quot;&gt;18&lt;br /&gt;(baseline)&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;Even in the best case (Chrome with WebGPU), rendering on the web had roughly 24x the Energy Impact of rendering with machine code.&lt;/p&gt;
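
&lt;p&gt;The best-case figure can be checked against the table above. This sketch copies the Energy Impact readings from the table; the variable and function names are made up for illustration.&lt;/p&gt;

```javascript
// Energy Impact readings for the web versions, copied from the table above.
const webEnergyImpact = {
  'Firefox (WebGL)': 1550,
  'Chrome (WebGL)': 1610,
  'Safari (WebGL)': 1005,
  'Safari (WebGPU)': 710,
  'Chrome (WebGPU)': 435,
};
const outerframeEnergyImpact = 18; // the baseline row

// Pick the web entry with the lowest Energy Impact.
function bestCase(readings) {
  let best = null;
  for (const [platform, value] of Object.entries(readings)) {
    if (best === null || best.value > value) {
      best = { platform, value };
    }
  }
  return best;
}

const best = bestCase(webEnergyImpact);
const ratio = best.value / outerframeEnergyImpact;
console.log(best.platform, ratio.toFixed(1) + 'x'); // Chrome (WebGPU) 24.2x
```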

&lt;p&gt;The issue here is that WebGL and WebGPU are not designed to be used as figures within a document. When you use these, you are supposed to use very few canvases, maybe one or two. This is particularly true in WebGL, where using more than 16 WebGL canvases isn’t even possible, but using many canvases remains very expensive with WebGL’s successor, WebGPU. Meanwhile, in an outerframe I embrace the native platform, and macOS will happily let you draw to as many canvases (a.k.a. CALayers) as you want.&lt;/p&gt;

&lt;p&gt;So my first result: building this visualization for the web requires moving away from the natural approach and instead using tricks like treating the canvas as a background behind the text and carefully drawing to the correct locations. To be fair, when building native document apps, this can be a performant way of doing things, and I also see some efficiency wins when I adopt this approach in an outerframe. But it was striking how machine code and the native platform allowed me to build something in the way that felt natural and web-like, whereas the web did not.&lt;/p&gt;
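
&lt;p&gt;The bookkeeping behind the “canvas as a background” trick can be sketched as follows. This is a minimal illustration, not the code from the benchmark pages: the function name is hypothetical, and it assumes a single WebGL canvas that sits behind the text, with each figure’s on-page rectangle converted into a viewport rectangle on that canvas.&lt;/p&gt;

```javascript
// Hypothetical helper: convert a figure's on-page rectangle (CSS pixels,
// y grows downward) into a viewport rectangle on the shared canvas
// (device pixels, y measured upward from the canvas's bottom edge, as in WebGL).
function figureToViewport(figureRect, canvasRect, devicePixelRatio) {
  const left = figureRect.left - canvasRect.left;
  const top = figureRect.top - canvasRect.top;
  const bottom = canvasRect.height - (top + figureRect.height);
  return {
    x: Math.round(left * devicePixelRatio),
    y: Math.round(bottom * devicePixelRatio),
    width: Math.round(figureRect.width * devicePixelRatio),
    height: Math.round(figureRect.height * devicePixelRatio),
  };
}

// Plain objects stand in for getBoundingClientRect() results here.
const canvasRect = { left: 0, top: 0, width: 800, height: 600 };
const figureRect = { left: 100, top: 50, width: 200, height: 40 };
const viewport = figureToViewport(figureRect, canvasRect, 2);
console.log(viewport); // { x: 200, y: 1020, width: 400, height: 80 }
```

&lt;p&gt;In a real page, the two rectangles would come from &lt;code&gt;getBoundingClientRect()&lt;/code&gt;, and the result would typically be passed to &lt;code&gt;gl.viewport&lt;/code&gt; and &lt;code&gt;gl.scissor&lt;/code&gt; (with the scissor test enabled) before drawing each figure.&lt;/p&gt;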

&lt;h2 id=&quot;result-2-machine-code-was-about-6x-more-efficient&quot;&gt;Result 2: Machine code was about 6x more efficient&lt;/h2&gt;

&lt;p&gt;I changed to a single-canvas approach, and the web was finally able to handle it.&lt;/p&gt;

&lt;div style=&quot;margin: 20px 0;&quot;&gt;
  &lt;div style=&quot;display: grid; grid-template-columns: 1fr 1fr; gap: 20px;&quot;&gt;
    &lt;div style=&quot;border: 1px solid #ccc; border-radius: 5px; padding: 15px; text-align: center;&quot;&gt;
      &lt;h4 style=&quot;margin-top: 0; margin-bottom: 10px;&quot;&gt;WebGL&lt;/h4&gt;
      &lt;a href=&quot;/stuff/ml-params-at-120hz/ScalarDistributionList-WebGL-SingleCanvas.html&quot; target=&quot;_blank&quot; style=&quot;color: #0066cc; text-decoration: none; font-size: 16px;&quot;&gt;
        View in browser &lt;span style=&quot;font-size: 12px;&quot;&gt;↗&lt;/span&gt;
      &lt;/a&gt;
      &lt;div class=&quot;frame-rate-message warning-message&quot; style=&quot;display: none;&quot;&gt;&lt;/div&gt;
    &lt;/div&gt;
    &lt;div style=&quot;border: 1px solid #ccc; border-radius: 5px; padding: 15px; text-align: center;&quot;&gt;
      &lt;h4 style=&quot;margin-top: 0; margin-bottom: 10px;&quot;&gt;WebGPU&lt;/h4&gt;
      &lt;a href=&quot;/stuff/ml-params-at-120hz/ScalarDistributionList-WebGPU-SingleCanvas.html&quot; target=&quot;_blank&quot; style=&quot;color: #0066cc; text-decoration: none; font-size: 16px;&quot;&gt;
        View in browser &lt;span style=&quot;font-size: 12px;&quot;&gt;↗&lt;/span&gt;
      &lt;/a&gt;
      &lt;div class=&quot;frame-rate-message warning-message&quot; style=&quot;display: none;&quot;&gt;&lt;/div&gt;
      &lt;div class=&quot;webgpu-status warning-message&quot;&gt;&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div style=&quot;margin: 20px 0;&quot;&gt;
  &lt;table style=&quot;border-collapse: collapse; width: 100%; border: 1px solid #ccc; font-size: 14px;&quot;&gt;
    &lt;thead&gt;
      &lt;tr style=&quot;background-color: #f5f5f5;&quot;&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: left;&quot;&gt;Platform&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;  width: 80px;&quot;&gt;Frames per second&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;CPU&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;GPU&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;Energy Impact&lt;br /&gt;(relative)&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Safari (WebGL)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;55%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;18%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;63&lt;br /&gt;8x&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Firefox (WebGL)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;44%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;15%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;53&lt;br /&gt;7x&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Chrome (WebGL)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;52%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;22%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;57&lt;br /&gt;7x&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Safari (WebGPU)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;35%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;13%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;49&lt;br /&gt;6x&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;Chrome (WebGPU)&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;51%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;13%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;47&lt;br /&gt;6x&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;outerframe&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;120&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;13%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;13%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #d4edda; border: 2px solid #28a745;&quot;&gt;8&lt;br /&gt;(baseline)&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;With this approach, the outerframe was 6x more efficient than the web. The outerframe version improved on its Result 1 numbers because rendering one canvas saves us from lots of copying, and macOS likes it when you draw to one CALayer, rather than drawing to 20 CALayers.&lt;/p&gt;

&lt;p&gt;I suspect Results 1 and 2 are a representative story. The story is: there’s a set of scenarios that the web does a pretty good job of supporting, and sometimes you need to tweak your design to fit within that set of scenarios. But even then, it’s about 6x less efficient. (With error bars, of course. The actual number might be 3x, or it might be 10x.)&lt;/p&gt;

&lt;h2 id=&quot;result-3-the-operating-systems-native-text-view-is-not-good-at-fast-dynamic-text-updates&quot;&gt;Result 3: The operating system’s native text view is not good at fast dynamic text updates&lt;/h2&gt;

&lt;p&gt;From that same previous blog post, here’s a variant of the visualization which plots parameters in one dimension:&lt;/p&gt;

&lt;div style=&quot;text-align: center; margin-bottom: 16px; max-width:100%; overflow: hidden;&quot;&gt;
  &lt;img src=&quot;/images/2025-06-01-figure-screenshot2.png&quot; alt=&quot;&quot; style=&quot;max-width: 400px; width: 100%; border:1px solid lightgray; padding:12px; display:block; margin:0 auto; box-sizing: border-box;&quot; /&gt;
  &lt;div style=&quot;margin-top: 10px;&quot;&gt;
    &lt;a href=&quot;/blocks/2024/01/10/expressions-are-pragmatic-visualizations.html&quot;&gt;See old post for full version&lt;/a&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;In that visualization, the min / max number labels are actual text in the page, not text rendered into the canvas. When I implemented this visualization in an outerframe, I used the operating system’s &lt;a href=&quot;https://developer.apple.com/documentation/appkit/nstextview&quot;&gt;native text view&lt;/a&gt; and I &lt;a href=&quot;https://developer.apple.com/documentation/foundation/nsmutableattributedstring/beginediting()&quot;&gt;batch-updated&lt;/a&gt; each subrange of the text in the text view, rather than rendering it myself. I found something disappointing: the performance was abysmal. In fact, even just &lt;em&gt;this&lt;/em&gt; is terrible:&lt;/p&gt;

&lt;video src=&quot;/images/2025-06-08-numbers.mp4&quot; poster=&quot;/images/2025-06-08-numbers-poster.png&quot; controls=&quot;&quot; loop=&quot;&quot; playsinline=&quot;&quot; style=&quot;width: 100%;&quot;&gt;&lt;/video&gt;

&lt;div style=&quot;margin: 0 0 20px 0;&quot;&gt;
  &lt;table style=&quot;border-collapse: collapse; width: 100%; border: 1px solid #ccc; font-size: 14px;&quot;&gt;
    &lt;thead&gt;
      &lt;tr style=&quot;background-color: #f5f5f5;&quot;&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: left;&quot;&gt;Platform&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;CPU utilization&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;GPU utilization&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;Energy Impact&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr style=&quot;background-color: #f9f9f9;&quot;&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;outerframe&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;60%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;0%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #fff3cd; border: 2px solid #856404;&quot;&gt;1550&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;Not surprisingly, the operating system’s text view was not designed for dynamically updating text many times per second. I’m sure this is one reason Safari’s WebKit has a different text stack.&lt;/p&gt;

&lt;p&gt;So, I moved the text into the canvas, rendering it myself. With this change, the text view’s text is now static, with all changes occurring inside of the canvas. Now it’s much better:&lt;/p&gt;

&lt;video src=&quot;/images/2025-06-08-result-3.mp4&quot; poster=&quot;/images/2025-06-08-result-3-poster.png&quot; style=&quot;max-width:100%;&quot; controls=&quot;&quot; loop=&quot;&quot; playsinline=&quot;&quot;&gt;&lt;/video&gt;

&lt;div style=&quot;margin: 20px 0;&quot;&gt;
  &lt;table style=&quot;border-collapse: collapse; width: 100%; border: 1px solid #ccc; font-size: 14px;&quot;&gt;
    &lt;thead&gt;
      &lt;tr style=&quot;background-color: #f5f5f5;&quot;&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: left;&quot;&gt;Platform&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;CPU utilization&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;GPU utilization&lt;/th&gt;
        &lt;th style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;Energy Impact&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr style=&quot;background-color: #f9f9f9;&quot;&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px;&quot;&gt;outerframe&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;21%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center;&quot;&gt;5%&lt;/td&gt;
        &lt;td style=&quot;border: 1px solid #ccc; padding: 12px; text-align: center; background-color: #d4edda; border: 2px solid #28a745;&quot;&gt;17&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/div&gt;

&lt;p&gt;Takeaway: If you’re going to create an outerframe, you probably shouldn’t base it on the operating system’s native text view.&lt;/p&gt;

&lt;h2 id=&quot;how-outerframes-work&quot;&gt;How outerframes work&lt;/h2&gt;

&lt;p&gt;The point of the outerframe (and the point of the web) is to enable playful sharing. To support this, outerframes must perform “visual sandboxing”. Without that, a rogue actor could take over your whole screen, show you a pixel-perfect replica of any app, and trick you into doing anything.&lt;/p&gt;

&lt;p&gt;There is no single correct way to implement an outerframe. There are two natural ways of building it:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Create a viewer for a data structure. Let the untrusted code modify that data structure, while your trusted viewer displays it.&lt;/li&gt;
  &lt;li&gt;Let the untrusted code write directly to a framebuffer / canvas / graphics context that the parent app controls.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The web uses both of these approaches, using the DOM and canvas elements, respectively.&lt;/p&gt;

&lt;p&gt;This blog post’s &lt;a href=&quot;https://github.com/outergroup/outerframe&quot;&gt;demo outerframe&lt;/a&gt; is very web-like, in that it offers untrusted code analogous ways to display stuff to the user:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Create and update a rich text “attributed string” document, which the app displays.&lt;/li&gt;
  &lt;li&gt;Render to a “canvas” that is inline in the document, editing pixels directly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Using this trusted document viewer approach, I got basic functionality like text selection for free. I hosted it in Apple’s native text view, using text attachments to host the canvases. You could imagine hosting it in another text stack, e.g. Zed’s GPUI, or a custom stack.&lt;/p&gt;

&lt;p&gt;Since the native text stack is inefficient at dynamic text updates, I might simplify the interface and make the outerframe hold nothing but one big canvas. For outerframe content that wants to be document-like, it can implement the document affordances itself, via the canvas. This means implementing basic features like text selection, but it might not be a crazy idea, especially when you consider that it could become a reusable component, analogous to how JavaScript libraries are reused today.&lt;/p&gt;

&lt;p&gt;Operating system vendors could choose to provide their own frameworks for hosting “visually sandboxed” UI in background processes. That would be great.&lt;/p&gt;

&lt;p&gt;Classically, browsers allowed machine code plugins to write to canvas-like elements using a scheme called NPAPI. Browsers moved away from that, with the Chrome team describing it as a &lt;a href=&quot;https://blog.chromium.org/2013/09/saying-goodbye-to-our-old-friend-npapi.html&quot;&gt;“90s-era architecture”&lt;/a&gt;. I like that. Now that we have LLM code generation, I think a few 90s-era architectures are worth trying again. This time around, from the user’s perspective it wouldn’t be a plugin model, but would be much lower friction, like JavaScript.&lt;/p&gt;

&lt;p&gt;The whole point here is to lean in and embrace the machine, so I’m not trying to provide any standard interface. The outerframe is the opposite of a platform; I’m trying to remove layers, not add them. The platform is the platform.&lt;/p&gt;

&lt;h2 id=&quot;how-would-an-outerdoc-work&quot;&gt;How would an “outerdoc” work?&lt;/h2&gt;

&lt;p&gt;The outerframe solves the problem of running untrusted code on a particular platform. We still need a good shareable document file format. How can we write documents, presentations, blog posts, etc., that can be viewed from any device, while using the full power of each device?&lt;/p&gt;

&lt;p&gt;I designed a format that enables this, but I’m still working on it. Two useful tricks:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Let documents provide preference-ordered implementations. This would be similar to how the HTML &amp;lt;video&amp;gt; element lets you provide preference-ordered &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/source&quot;&gt;&amp;lt;source&amp;gt;&lt;/a&gt; tags. Documents would provide a list of platform-implementation pairs, and they would typically include an HTML/JavaScript implementation as a fallback.&lt;/li&gt;
  &lt;li&gt;Provide “content types” like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;com.probablymarcus.ScalarDistributionList&lt;/code&gt;, letting viewer apps provide implementations for any content types that they want, either directly or via extensions. This would let apps provide first-party “trusted” implementations of certain content types, which would be useful if you want to bridge to the operating system’s native UI controls (e.g. NSTableView on macOS), and it could enable native apps to render a limited set of outerframe visualizations on platforms like iOS that don’t currently allow unvetted machine code.&lt;/li&gt;
&lt;/ol&gt;
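
&lt;p&gt;To make the first trick concrete, here is a minimal sketch of how a viewer might walk a preference-ordered list. It is not the actual outerdoc format, which is still in progress: the manifest shape, the platform names, the file names, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pickImplementation&lt;/code&gt; helper are all hypothetical.&lt;/p&gt;

```javascript
// Hypothetical document manifest: implementations listed in the author's
// preferred order, ending with an HTML/JavaScript fallback.
const implementations = [
  { platform: "macos-arm64", url: "viz.macos-arm64.bundle" },
  { platform: "linux-x86_64", url: "viz.linux-x86_64.bundle" },
  { platform: "html", url: "viz.html" },
];

// Return the first implementation the viewer supports, or null if none match.
// The document's order wins, not the order of the viewer's platform list.
function pickImplementation(implementations, supportedPlatforms) {
  const supported = new Set(supportedPlatforms);
  return implementations.find((impl) => supported.has(impl.platform)) ?? null;
}
```

&lt;p&gt;A viewer that can only render web content would pass a list containing just its HTML platform and land on the fallback, much like a browser skipping &amp;lt;source&amp;gt; entries it can’t play.&lt;/p&gt;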

&lt;h2 id=&quot;rebalancing-trade-offs&quot;&gt;Rebalancing trade-offs&lt;/h2&gt;

&lt;p&gt;With AI code generation, I think table stakes are changing. The “optimal way to build something” is always a function of development cost, and we’re experiencing a step change downward in development cost. The “write once, run everywhere” web made sense in the previous era, but now it seems extravagant.&lt;/p&gt;

&lt;p&gt;Chip designers and chip manufacturers are pushing against the laws of physics to eke out every last bit of efficiency, and there’s a wave of new low-power devices that are within a couple years of being viable. A machine code web would give us an immediate ~6x software efficiency boost, and that’s on today’s hardware that has compromised its design to be good at JavaScript. Larger gains would come when hardware designers are freed from this constraint. I doubt we’re going to leave all of these gains on the table.&lt;/p&gt;

&lt;p&gt;There is a good path forward for a machine code web. Today, anyone can design their own outerframe and use it in their apps. If you do this, you won’t be able to publish your app in most app stores, since they don’t allow untrusted machine code. But that’s okay! On those platforms, just fall back to a conventional web view. The experience won’t be as good, but that’s the platform’s choice, and they will have to compete with other platforms that do have outerframes. They’ll come around eventually.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Thanks to Rosanne Liu, Jason Yosinski, Adam Zethraeus, and Mirko Klukas for reading drafts of this post.)&lt;/em&gt;&lt;/p&gt;

&lt;script&gt;
(function() {
  let frameCount = 0;
  let startTime = performance.now();
  let measuring = true;
  let firstCheckDone = false;

  function updateFrameRateMessage(roundedFps) {
    const messageElements = document.getElementsByClassName(&apos;frame-rate-message&apos;);
    for (let i = 0; i &lt; messageElements.length; i++) {
      messageElements[i].style.display = &apos;block&apos;;
      messageElements[i].textContent = &apos;Your browser runs JavaScript at &apos; + roundedFps + &apos; FPS&apos;;
    }
  }

  function countFrames() {
    if (!measuring) return;

    frameCount++;
    const elapsed = performance.now() - startTime;

    // First check at 1000ms
    if (!firstCheckDone &amp;&amp; elapsed &gt;= 1000) {
      const fps = Math.round(frameCount * 1000 / elapsed);
      const roundedFps = Math.round(fps / 10) * 10;

      if (roundedFps &gt;= 100) {
        // Don&apos;t show the message for 120hz
        const messageElements = document.getElementsByClassName(&apos;frame-rate-message&apos;);
        for (let i = 0; i &lt; messageElements.length; i++) {
          messageElements[i].style.display = &apos;none&apos;;
        }
        measuring = false;
        return;
      } else {
        // Show warning immediately for lower frame rates
        updateFrameRateMessage(roundedFps);
      }
      firstCheckDone = true;
    }

    // Final check at 5000ms
    if (elapsed &gt;= 5000) {
      const fps = Math.round(frameCount * 1000 / elapsed);
      const roundedFps = Math.round(fps / 10) * 10;

      // Update the message with final measurement
      updateFrameRateMessage(roundedFps);
      measuring = false;
    } else {
      requestAnimationFrame(countFrames);
    }
  }

  function startMeasuring() {
    frameCount = 0;
    startTime = performance.now();
    measuring = true;
    firstCheckDone = false;
    requestAnimationFrame(countFrames);

    if (!navigator.gpu) {
      const webgpuElements = document.getElementsByClassName(&apos;webgpu-status&apos;);
      for (let i = 0; i &lt; webgpuElements.length; i++) {
        webgpuElements[i].innerHTML = &apos;Your browser doesn\&apos;t have WebGPU enabled&apos;;
      }
    }
  }

  if (document.readyState === &apos;loading&apos;) {
    document.addEventListener(&apos;DOMContentLoaded&apos;, startMeasuring);
  } else {
    startMeasuring();
  }
})();
&lt;/script&gt;
</description>
        <pubDate>Sun, 08 Jun 2025 00:00:00 -0700</pubDate>
        <link>https://probablymarcus.com/blocks/2025/06/08/the-web-could-use-machine-code.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2025/06/08/the-web-could-use-machine-code.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
      <item>
        <title>Extended material for &apos;Expressions are Pragmatic Model Visualizations&apos;</title>
        <description>&lt;style&gt;
article &gt; h1, article &gt; h2, article &gt; h3 {
  margin-top:30px;
}
.string {
color: brown;
}
.comment {
color: green;
}
.r2p-code {
color: gray;
}
circle.point {
fill: blue;
opacity: 0.2;
}

pre.vexpr {
  padding: 0;
  border: 0;
  background-color: initial;
}
&lt;/style&gt;

&lt;script src=&quot;/stuff/2023-12-13-extra-bundle.js?v5&quot; defer=&quot;&quot;&gt;&lt;/script&gt;

&lt;p&gt;This is the extended material for &lt;a href=&quot;/blocks/2024/01/10/expressions-are-pragmatic-visualizations.html&quot;&gt;Expressions are Pragmatic Model Visualizations&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;followup-on-example-1&quot;&gt;Followup on Example 1&lt;/h2&gt;

&lt;p&gt;Let’s further loosen the prior on the parameters.&lt;/p&gt;

&lt;div id=&quot;5ffe2e7c-954e-11ee-8fc1-daa4de339950&quot; style=&quot;padding: 10px; border: 1px solid black; border-radius: 5px; margin-bottom:30px; font-weight:100; position:relative;&quot;&gt;
&lt;h2 style=&quot;text-transform: uppercase;&quot;&gt;Model Visualization&lt;/h2&gt;
&lt;p&gt;Gaussian Process with the following kernel.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mean:&lt;/strong&gt; constant&lt;/p&gt;
&lt;pre data-key=&quot;mean&quot; class=&quot;mean vexpr&quot; style=&quot;padding-bottom:5px&quot;&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Covariance:&lt;/strong&gt; Use distance between points as follows:&lt;/p&gt;
&lt;pre id=&quot;cov-kernel2&quot; class=&quot;vexpr&quot; style=&quot;overflow: hidden;&quot;&gt;
&lt;div id=&quot;cov-kernel-content2&quot; class=&quot;kernel r2p-code&quot; style=&quot;transform: scale(0.48); transform-origin: top left; height:48%;&quot;&gt;&lt;span class=&quot;scale&quot; data-key=&quot;0&quot;&gt;&lt;/span&gt; * sum([
   
   &lt;span class=&quot;comment&quot;&gt;# Kernel: Factorized scalar vs choice parameters&lt;/span&gt;
  &lt;span class=&quot;mixing_weight&quot; data-key=&quot;1&quot;&gt;&lt;/span&gt; * sum([
     
     &lt;span class=&quot;comment&quot;&gt;# Scalar parameters&lt;/span&gt;
    &lt;span class=&quot;mixing_weight&quot; data-key=&quot;2&quot;&gt;&lt;/span&gt; * matern_25(
      norm_l2([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_epochs&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;3&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_batch_size&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;4&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;5&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;6&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;7&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;8&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;9&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_initial_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;10&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_final_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;11&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_pct_warmup&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;12&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_max_lr&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;13&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;14&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;15&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;16&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;17&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_beta2_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;18&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;19&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;20&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;21&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_units&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;22&quot;&gt;&lt;/span&gt;])),
     
     &lt;span class=&quot;comment&quot;&gt;# Choice parameters&lt;/span&gt;
    &lt;span class=&quot;mixing_weight&quot; data-key=&quot;23&quot;&gt;&lt;/span&gt; * exp(
      -norm_l1([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot0&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;24&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot1&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;25&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot2&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;26&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot3&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;27&quot;&gt;&lt;/span&gt;]))]),
   
   &lt;span class=&quot;comment&quot;&gt;# Kernel: Joint scalar and choice parameters&lt;/span&gt;
  &lt;span class=&quot;mixing_weight&quot; data-key=&quot;28&quot;&gt;&lt;/span&gt; * prod([
     matern_25(
      norm_l2([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_epochs&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;29&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_batch_size&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;30&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;31&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;32&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;33&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;34&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;35&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_initial_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;36&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_final_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;37&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_pct_warmup&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;38&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_max_lr&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;39&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;40&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;41&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;42&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;43&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_beta2_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;44&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;45&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;46&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;47&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_units&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;48&quot;&gt;&lt;/span&gt;])),
     exp(
      -norm_l1([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot0&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;49&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot1&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;50&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot2&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;51&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot3&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;52&quot;&gt;&lt;/span&gt;]))])])&lt;/div&gt;
&lt;/pre&gt;
&lt;p&gt;When comparing a point to itself, add noise value: &lt;em&gt;(log scale)&lt;/em&gt;&lt;/p&gt;
&lt;pre data-key=&quot;noise&quot; class=&quot;noise vexpr&quot; style=&quot;padding-bottom:5px; margin-bottom:0;&quot;&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p style=&quot;margin-top:-20px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Visualization 3:&lt;/strong&gt; Model parameters now that the priors on the &quot;lengthscales&quot; have been loosened even further.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;Compared to Visualization 2 in the original post, the parameters are more free. The discrete change in behavior as the dataset grows has been further reduced. Parameters are now reaching higher absolute values than before.&lt;/p&gt;

&lt;p&gt;How were results impacted?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-12-04-even-looser.svg&quot; alt=&quot;Cross-validation results with looser priors&quot; /&gt;&lt;/p&gt;
&lt;p style=&quot;margin-top:-20px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Chart 1:&lt;/strong&gt; Hold-one-out cross-validation results, now adding a third configuration.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;There is a significant benefit for very small datasets. However, for most dataset sizes, this leads to worse results. It seems that having priors is important, especially as the dataset gets larger.&lt;/p&gt;

&lt;h2 id=&quot;followup-on-example-2&quot;&gt;Followup on Example 2&lt;/h2&gt;

&lt;p&gt;Here’s a plot of the training run, using batch training again. This time I overlay the mean of all of the models’ losses.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-12-13-parallel-training.svg&quot; alt=&quot;Schematic&quot; /&gt;
&lt;em&gt;&lt;strong&gt;Chart 2:&lt;/strong&gt; Each of the 60 models’ negative losses after each step of training.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We find something even more interesting. Yes, the vast majority of the models converge hundreds of steps before the final model converges. But we also see that individual models often get &lt;em&gt;worse&lt;/em&gt; on single training steps; only the average of all scores is monotonically improving. This may seem normal if you are accustomed to optimization in neural networks, but BoTorch’s optimization (&lt;a href=&quot;https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html#scipy.optimize.fmin_l_bfgs_b&quot;&gt;scipy.optimize with L-BFGS-B&lt;/a&gt;) is only ever supposed to take a step if it improves the model.&lt;/p&gt;

&lt;p&gt;BoTorch trains multiple models in parallel by &lt;a href=&quot;https://github.com/pytorch/botorch/blob/8f1df5a169f0ff559b232a264aaff7fd3d236d64/botorch/optim/closures/core.py#L65&quot;&gt;adding all of their loss functions&lt;/a&gt; to create a single scalar loss. This type of composability is a valid strategy with optimizers like SGD or Adam, since those follow the gradient wherever it leads. But other optimizers, even if they are gradient-based, decide whether to accept a parameter update depending on the loss at the destination point. For these optimizers, you can’t simply add loss functions, or you will get a change in the optimizer’s behavior. After adding losses, only the &lt;em&gt;sum&lt;/em&gt; of those losses is guaranteed to improve monotonically, while updates can harm the individual losses. For some model types like neural networks, we might describe this as “regularization” and treat it as a &lt;em&gt;good&lt;/em&gt; thing, but in those cases we should just use a different optimizer.&lt;/p&gt;
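
&lt;p&gt;Here is a minimal sketch of that effect. It is a toy and not L-BFGS-B: two hypothetical one-parameter models take a plain gradient step, and the step is kept only if the &lt;em&gt;sum&lt;/em&gt; of their losses drops. The loss functions, starting points, and step size are all made up for illustration.&lt;/p&gt;

```python
# Toy illustration only: plain gradient steps with an accept-if-the-sum-improves
# rule. This is NOT L-BFGS-B; the losses and numbers are hypothetical.

def loss_a(a):
    return 50.0 * a * a   # steep bowl, minimum at a = 0

def loss_b(b):
    return 0.5 * b * b    # shallow bowl, minimum at b = 0

def summed_step(a, b, step_size):
    """Take one gradient step on loss_a + loss_b; keep it only if the sum drops."""
    new_a = a - step_size * (100.0 * a)   # d(loss_a)/da = 100 * a
    new_b = b - step_size * b             # d(loss_b)/db = b
    accepted = loss_a(a) + loss_b(b) > loss_a(new_a) + loss_b(new_b)
    if accepted:
        return new_a, new_b, True
    return a, b, False

a0, b0 = 0.1, 10.0
a1, b1, accepted = summed_step(a0, b0, step_size=0.03)

print("step accepted:", accepted)
print("sum of losses:", loss_a(a0) + loss_b(b0), "to", loss_a(a1) + loss_b(b1))
print("model A loss: ", loss_a(a0), "to", loss_a(a1))
print("model B loss: ", loss_b(b0), "to", loss_b(b1))
```

&lt;p&gt;The step is accepted because the sum falls from about 50.5 to about 49.0, yet model A overshoots its minimum and its own loss rises from about 0.5 to about 2.0. Optimized alone under the same accept rule, model A would have rejected that step.&lt;/p&gt;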

&lt;p&gt;So, not only are we evaluating converged models unnecessarily; we are also taking indirect paths to the optimum. When I count model evaluations, 18,817 total evaluations happen when training sequentially, while 92,820 happen when training in parallel, so &lt;strong&gt;we are doing approximately 5 times too many operations&lt;/strong&gt;. In my experiments, on GPUs it is still worth training in batch rather than sequentially, but on CPUs it is better to simply loop over models and optimize them independently.&lt;/p&gt;
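
&lt;p&gt;The arithmetic behind that ratio can be sketched as follows. The two evaluation counts and the 60 models come from this post. The &lt;code&gt;count_evaluations&lt;/code&gt; helper and its step counts are hypothetical, and the helper assumes that every batch step evaluates every model until the slowest one converges. It only models the wasted evaluations of converged models, not the longer optimization paths.&lt;/p&gt;

```python
# Counts reported in the post.
sequential_evals = 18_817
batch_evals = 92_820
num_models = 60

ratio = batch_evals / sequential_evals            # about 4.93
avg_sequential = sequential_evals / num_models    # about 314 evaluations per model
batch_rounds = batch_evals / num_models           # 1547.0 if every round touches all models

def count_evaluations(steps_to_converge):
    """Hypothetical model: compare independent training with lockstep batch training."""
    sequential = sum(steps_to_converge)                     # each model stops on its own
    batch = len(steps_to_converge) * max(steps_to_converge) # all wait for the slowest
    return sequential, batch

# Made-up step counts: three quick models and one slow one.
toy_sequential, toy_batch = count_evaluations([40, 55, 60, 400])

print("measured ratio:", round(ratio, 2))
print("toy sequential vs batch:", toy_sequential, toy_batch)
```

&lt;p&gt;In the toy case a single slow model nearly triples the total work, which is the same shape of waste as in the measured numbers.&lt;/p&gt;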

&lt;p&gt;The effect of all of this is that cross-validation in BoTorch is much slower than it needs to be. As you increase the size of the dataset, the following slowdowns occur:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;More models are trained.&lt;/li&gt;
  &lt;li&gt;Each of those models is more expensive, since it has a larger dataset.&lt;/li&gt;
  &lt;li&gt;The optimization trajectory becomes longer and longer as more models are trained in parallel. &lt;strong&gt;(Unnecessary)&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;More pointless evaluations of converged models occur, due to the increased likelihood of a single random slow-to-converge model. &lt;strong&gt;(Unnecessary)&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These four factors &lt;em&gt;multiply&lt;/em&gt; to create a slow experience.&lt;/p&gt;

&lt;p&gt;I am inclined to implement batch training differently, maybe by implementing a single training run and then using something like JAX’s &lt;a href=&quot;https://jax.readthedocs.io/en/latest/_autosummary/jax.vmap.html&quot;&gt;vmap&lt;/a&gt;. This would eliminate factor 3, and maybe it could be combined with JAX’s &lt;a href=&quot;https://jax.readthedocs.io/en/latest/_autosummary/jax.lax.while_loop.html&quot;&gt;while_loop&lt;/a&gt; to also solve factor 4.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(That’s it for the appendix! Here’s a link back to the &lt;a href=&quot;/blocks/2024/01/10/expressions-are-pragmatic-visualizations.html&quot;&gt;main post&lt;/a&gt;.)&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Wed, 10 Jan 2024 01:00:00 -0800</pubDate>
        <link>https://probablymarcus.com/blocks/2024/01/10/expressions-pragmatic-visualizations-extra.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2024/01/10/expressions-pragmatic-visualizations-extra.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
      <item>
        <title>Expressions are Pragmatic Model Visualizations</title>
        <description>&lt;style&gt;
article &gt; h1, article &gt; h2, article &gt; h3 {
  margin-top:30px;
}
&lt;/style&gt;

&lt;script src=&quot;/stuff/2023-12-13-bundle.js?v11&quot; defer=&quot;&quot;&gt;&lt;/script&gt;

&lt;link rel=&quot;preload&quot; href=&quot;/stuff/2023-12-09-visual1.json?v1&quot; as=&quot;fetch&quot; /&gt;

&lt;link rel=&quot;preload&quot; href=&quot;/stuff/2023-12-09-visual2.json?v1&quot; as=&quot;fetch&quot; /&gt;

&lt;link rel=&quot;preload&quot; href=&quot;/stuff/2023-12-09-visual3.json?v1&quot; as=&quot;fetch&quot; /&gt;

&lt;p&gt;Most machine learning models are never visualized. Visualizing a model and its parameters often leads to immediate insights or bugfixes, but getting a good visual requires a lot of one-off work.&lt;/p&gt;

&lt;p&gt;How do we get useful visualizations without requiring too much human overhead? I think &lt;em&gt;code&lt;/em&gt; is underrated as a visualization. In this blog post I show a family of pragmatic visualizations that are each created by simply printing code expressions and rendering their parameters inline. Often, the printed code is not runnable, but is instead a &lt;em&gt;visually optimized&lt;/em&gt; version of the model’s code.&lt;/p&gt;

&lt;p&gt;I start with two real-life examples where these visualizations provided valuable insights, using these examples to demonstrate the visualizations. Then I turn to my main focus: eliminating the one-off work. I share &lt;a href=&quot;https://rows2prose.org&quot;&gt;rows2prose&lt;/a&gt;, a Python library that generates these visuals from a dataframe of model parameters and any styled text (not just code). This leaves only the problems of tracing model parameters to a dataframe and printing visually optimized model code. I show how scientific computing frameworks can help with this, in this case using a Lisp-macro-like approach which I demonstrate with &lt;a href=&quot;https://vexpr.org&quot;&gt;Vexpr&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;example-1-the-model-that-was-good-then-bad-then-good&quot;&gt;Example 1: The Model That Was Good, Then Bad, Then Good&lt;/h2&gt;

&lt;p&gt;Last year I &lt;a href=&quot;/blocks/2022/11/30/hands-on-bayesian-optimization.html&quot;&gt;ran a machine learning experiment&lt;/a&gt; and noticed that the baseline model from &lt;a href=&quot;https://botorch.org&quot;&gt;BoTorch&lt;/a&gt; had a weird curve.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-12-04-just-baseline.svg?v2&quot; alt=&quot;Schematic&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;margin-top:-20px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Chart 1:&lt;/strong&gt; Scaling curve for a Gaussian Process performing hold-one-out cross-validation. Given a dataset, the model is trained on all but one point in the dataset, then it predicts the output for the held-out point. I repeat the experiment on 50 different random subsets of a larger dataset and plot the 10th percentile, 90th percentile, and geometric mean.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;The model is a Gaussian Process, but that detail really doesn’t matter for this blog post. This is a strange scaling curve for any machine learning model. We would expect a model to continually get better at prediction as it receives larger datasets, with diminishing returns &lt;em&gt;only&lt;/em&gt; at the end. Instead, this one has a big lull in the middle, giving it an almost staircase shape. What could this mean? &lt;em&gt;(Consider pausing here and guessing the reason. I had a guess, and my guess turned out to be wrong.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An obvious next step is to look at the model itself, not just the model’s results. How do the final trained model parameters change as we scale up the size of the dataset? I visualize each individual parameter in diagrams like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-12-04-visual-legend.svg?v2&quot; alt=&quot;Schematic&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The vertical axis separates different experiments (in this case, different dataset sizes) from top to bottom. The horizontal axis shows parameter values across different repetitions of that experiment. To visualize the entire model, I print a simplified version of the model’s code, rendering each parameter in place as one of these diagrams.&lt;/p&gt;

&lt;p&gt;Here’s what the model looked like for each point in Chart 1. Feel free to zoom in, but don’t get bogged down in the details. Just observe that each parameter had interesting changes about halfway down, coinciding with the interesting changes from the experiment.&lt;/p&gt;

&lt;style&gt;
.string {
color: brown;
}
.comment {
color: green;
}
.r2p-code {
color: gray;
}
circle.point {
fill: blue;
opacity: 0.2;
}

pre.vexpr {
  padding: 0;
  border: 0;
  background-color: initial;
}
&lt;/style&gt;

&lt;div id=&quot;9d89fed0-92f4-11ee-bec3-daa4de339950&quot; style=&quot;padding: 10px; border: 1px solid black; border-radius: 5px; margin-bottom:30px; font-weight:100; position:relative;&quot;&gt;
&lt;h2 style=&quot;text-transform: uppercase;&quot;&gt;Model Visualization&lt;/h2&gt;
&lt;p&gt;Predict using a Gaussian Process with the following covariance and mean.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Covariance kernel:&lt;/strong&gt; Use distance between points as follows:&lt;/p&gt;
&lt;pre id=&quot;cov-kernel&quot; class=&quot;vexpr&quot; style=&quot;overflow: hidden;&quot;&gt;
&lt;div id=&quot;cov-kernel-content&quot; class=&quot;kernel r2p-code&quot; style=&quot;transform: scale(0.48); transform-origin: top left; height:48%;&quot;&gt;&lt;span class=&quot;scale&quot; data-key=&quot;0&quot;&gt;&lt;/span&gt; * sum([
   
   &lt;span class=&quot;comment&quot;&gt;# Kernel: Factorized scalar vs choice parameters&lt;/span&gt;
  &lt;span class=&quot;mixing_weight&quot; data-key=&quot;1&quot;&gt;&lt;/span&gt; * sum([
     
     &lt;span class=&quot;comment&quot;&gt;# Scalar parameters&lt;/span&gt;
    &lt;span class=&quot;mixing_weight&quot; data-key=&quot;2&quot;&gt;&lt;/span&gt; * matern_25(
      norm_l2([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_epochs&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;3&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_batch_size&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;4&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;5&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;6&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;7&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;8&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;9&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_initial_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;10&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_final_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;11&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_pct_warmup&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;12&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_max_lr&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;13&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;14&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;15&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;16&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;17&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_beta2_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;18&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;19&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;20&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;21&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_units&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;22&quot;&gt;&lt;/span&gt;])),
     
     &lt;span class=&quot;comment&quot;&gt;# Choice parameters&lt;/span&gt;
    &lt;span class=&quot;mixing_weight&quot; data-key=&quot;23&quot;&gt;&lt;/span&gt; * exp(
      -norm_l1([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot0&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;24&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot1&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;25&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot2&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;26&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot3&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;27&quot;&gt;&lt;/span&gt;]))]),
   
   &lt;span class=&quot;comment&quot;&gt;# Kernel: Joint scalar and choice parameters&lt;/span&gt;
  &lt;span class=&quot;mixing_weight&quot; data-key=&quot;28&quot;&gt;&lt;/span&gt; * prod([
     matern_25(
      norm_l2([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_epochs&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;29&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_batch_size&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;30&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;31&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;32&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;33&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;34&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;35&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_initial_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;36&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_final_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;37&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_pct_warmup&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;38&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_max_lr&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;39&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;40&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;41&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;42&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;43&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_beta2_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;44&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;45&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;46&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;47&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_units&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;48&quot;&gt;&lt;/span&gt;])),
     exp(
      -norm_l1([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot0&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;49&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot1&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;50&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot2&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;51&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot3&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;52&quot;&gt;&lt;/span&gt;]))])])&lt;/div&gt;
&lt;/pre&gt;
&lt;p&gt;When comparing a point to itself, add noise value: &lt;em&gt;(log scale)&lt;/em&gt;&lt;/p&gt;
&lt;pre data-key=&quot;noise&quot; class=&quot;noise vexpr&quot; style=&quot;padding-bottom:5px;&quot;&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Mean:&lt;/strong&gt; constant&lt;/p&gt;
&lt;pre data-key=&quot;mean&quot; class=&quot;mean vexpr&quot; style=&quot;padding-bottom:5px; margin-bottom:0;&quot;&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p style=&quot;margin-top:-20px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Visualization 1:&lt;/strong&gt; All of the parameters of a Gaussian Process model rendered in context. The covariance kernel at the top contains a few parameter types: a multiplicative positive scale (top-left), four multiplicative mixing weights (the other four parameters along the left side), and &quot;lengthscale&quot; parameters (the rest) that are used to scale distances. Additionally there are noise and mean parameters (bottom). The noise parameter always ends up being very low, due to the &lt;a href=&quot;https://github.com/pytorch/botorch/blob/449b91180f8b0bd97775fa1f8a1df00e77dfe403/botorch/models/gp_regression_mixed.py#L130&quot;&gt;default prior&lt;/a&gt; pushing it toward zero, and because all the points in the dataset are spaced apart by quasirandom Sobol generation, so the model is never forced to incorporate variance at a single location.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;The sudden jump in accuracy in Chart 1 corresponds to a number of jumps in the parameters. Perhaps the most dramatic change was in the top half of the covariance kernel, where we see that a number of parameters stay fixed at about 0.3, until they suddenly jump up to large values.&lt;/p&gt;

&lt;p&gt;This strongly suggests a theory: the model’s priors are too strong. With small dataset sizes, the gradients from better predicting the dataset are not powerful enough to overpower the gradient from the priors. After the dataset size crosses some threshold, the parameters are able to break free.&lt;/p&gt;

&lt;p&gt;Let’s loosen the prior on the parameters and see if the issue is solved.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-12-04-looser-prior.svg?v3&quot; alt=&quot;Cross-validation results with looser priors&quot; /&gt;&lt;/p&gt;
&lt;p style=&quot;margin-top:-20px; margin-bottom:20px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Chart 2:&lt;/strong&gt; Hold-one-out cross-validation results, comparing the BoTorch baseline to a new configuration with a weaker lengthscale prior.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;The model improved significantly. How do its parameters look?&lt;/p&gt;

&lt;div id=&quot;d8f79b18-93bb-11ee-89d6-daa4de339950&quot; style=&quot;padding: 10px; border: 1px solid black; border-radius: 5px; margin-bottom:30px; font-weight:100; position:relative; margin-top:20px;&quot;&gt;
&lt;h2 style=&quot;text-transform: uppercase;&quot;&gt;Model Visualization&lt;/h2&gt;
&lt;p&gt;Predict using a Gaussian Process with the following covariance and mean.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Covariance kernel:&lt;/strong&gt; Use distance between points as follows:&lt;/p&gt;
&lt;pre id=&quot;cov-kernel2&quot; class=&quot;vexpr&quot; style=&quot;overflow: hidden;&quot;&gt;
&lt;div id=&quot;cov-kernel-content2&quot; class=&quot;kernel r2p-code&quot; style=&quot;transform: scale(0.48); transform-origin: top left; height:48%;&quot;&gt;&lt;span class=&quot;scale&quot; data-key=&quot;0&quot;&gt;&lt;/span&gt; * sum([
   
   &lt;span class=&quot;comment&quot;&gt;# Kernel: Factorized scalar vs choice parameters&lt;/span&gt;
  &lt;span class=&quot;mixing_weight&quot; data-key=&quot;1&quot;&gt;&lt;/span&gt; * sum([
     
     &lt;span class=&quot;comment&quot;&gt;# Scalar parameters&lt;/span&gt;
    &lt;span class=&quot;mixing_weight&quot; data-key=&quot;2&quot;&gt;&lt;/span&gt; * matern_25(
      norm_l2([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_epochs&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;3&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_batch_size&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;4&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;5&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;6&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;7&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;8&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;9&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_initial_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;10&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_final_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;11&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_pct_warmup&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;12&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_max_lr&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;13&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;14&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;15&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;16&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;17&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_beta2_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;18&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;19&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;20&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;21&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_units&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;22&quot;&gt;&lt;/span&gt;])),
     
     &lt;span class=&quot;comment&quot;&gt;# Choice parameters&lt;/span&gt;
    &lt;span class=&quot;mixing_weight&quot; data-key=&quot;23&quot;&gt;&lt;/span&gt; * exp(
      -norm_l1([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot0&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;24&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot1&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;25&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot2&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;26&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot3&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;27&quot;&gt;&lt;/span&gt;]))]),
   
   &lt;span class=&quot;comment&quot;&gt;# Kernel: Joint scalar and choice parameters&lt;/span&gt;
  &lt;span class=&quot;mixing_weight&quot; data-key=&quot;28&quot;&gt;&lt;/span&gt; * prod([
     matern_25(
      norm_l2([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_epochs&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;29&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_batch_size&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;30&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;31&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;32&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;33&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;34&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;35&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_initial_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;36&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_final_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;37&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_pct_warmup&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;38&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_max_lr&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;39&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;40&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;41&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;42&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;43&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_beta2_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;44&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;45&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;46&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;47&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_units&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;48&quot;&gt;&lt;/span&gt;])),
     exp(
      -norm_l1([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot0&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;49&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot1&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;50&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot2&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;51&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot3&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;52&quot;&gt;&lt;/span&gt;]))])])&lt;/div&gt;
&lt;/pre&gt;
&lt;p&gt;When comparing a point to itself, add noise value: &lt;em&gt;(log scale)&lt;/em&gt;&lt;/p&gt;
&lt;pre data-key=&quot;noise&quot; class=&quot;noise vexpr&quot; style=&quot;padding-bottom:5px;&quot;&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Mean:&lt;/strong&gt; constant&lt;/p&gt;
&lt;pre data-key=&quot;mean&quot; class=&quot;mean vexpr&quot; style=&quot;padding-bottom:5px; margin-bottom:0;&quot;&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p style=&quot;margin-top:-20px; margin-bottom:20px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Visualization 2:&lt;/strong&gt; Model parameters now that the priors on the &quot;lengthscales&quot; have been loosened. Specifically, I changed the prior on the lengthscales—the parameters along the right side—from its &lt;a href=&quot;https://github.com/pytorch/botorch/blob/dccda59d8ef51d8074de82fdb5614bad2db0ee96/botorch/models/utils/gpytorch_modules.py#L30&quot;&gt;default value&lt;/a&gt; &lt;code&gt;Gamma(3.0, 6.0)&lt;/code&gt; to &lt;code&gt;Gamma(1.125, 0.375)&lt;/code&gt;, a distribution with the same mode but higher variance, so during training there is a weaker gradient pushing each parameter toward the mode. (The model tunes its mixing weights to favor the top half of the kernel, so the parameters in the top half are the ones that increase, while those in the bottom are still pulled toward 0.3.)&lt;/em&gt;
&lt;/p&gt;
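
&lt;p&gt;As a quick check of the “same mode but higher variance” claim, here is a minimal sketch in plain Python. It assumes both priors are written as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Gamma(concentration, rate)&lt;/code&gt;; the helper names are hypothetical and are not part of BoTorch.&lt;/p&gt;

```python
def gamma_mode(concentration, rate):
    # Mode of a Gamma distribution in shape/rate form
    # (this formula applies when concentration is at least 1).
    return (concentration - 1.0) / rate


def gamma_variance(concentration, rate):
    # Variance of a Gamma distribution in shape/rate form.
    return concentration / rate ** 2


default_prior = (3.0, 6.0)      # the default lengthscale prior quoted above
loose_prior = (1.125, 0.375)    # the looser prior quoted above

for name, (c, r) in [("default", default_prior), ("loose", loose_prior)]:
    print(name, "mode:", round(gamma_mode(c, r), 4),
          "variance:", round(gamma_variance(c, r), 4))
```

&lt;p&gt;Under that assumption both modes come out at about 0.33, while the variance grows from about 0.083 to 8.&lt;/p&gt;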

&lt;p&gt;The discrete change in parameters is now mostly gone. I ran &lt;a href=&quot;/blocks/2024/01/10/expressions-pragmatic-visualizations-extra.html&quot;&gt;further experiments with extra weak priors&lt;/a&gt; to make the discrete change disappear further. It worked, improving results for small datasets even more, but it began harming results for large datasets.&lt;/p&gt;

&lt;p&gt;So I’ve learned that with this model, I’ll get the best results if I use weaker priors, at least for small-to-medium datasets. BoTorch / Ax’s built-in priors did not serve me well. That doesn’t mean the default priors are &lt;em&gt;wrong&lt;/em&gt;; rather, it suggests that users need to be willing to look closely at their machine learning models if they want to get good results. If using a machine learning model always gave users visualizations like these, I think many more people would use them well.&lt;/p&gt;

&lt;h2 id=&quot;example-2-the-pitfalls-of-parallel-cross-validation&quot;&gt;Example 2: The Pitfalls of Parallel Cross-Validation&lt;/h2&gt;

&lt;p&gt;I think people ought to always see their model, including while it trains. I built this experience for myself, and it quickly provided an insight. Here is a visualization I watched in realtime as my model above trained. This is a batch cross-validation task with 60 datapoints, so I am training 60 models in parallel and visualizing their parameters (hence, multiple dots per parameter). &lt;span id=&quot;click-or-tap&quot;&gt;Click&lt;/span&gt; the button below to watch the models train.&lt;/p&gt;

&lt;script&gt;
if (&quot;ontouchstart&quot; in document.documentElement) { document.getElementById(&quot;click-or-tap&quot;).textContent = &quot;Tap&quot;; }
&lt;/script&gt;

&lt;div style=&quot;padding: 10px; border: 1px solid black; border-radius: 5px; margin-bottom:30px; font-weight:100; position:relative;&quot; id=&quot;e71ccb0e-970e-11ee-93a3-daa4de339950&quot;&gt;
&lt;h2 style=&quot;text-transform: uppercase;&quot;&gt;Model Visualization&lt;/h2&gt;
&lt;div class=&quot;timesteps&quot;&gt;&lt;/div&gt;
&lt;p&gt;Predict using a Gaussian Process with the following covariance and mean.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Covariance kernel:&lt;/strong&gt; Use distance between points as follows:&lt;/p&gt;
&lt;pre id=&quot;cov-kernel3&quot; class=&quot;vexpr&quot; style=&quot;max-height: 40vh; overflow: scroll; border: 1px solid silver; padding: 5px; border-radius: 5px;&quot;&gt;
&lt;div id=&quot;cov-kernel-content3&quot; class=&quot;kernel r2p-code&quot; style=&quot;transform: scale(0.48); transform-origin: top left; height:48%;&quot;&gt;&lt;span class=&quot;scale&quot; data-key=&quot;0&quot;&gt;&lt;/span&gt; * sum([
   
   &lt;span class=&quot;comment&quot;&gt;# Kernel: Factorized scalar vs choice parameters&lt;/span&gt;
  &lt;span class=&quot;mixing_weight&quot; data-key=&quot;1&quot;&gt;&lt;/span&gt; * sum([
     
     &lt;span class=&quot;comment&quot;&gt;# Scalar parameters&lt;/span&gt;
    &lt;span class=&quot;mixing_weight&quot; data-key=&quot;2&quot;&gt;&lt;/span&gt; * matern_25(
      norm_l2([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_epochs&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;3&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_batch_size&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;4&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;5&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;6&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;7&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;8&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;9&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_initial_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;10&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_final_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;11&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_pct_warmup&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;12&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_max_lr&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;13&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;14&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;15&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;16&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;17&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_beta2_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;18&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;19&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;20&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;21&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_units&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;22&quot;&gt;&lt;/span&gt;])),
     
     &lt;span class=&quot;comment&quot;&gt;# Choice parameters&lt;/span&gt;
    &lt;span class=&quot;mixing_weight&quot; data-key=&quot;23&quot;&gt;&lt;/span&gt; * exp(
      -norm_l1([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot0&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;24&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot1&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;25&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot2&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;26&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot3&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;27&quot;&gt;&lt;/span&gt;]))]),
   
   &lt;span class=&quot;comment&quot;&gt;# Kernel: Joint scalar and choice parameters&lt;/span&gt;
  &lt;span class=&quot;mixing_weight&quot; data-key=&quot;28&quot;&gt;&lt;/span&gt; * prod([
     matern_25(
      norm_l2([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_epochs&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;29&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_batch_size&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;30&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;31&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;32&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;33&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;34&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense2_weight_decay&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;35&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_initial_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;36&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_final_lr_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;37&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_pct_warmup&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;38&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_max_lr&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;39&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;40&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_momentum_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;41&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_max_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;42&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_1cycle_beta1_min_damping_factor_pct&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;43&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_beta2_damping_factor&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;44&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv1_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;45&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv2_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;46&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_conv3_channels&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;47&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;log_dense1_units&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;48&quot;&gt;&lt;/span&gt;])),
     exp(
      -norm_l1([
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot0&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;49&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot1&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;50&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot2&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;51&quot;&gt;&lt;/span&gt;,
         compare(&lt;span class=&quot;string&quot;&gt;&apos;choice_nhot3&apos;&lt;/span&gt;) / &lt;span class=&quot;scale&quot; data-key=&quot;52&quot;&gt;&lt;/span&gt;]))])])&lt;/div&gt;
&lt;/pre&gt;
&lt;p&gt;When comparing a point to itself, add noise value: &lt;em&gt;(log scale)&lt;/em&gt;&lt;/p&gt;
&lt;pre data-key=&quot;noise&quot; class=&quot;noise vexpr&quot; style=&quot;padding-bottom:5px;&quot;&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Mean:&lt;/strong&gt; constant&lt;/p&gt;
&lt;pre data-key=&quot;mean&quot; class=&quot;mean vexpr&quot; style=&quot;padding-bottom:5px; margin-bottom:0;&quot;&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p style=&quot;margin-top:-20px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Visualization 3:&lt;/strong&gt; Cross-validation with dataset size 60. I used the looser lengthscale prior from above, and I used a &lt;a href=&quot;https://github.com/pytorch/botorch/blob/8f1df5a169f0ff559b232a264aaff7fd3d236d64/botorch/models/utils/gpytorch_modules.py#L45&quot;&gt;different noise prior&lt;/a&gt; that doesn&apos;t endlessly push noise toward 0.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;As I watched this in my notebook, I got a sudden impression: many of these points are converging much faster than others. In fact, I think there are hundreds of steps where 59 of the 60 models have converged, and we’re just waiting for the last one. This is particularly evident if you watch the parameters in the lower half of the kernel, where one single faint blue dot slowly approaches the cluster of overlapping dots. This is concerning because all 60 models are being evaluated on every step, even though the last few hundred steps are unnecessary for most of the models.&lt;/p&gt;

&lt;p&gt;I tested this theory by comparing the two training approaches.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-12-13-parallel-vs-sequential-training.svg?v1&quot; alt=&quot;Schematic&quot; /&gt;&lt;br /&gt;
&lt;em&gt;&lt;strong&gt;Chart 3:&lt;/strong&gt; Training trajectory for 60 models trained in batch, compared to that of training each of them separately.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The problem is worse than I thought. Not only do some optimizations finish well before others, but every optimization takes many more steps when trained in batch. When I count model evaluations, 18,817 total evaluations happen when training models one at a time, while 92,820 happen when training in parallel, so &lt;strong&gt;we are doing approximately 5 times too many operations&lt;/strong&gt;. I describe this in more depth in this post’s &lt;a href=&quot;/blocks/2024/01/10/expressions-pragmatic-visualizations-extra.html&quot;&gt;extended material&lt;/a&gt;. I am inclined to implement batch training differently, maybe by implementing a single training run and then using something like JAX’s &lt;a href=&quot;https://jax.readthedocs.io/en/latest/_autosummary/jax.vmap.html&quot;&gt;vmap&lt;/a&gt; in conjunction with JAX’s &lt;a href=&quot;https://jax.readthedocs.io/en/latest/_autosummary/jax.lax.while_loop.html&quot;&gt;while_loop&lt;/a&gt;.&lt;/p&gt;
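
&lt;p&gt;To make the arithmetic concrete, here is a small sketch that uses only the two totals reported above. The “implied batch steps” figure rests on an assumption of mine: that every batch step evaluates all 60 models exactly once.&lt;/p&gt;

```python
NUM_MODELS = 60
SEQUENTIAL_EVALS = 18_817   # total evaluations, training one model at a time
BATCH_EVALS = 92_820        # total evaluations, training all models in one batch

overhead = BATCH_EVALS / SEQUENTIAL_EVALS
avg_sequential = SEQUENTIAL_EVALS / NUM_MODELS

# Assumption: each batch step evaluates every model once,
# so the batch total is (number of steps) * (number of models).
batch_steps = BATCH_EVALS / NUM_MODELS

print(f"batch / sequential evaluations: {overhead:.2f}x")
print(f"average evaluations per model when sequential: {avg_sequential:.1f}")
print(f"implied batch steps: {batch_steps:.0f}")
```

&lt;p&gt;The ratio is about 4.93, which is where “approximately 5 times” comes from. Under the stated assumption, the batch ran for 1547 steps, while a model trained on its own needed about 314 evaluations on average.&lt;/p&gt;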

&lt;p&gt;I wouldn’t have noticed this problem if I hadn’t been able to see my model during training. Of course, other standard visualizations could have revealed this issue; Chart 3 is fairly standard, and it would have also done the job. But I didn’t have Chart 3, I didn’t know I should be building it, and building it is actually difficult and inefficient with BoTorch. I think a visualized expression is a useful jumping-off point, and maybe we should try to always have it available to us.&lt;/p&gt;

&lt;h2 id=&quot;a-recipe-for-pragmatic-model-visualization&quot;&gt;A Recipe for Pragmatic Model Visualization&lt;/h2&gt;

&lt;p&gt;How do we give ourselves a playful environment where our models are always visualized by default?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2024-01-06-pragmatic-solution/framing.svg&quot; style=&quot;margin-bottom:10px;&quot; alt=&quot;schematic&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I solve part of the problem with a new Python library called &lt;a href=&quot;https://rows2prose.org&quot;&gt;rows2prose&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You give rows2prose two things:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;a dataframe containing scalars that should be visualized&lt;/li&gt;
  &lt;li&gt;a visually-useful string that describes the model (for example, the model’s code), including placeholder text for visualized values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/images/2024-01-06-pragmatic-solution/solution1.svg&quot; style=&quot;margin-top:10px;margin-bottom:10px;&quot; alt=&quot;schematic&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can use rows2prose in Jupyter notebooks, and it can output HTML visualization files from arbitrary Python scripts. It generated every visual in this blog post, and you can &lt;a href=&quot;https://rows2prose.org&quot;&gt;use it today&lt;/a&gt;.&lt;/p&gt;
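
&lt;p&gt;To illustrate the two inputs, here is a toy sketch in plain Python. It is &lt;em&gt;not&lt;/em&gt; the rows2prose API: the placeholder syntax, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;render&lt;/code&gt; helper, and the numbers are all made up for illustration. It only shows the idea of pairing rows of scalars with a model string that contains placeholders.&lt;/p&gt;

```python
from string import Template

# Hypothetical stand-in for the "string with placeholders" input.
model_text = Template("$w * matern_25(compare('log_epochs') / $s)")

# Hypothetical stand-in for the dataframe: one row of scalars per training step.
# These values are invented for the example.
rows = [
    {"step": 0, "w": 0.50, "s": 0.33},
    {"step": 100, "w": 0.71, "s": 1.20},
]


def render(row):
    # Substitute this row's scalars into the placeholder text.
    values = {k: f"{v:.2f}" for k, v in row.items() if k != "step"}
    return model_text.substitute(values)


for row in rows:
    print(row["step"], render(row))
```

&lt;p&gt;Each row produces one rendering of the same expression, which is the basic shape of the animations in this post.&lt;/p&gt;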

&lt;h3 id=&quot;the-remaining-gap-has-multiple-solutions&quot;&gt;The remaining gap has multiple solutions&lt;/h3&gt;

&lt;p&gt;How do you take your model and get a visually-useful summary of it? There are, of course, many ways to do this, including crazy new approaches like asking an LLM to generate one for you.&lt;/p&gt;

&lt;p&gt;But here’s how I did it.&lt;/p&gt;

&lt;p&gt;I created &lt;a href=&quot;https://vexpr.org&quot;&gt;Vexpr&lt;/a&gt;, a Python library that takes inspiration from Lisp. In Vexpr, you build up expression data structures (“Vexprs”) similar to Lisp S-expressions. The expressions you see in these visualizations are simply printed Vexprs.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(&lt;a href=&quot;https://jax.readthedocs.io/en/latest/index.html&quot;&gt;JAX&lt;/a&gt; fans may be familiar with “Jaxprs”; Vexprs are similar, but they are more user-facing. A user of Vexpr is intentionally building up an elegant Vexpr, while a user of JAX doesn’t really care what their Jaxpr looks like. Vexprs and Jaxprs solve two different problems and I plan to use them together.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Just like Lisp, Vexpr lets you use macros to modify these expressions. For my &lt;a href=&quot;/blocks/2023/10/19/vectorizing-wide-pytorch-expressions.html&quot;&gt;previous post&lt;/a&gt;, I used macros to vectorize expressions, and for this post, I used them to &lt;em&gt;visually optimize&lt;/em&gt; the expressions. For example, the actual Vexpr program for my model uses an elementwise division between two arrays to divide &lt;em&gt;N&lt;/em&gt; different distances by &lt;em&gt;N&lt;/em&gt; different lengthscales, but I wanted to visualize this as &lt;em&gt;N&lt;/em&gt; different divisions, with each parameter rendered next to its corresponding “compare” feature. I changed the code using a macro, so I actually &lt;em&gt;unvectorized&lt;/em&gt; the division operations to make them prettier. Here’s another example: my printed Vexpr was more verbose than I wanted it to be, so I used macros to convert it into pseudocode! Each visualization above contains the function call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compare(&apos;log_epochs&apos;)&lt;/code&gt;, which doesn’t actually exist. In the runnable expression, this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compare&lt;/code&gt; term is replaced with a larger subexpression that extracts a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;log_epochs&quot;&lt;/code&gt; feature from two vectors (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x1&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x2&lt;/code&gt;) and computes the distance. I wanted a succinct visual, so I “visually optimized” the expression, removing those details, and I never implemented &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compare&lt;/code&gt;. Code can be much more succinct when it doesn’t actually need to run.&lt;/p&gt;
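
&lt;p&gt;The “unvectorize” rewrite can be sketched on a toy expression format. This is not Vexpr’s actual data structure or macro API; the tuple layout, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unvectorize_division&lt;/code&gt;, and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scale_3&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scale_4&lt;/code&gt; names are hypothetical and chosen only to show the shape of the transformation.&lt;/p&gt;

```python
def unvectorize_division(expr):
    # Rewrite ("div", [n1, ..., nN], [d1, ..., dN]) into N scalar divisions,
    # so each divisor can be displayed next to its own numerator.
    op, numerators, denominators = expr
    assert op == "div" and len(numerators) == len(denominators)
    return [("div", n, d) for n, d in zip(numerators, denominators)]


def to_text(expr):
    # Print one scalar division in the style of the visualizations.
    _, numerator, denominator = expr
    return f"{numerator} / {denominator}"


# One elementwise division over two arrays (the runnable, vectorized form).
vectorized = ("div",
              ["compare('log_epochs')", "compare('log_batch_size')"],
              ["scale_3", "scale_4"])

for term in unvectorize_division(vectorized):
    print(to_text(term))
```

&lt;p&gt;The output has one line per feature, which is easier to read than a single division of two arrays, even though it would be slower to run.&lt;/p&gt;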

&lt;p&gt;In addition to macros, another useful idea that filled this gap was &lt;em&gt;partial evaluation&lt;/em&gt;. I take a Vexpr, plug in its parameters, then evaluate all parts of the expression that are ready to be evaluated. This computes the “unvectorize” from the previous paragraph, taking arrays of divisors and indexing into them. This is also useful when machine learning models put constraints on parameters; often they implement constraints by storing “raw” versions of the parameters and passing them through an &lt;em&gt;exp&lt;/em&gt; or &lt;em&gt;sigmoid&lt;/em&gt; to move them into a constrained interval. Partial evaluation moves the values into the constrained interval so that they are ready to be visualized. One thing that made me laugh, putting the ideas of these two paragraphs together: even after converting my runnable Vexpr into pseudocode, I still ran partial evaluation on it, evaluating all expressions that &lt;em&gt;could&lt;/em&gt; be evaluated. It feels funny when you tell your computer to “evaluate the parts of this code that are not pseudocode”.&lt;/p&gt;
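
&lt;p&gt;Here is a minimal sketch of partial evaluation on the same kind of toy nested-tuple expression. Again, this is an illustration under my own assumptions and not Vexpr’s implementation: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OPS&lt;/code&gt; table, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;partial_eval&lt;/code&gt;, and the raw parameter value are hypothetical.&lt;/p&gt;

```python
import math

# Operations that can actually run. "compare" is deliberately absent,
# because it is pseudocode in this sketch.
OPS = {
    "exp": math.exp,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}


def partial_eval(expr):
    # Leaves (numbers and symbolic names) are returned unchanged.
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [partial_eval(a) for a in args]
    # Evaluate only when the op is runnable and every argument is a number.
    if op in OPS and all(isinstance(a, float) for a in args):
        return OPS[op](*args)
    return (op, *args)


raw_lengthscale = 0.0  # hypothetical unconstrained ("raw") parameter
expr = ("div", ("compare", "log_epochs"), ("exp", raw_lengthscale))
print(partial_eval(expr))
```

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;exp&lt;/code&gt; of the raw parameter is replaced by its constrained value, while the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compare&lt;/code&gt; term and the division that depends on it are left in place.&lt;/p&gt;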

&lt;p&gt;Putting all of this together, here is the final architecture underlying these visualizations.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2024-01-06-pragmatic-solution/solution2.svg&quot; style=&quot;margin-top:10px;margin-bottom:10px;&quot; alt=&quot;schematic&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The “Macros + Partial Evaluation” functionality I used is all present in &lt;a href=&quot;https://vexpr.org&quot;&gt;Vexpr&lt;/a&gt;, but the ideas are still baking. In a future post I might try to convince you to use them.&lt;/p&gt;

&lt;h2 id=&quot;but-what-about-deep-learning&quot;&gt;But what about Deep Learning?&lt;/h2&gt;

&lt;p&gt;This blog post featured human-comprehensible machine learning models like Gaussian Processes. In these models each parameter has a very clear meaning. Is this blog post applicable to Deep Learning?&lt;/p&gt;

&lt;p&gt;First, let me argue that comprehensible models are important, and I think people playing with Deep Learning ought to be among the most enthusiastic users of comprehensible models. Suppose I grant you the extreme position that a deep learning model is a black box that isn’t worth looking into. In that extreme, you have a &lt;em&gt;great&lt;/em&gt; use case for comprehensible models: exploring the space of Deep Learning architectures and training regimes. You get to take the giant space of models and regimes, design your own hand-engineered features of that space like &lt;em&gt;“learning rate”&lt;/em&gt; or &lt;em&gt;“number of attention heads”&lt;/em&gt;, generate your own datasets of experiment results, and conduct symphonies of computers to explore the space. Deep Learning system design is what got me into these models in the first place.&lt;/p&gt;

&lt;p&gt;Regardless, I think it’s possible to build useful, pragmatic visuals for Deep Networks. My main design goals would be: (1.) enable the user to detect when something in the network is broken / not being used, and (2.) put an expression in front of the user to encourage playful tweaking of the architecture.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;These visualizations were immediately useful, and they are pragmatic because they are not specific to any model type. If you can extract a text description of your model, and if you can trace a set of useful-to-visualize scalars, then you can visualize your model.&lt;/p&gt;

&lt;p&gt;I think that &lt;em&gt;somehow&lt;/em&gt; we should give all users of machine learning models access to visuals like these. Using some combination of our shared frameworks, our example code, and our crazy LLM tools, we should take on the responsibility of not only performing our desired computation, but also rendering a useful expression of it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(This post has &lt;a href=&quot;/blocks/2024/01/10/expressions-pragmatic-visualizations-extra.html&quot;&gt;an appendix&lt;/a&gt;. This project is supported by a GCP cloud compute grant from &lt;a href=&quot;https://mlcollective.org/wiki/ask-mlc-compute-assistance/&quot;&gt;ML Collective&lt;/a&gt;, which has been super helpful. Thanks, also, to Rosanne Liu for useful feedback on drafts of this post.)&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Wed, 10 Jan 2024 01:00:00 -0800</pubDate>
        <link>https://probablymarcus.com/blocks/2024/01/10/expressions-are-pragmatic-visualizations.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2024/01/10/expressions-are-pragmatic-visualizations.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
      <item>
        <title>What happens when you vectorize wide PyTorch expressions?</title>
        <description>&lt;style&gt;
h1, h2, h3 {
  margin-top:30px;
}
&lt;/style&gt;

&lt;p&gt;In scientific computing, code is often naturally expressed as wide, tree-like expressions. Often different branches of that tree contain similar chunks of logic, so there is potential to run many different branches together in parallel vectorized operations. What happens when you take your nice tree-like code and mangle it into hard-to-read vectorized code? How would a person do that?&lt;/p&gt;

&lt;p&gt;I created &lt;a href=&quot;https://vexpr.org&quot;&gt;Vexpr&lt;/a&gt; and used it to take &lt;a href=&quot;/blocks/2022/11/30/hands-on-bayesian-optimization.html&quot;&gt;real experiment&lt;/a&gt; code and convert its &lt;a href=&quot;https://github.com/outergroup/outer-loop-cookbook/blob/585b3b09fc7ac7f254a0cda8ef962670fb4f45fb/mnist_project/src/gp/vexpr_handson_gp.py#L55-L204&quot;&gt;readable expressions&lt;/a&gt; into &lt;a href=&quot;/stuff/2023-10-19-vexpr-compiled-code.txt&quot;&gt;vectorized expressions&lt;/a&gt; at runtime. In this post, I present the results. Topics include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What is the immediate impact?&lt;/li&gt;
  &lt;li&gt;How does this relate to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.compile&lt;/code&gt;?&lt;/li&gt;
  &lt;li&gt;What is the more detailed breakdown of the impact on the GPU and CPU?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;introduction-wide-expressions&quot;&gt;Introduction: Wide expressions&lt;/h2&gt;

&lt;p&gt;Mathematical expressions naturally form trees. Here’s a toy example.&lt;/p&gt;

\[\sqrt{a^2 + b^2} + \sqrt{c^2 + d^2}\]

&lt;p&gt;Here’s Python code implementing this expression and highlighting its wide tree-like structure:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])),&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;For this simple function, we can write vectorized PyTorch code by hand:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;pow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;view&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The former code calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pow&lt;/code&gt; 4 times, calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum&lt;/code&gt; twice, calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sqrt&lt;/code&gt; twice, then calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum&lt;/code&gt; one last time. The latter code flattens the tree into a single pipeline so that one operation occurs for each &lt;em&gt;level&lt;/em&gt; of the tree. Previously we ran one Python function call per &lt;em&gt;node&lt;/em&gt; of the tree, and afterward we run one call per &lt;em&gt;level&lt;/em&gt; of the tree – a number that is exponentially smaller.&lt;/p&gt;
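
&lt;p&gt;The node-versus-level distinction can be checked with a small plain-Python sketch, without PyTorch. The input values are arbitrary and chosen only so that the square roots are exact.&lt;/p&gt;

```python
import math

a, b, c, d = 3.0, 4.0, 5.0, 12.0

# Tree form: one function call per node of the expression tree.
tree_result = sum([math.sqrt(sum([a ** 2, b ** 2])),
                   math.sqrt(sum([c ** 2, d ** 2]))])

# Level form: one pass per level, each pass covering every branch at once.
leaves = [a, b, c, d]
squares = [x ** 2 for x in leaves]            # level 1: pow on all leaves
pair_sums = [squares[0] + squares[1],         # level 2: sum within each branch
             squares[2] + squares[3]]
roots = [math.sqrt(s) for s in pair_sums]     # level 3: sqrt of each branch
level_result = sum(roots)                     # level 4: final sum

print(tree_result, level_result)
```

&lt;p&gt;Both forms print 18.0. The tree form makes nine calls (four &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pow&lt;/code&gt;, two &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum&lt;/code&gt;, two &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sqrt&lt;/code&gt;, one final &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum&lt;/code&gt;), while the level form needs four passes. Note that the grouping at level 2 must pair &lt;em&gt;a&lt;/em&gt; with &lt;em&gt;b&lt;/em&gt; and &lt;em&gt;c&lt;/em&gt; with &lt;em&gt;d&lt;/em&gt; for the two forms to agree.&lt;/p&gt;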

&lt;p&gt;PyTorch is often used in a pipelined way as shown above, but that’s usually because the expression is inherently &lt;em&gt;deep&lt;/em&gt;; neural networks are the obvious example. Here we are concerned with expressions that are &lt;em&gt;wide&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Imagine scaling up this toy example: use actual, larger expressions; let each variable represent a &lt;em&gt;list of vectors&lt;/em&gt;, not just a number; within the expression, call functions like SciPy’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdist&lt;/code&gt; which compute pairwise distances and return large matrices; do all of this on &lt;em&gt;batches&lt;/em&gt; of inputs. With these changes, we now have a wide expression that is worth running on a GPU.&lt;/p&gt;

&lt;p&gt;This scenario comes up often in scientific computing. For example, the kernel of a Gaussian Process (GP) takes in two lists of vectors and returns pairwise similarities. We &lt;a href=&quot;/blocks/2022/11/30/hands-on-bayesian-optimization.html&quot;&gt;can encode some of our intuition into these kernels&lt;/a&gt; by composing them via weighted sums and products. This leads to giant tree-like expressions, and because those expressions have many similar operations in each parallel branch, there is a lot of potential for vectorization.&lt;/p&gt;

&lt;p&gt;Writing vectorized code for giant expressions is hard, so I created &lt;a href=&quot;https://vexpr.org&quot;&gt;Vexpr&lt;/a&gt; to make it easy. Vexpr takes readable-but-slow PyTorch, NumPy, and JAX expressions and compiles them into fast, &lt;a href=&quot;/stuff/2023-10-19-vexpr-compiled-code.txt&quot;&gt;delightfully ugly&lt;/a&gt; vectorized expressions.&lt;/p&gt;

&lt;h2 id=&quot;how-to-vectorize-one-level-of-an-expression-tree&quot;&gt;How to vectorize one level of an expression tree&lt;/h2&gt;

&lt;p&gt;Wide expressions naturally form a tree. When we vectorize that tree, we create a new narrower expression that invokes a series of Python functions, one for each level of the original tree.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-10-19-tree.svg&quot; alt=&quot;Trees collapsing&quot; /&gt;
&lt;br /&gt;
&lt;br /&gt;
How do we collapse a tree and reduce operations to be one per level? Each of the operations now will have to take in a tensor that contains every input to that level and return a tensor that contains all of the outputs.&lt;/p&gt;

&lt;p&gt;We can see some operations already support such a change.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Elementwise operations like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pow&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sqrt&lt;/code&gt; can run on multiple inputs as-is.&lt;/li&gt;
  &lt;li&gt;Parallel sums can be implemented as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.view(...).sum(dim=0)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
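
&lt;p&gt;As a minimal sketch of those two cases (the values and the 2x2 grouping are made up for illustration), elementwise operations apply to a whole level at once, and a reshape followed by a reduction performs several equal-length sums in one call. Which axis to reduce over depends on how the inputs are laid out in the flat tensor, so it is worth checking against a version written one node at a time:&lt;/p&gt;

```python
import torch

# Hypothetical leaf values for one level of an expression tree.
values = torch.tensor([3.0, 5.0, 4.0, 12.0])

# Elementwise ops need no changes: one call squares every input.
squares = values.pow(2)

# Two equal-length sums in one call. Viewing as (2, 2) gives rows
# [9, 25] and [16, 144]; reducing over dim=0 adds down the columns,
# so element i is paired with element i + 2.
parallel_sums = squares.view((2, 2)).sum(dim=0)

# The same sums written one node at a time, for comparison.
by_hand = torch.stack([squares[0] + squares[2], squares[1] + squares[3]])

# Finish the pipeline: one sqrt call, then one final sum.
result = parallel_sums.sqrt().sum()
```

&lt;p&gt;If the inputs to each sum sit next to each other in the flat tensor instead, the same idea applies with the reduction over the other axis.&lt;/p&gt;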

&lt;p&gt;What about more difficult cases? For example, what if the parallel sums are of different lengths? On GPUs, fast parallel reductions only work when inputs all have the same length. Would providing a single-input single-output interface still have a benefit? The answer is yes, even if the operation internally just loops over the sums. Collapsing one level of the tree allows subsequent levels to be collapsed, and low-hanging fruit tends to appear in the deeper levels. Moreover, we can do better than looping over the sums, even when the lengths are different. Vexpr’s vectorizer groups the inputs by length and performs a reduced number of operations—one for each unique length. For example, this decreases the number of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.cdist&lt;/code&gt; operations in my &lt;a href=&quot;/blocks/2022/11/30/hands-on-bayesian-optimization.html&quot;&gt;hands-on kernel&lt;/a&gt; from 49 to 5. These 5 calls happen invisibly inside of a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cdist_multi&lt;/code&gt; function that the code calls once.&lt;/p&gt;
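
&lt;p&gt;The group-by-length idea can be sketched for plain sums as follows. This is an illustration under my own assumptions, not Vexpr’s actual implementation: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sum_multi&lt;/code&gt; is a hypothetical helper name, and it assumes the segments are stored back to back in one flat tensor.&lt;/p&gt;

```python
from collections import defaultdict

import torch


def sum_multi(flat, lengths):
    """Sum consecutive segments of `flat`, one reduction per unique length."""
    # Record each segment's output position and start offset,
    # bucketed by segment length.
    starts_by_length = defaultdict(list)
    offset = 0
    for i, n in enumerate(lengths):
        starts_by_length[n].append((i, offset))
        offset += n

    out = torch.empty(len(lengths), dtype=flat.dtype)
    for n, entries in starts_by_length.items():
        positions = torch.tensor([i for i, _ in entries])
        starts = torch.tensor([s for _, s in entries])
        # Gather every segment of length n into a (count, n) block...
        block = flat[starts[:, None] + torch.arange(n)]
        # ...and reduce the whole block with a single sum call.
        out[positions] = block.sum(dim=1)
    return out


flat = torch.arange(1.0, 10.0)           # 1, 2, ..., 9
totals = sum_multi(flat, [2, 3, 2, 2])   # 2 reductions instead of 4
```

&lt;p&gt;The caller sees one function call, while internally there is one reduction per unique length. The gather and scatter steps are extra work that a per-node version would not do.&lt;/p&gt;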

&lt;p&gt;These collapsed operations often introduce some overhead. Batch operations often require permuting tensors (e.g. for batched pairwise distances) or reordering values (e.g. putting equal-length sums next to each other). For operations that are grouped by size, we must split input tensors then re-concatenate the output tensors. Thus, while vectorizing has many benefits, it also introduces extra work for the GPU that is not necessary when using a non-vectorized expression. Is this overhead worth it? Let’s turn to experiments to find out.&lt;/p&gt;

&lt;h2 id=&quot;vectorizing-leads-to-4x-7x-speed-up-on-one-set-of-benchmarks&quot;&gt;Vectorizing leads to 4x-7x speed-up on one set of benchmarks&lt;/h2&gt;

&lt;p&gt;Here I run a pair of benchmarks on an NVIDIA V100 GPU. I describe the benchmarks within the context of real Gaussian Process use cases, but you don’t need to understand Gaussian Processes to understand these results; I am simply running &lt;a href=&quot;/stuff/2023-10-19-vexpr-compiled-code.txt&quot;&gt;this big vectorized expression&lt;/a&gt; on inputs of different shapes.&lt;/p&gt;

&lt;p&gt;When you use Gaussian Processes for Bayesian Optimization, you first fit a GP’s parameters to the training data, then you optimize a set of candidate points to maximize expected improvement. Both of these steps include backpropagating gradients through the GP kernel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark 1:&lt;/strong&gt; Fit the kernel’s parameters. I run the forward and backward pass of the kernel, using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x1.shape == x2.shape == (379, 26)&lt;/code&gt;, i.e. a training set of 379 26-dimensional vectors. I repeat 100 times, then wait for a final CUDA synchronize. I test with and without &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.compile&lt;/code&gt;.&lt;/p&gt;
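
&lt;p&gt;A timing loop of this shape might look like the sketch below. It is not the benchmark script used for the numbers in this post: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;time_forward_backward&lt;/code&gt; is a hypothetical helper, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.cdist&lt;/code&gt; stands in for the real kernel.&lt;/p&gt;

```python
import time

import torch


def time_forward_backward(fn, inputs, repeats=100):
    """Time `repeats` forward and backward passes of `fn`, in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        out = fn(*inputs)
        out.sum().backward()
    # GPU work is queued asynchronously, so wait for it to finish
    # before reading the clock. On a CPU-only machine this is skipped.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.perf_counter() - start


# Stand-in for the real kernel: pairwise distances between two point sets.
device = "cuda" if torch.cuda.is_available() else "cpu"
x1 = torch.randn(379, 26, device=device, requires_grad=True)
x2 = torch.randn(379, 26, device=device, requires_grad=True)
elapsed = time_forward_backward(torch.cdist, (x1, x2), repeats=10)
```

&lt;p&gt;Synchronizing once at the end, rather than on every iteration, lets the CPU run ahead of the GPU during the loop, which is the behavior being measured.&lt;/p&gt;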

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Don’t compile&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Compile&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Baseline&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;6.43s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;3.0s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;     &lt;em&gt;Compile speed-up: 2.1x&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Vectorized&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.99s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.76s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;     &lt;em&gt;Compile speed-up: 1.3x&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;   &lt;em&gt;Vectorize speed-up: 6.5x&lt;/em&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;   &lt;em&gt;Vectorize speed-up: 3.9x&lt;/em&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;     &lt;em&gt;Combined speed-up: 8.5x&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
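
&lt;p&gt;For readers who want to check how the margins of this table relate to the four timings, here is the arithmetic as a short sketch. The dictionary keys are my own labels for the two columns.&lt;/p&gt;

```python
# Times in seconds, copied from the Benchmark 1 table above.
baseline = {"eager": 6.43, "compiled": 3.0}
vectorized = {"eager": 0.99, "compiled": 0.76}

# Right-hand column: effect of torch.compile within each row.
compile_speedup_baseline = baseline["eager"] / baseline["compiled"]
compile_speedup_vectorized = vectorized["eager"] / vectorized["compiled"]

# Bottom row: effect of vectorizing within each column.
vectorize_speedup_eager = baseline["eager"] / vectorized["eager"]
vectorize_speedup_compiled = baseline["compiled"] / vectorized["compiled"]

# Corner cell: slowest configuration versus fastest.
combined_speedup = baseline["eager"] / vectorized["compiled"]

print(round(combined_speedup, 1))  # 8.5
```

&lt;p&gt;The combined figure is the product of the non-compiled vectorize speed-up and the vectorized compile speed-up, since the intermediate time cancels out.&lt;/p&gt;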

&lt;p&gt;&lt;br /&gt;
&lt;strong&gt;Benchmark 2:&lt;/strong&gt; The optimization loop. In this experiment, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x1.shape == (60, 2, 26)&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x2.shape == (60, 381, 26)&lt;/code&gt;. That means we’re searching for 60 single candidate points, with an additional point included in each (hence the 2, not 1) because the Noisy Expected Improvement algorithm also gets predictions for a set of previous potentially best points. These two points are appended to 379 training points to give us 381 points. Again I run the kernel’s forward and backward pass 100 times.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Don’t compile&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Compile&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Baseline&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;6.03s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;2.57s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;     &lt;em&gt;Compile speed-up: 2.3x&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Vectorized&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.83s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;0.63s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;     &lt;em&gt;Compile speed-up: 1.3x&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;   &lt;em&gt;Vectorize speed-up: 7.2x&lt;/em&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;   &lt;em&gt;Vectorize speed-up: 4.1x&lt;/em&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;     &lt;em&gt;Combined speed-up: 9.6x&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;
In these experiments, vectorizing my kernel gave a ~4x speed-up when I used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.compile&lt;/code&gt;, and a ~7x speed-up with plain, uncompiled PyTorch code.&lt;/p&gt;

&lt;h2 id=&quot;but-this-speed-up-doesnt-happen-in-all-benchmarks&quot;&gt;…but this speed-up doesn’t happen in all benchmarks&lt;/h2&gt;

&lt;p&gt;Now I test the kernel’s performance in another scenario: hold-one-out cross-validation. During cross-validation, we fit \(N\) models on \(N\) slightly different datasets. We can test this by just rerunning Benchmark 1 on \(N\) models in parallel. To demonstrate a surprising phenomenon, I set \(N=20\).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark 3:&lt;/strong&gt; Cross-validation. Repeat Benchmark 1, but train 20 models in parallel rather than 1. So we have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x1.shape == x2.shape == (20, 379, 26)&lt;/code&gt;, and the shape of every parameter, e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lengthscale&lt;/code&gt;, has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(20,)&lt;/code&gt; prepended to it.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Don’t compile&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;Compile&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Baseline&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;15.6s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;7.50s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;   &lt;em&gt;Compile speed-up: 2.1x&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Vectorized&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;17.7s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;6.15s&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;   &lt;em&gt;Compile speed-up: 2.9x&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;   &lt;em&gt;Vectorize &lt;strong&gt;slow-down&lt;/strong&gt;: 1.1x&lt;/em&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;   &lt;em&gt;Vectorize speed-up: 1.2x&lt;/em&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;   &lt;em&gt;Combined speed-up: 2.5x&lt;/em&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;
This result surprised me. The vectorized version is sometimes &lt;em&gt;slower&lt;/em&gt; than the baseline, at least when you examine only the GP kernel in an isolated benchmark. The end-to-end performance of the larger system is still often better because the CPU is freed up (see next section), but it is noteworthy that vectorizing code doesn’t always directly speed up that code.&lt;/p&gt;

&lt;p&gt;To understand where the change occurs, I rerun this benchmark for different \(N\). Note that \(N=1\) and \(N=20\) correspond to the original Benchmark 1 and Benchmark 3 results, respectively. I used the same hardware, but I enabled some additional profiling, hence the slower times compared to above.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-10-19-benchmark_fit_VexprHandsOnGP_benchmarkonly.svg&quot; alt=&quot;Benchmark showing diminishing improvement from vectorization&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The vectorized kernel initially has a huge advantage over the baseline, but this advantage diminishes as we give it more parallel work, and without &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.compile&lt;/code&gt; the vectorized kernel is eventually &lt;em&gt;slower&lt;/em&gt; when viewed in isolation. Why does this happen?&lt;/p&gt;

&lt;h2 id=&quot;the-cpu-is-a-bottleneck-vectorizing-removes-that-bottleneck&quot;&gt;The CPU is a bottleneck. Vectorizing removes that bottleneck.&lt;/h2&gt;

&lt;p&gt;To really understand the performance impact of vectorization, we need to understand the CPU and GPU usage before and after. First, note that CUDA / PyTorch are built on an asynchronous relationship between the CPU and GPU, where the CPU ideally should always run ahead, always queueing the GPU’s future work, while the GPU is always working through its queue. However, there are two events that cause the CPU to wait for the GPU:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;When the CPU chooses to read a value from the GPU. This is a decision made by your code.&lt;/li&gt;
  &lt;li&gt;When the CPU gets too far ahead of the GPU. This is a decision made by CUDA.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I &lt;a href=&quot;https://github.com/outergroup/outer-loop-cookbook/blob/585b3b09fc7ac7f254a0cda8ef962670fb4f45fb/mnist_project/profile_performance_test.sh#L86&quot;&gt;profiled the kernel using NVIDIA’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nsys&lt;/code&gt;&lt;/a&gt;, and I used that trace to obtain GPU and CPU active time. My so-called CPU “active” time is actually an inferred value; PyTorch &lt;a href=&quot;https://github.com/pytorch/pytorch/issues/28224&quot;&gt;uses CUDA in a way that spins the CPU at 100% constantly&lt;/a&gt;, even when the CPU is just waiting for the GPU, so I use &lt;a href=&quot;https://github.com/outergroup/outer-loop-cookbook/blob/585b3b09fc7ac7f254a0cda8ef962670fb4f45fb/mnist_project/print_nsys_stats.py#L94&quot;&gt;heuristics&lt;/a&gt; to detect these waits and subtract them from the measured active time. &lt;em&gt;(2023-10-31 update: Thanks to gregjm for &lt;a href=&quot;https://news.ycombinator.com/item?id=38027407&quot;&gt;pointing out&lt;/a&gt; that CUDA offers a way to avoid this CPU-spinning, and that this is actually a PyTorch issue. The initial version of this blog post blamed CUDA for this unnecessary power consumption.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here is the same plot from above, but with the CPU and GPU time overlaid.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-10-19-benchmark_fit_VexprHandsOnGP.svg&quot; alt=&quot;Benchmark with CPU and GPU time included. The baseline is CPU-bound until we reach larger input sizes.&quot; /&gt;&lt;/p&gt;

&lt;p&gt;First, to understand the result of Benchmark 1 above, look at the left side of both plots. Both the baseline and vectorized models have low total active GPU time. But the baseline model puts a much larger workload on the CPU, which is responsible for orchestrating the set of operations that are sent to the GPU. Thus, the GPU spends the vast majority of the time idle, waiting for the CPU to give it more work. For the vectorized model the amount of CPU work is almost always less than the amount of GPU work, so the GPU is almost never idle; the benchmark time is roughly equal to the GPU active time.&lt;/p&gt;

&lt;p&gt;Now we focus on the surprising result from Benchmark 3, where the baseline model did slightly &lt;em&gt;better&lt;/em&gt; than the vectorized model. As we scale up the number of models, we see an interesting phenomenon. When training many models simultaneously, even the baseline model is able to keep the GPU busy. Once the GPU active time exceeds CPU active time, one of the key selling points of vectorized code is eliminated, because the non-vectorized code becomes good enough. This was a fun, surprising fact; even non-vectorized code can outrun the GPU if you pass in large enough tensors. I expect this phenomenon to occur in other large-batch scenarios like training on very large datasets or doing Bayesian Optimization with very large sets of candidate points. Of course, the vectorized code is still superior when using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.compile&lt;/code&gt;, and in all cases its CPU usage is far superior.&lt;/p&gt;

&lt;p&gt;Finally, let’s look at the GPU workload. Independent of the effect on CPU, what is the impact of vectorization on the total amount of work that the GPU has to do? Looking at the slopes of the “GPU time” lines, we see that for some models, e.g. my non-compiled model, vectorization increases the total workload, and for other models it decreases the workload. I studied the CUDA traces closely and found that vectorization does indeed reduce many aspects of the GPU workload, greatly reducing the number of operations and decreasing the total amount of time spent on the fundamental computations of the algorithm. However, it also introduces overhead (mentioned above) by interspersing operations that permute and reorder the tensors, or that split them into groups and then concatenate the results. Sometimes the reduced “fundamental” time outweighs the additional overhead, while other times the overhead outweighs the reduction in fundamental time.&lt;/p&gt;

&lt;p&gt;So we see that vectorization has three effects:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;It lets us keep the GPU busy even when inputs are small.&lt;/li&gt;
  &lt;li&gt;It frees up the CPU to do other work.&lt;/li&gt;
  &lt;li&gt;It can slightly change the total amount of GPU work, sometimes for the worse.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The speed-up from vectorization can be great, but it can be underwhelming in scenarios where none of the benefits are needed.&lt;/p&gt;
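
&lt;p&gt;One way to hold these effects in mind is a toy model in which the CPU and GPU overlap perfectly, so the wall-clock time is set by whichever side has more work. The sketch below uses made-up costs, not measurements from this post, and it assumes that CPU cost is mostly launch overhead that does not grow with the number of models \(N\), while GPU cost grows linearly with \(N\).&lt;/p&gt;

```python
def benchmark_time(cpu_active, gpu_active):
    """Toy model: with async execution, the slower side sets the pace."""
    return max(cpu_active, gpu_active)


# Hypothetical costs in seconds, chosen only to show the shape of the
# curves. Vectorizing cuts CPU work sharply but, in this made-up case,
# adds some GPU overhead per model.
def baseline(n):
    return benchmark_time(cpu_active=6.0, gpu_active=0.5 * n)


def vectorized(n):
    return benchmark_time(cpu_active=0.5, gpu_active=0.75 * n)


small = (baseline(1), vectorized(1))     # (6.0, 0.75): baseline is CPU-bound
large = (baseline(20), vectorized(20))   # (10.0, 15.0): both are GPU-bound
```

&lt;p&gt;At small \(N\) the baseline is limited by the CPU and the vectorized version wins easily. At large \(N\) both are limited by the GPU, and only the difference in GPU work remains.&lt;/p&gt;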

&lt;h2 id=&quot;the-benefits-of-vectorization-increase-as-gpu-speed-increases&quot;&gt;The benefits of vectorization increase as GPU speed increases&lt;/h2&gt;

&lt;p&gt;Here is a point that follows naturally from everything above, but it might not be immediately obvious. It is quite striking when you experience it.&lt;/p&gt;

&lt;p&gt;As I built Vexpr, I tested on an NVIDIA T4. Then for this experiment, I upgraded to a much-faster NVIDIA V100, and the benefits of vectorization greatly improved.&lt;/p&gt;

&lt;p&gt;You always want your CPU to stay ahead of the GPU so that the GPU is never idle. Code that is good enough to stay ahead of this year’s GPU might not be good enough for next year’s GPU. With each upgrade, you need the dotted CPU line from these charts to be lower and lower.&lt;/p&gt;

&lt;p&gt;This means that vectorizing your code is a good strategy for future-proofing it.&lt;/p&gt;

&lt;h2 id=&quot;gpytorchs-structure-kernels-show-similar-results&quot;&gt;GPyTorch’s “structure” kernels show similar results&lt;/h2&gt;

&lt;p&gt;I also tested GPyTorch’s limited support for vectorization. GPyTorch lets you take sets of identically-shaped kernels and run them as a single vectorized kernel. This capability is easy to use when summing single-feature kernels, so I created a partially vectorized kernel by &lt;a href=&quot;https://github.com/outergroup/outer-loop-cookbook/blob/585b3b09fc7ac7f254a0cda8ef962670fb4f45fb/mnist_project/src/gp/botorch_partial_handson_gp.py#L43&quot;&gt;replacing sums of single-feature Matern kernels with single Additive Structure&lt;/a&gt; kernels.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-10-19-benchmark_fit_BotorchPartialHandsOnGP.svg&quot; alt=&quot;Benchmark showing vectorized version is faster than baseline, but this advantage goes away at higher input sizes&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Don’t focus too much on comparing absolute speed of the Vexpr and the GPyTorch kernels. There are many small differences between the two that have nothing to do with vectorization. For example, unlike my Vexpr kernel, GPyTorch has a nice optimization of putting &lt;a href=&quot;https://github.com/cornellius-gp/gpytorch/blob/43383c2411569bfd3e4417a3918cf53f2c9dbe40/gpytorch/kernels/kernel.py#L492&quot;&gt;this line&lt;/a&gt; before &lt;a href=&quot;https://github.com/cornellius-gp/gpytorch/blob/43383c2411569bfd3e4417a3918cf53f2c9dbe40/gpytorch/kernels/kernel.py#L506&quot;&gt;this line&lt;/a&gt;.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Vexpr has been optimized much more for this use case than GPyTorch, but we see that the same fundamental phenomena occur. Vectorization leads to wins at small batch sizes, but the advantage diminishes at large batch sizes. In this experiment, vectorization increased the total GPU workload, so with large batch sizes the only advantage of vectorization is a freed-up CPU.&lt;/p&gt;

&lt;p&gt;Interestingly, neither GPyTorch kernel ever reaches a point where GPU time is equal to the benchmark time. The GPU always spends at least 1-2 seconds idle. This happens because GPyTorch’s kernels do a &lt;a href=&quot;https://github.com/cornellius-gp/gpytorch/blob/43383c2411569bfd3e4417a3918cf53f2c9dbe40/gpytorch/kernels/kernel.py#L345&quot;&gt;synchronous equality check&lt;/a&gt; on every call, forcing a GPU synchronize, which causes the CPU to fall behind the GPU immediately afterward. So when you use GPyTorch kernels on GPUs, you don’t get the ideal fully asynchronous execution that you’re supposed to get with CUDA / PyTorch. (There is good news: PyTorch has recently introduced &lt;a href=&quot;https://pytorch.org/docs/stable/generated/torch.cuda.set_sync_debug_mode.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.cuda.set_sync_debug_mode(2)&lt;/code&gt;&lt;/a&gt; which detects these unwanted synchronize events. Like &lt;a href=&quot;https://x.com/ID_AA_Carmack/status/1616525615041216513?s=20&quot;&gt;others&lt;/a&gt;, I think every PyTorch library developer should become friends with this API.)&lt;/p&gt;
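
&lt;p&gt;As a small illustration of what that API reports, the sketch below turns on the warning mode and then reads a value back from the device. The tensor and its size are arbitrary, and the guard means the snippet does nothing special on a machine without CUDA.&lt;/p&gt;

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    # Warn whenever an operation forces the CPU to wait for the GPU.
    # Passing "error" (or 2) raises an exception instead.
    torch.cuda.set_sync_debug_mode("warn")

x = torch.randn(1000, device=device)
total = x.square().sum()   # on CUDA this is queued; the CPU keeps going
value = total.item()       # copies to the CPU, so it must wait here
```

&lt;p&gt;On a GPU, the last line is the kind of call that should trigger a warning, because it is a read that the code chose to make.&lt;/p&gt;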

&lt;h2 id=&quot;closing-thoughts-on-compilation&quot;&gt;Closing thoughts on compilation&lt;/h2&gt;

&lt;p&gt;Everybody agrees that vectorization is good. Is it worth doing even in cases where it requires extra work, like with wide expressions? The results above suggest: most of the time, yes, but with interesting nuances. Vectorization is especially great for making the most of a GPU when your task is running many iterations on smaller batches of input data. For larger-batch scenarios, maybe you’ll only benefit from vectorization after you add JIT compilation, or after you upgrade your GPU, or after you’ve found some use for all the newfound idle CPU time. But, to a first approximation, vectorization is good.&lt;/p&gt;

&lt;p&gt;The real open question for the field remains: for “wide” computation graphs, what is a practical strategy for getting vectorized code?&lt;/p&gt;

&lt;p&gt;My experiments above show that this is not solved by JIT tools like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch.compile&lt;/code&gt;, nor do I expect it to be. I think a compiler would need to become a huge slow unreliable hairball to solve this type of auto-vectorization problem. Vexpr’s vectorizer is a compiler that solves this, but &lt;strong&gt;it does so by making the programmer meet the compiler in the middle&lt;/strong&gt;. The programmer gives Vexpr a tree-like expression and tells Vexpr, “You should try vectorizing this.” Both of those pieces of information are valuable: the programmer structures the logic in a certain way, and the programmer indicates that there is opportunity for vectorization there. The programmer doesn’t need to do anything too difficult, and neither does the compiler.&lt;/p&gt;

&lt;p&gt;I think this “meet-in-the-middle” approach between programmer and compiler is a good design principle that leads to good systems. And I think this will become even more true in the age of Large Language Models; rather than cramming too much magic into our compilers, let’s rely on humans-with-LLMs to meet the compiler in the middle.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(This project is supported by a GCP cloud compute grant from &lt;a href=&quot;https://mlcollective.org/wiki/ask-mlc-compute-assistance/&quot;&gt;ML Collective&lt;/a&gt;, which has been super helpful. Thanks, also, to Rosanne Liu for useful feedback on drafts of this post.)&lt;/em&gt;&lt;/p&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;//cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js&quot;&gt;&lt;/script&gt;

</description>
        <pubDate>Thu, 19 Oct 2023 07:00:00 -0700</pubDate>
        <link>https://probablymarcus.com/blocks/2023/10/19/vectorizing-wide-pytorch-expressions.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2023/10/19/vectorizing-wide-pytorch-expressions.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
      <item>
        <title>Gaussian Processes Extrapolate, Sometimes in Goofy Ways</title>
        <description>&lt;p&gt;Here is a toy function. &lt;em&gt;(To see the code and more plots, check out &lt;a href=&quot;/stuff/2023-03-28-extrapolate-notebook.html&quot;&gt;this notebook&lt;/a&gt;.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-03-28-problem-small.svg&quot; alt=&quot;A plot showing a set of dots making an arc shape, with some stochastic dropoffs toward the edges of the arch, and a very high orange point amongst these stochastic dropoffs&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;margin-top:-20px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Figure 1:&lt;/strong&gt; 80 random observations of a deterministic function (black) and the predicted maximal point in that function (orange), according to a Gaussian process trained on those 80 observations.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;Intuitively, it seems clear that this function’s highest value probably occurs when &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is in the center region. But a Gaussian Process (GP) thinks the highest value is out in a more mediocre region. This isn’t just an explore-exploit trade-off; the GP thinks the &lt;em&gt;expected&lt;/em&gt; value is high at that orange point – higher than any value the GP has ever seen before!&lt;/p&gt;

&lt;p&gt;I ran into this issue while exploring a real loss function, tuning hyperparameters with Bayesian Optimization. I found that my GP kept insisting there would be excellent results at unpromising points like the one above. The model sent me through hundreds of iterations of whack-a-mole, so I investigated the issue and came up with this toy example that makes the issue obvious.&lt;/p&gt;

&lt;p&gt;This issue occurs because:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Gaussian Processes extrapolate, and they do it over an inferred distance, or “lengthscale”.&lt;/li&gt;
  &lt;li&gt;Sudden drop-offs in the function cause the GP to choose a low lengthscale. &lt;em&gt;(This issue is most pronounced when testing deterministic functions. With noisy functions, these sudden drop-offs can be characterized as noise.)&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p style=&quot;margin-bottom:30px;&quot;&gt;
So the GP makes local extrapolations that appear goofy given more global context.
&lt;/p&gt;

&lt;h2 id=&quot;intuition-why-gaussian-processes-extrapolate&quot;&gt;Intuition: Why Gaussian Processes extrapolate&lt;/h2&gt;

&lt;p&gt;Consider a simple scenario with two observations and one prediction.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-03-28-simple-extrapolate-white.svg&quot; style=&quot;display:block; margin-left: auto; margin-right: auto;&quot; alt=&quot;Three random variables along an axis. A and B are black dots increasing upward. C has no observation, just an orange rectangle with a question mark.&quot; /&gt;&lt;/p&gt;
&lt;p style=&quot;margin-top:-5px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Figure 2:&lt;/strong&gt; If you were a Gaussian process, what would you predict? You are given three random variables, A, B, and C, and the observed values for the first two. What is your prediction for C?&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;Gaussian Processes work by modeling this as a 3D multivariate Gaussian, using distance to determine correlation between variables. This clever trick transports us from thinking about functions over arbitrary spaces to thinking about Gaussian distributions with a finite number of variables.&lt;/p&gt;

&lt;p&gt;Imagine another multivariate Gaussian that matches this correlation structure. Consider the performance of athletes in three different sports: Swimming (&lt;em&gt;A&lt;/em&gt;), Cycling (&lt;em&gt;B&lt;/em&gt;), and Running (&lt;em&gt;C&lt;/em&gt;). Suppose correlations match distances in the chart above, with cov(&lt;em&gt;A&lt;/em&gt;, &lt;em&gt;B&lt;/em&gt;) = high, cov(&lt;em&gt;B&lt;/em&gt;, &lt;em&gt;C&lt;/em&gt;) = high, cov(&lt;em&gt;A&lt;/em&gt;, &lt;em&gt;C&lt;/em&gt;) = medium-to-low. In other words, elite swimmers are usually good cyclists but aren’t always great at running. Elite cyclists are usually good swimmers and runners. Elite runners are usually good cyclists but might not be great at swimming. A large set of unobserved latent causal factors underlie &lt;em&gt;A&lt;/em&gt;, &lt;em&gt;B&lt;/em&gt;, and &lt;em&gt;C&lt;/em&gt;, and we don’t attempt to model these explicitly.&lt;/p&gt;

&lt;p&gt;Suppose an athlete is a good cyclist. If we are given the extra context that she is a bad swimmer, does that increase the probability that she is a good runner? It depends on the actual covariance values, but there are definitely possible realities where that answer is yes. She doesn’t have the latent causal factors that make a good swimmer, and these latent factors also tend to make a good cyclist, yet she is still a good cyclist, so we expect she &lt;em&gt;especially&lt;/em&gt; has the latent causal factors that make a good runner, to compensate.&lt;/p&gt;

&lt;p&gt;If &lt;em&gt;B&lt;/em&gt; is a medium value and &lt;em&gt;A&lt;/em&gt; is low, we expect &lt;em&gt;C&lt;/em&gt; to compensate for &lt;em&gt;A&lt;/em&gt;. That&apos;s why Gaussian Processes extrapolate. &lt;em&gt;(You can also show this by inverting a covariance matrix, but I like this playful explanation. See the appendix for something more rigorous.)&lt;/em&gt;&lt;/p&gt;
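
&lt;p&gt;Here is a small numeric check of that intuition, under assumptions of my own: a zero-mean Gaussian with unit variances, and covariance values picked by hand so that neighbours correlate at 0.8 and the distant pair at 0.8 to the fourth power, which is what a squared-exponential kernel gives at twice the distance.&lt;/p&gt;

```python
import numpy as np

# Hypothetical covariances with unit variances: neighbours (A, B) and
# (B, C) are strongly correlated, the distant pair (A, C) less so.
k_ab, k_bc, k_ac = 0.8, 0.8, 0.4096

cov_observed = np.array([[1.0, k_ab],
                         [k_ab, 1.0]])    # covariance among A and B
cov_c = np.array([k_ac, k_bc])            # covariance of C with A and B
observed = np.array([-1.0, 0.0])          # A is low, B is medium

# Conditional mean of C given A and B, for a zero-mean Gaussian.
weights = np.linalg.solve(cov_observed, cov_c)
mean_c = weights @ observed
```

&lt;p&gt;The weight on &lt;em&gt;A&lt;/em&gt; comes out negative (about -0.64) and the weight on &lt;em&gt;B&lt;/em&gt; exceeds 1 (about 1.31), so a low &lt;em&gt;A&lt;/em&gt; pushes the prediction for &lt;em&gt;C&lt;/em&gt; up to about 0.64, above both observations. With &lt;em&gt;B&lt;/em&gt; alone observed at 0, the prediction for &lt;em&gt;C&lt;/em&gt; would simply be 0.&lt;/p&gt;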

&lt;p style=&quot;margin-bottom:30px;&quot;&gt;This is a good thing, usually.&lt;/p&gt;

&lt;h2 id=&quot;why-extrapolation-sometimes-causes-goofy-predictions&quot;&gt;Why extrapolation sometimes causes goofy predictions&lt;/h2&gt;

&lt;p&gt;Let’s zoom in on the section with the goofy prediction.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-03-28-zoom-small.svg&quot; alt=&quot;The same plot as the first figure, with an indicator overlaying it showing the region we are zooming into, the space around the orange point. Below this top plot is the zoomed plot, which shows only five points: 4 black points, and the high orange point. Together, they make a bit of an arch shape.&quot; /&gt;&lt;/p&gt;
&lt;p style=&quot;margin-top:-20px;&quot;&gt;
&lt;em&gt;&lt;strong&gt;Figure 3:&lt;/strong&gt; Zooming into the previous chart, we can see that the high prediction can be viewed locally as an extrapolation. Moreover, it is extrapolating from both left and right, which explains why the predicted value is extra high.&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;The GP is essentially making predictions using only local data. It is doing this because fitting the GP on this dataset caused the GP to treat everything extremely locally, i.e. it selected a very small “lengthscale”. It chose a small lengthscale because the function contains discrete drop-off points. Tiny changes in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; sometimes yield large changes in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f(x)&lt;/code&gt;, and the GP is trying to accommodate this.&lt;/p&gt;

&lt;p style=&quot;margin-bottom:30px;&quot;&gt;This leads to something that is undesirable in hyperparameter tuning: if ever you discover a sudden drop-off in performance from mediocre results to very-bad results, the GP will get excited that maybe there are very-good results immediately on the opposite side of the mediocre results. In my experience, it ends up spending nearly all of its time testing these crazy theories.&lt;/p&gt;

&lt;h2 id=&quot;what-does-this-mean-what-can-i-do&quot;&gt;What does this mean? What can I do?&lt;/h2&gt;

&lt;p&gt;I’m not yet sure what the best way to handle this is.&lt;/p&gt;

&lt;p&gt;One solution is to switch to using a Matern kernel with \(\nu = 0.5\), rather than the often-used value \(\nu = 2.5\). This specifies that the underlying function is not differentiable, allowing for sharp changes in direction on predictions.&lt;/p&gt;
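&lt;p&gt;To make the difference concrete, here is a minimal sketch that writes out the standard closed forms of the two Matern kernels and estimates each one’s slope at zero distance. The function names and the unit lengthscale are illustrative assumptions, not code from the experiments in this post.&lt;/p&gt;

```python
import math


def matern_half(r, lengthscale=1.0):
    """Matern kernel with nu = 0.5 (the exponential kernel)."""
    return math.exp(-abs(r) / lengthscale)


def matern_five_halves(r, lengthscale=1.0):
    """Matern kernel with nu = 2.5."""
    s = math.sqrt(5.0) * abs(r) / lengthscale
    return (1.0 + s + s * s / 3.0) * math.exp(-s)


# One-sided slope of each kernel at r = 0, estimated by a finite difference.
h = 1e-6
slope_half = (matern_half(h) - matern_half(0.0)) / h
slope_five_halves = (matern_five_halves(h) - matern_five_halves(0.0)) / h
print(slope_half, slope_five_halves)
```

&lt;p&gt;The \(\nu = 0.5\) kernel leaves zero distance with a slope of roughly -1, i.e. it has a kink there, while the \(\nu = 2.5\) kernel is flat at zero. That kink is what lets predictions change direction sharply at an observation.&lt;/p&gt;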

&lt;p&gt;&lt;img src=&quot;/images/2023-03-28-matern-0.5.svg&quot; alt=&quot;A plot where the predictions are much more aligned with the observations&quot; /&gt;
&lt;em&gt;&lt;strong&gt;Figure 4:&lt;/strong&gt; Modifying the Matern kernel allows for sharp changes in the prediction’s slope.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here’s a &lt;a href=&quot;https://andrewcharlesjones.github.io/journal/matern-kernels.html&quot;&gt;nice blog post&lt;/a&gt; on that subject. However, making this change runs the risk of hurting predictions – after all, these sudden dropoffs only occur in some parts of the space, and often those parts of the space are irrelevant.&lt;/p&gt;

&lt;p&gt;It might be worth adopting the following view: deterministic functions with sudden discrete changes in output are &lt;strong&gt;bad news&lt;/strong&gt; for a Gaussian Process. A broad solution is: figure out how to get rid of them.&lt;/p&gt;

&lt;p&gt;You could do this using clamping / winsorizing tricks on your data; imagine taking the whole bottom row of points from Figure 1 and clamping them to be equal to the middle row.&lt;/p&gt;
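&lt;p&gt;As a rough sketch of the clamping idea, assuming higher values are better: the observations and the floor below are made up for illustration, and in practice the floor would be chosen by looking at where the “middle row” of your own results sits.&lt;/p&gt;

```python
def clamp_from_below(values, floor):
    """Raise every value below `floor` up to `floor`; leave the rest unchanged."""
    return [max(v, floor) for v in values]


# Hypothetical observations: two good, two mediocre, and two very bad results.
observed = [0.95, 0.93, 0.60, 0.58, 0.10, 0.11]

# Clamp the very bad results so they look no worse than the mediocre ones.
clamped = clamp_from_below(observed, floor=0.58)
print(clamped)
```

&lt;p&gt;After clamping, the GP no longer sees a cliff between the mediocre and very bad results, so it has less reason to pick a tiny lengthscale.&lt;/p&gt;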

&lt;p&gt;Alternatively, you could change your function so that it is not deterministic, for example by randomly initializing parameters or introducing something analogous to stochastic gradient descent. Then, presumably, these dropoffs will occur randomly across a whole range, not at specific points, and the GP will capture this as observation noise. Moving away from a deterministic function has other benefits anyway, e.g. greatly reducing the risk of overfitting your hyperparameters to the validation set.&lt;/p&gt;

&lt;p style=&quot;margin-bottom:30px;&quot;&gt;One takeaway is that you really need to pay attention to what your Gaussian Process is doing. It will not always automatically do what you consider intuitive.&lt;/p&gt;

&lt;h2 id=&quot;appendix&quot;&gt;Appendix&lt;/h2&gt;

&lt;p&gt;In case you’re interested, here are the GP predictions at other points, with standard deviations:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2023-03-28-full-gp-small2.svg&quot; alt=&quot;The same plot as figure 1, now with a continuous prediction line and standard deviations&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To get more rigorous with the simple A, B, C example, suppose we assign constant prior mean 0, as is common in Gaussian Processes. Denoting \(\Sigma_{XY} = \mathrm{cov}(X,Y)\), the expected value for C is:&lt;/p&gt;

\[E\left[C \mid A=a, B=b\right] = \frac{a(\Sigma_{AC}\Sigma_{BB} - \Sigma_{AB}\Sigma_{BC}) + b(\Sigma_{BC}\Sigma_{AA} - \Sigma_{BA}\Sigma_{AC})}{\Sigma_{AA}\Sigma_{BB} - \Sigma_{AB}^2}\]

&lt;p&gt;This can be derived from &lt;a href=&quot;http://krasserm.github.io/2018/03/19/gaussian-processes/&quot;&gt;these equations&lt;/a&gt;, specifically the one for \(\boldsymbol \mu_*\).&lt;/p&gt;

&lt;p&gt;It is difficult to understand this function at a glance, but it is interesting to simply ask, “What happens as \(a\) increases?” We see that: if the correlative chain \(A \to B \to C\) is strong, but the direct correlation between \(A\) and \(C\) is weak, i.e. \(\Sigma_{AB}\Sigma_{BC} &amp;gt; \Sigma_{AC}\Sigma_{BB}\), then increasing \(a\) will decrease \(E[C \mid ...]\). Thus, reducing \(a\) in isolation will increase \(E[C \mid ...]\), which is an example of extrapolation.&lt;/p&gt;
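&lt;p&gt;A quick numerical check of that claim, as a sketch: the covariance values below are hypothetical, chosen only so that A–B and B–C are strongly correlated and A–C is weakly correlated. The function simply evaluates the closed-form expression above.&lt;/p&gt;

```python
def expected_c(a, b, cov):
    """Closed-form E[C | A=a, B=b] for a zero-mean Gaussian over (A, B, C)."""
    s_aa, s_ab, s_ac = cov[0]
    _, s_bb, s_bc = cov[1]
    numerator = a * (s_ac * s_bb - s_ab * s_bc) + b * (s_bc * s_aa - s_ab * s_ac)
    return numerator / (s_aa * s_bb - s_ab ** 2)


# Hypothetical covariance matrix, rows and columns ordered (A, B, C).
cov = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.8],
    [0.4, 0.8, 1.0],
]

# Hold the cyclist score fixed and vary the swimmer score.
for a in (1.0, 0.0, -1.0):
    print(a, expected_c(a, 0.5, cov))
```

&lt;p&gt;With these numbers, the expected running score rises as the swimming score falls, which is the “compensation” effect described in the athlete example.&lt;/p&gt;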

&lt;p&gt;&lt;em&gt;(This work was supported by &lt;a href=&quot;https://mlcollective.org&quot;&gt;ML Collective&lt;/a&gt; via their &lt;a href=&quot;https://mlcollective.org/wiki/ask-mlc-compute-assistance/&quot;&gt;donated GCP compute resources&lt;/a&gt; which have been super helpful. Thanks, also, to Rosanne Liu for useful feedback on early drafts of this post.)&lt;/em&gt;&lt;/p&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;//cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js&quot;&gt;&lt;/script&gt;

</description>
        <pubDate>Tue, 28 Mar 2023 02:00:00 -0700</pubDate>
        <link>https://probablymarcus.com/blocks/2023/03/28/gp-extrapolation-sometimes-goofy.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2023/03/28/gp-extrapolation-sometimes-goofy.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
      <item>
        <title>Maybe Bayesian Optimization Should Be Harder, Not Easier</title>
        <description>&lt;p&gt;&lt;em&gt;(2023-11-01 updates: Refreshed the charts, other small tweaks, posted &lt;a href=&quot;https://github.com/outergroup/outer-loop-cookbook/tree/main/mnist_project&quot;&gt;reproducible experiments&lt;/a&gt;.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This post has two parts:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;A fun point of view that I find compelling&lt;/li&gt;
  &lt;li&gt;A project that follows from that point of view, with some early results&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;point-of-view&quot;&gt;Point of view&lt;/h2&gt;

&lt;p&gt;When you tune an AI model’s design and its training regime, you are exploring a search space. Bayesian Optimization is a framework that naturally arises when you try to automate your intuitive process of exploring a search space. The promise of Bayesian Optimization is: &lt;em&gt;“If I can tell a computer my beliefs about this search space, then the computer can perform the search better than I can.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;From what I’ve seen, AI / ML people tend to feel burned by their experiences with Bayesian Optimization. Often I hear stories where they once invested a bunch of time into trying it on toy projects, but it never managed to cross over and be useful in their actual work, where they continue to use manual search and unsophisticated brute-force searches. Often they feel guilty about all the time they wasted trying Bayesian Optimization.&lt;/p&gt;

&lt;p&gt;Here is a slightly contrarian view on why Bayesian Optimization has not yet become ubiquitous.&lt;/p&gt;

&lt;p&gt;Lots of people have tried to increase adoption of Bayesian Optimization by making it easier, creating products and libraries that hide the messy details. My hunch is that existing Bayesian Optimization solutions block / discourage the user from taking ownership of the process. In particular, there is a lot of unrealized potential in giving users more powerful ways to state their beliefs about a search space. I think what is needed is a set of tools / documentation / cookbooks designed for people who are willing to put in the time to master their tools. Existing tools could grow to fill this role.&lt;/p&gt;

&lt;p&gt;I think Bayesian Optimization ought to be less one-size-fits-all. I think it should be more &lt;em&gt;hands-on&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&quot;taking-ownership-of-your-search&quot;&gt;Taking ownership of your search&lt;/h3&gt;

&lt;p&gt;If you use a one-size-fits-all Bayesian model, your search will be data-intensive and will often be much less efficient than a manual search. Results will be much better if you invest some time telling the model your priors about your parameters. This is more fun than you might expect.&lt;/p&gt;

&lt;p&gt;Consider: when you manually explore a search space, you have some intuition about which details of the model are “orthogonal”. For example, hyperparameters describing a neural network’s architecture seem like they are roughly independent of hyperparameters that describe the training regime, and we can afford to optimize these groups of parameters almost independently. Meanwhile, hyperparameters like &lt;em&gt;“momentum”&lt;/em&gt; and &lt;em&gt;“learning rate”&lt;/em&gt; are intertwined, and it is important to explore their joint space. A one-size-fits-all Bayesian Optimization model totally lacks this intuition. Fortunately, it is easy to input these beliefs to a Gaussian Process (GP). If you have two groups of parameters that are independent, simply dedicate a kernel to each one, then add the kernels together. If they are dependent, multiply the kernels together. If it’s somewhere in between, do both, and allow the model to adapt to the data. If you run your own Bayesian Optimization code locally, this is all trivial to implement, but it is impossible in any Bayesian Optimization products that I’ve seen. (Some products use a fully AutoML approach to infer these dependencies, but this is a data-intensive process.)&lt;/p&gt;
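&lt;p&gt;Here is a toy sketch of those three compositions in plain Python. It uses a squared-exponential kernel on two scalar features for brevity (not the Matern kernels used in my experiments), and the function names and fixed weights are illustrative assumptions; in a real GP the weights would be fitted to the data.&lt;/p&gt;

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel on a single scalar feature."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def additive_kernel(p, q):
    """Sum of per-group kernels: treats the two groups as independent."""
    return rbf(p[0], q[0]) + rbf(p[1], q[1])


def product_kernel(p, q):
    """Product of per-group kernels: similar only if both groups are similar."""
    return rbf(p[0], q[0]) * rbf(p[1], q[1])


def mixed_kernel(p, q, w_add=0.5, w_mul=0.5):
    """Weighted mix of both; the weights here are fixed for illustration."""
    return w_add * additive_kernel(p, q) + w_mul * product_kernel(p, q)


# Two hypothetical configs: same first parameter, very different second one.
p, q = (0.3, 0.0), (0.3, 5.0)
print(additive_kernel(p, q), product_kernel(p, q), mixed_kernel(p, q))
```

&lt;p&gt;The additive kernel still considers these two configurations related, because they agree on the first parameter, so an observation at one informs predictions at the other. The product kernel considers them nearly unrelated, because it requires agreement on both.&lt;/p&gt;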

&lt;p&gt;In owning your search, another important degree of freedom is choosing useful features for the GP kernel. Your hyperparameters &lt;a href=&quot;/blocks/2022/06/26/gaussian-processes-basis-dependent.html&quot;&gt;might not be a good basis for a GP&lt;/a&gt;. You might want to configure your GP to transform the hyperparameter vector into a basis that enables the GP to make better predictions with fewer data points, maybe simply by adding a few redundant features to the hyperparameter vectors that &lt;em&gt;may&lt;/em&gt; prove useful. This insight can be combined with the “orthogonal” trick above to greatly reduce the effective size of the search space.&lt;/p&gt;

&lt;p&gt;Opportunities also lie in having fine-grained control over the optimization loop. For example, I have found it useful to create a &lt;a href=&quot;/outerloop/bayesian-ticktock-loss-cost.html&quot;&gt;“tick-tock” optimization loop that alternates between multiple objectives&lt;/a&gt; (accuracy and training time).&lt;/p&gt;

&lt;h2 id=&quot;the-project-is-hands-on-bayesian-optimization-worthwhile&quot;&gt;The project: Is hands-on Bayesian Optimization worthwhile?&lt;/h2&gt;

&lt;p&gt;This question has multiple facets, each involving wearing a different hat:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Is it usable? What is the right user interface? What is the overall on-ramp / learning process for a person motivated to learn hands-on Bayesian Optimization?&lt;/li&gt;
  &lt;li&gt;Is it fast? I am proposing we use more complicated kernels. Is that tractable, from a performance standpoint?&lt;/li&gt;
  &lt;li&gt;Does it work? Are there considerable benefits in hands-on Bayesian Optimization, relative to a one-size-fits-all approach? Is it much better than random search?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;is-it-usable&quot;&gt;Is it usable?&lt;/h3&gt;

&lt;p&gt;There is a user interface problem here that needs solving. The first UI to build is a first-class programming interface. Later we can consider whether a graphical UI is needed.&lt;/p&gt;

&lt;p&gt;I draw inspiration from &lt;a href=&quot;https://d3js.org&quot;&gt;D3&lt;/a&gt;, a library for data visualization. In a world of people trying to build “easy” visualization tools, D3 took a different approach. D3 doesn’t generate visualizations; D3 helps you program your own visualizations. Rather than exposing some magic function that outputs a customized pre-built visualization, D3 provides a useful set of primitives, then it gets out of the way and lets you write code. It takes time to learn, but once you have learned it, you are powerful. It’s usually a tortoise-and-the-hare phenomenon; the person who chooses D3 ends up winning the race (and winning with style).&lt;/p&gt;

&lt;p&gt;I think &lt;a href=&quot;http://botorch.org&quot;&gt;BoTorch&lt;/a&gt; (with &lt;a href=&quot;https://gpytorch.ai&quot;&gt;gpytorch&lt;/a&gt;) is a good start at filling this role for Bayesian Optimization, but it’s not finished. It provides the core algorithmic tools of Bayesian Optimization, but it doesn’t provide a way to describe a search space, for example. The botorch tutorials essentially recommend that you don’t use botorch directly and instead use a wrapper like &lt;a href=&quot;https://ax.dev&quot;&gt;Ax&lt;/a&gt;. But Ax is more of a one-size-fits-all framework, and it adds a lot of unnecessary friction to hands-on Bayesian Optimization. I would rather see botorch grow outward into a usable library, giving you all the pieces you need to write your own Bayesian Optimization. They could aim to be the D3 of Bayesian Optimization. I found myself needing to “finish” botorch with my own makeshift library that adds search spaces with conditional parameters, an expanded set of input transform utilities, and random search utilities.&lt;/p&gt;

&lt;h3 id=&quot;is-it-fast&quot;&gt;Is it fast?&lt;/h3&gt;

&lt;p&gt;I am proposing that we take one-size-fits-all GP kernels and replace them with customized compositions of kernels. I was worried that these compositions of kernels would be slow, and, yes, by default, they are. But if you write them efficiently, they are very fast. I ended up writing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WeightedSPSMaternKernel&lt;/code&gt;, i.e. a “weighted sum of products of sums Matern kernel” which runs many kernels in parallel. This satisfied my needs.&lt;/p&gt;

&lt;p&gt;Otherwise, I’m also proposing we start performing transformations on feature vectors. This is already a supported use case and is fast.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Optional reading:&lt;/em&gt; In case you want details, here is my code that describes a 22-dimensional hyperparameter space and how it is transformed into a search space (for random search / optimization) and then into a feature space (for the GP). &lt;a href=&quot;https://github.com/outergroup/vexpr&quot;&gt;Vexpr&lt;/a&gt; and &lt;a href=&quot;https://github.com/outergroup/outerloop&quot;&gt;outerloop&lt;/a&gt; are libraries I created. Vexpr makes it possible to write fast, readable compositional GP kernels, and outerloop attempts to “finish” botorch.&lt;/p&gt;

&lt;script&gt;
function toggle_code_display() {
    let checkbox = document.getElementById(&quot;expand-code-checkbox&quot;),
        node = document.getElementById(&quot;search-space-code&quot;);

    node.style.maxHeight = checkbox.checked ? null : &quot;160px&quot;;
};
&lt;/script&gt;

&lt;div style=&quot;text-align:right; position:relative; top:10px; right:10px; height: 0;&quot;&gt;
&lt;input id=&quot;expand-code-checkbox&quot; type=&quot;checkbox&quot; name=&quot;expand_code&quot; onclick=&quot;toggle_code_display();&quot; /&gt; &lt;label for=&quot;expand-code-checkbox&quot;&gt;Expand code view&lt;/label&gt;
&lt;/div&gt;

&lt;div id=&quot;search-space-code&quot; style=&quot;overflow:scroll; margin-bottom:20px; max-height:172px;&quot;&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;botorch&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gpytorch&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outerloop&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outerloop.vexpr.torch&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ovt&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vexpr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vexpr.torch&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vexpr.custom.torch&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vctorch&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;parameter_space&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Choice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;optimizer&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;adam&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sgd&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Choice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;nesterov&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;choices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;choices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;optimizer&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sgd&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;epochs&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;batch_size&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4096&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3e-1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3e-1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv3_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3e-1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;dense1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3e-1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;dense2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3e-1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_initial_lr_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;80&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_final_lr_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_pct_warmup&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.01&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_max_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.01&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;20.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_max_momentum&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.9999&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;choices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;choices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;optimizer&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sgd&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Scalar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_min_momentum_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
              &lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;choices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;choices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;optimizer&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sgd&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv1_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv2_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv3_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;dense1_units&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Transform to log-space
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xform&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;ToScalarSpace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;space&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;log&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;epochs&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_epochs&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;batch_size&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_batch_size&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_initial_lr_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_initial_lr_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_final_lr_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_final_lr_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_max_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_max_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_pct_warmup&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_pct_warmup&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_momentum_max_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_momentum_max_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_momentum_min_damping_factor_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_momentum_min_damping_factor_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_beta1_max_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_beta1_max_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1cycle_beta1_min_damping_factor_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_beta1_min_damping_factor_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;beta2_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_beta2_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv3_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;dense1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;dense2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv1_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv2_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;conv3_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;dense1_units&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_units&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;


&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;VexprHandsOnLossModel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;botorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;models&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SingleTaskGP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...):&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# Transforms for the kernel
&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;xforms&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;# The mean of logs is the log of the geometric mean; subtracting it&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;# leaves each size as a log-ratio relative to that geometric mean.&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;append_mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_units&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_gmean_channels_and_units&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;subtract&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_channels_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_channels_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_channels_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_units&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_units_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_gmean_channels_and_units&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_initial_lr_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_initial_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_final_lr_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_final_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_max_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_momentum_min_damping_factor_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_momentum_min_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_momentum_max_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_beta1_min_damping_factor_pct&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_beta1_min_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_beta1_max_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;append_mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_gmean_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;subtract&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense2_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense2_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_gmean_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;partial&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;transforms&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ChoiceNHotProjection&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;out_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;N_HOT_PREFIX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# ...
&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;make_handson_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;space&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;batch_shape&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()):&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;
    This kernel groups parameters into orthogonal groups, while still
    allowing the model to learn to use the joint space.
    &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;zero_one_exclusive&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;partial&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gpytorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;constraints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Interval&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                 &lt;span class=&quot;mf&quot;&gt;1e-6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                 &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1e-6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;State&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;batch_shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;ialloc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;IndexAllocator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;lengthscale&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;lengthscale&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;x1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;x1&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;x2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;x2&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;index_for_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;enumerate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;space&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;scalar_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ls_indices&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ialloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;indices&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;index_for_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ovt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;matern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;cdist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;indices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lengthscale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ls_indices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                         &lt;span class=&quot;n&quot;&gt;x2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;indices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lengthscale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ls_indices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                         &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nu&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;2.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;choice_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ls_indices&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ialloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;indices&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;index_for_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ovt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;matern&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;cdist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;indices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lengthscale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ls_indices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                         &lt;span class=&quot;n&quot;&gt;x2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[...,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;indices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lengthscale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ls_indices&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                         &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nu&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;2.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;


    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;scalar_factorized_and_joint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;suffix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;w_additive&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;w_additive&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;suffix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;alpha_factorized_or_joint&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;alpha_factorized_or_joint&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;
                                              &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;suffix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;w_additive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),),&lt;/span&gt;
                       &lt;span class=&quot;nf&quot;&gt;zero_one_exclusive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
                       &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;priors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;DirichletPrior&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;torch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;full&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),),&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;2.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alpha_factorized_or_joint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
                       &lt;span class=&quot;nf&quot;&gt;zero_one_exclusive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
                       &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;priors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;BetaPrior&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;4.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;vctorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;heads_tails&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alpha_factorized_or_joint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;stack&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;w_additive&lt;/span&gt;
                    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;stack&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;scalar_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
                                    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                                   &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                &lt;span class=&quot;nf&quot;&gt;scalar_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;regime_kernels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;# kernel: regime choice parameters
&lt;/span&gt;            &lt;span class=&quot;nf&quot;&gt;choice_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;N_HOT_PREFIX&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;
                           &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]),&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;# kernel: lr schedule
&lt;/span&gt;            &lt;span class=&quot;nf&quot;&gt;scalar_factorized_and_joint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_initial_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_final_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_max_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_pct_warmup&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;_lr&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;# kernel: momentum schedule
&lt;/span&gt;            &lt;span class=&quot;nf&quot;&gt;scalar_factorized_and_joint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_momentum_max_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_momentum_min_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_beta1_max_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_1cycle_beta1_min_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_beta2_damping_factor&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;_momentum&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;# kernel: relative weight decay
&lt;/span&gt;            &lt;span class=&quot;nf&quot;&gt;scalar_factorized_and_joint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                 &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense2_wd_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;_wd&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;regime_joint_names&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_epochs&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_batch_size&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                          &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_gmean_weight_decay&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;architecture_kernels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;# kernel: relative channel and unit counts
&lt;/span&gt;            &lt;span class=&quot;nf&quot;&gt;scalar_factorized_and_joint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv1_channels_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                         &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv2_channels_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                         &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_conv3_channels_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                         &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_dense1_units_div_gmean&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                                        &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;_units_channels&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;architecture_joint_names&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;log_gmean_channels_and_units&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;regime_kernel&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vctorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fast_prod_positive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;stack&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(([&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;scalar_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regime_joint_names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
                      &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;regime_kernels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()),&lt;/span&gt;
                     &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;architecture_kernel&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vctorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fast_prod_positive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;stack&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(([&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;scalar_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;architecture_joint_names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
                      &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;architecture_kernels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()),&lt;/span&gt;
                     &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;joint_kernel&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vctorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fast_prod_positive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;stack&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(([&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;scalar_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regime_joint_names&lt;/span&gt;
                                     &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;architecture_joint_names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;
                      &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;regime_kernels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
                      &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;architecture_kernels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()),&lt;/span&gt;
                     &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;alpha_regime_vs_architecture&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;alpha_regime_vs_architecture&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;alpha_factorized_vs_joint&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;alpha_factorized_vs_joint&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;scale&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alpha_regime_vs_architecture&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
                   &lt;span class=&quot;nf&quot;&gt;zero_one_exclusive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;priors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;BetaPrior&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;2.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;2.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alpha_factorized_vs_joint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
                   &lt;span class=&quot;nf&quot;&gt;zero_one_exclusive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;ol&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;priors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;BetaPrior&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;4.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;gpytorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;constraints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;GreaterThan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1e-4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;gpytorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;priors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;GammaPrior&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;2.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scale&lt;/span&gt;
              &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vctorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;heads_tails&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alpha_factorized_vs_joint&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                           &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;stack&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                               &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                                   &lt;span class=&quot;n&quot;&gt;vctorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;heads_tails&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                                       &lt;span class=&quot;n&quot;&gt;alpha_regime_vs_architecture&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                                   &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vtorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;stack&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;
                                       &lt;span class=&quot;n&quot;&gt;regime_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                                       &lt;span class=&quot;n&quot;&gt;architecture_kernel&lt;/span&gt;
                                   &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                                   &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                                &lt;span class=&quot;n&quot;&gt;joint_kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
                               &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                           &lt;span class=&quot;n&quot;&gt;dim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;allocate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lengthscale&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ialloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;gpytorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;constraints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;GreaterThan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1e-4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;gpytorch&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;priors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;GammaPrior&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;3.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;6.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kernel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;modules&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;/div&gt;

&lt;p&gt;You can read the rest of the code &lt;a href=&quot;https://github.com/outergroup/outer-loop-cookbook/tree/main/mnist_project&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;does-it-work&quot;&gt;Does it work?&lt;/h3&gt;

&lt;p&gt;My initial results are promising, but my baseline (the default BoTorch model) has some issues that might need to be ironed out.&lt;/p&gt;

&lt;p&gt;I ran 160 semi-random MNIST LeNet training experiments on a 22-dimensional hyperparameter space. I fed the results to a Gaussian Process and checked its predictive power on held-out experiments using leave-one-out cross-validation.&lt;/p&gt;
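
&lt;p&gt;To make the evaluation procedure concrete, here is a minimal sketch of leave-one-out cross-validation. It assumes a toy zero-mean Gaussian Process with a fixed RBF kernel written in NumPy, not the BoTorch and GPyTorch models used in this project, and every function name and the synthetic data are hypothetical. Each experiment is held out once, the model is conditioned on the rest, and we record the log probability density of the true output along with the absolute prediction error.&lt;/p&gt;

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Predictive mean and variance of a zero-mean GP with fixed hyperparameters."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    mean = k_st @ np.linalg.solve(k_tt, y_train)
    reduction = np.sum(k_st * np.linalg.solve(k_tt, k_st.T).T, axis=-1)
    var = 1.0 + noise - reduction  # the RBF kernel has prior variance 1
    return mean, var


def leave_one_out(x, y):
    """Hold out each experiment once; return log densities and absolute errors."""
    log_probs, abs_errors = [], []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i  # every experiment except the i-th
        mean, var = gp_predict(x[keep], y[keep], x[i:i + 1])
        mean, var = mean[0], var[0]
        # Log density of the held-out output under the Gaussian prediction.
        log_probs.append(-0.5 * np.log(2 * np.pi * var)
                         - 0.5 * (y[i] - mean) ** 2 / var)
        abs_errors.append(abs(y[i] - mean))
    return np.array(log_probs), np.array(abs_errors)


# Synthetic stand-in for (configuration, result) pairs; not the MNIST data.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(20, 2))
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=20)

log_probs, abs_errors = leave_one_out(x, y)
mean_log_prob = log_probs.mean()
mean_abs_error = abs_errors.mean()
```

&lt;p&gt;In this sketch the hyperparameters stay fixed across folds. A fuller version would refit the kernel parameters on each training subset before predicting the held-out point.&lt;/p&gt;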

&lt;p&gt;For a baseline one-size-fits-all model, I was hoping to just use BoTorch’s default &lt;a href=&quot;https://github.com/pytorch/botorch/blob/dcb2ba401ccc246e97c8aac48a7c7792f1ce3621/botorch/models/gp_regression_mixed.py&quot;&gt;MixedSingleTaskGP&lt;/a&gt; off the shelf, but it does not use priors on its parameters and is hence prone to overfitting and to assigning extremely low log probabilities to held-out results. I &lt;a href=&quot;https://github.com/outergroup/outer-loop-cookbook/blob/main/mnist_project/src/gp/botorch_mixed_gp.py&quot;&gt;added&lt;/a&gt; priors from one of their &lt;a href=&quot;https://github.com/pytorch/botorch/blob/dcb2ba401ccc246e97c8aac48a7c7792f1ce3621/botorch/models/gp_regression.py&quot;&gt;other models&lt;/a&gt;, and this partially solved the issue.&lt;/p&gt;

&lt;p&gt;For the hands-on model, I designed a kernel off the top of my head, trying to capture some of my intuition about my hyperparameters. This was without any iteration: I just guessed at what kernel might work, chose priors that seemed sensible, and now I’m sharing my first results.&lt;/p&gt;

&lt;p&gt;As a sanity check, here is the raw cross-validation data for 60 of the 160 experiments. To generate each point, the model is trained on 59 configurations and their outputs, then must predict the 60th output given its configuration. (Click or zoom to see details.)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/2023-11-01-raw-data.svg&quot;&gt;&lt;img src=&quot;/images/2023-11-01-raw-data.svg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are the results in aggregate, using two different success metrics, and plotted for different subsets of the 160 experiments. I gathered this on 50 different shuffles of the data and I plot the mean.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/images/2023-11-01-aggregate.svg&quot;&gt;&lt;img src=&quot;/images/2023-11-01-aggregate.svg&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On average, for small-to-medium datasets the “hands-on” model assigned higher probability density to the “correct” output for the held-out data. (It is important to use the geometric mean, i.e. take the mean of the &lt;em&gt;log&lt;/em&gt; probability, so that we are sure to penalize very low probabilities.) When using the model in MLE mode (maximum likelihood estimator) to output a single prediction, the hands-on model on average has lower prediction error.&lt;/p&gt;
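
&lt;p&gt;A small worked example of why the geometric mean matters, using made-up densities rather than numbers from these experiments: suppose a model assigns reasonable density to two held-out results and is confidently wrong about a third.&lt;/p&gt;

```python
import math

# Hypothetical predictive densities for three held-out results.
# The third is a confident miss.
densities = [0.9, 0.8, 1e-12]

# Arithmetic mean of the densities.
arithmetic_mean = sum(densities) / len(densities)

# Geometric mean: average the log densities, then exponentiate.
mean_log_prob = sum(math.log(p) for p in densities) / len(densities)
geometric_mean = math.exp(mean_log_prob)
```

&lt;p&gt;Here the arithmetic mean stays above 0.5, while the geometric mean falls below 0.001. Averaging in log space lets a single badly overconfident prediction dominate the score, which is the behavior we want when checking a model’s uncertainty estimates.&lt;/p&gt;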

&lt;p&gt;Toward the end, the hands-on model falls behind the baseline. The MLE predictions aren’t obviously worse, but the mean log probability is, suggesting that the hands-on model gives more confident predictions than it ought to. Thus, there is room for improvement. Still, it is promising that my first try was this good. I suspect a little bit of iteration on the kernel would lead to one that is superior for all dataset sizes.&lt;/p&gt;

&lt;p&gt;Early in the chart, the baseline displays a strange phenomenon where it fails to improve as it receives more data. It is worth doing some due diligence here and trying to understand what is going on. Maybe the baseline has a low-hanging possible improvement, and maybe that improvement could also be brought to the hands-on model.&lt;/p&gt;

&lt;p&gt;My conclusions from this experiment are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The initial results are promising, but not conclusive.&lt;/li&gt;
  &lt;li&gt;The weird baseline phenomenon illustrates my point that you should be careful about treating Bayesian Optimization as a black box. I suspect Bayesian Optimization is only worthwhile if we figure out how to make users take ownership of their search.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;(This project is supported by a GCP cloud compute grant from &lt;a href=&quot;https://mlcollective.org/wiki/ask-mlc-compute-assistance/&quot;&gt;ML Collective&lt;/a&gt;, which has been super helpful.)&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Wed, 30 Nov 2022 01:00:00 -0800</pubDate>
        <link>https://probablymarcus.com/blocks/2022/11/30/hands-on-bayesian-optimization.html</link>
        <guid isPermaLink="true">https://probablymarcus.com/blocks/2022/11/30/hands-on-bayesian-optimization.html</guid>
        
        
        <category>blocks</category>
        
      </item>
    
  </channel>
</rss>
