One source, three GPUs, and a browser: putting a native UI on WebGPU

The Vel playground at vel.kadarla.com/play is the same engine that draws the native macOS app, compiled to WebAssembly and pointed at the browser's GPU. Not a re-implementation, not a canvas2d fallback, not a screenshot service — the literal C++ widget tree, running in your tab, rendering through WebGPU.

The surprising part isn't that it works. It's how little code the browser target needed, and one deployment property that makes it genuinely cheap to host.

Dawn is the portability layer, so the browser is just another backend

I wrote about the platform seam being two functions. The web is the cleanest demonstration of why that design pays off. Native platforms hand the GPU a window handle; the browser binds the GPU to an HTML canvas by CSS selector. So SurfaceWeb.cpp barely does anything:

#include <GLFW/glfw3.h>  // GLFWwindow

// Web (Emscripten) surface glue. There is no window handle; the surface is
// bound to the "#canvas" element in Surface.cpp. So we only return a non-null
// sentinel so the validity check passes; resize is driven by the JS host.
void* attachNativeSurface(GLFWwindow* window) {
    if (!window) return nullptr;
    return reinterpret_cast<void*>(0x1);  // sentinel: "canvas-backed", never dereferenced
}
// Intentionally a no-op: the JS host owns the canvas size.
void resizeNativeSurface(void*, int, int) {}
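
To make the sentinel trick concrete, here is a minimal sketch of how shared code might consume that seam. The initSurface helper is hypothetical (it is not taken from Vel's source); the only assumption is that callers treat the returned pointer as an opaque token and never dereference it.

```cpp
struct GLFWwindow;  // opaque type, as in GLFW's own headers

// The two-function seam each platform file provides (names from the article).
void* attachNativeSurface(GLFWwindow* window);
void  resizeNativeSurface(void* handle, int width, int height);

// Web flavour: no native handle exists, so hand back a non-null sentinel.
void* attachNativeSurface(GLFWwindow* window) {
    if (!window) return nullptr;
    return reinterpret_cast<void*>(0x1);
}
void resizeNativeSurface(void*, int, int) {}

// Hypothetical shared-code caller. It only checks the handle for validity
// and passes it back through the seam, so a sentinel is as good as a real
// window handle from its point of view.
bool initSurface(GLFWwindow* window, int width, int height) {
    void* handle = attachNativeSurface(window);
    if (!handle) return false;
    resizeNativeSurface(handle, width, height);
    return true;
}
```

Because the caller never looks inside the handle, the web backend can satisfy the contract without owning any native object at all.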

The reason this is enough: I build Lume on Dawn, Google's WebGPU implementation. Natively, Dawn translates my wgpu:: calls to Metal, D3D12, or Vulkan. On the web, Emscripten ships emdawnwebgpu — a port of the exact same webgpu.h API that forwards to the browser's real WebGPU device. So the engine code doesn't change. The WGSL shaders don't change. The instanced-rect pipeline that draws every shape doesn't change. They all compile to WASM and talk to a GPU that happens to live behind the browser instead of behind the kernel.

There's no #ifdef __EMSCRIPTEN__ in the paint code. The web is a backend, not a rewrite — the same way Windows was.

The blocking loop problem, and the header you don't need

A native app loop is allowed to block. Vel's idle path literally parks the thread in glfwWaitEventsTimeout and burns ~0 CPU until an event arrives. You cannot do that on the web: blocking the main thread freezes the tab.

The usual answer is threads: run your loop on a Web Worker and use SharedArrayBuffer to talk to the main thread. But SharedArrayBuffer is the expensive choice, and not for the reason people expect. Since Spectre, browsers only expose it when the page is cross-origin isolated, which means the page (and any frame that wants the same access) must be served with these two headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

Those headers are quietly hostile. require-corp means every cross-origin resource the page loads (fonts, images, analytics scripts) must opt in with a Cross-Origin-Resource-Policy header or be fetched via CORS, and an embedded iframe must send COEP itself, or the browser blocks it. It breaks third-party embeds. It means you can't just drop the build on a static CDN and link it. And it makes the playground hard to embed in another page (like the docs), because the embedding page would have to be cross-origin isolated too.

So I went the other way: single-threaded, with ASYNCIFY. ASYNCIFY is an Emscripten transform that rewrites the WASM so a "blocking" call can actually unwind the stack, yield to the browser's event loop, and resume later. The engine keeps its natural blocking-loop shape in C++; ASYNCIFY makes that cooperate with the event loop instead of freezing it. No worker, no SharedArrayBuffer, and therefore no COOP/COEP headers at all.
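
A minimal sketch of what "blocking shape, cooperative behaviour" can look like. This is not Vel's loop: yieldFor and runLoop are hypothetical names, and the preprocessor guard is there only so the sketch also compiles natively. On the web it assumes the build is linked with -sASYNCIFY, which is what lets emscripten_sleep unwind the stack and return control to the browser.

```cpp
#include <chrono>
#include <thread>
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Pause for roughly `ms` milliseconds without freezing the host.
// Web: emscripten_sleep unwinds the WASM stack, yields to the browser's
// event loop, and resumes here later (requires -sASYNCIFY at link time).
// Native: an ordinary thread sleep stands in for the real event wait.
inline void yieldFor(unsigned ms) {
#ifdef __EMSCRIPTEN__
    emscripten_sleep(ms);
#else
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
#endif
}

// Hypothetical loop with the same blocking shape on every platform.
// `step` does one frame of work and returns false when the app should quit.
template <typename Step>
int runLoop(Step step, unsigned idleMs = 16) {
    int frames = 0;
    while (step()) {
        ++frames;
        yieldFor(idleMs);
    }
    return frames;
}
```

The loop body reads as if it blocks, yet on the web each yieldFor call hands the main thread back to the browser, so input and rendering keep flowing between iterations.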

The payoff is operational: the playground is four static files (index.html, index.js, a ~3 MB index.wasm, and an app.html). It hosts on plain Vercel with no special headers, and it embeds in the docs as an ordinary <iframe>. The preview pane you see is a real nested iframe running its own WebGPU device, with source streamed in over postMessage — which is only possible because nothing requires cross-origin isolation.

HiDPI falls out for free

One detail I like: the canvas backing size is set to CSS size × devicePixelRatio by the JS host, and Surface reconfigures the wgpu surface to match. That's the same physical-pixel rule the native text rasterizer uses — so text on the web is snapped to device pixels and stays crisp on Retina, using the identical code path as the desktop app. Cross-platform consistency isn't a goal I chase; it's a consequence of there being one renderer.
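
The physical-pixel rule is small enough to write down. In the playground the JS host does this arithmetic; the sketch below restates it in C++ for illustration only. The names backingSize and snapToDevicePixel are hypothetical, and rounding to the nearest pixel is an assumption about how fractional sizes are handled.

```cpp
#include <cmath>

struct PixelSize { int width; int height; };

// Backing-store size in physical pixels: CSS size x devicePixelRatio,
// rounded to the nearest pixel and clamped to at least 1 so the surface
// is never configured with a zero dimension.
inline PixelSize backingSize(double cssWidth, double cssHeight, double dpr) {
    auto toPx = [dpr](double css) {
        long px = std::lround(css * dpr);
        return static_cast<int>(px < 1 ? 1 : px);
    };
    return { toPx(cssWidth), toPx(cssHeight) };
}

// Snap a CSS-space coordinate to the nearest device pixel boundary,
// returned in CSS units. This is the kind of snapping that keeps glyph
// edges from landing between physical pixels.
inline double snapToDevicePixel(double css, double dpr) {
    return std::round(css * dpr) / dpr;
}
```

For example, an 800 × 600 CSS canvas at a devicePixelRatio of 2 gets a 1600 × 1200 backing store, and a coordinate of 10.3 CSS pixels snaps to 10.5.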

What it costs

ASYNCIFY isn't free. It instruments the binary, which adds size and a small per-call overhead on the functions that can unwind, so you don't want it everywhere; you scope which calls it applies to (Emscripten exposes link settings such as ASYNCIFY_ONLY and ASYNCIFY_REMOVE for this). Single-threaded also means exactly that: no offloading layout or decode to a worker, so a genuinely heavy frame has nowhere to hide. For a UI that idles at ~0 CPU and lays out 10k rows in ~2 ms that's fine; for a compute-heavy app it would be a real ceiling.

And the honest caveat: this is WebGPU, so it needs a recent browser. Chrome and Edge have had it on by default on desktop since 113; Safari shipped it in Safari 26; Firefox support is still partial and varies by platform. A blank canvas almost always means "this browser doesn't have WebGPU enabled," which is a worse failure mode than a 2D fallback would be. I chose fidelity over reach.
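
The blank-canvas failure can at least be detected up front. The sketch below is an illustration, not something the playground is known to do: browserHasWebGPU and webgpuStatusMessage are hypothetical names, and the check only tests whether navigator.gpu exists. A browser can expose navigator.gpu and still fail to return an adapter, so treat this as a first-pass probe.

```cpp
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
// Returns 1 when the page exposes navigator.gpu, 0 otherwise.
EM_JS(int, browserHasWebGPU, (), {
    return (typeof navigator !== 'undefined' && navigator.gpu) ? 1 : 0;
});
#else
// Native builds talk to Dawn directly, so there is nothing to probe.
inline int browserHasWebGPU() { return 1; }
#endif

// Hypothetical helper: text a host page could show instead of leaving
// the canvas blank when the probe fails.
inline const char* webgpuStatusMessage(bool available) {
    return available
        ? "WebGPU available"
        : "This browser does not expose WebGPU (navigator.gpu is missing).";
}
```

Showing that message does not add reach, but it turns a silent blank canvas into an explanation.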

But the thing I set out to prove held up: porting a native GPU UI to the browser was a 20-line surface file and a build-flag decision, not a parallel web codebase. The hard part of "write once, run everywhere" was never the rendering. It was refusing to let the platforms leak into the parts that aren't platform-specific.