catswarm: 1000 GPU-rendered procedural cats on your desktop

How I built a transparent desktop overlay that renders 1000 procedural cats in a single draw call using Rust, wgpu, and DirectComposition.

Posted Feb 9, 2026 Updated Feb 9, 2026

By Trent Sterling

My wife looked at my monitor and said “I want more cats.” So I built a transparent desktop overlay that renders 1000 procedural cats in a single draw call. They chase your mouse, play with each other, form nap clusters, and maintain personal space.

The transparency problem

Making a transparent overlay on Windows is surprisingly gnarly. The typical approach — WS_EX_LAYERED with UpdateLayeredWindow — gives you GDI-based alpha blending, which means no GPU acceleration and no hope of rendering 1000 anything at 60 FPS.

The trick is DirectComposition. Instead of compositing through GDI, you tell wgpu to create a DX12 swapchain via DxgiFromVisual, which plugs directly into the Desktop Window Manager’s composition tree. Combined with PreMultiplied alpha mode, you get true per-pixel transparency with full GPU rendering.

The recipe: No WS_EX_LAYERED. No with_transparent(true). Instead: Dx12SwapchainKind::DxgiFromVisual + CompositeAlphaMode::PreMultiplied + DwmExtendFrameIntoClientArea with all four margins set to -1. Clear to (0,0,0,0) each frame.
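To make the premultiplied part concrete, here is a minimal sketch of what the compositor expects from each pixel: color channels already scaled by alpha. The helper name `premultiply` and the `TRANSPARENT` constant are hypothetical, for illustration only, and are not part of wgpu or the project.

```rust
/// Convert a straight-alpha RGBA color to premultiplied form.
/// `premultiply` is a hypothetical helper, not a wgpu API.
pub fn premultiply(rgba: [f32; 4]) -> [f32; 4] {
    let [r, g, b, a] = rgba;
    // Each color channel is scaled by alpha; alpha itself is unchanged.
    [r * a, g * a, b * a, a]
}

/// A fully transparent pixel in premultiplied form: every channel is zero.
/// This is why the frame is cleared to (0,0,0,0).
pub const TRANSPARENT: [f32; 4] = [0.0, 0.0, 0.0, 0.0];
```

Under this convention a shader that writes straight-alpha colors would likely show edges that look too bright, so the fragment output needs the same scaling.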

I also had to disable DWM’s non-client (NC) rendering, border color, and corner rounding via DwmSetWindowAttribute. Without this, Windows draws a thin white border around the “invisible” window. The final piece: WS_EX_TOOLWINDOW hides the window from the taskbar, and set_cursor_hittest(false) makes it click-through.

One draw call for all cats

Every cat shares a single quad (4 vertices, 6 indices). The cats themselves are differentiated entirely by a 24-byte entry in the instance buffer: position, size, color, and animation frame. The GPU reads one instance per cat and the fragment shader draws the appropriate SDF silhouette.

  
// Per-instance data: 24 bytes
pub struct CatInstance {
    pub position: [f32; 2],  // screen pixels
    pub size: f32,           // scale multiplier
    pub color: u32,          // RGBA packed
    pub frame: u32,          // 0=sitting, 1=walking, 2=sleeping
    pub _pad: u32,
}
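A quick way to keep that 24-byte claim honest is a compile-time size check, plus a helper for the packed color. This is a sketch under assumptions: the `#[repr(C)]` attribute, the `pack_rgba` helper, and the byte order (red in the lowest byte, which is what WGSL's `unpack4x8unorm` would read as `x`) are not confirmed by the project.

```rust
use std::mem::size_of;

/// Copy of the per-instance layout. `#[repr(C)]` is an assumption here;
/// it pins field order so the CPU layout matches the GPU vertex layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CatInstance {
    pub position: [f32; 2], // 8 bytes
    pub size: f32,          // 4 bytes
    pub color: u32,         // 4 bytes
    pub frame: u32,         // 4 bytes
    pub _pad: u32,          // 4 bytes
}

// Fails to compile if the struct ever drifts away from 24 bytes.
const _: () = assert!(size_of::<CatInstance>() == 24);

/// Pack four 8-bit channels into one u32, red in the lowest byte.
/// Hypothetical helper; the project's actual byte order may differ.
pub fn pack_rgba(r: u8, g: u8, b: u8, a: u8) -> u32 {
    u32::from_le_bytes([r, g, b, a])
}
```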

The shader evaluates three different SDF poses — a sitting silhouette with pointed ears, a walking shape with extended legs, and a sleeping loaf. Each is a combination of circles, ellipses, and triangles composed with smooth-min blending. It’s not pixel art, it’s not sprites — it’s pure math running on the GPU.
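The real shapes live in the fragment shader, but the idea is easy to show on the CPU. Below is a minimal Rust sketch of a circle SDF and a polynomial smooth-min, blended into a toy "head plus body" blob. The function names, radii, and blend width are made up for illustration, and the polynomial smooth-min is one common variant, not necessarily the one the shader uses.

```rust
/// Signed distance from `p` to a circle: negative inside, positive outside.
pub fn sd_circle(p: [f32; 2], center: [f32; 2], radius: f32) -> f32 {
    let dx = p[0] - center[0];
    let dy = p[1] - center[1];
    (dx * dx + dy * dy).sqrt() - radius
}

/// Polynomial smooth minimum: behaves like `min` when the two distances
/// differ by more than `k`, and rounds off the seam when they are close.
pub fn smooth_min(a: f32, b: f32, k: f32) -> f32 {
    let h = (k - (a - b).abs()).max(0.0) / k;
    a.min(b) - h * h * k * 0.25
}

/// Toy silhouette: a body circle and a head circle melted together.
/// Shape parameters are hypothetical.
pub fn sd_blob(p: [f32; 2]) -> f32 {
    let body = sd_circle(p, [0.0, 0.0], 0.5);
    let head = sd_circle(p, [0.0, 0.6], 0.3);
    smooth_min(body, head, 0.2)
}
```

A pixel is inside the silhouette wherever the combined distance is negative, so coverage can be derived from the sign (or a narrow band around zero for anti-aliasing).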

ECS: data-oriented cat simulation

Each cat is a hecs entity with components for position, velocity, behavior state, personality, and appearance. The simulation runs at a fixed 60 Hz timestep with interpolated rendering: the accumulator pattern keeps visuals smooth regardless of monitor refresh rate.
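For readers who have not seen the accumulator pattern, here is a minimal sketch. The type `FixedStep`, the 0.25 s stall clamp, and the `lerp` helper are assumptions for illustration, not the project's code.

```rust
/// Fixed simulation step: 60 ticks per second.
pub const DT: f64 = 1.0 / 60.0;

/// Accumulates real frame time and hands it out in fixed-size ticks.
#[derive(Default)]
pub struct FixedStep {
    accumulator: f64,
}

impl FixedStep {
    /// Feed the elapsed frame time in seconds. Returns how many simulation
    /// ticks to run now, and the blend factor (0 to 1) for rendering
    /// between the previous and current simulation state.
    pub fn advance(&mut self, frame_time: f64) -> (u32, f64) {
        // Clamp long stalls so one hitch cannot trigger a burst of ticks.
        self.accumulator += frame_time.min(0.25);
        let mut ticks = 0;
        while self.accumulator >= DT {
            self.accumulator -= DT;
            ticks += 1;
        }
        (ticks, self.accumulator / DT)
    }
}

/// Interpolate a rendered value between two simulation states.
pub fn lerp(prev: f32, curr: f32, alpha: f32) -> f32 {
    prev + (curr - prev) * alpha
}
```

A 40 ms frame would run two ticks and leave a blend factor of roughly 0.4, so the renderer draws each cat 40% of the way from its previous position to its current one.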

Behavior is a simple state machine: Idle, Walking, Running, Sleeping, Grooming, ChasingMouse, ChasingCat, Playing. Transitions are weighted by personality — lazy cats sleep more, energetic cats run more, curious cats chase the mouse more often.
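One way to read "weighted by personality" is a weighted random choice where traits scale the odds of each state. The sketch below assumes that reading. The field names, the weights, and the trimmed-down `Behavior` enum are hypothetical, and `roll` stands in for a uniform random number in [0, 1) from whatever RNG is in use.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Behavior {
    Idle,
    Walking,
    Running,
    Sleeping,
    ChasingMouse,
}

/// Hypothetical personality traits; larger means stronger.
pub struct Personality {
    pub laziness: f32,
    pub energy: f32,
    pub curiosity: f32,
}

/// Pick the next state by weighted choice.
/// `roll` must be a uniform random number in [0, 1).
pub fn next_behavior(p: &Personality, roll: f32) -> Behavior {
    let weights = [
        (Behavior::Idle, 1.0),
        (Behavior::Walking, 1.0),
        (Behavior::Running, p.energy),
        (Behavior::Sleeping, p.laziness),
        (Behavior::ChasingMouse, p.curiosity),
    ];
    let total: f32 = weights.iter().map(|(_, w)| w).sum();
    // Walk the weights until the scaled roll falls inside one of them.
    let mut remaining = roll * total;
    for (state, w) in weights {
        if remaining < w {
            return state;
        }
        remaining -= w;
    }
    Behavior::Idle // fallback for rounding at the upper edge
}
```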

This was the fun part. The spatial hash grid (128px cells, 1024 buckets) gets rebuilt every tick, so a neighbor query only inspects the handful of buckets around a cat instead of all 1000 cats, and each bucket lookup is O(1). But the tricky bit is that hecs doesn’t allow mutable world access during iteration, so you can’t just “look at your neighbor and react.”
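A spatial hash along these lines can be sketched in a few dozen lines. The cell size and bucket count come from the post; everything else (the type and function names, the two hash multipliers, the `Vec<Vec<u32>>` storage) is an assumption for illustration and may not match the project.

```rust
pub const CELL_SIZE: f32 = 128.0;
pub const BUCKETS: usize = 1024; // power of two, so masking replaces modulo

/// Integer grid cell containing a screen position.
pub fn cell_of(pos: [f32; 2]) -> (i32, i32) {
    (
        (pos[0] / CELL_SIZE).floor() as i32,
        (pos[1] / CELL_SIZE).floor() as i32,
    )
}

/// Multiplicative hash of a cell. The multipliers are commonly used
/// large primes, chosen here as an assumption.
pub fn bucket_of(cell: (i32, i32)) -> usize {
    let h = (cell.0 as u32).wrapping_mul(73_856_093)
        ^ (cell.1 as u32).wrapping_mul(19_349_663);
    (h as usize) & (BUCKETS - 1)
}

pub struct SpatialHash {
    buckets: Vec<Vec<u32>>,
}

impl SpatialHash {
    pub fn new() -> Self {
        Self { buckets: vec![Vec::new(); BUCKETS] }
    }

    /// Rebuild from scratch. `clear` keeps each bucket's capacity, so
    /// rebuilds stop allocating once the buckets have grown.
    pub fn rebuild(&mut self, positions: &[[f32; 2]]) {
        for bucket in &mut self.buckets {
            bucket.clear();
        }
        for (i, &p) in positions.iter().enumerate() {
            self.buckets[bucket_of(cell_of(p))].push(i as u32);
        }
    }

    /// Visit candidates in the 3x3 block of cells around `pos`.
    /// Distinct cells can share a bucket, so candidates may be far away
    /// or repeated; callers still need a distance check.
    pub fn for_each_candidate(&self, pos: [f32; 2], mut f: impl FnMut(u32)) {
        let (cx, cy) = cell_of(pos);
        for dy in -1..=1 {
            for dx in -1..=1 {
                for &i in &self.buckets[bucket_of((cx + dx, cy + dy))] {
                    f(i);
                }
            }
        }
    }
}
```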

The snapshot cache pattern

During spatial hash rebuild, I cache a flat Vec<CatSnapshot> alongside the grid. Each snapshot holds entity handle, position, behavior state, and personality: about 40 bytes per cat, or 40KB total for 1000 cats. That is roughly the size of a typical L1 data cache (32 to 48KB) and fits comfortably in L2.

The interaction system runs in three phases:

Steer active — cats already in ChasingCat/Playing states update their velocity toward their target. Two-pass: read positions into a buffer, then write new velocities.
Phase read — iterate all cats, query neighbors from the spatial hash, compute separation forces and social interaction decisions. This phase never touches the ECS world — it reads only from the snapshot cache and writes commands to a buffer.
Phase write — apply separation velocities and execute interaction commands (start play, start chase, flee, join nap). State guards prevent overwriting important states like ChasingMouse.
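The read and write phases above boil down to "decide from an immutable snapshot, then mutate from a command list". Here is a minimal sketch of that split for a single interaction (joining a nap). It uses plain slices instead of hecs, a brute-force scan instead of the spatial hash, and hypothetical type names, so treat it as an illustration of the pattern rather than the project's code.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    Idle,
    Sleeping,
    ChasingMouse,
}

/// Flat, read-only copy of what the interaction logic needs per cat.
#[derive(Clone, Copy)]
pub struct CatSnapshot {
    pub id: usize,
    pub pos: [f32; 2],
    pub state: State,
}

#[derive(Debug, PartialEq)]
pub enum Command {
    JoinNap { cat: usize },
}

/// Read phase: only snapshots in, only commands out. Nothing is mutated.
/// The inner scan stands in for a spatial hash neighbor query.
pub fn read_phase(snaps: &[CatSnapshot], nap_radius: f32, out: &mut Vec<Command>) {
    out.clear(); // reuse the buffer's capacity across ticks
    for me in snaps {
        if me.state != State::Idle {
            continue;
        }
        let near_sleeper = snaps.iter().any(|other| {
            let (dx, dy) = (other.pos[0] - me.pos[0], other.pos[1] - me.pos[1]);
            other.id != me.id
                && other.state == State::Sleeping
                && dx * dx + dy * dy < nap_radius * nap_radius
        });
        if near_sleeper {
            out.push(Command::JoinNap { cat: me.id });
        }
    }
}

/// Write phase: apply commands to the live state, guarding states that
/// must not be overwritten.
pub fn write_phase(states: &mut [State], commands: &[Command]) {
    for cmd in commands {
        match *cmd {
            Command::JoinNap { cat } => {
                if states[cat] != State::ChasingMouse {
                    states[cat] = State::Sleeping;
                }
            }
        }
    }
}
```

The guard in the write phase matters because the live state can differ from the snapshot by the time commands are applied.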

The result: cats that play together, chase and flee from each other, form cozy nap clusters near sleeping neighbors, and push apart when they get too close. All with zero allocations per frame.

Performance

1000 cats, single draw call, ~47 FPS at 4096x2160 on an RTX 5070 Ti. The simulation budget stays under 2ms. Spatial hash rebuild is sub-millisecond. The bottleneck is purely fill rate at 4K — at 1080p it runs well over 60 FPS.

Key decisions that keep it fast:

Fixed-size instance buffer (pre-allocated for 4096 cats, write-only from CPU)
Spatial hash with multiplicative hashing — no tree traversal, no allocations
Snapshot cache eliminates per-neighbor ECS lookups
dist_sq early-out avoids sqrt for most separation checks
Pair deduplication (my_idx < neighbor_idx) halves social interaction evaluations
fastrand::Rng for all randomness — no allocation, no syscalls
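Two of those items, the dist_sq early-out and pair deduplication, fit in one small sketch. The function names and the linear falloff are assumptions, and the returned Vec is only there to keep the example short; the project avoids per-frame allocations.

```rust
/// Push on `a` away from `b`, or None when they are at least `radius` apart.
pub fn separation(a: [f32; 2], b: [f32; 2], radius: f32) -> Option<[f32; 2]> {
    let (dx, dy) = (a[0] - b[0], a[1] - b[1]);
    let dist_sq = dx * dx + dy * dy;
    // Early-out on squared distance: most pairs end here without a sqrt.
    // Coincident points are skipped to avoid dividing by zero.
    if dist_sq >= radius * radius || dist_sq == 0.0 {
        return None;
    }
    let dist = dist_sq.sqrt();
    // Linear falloff (an assumption): 0 at the edge, near 1 at the center.
    let strength = (radius - dist) / radius;
    Some([dx / dist * strength, dy / dist * strength])
}

/// Collect pairs closer than `radius`, visiting each unordered pair once.
/// Brute force here; in practice the candidates would come from the grid.
pub fn close_pairs(positions: &[[f32; 2]], radius: f32) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (my_idx, &a) in positions.iter().enumerate() {
        for (neighbor_idx, &b) in positions.iter().enumerate() {
            // (i, j) and (j, i) are the same interaction; keep only i < j.
            if my_idx < neighbor_idx && separation(a, b, radius).is_some() {
                pairs.push((my_idx, neighbor_idx));
            }
        }
    }
    pairs
}
```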

What’s next

The cats need to become aware of your actual windows — walking on title bars, sitting on the taskbar, jumping between windows. I also want better procedural visuals (animation frames, tail swishing, eye blinks) and a system tray icon for settings.

But honestly? My wife is already happy with the current version. Sometimes the best feature is just making the cats bigger.

Try it

git clone https://github.com/TrentSterling/catswarm
cd catswarm
cargo run --release

Requires Windows 10/11 with a DX12-capable GPU and the Rust toolchain. Press ESC to quit.
