Experiments | Updated Dec 31, 2025

Building a Subjective GIF Optimizer from Scratch with Rust and Gemini

Pixie-Anim is a high-performance Rust engine that redefines GIF optimization by using Gemini 3 Flash as a subjective judge of visual quality. By moving beyond traditional pixel math, this WASM-powered tool achieves superior compression through a novel AI-guided feedback loop, outperforming industry-standard tools in its benchmarks.

I built Pixie-Anim, a zero-dependency, browser-based GIF optimizer entirely in Rust. Unlike traditional tools such as FFmpeg or Gifsicle, which rely purely on "pixel math" algorithms, Pixie uses Gemini 3 Flash as a subjective judge to "see" the image quality, allowing it to beat industry standards in both file size and perceptual quality.

This project was heavily inspired by Lee Robinson's Pixo, which demonstrated that AI coding agents could build low-level, high-performance software. I wanted to take that premise further: could an agent not just write the code, but also judge the output to guide the optimization algorithms?

The result is a custom engine built for roughly $16 in API costs (shared across two projects) that processes GIFs completely client-side via WASM.

Interface of the PIXIE-ANIM tool comparing an original 76.5 MB GIF with an optimized 45.72 MB version, showing a 40.2% file size reduction.

Why Pixie-Anim is Different

While tools like gifsicle and gifski are excellent, I wanted to explore an architecture that optimizes for perceptual quality rather than purely mathematical loss metrics.

Architecture diagram of a media converter pipeline: A Lit-based Frontend extracts video frames into RGBA buffers; a WASM Bridge handles memory allocation; and a Rust Engine processes pixels via K-Means++ quantization, SIMD search, denoising, and LZW encoding to produce an optimized GIF.
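The quantization stage in that pipeline starts with K-Means++ seeding. As a rough illustration, here is a minimal scalar sketch of the seeding idea (function names are hypothetical, and this greedy farthest-point variant replaces the usual weighted random sampling to keep the sketch deterministic; the real engine works in Lab space with SIMD):

```rust
// Squared distance between two color triples.
fn dist2(a: [f32; 3], b: [f32; 3]) -> f32 {
    (a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2) + (a[2] - b[2]).powi(2)
}

// K-Means++-style seeding: pick a first centroid, then repeatedly add the
// point farthest from its nearest existing centroid. (True K-Means++
// samples proportionally to squared distance; this greedy version avoids
// an RNG dependency for the sketch.)
fn kmeans_pp_seed(pixels: &[[f32; 3]], k: usize) -> Vec<[f32; 3]> {
    let mut centroids = vec![pixels[0]];
    while centroids.len() < k {
        let next = pixels
            .iter()
            .max_by(|a, b| {
                let da = centroids.iter().map(|c| dist2(**a, *c)).fold(f32::MAX, f32::min);
                let db = centroids.iter().map(|c| dist2(**b, *c)).fold(f32::MAX, f32::min);
                da.partial_cmp(&db).unwrap()
            })
            .unwrap();
        centroids.push(*next);
    }
    centroids
}

fn main() {
    // Two tight pixel clusters near black and white; two seeds should
    // land in opposite corners of the color cube.
    let pixels = [[0.0f32, 0.0, 0.0], [1.0, 1.0, 1.0], [255.0, 255.0, 255.0], [254.0, 254.0, 254.0]];
    let seeds = kmeans_pp_seed(&pixels, 2);
    println!("{:?}", seeds);
}
```

Spreading the initial centroids this way is what lets the later k-means iterations converge to a palette that covers the image's full gamut instead of collapsing onto the dominant colors.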

  1. Automated Subjective Feedback Loop: By integrating Gemini 3 Flash directly into the benchmark suite, we move beyond "pixel math" (PSNR) to "perceptual reality." The engine iterates based on visual feedback, not just compression ratios.
  2. Planar SIMD Architecture: We bypassed the common pitfalls of interleaved RGB SIMD by moving to a planar layout, allowing for true 8-way parallel color matching in 3D Lab space.
  3. Fuzzy Temporal Denoising: Our unique combination of Zeng Reordering and Fuzzy Delta matching allows us to ignore temporal sensor noise that traditional GIF encoders are forced to encode, leading to massive size savings on high-resolution content.
  4. Algorithmic Control: While we were inspired by Pixo's "from scratch" approach, our motivation was specific: we needed a "glass box" encoder. Writing the core (LZW, K-Means++) ourselves allowed us to expose internal quantization parameters to the AI judge—something impossible if we were wrapping a compiled black box like ffmpeg.
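To make the fuzzy-delta idea from point 3 concrete, here is a minimal scalar sketch (names and the tolerance value are hypothetical; the real engine pairs this with Zeng reordering and SIMD): any channel whose change from the previous frame stays under a tolerance is treated as unchanged, so sensor jitter collapses into long identical runs that the LZW stage compresses well.

```rust
/// For each byte (color channel), keep the previous frame's value when
/// the new value differs by at most `tol`: small deltas are treated as
/// temporal noise rather than real motion.
fn fuzzy_delta(prev: &[u8], curr: &[u8], tol: u8) -> Vec<u8> {
    prev.iter()
        .zip(curr)
        .map(|(&p, &c)| if p.abs_diff(c) <= tol { p } else { c })
        .collect()
}

fn main() {
    let prev = [100u8, 100, 100, 200];
    let curr = [102u8, 99, 100, 240]; // first three jitter, last really changed
    let out = fuzzy_delta(&prev, &curr, 3);
    println!("{:?}", out); // jitter suppressed, real change kept
}
```

The win is indirect: the encoder doesn't store fewer pixels, it stores pixels that repeat, and repetition is exactly what LZW rewards.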

The "AI Judge" Loop

The most significant differentiator in Pixie-Anim is the use of an LLM as a subjective quality gate. Traditional codecs use metrics like PSNR (Peak Signal-to-Noise Ratio) or SSIM (Structural Similarity Index), which often correlate poorly with how humans perceive "glitchiness" or "grain."
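PSNR itself is easy to state, which is part of the problem: it averages squared error uniformly across pixels, so a visually ugly structured artifact can score the same as invisible grain. A minimal sketch for 8-bit buffers:

```rust
/// Peak Signal-to-Noise Ratio (dB) between two equal-length 8-bit
/// buffers. Higher is "better", but every pixel error is weighted
/// equally regardless of how visible it actually is.
fn psnr(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len());
    let mse: f64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as f64 - y as f64;
            d * d
        })
        .sum::<f64>()
        / a.len() as f64;
    if mse == 0.0 {
        f64::INFINITY // identical buffers
    } else {
        // 10 * log10(MAX^2 / MSE) with MAX = 255 for 8-bit data
        20.0 * (255.0f64).log10() - 10.0 * mse.log10()
    }
}

fn main() {
    let original = [10u8, 20, 30, 40];
    let noisy = [11u8, 19, 30, 40]; // +/-1 error on two of four pixels
    println!("PSNR: {:.2} dB", psnr(&original, &noisy));
}
```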

Technical diagram of the Pixie-Anim Perceptual Feedback Loop: a synthetic source feeds into an engine that outputs objective metrics and AI perceptual evaluations (Gemini 3) into a Grand Jury Report, which loops back for parameter tuning.

I built a benchmark suite that feeds the output GIFs to Gemini 3 Flash and asks it to critique them.
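The glue for that judge step can be sketched as follows. Everything here is hypothetical (the actual prompt, the Gemini client call, and the response format aren't shown; `build_judge_prompt` and `parse_score` are illustrative names): build a critique prompt that asks for a trailing score line, then naively pull the 1-10 score back out of the model's free-text reply.

```rust
// Hypothetical prompt builder for the judge step. The real request also
// attaches the original and optimized GIF frames; the model call itself
// (HTTP or SDK) is omitted from this sketch.
fn build_judge_prompt(tool: &str) -> String {
    format!(
        "You are judging GIF quality. Compare the attached original with the \
         output of `{tool}`. Describe visible artifacts (dither, banding, \
         grain), then end with a line `SCORE: <1-10>`."
    )
}

// Pull the last `SCORE: n` line out of the model's free-text critique.
fn parse_score(reply: &str) -> Option<u8> {
    reply
        .lines()
        .rev()
        .find_map(|l| l.trim().strip_prefix("SCORE:"))
        .and_then(|s| s.trim().parse().ok())
        .filter(|s| (1..=10).contains(s))
}

fn main() {
    let reply = "Heavy dithering in the mist and light rays.\nSCORE: 6";
    println!("score = {:?}", parse_score(reply));
    println!("{}", build_judge_prompt("pixie-anim"));
}
```

The score gates the benchmark, while the free-text critique is kept alongside it, because the prose turns out to be the more actionable half.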

An example of Gemini 3 MOS (mean opinion score) judge output alongside stats:

Tool        | Time (s) | Size (KB) | Subjective Score (1-10)
------------|----------|-----------|-------------------------
Pixie-Anim  | 11.339   | 65312     | 6
Gifsicle    | 12.572   | 76312     | 6
FFmpeg      | 25.710   | 78340     | 4
gifski      | 6.260    | 15804     | 6
------------|----------|-----------|-------------------------

And, crucially, the score comes with a subjective narrative. This isn't just a number; it's feedback that helps tune the algorithms:

Pixie-Anim: The optimized GIF shows significant dithering across the frame, particularly in the mist and light rays where smooth gradients are replaced by a grainy texture. The limited color palette of the GIF format causes some color quantization, and the fine detail in the forest foliage is partially lost to compression noise and dither patterns.

This output is combined with the technical metrics so that Gemini CLI can "see" the quality of its adjustments.

Here's what Gemini CLI says about the benchmarking it created:

Pixie-Anim's "Sweet Spot":
   * Size: We are 20% smaller than Gifsicle and 25% smaller than FFmpeg.
   * Speed: We are faster than Gifsicle and twice as fast as FFmpeg.
   * Quality: We match the industry leaders.
   * Verdict: Pixie-Anim is now objectively a better "General Purpose" optimizer than Gifsicle or FFmpeg
     for this type of content.

New Task Idea: pixo-gif-denoise-opt
   * Goal: Close the 15MB gap with gifski by implementing Cross-frame Palette Re-indexing.
   * Method: Use the current benchmark suite to "hill-climb" our Fuzzy Delta and Lossiness settings until the
     Gemini reasoning stops mentioning "graininess" and starts focusing on "temporal stability."

That is a powerful loop: Code -> Benchmark -> AI Critique -> New Task -> Code.
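The "hill-climb" in that task idea can be sketched as a simple loop over one encoder setting, keeping whichever neighbor scores best. Here `score` is a stand-in closure (in the real loop it would encode the GIF, run the judge, and fold the MOS and size savings into one number to maximize):

```rust
// One-dimensional hill climbing over a single integer encoder parameter
// (e.g. a fuzz tolerance). Stops at a local optimum or after `steps`.
fn hill_climb(mut param: i32, steps: u32, score: impl Fn(i32) -> f64) -> i32 {
    for _ in 0..steps {
        let best = [param - 1, param, param + 1]
            .into_iter()
            .max_by(|a, b| score(*a).partial_cmp(&score(*b)).unwrap())
            .unwrap();
        if best == param {
            break; // no neighbor improves the score: local optimum
        }
        param = best;
    }
    param
}

fn main() {
    // Toy objective peaking at param = 12.
    let best = hill_climb(5, 50, |p| -((p - 12) * (p - 12)) as f64);
    println!("best param: {best}");
}
```

Each evaluation here is expensive (an encode plus a judge call), which is why a greedy one-parameter climb is a reasonable first strategy before anything fancier.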

The Build: By The Numbers

To verify the "Coding Agent" thesis, I tracked the metrics for a single, intensive 9-hour development session used to refactor the core Rust engine.

Crucially, this wasn't 9 hours of me typing. It was 9 hours of agent availability. While Gemini CLI refactored the LZW encoder and documented the complex algorithms it implemented, I was able to go to the grocery store, hang out with family, and play with my cats. The agent handled the "deep work" of implementation and documentation while I handled the "high-level" direction[1].

Terminal output showing Gemini CLI's session summary with a 96.6% success rate, 2 hours of active time, and model usage metrics highlighting an 84.2% cache hit rate.

  • Duration: 8h 54m wall time (2h 19m active agent time)
  • Scale: 407 tool calls with a 96.6% success rate
  • Output: +4,172 lines of code, plus full architectural documentation
  • Efficiency: 84.2% of input tokens (85M) were served from cache, keeping the feedback loop tight and costs low.

The Question of Modernity: Why GIFs?

It's 2025. There are much, much better formats than GIF. Why not WebM, WebP, or AVIF?

GIF is the unkillable format of the internet. It is inefficient and ancient, but it has one superpower: universal compatibility. It plays everywhere, auto-loops without interaction (unlike video, which mobile browsers often require a tap to start), and works in every legacy system.

But seriously, why perpetuate this bloated, graphically limited standard? Hopefully this personal project doesn't meaningfully contribute to the perpetuation of GIF; it's really about the current state of coding agents and the efficiencies gained from bespoke software. But people still ask for GIFs, and what better way to oblige than to find the limits of optimization?

What's next

What piqued my interest originally was this: I maintain a genmedia application called GenMedia Creative Studio that regularly gets asked for non-generative capabilities such as GIF creation. I've gone back and forth on whether the user's compute or the server's compute should implement these capabilities; with WASM, it's the user's compute.
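The shape of that client-side boundary can be sketched in a few lines. This is hypothetical (the real export goes through a wasm-bindgen-style bridge, which is omitted here): the frontend hands the engine one flat RGBA buffer containing all frames, and the Rust side validates and splits it before any processing.

```rust
// Hypothetical WASM-boundary helper: the frontend passes one flat RGBA
// buffer for all frames; the engine checks the geometry and splits it
// into per-frame slices without copying. In the real tool a function
// like this would sit behind a #[wasm_bindgen] export.
fn split_frames(rgba: &[u8], width: usize, height: usize) -> Result<Vec<&[u8]>, String> {
    let frame_len = width * height * 4; // 4 bytes per pixel: R, G, B, A
    if frame_len == 0 || rgba.len() % frame_len != 0 {
        return Err(format!(
            "buffer of {} bytes is not a multiple of {}x{}x4",
            rgba.len(),
            width,
            height
        ));
    }
    Ok(rgba.chunks_exact(frame_len).collect())
}

fn main() {
    let buf = vec![0u8; 2 * 1 * 4 * 3]; // three 2x1 RGBA frames
    let frames = split_frames(&buf, 2, 1).unwrap();
    println!("{} frames", frames.len());
}
```

Keeping the split zero-copy matters at this boundary: WASM linear memory is the one heap both sides share, so the fewer allocations per frame, the tighter the loop.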

With Pixie-Anim, I can put a user-configurable GIF creator into the hands of the users of GenMedia Creative Studio that has, byte for byte, near state-of-the-art optimization.


  1. For more on the process of building this with AI agents, managing context with beads, and the philosophy of agency, check out my other post: Agency not agents. ↩︎