Experiments | Updated Dec 31, 2025

Building a Subjective GIF Optimizer from Scratch with Rust and Gemini

Pixie-Anim is a high-performance Rust engine that redefines GIF optimization by using Gemini 3 Flash as a subjective judge of visual quality. By moving beyond traditional pixel math, this WASM-powered tool achieves superior compression through a novel AI-guided feedback loop, outperforming industry-standard tools in its benchmarks.

I built Pixie-Anim, a zero-dependency, browser-based GIF optimizer entirely in Rust. Unlike traditional tools such as FFmpeg or Gifsicle, which rely purely on "pixel math" algorithms, Pixie uses Gemini 3 Flash as a subjective judge to "see" the image quality, allowing it to beat industry standards in both file size and perceptual quality.

This project was heavily inspired by Lee Robinson's Pixo, which demonstrated that AI coding agents could build low-level, high-performance software. I wanted to take that premise further: could an agent not just write the code, but also judge the output to guide the optimization algorithms?

The result is a custom engine built for roughly $16 in API costs (shared across two projects) that processes GIFs completely client-side via WASM.

Interface of the PIXIE-ANIM tool comparing an original 76.5 MB GIF with an optimized 45.72 MB version, showing a 40.2% file size reduction.

Why Pixie-Anim is Different

While tools like gifsicle and gifski are excellent, I wanted to explore an architecture that optimizes for perceptual quality rather than purely mathematical loss metrics.

Architecture diagram of a media converter pipeline: A Lit-based Frontend extracts video frames into RGBA buffers; a WASM Bridge handles memory allocation; and a Rust Engine processes pixels via K-Means++ quantization, SIMD search, denoising, and LZW encoding to produce an optimized GIF.
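The quantization stage in that pipeline starts with K-Means++ seeding. As a rough illustration, here is a minimal scalar sketch of the seeding idea (function names are hypothetical, and this greedy farthest-point variant replaces the usual weighted random sampling to keep the sketch deterministic; the real engine works in Lab space with SIMD):

```rust
// Squared distance between two color triples.
fn dist2(a: [f32; 3], b: [f32; 3]) -> f32 {
    (a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2) + (a[2] - b[2]).powi(2)
}

// K-Means++-style seeding: pick a first centroid, then repeatedly add the
// point farthest from its nearest existing centroid. (True K-Means++
// samples proportionally to squared distance; this greedy version avoids
// an RNG dependency for the sketch.)
fn kmeans_pp_seed(pixels: &[[f32; 3]], k: usize) -> Vec<[f32; 3]> {
    let mut centroids = vec![pixels[0]];
    while centroids.len() < k {
        let next = pixels
            .iter()
            .max_by(|a, b| {
                let da = centroids.iter().map(|c| dist2(**a, *c)).fold(f32::MAX, f32::min);
                let db = centroids.iter().map(|c| dist2(**b, *c)).fold(f32::MAX, f32::min);
                da.partial_cmp(&db).unwrap()
            })
            .unwrap();
        centroids.push(*next);
    }
    centroids
}

fn main() {
    // Two tight pixel clusters near black and white; two seeds should
    // land in opposite corners of the color cube.
    let pixels = [[0.0f32, 0.0, 0.0], [1.0, 1.0, 1.0], [255.0, 255.0, 255.0], [254.0, 254.0, 254.0]];
    let seeds = kmeans_pp_seed(&pixels, 2);
    println!("{:?}", seeds);
}
```

Spreading the initial centroids this way is what lets the later k-means iterations converge to a palette that covers the image's full gamut instead of collapsing onto the dominant colors.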

  1. Automated Subjective Feedback Loop: By integrating Gemini 3 Flash directly into the benchmark suite, we move beyond "pixel math" (PSNR) to "perceptual reality." The engine iterates based on visual feedback, not just compression ratios.
  2. Planar SIMD Architecture: We bypassed the common pitfalls of interleaved RGB SIMD by moving to a planar layout, allowing for true 8-way parallel color matching in 3D Lab space.
  3. Fuzzy Temporal Denoising: Our unique combination of Zeng Reordering and Fuzzy Delta matching allows us to ignore temporal sensor noise that traditional GIF encoders are forced to encode, leading to massive size savings on high-resolution content.
  4. Algorithmic Control: While we were inspired by Pixo's "from scratch" approach, our motivation was specific: we needed a "glass box" encoder. Writing the core (LZW, K-Means++) ourselves allowed us to expose internal quantization parameters to the AI judge—something impossible if we were wrapping a compiled black box like ffmpeg.
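To make the fuzzy-delta idea from point 3 concrete, here is a minimal scalar sketch (names and the tolerance value are hypothetical; the real engine pairs this with Zeng reordering and SIMD): any channel whose change from the previous frame stays under a tolerance is treated as unchanged, so sensor jitter collapses into long identical runs that the LZW stage compresses well.

```rust
/// For each byte (color channel), keep the previous frame's value when
/// the new value differs by at most `tol`: small deltas are treated as
/// temporal noise rather than real motion.
fn fuzzy_delta(prev: &[u8], curr: &[u8], tol: u8) -> Vec<u8> {
    prev.iter()
        .zip(curr)
        .map(|(&p, &c)| if p.abs_diff(c) <= tol { p } else { c })
        .collect()
}

fn main() {
    let prev = [100u8, 100, 100, 200];
    let curr = [102u8, 99, 100, 240]; // first three jitter, last really changed
    let out = fuzzy_delta(&prev, &curr, 3);
    println!("{:?}", out); // jitter suppressed, real change kept
}
```

The win is indirect: the encoder doesn't store fewer pixels, it stores pixels that repeat, and repetition is exactly what LZW rewards.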

The "AI Judge" Loop

The most significant differentiator in Pixie-Anim is the use of an LLM as a subjective quality gate. Traditional codecs use metrics like PSNR (Peak Signal-to-Noise Ratio) or SSIM (Structural Similarity Index), which often correlate poorly with how humans perceive "glitchiness" or "grain."
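PSNR itself is easy to state, which is part of the problem: it averages squared error uniformly across pixels, so a visually ugly structured artifact can score the same as invisible grain. A minimal sketch for 8-bit buffers:

```rust
/// Peak Signal-to-Noise Ratio (dB) between two equal-length 8-bit
/// buffers. Higher is "better", but every pixel error is weighted
/// equally regardless of how visible it actually is.
fn psnr(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len());
    let mse: f64 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as f64 - y as f64;
            d * d
        })
        .sum::<f64>()
        / a.len() as f64;
    if mse == 0.0 {
        f64::INFINITY // identical buffers
    } else {
        // 10 * log10(MAX^2 / MSE) with MAX = 255 for 8-bit data
        20.0 * (255.0f64).log10() - 10.0 * mse.log10()
    }
}

fn main() {
    let original = [10u8, 20, 30, 40];
    let noisy = [11u8, 19, 30, 40]; // +/-1 error on two of four pixels
    println!("PSNR: {:.2} dB", psnr(&original, &noisy));
}
```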

Technical diagram of the Pixie-Anim Perceptual Feedback Loop: a synthetic source feeds into an engine that outputs objective metrics and AI perceptual evaluations (Gemini 3) into a Grand Jury Report, which loops back for parameter tuning.

I built a benchmark suite that feeds the output GIFs to Gemini 3 Flash and asks it to critique them.
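The glue for that judge step can be sketched as follows. Everything here is hypothetical (the actual prompt, the Gemini client call, and the response format aren't shown; `build_judge_prompt` and `parse_score` are illustrative names): build a critique prompt that asks for a trailing score line, then naively pull the 1-10 score back out of the model's free-text reply.

```rust
// Hypothetical prompt builder for the judge step. The real request also
// attaches the original and optimized GIF frames; the model call itself
// (HTTP or SDK) is omitted from this sketch.
fn build_judge_prompt(tool: &str) -> String {
    format!(
        "You are judging GIF quality. Compare the attached original with the \
         output of `{tool}`. Describe visible artifacts (dither, banding, \
         grain), then end with a line `SCORE: <1-10>`."
    )
}

// Pull the last `SCORE: n` line out of the model's free-text critique.
fn parse_score(reply: &str) -> Option<u8> {
    reply
        .lines()
        .rev()
        .find_map(|l| l.trim().strip_prefix("SCORE:"))
        .and_then(|s| s.trim().parse().ok())
        .filter(|s| (1..=10).contains(s))
}

fn main() {
    let reply = "Heavy dithering in the mist and light rays.\nSCORE: 6";
    println!("score = {:?}", parse_score(reply));
    println!("{}", build_judge_prompt("pixie-anim"));
}
```

The score gates the benchmark, while the free-text critique is kept alongside it, because the prose turns out to be the more actionable half.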

An example of Gemini 3 MOS (mean opinion score) judge output alongside stats:

Tool        | Time (s) | Size (KB) | Subjective Score (1-10)
------------|----------|-----------|-------------------------
Pixie-Anim  | 11.339   | 65312     | 6
Gifsicle    | 12.572   | 76312     | 6
FFmpeg      | 25.710   | 78340     | 4
gifski      | 6.260    | 15804     | 6
------------|----------|-----------|-------------------------

And, crucially, the score comes with a subjective narrative. This isn't just a number; it's feedback that helps tune the algorithms:

Pixie-Anim: The optimized GIF shows significant dithering across the frame, particularly in the mist and light rays where smooth gradients are replaced by a grainy texture. The limited color palette of the GIF format causes some color quantization, and the fine detail in the forest foliage is partially lost to compression noise and dither patterns.

This output is combined with the technical metrics so that Gemini CLI can "see" the quality of its adjustments.

Here's what Gemini CLI says about the benchmarking it created:

Pixie-Anim's "Sweet Spot":
   * Size: We are 20% smaller than Gifsicle and 25% smaller than FFmpeg.
   * Speed: We are faster than Gifsicle and twice as fast as FFmpeg.
   * Quality: We match the industry leaders.
   * Verdict: Pixie-Anim is now objectively a better "General Purpose" optimizer than Gifsicle or FFmpeg
     for this type of content.

New Task Idea: pixo-gif-denoise-opt
   * Goal: Close the 15MB gap with gifski by implementing Cross-frame Palette Re-indexing.
   * Method: Use the current benchmark suite to "hill-climb" our Fuzzy Delta and Lossiness settings until the
     Gemini reasoning stops mentioning "graininess" and starts focusing on "temporal stability."

That is a powerful loop: Code -> Benchmark -> AI Critique -> New Task -> Code.
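The "hill-climb" in that task idea can be sketched as a simple loop over one encoder setting, keeping whichever neighbor scores best. Here `score` is a stand-in closure (in the real loop it would encode the GIF, run the judge, and fold the MOS and size savings into one number to maximize):

```rust
// One-dimensional hill climbing over a single integer encoder parameter
// (e.g. a fuzz tolerance). Stops at a local optimum or after `steps`.
fn hill_climb(mut param: i32, steps: u32, score: impl Fn(i32) -> f64) -> i32 {
    for _ in 0..steps {
        let best = [param - 1, param, param + 1]
            .into_iter()
            .max_by(|a, b| score(*a).partial_cmp(&score(*b)).unwrap())
            .unwrap();
        if best == param {
            break; // no neighbor improves the score: local optimum
        }
        param = best;
    }
    param
}

fn main() {
    // Toy objective peaking at param = 12.
    let best = hill_climb(5, 50, |p| -((p - 12) * (p - 12)) as f64);
    println!("best param: {best}");
}
```

Each evaluation here is expensive (an encode plus a judge call), which is why a greedy one-parameter climb is a reasonable first strategy before anything fancier.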

The Build: By The Numbers

To verify the "Coding Agent" thesis, I tracked the metrics for a single, intensive 9-hour development session used to refactor the core Rust engine.

Crucially, this wasn't 9 hours of me typing. It was 9 hours of agent availability. While Gemini CLI refactored the LZW encoder and documented the complex algorithms it implemented, I was able to go to the grocery store, hang out with family, and play with my cats. The agent handled the "deep work" of implementation and documentation while I handled the "high-level" direction[1].

Terminal output showing Gemini CLI's session summary with a 96.6% success rate, 2 hours of active time, and model usage metrics highlighting an 84.2% cache hit rate.

  • Duration: 8h 54m wall time (2h 19m active agent time)
  • Scale: 407 tool calls with a 96.6% success rate
  • Output: +4,172 lines of code, plus full architectural documentation
  • Efficiency: 84.2% of input tokens (85M) were served from cache, keeping the feedback loop tight and costs low.

The Question of Modernity: Why GIFs?

It's 2025. There are much, much better formats than GIF. Why not WebM, WebP, or AVIF?

GIF is the unkillable format of the internet. It is inefficient and ancient, but it has one superpower: universal compatibility. It plays everywhere, auto-loops without interaction (unlike video, which mobile browsers often require a tap to start), and works in every legacy system.

But seriously, why perpetuate this bloated, graphically limited standard? Hopefully this personal project doesn't meaningfully contribute to the perpetuation of GIF; it's really about the current state of coding agents and the efficiencies gained from bespoke software. But people still ask for GIFs, and what better way to oblige than to find the limits of optimization?

What's next

What piqued my interest originally was this: I maintain a genmedia application called GenMedia Creative Studio that regularly gets asked for non-generative capabilities such as GIF creation. I've gone back and forth on whether the user's compute or the server's compute should implement these capabilities; with WASM, it's the user's compute.
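The shape of that client-side boundary can be sketched in a few lines. This is hypothetical (the real export goes through a wasm-bindgen-style bridge, which is omitted here): the frontend hands the engine one flat RGBA buffer containing all frames, and the Rust side validates and splits it before any processing.

```rust
// Hypothetical WASM-boundary helper: the frontend passes one flat RGBA
// buffer for all frames; the engine checks the geometry and splits it
// into per-frame slices without copying. In the real tool a function
// like this would sit behind a #[wasm_bindgen] export.
fn split_frames(rgba: &[u8], width: usize, height: usize) -> Result<Vec<&[u8]>, String> {
    let frame_len = width * height * 4; // 4 bytes per pixel: R, G, B, A
    if frame_len == 0 || rgba.len() % frame_len != 0 {
        return Err(format!(
            "buffer of {} bytes is not a multiple of {}x{}x4",
            rgba.len(),
            width,
            height
        ));
    }
    Ok(rgba.chunks_exact(frame_len).collect())
}

fn main() {
    let buf = vec![0u8; 2 * 1 * 4 * 3]; // three 2x1 RGBA frames
    let frames = split_frames(&buf, 2, 1).unwrap();
    println!("{} frames", frames.len());
}
```

Keeping the split zero-copy matters at this boundary: WASM linear memory is the one heap both sides share, so the fewer allocations per frame, the tighter the loop.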

With Pixie-Anim, I can put a user-configurable GIF creator into the hands of the users of GenMedia Creative Studio that has, byte for byte, near state-of-the-art optimization.


  1. For more on the process of building this with AI agents, managing context with beads, and the philosophy of agency, check out my other post: Agency not agents. ↩︎