VIDRAFT Kernel Acceleration Engine for Gemma

VKAE-Gemma-E4B is the first productized engine.

VKAE turns VIDRAFT's verified Gemma E4B kernel recipe into a private-preview serving product. E2B, 12B, 26B-A4B, 31B, and Qwen follow as separate model-specific engine ports.

Product claims are intentionally scoped: E4B is the launch lane; every additional model family gets its own benchmark gate.

5.32x: Showcase acceleration view
Gemma: First model-specialized engine
Guarded: PPL and validity gates stay separate

Productization Status

VKAE-Gemma is moving from challenge proof to controlled customer delivery. The first commercial surface is E4B private preview; other models are sold as explicit porting tracks.

Private Preview

VKAE-Gemma-E4B

Proof: Challenge-verified E4B speedup with quality guard.
Boundary: Other models require separate ports.
Control: L40S-managed qualification lane.

Engine Roadmap

Claim-Scoped Performance

Only claim-ready model profiles show before/after throughput. Packaged preview models remain visible, but their raw harness numbers are blocked from product-speed claims until a model-specific VKAE recipe is promoted.
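The gating rule above can be sketched as a small filter. This is a minimal illustration, not VKAE's actual implementation: the `ModelProfile` type, the `claim_ready` flag, and the `public_row` helper are hypothetical names, and the TPS values in the usage lines are placeholders rather than measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelProfile:
    """Hypothetical record for one model in the catalog."""
    name: str
    claim_ready: bool                      # assumed flag: model-specific recipe promoted
    baseline_tps: Optional[float] = None   # raw harness number, may be absent
    vkae_tps: Optional[float] = None


def public_row(profile: ModelProfile) -> dict:
    """Return only the fields that may appear in a product-speed claim."""
    row = {"model": profile.name, "claim_ready": profile.claim_ready}
    if profile.claim_ready:
        # Before/after throughput is exposed only for claim-ready profiles.
        row["baseline_tps"] = profile.baseline_tps
        row["vkae_tps"] = profile.vkae_tps
    return row


# Placeholder numbers, for illustration only.
preview = public_row(ModelProfile("preview-model", False, 10.0, 20.0))
ready = public_row(ModelProfile("claim-ready-model", True, 10.0, 20.0))
```

The preview model stays visible in the output, but its raw numbers never reach the public row.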

Baseline: -
VKAE: -
Quality: Guarded
Status: -

Gemma 4 E4B

Metrics: baseline TPS, VKAE TPS, relative throughput lift (x), additional tokens per second, and approximate token-cost reduction (%) at the same GPU rate.
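The three derived metrics follow from the two TPS readings. The sketch below assumes that cost per token is the GPU hourly rate divided by tokens served per hour, so at a fixed GPU rate the cost per token scales as 1/TPS. The function name and dictionary keys are illustrative, not part of VKAE.

```python
def throughput_metrics(baseline_tps: float, vkae_tps: float) -> dict:
    """Derive lift, extra tokens per second, and token-cost reduction."""
    if baseline_tps <= 0 or vkae_tps <= 0:
        raise ValueError("TPS values must be positive")
    lift = vkae_tps / baseline_tps
    return {
        "lift_x": lift,                              # relative throughput lift
        "additional_tps": vkae_tps - baseline_tps,   # additional tokens per second
        # Same GPU rate, so cost per token falls by 1 - 1/lift.
        "cost_reduction_pct": (1.0 - 1.0 / lift) * 100.0,
    }
```

Under that assumption, a 5.32x lift corresponds to a token-cost reduction of roughly 81%, since 1 - 1/5.32 ≈ 0.812.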

Gemma 4 Model Catalog

The acceleration plan tracks the official Google `gemma-4-*` BF16 serving surface: base, instruct, and assistant variants. QAT, GGUF, and mobile-specific packages are excluded from this benchmark scope.

15 official Google `gemma-4-*` BF16 serving repos
5 primary serving families
1 adjacent DiffusionGemma lane

GPU Sizing And Family Speed

Each Gemma 4 family gets a minimum and recommended BF16 serving lane. Measured TPS is shown only when available; otherwise the chart uses a planning speed index until hardware-specific runs replace it.
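The fallback described above, measured TPS when available and a planning speed index otherwise, might look like the following sketch. The function name and the source labels are hypothetical. Returning the source alongside the value keeps a planning estimate from being read as a measurement.

```python
from typing import Optional, Tuple


def chart_value(measured_tps: Optional[float], planning_index: float) -> Tuple[float, str]:
    """Pick the value to plot and label where it came from."""
    if measured_tps is not None:
        return measured_tps, "measured"
    # No hardware-specific run yet: fall back to the planning speed index.
    return planning_index, "planning-index"
```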

Full-Family Speed View

Minimum / Recommended GPU

Measured Benchmark Evidence

Measured rows are separated by claim level. E4B uses the challenge reference path as its "before" baseline. The E2B, 12B, 26B-A4B, and 31B package rows are raw harness checks only, not public VKAE before/after speedups.

Legend: Declared baseline, VKAE path, Optimized showcase, Preview / smoke

Kernel-Led Engine Stack

VKAE is positioned as a model-specialized engine: kernels, recipes, serving, and verifier gates are treated as one product surface.

Model Profile

Capture Gemma-specific serving assumptions, context windows, quality guardrails, and hardware lanes.

Kernel Path

Route hot decode work through custom kernel and runtime patches instead of generic defaults.

Serving Recipe

Bind model, memory, batching, speculative-decoding, and verifier settings into repeatable deployment recipes.
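One way to make a recipe repeatable is to freeze its settings in a single immutable record and derive a fingerprint from it. The sketch below is an assumption about how such a binding could look. Every field name and value is hypothetical and does not reflect VKAE's real configuration schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ServingRecipe:
    """Hypothetical bundle of settings that define one deployment."""
    model_id: str
    dtype: str
    max_context_tokens: int
    gpu_memory_fraction: float
    max_batch_size: int
    speculative_tokens: int    # 0 would mean speculative decoding is off
    verifier_enabled: bool

    def fingerprint(self) -> str:
        """Stable short hash: identical settings give the same ID."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Placeholder values, for illustration only.
recipe = ServingRecipe("example-model", "bfloat16", 8192, 0.9, 16, 4, True)
recipe_id = recipe.fingerprint()
```

Recording the fingerprint next to each benchmark run ties a result to the exact settings that produced it.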

Verification

Keep artifact, private-benchmark, perplexity (PPL), and dashboard-valid status in separate lanes.
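Keeping the lanes separate can be read as reporting four independent statuses rather than one combined pass/fail. The sketch below illustrates that idea with hypothetical names. The PPL guard compares accelerated perplexity against the baseline using a relative tolerance, and the tolerance value is an assumption, not a published VKAE threshold.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class LaneStatus(Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class VerificationReport:
    """One status per lane; the lanes are never merged into one flag."""
    artifact: LaneStatus = LaneStatus.PENDING
    private_benchmark: LaneStatus = LaneStatus.PENDING
    ppl: LaneStatus = LaneStatus.PENDING
    dashboard_valid: LaneStatus = LaneStatus.PENDING

    def lanes_not_passed(self) -> List[str]:
        """Names of lanes that are still pending or have failed."""
        return [name for name, status in vars(self).items()
                if status is not LaneStatus.PASSED]


def ppl_guard(baseline_ppl: float, candidate_ppl: float,
              max_rel_increase: float = 0.01) -> LaneStatus:
    """Pass if perplexity rose by no more than the assumed tolerance."""
    rel_increase = (candidate_ppl - baseline_ppl) / baseline_ppl
    return LaneStatus.PASSED if rel_increase <= max_rel_increase else LaneStatus.FAILED


report = VerificationReport(artifact=LaneStatus.PASSED)
report.ppl = ppl_guard(baseline_ppl=10.0, candidate_ppl=10.05)
```

With this shape, a speedup that passes the benchmark lane but not the PPL lane remains visibly unverified on quality.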

Request Private Preview

Choose the service path that matches your deployment: report, managed endpoint, private Docker evaluation, enterprise on-prem, or custom engine build.

Early access is staged to protect kernel IP while giving customers measurable before/after results.