Lilith Version 3 - tests and benchmarks

Design tests validate tool calling (`ToolCallingTests`), chat history, Kokoro TTS, and optional live Ollama sessions. Benchmark cards document console runs with tools enabled.

Bench machine

Current system specs

CPU: AMD Ryzen 5 4500 6-Core Processor

GPU: NVIDIA GeForce RTX 3060

RAM: 32 GB

OS: Windows 11 (10.0.26200)

Ollama: 127.0.0.1:11434

Recorded: 2026-05-21 11:10

Validation

Recorded console sessions

Compare tool-invocation latency and model behavior before upgrading to workspace-enabled Version 4. Run `python DesignTests/run-design-tests.py --live` with Ollama and Kokoro available.
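Before starting a live run, it can help to confirm that something is listening at the Ollama address listed for the bench machine. The following is a minimal sketch, not part of the Lilith test suite: the helper name `port_is_open` is hypothetical, and it checks only that the TCP port accepts connections, not that a model is loaded. Kokoro is not covered, since this page does not say how it is served.

```python
import socket

# Address taken from the bench-machine specs on this page.
OLLAMA_HOST = "127.0.0.1"
OLLAMA_PORT = 11434


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not part of the Lilith design tests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open(OLLAMA_HOST, OLLAMA_PORT):
        print("Ollama port is reachable; the --live run can proceed.")
    else:
        print("Nothing is listening on the Ollama port; start Ollama first.")
```

If the check fails, the `--live` tests would presumably fail or be skipped, so starting Ollama first avoids a misleading benchmark result.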

automated · 2.11s

Tools time date — gemma4

Model: gemma4 · Time: 2.11s

Automated `tools-time-date` Lilith run on v3 (gemma4). Total session time: 2.11s.
