Lilith Version 4 - tests and benchmarks
Design tests validate tool calling (`ToolCallingTests`), chat history, Kokoro TTS, and optional live Ollama sessions. Benchmark cards document console runs with tools enabled.
Bench machine
Current system specs
CPU: AMD Ryzen 5 4500 6-Core Processor
GPU: NVIDIA GeForce RTX 3060
RAM: 32 GB
OS: Windows 11 (10.0.26200)
Ollama: 127.0.0.1:11434
Recorded: 2026-05-21 11:10
Validation
Recorded console sessions
Compare tool-invocation latency and model behavior before upgrading to workspace-enabled Version 5. Run `python DesignTests/run-design-tests.py --live` with Ollama and Kokoro available.
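Because the live run needs Ollama to be up, a quick reachability check beforehand can save a failed session. The following is a minimal sketch, not part of the Lilith test suite: it assumes the address listed for the bench machine (127.0.0.1:11434) and queries Ollama's model-listing endpoint, `/api/tags`. The helper names `tags_url` and `ollama_reachable` are hypothetical, and the sketch does not check Kokoro.

```python
import http.client
import urllib.request

# Address taken from the bench machine specs; adjust if Ollama listens elsewhere.
OLLAMA_HOST = "127.0.0.1:11434"


def tags_url(host: str = OLLAMA_HOST) -> str:
    """Build the URL of Ollama's model-listing endpoint for a host:port pair."""
    return f"http://{host}/api/tags"


def ollama_reachable(host: str = OLLAMA_HOST, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise.

    Connection failures, timeouts, and HTTP error statuses all count as
    "not reachable" (urllib's URLError and HTTPError are OSError subclasses).
    """
    try:
        with urllib.request.urlopen(tags_url(host), timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_reachable())
```

If the check prints `False`, start Ollama before running the design tests with `--live`.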
Tools time date - gemma4
Model: gemma4
Time: 2.11s
tools-time-date automated Lilith run on v4 (gemma4). Total session time 2.11s.
Workspace file management - llama3.2
Model: llama3.2
Time: 466.54s
workspace-file-management automated Lilith run on v4 (llama3.2). Total session time 466.54s.
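To make the recorded sessions easier to compare side by side, the two cards can be put into a small table and ordered by session time. This is an illustrative sketch only: the `BenchRun` type and the `summarize` and `slowdown` helpers are hypothetical names, and the values are copied from the cards above. Note that the two runs differ in both scenario and model, so the ratio describes these sessions, not the models in general.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchRun:
    """One recorded console session (hypothetical record type)."""
    scenario: str
    model: str
    seconds: float


# Values copied from the two benchmark cards on this page.
RUNS = [
    BenchRun("tools-time-date", "gemma4", 2.11),
    BenchRun("workspace-file-management", "llama3.2", 466.54),
]


def summarize(runs):
    """Return one formatted line per run, shortest session first."""
    ordered = sorted(runs, key=lambda run: run.seconds)
    return [
        f"{run.scenario:<28}{run.model:<10}{run.seconds:>8.2f}s"
        for run in ordered
    ]


def slowdown(baseline, other):
    """Return how many times longer `other` took than `baseline`."""
    return other.seconds / baseline.seconds


if __name__ == "__main__":
    for line in summarize(RUNS):
        print(line)
    print(f"Ratio of session times: {slowdown(RUNS[0], RUNS[1]):.1f}x")
```

Adding a Version 5 run of the same scenario as a further `BenchRun` entry would give a like-for-like comparison.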