Domain-specialized models for Roblox/Luau game development on Apple Silicon
v0.5 production release · Qwen3.5-4B · MLX / Apple Silicon
| Item | Detail |
|---|---|
| Site last updated | March 16, 2026 |
| Production adapter | v0.5-4b-curated (promoted March 15, 2026) |
| Benchmark version | v2 (real MCP tools), revised March 15, 2026 |
| Training data | 3,301 curated examples, last expanded March 16, 2026 |
| OpenGameEval run | 47 tasks (dry run), March 16, 2026 |
| Base model | Qwen3.5-4B-4bit (Apache 2.0) |
| Repository | github.com/adpena/vertigo-lora |
| Next milestone | 128GB machine (March 19): 9B training, full-rank 4B, Studio execution eval |
| Model | Type | Size | Link |
|---|---|---|---|
| Vertigo-Qwen3.5-4B-v0.5-4bit | Fused (ready to use) | 2.2 GB | HuggingFace |
| Vertigo-Qwen3.5-4B-v0.5-lora | LoRA adapter only | 62 MB | HuggingFace |
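The fused model can be tried directly from the mlx-lm CLI. This is a sketch: the Hugging Face repo id below is assumed from the model name above and may differ, and flag names can vary across mlx-lm versions.

```shell
# Requires mlx-lm on Apple Silicon: pip install mlx-lm
# Repo id is assumed from the table above; adjust to the actual HF listing.
mlx_lm.generate \
  --model adpena/Vertigo-Qwen3.5-4B-v0.5-4bit \
  --prompt "Write a --!strict Luau module that debounces a Touched event." \
  --max-tokens 512
```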
| Model | Params | Coding | Bugfix | Arch | MCP | Embody | Overall |
|---|---|---|---|---|---|---|---|
| Qwen3.5-27B dense | 27B | 80.6% | 96.7% | 79.1% | 100% | 95.0% | 88.7% |
| Vertigo-Qwen3.5-4B-v0.5 | 4B | 72.5% | 90.0% | 76.6% | 85.8% | 100% | 82.9% |
| Qwen3.5-35B-A3B | 3B active | 79.2% | 75.3% | 66.5% | 96.7% | 77.5% | 79.1% |
| Qwen3.5-4B base | 4B | 63.7% | 83.3% | 67.5% | 97.5% | 75.0% | 75.1% |
| Qwen3.5-2B | 2B | 45.0% | 81.7% | 54.2% | 70.0% | 95.0% | 65.1% |
| Qwen3.5-9B | 9B | 25.6% | 76.7% | 61.6% | 96.7% | 95.0% | 63.5% |
| Model | Pass@1 (dry) | Method |
|---|---|---|
| Vertigo-Qwen3.5-4B-v0.5 | 83.0% | Pattern-match |
| Qwen3.5-4B base | 72.3% | Pattern-match |
| Qwen3.5-27B dense | 48.9% | Pattern-match |
| Qwen3.5-35B-A3B | 42.6% | Pattern-match |
| Model | Pass@1 | Method |
|---|---|---|
| Gemini 3.1 Pro | 55.3% | Studio execution |
| Claude Opus 4.6 | 51.9% | Studio execution |
| Claude Opus 4.5 | 44.5% | Studio execution |
| GPT-5.4 | 35.1% | Studio execution |
Source: Roblox OpenGameEval Leaderboard
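The pattern-match method in the dry-run table above can be sketched roughly as follows. The marker sets here are invented for illustration; real tasks would each carry their own expected-marker list.

```python
# Hypothetical pattern-match scorer: a completion "passes" a task if it
# contains all of the task's expected Luau markers (substring checks only).
def pattern_match_pass(completion: str, expected_markers: list[str]) -> bool:
    return all(marker in completion for marker in expected_markers)

def pass_at_1(results: list[bool]) -> float:
    # Fraction of tasks whose single sampled completion passed.
    return sum(results) / len(results)

completion = """--!strict
local Spawner = {}
function Spawner:Init()
    -- setup code here
end
return Spawner"""

print(pattern_match_pass(completion, ["--!strict", ":Init()"]))  # True
print(pattern_match_pass(completion, ["@native"]))               # False
```

Note that this checks surface form only, which is exactly the weakness discussed below: output can match every marker and still fail to compile.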
Pattern-match scoring checks model output for expected Luau markers (e.g. `--!strict`, `:Init()`, `@native`). Code that looks correct but doesn't compile would still score well; compile verification with `luau-compile` would close that gap.

| Parameter | Value |
|---|---|
| Base model | Qwen3.5-4B-4bit |
| Method | QLoRA via Apple MLX |
| Hardware | Apple M5 Max, 36GB unified memory |
| Rank / Layers | 8 / 8 (of 28) |
| Learning rate | 2e-6 (flat) |
| Iterations | 600 |
| Sequence length | 2048 |
| Training examples | 3,301 curated (from 3,893 raw) |
| Validation loss | 0.857 |
| Training time | ~45 minutes |
| Peak memory | 27.3 GB |
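The configuration above maps onto mlx-lm's LoRA trainer roughly as below. This is a sketch, not the exact invocation used: flag names vary across mlx-lm versions (e.g. `--num-layers` vs `--lora-layers`), the base-model repo id and data directory are placeholders, and LoRA rank is typically set via a YAML config rather than a flag.

```shell
# Hypothetical reconstruction of the training run from the table above.
mlx_lm.lora \
  --model Qwen/Qwen3.5-4B-4bit \
  --train \
  --data ./data \
  --num-layers 8 \
  --iters 600 \
  --learning-rate 2e-6 \
  --max-seq-length 2048 \
  --adapter-path ./adapters
```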
| Source | Examples | License |
|---|---|---|
| Own codebase (Vertigo) | 631 | Proprietary (own) |
| OSS Roblox repos | 1,301 | Various OSS |
| Roblox Creator Docs | 806 | CC-BY-4.0 |
| Ecosystem tools (Luau, Rojo, Wally, Selene, roblox-ts) | 61 | MIT / MPL-2.0 |
| Generated (synthetic, STaR, distillation, composition) | 502 | Generated |
No live Roblox experiences, Creator Store assets, player data, or rate-limited content were used. All examples include provenance metadata (source, rights basis, license).
The most important discovery from this project: removing 550 low-quality examples improved the benchmark score by 13 percentage points.
| Version | Training Examples | Overall |
|---|---|---|
| v0.3 (expanded, uncurated) | 4,852 | 63.6% |
| v0.4 (curated, code-dense only) | 5,679 | 82.8% |
Examples without substantial code (prose-only explanations, tool-calling prompts without code output, gameplay session descriptions) actively degraded the model when included in training data. The curation rule: every example must contain ≥5 lines of Luau code in fenced code blocks.
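The curation rule above can be expressed as a small filter. This is an illustrative sketch (the real pipeline is not published here); it counts non-blank lines inside fenced code blocks and keeps an example only if the total meets the threshold.

```python
import re

# Curation rule from the text: every example must contain at least 5 lines
# of code inside fenced code blocks. `{3} matches a run of three backticks.
MIN_CODE_LINES = 5
FENCE = re.compile(r"`{3}[^\n]*\n(.*?)`{3}", re.DOTALL)

def passes_curation(example: str) -> bool:
    code_lines = sum(
        1
        for block in FENCE.findall(example)
        for line in block.splitlines()
        if line.strip()
    )
    return code_lines >= MIN_CODE_LINES
```

Under this rule, prose-only explanations and tool-calling prompts with no code output are dropped, which is what the v0.3 to v0.4 comparison above measures.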
This is an early research release. Feedback, questions, and collaboration inquiries are welcome.
Alejandro Peña
GitHub: adpena/vertigo-lora
HuggingFace: @adpena
Email: adpena@vertigo.build
The evaluation methodology document, contamination audit, and full training logs are available on request; please reach out directly.