Alembiq

project · active

Alembiq is an end-to-end platform for LLM training, alignment, evaluation, and synthetic data generation. Its name comes from the Arabic al-anbiq (the alembic), a distillation vessel used to refine essences; in the same spirit, Alembiq distills raw data into refined models. The project is hosted at alawein/alembiq (formerly neper), deployed at alembiq.online, and carries P2 priority within the project catalog.

The platform is built on PyTorch and uses Pydantic for configuration management. It exposes three CLI entry points (alembiq-train, alembiq-eval, and alembiq-synth), each targeting a distinct phase of the model development lifecycle: training, evaluation, and synthetic data generation, respectively. Data pipelines are structured as directed acyclic graphs (DAGs) with SHA-256-based caching, so intermediate results are reused when their inputs have not changed. This avoids redundant computation and supports reproducibility across runs.
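The caching behavior can be illustrated with a small sketch. It assumes that each node's cache key is the SHA-256 digest of its name, its JSON-serializable parameters, and the keys of its upstream nodes. The Node and Pipeline classes are hypothetical and are not Alembiq's actual API, and results are held in an in-memory dict rather than persisted to disk.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional


class Node:
    """Hypothetical DAG node: a named step with upstream dependencies and parameters."""

    def __init__(self, name: str, fn: Callable[..., Any],
                 deps: Optional[List[str]] = None,
                 params: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        self.fn = fn
        self.deps = deps or []
        self.params = params or {}  # must be JSON-serializable to be hashed


class Pipeline:
    """Runs nodes in dependency order, reusing results whose SHA-256 key is cached.

    Assumes the graph is acyclic; a cycle would recurse forever in this sketch.
    """

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = {n.name: n for n in nodes}
        self.cache: Dict[str, Any] = {}  # cache key -> result (in memory here)
        self.executions = 0  # counts real function calls, for illustration

    def key(self, name: str) -> str:
        node = self.nodes[name]
        # The key covers the node's name, its params, and the keys of all
        # upstream nodes, so any upstream change invalidates downstream keys.
        payload = {
            "name": name,
            "params": node.params,
            "deps": [self.key(d) for d in node.deps],
        }
        blob = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def run(self, name: str) -> Any:
        k = self.key(name)
        if k in self.cache:
            return self.cache[k]  # inputs unchanged: reuse the stored result
        node = self.nodes[name]
        inputs = [self.run(d) for d in node.deps]  # resolve upstream first
        result = node.fn(*inputs, **node.params)
        self.executions += 1
        self.cache[k] = result
        return result


# Usage: a two-step pipeline that tokenizes a string and counts the tokens.
pipeline = Pipeline([
    Node("load", lambda text: text.split(), params={"text": "a b c"}),
    Node("count", lambda toks: len(toks), deps=["load"]),
])
pipeline.run("count")  # computes both nodes
pipeline.run("count")  # served entirely from the cache
```

Changing a source node's parameters changes its key and therefore every downstream key, so only the affected part of the graph is recomputed while unchanged branches are served from the cache.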

Training capabilities include supervised fine-tuning (SFT) and direct preference optimization (DPO) loops, along with LoRA-based parameter-efficient fine-tuning for adapting models without full retraining. The system incorporates differentiable safety constraints that can be applied during training to guide model behavior. Atomic checkpointing ensures that training state can be recovered cleanly after interruptions.
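Atomic checkpointing is commonly implemented with a write-to-a-temporary-file-then-rename pattern. The following is a minimal sketch of that pattern under stated assumptions, not Alembiq's implementation: it serializes a plain dict with json as a stand-in for model and optimizer state (a PyTorch version would typically write the temporary file with torch.save instead), and the function names save_checkpoint_atomic and load_checkpoint are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def save_checkpoint_atomic(state: Dict[str, Any], path: str) -> None:
    """Write state so readers see either the old checkpoint or the new one, never a partial file.

    Hypothetical helper; json stands in for a real tensor serializer.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # single-step swap; atomic on POSIX filesystems
    except BaseException:
        # On any failure, discard the temp file; the previous checkpoint is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str) -> Optional[Dict[str, Any]]:
    """Return the last complete checkpoint, or None when training starts fresh."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Because the new checkpoint only replaces the old one after it has been fully written and flushed, an interruption mid-save leaves the previous checkpoint readable, and a training loop can call load_checkpoint at startup to resume from the recorded step.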

Operational observability is provided through Prometheus metrics, and the project supports Docker-based deployment. The landing site in website-app/ is a React/Vite app deployed via Netlify, which deviates from the workspace default of Vercel. Alembiq references two related projects: LLMWorks, which covers the complementary domain of LLM security testing and deployment validation, and SimCore, which provides shared scientific computing infrastructure.

Note: The Python package is still named neper internally. This is intentional (see the workspace CLAUDE.md). The repository and CLI use the name alembiq, while imports keep the form from neper import ....