Alembiq

project | active

Alembiq is an end-to-end platform for LLM training, alignment, evaluation, and synthetic data generation. The name comes from the Arabic al-anbiq (the alembic), a distillation vessel for refining essences: Alembiq distills raw data into refined models. It is hosted at alawein/alembiq and carries a P2 priority within the project catalog.

The platform is built on PyTorch and uses Pydantic for configuration management. It exposes three CLI entry points (neper-train, neper-eval, and neper-synth), each targeting a distinct phase of the model development lifecycle. Data pipelines are structured as directed acyclic graphs (DAGs) with SHA-256-based caching, so intermediate results are reused when their inputs have not changed, which also makes runs reproducible.
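The caching idea described above can be illustrated with a minimal sketch. It is not Alembiq's actual implementation: the function names (cache_key, run_pipeline), the step layout, and the in-memory dict standing in for a cache store are all assumptions made for illustration. Each step's key is a SHA-256 digest of its name, its parameters, and the keys of its upstream steps, so a change anywhere upstream invalidates everything downstream.

```python
import hashlib
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def cache_key(name, params, upstream_keys):
    """Hash a step's name, parameters, and upstream keys into a SHA-256 digest."""
    payload = json.dumps(
        {"step": name, "params": params, "upstream": list(upstream_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(steps, cache):
    """Run steps in dependency order, reusing cached results when keys match.

    steps maps name -> (dependency names, params dict, function).
    cache maps cache key -> result (hypothetical stand-in for a real store).
    Returns (results by step name, names of steps that actually executed).
    """
    graph = {name: set(deps) for name, (deps, _, _) in steps.items()}
    keys, results, executed = {}, {}, []
    for name in TopologicalSorter(graph).static_order():
        deps, params, fn = steps[name]
        keys[name] = cache_key(name, params, [keys[d] for d in deps])
        if keys[name] not in cache:
            # Cache miss: compute from upstream results and store the output.
            cache[keys[name]] = fn(*(results[d] for d in deps), **params)
            executed.append(name)
        results[name] = cache[keys[name]]
    return results, executed


# Hypothetical three-step pipeline: load -> square -> total.
steps = {
    "load": ([], {"n": 4}, lambda n: list(range(n))),
    "square": (["load"], {}, lambda xs: [x * x for x in xs]),
    "total": (["square"], {}, lambda xs: sum(xs)),
}
cache = {}
first, ran_first = run_pipeline(steps, cache)    # every step executes
second, ran_second = run_pipeline(steps, cache)  # every step is a cache hit
```

On the second call nothing is recomputed because no key changed. Changing the n parameter of the load step would change its key and, through the upstream hashes, the keys of square and total as well.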

Training capabilities include supervised fine-tuning (SFT) and direct preference optimization (DPO) loops, along with LoRA-based parameter-efficient fine-tuning for adapting models without full retraining. The system incorporates differentiable safety constraints that can be applied during training to guide model behavior. Atomic checkpointing ensures that training state can be recovered cleanly after interruptions.
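A common way to get atomic checkpointing is to write to a temporary file and then rename it over the target. The sketch below shows that pattern using only the standard library. It is an assumption about the general technique, not a description of Alembiq's checkpoint code: the function names and the JSON state format are hypothetical, and a PyTorch training loop would typically serialize with torch.save in place of json.dump.

```python
import json
import os
import tempfile


def save_checkpoint_atomic(state, path):
    """Write state as JSON to path so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem, which is what makes it atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomically swap in the new checkpoint
    except BaseException:
        # On failure or interruption, drop the temp file; the previous
        # checkpoint at path is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Read back a checkpoint written by save_checkpoint_atomic."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

If the process is interrupted before os.replace runs, the previous checkpoint is still intact, so training can resume from the last complete state.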

Prometheus metrics provide operational observability, and the project's stack configuration indicates support for Docker-based deployment. Alembiq references the Llmworks project, which addresses the complementary domain of LLM security testing and deployment validation.