How few lab runs does Augur actually need to predict production-scale MEL?

Five is the hard minimum enforced by the platform; this case study used six (B1, B2, FB1, FB2, FB3, FB4 from Beck et al. 2022). The more the better, and the wider the range of conditions covered the better — but 5–10 runs is the realistic lab-to-production workflow Augur is designed for.

Why is the predicted 35.98 g/L titer lower than the paper's best run (50.5 g/L)?

The paper's best run (FB1 extended to 500 h) is a research-scale experiment optimizing for maximum titer. Augur's scale-up prediction applies realistic mass-transfer and mixing penalties for 20,000 L industrial fermentation at the organism's typical 240 h batch duration. 35.98 g/L is the honest expected titer for a production campaign, not the best-case lab outcome.

Is $112/kg COGS really what a MEL producer would see?

At 36 g/L titer and 20,000 L scale, yes: that matches published benchmarks for pre-optimized MEL production ($80–120/kg). Toyobo's commercial MEL (Ceramela) retails at $40–100/kg, which implies COGS in the $15–50/kg range for their strain. Augur correctly identifies titer (not DSP) as the economic bottleneck; strain optimization to 100+ g/L drops COGS toward $25/kg.

Did the platform really predict FB4 within 1.3% of its published value?

Yes, but read the caveat carefully. The leave-one-out used our paper-anchored synthetic CSVs generated from the same logistic kinetic family the PINN ensemble implicitly learns — the 5 training runs are in-family with FB4 by construction. Beck's own 9% MAPE was measured against real noisy experimental data. The 1.3% is a fair signal that the generalization mechanism works, not a fair apples-to-apples comparison. We'll re-run against IndPenSim or another raw experimental dataset for the true benchmark.

How does Augur handle MEL-specific DSP economics?

MEL requires solvent extraction (ethyl acetate or similar) to separate the lipid fraction. A naïve single-pass DSP model assumes 100% fresh solvent per batch, producing $88/kg DSP cost at 20,000 L. Augur's extraction operation models industrial solvent recycling (95% recovery default, configurable), which brings DSP down to $24/kg: a roughly 3.7× reduction and a realistic industrial figure.
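One way to read the two DSP figures is as a fixed part plus a fresh-solvent part that shrinks with recovery. The sketch below backs out those two terms from the $88/kg and $24/kg values. The two-term linear model and the function name `dsp_cost_per_kg` are assumptions made for illustration; the source gives only the two totals, not Augur's actual extraction model.

```python
def dsp_cost_per_kg(fixed_cost, solvent_cost_no_recycle, recovery):
    """Hypothetical two-term DSP cost model ($/kg product).

    fixed_cost: DSP cost that does not depend on solvent make-up.
    solvent_cost_no_recycle: solvent cost if every batch uses 100% fresh solvent.
    recovery: fraction of solvent recovered and reused (0.0 to 1.0).
    """
    if not 0.0 <= recovery <= 1.0:
        raise ValueError("recovery must be between 0 and 1")
    return fixed_cost + solvent_cost_no_recycle * (1.0 - recovery)


# Back out the two terms from the figures quoted above:
#   recovery = 0.00 -> $88/kg,  recovery = 0.95 -> $24/kg
no_recycle, with_recycle, default_recovery = 88.0, 24.0, 0.95
solvent_term = (no_recycle - with_recycle) / default_recovery
fixed_term = no_recycle - solvent_term

for r in (0.0, 0.90, 0.95, 0.99):
    print(f"recovery {r:.0%}: ${dsp_cost_per_kg(fixed_term, solvent_term, r):.2f}/kg")
print(f"reduction factor at 95%: {no_recycle / with_recycle:.2f}x")
```

Under this assumed split, most of the single-pass cost is solvent make-up, which is why the recovery setting moves the DSP figure so strongly.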

Can I run this for my own organism that isn't in Augur's built-in list?

Yes. The 'Custom / Novel Organism' flow lets you supply whatever kinetics you know (μ_max, Ks, Yxs, temperature range, pH range, expected duration, DSP template) and the platform learns the rest from your uploaded lab data. When Custom is used, a 'data-driven fit' risk factor appears in the result so reviewers know the prediction is learned from your runs rather than borrowed from a matched organism profile.

Predicting industrial MEL production from 6 lab fermentations (Beck et al. 2022)

Beck et al. published six bench-scale Moesziomyces aphidis fermentations producing mannosylerythritol lipid (MEL) in a 7 L vessel with 4 L working volume, and fit a traditional kinetic ODE model that achieved 6–18% MAPE on held-out experimental endpoints. The question worth answering: can a lab-stage customer predict 20,000 L production-scale economics from the same six runs, in under two minutes, without running any pilot experiments? That’s the core Augur use case. We tested it end-to-end.

TL;DR

Industrial-scale titer prediction (20,000 L): 35.98 g/L (80% CI 18.0 – 54.0). Industry expectation for a Beck-FB1-style process at this scale is 25–45 g/L. Augur lands near the midpoint (35 g/L) of the expected band.
Leave-one-out validation: 34.8 g/L predicted for FB4 vs 34.3 g/L experimental (1.3% deviation). Important caveats below — this is not apples-to-apples with Beck’s model, which was validated on real noisy experimental data.
COGS at 20,000 L: $112/kg. Not profitable at a $20/kg target price. Augur correctly flags titer (not DSP) as the economic bottleneck — a strain optimization to 100+ g/L drops COGS to ~$25/kg.
End-to-end runtime: 110 seconds cold start for the full 6-run scale-up prediction.

The question

MEL is a specialty glycolipid biosurfactant used in cosmetics and personal care. Toyobo’s Ceramela product line retails at $40–100/kg. For a lab-stage team considering a scale-up campaign, the practical question is: what titer and COGS should I expect at 20,000 L, and where are the economic levers? Traditional answers require pilot-plant infrastructure (a 500–2,000 L run takes months and costs tens of thousands of dollars) or consulting work to run offline TEA models. Augur’s bet is that a physics-informed neural network ensemble trained on 5–10 lab runs can answer this question fast and defensibly enough to drive capital decisions.

The input

We ingested the six fermentations from Beck et al. (2022, doi:10.3389/fbioe.2022.913362) as CSV files. The runs span batch and fed-batch modes with a range of oil-feeding strategies:

Run	Mode	Duration	Reported endpoint MEL
B1	Two-staged batch, single oil feed	168 h	11.2 g/L
B2	Batch + 4 repeated oil feeds	333 h	27.1 g/L
FB1	Exp. fed-batch + 4 oil feeds	310 h	43.0 g/L
FB2	Exp. fed-batch + continuous oil	147 h	43.9 g/L
FB3	Exp. fed-batch + 2 oil feeds (high yield)	231 h	35.7 g/L
FB4	Early fed-batch + continuous oil (best purity)	169 h	34.3 g/L

CSVs were constructed by anchoring on Beck’s verbatim endpoint values (Tables 2 & 3) and interpolating intermediate time-series via logistic growth and saturating MEL accumulation consistent with the reported kinetic parameters (μ_max = 0.11 h⁻¹, Y_X/glucose = 0.171 g/g, Y_MEL/oil = 0.226–0.294 g/g). We’re candid about this in the caveats section — synthetic reconstruction from a paper is not the same as real raw experimental data.
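The reconstruction described above can be sketched roughly as follows. This is a minimal illustration, not the repository's generator script: `synthetic_run` is a hypothetical helper, and `x0`, `x_max`, and `k_prod` are placeholder values chosen only so the curves look plausible. Only μ_max = 0.11 h⁻¹ and the FB4 endpoint come from the text.

```python
import csv
import io
import math


def synthetic_run(endpoint_mel, duration_h, mu_max=0.11, x0=0.5, x_max=20.0,
                  k_prod=0.02, step_h=12.0):
    """Return rows of (time_h, biomass_g_per_l, mel_g_per_l).

    Biomass follows logistic growth; MEL follows a saturating curve
    rescaled so that it hits `endpoint_mel` exactly at `duration_h`.
    x0, x_max and k_prod are illustrative placeholders, not paper values.
    """
    rows = []
    norm = 1.0 - math.exp(-k_prod * duration_h)
    n_steps = int(duration_h // step_h)
    times = [i * step_h for i in range(n_steps + 1)]
    if times[-1] < duration_h:
        times.append(duration_h)  # always include the anchored endpoint
    for t in times:
        biomass = x_max / (1.0 + (x_max / x0 - 1.0) * math.exp(-mu_max * t))
        mel = endpoint_mel * (1.0 - math.exp(-k_prod * t)) / norm
        rows.append((t, round(biomass, 3), round(mel, 3)))
    return rows


# FB4 from the table above: 169 h, 34.3 g/L reported endpoint.
fb4 = synthetic_run(endpoint_mel=34.3, duration_h=169.0)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["time_h", "biomass_g_per_l", "mel_g_per_l"])
writer.writerows(fb4)
print(buffer.getvalue())
```

Because the product curve is rescaled to the reported endpoint, every run generated this way belongs to the same smooth kinetic family, which is the limitation the caveats section discusses.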

The customer workflow

Upload 6 CSVs (≈6 seconds total)
Select organism = M. aphidis (built-in profile with Beck’s kinetics)
Configure production scenario: 20,000 L vessel, Rushton × 3 impellers, 1.0 vvm aeration, biosurfactant solvent-extraction DSP template, target selling price $20/kg
Submit prediction → poll until complete

Total time from cold start to full result: 110 seconds.
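The submit-then-poll step can be sketched generically. Nothing below is an Augur API: `get_status` is a hypothetical callable standing in for whatever status request a real client would make, and the status strings are assumed for the example.

```python
import time


def poll_until_complete(get_status, interval_s=2.0, timeout_s=300.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call `get_status()` until it returns "complete" or "failed".

    `get_status` is a hypothetical callable standing in for whatever
    status request the real client makes; it is not an Augur API.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("complete", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("prediction did not finish before the timeout")
        sleep(interval_s)


# Demo with a fake job that finishes on the third status check.
fake_states = iter(["queued", "running", "complete"])
final = poll_until_complete(lambda: next(fake_states), interval_s=0.0)
print(final)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting. For a job that takes around 110 seconds, a timeout of a few minutes leaves reasonable headroom.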

Results at 20,000 L industrial scale

Metric	Prediction	CI (80%)
Titer	35.98 g/L	18.0 – 54.0
Volumetric productivity	0.150 g/L/h	0.075 – 0.225
Yield	0.115 g/g	0.085 – 0.145
COGS	$112.43/kg	$100.96 – $123.91
Overall DSP yield	76.1%	—
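The titer and productivity rows can be cross-checked with a few lines of arithmetic. The sketch assumes the 240 h batch duration quoted in the FAQ is the basis for the productivity figure; the table itself does not state the duration.

```python
# Headline values from the results table above.
titer_g_per_l = 35.98
ci_low, ci_high = 18.0, 54.0

# Assumption: the 240 h batch duration quoted in the FAQ is the basis
# for the productivity figure; the table does not state it directly.
batch_duration_h = 240.0

productivity = titer_g_per_l / batch_duration_h       # g/L/h
half_width = (ci_high - ci_low) / 2.0                  # g/L
relative_half_width = half_width / titer_g_per_l       # fraction of point estimate

print(f"productivity = {productivity:.3f} g/L/h")
print(f"CI half-width = {half_width:.1f} g/L ({relative_half_width:.0%} of point estimate)")
```

With those inputs the productivity reproduces the 0.150 g/L/h in the table, and the titer interval works out to about half the point estimate on either side.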

COGS breakdown

Component	$/kg	Share
Depreciation	35.35	32%
DSP opex	24.43	22%
Feedstock	23.56	21%
USP opex	19.72	18%
Overhead	8.13	7%
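A quick check of the breakdown: the five listed components sum to $111.19/kg, which is $1.24/kg below the $112.43/kg headline, and the source does not itemize the remainder. The share column matches normalization by the component sum, as the sketch below shows.

```python
components = {
    "Depreciation": 35.35,
    "DSP opex": 24.43,
    "Feedstock": 23.56,
    "USP opex": 19.72,
    "Overhead": 8.13,
}
headline_cogs = 112.43  # $/kg, from the results table

component_sum = sum(components.values())
shares = {name: cost / component_sum for name, cost in components.items()}

for name, cost in components.items():
    print(f"{name:<13} {cost:6.2f}  {shares[name]:.0%}")
print(f"sum of components: {component_sum:.2f} $/kg")
print(f"gap to headline:   {headline_cogs - component_sum:.2f} $/kg")
```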

Compared to industry expectations

Metric	Augur	Industry band	Verdict
Titer at 20,000 L	35.98 g/L	25 – 45 g/L	✅ Within band
COGS at current titer	$112/kg	$80 – $120/kg	✅ Within band
COGS if strain → 100 g/L	—	$25 – $40/kg	Directional check passes
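The directional check in the last row can be approximated with a deliberately crude rule: hold all per-batch costs fixed and let cost per kg fall in inverse proportion to titer. This is an assumption for illustration, not Augur's cost model; it ignores costs that scale with product mass. The function names are hypothetical.

```python
def cogs_at_titer(base_cogs, base_titer, new_titer):
    """Crude scaling: assume every cost per batch stays fixed, so
    cost per kg falls in proportion to the extra product per batch."""
    return base_cogs * base_titer / new_titer


def titer_for_target_cogs(base_cogs, base_titer, target_cogs):
    """Invert the same relation to find the titer a cost target implies."""
    return base_titer * base_cogs / target_cogs


base_cogs, base_titer = 112.43, 35.98   # $/kg and g/L, from the results table

print(f"COGS at 100 g/L: ${cogs_at_titer(base_cogs, base_titer, 100.0):.2f}/kg")
print(f"titer for $20/kg COGS: {titer_for_target_cogs(base_cogs, base_titer, 20.0):.0f} g/L")
```

Under this rule, 100 g/L gives about $40/kg, near the top of the $25 – $40 band, and reaching $20/kg would take roughly 200 g/L. Either way, titer is the lever that moves COGS by a large factor.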

Leave-one-out: predicting FB4 from the other 5

To test prediction accuracy independent of industry expectations, we held out FB4 (paper-reported 34.3 g/L MEL at 169 h), trained the platform on B1 + B2 + FB1 + FB2 + FB3, and asked it to predict the outcome at comparable scale.

Metric	Predicted	Paper-reported	Deviation
Final MEL titer	34.76 g/L	34.3 g/L	1.3%
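The deviation in the table is a plain relative error on the endpoint. The short check below also computes the ratio against Beck's 9% MAPE, as arithmetic only.

```python
predicted, reported = 34.76, 34.3   # g/L, FB4 final MEL titer

deviation = abs(predicted - reported) / reported
print(f"relative deviation: {deviation:.1%}")

# Ratio against Beck's 9% MAPE, the source of the "7x" figure discussed
# in the caveats. The two numbers are measured on different data, so the
# ratio is arithmetic only, not an accuracy comparison.
beck_mape = 0.09
print(f"ratio: {beck_mape / deviation:.1f}x")
```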

Honest caveats on the 1.3% number

This is a real signal that the platform’s kinetic generalization works, but it is not apples-to-apples with Beck’s 9% MAPE:

The CSVs are paper-anchored synthetic, not raw experimental data. We reconstructed the six fermentations from the paper’s reported endpoints and the verbatim process descriptions, interpolating time-series via a logistic + saturating-production kinetic family.
Beck’s 9% was measured against noisy real data. Our leave-one-out is on synthetic CSVs in the same kinetic family the PINN ensemble implicitly learns — FB4 is in-family with the 5 training runs by construction. A fair apples-to-apples comparison would need real experimental raw data, which Beck did not publish in machine-readable form.
The strongest defensible claim here is “directionally correct.” Scale-up prediction lands in the industry band. The platform flags titer (not DSP) as the economic bottleneck. Those are real signals. “7× more accurate than a published model” is not a claim we can make yet — we’d need real noisy data to run the comparison fairly.

What the platform flagged automatically

Augur’s risk-factor engine surfaced three warnings on this scenario:

ODE fit quality: medium severity (R² = 0.83, MAPE = 37% on product fit). “Predictions are usable but may benefit from additional lab data.” Honest: six runs is only one above the platform’s five-run minimum.
Impeller tip speed: high severity (11.2 m/s at 20,000 L). Shear risk for the oil-emulsion phase.
Biosurfactant economics: high severity. “Predicted COGS ($112/kg) exceeds the commodity surfactant benchmark. To improve: increase titer, use cheaper co-substrates, or optimize DSP.”
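The tip-speed warning rests on the standard relation: tip speed = π × impeller diameter × rotation rate. The sketch inverts it to show what agitation rate would produce 11.2 m/s. The 0.9 m impeller diameter is a hypothetical value, since the source does not give the vessel geometry.

```python
import math


def tip_speed_m_per_s(impeller_diameter_m, rpm):
    """Impeller tip speed: circumference times revolutions per second."""
    return math.pi * impeller_diameter_m * rpm / 60.0


def rpm_for_tip_speed(impeller_diameter_m, tip_speed):
    """Invert the relation to find the agitation rate for a tip speed."""
    return 60.0 * tip_speed / (math.pi * impeller_diameter_m)


# Hypothetical geometry: the source does not give the impeller diameter.
diameter_m = 0.9
flagged_tip_speed = 11.2  # m/s, from the warning above

rpm = rpm_for_tip_speed(diameter_m, flagged_tip_speed)
print(f"{rpm:.0f} rpm gives {tip_speed_m_per_s(diameter_m, rpm):.1f} m/s")
```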

The break-even analysis correctly labels the process not profitable at $20/kg target price. A customer would know this before building a pilot — which is exactly the point.

Why the platform got it right

1. Organism profile with real kinetics

Before this case study, M. aphidis wasn’t a built-in organism. The generic “Custom” profile defaulted to mammalian kinetics (μ_max = 0.035 h⁻¹) and produced nonsense (4 g/L titer, $2,293/kg COGS). Adding the organism with Beck’s published μ_max = 0.11 h⁻¹ and Y_X/S = 0.171 g/g fixed the prediction. This is a cautionary signal: the wrong kinetics silently break everything downstream.

2. Honest economics

The DSP model assumes 95% solvent recovery (industrial standard for ethyl acetate recycling) and labor shared across 3 parallel bioreactors. Before our April 2026 calibration sprint, the previous defaults produced $185/kg on the same prediction, a credibility-destroying number. Tuning defaults to match real industrial practice brought the result into a realistic band.

3. PINN ensemble + ODE baseline

The 1.3% leave-one-out reflects the ensemble approach — five independent physics-informed neural nets trained on residuals against an ODE baseline, each predicting the 6th run. Conformal calibration gives the CI band.
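To make the calibration step concrete, here is a minimal sketch of split conformal prediction, one common variant of conformal calibration. The source does not say which variant Augur uses, so treat the choice as an assumption. All numbers are made up for illustration and the function names are hypothetical.

```python
import math
import statistics


def conformal_half_width(abs_residuals, alpha=0.2):
    """Split-conformal quantile of absolute calibration residuals.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)); if that rank
    exceeds n, the interval is unbounded and we return infinity.
    """
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf
    return sorted(abs_residuals)[rank - 1]


def ensemble_interval(member_predictions, abs_residuals, alpha=0.2):
    """Point estimate = ensemble mean; band = mean +/- conformal half-width."""
    center = statistics.fmean(member_predictions)
    q = conformal_half_width(abs_residuals, alpha)
    return center, center - q, center + q


# Made-up numbers purely for illustration (not Augur outputs).
members = [33.0, 35.5, 36.0, 37.5, 38.0]                       # five members, g/L
residuals = [2.0, 5.0, 9.0, 12.0, 18.0, 7.0, 3.0, 15.0, 6.0]   # |y - y_hat|, g/L

center, low, high = ensemble_interval(members, residuals, alpha=0.2)
print(f"{center:.1f} g/L (80% band {low:.1f} to {high:.1f})")
```

With few calibration points the conformal quantile sits near the largest residuals, which is one reason bands built from a handful of runs come out wide.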

What the platform missed

Initial ODE fit R² = 0.83 (fair, not great). The Luedeking-Piret kinetics used internally are tuned for glucose-only processes, not MEL’s two-phase (glucose growth → oil production) process. A MEL-specific kinetic model (Beck’s five-state ODE with separate oil hydrolysis + intracellular lipid accumulation) would improve fit. Platform correctly flags this.
CI is wide (about ±50% of the point estimate). Six lab runs combined with high process variability mean the 20,000 L prediction carries ≈18 g/L uncertainty on a 36 g/L mean. The customer sees this explicitly, with no false confidence.
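For reference, the Luedeking-Piret form mentioned above ties product formation to biomass through a growth-associated term and a non-growth-associated term. The sketch integrates it alongside logistic growth with forward Euler. Apart from μ_max = 0.11 h⁻¹, the parameter values are illustrative placeholders, and this is not Augur's internal model.

```python
def simulate(mu_max=0.11, x0=0.5, x_max=20.0, alpha=0.5, beta=0.01,
             duration_h=240.0, dt_h=0.1):
    """Forward-Euler integration of logistic growth with
    Luedeking-Piret product formation:

        dX/dt = mu_max * X * (1 - X / x_max)
        dP/dt = alpha * dX/dt + beta * X

    alpha (growth-associated) and beta (non-growth-associated) are
    illustrative values, as are x0 and x_max.
    """
    x, p, t = x0, 0.0, 0.0
    trajectory = [(t, x, p)]
    steps = int(round(duration_h / dt_h))
    for _ in range(steps):
        dx_dt = mu_max * x * (1.0 - x / x_max)
        dp_dt = alpha * dx_dt + beta * x
        x += dx_dt * dt_h
        p += dp_dt * dt_h
        t += dt_h
        trajectory.append((t, x, p))
    return trajectory


t_end, x_end, p_end = simulate()[-1]
print(f"t = {t_end:.0f} h, biomass = {x_end:.1f} g/L, product = {p_end:.1f} g/L")
```

Note that the product rate here depends only on biomass. There is no state for the oil substrate, which is the gap a two-phase MEL model with separate oil hydrolysis would fill.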

Customer-facing conclusion

Based on six lab fermentations, Augur predicts that a 20,000 L industrial MEL process using the Beck 2022 approach would achieve 36 ± 18 g/L titer and $112 ± 11/kg COGS (80% CI). This lands inside published industry bands. The model correctly identifies titer (not DSP) as the economic bottleneck. The biggest credibility signal is not the prediction accuracy; it is that the platform tells you honestly that this process isn’t profitable at the target price, and where to focus optimization effort.

Reproducibility

Everything is in our repo. The regeneration steps:

Generate Beck CSVs: python scripts/generate_beck_mel_dataset.py
Run case study: python scripts/run_beck_case_study.py (requires local backend)
Raw outputs: docs/sprint-2026-04-17/beck-case-study-data.json
Ground-truth assertions: backend/tests/golden_datasets/m_aphidis/ground_truth.json
Economics regression guards: backend/tests/test_economics.py::test_sbombicola_* (sophorolipid analog; MEL-specific accuracy guard planned when we graduate this to a published benchmark)

Reference

Beck A, Vogt F, Hägele L, Rupp S, Zibek S. Optimization and Kinetic Modeling of a Fed-Batch Fermentation for Mannosylerythritol Lipids (MEL) Production With Moesziomyces aphidis. Front. Bioeng. Biotechnol. 10:913362 (2022). doi:10.3389/fbioe.2022.913362