M. aphidis · MEL · Solvent extraction · Industrial scale

Predicting industrial MEL production from 6 lab fermentations (Beck et al. 2022)

Can a physics-informed AI platform predict 20,000 L MEL bioreactor outcomes from 6 published lab runs? We tested it on Beck et al. 2022 and landed at 35.98 g/L — inside the industry-expected 25–45 g/L band — with an honest look at the caveats.

RenewVerse Research · 11 min read

Beck et al. published six bench-scale Moesziomyces aphidis fermentations producing mannosylerythritol lipid (MEL) at 7 L vessel / 4 L working volume, and fit a traditional kinetic ODE model that achieved 6–18% MAPE on held-out experimental endpoints. The question worth answering: can a lab-stage customer predict 20,000 L production-scale economics from the same six runs, in under two minutes, without running any pilot experiments? That’s the core Augur use case. We tested it end-to-end.

TL;DR

  • Industrial-scale titer prediction (20,000 L): 35.98 g/L (80% CI 18.0 – 54.0). Industry expectation for a Beck-FB1-style process at this scale is 25–45 g/L. Augur lands within 1 g/L of the band's 35 g/L midpoint.
  • Leave-one-out validation: 34.8 g/L predicted for FB4 vs 34.3 g/L experimental (1.3% deviation). Important caveats below — this is not apples-to-apples with Beck’s model, which was validated on real noisy experimental data.
  • COGS at 20,000 L: $112/kg. Not profitable at a $20/kg target price. Augur correctly flags titer (not DSP) as the economic bottleneck — a strain optimization to 100+ g/L drops COGS to ~$25/kg.
  • End-to-end runtime: 110 seconds cold start for the full 6-run scale-up prediction.

The question

MEL is a specialty glycolipid biosurfactant used in cosmetics and personal care. Kao Corporation’s Ceramela product line retails at $40–100/kg. For a lab-stage team considering a scale-up campaign, the practical question is: what titer and COGS should I expect at 20,000 L, and where are the economic levers? Traditional answers require pilot-plant infrastructure (a 500–2,000 L run takes months and tens of thousands of dollars) or consulting work to run offline TEA models. Augur’s bet is that a physics-informed neural network ensemble trained on 5–10 lab runs can answer this question fast and defensibly enough to drive capital decisions.

The input

We ingested the six fermentations from Beck et al. 10.3389/fbioe.2022.913362 as CSV files. The runs span batch and fed-batch modes with a range of oil-feeding strategies:

| Run | Mode | Duration | Reported endpoint MEL |
|-----|------|----------|-----------------------|
| B1 | Two-staged batch, single oil feed | 168 h | 11.2 g/L |
| B2 | Batch + 4 repeated oil feeds | 333 h | 27.1 g/L |
| FB1 | Exp. fed-batch + 4 oil feeds | 310 h | 43.0 g/L |
| FB2 | Exp. fed-batch + continuous oil | 147 h | 43.9 g/L |
| FB3 | Exp. fed-batch + 2 oil feeds (high yield) | 231 h | 35.7 g/L |
| FB4 | Early fed-batch + continuous oil (best purity) | 169 h | 34.3 g/L |

CSVs were constructed by anchoring on Beck’s verbatim endpoint values (Tables 2 & 3) and interpolating intermediate time-series via logistic growth and saturating MEL accumulation consistent with the reported kinetic parameters (μ_max = 0.11 h⁻¹, Y_X/glucose = 0.171 g/g, Y_MEL/oil = 0.226–0.294 g/g). We’re candid about this in the caveats section — synthetic reconstruction from a paper is not the same as real raw experimental data.
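A minimal sketch of the kind of reconstruction described above: a logistic biomass curve plus a saturating MEL curve rescaled so that it ends exactly on the paper's endpoint. This is not the repo's generation script. Only μ_max = 0.11 h⁻¹ and the FB4 endpoint (34.3 g/L at 169 h) come from the text; `x0`, `x_max`, `k`, and the 12 h sampling step are illustrative assumptions.

```python
import math

def logistic_biomass(t, x0, x_max, mu_max):
    """Logistic growth: starts at x0, saturates at x_max."""
    return x_max / (1.0 + (x_max - x0) / x0 * math.exp(-mu_max * t))

def saturating_mel(t, t_end, mel_end, k):
    """Saturating accumulation, rescaled so MEL(t_end) equals mel_end."""
    return mel_end * (1.0 - math.exp(-k * t)) / (1.0 - math.exp(-k * t_end))

def reconstruct_run(t_end, mel_end, x0=0.5, x_max=20.0, mu_max=0.11,
                    k=0.01, step_h=12.0):
    """Return (time_h, biomass_g_per_l, mel_g_per_l) rows for one run.

    x0, x_max, k and step_h are hypothetical; mu_max is Beck's value.
    """
    times = []
    t = 0.0
    while t < t_end:
        times.append(t)
        t += step_h
    times.append(t_end)  # always include the anchored endpoint
    return [
        (t, logistic_biomass(t, x0, x_max, mu_max),
         saturating_mel(t, t_end, mel_end, k))
        for t in times
    ]

# FB4: 169 h, 34.3 g/L reported endpoint
fb4_rows = reconstruct_run(t_end=169.0, mel_end=34.3)
```

Because the MEL curve is rescaled to the endpoint, every synthetic run reproduces its reported final titer by construction, which is exactly why the leave-one-out result needs the caveats discussed later.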

The customer workflow

  1. Upload 6 CSVs (≈6 seconds total)
  2. Select organism = M. aphidis (built-in profile with Beck’s kinetics)
  3. Configure production scenario: 20,000 L vessel, Rushton × 3 impellers, 1.0 vvm aeration, biosurfactant solvent-extraction DSP template, target selling price $20/kg
  4. Submit prediction → poll until complete

Total time from cold start to full result: 110 seconds.
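The "submit, then poll until complete" step can be sketched generically. Augur's actual API is not shown in this article, so the status fields (`state`, `titer_g_per_l`), the state names, and the `fetch_status` callable are all hypothetical stand-ins; the stub below replaces a real status endpoint.

```python
import time

TERMINAL_STATES = {"complete", "failed"}  # hypothetical state names

def poll_until_complete(fetch_status, interval_s=2.0, timeout_s=300.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or times out."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status["state"] in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"prediction still {status['state']!r} after {timeout_s} s"
            )
        sleep(interval_s)

# Stub standing in for a real status endpoint (hypothetical payloads).
_responses = iter([
    {"state": "queued"},
    {"state": "running"},
    {"state": "complete", "titer_g_per_l": 35.98},
])
result = poll_until_complete(lambda: next(_responses), sleep=lambda s: None)
```

Injecting `sleep` and `clock` keeps the helper testable without waiting in real time.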

Results at 20,000 L industrial scale

| Metric | Prediction | CI (80%) |
|--------|------------|----------|
| Titer | 35.98 g/L | 18.0 – 54.0 |
| Volumetric productivity | 0.150 g/L/h | 0.075 – 0.225 |
| Yield | 0.115 g/g | 0.085 – 0.145 |
| COGS | $112.43/kg | $100.96 – $123.91 |
| Overall DSP yield | 76.1% | |
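As a quick consistency check, the volumetric productivity row follows from dividing titer by batch duration. The sketch below assumes the 240 h batch duration quoted in the FAQ at the end of this article; the function name is ours.

```python
BATCH_DURATION_H = 240.0  # batch duration quoted in the FAQ

def volumetric_productivity(titer_g_per_l, duration_h):
    """Average productivity over the batch, in g/L/h."""
    return titer_g_per_l / duration_h

point = volumetric_productivity(35.98, BATCH_DURATION_H)  # ~0.150
low = volumetric_productivity(18.0, BATCH_DURATION_H)     # ~0.075
high = volumetric_productivity(54.0, BATCH_DURATION_H)    # ~0.225
```

The point estimate and both CI bounds in the productivity row are reproduced by this division, so that row carries no information beyond the titer row.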

COGS breakdown

| Component | $/kg | Share |
|-----------|------|-------|
| Depreciation | 35.35 | 32% |
| DSP opex | 24.43 | 22% |
| Feedstock | 23.56 | 21% |
| USP opex | 19.72 | 18% |
| Overhead | 8.13 | 7% |
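The shares can be recomputed directly from the component costs. This is a small sketch using only the figures in the breakdown; note that the five listed components sum to $111.19/kg, slightly below the $112.43/kg headline COGS, and the breakdown does not itemize the difference.

```python
COGS_COMPONENTS = {  # $/kg, from the breakdown above
    "Depreciation": 35.35,
    "DSP opex": 24.43,
    "Feedstock": 23.56,
    "USP opex": 19.72,
    "Overhead": 8.13,
}

total = sum(COGS_COMPONENTS.values())

# Share of each component, rounded to whole percent as in the breakdown.
shares = {
    name: round(100.0 * cost / total)
    for name, cost in COGS_COMPONENTS.items()
}
```

The rounded shares match the published column (32, 22, 21, 18, 7), so the percentages are relative to the itemized subtotal.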

Compared to industry expectations

| Metric | Augur | Industry band | Verdict |
|--------|-------|---------------|---------|
| Titer at 20,000 L | 35.98 g/L | 25 – 45 g/L | ✅ Within band |
| COGS at current titer | $112/kg | $80 – $120/kg | ✅ Within band |
| COGS if strain → 100 g/L | | $25 – $40/kg | Directional check passes |
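A back-of-envelope way to sanity-check the titer lever is to assume the cost per batch stays fixed while product mass scales with titer, so COGS per kg scales inversely with titer. That is a deliberately naive assumption of ours, not Augur's TEA model: it ignores the extra feedstock and DSP load that a higher titer would bring.

```python
def naive_cogs_at_titer(cogs_now, titer_now, titer_new):
    """Inverse-titer scaling: same cost per batch, more kg per batch."""
    return cogs_now * titer_now / titer_new

# Current prediction: $112.43/kg at 35.98 g/L; target strain: 100 g/L
cogs_100 = naive_cogs_at_titer(112.43, 35.98, 100.0)  # ~$40.45/kg
```

Under this crude scaling the result sits at the upper end of the $25–40/kg range; getting closer to $25/kg implies savings beyond simple dilution of fixed costs.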

Leave-one-out: predicting FB4 from the other 5

To test prediction accuracy independent of industry expectations, we held out FB4 (paper-reported 34.3 g/L MEL at 169 h), trained the platform on B1 + B2 + FB1 + FB2 + FB3, and asked it to predict the outcome at comparable scale.

| | Predicted | Paper-reported | Deviation |
|---|-----------|----------------|-----------|
| Final MEL titer | 34.76 g/L | 34.3 g/L | 1.3% |
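For reference, the deviation is the absolute percentage error of the prediction against the paper-reported endpoint. A minimal sketch (function name ours):

```python
def pct_deviation(predicted, reported):
    """Absolute percentage error relative to the reported value."""
    return abs(predicted - reported) / reported * 100.0

fb4_deviation = pct_deviation(34.76, 34.3)  # ~1.3 %
```

With a single held-out run this is one error value, not a MAPE averaged over many points, which is one more reason to read it cautiously.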

Honest caveats on the 1.3% number

This is a real signal that the platform’s kinetic generalization works, but it is not apples-to-apples with Beck’s 9% MAPE:

  • The CSVs are paper-anchored synthetic, not raw experimental data. We reconstructed the six fermentations from the paper’s reported endpoints and the verbatim process descriptions, interpolating time-series via a logistic + saturating-production kinetic family.
  • Beck’s 9% was measured against noisy real data. Our leave-one-out is on synthetic CSVs in the same kinetic family the PINN ensemble implicitly learns — FB4 is in-family with the 5 training runs by construction. A fair apples-to-apples comparison would need real experimental raw data, which Beck did not publish in machine-readable form.
  • The strongest defensible claim here is “directionally correct.” Scale-up prediction lands in the industry band. The platform flags titer (not DSP) as the economic bottleneck. Those are real signals. “7× more accurate than a published model” is not a claim we can make yet — we’d need real noisy data to run the comparison fairly.

What the platform flagged automatically

Augur’s risk-factor engine surfaced three warnings on this scenario:

  • ODE fit quality: medium severity (R² = 0.83, MAPE = 37% on product fit). “Predictions are usable but may benefit from additional lab data.” Honest: six runs is just one above the platform's five-run minimum.
  • Impeller tip speed: high severity (11.2 m/s at 20,000 L). Shear risk for the oil-emulsion phase.
  • Biosurfactant economics: high severity. “Predicted COGS ($112/kg) exceeds the commodity surfactant benchmark. To improve: increase titer, use cheaper co-substrates, or optimize DSP.”

The break-even analysis correctly labels the process not profitable at $20/kg target price. A customer would know this before building a pilot — which is exactly the point.
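To make the tip-speed warning concrete: tip speed is π × impeller diameter × rotational speed. The article gives only the 11.2 m/s result, not the geometry, so the 1.0 m impeller diameter below is a purely hypothetical value used to show how the flagged speed maps to an agitation rate.

```python
import math

def tip_speed_m_per_s(impeller_diameter_m, rpm):
    """Tip speed v = pi * D * N, with N converted from rpm to rev/s."""
    return math.pi * impeller_diameter_m * rpm / 60.0

def rpm_for_tip_speed(tip_speed, impeller_diameter_m):
    """Invert v = pi * D * N to get the agitation rate in rpm."""
    return 60.0 * tip_speed / (math.pi * impeller_diameter_m)

HYPOTHETICAL_DIAMETER_M = 1.0  # assumption, not from the case study
rpm_at_flag = rpm_for_tip_speed(11.2, HYPOTHETICAL_DIAMETER_M)  # ~214 rpm
```

For a fixed agitation rate, tip speed grows linearly with impeller diameter, which is why a setting that is harmless at 7 L can become a shear concern at 20,000 L.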

Why the platform got it right

1. Organism profile with real kinetics

Before this case study, M. aphidis wasn’t a built-in organism. The generic “Custom” profile defaulted to mammalian kinetics (μ_max = 0.035 h⁻¹) and produced nonsense (4 g/L titer, $2,293/kg COGS). Adding the organism with Beck’s published μ_max = 0.11 h⁻¹ and Y_X/S = 0.171 g/g fixed the prediction. This is a cautionary signal: the wrong kinetics silently break everything downstream.

2. Honest economics

The DSP model assumes 95% solvent recovery (industrial standard for ethyl acetate recycling) and labor shared across 3 parallel bioreactors. Before our April 2026 calibration sprint, these defaults produced $185/kg on the same prediction — a credibility-destroying number. Tuning defaults to match real industrial practice brought the result into a realistic band.

3. PINN ensemble + ODE baseline

The 1.3% leave-one-out reflects the ensemble approach — five independent physics-informed neural nets trained on residuals against an ODE baseline, each predicting the 6th run. Conformal calibration gives the CI band.
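The article does not describe how Augur implements its conformal calibration, so the following is a generic split-conformal sketch rather than the platform's method: the ensemble mean is the point estimate, and the half-width is a finite-sample quantile of absolute residuals on held-out calibration points. All numbers are hypothetical.

```python
import math
from statistics import mean

def conformal_halfwidth(abs_residuals, coverage=0.80):
    """Split-conformal half-width: the ceil((n+1)*coverage)-th smallest residual.

    Returns infinity when there are too few calibration points to
    guarantee the requested coverage.
    """
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        return math.inf
    return sorted(abs_residuals)[rank - 1]

def ensemble_interval(member_predictions, abs_residuals, coverage=0.80):
    """Point estimate = ensemble mean; interval = mean +/- conformal half-width."""
    center = mean(member_predictions)
    half = conformal_halfwidth(abs_residuals, coverage)
    return center - half, center, center + half

# Hypothetical ensemble outputs (g/L) and calibration residuals (g/L)
members = [33.0, 35.0, 36.0, 34.0, 37.0]
residuals = [1.0, 4.0, 2.5, 6.0, 3.0]
lo, mid, hi = ensemble_interval(members, residuals)
```

With only five calibration residuals, the 80% half-width is simply the largest residual, which illustrates why a handful of lab runs tends to produce wide bands.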

What the platform missed

  • Initial ODE fit R² = 0.83 (fair, not great). The Luedeking-Piret kinetics used internally are tuned for glucose-only processes, not MEL’s two-phase (glucose growth → oil production) process. A MEL-specific kinetic model (Beck’s five-state ODE with separate oil hydrolysis + intracellular lipid accumulation) would improve fit. Platform correctly flags this.
  • CI is wide (±50% of the point estimate). Six lab runs combined with high process variability mean the 20,000 L prediction carries ≈18 g/L uncertainty on a 36 g/L mean. The customer sees this explicitly, with no false confidence.
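For readers unfamiliar with the Luedeking-Piret form mentioned above, product formation is modeled as dP/dt = α·dX/dt + β·X, with a growth-associated term (α) and a non-growth-associated term (β). The sketch below pairs it with logistic growth and simple Euler integration. Only μ_max = 0.11 h⁻¹ comes from the text; `x0`, `x_max`, `alpha`, `beta`, and the step size are illustrative assumptions, and this single-substrate form is not Beck's five-state model.

```python
def simulate_luedeking_piret(x0, x_max, mu_max, alpha, beta, t_end_h, dt=0.1):
    """Euler-integrate logistic biomass X and Luedeking-Piret product P.

    dX/dt = mu_max * X * (1 - X / x_max)
    dP/dt = alpha * dX/dt + beta * X
    Returns the final (X, P) in g/L.
    """
    x, p = x0, 0.0
    for _ in range(int(round(t_end_h / dt))):
        dx_dt = mu_max * x * (1.0 - x / x_max)
        dp_dt = alpha * dx_dt + beta * x
        x += dx_dt * dt
        p += dp_dt * dt
    return x, p

# Hypothetical parameters; 240 h matches the batch duration in the FAQ.
x_growth, p_growth = simulate_luedeking_piret(0.5, 20.0, 0.11, 0.5, 0.0, 240.0)
x_mixed, p_mixed = simulate_luedeking_piret(0.5, 20.0, 0.11, 0.5, 0.005, 240.0)
```

With β = 0 the product simply tracks biomass gain, so production stops when growth stops. MEL accumulates mainly in the oil-fed phase after growth, which is one way to see why a growth-coupled form fits this process only moderately well.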

Customer-facing conclusion

Based on six lab fermentations, Augur predicts that a 20,000 L industrial MEL process using the Beck 2022 approach would achieve 36 ± 18 g/L titer and $112 ± 12/kg COGS. This lands inside published industry bands. The model correctly identifies titer (not DSP) as the economic bottleneck. The biggest credibility signal is not the prediction accuracy — it’s that the platform tells you honestly that this process isn’t profitable at target price, and where to focus optimization effort.

Reproducibility

Everything is in our repo. The regeneration steps:

  • Generate Beck CSVs: python scripts/generate_beck_mel_dataset.py
  • Run case study: python scripts/run_beck_case_study.py (requires local backend)
  • Raw outputs: docs/sprint-2026-04-17/beck-case-study-data.json
  • Ground-truth assertions: backend/tests/golden_datasets/m_aphidis/ground_truth.json
  • Economics regression guards: backend/tests/test_economics.py::test_sbombicola_* (sophorolipid analog; MEL-specific accuracy guard planned when we graduate this to a published benchmark)

Reference

Beck A, Vogt F, Hägele L, Rupp S, Zibek S. Optimization and Kinetic Modeling of a Fed-Batch Fermentation for Mannosylerythritol Lipids (MEL) Production With Moesziomyces aphidis. Front. Bioeng. Biotechnol. 10:913362 (2022). doi:10.3389/fbioe.2022.913362

Frequently asked questions

01. How few lab runs does Augur actually need to predict production-scale MEL?

Five is the hard minimum enforced by the platform; this case study used six (B1, B2, FB1, FB2, FB3, FB4 from Beck et al. 2022). The more the better, and the wider the range of conditions covered the better — but 5–10 runs is the realistic lab-to-production workflow Augur is designed for.

02. Why is the predicted 35.98 g/L titer lower than the paper's best run (50.5 g/L)?

The paper's best run (FB1 extended to 500 h) is a research-scale experiment optimizing for maximum titer. Augur's scale-up prediction applies realistic mass-transfer and mixing penalties for 20,000 L industrial fermentation at the organism's typical 240 h batch duration. 35.98 g/L is the honest expected titer for a production campaign, not the best-case lab outcome.

03. Is $112/kg COGS really what a MEL producer would see?

At 36 g/L titer and 20,000 L scale, yes — that matches published benchmarks for pre-optimized MEL production ($80–120/kg). Kao Corporation's commercial MEL (Ceramela) retails at $40–100/kg, which implies COGS in the $15–50/kg range for their strain. Augur correctly identifies titer (not DSP) as the economic bottleneck — strain optimization to 100+ g/L drops COGS toward $25/kg.

04. Did the platform really predict FB4 within 1.3% of its published value?

Yes, but read the caveat carefully. The leave-one-out used our paper-anchored synthetic CSVs generated from the same logistic kinetic family the PINN ensemble implicitly learns — the 5 training runs are in-family with FB4 by construction. Beck's own 9% MAPE was measured against real noisy experimental data. The 1.3% is a fair signal that the generalization mechanism works, not a fair apples-to-apples comparison. We'll re-run against IndPenSim or another raw experimental dataset for the true benchmark.

05. How does Augur handle MEL-specific DSP economics?

MEL requires solvent extraction (ethyl acetate or similar) to separate the lipid fraction. A naïve single-pass DSP model assumes 100% fresh solvent per batch, which produces a $88/kg DSP cost at 20,000 L. Augur's extraction operation models industrial solvent recycling (95% recovery default, configurable), which brings DSP down to $24/kg: a roughly 3.7× reduction and a realistic industrial figure.
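The two DSP figures can be reconciled with a simple two-term model, under the assumption (ours, not stated in the article) that recycling changes only the solvent make-up cost and leaves all other DSP costs untouched. Solving the two equations backs out the implied split.

```python
def dsp_cost_per_kg(non_solvent_cost, single_pass_solvent_cost, recovery):
    """DSP cost when only the unrecovered solvent fraction must be bought fresh."""
    return non_solvent_cost + (1.0 - recovery) * single_pass_solvent_cost

# Known points from the text: $88/kg at 0% recovery, $24/kg at 95% recovery.
#   c + s        = 88
#   c + 0.05 * s = 24   =>   s = (88 - 24) / 0.95
solvent_single_pass = (88.0 - 24.0) / 0.95        # ~$67.4/kg
non_solvent = 88.0 - solvent_single_pass          # ~$20.6/kg

reduction_factor = 88.0 / 24.0                    # ~3.7x
```

Under this assumption, fresh solvent would dominate the single-pass figure at about $67/kg, which is why the recovery default matters so much to the headline COGS.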

06. Can I run this for my own organism that isn't in Augur's built-in list?

Yes. The 'Custom / Novel Organism' flow lets you supply whatever kinetics you know (μ_max, Ks, Yxs, temperature range, pH range, expected duration, DSP template) and the platform learns the rest from your uploaded lab data. When Custom is used, a 'data-driven fit' risk factor appears in the result so reviewers know the prediction is learned from your runs rather than borrowed from a matched organism profile.