Predicting scale-up for a novel strain: the Custom Organism flow
Most bioprocess simulation platforms only support a handful of built-in organisms. Augur's Custom Organism flow accepts whatever you already know about your strain (kinetics such as mu_max, Ks, and Yxs, plus temperature range and DSP template) and learns the rest from your uploaded runs. This post walks through the flow with a worked example.
The first question every bioprocess engineer asks when looking at Augur is "do you support my organism?" The list of built-in profiles has seven entries: CHO, E. coli, S. cerevisiae, P. pastoris, P. putida, S. bombicola, M. aphidis. If you're working on a novel strain, a biosurfactant producer outside the three built-in glycolipid hosts, or any engineered organism with atypical kinetics, the answer might appear to be no. It isn't. This post walks through the Custom Organism flow and shows why organism support is not a binary property of the platform.
Why organism-agnostic by construction
The platform's prediction engine does not contain hard-coded organism-specific code paths. The ODE solver, the PINN ensemble, the hydrodynamic scale-up model, and the DSP unit-operation chain all operate on generic kinetic parameters (mu_max, Ks, Yxs, Kp, Ea) and generic process conditions (temperature, pH, DO, feed rate). The "organism" in the platform is essentially a named dictionary of defaults: kinetic parameter priors, temperature and pH ranges, a DSP template, and economics defaults (labor hours per batch, facility sharing).
This means any organism you can describe with a handful of kinetic parameters can run through the pipeline. The built-in profiles exist because most customers want to skip the parameter-specification step for common hosts, not because the engine is organism-specific.
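The claim that an "organism" is just a named dictionary of defaults can be sketched in a few lines. The rate law below receives every organism-specific number as an argument, and the profile is plain data. This is a minimal illustration, not Augur's code: the function name, the profile keys, the simplified Arrhenius temperature factor, and the Ea value are assumptions. The other numbers are taken from the Y. lipolytica example in this post.

```python
import math

R_GAS = 8.314  # universal gas constant, J/(mol*K)


def specific_growth_rate(S, T_c, mu_max, Ks, Ea, T_opt):
    """Monod growth rate scaled by an Arrhenius factor that equals 1.0 at T_opt.

    Organism-agnostic: every organism-specific number arrives as an argument.
    S: substrate (g/L); T_c, T_opt: temperature (deg C); Ea: activation energy (J/mol).
    Simplification: there is no high-temperature deactivation term, so this is
    only meaningful at or below T_opt.
    """
    T = T_c + 273.15
    T_ref = T_opt + 273.15
    temp_factor = math.exp(-Ea / R_GAS * (1.0 / T - 1.0 / T_ref))
    return mu_max * temp_factor * S / (Ks + S)


# Hypothetical profile: a named dictionary of defaults, not platform code.
CUSTOM_PROFILE = {
    "name": "Y. lipolytica Po1g-FAA",
    "temperature_range_c": (26.0, 32.0),
    "ph_range": (5.0, 6.0),
    "dsp_template": "small_molecule",
    # Ea is a placeholder value; the other kinetics come from the worked example.
    "kinetics": {"mu_max": 0.35, "Ks": 1.5, "Ea": 50_000.0, "T_opt": 29.0},
}

# At S = Ks and T = T_opt the Monod term gives exactly half of mu_max.
mu = specific_growth_rate(S=1.5, T_c=29.0, **CUSTOM_PROFILE["kinetics"])
```

Swapping in a different organism means swapping the dictionary; the rate law itself does not change.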
The Custom Organism flow
When you select "Custom / Novel Organism" in the scenario builder, the form expands to accept overrides:
- Organism name (free text, shows up in the result report)
- Typical duration (h): how long a fermentation usually runs
- Temperature range (°C): min and max, centered on your T_opt
- pH range: min and max
- Approximate mu_max (1/h): the specific growth rate at optimal conditions
- DSP template: pick from mab_standard, ecoli_ib, yeast_secreted, small_molecule, biosurfactant
- Optional priors: Ks (substrate affinity), Yxs (biomass yield), Kp (product inhibition), q_s_crit (overflow metabolism threshold), Ea (activation energy), T_opt (optimal temperature)
Any field you leave blank becomes a free parameter in the ODE fit, learned from your uploaded data. The more fields you specify, the tighter the prior and the better the prediction on sparse data.
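To make "blank fields become free parameters" concrete, here is a minimal sketch under stated assumptions. It uses a batch Monod model integrated with explicit Euler steps, holds the supplied priors fixed, and searches the parameters left as None within bounds using a multi-start pattern search that minimizes squared error on biomass. The post doesn't say which optimizer Augur uses, and it describes priors as starting points rather than fixed values, so treat the helper names, the bounds, and the fixed-prior simplification as hypothetical.

```python
import random


def simulate_biomass(params, X0, S0, sample_times, dt=0.05):
    """Euler-integrate a Monod batch culture; return biomass X at each sample time."""
    X, S, step = X0, S0, 0
    out = []
    for t_sample in sample_times:  # sample_times must be ascending
        target = round(t_sample / dt)
        while step < target:
            mu = params["mu_max"] * S / (params["Ks"] + S)
            growth = mu * X * dt
            X += growth
            S = max(S - growth / params["Yxs"], 0.0)  # substrate cannot go negative
            step += 1
        out.append(X)
    return out


def fit_free_parameters(observed, sample_times, X0, S0, priors, bounds,
                        n_starts=5, seed=0):
    """Hold non-None priors fixed; fit the rest by multi-start pattern search."""
    fixed = {k: v for k, v in priors.items() if v is not None}
    free = [k for k in bounds if k not in fixed]
    if not free:
        return dict(fixed)

    def sse(candidate):
        pred = simulate_biomass({**fixed, **candidate}, X0, S0, sample_times)
        return sum((p - o) ** 2 for p, o in zip(pred, observed))

    rng = random.Random(seed)
    best, best_loss = None, float("inf")
    for _ in range(n_starts):
        # Random starting point inside the bounds for every free parameter
        guess = {k: rng.uniform(*bounds[k]) for k in free}
        steps = {k: (bounds[k][1] - bounds[k][0]) / 4 for k in free}
        loss = sse(guess)
        while max(steps.values()) > 1e-5:
            improved = False
            for k in free:
                lo, hi = bounds[k]
                for sign in (1, -1):
                    trial = dict(guess)
                    trial[k] = min(max(guess[k] + sign * steps[k], lo), hi)
                    trial_loss = sse(trial)
                    if trial_loss < loss:
                        guess, loss, improved = trial, trial_loss, True
            if not improved:
                # No move helped at this step size: refine the search
                steps = {k: s / 2 for k, s in steps.items()}
        if loss < best_loss:
            best, best_loss = guess, loss
    return {**fixed, **best}


# Synthetic check: generate "runs" from known kinetics, then recover mu_max.
times = [2.0, 4.0, 6.0, 8.0, 10.0]
truth = {"mu_max": 0.35, "Ks": 1.5, "Yxs": 0.42}
observed = simulate_biomass(truth, X0=0.1, S0=20.0, sample_times=times)
fitted = fit_free_parameters(observed, times, X0=0.1, S0=20.0,
                             priors={"Ks": 1.5, "Yxs": 0.42, "mu_max": None},
                             bounds={"mu_max": (0.05, 1.0)})
```

Each prior you pin down removes a dimension from the search, which is the intuition behind "the more fields you specify, the tighter the prior."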
Worked example: Y. lipolytica producing a lipid
Say you're running a Yarrowia lipolytica strain engineered for odd-chain fatty acid production. It's not in the built-in list. You have 6 lab runs at 5 L, 30°C, pH 5.5, glucose feed. Product titer ranges 8-15 g/L across runs. Your goal is a 20,000 L production-scale estimate for COGS.
Custom organism setup:
- Name: "Y. lipolytica Po1g-FAA"
- Typical duration: 96 h
- Temperature range: 26-32°C
- pH range: 5.0-6.0
- Approximate mu_max: 0.35/h
- DSP template: small_molecule (centrifuge, extraction, evaporation)
- Optional Ks: 1.5 g/L glucose
- Optional Yxs: 0.42 g biomass per g glucose
- Optional T_opt: 29°C
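The setup above could be expressed as an `organism_overrides` dict (the name used in the FAQ below), together with a basic check that the required fields are present and consistent. The key names and validation rules are assumptions for illustration, not Augur's actual schema.

```python
DSP_TEMPLATES = {"mab_standard", "ecoli_ib", "yeast_secreted", "small_molecule", "biosurfactant"}
REQUIRED = ("name", "duration_h", "temperature_range_c", "ph_range", "mu_max", "dsp_template")
OPTIONAL_PRIORS = ("Ks", "Yxs", "Kp", "q_s_crit", "Ea", "T_opt")

# Hypothetical key names; values are the worked-example setup.
organism_overrides = {
    "name": "Y. lipolytica Po1g-FAA",
    "duration_h": 96,
    "temperature_range_c": (26.0, 32.0),
    "ph_range": (5.0, 6.0),
    "mu_max": 0.35,                   # 1/h
    "dsp_template": "small_molecule",
    "Ks": 1.5,                        # g/L glucose
    "Yxs": 0.42,                      # g biomass per g glucose
    "T_opt": 29.0,                    # deg C
    # Kp, q_s_crit and Ea are omitted, so they would be learned from the runs.
}


def validate_overrides(o):
    """Return a list of problems; an empty list means the form is usable."""
    missing = [k for k in REQUIRED if o.get(k) is None]
    if missing:
        return [f"missing required field: {k}" for k in missing]
    problems = []
    t_lo, t_hi = o["temperature_range_c"]
    ph_lo, ph_hi = o["ph_range"]
    if not t_lo < t_hi:
        problems.append("temperature range min must be below max")
    if not ph_lo < ph_hi:
        problems.append("pH range min must be below max")
    if o["mu_max"] <= 0:
        problems.append("mu_max must be positive")
    if o["dsp_template"] not in DSP_TEMPLATES:
        problems.append(f"unknown DSP template: {o['dsp_template']}")
    t_opt = o.get("T_opt")
    if t_opt is not None and not t_lo <= t_opt <= t_hi:
        problems.append("T_opt lies outside the temperature range")
    return problems


def free_parameters(o):
    """Optional priors left blank become free parameters in the fit."""
    return [p for p in OPTIONAL_PRIORS if o.get(p) is None]
```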
Upload the 6-run CSV. The ODE solver uses the priors as starting points for multi-start fitting and converges in 5-10 seconds per run. The PINN ensemble then trains against the ODE baseline, which takes 20-40 seconds. Hydrodynamic scale-up recomputes kLa (expect a drop from ~350 h^-1 at 5 L to ~80 h^-1 at 20,000 L), mixing time, and tip speed. DSP runs the small-molecule chain: centrifuge (90% yield), solvent extraction (82% yield, 95% solvent recycling), and evaporation (95% yield). Economics rolls up to COGS/kg with a Monte Carlo interval.
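The overall DSP yield of this chain follows from multiplying the step yields in sequence. A minimal sketch, assuming each step is an independent fractional recovery and treating the 95% solvent recycling as a cost input rather than a product yield; the tuple layout and helper name are hypothetical:

```python
from math import prod

# Step yields from the small-molecule chain described above.
SMALL_MOLECULE_CHAIN = [
    ("centrifuge", 0.90),
    ("solvent_extraction", 0.82),  # solvent recycling is treated as a cost input, not a yield
    ("evaporation", 0.95),
]


def cumulative_yields(chain):
    """Running product of step yields after each unit operation."""
    out, running = [], 1.0
    for name, step_yield in chain:
        running *= step_yield
        out.append((name, running))
    return out


overall = prod(y for _, y in SMALL_MOLECULE_CHAIN)  # 0.90 * 0.82 * 0.95
```

The result is about 0.70, which matches the ~70% DSP yield reported for this example.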
Result: you get a production-scale titer prediction (likely 8-14 g/L at 20,000 L, reflecting the kLa drop), a DSP yield of ~70% (0.90 × 0.82 × 0.95), and a COGS/kg estimate with a 30-45% confidence interval. A "Custom organism" risk factor appears on the result so downstream reviewers (investors, partners, CMOs) understand that the prediction reflects learned kinetics rather than a matched organism profile. The whole workflow, from upload to output, takes 60-90 seconds.
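The Monte Carlo interval on COGS/kg can be illustrated with a deliberately simple model: sample production titer and overall DSP yield, divide a fixed batch cost by the kilograms recovered, and report percentiles. Everything here is an assumption for illustration: the uniform distributions, the independence of the inputs, the 5th/95th percentile choice, and especially the batch cost, which is a placeholder rather than a number from the post. Only the titer range, the ~70% DSP yield, and the 20,000 L volume come from the example above.

```python
import random
import statistics


def cogs_interval(batch_cost_usd, volume_l, titer_range_g_per_l, dsp_yield_range,
                  n=10_000, seed=0):
    """Monte Carlo COGS per kg; returns (p5, median, p95) in USD/kg.

    Assumes independent uniform distributions for titer and DSP yield and a
    fixed cost per batch. A real economics model has many more inputs.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        titer = rng.uniform(*titer_range_g_per_l)   # g/L at production scale
        dsp_yield = rng.uniform(*dsp_yield_range)   # overall DSP recovery fraction
        product_kg = titer * volume_l * dsp_yield / 1000.0
        samples.append(batch_cost_usd / product_kg)
    cuts = statistics.quantiles(samples, n=20)      # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], statistics.median(samples), cuts[-1]


# Hypothetical inputs: batch_cost_usd is a placeholder, not a figure from the post.
p5, p50, p95 = cogs_interval(batch_cost_usd=250_000.0, volume_l=20_000.0,
                             titer_range_g_per_l=(8.0, 14.0),
                             dsp_yield_range=(0.65, 0.75))
```

Because COGS/kg divides by titer, the low end of the titer range drives the upper tail of the interval.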
Where Custom Organism is and isn't appropriate
Good fits:
- Engineered versions of common hosts with modified kinetics (e.g., E. coli carrying a plasmid that slows growth)
- Novel biosurfactant producers (e.g., Y. lipolytica, W. ichthyophaga)
- Wild-type organisms not in the built-in list but belonging to well-characterized kinetic families (yeasts, Gram-negative bacteria, filamentous fungi)
- Industrial strains where the customer has proprietary kinetic data they prefer not to share publicly
Poor fits:
- Cell-free or enzymatic processes (no growth, no Monod). The platform runs but the physics prior is much weaker, so intervals widen substantially.
- Photoautotrophic processes (algae, cyanobacteria). Light-limited kinetics require a different physics term that isn't currently encoded.
- Mixed-culture fermentations (anaerobic digestion, consortia). The single-species Monod assumption breaks down.
- Processes with strong multi-substrate competition where diauxic behavior dominates the timeseries.
Accuracy tradeoffs
A built-in organism with 8-10 representative lab runs typically reaches 5-15% MAPE on titer at production scale. A custom organism with a strong prior (mu_max, Ks, temperature range, DSP template, optional Ea/T_opt) typically lands at 8-20% MAPE, and one with only mu_max and duration at 12-35%. The gap narrows as you add more priors or more runs, and the platform surfaces it as a risk factor on the result so the reviewer sees the accuracy tradeoff explicitly.
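MAPE here presumably means mean absolute percentage error between predicted and observed production-scale titer. The post doesn't spell out the formula, so this is the standard definition, not necessarily Augur's exact computation:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Two runs, each predicted 10% off (one high, one low): MAPE is 10%.
example = mape([10.0, 20.0], [11.0, 18.0])
```

Over- and under-predictions count equally, so a 10% MAPE can hide a systematic bias in either direction.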
The best workflow for a novel strain is iterative. Start with whatever priors you're confident in, run a prediction, and check the confidence interval. If the interval is too wide, add more priors (if you have them) or more lab runs (if you can). The intervals tighten as data accumulates.
What's on the roadmap
Currently you pass custom organism overrides per-scenario. Next sprint adds a saved-profile database so any team can store a novel strain's full profile (kinetics, DSP template, economics defaults) and reuse it across scenarios without re-entering parameters each time. Pilot customers with a specific strain they'll use repeatedly can request a built-in profile be added directly, so the workflow is identical to a native organism.
If you're working on a novel strain and want to see a first-pass scale-up prediction, request access. We're onboarding pilot users this quarter and have capacity to help configure custom organism flows for strains beyond the built-in list.
Frequently asked questions
01. What counts as a 'custom' organism?
Anything not in the 7-organism built-in list (CHO, E. coli, S. cerevisiae, P. pastoris, P. putida, S. bombicola, M. aphidis). That includes engineered versions of those organisms with materially different kinetics, novel biosurfactant producers like Y. lipolytica, recombinant expression hosts we don't have profiles for, and any strain where published kinetics are not representative of your actual process.
02. Do I have to specify all the kinetic parameters? What if I don't know them?
No. The flow has required fields (organism name, typical duration, temperature range, pH range, DSP template, approximate mu_max) and optional fields (Ks, Yxs, Kp, Ea, q_s_crit for overflow metabolism, T_opt). Any parameter you don't provide gets learned from the uploaded runs. The more you specify, the tighter the prior on the ODE fit, which usually translates to better prediction accuracy on sparse data.
03. How does the platform know what DSP to use for my organism?
You pick a DSP template from the dropdown. The five built-in templates (mab_standard, ecoli_ib, yeast_secreted, small_molecule, biosurfactant) cover the common unit-operation chains. Any of these can be customized per-scenario: swap unit operations, override yields, change solvent ratios. If your process is genuinely novel (say, a cell-free system or a photoautotroph), you can build a DSP chain from scratch using the 8 registered unit operations.
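Per-scenario customization of a DSP template amounts to copying the template, then overriding yields or swapping steps. A minimal sketch, assuming a template is a list of steps with fractional yields. The data layout, the helper names, and the `crystallization` step in the usage example are hypothetical; only the small_molecule yields come from the worked example in this post.

```python
from math import prod

# Hypothetical representation of a built-in template; yields mirror the
# small_molecule chain from the worked example.
BASE_TEMPLATES = {
    "small_molecule": [
        {"op": "centrifuge", "yield": 0.90},
        {"op": "solvent_extraction", "yield": 0.82},
        {"op": "evaporation", "yield": 0.95},
    ],
}


def customize_chain(template_name, yield_overrides=None, swaps=None):
    """Copy a template, swap whole steps by name, then override step yields."""
    yield_overrides = yield_overrides or {}
    swaps = swaps or {}
    chain = []
    for step in BASE_TEMPLATES[template_name]:
        step = dict(swaps.get(step["op"], step))  # copy so the template is never mutated
        if step["op"] in yield_overrides:
            step["yield"] = yield_overrides[step["op"]]
        chain.append(step)
    return chain


def overall_yield(chain):
    return prod(step["yield"] for step in chain)


# Override one yield for this scenario only.
better_extraction = customize_chain("small_molecule", yield_overrides={"solvent_extraction": 0.88})
# Swap the final step for a hypothetical unit operation with a made-up yield.
swapped = customize_chain("small_molecule",
                          swaps={"evaporation": {"op": "crystallization", "yield": 0.85}})
```

Copying before editing keeps the built-in template unchanged, so other scenarios that select it still see the defaults.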
04. What's the accuracy hit for a custom organism vs a built-in one?
Typical built-in organism on 8-10 lab runs: 5-15% MAPE on titer at production scale. Custom organism with strong priors (mu_max, Ks, temperature range, DSP template): 8-20% MAPE. Custom organism with minimal priors (just approximate mu_max and duration): 12-35% MAPE. The gap narrows fast as you add more priors or more runs. A 'data-driven fit' risk factor appears on the result so reviewers understand the intervals reflect the reduced confidence.
05. Can I save my custom organism as a profile to reuse?
Not yet, but it's on the roadmap. Current sprint: pass organism_overrides as a dict at prediction time. Next sprint: full database-backed profiles you can save and re-select. If you're a pilot customer with a specific strain you'll use repeatedly, we can manually add your profile to the built-in list so the workflow is identical to a native organism.
06. Can I use this for non-fermentation bioprocesses (cell-free, enzymatic, photobioreactor)?
Cell-free and enzymatic processes don't fit the default Monod ODE structure well, so the accuracy benefit from the physics-informed prior is smaller. You can still use the platform but expect wider intervals and rely more on lab anchoring. Photobioreactors (algae, cyanobacteria) are on the roadmap but not currently supported. Light-limited kinetics require a different physics term than the current loss function encodes.