BayBE — A Bayesian Back End for Design of Experiments¶
The Bayesian Back End (BayBE) helps to find good parameter configurations within complex parameter search spaces.
Example use cases:
🧪 Find chemical reaction conditions or process parameters
🥣 Create materials, chemical mixtures or formulations with desired properties
✈️ Optimize the 3D shape of a physical object
🖥️ Optimize a virtual simulation
⚙️ Select model hyperparameters
🫖 Find tasty espresso machine settings
This is achieved via Bayesian Design of Experiments, which helps to efficiently navigate parameter search spaces. It balances exploitation of parameter space regions known to lead to good outcomes and exploration of unknown regions.
BayBE provides a general-purpose toolbox for Bayesian Design of Experiments, focusing on making this procedure easily accessible for real-world experiments.
🔋 Batteries Included¶
BayBE offers a range of ✨built‑in features✨, including:
🛠️ Flexible modeling options
- Choose between different target types, including numerical targets (e.g., experimental outcome values) and binary targets (e.g., good/bad classification of experimental results).
- Specify how favourable individual target values are (e.g., for matching to a specific value or saturation behaviour) via target transformations.
- Optimize multiple targets at once (e.g., via Pareto optimization or desirability scalarization).
- Use both continuous and discrete parameters within a single hybrid search space.
- Restrict the search space to only a relevant subspace (e.g., to define a maximal number of mixture components) using constraints.
- Choose between different optimization strategies to balance exploration and exploitation of the search space:
- Gain an understanding of the whole search space via active learning.
- Maximize total gain across a sequence of actions via bandit models.
📚 Mechanisms for leveraging additional information
- Capture relationships between categories by encoding categorical data.
- Use built-in chemical encodings for chemistry-related use cases.
- Incorporate mechanistic process understanding via custom surrogate models.
- Leverage additional data from similar campaigns to accelerate optimization via transfer learning.
🔗 Advanced optimization workflows
- Run campaigns asynchronously with partial measurements and pending experiments.
- Connect BayBE with database storage and API wrappers using the serialization functionality.
🔍 Performance evaluation tools
- Gain insights about the optimization campaigns by analyzing model behavior and feature importance.
- Conduct benchmarks to select between different Bayesian optimization settings via backtesting.
⚡ Quick Start¶
To perform Bayesian Design of Experiments with BayBE, you should first specify the parameter search space and objective to be optimized. Based on this information and any available data about outcomes of specific parameter configurations, BayBE will recommend the next set of parameter configurations to be measured. To inform the next recommendation cycle, the newly generated measurements can be added to BayBE.
From the user perspective, the most important part is the "setup" step.
Below we show a simple optimization procedure, starting with the setup step and subsequently performing the recommendation loop. The provided example aims to maximize the yield of a chemical reaction by adjusting its parameter configurations (also known as reaction conditions).
First, install BayBE into your Python environment:
pip install baybe
For more information on this step, see our detailed installation instructions.
Defining the Optimization Objective¶
In BayBE’s language, the reaction yield can be represented as a NumericalTarget,
which we wrap into a SingleTargetObjective:
from baybe.targets import NumericalTarget
from baybe.objectives import SingleTargetObjective
target = NumericalTarget(name="Yield")
objective = SingleTargetObjective(target=target)
In cases where we are confronted with multiple (potentially conflicting) targets
(e.g., yield vs cost),
the ParetoObjective or DesirabilityObjective can be used to define how the targets should be balanced.
For more details, see the
objectives section
of the user guide.
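For illustration, a multi-target setup could look roughly like the sketch below. The second target, "Selectivity", is purely hypothetical and not part of the yield example that follows; the exact constructor arguments should be checked against the objectives user guide.
from baybe.targets import NumericalTarget
from baybe.objectives import ParetoObjective

# Hypothetical second target "Selectivity"; like "Yield", it is maximized by default
objective_pareto = ParetoObjective(
    targets=[
        NumericalTarget(name="Yield"),
        NumericalTarget(name="Selectivity"),
    ]
)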
Defining the Search Space¶
Next, we inform BayBE about the available “control knobs”, that is, the underlying
reaction parameters we can tune to optimize the yield.
In this case we tune granularity, pressure and solvent, each being encoded as a Parameter.
We also need to specify which values individual parameters can take.
from baybe.parameters import (
    CategoricalParameter,
    NumericalDiscreteParameter,
    SubstanceParameter,
)
parameters = [
    CategoricalParameter(
        name="Granularity",
        values=["coarse", "medium", "fine"],
        encoding="OHE",  # one-hot encoding of categories
    ),
    NumericalDiscreteParameter(
        name="Pressure[bar]",
        values=[1, 5, 10],
        tolerance=0.2,  # allows experimental inaccuracies up to 0.2 when reading values
    ),
    SubstanceParameter(
        name="Solvent",
        data={
            "Solvent A": "COC",
            "Solvent B": "CCC",  # label-SMILES pairs
            "Solvent C": "O",
            "Solvent D": "CS(=O)C",
        },
        encoding="MORDRED",  # chemical encoding via scikit-fingerprints
    ),
]
For more parameter types and their details, see the parameters section of the user guide.
Additionally, we can define a set of constraints to further specify allowed ranges and relationships between our parameters. Details can be found in the constraints section of the user guide. In this example, we assume no further constraints.
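Purely for illustration, a discrete constraint that forbids combining a particular solvent with high pressures could look roughly like the sketch below; the threshold and the excluded solvent are arbitrary choices, and the exact class and field names should be checked against the constraints user guide.
from baybe.constraints import (
    DiscreteExcludeConstraint,
    SubSelectionCondition,
    ThresholdCondition,
)

# Hypothetical constraint: exclude Solvent C whenever the pressure exceeds 5 bar
constraint = DiscreteExcludeConstraint(
    parameters=["Pressure[bar]", "Solvent"],
    combiner="AND",
    conditions=[
        ThresholdCondition(threshold=5, operator=">"),
        SubSelectionCondition(selection=["Solvent C"]),
    ],
)
Such a constraint would then be passed alongside the parameters when constructing the search space in the next step.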
With the parameter definitions at hand, we can now create our
SearchSpace based on the Cartesian product of all possible parameter values:
from baybe.searchspace import SearchSpace
searchspace = SearchSpace.from_product(parameters)
See the search spaces section of our user guide for more information on the structure of search spaces and alternative ways of construction.
Optional: Defining the Optimization Strategy¶
As an optional step, we can specify details on how the optimization of the experimental configurations should be performed. If omitted, BayBE will choose a default Bayesian optimization setting.
For our example, we combine two recommenders via a so-called meta recommender named
TwoPhaseMetaRecommender:
- In cases where no measurements have been made prior to the interaction with BayBE, the parameters will be recommended with the initial_recommender.
- As soon as the first measurements are available, we switch to the recommender.
from baybe.recommenders import (
    BotorchRecommender,
    FPSRecommender,
    TwoPhaseMetaRecommender,
)
recommender = TwoPhaseMetaRecommender(
    initial_recommender=FPSRecommender(),  # farthest point sampling
    recommender=BotorchRecommender(),  # Bayesian model-based optimization
)
For more details on the different recommenders, their underlying algorithmic details and how their settings can be adjusted, see the recommenders section of the user guide.
The Optimization Loop¶
We can now construct a Campaign that performs the Bayesian optimization of the experimental configurations:
from baybe import Campaign
campaign = Campaign(searchspace, objective, recommender)
With this object at hand, we can start our optimization cycle. In particular:
- The campaign can recommend new experiments.
- We can add_measurements of target values for the measured parameter configurations to the campaign's database.
Note that these two steps can be performed in any order. In particular, available measurements can be submitted at any time and also several times before querying the next recommendations.
df = campaign.recommend(batch_size=3) # Recommend three experimental configurations to test
print(df)
The table below shows the three parameter configurations for which BayBE recommends measuring the reaction yield.
   Granularity  Pressure[bar]    Solvent
15      medium            1.0  Solvent D
10      coarse           10.0  Solvent C
29        fine            5.0  Solvent B
After having conducted the recommended experiments, we can add the newly measured target information to the campaign:
df["Yield"] = [79.8, 54.1, 59.4] # Measured yields for the three recommended parameter configurations
campaign.add_measurements(df)
With the newly provided data, BayBE can produce a refined recommendation for the next iteration. This loop typically continues until a desired target value is achieved in the experiment.
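Put together, the full closed loop could look roughly like the sketch below; run_experiments is a hypothetical placeholder for whatever procedure actually measures the recommended configurations.
# Sketch of the closed loop; `run_experiments` is a hypothetical stand-in
# for the real measurement step (e.g., running the reactions in the lab).
for _ in range(10):  # or stop once the measured yield is good enough
    df = campaign.recommend(batch_size=3)
    df["Yield"] = run_experiments(df)  # hypothetical: returns one yield per row
    campaign.add_measurements(df)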
Inspect the Progress of the Experimental Configuration Optimization¶
The plot below shows the progression of a campaign that optimized a direct arylation reaction by tuning the solvent, base, and ligand (from Shields, B.J. et al.). Each line shows the best target value cumulatively achieved after a given number of experimental iterations.
Different lines show the outcomes of Campaigns with different designs.
In particular, the five Campaigns differ in how the molecules are encoded within each chemical Parameter.
We can see that the optimization is more efficient when using chemical encodings (e.g., MORDRED) rather than encoding the categories with one-hot encoding or random features.
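Comparisons of this kind can be produced with BayBE's simulation module (available via the simulation dependency group). The following is only a rough sketch: lookup is assumed to be a dataframe holding measured yields for all parameter configurations, campaign_mordred and campaign_ohe are hypothetical campaigns that differ only in their parameter encodings, and the exact call signature should be checked against the simulation section of the user guide.
from baybe.simulation import simulate_scenarios

# Backtest two hypothetical campaigns against a lookup table of known results
results = simulate_scenarios(
    {"MORDRED": campaign_mordred, "OHE": campaign_ohe},  # scenarios to compare
    lookup,  # hypothetical dataframe with measured yields for all configurations
    batch_size=3,
    n_doe_iterations=10,
    n_mc_iterations=20,
)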
💻 Installation¶
From Package Index¶
The easiest way to install BayBE is via PyPI:
pip install baybe
A specific released version of the package can be installed by specifying the corresponding version tag in the form baybe==x.y.z.
From GitHub¶
If you need finer control and would like to install a specific commit that has not been released under a certain version tag, you can do so by installing BayBE directly from GitHub via git and specifying the corresponding git ref.
For instance, to install the latest commit of the main branch, run:
pip install git+https://github.com/emdgroup/baybe.git@main
From Local Clone¶
Alternatively, you can install the package from your own local copy. First, clone the repository, navigate to the repository root folder, check out the desired commit and run:
pip install .
A developer would typically also install the package in editable mode (-e), which ensures that changes to the code do not require a reinstallation.
pip install -e .
If you need to add additional dependencies, make sure to use the correct syntax, including the quotation marks:
pip install -e '.[dev]'
Optional Dependencies¶
There are several dependency groups that can be selected during pip installation, like
pip install 'baybe[test,lint]' # will install baybe with additional dependency groups `test` and `lint`
To get the most out of BayBE, we recommend installing at least:
pip install 'baybe[chem,simulation]'
The available groups are:
- extras: Installs all dependencies required for optional features.
- benchmarking: Required for running the benchmarking module.
- chem: Cheminformatics utilities (e.g., for the SubstanceParameter).
- docs: Required for creating the documentation.
- examples: Required for running the examples/streamlit.
- lint: Required for linting and formatting.
- mypy: Required for static type checking.
- onnx: Required for using custom surrogate models in ONNX format.
- polars: Required for optimized search space construction via Polars.
- insights: Required for built-in model and campaign analysis (e.g., using SHAP).
- simulation: Enabling the simulation module.
- test: Required for running the tests.
- dev: All of the above plus dev tools. For code contributors.
📡 Telemetry¶
Telemetry was fully and permanently removed in version 0.14.0.
📖 Citation¶
If you find BayBE useful, please consider citing our paper:
@article{baybe_2025,
author = "Fitzner, Martin and {\v S}o{\v s}i{\'c}, Adrian and Hopp, Alexander V. and M{\"u}ller, Marcel and Rihana, Rim and Hrovatin, Karin and Liebig, Fabian and Winkel, Mathias and Halter, Wolfgang and Brandenburg, Jan Gerit",
title = "{BayBE}: a {B}ayesian {B}ack {E}nd for experimental planning in the low-to-no-data regime",
journal = "Digital Discovery",
year = "2025",
volume = "4",
issue = "8",
pages = "1991-2000",
publisher = "RSC",
doi = "10.1039/D5DD00050E",
url = "http://dx.doi.org/10.1039/D5DD00050E",
}
👨🏻‍🔧 Maintainers¶
🛠️ Known Issues¶
A list of known issues can be found here.
📄 License¶
Copyright 2022-2025 Merck KGaA, Darmstadt, Germany and/or its affiliates. All rights reserved.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.