Bayesian-Inference-Driven Model Parametrization and Model Selection for 2CLJQ Fluid Models

J Chem Inf Model. 2022 Feb 7. doi: 10.1021/acs.jcim.1c00829. Online ahead of print.


A high level of physical detail in a molecular model improves its ability to perform high accuracy simulations but can also significantly affect its complexity and computational cost. In some situations, it is worthwhile to add complexity to a model to capture properties of interest; in others, additional complexity is unnecessary and can make simulations computationally infeasible. In this work, we demonstrate the use of Bayesian inference for molecular model selection, using Monte Carlo sampling techniques accelerated with surrogate modeling to evaluate the Bayes factor evidence for different levels of complexity in the two-centered Lennard-Jones + quadrupole (2CLJQ) fluid model. Examining three nested levels of model complexity, we demonstrate that the use of variable quadrupole and bond length parameters in this model framework is justified only for some chemistries. Through this process, we also get detailed information about the distributions and correlation of parameter values, enabling improved parametrization and parameter analysis. We also show how the choice of parameter priors, which encode previous model knowledge, can have substantial effects on the selection of models, penalizing careless introduction of additional complexity. We detail the computational techniques used in this analysis, providing a roadmap for future applications of molecular model selection via Bayesian inference and surrogate modeling.

PMID:35129974 | DOI:10.1021/acs.jcim.1c00829