About this Abstract |
Meeting |
2025 TMS Annual Meeting & Exhibition |
Symposium |
AI/Data Informatics: Computational Model Development, Verification, Validation, and Uncertainty Quantification |
Presentation Title |
MatFold: Systematic Evaluation of Generalization Errors in Materials Discovery Models |
Author(s) |
Peter Schindler, Matthew Witman |
On-Site Speaker (Planned) |
Peter Schindler |
Abstract Scope |
Machine learning models in materials science validated by a single train/validation/test split can yield biased and overly optimistic performance estimates for downstream modeling or materials screening tasks. This can be particularly counterproductive for applications where the time and cost of failed validation efforts are consequential. We propose a set of standardized and increasingly difficult splitting protocols for chemically and structurally motivated, nested K-fold cross-validation that can be followed to validate any machine learning model for materials discovery. This enables systematic insights into model generalizability, improvability, and uncertainty. A general-purpose toolkit, MatFold, is provided to automate the construction of these chemically motivated train/test splits and facilitate further community use. We employ MatFold to analyze generalization errors on two datasets using distinct model architectures. The observed trends in generalization errors and their variances for various MatFold splitting protocols reveal unique scaling behavior for each model architecture. |
Proceedings Inclusion? |
Planned: |
Keywords |
Machine Learning, Surface Modification and Coatings, Energy Conversion and Storage |
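
The idea of a chemically motivated, nested K-fold split described in the abstract can be illustrated with a minimal sketch. This is not MatFold's API: the helper names (chemical_system, grouped_kfold, nested_grouped_kfold), the choice of chemical system as the grouping criterion, and the example formulas are all assumptions made for illustration. The sketch only shows the general principle that every entry sharing a group label is held out together, so the test fold contains chemistry the model has not seen during training.

```python
import random
import re
from collections import defaultdict


def chemical_system(formula):
    """Hypothetical helper: 'Fe2O3' -> 'Fe-O' (sorted unique element symbols)."""
    return "-".join(sorted(set(re.findall(r"[A-Z][a-z]?", formula))))


def grouped_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which no group label is shared
    between the training and test indices."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    keys = sorted(groups)
    if k > len(keys):
        raise ValueError("k cannot exceed the number of distinct groups")
    random.Random(seed).shuffle(keys)
    for fold in range(k):
        test_keys = set(keys[fold::k])  # every k-th group goes to this fold
        test = sorted(i for key in test_keys for i in groups[key])
        train = sorted(
            i for key in keys if key not in test_keys for i in groups[key]
        )
        yield train, test


def nested_grouped_kfold(labels, k_outer, k_inner, seed=0):
    """Yield (train_idx, test_idx, inner_splits). The inner splits partition
    the outer training set only, again keeping groups intact."""
    for train, test in grouped_kfold(labels, k_outer, seed):
        inner_labels = [labels[i] for i in train]
        inner = [
            ([train[j] for j in tr], [train[j] for j in va])
            for tr, va in grouped_kfold(inner_labels, k_inner, seed + 1)
        ]
        yield train, test, inner


# Illustrative formulas only; not taken from the datasets in the abstract.
formulas = ["Fe2O3", "FeO", "Fe3O4", "NaCl", "TiO2",
            "Ti2O3", "LiCoO2", "LiFePO4", "MgO", "Al2O3"]
labels = [chemical_system(f) for f in formulas]

for train, test, inner in nested_grouped_kfold(labels, k_outer=3, k_inner=2):
    held_out = sorted({labels[i] for i in test})
    print(f"held-out systems: {held_out}, inner folds: {len(inner)}")
```

Swapping chemical_system for a coarser or finer labeling function (for example, a single element or a structural descriptor) would change how hard the held-out fold is, which is the sense in which a family of splitting protocols can be made increasingly difficult. Comparing the test error across such labelings, and its variance across outer folds, is one way to read off the trends the abstract refers to.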