About this Abstract |
Meeting |
2025 TMS Annual Meeting & Exhibition
|
Symposium
|
AI/Data Informatics: Computational Model Development, Verification, Validation, and Uncertainty Quantification
|
Presentation Title |
The Unsaturation Effect: Balanced Data Aggregation for Materials Informatics via Acquisition Functions |
Author(s) |
Layla Purdy, Taylor D Sparks, Ramsey Issa, Federico Ottomano |
On-Site Speaker (Planned) |
Layla Purdy |
Abstract Scope |
Insufficient data is one of the greatest hurdles that materials informatics research must overcome to improve ML model performance. Various data aggregation techniques have been explored, but in practice, models trained on combined datasets often underperform their single-dataset counterparts. This phenomenon, known as the saturation effect, results from the imbalance and noise introduced when distinct datasets are joined. This study follows up on the data aggregation research of Ottomano et al., in which three aggregation techniques were tested: simple concatenation, element-focused concatenation, and the DiSCoVeR algorithm. This work will employ Bayesian optimization, whose acquisition function balances exploration and exploitation when selecting data points, to return a more balanced aggregation that can be used to train more accurate models in the future. Given the high-dimensional nature and limited size of materials datasets, we expect that including an acquisition function will greatly enhance the aggregation model's performance. |
Proceedings Inclusion? |
Planned: |
Keywords |
Machine Learning, |
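
As an illustration of the acquisition-driven selection described in the Abstract Scope, the following is a minimal sketch and not the authors' implementation. It assumes a zero-mean Gaussian process surrogate with an RBF kernel and an upper confidence bound (UCB) acquisition function; the abstract specifies neither. The function names (`rbf_kernel`, `gp_posterior`, `select_points`), the parameter `kappa`, and the synthetic data are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2, length_scale=1.0):
    # Posterior mean and standard deviation of a zero-mean GP (assumed surrogate).
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query, length_scale)
    solved = np.linalg.solve(k_tt, np.column_stack([y_train, k_tq]))
    mean = k_tq.T @ solved[:, 0]
    var = 1.0 - np.sum(k_tq * solved[:, 1:], axis=0)  # prior variance is 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 0.0))

def select_points(x_base, y_base, x_pool, y_pool, n_select, kappa=2.0):
    # Greedily move points from a candidate pool into the base dataset.
    # UCB score = mean + kappa * std: the mean term exploits regions with
    # high predicted targets, the std term explores poorly covered regions.
    x_cur, y_cur = x_base.copy(), y_base.copy()
    remaining = list(range(len(x_pool)))
    chosen = []
    for _ in range(min(n_select, len(remaining))):
        mean, std = gp_posterior(x_cur, y_cur, x_pool[remaining])
        scores = mean + kappa * std
        best = remaining.pop(int(np.argmax(scores)))
        chosen.append(best)
        x_cur = np.vstack([x_cur, x_pool[best]])
        y_cur = np.append(y_cur, y_pool[best])
    return chosen

# Synthetic stand-in data; real use would take featurized materials datasets.
rng = np.random.default_rng(0)
x_base = rng.normal(size=(20, 3))
x_pool = rng.normal(size=(50, 3))
y_base = np.sin(x_base.sum(axis=1))
y_pool = np.sin(x_pool.sum(axis=1))
chosen = select_points(x_base, y_base, x_pool, y_pool, n_select=5)
print("Pool indices selected for aggregation:", chosen)
```

In this sketch, a larger `kappa` favors candidates far from the data already collected, while a smaller value favors candidates with high predicted target values. Whether the mean term should reward high targets at all is itself an assumption that depends on the aggregation goal.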