About this Abstract |
Meeting |
MS&T21: Materials Science & Technology |
Symposium |
AI for Big Data Problems in Advanced Imaging, Materials Modeling and Automated Synthesis |
Presentation Title |
Machine Learning Polymer Property Prediction Models with Polymers Represented as Natural Language |
Author(s) |
Christopher Benjamin Kuenneth, Rampi Ramprasad |
On-Site Speaker (Planned) |
Christopher Benjamin Kuenneth |
Abstract Scope |
Polymer informatics tools have recently gained ground as a means of designing and discovering polymers that meet specific application needs. A critical component of such tools is the conversion of polymers to machine-readable representations (so-called fingerprints). Fingerprinting has so far relied on handcrafted approaches that capture key chemical and structural features. Recently, within the domain of natural language processing, transformer-based ML models have demonstrated a new, fully ML-based path to obtaining fingerprints of language. Here, we treat SMILES strings as a language representation of polymers and train a transformer-based ML model on more than 100 million of them. The performance of the resulting fingerprints is compared with that of traditional fingerprints on a large polymer property data set. Our new approach achieves prediction performance similar to that of existing state-of-the-art methods, but is faster, more flexible, and allows us to create fully autonomous ML pipelines. |
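To make the idea of treating SMILES strings as language concrete, the sketch below splits a polymer SMILES string into tokens and maps them to integer IDs, which is the kind of input a transformer model typically consumes. It is only an illustration under stated assumptions: the abstract does not describe the tokenizer actually used, the regular expression covers just a simplified subset of SMILES, the `[*]` markers for repeat-unit end points are an assumed convention, and the names `tokenize`, `build_vocab`, and `encode` are hypothetical.

```python
import re

# Simplified SMILES token pattern (an assumption, not the authors' tokenizer).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms, e.g. [*] or [NH3+]
    r"|Br|Cl"                # two-letter atoms must be tried before C and B
    r"|%\d{2}"               # two-digit ring-closure labels
    r"|[BCNOPSFIbcnops]"     # one-letter aliphatic and aromatic atoms
    r"|[=#\-+\\/():.@*~$]"   # bonds, branches, and other symbols
    r"|\d"                   # single-digit ring-closure labels
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize SMILES string: {smiles!r}")
    return tokens


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign an integer ID to every distinct token found in the corpus."""
    unique_tokens = sorted({tok for s in corpus for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate(unique_tokens)}


def encode(smiles: str, vocab: dict[str, int]) -> list[int]:
    """Convert a SMILES string into a sequence of token IDs."""
    return [vocab[tok] for tok in tokenize(smiles)]


if __name__ == "__main__":
    # Assumed repeat units: polyethylene, polystyrene, poly(vinyl chloride).
    corpus = ["[*]CC[*]", "[*]CC([*])c1ccccc1", "[*]CC([*])Cl"]
    vocab = build_vocab(corpus)
    for smiles in corpus:
        print(smiles, tokenize(smiles), encode(smiles, vocab))
```

In a full pipeline, ID sequences like these would be fed to a transformer whose learned output vector serves as the fingerprint; that training step is beyond the scope of this sketch.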