About this Abstract |
Meeting |
2024 TMS Annual Meeting & Exhibition |
Symposium |
AI/Data Informatics: Computational Model Development, Verification, Validation, and Uncertainty Quantification |
Presentation Title |
J-6: Annotating Materials Science Text: A Semi-Automated Approach for Crafting Outputs with Gemini Pro |
Author(s) |
Hasan Muhammad Sayeed, Trupti Mohanty, Taylor D. Sparks |
On-Site Speaker (Planned) |
Hasan Muhammad Sayeed |
Abstract Scope |
Recent advances in large language models (LLMs) have opened the way to automated information extraction in materials science. However, fine-tuning these models, which is crucial for effective machine learning pipelines in the field, is hindered by a lack of pre-annotated data, and manual annotation is too laborious to fill that gap easily. To address this, we introduce a tailored semi-automated annotation process that uses Google's Gemini Pro language model. Our approach focuses on two key tasks: extracting information in structured JSON format and generating abstractive summaries from materials science texts. The process is a collaboration between human annotators and the LLM, driven by structured prompts and user-guided examples; it enhances annotation quality and augments the LLM's capacity to handle the intricacies of materials science. Importantly, it streamlines human annotation, because annotators start from the LLM's output rather than from scratch. |
Proceedings Inclusion? |
Planned: |
Keywords |
Machine Learning, Computational Materials Science & Engineering, Extraction and Processing |
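
The structured-JSON extraction task described under Abstract Scope can be illustrated with a minimal sketch. It is not the authors' implementation: the instruction text, the field names `material` and `property`, and the helpers `build_prompt`, `parse_draft`, and `annotate` are all hypothetical, and the model is passed in as a plain callable so that no particular Gemini client API is assumed. The sketch shows one plausible reading of the loop: build a few-shot prompt from human-approved examples, parse the model's draft as JSON, let a human review it, and reuse the approved result as an example for later texts.

```python
import json
from typing import Callable, Optional

# Hypothetical instruction and field names, chosen only for illustration.
INSTRUCTION = (
    "Extract the material and the reported property from the passage. "
    "Answer with a single JSON object with keys 'material' and 'property'."
)


def build_prompt(examples: list[tuple[str, dict]], text: str) -> str:
    """Assemble a few-shot prompt from human-approved (text, annotation) pairs."""
    parts = [INSTRUCTION]
    for example_text, annotation in examples:
        parts.append(f"Text: {example_text}\nJSON: {json.dumps(annotation)}")
    parts.append(f"Text: {text}\nJSON:")
    return "\n\n".join(parts)


def parse_draft(raw: str) -> Optional[dict]:
    """Return the JSON object found in the model reply, or None if there is none."""
    # Replies may wrap the JSON in prose or a code fence, so keep only the
    # span from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        parsed = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def annotate(
    text: str,
    examples: list[tuple[str, dict]],
    model: Callable[[str], str],
    review: Callable[[str, Optional[dict]], dict],
) -> dict:
    """Get a model draft, pass it to a human reviewer, and store the result."""
    draft = parse_draft(model(build_prompt(examples, text)))
    final = review(text, draft)  # the reviewer corrects the draft or writes one
    examples.append((text, final))  # approved output becomes a future example
    return final


# Stand-ins for the LLM call and the human reviewer (assumptions for the demo).
def fake_model(prompt: str) -> str:
    return 'Here is the result:\n{"material": "TiO2", "property": "band gap"}'


def accept_draft(text: str, draft: Optional[dict]) -> dict:
    return draft if draft is not None else {}
```

In practice `fake_model` would be replaced by a call to the chosen LLM client and `accept_draft` by an interface in which the annotator edits the draft. Keeping both as injected callables lets the prompt-building and parsing logic be tested without network access.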