About this Abstract | 
  
   
    | Meeting | 
    2024 TMS Annual Meeting & Exhibition
       | 
  
   
    | Symposium | 
    AI/Data Informatics: Computational Model Development, Verification, Validation, and Uncertainty Quantification
       | 
  
   
    | Presentation Title | 
    J-6: Annotating Materials Science Text: A Semi-Automated Approach for Crafting Outputs with Gemini Pro | 
  
   
    | Author(s) | 
    Hasan Muhammad Sayeed, Trupti Mohanty, Taylor D. Sparks | 
  
   
    | On-Site Speaker (Planned) | 
    Hasan Muhammad Sayeed | 
  
   
    | Abstract Scope | 
    
Recent advances in large language models (LLMs) have opened the way to automated information extraction in materials science. However, fine-tuning these models, a key step in building effective machine learning pipelines for the field, is hindered by a lack of pre-annotated data, and manual annotation is laborious. To address this, we introduce a tailored semi-automated annotation process built on Google's Gemini Pro language model. Our approach focuses on two tasks: extracting information in structured JSON format and generating abstractive summaries of materials science texts. Human annotators and the LLM work collaboratively, guided by structured prompts and user-provided examples; this improves annotation quality and strengthens the LLM's grasp of materials science specifics. Importantly, it reduces human annotation effort, because annotators start from the LLM's draft output rather than from scratch. | 
  
   
    | Proceedings Inclusion? | 
    Planned:  | 
  
 
    | Keywords | 
    Machine Learning, Computational Materials Science & Engineering, Extraction and Processing |
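
The semi-automated loop described in the Abstract Scope (structured prompt, user-guided examples, LLM draft, human correction) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `build_prompt`, `annotate`, the prompt wording, and the JSON field names are all hypothetical, and the LLM is represented by a plain callable rather than an actual Gemini Pro client.

```python
import json
from typing import Callable

# Hypothetical type aliases: an example is a (text, annotation) pair.
Example = tuple[str, dict]


def build_prompt(examples: list[Example], text: str) -> str:
    """Assemble a few-shot prompt from previously accepted annotations."""
    parts = ["Extract the materials information from the text as JSON."]
    for ex_text, ex_json in examples:
        parts.append(
            f"Text: {ex_text}\nJSON: {json.dumps(ex_json, sort_keys=True)}"
        )
    # The final block is left open for the model to complete.
    parts.append(f"Text: {text}\nJSON:")
    return "\n\n".join(parts)


def annotate(
    text: str,
    examples: list[Example],
    llm: Callable[[str], str],
    review: Callable[[str, dict], dict],
) -> dict:
    """Get an LLM draft, let a human correct it, and keep it as an example."""
    raw = llm(build_prompt(examples, text))
    try:
        draft = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: an unparseable draft means the annotator starts empty.
        draft = {}
    final = review(text, draft)
    # Accepted annotations become user-guided examples for later prompts.
    examples.append((text, final))
    return final


if __name__ == "__main__":
    # Stand-ins for the real model call and the human annotator.
    def fake_llm(prompt: str) -> str:
        return '{"material": "TiO2", "band_gap_eV": 3.2}'

    def fake_review(text: str, draft: dict) -> dict:
        corrected = dict(draft)
        corrected["phase"] = "anatase"  # annotator adds a missed field
        return corrected

    seed = [("ZnO has a band gap of 3.3 eV.",
             {"material": "ZnO", "band_gap_eV": 3.3})]
    print(annotate("Anatase TiO2 has a band gap of 3.2 eV.",
                   seed, fake_llm, fake_review))
```

Passing the model as a callable keeps the sketch independent of any particular client library; a real pipeline would substitute a function that sends the prompt to the chosen model and returns its text response.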