About this Abstract |
Meeting |
TMS Specialty Congress 2025 |
Symposium |
3rd World Congress on Artificial Intelligence in Materials & Manufacturing (AIM 2025) |
Presentation Title |
LLM-Assisted Data Curation in Starrydata: An Open Database of Material Properties Extracted From Published Plots |
Author(s) |
Yukari Katsura, Tomoya Mato, Yu Takada, Dewi Yana, Eiji Koyama, Erina Fujita, Yoshihiro Sakamoto, Naoto Saito, Fumikazu Hosono, Atsumi Tanaka, Masaya Kumagai |
On-Site Speaker (Planned) |
Yukari Katsura |
Abstract Scope |
We developed the Starrydata web system (https://www.starrydata2.org) as an open database of experimental materials data collected by tracing plot images from the scientific literature. The platform lets users, including our own data curators, share experimental data extracted from published papers. The data hosted on Starrydata are publicly available and can be freely downloaded and used for both commercial and non-commercial purposes, provided our paper is cited. Starrydata includes datasets for functional materials such as thermoelectric, magnetic, and battery materials. The thermoelectric materials project comprises data for approximately 50,000 physical samples reported in around 10,000 papers, including more than 130,000 curves describing the temperature dependence of thermoelectric properties. We will present our efforts to accelerate data curation in Starrydata by developing an automated retrieval system built on commercial Large Language Models (LLMs). LLMs have proven effective at assisting data curators with tasks such as figure labeling and the extraction of experimental processes from text. |
Proceedings Inclusion? |
Undecided |