Research News

Reliable Material Databases Bridge AI- and Experimental-Led Material Discovery

Materials databases lie at the heart of future data-driven discovery in energy-related fields, say researchers from Tohoku University.

In a new article published in the journal Precision Chemistry, they examined how different types of databases, both computational and experimental, work together to support modern artificial intelligence (AI) tools used in materials science.

The study found that materials databases are no longer just places to store information. Instead, they play a central role in determining how well AI models perform. The way data is collected, organized, and shared (known as database architecture) can directly affect whether AI systems produce reliable and useful results.

"In a library, if books are poorly labeled, have missing pages, or are difficult to access, even the most skilled reader will struggle to find accurate information," stresses Hao Li, lead author of the paper and Distinguished Professor at Tohoku University's Advanced Institute for Materials Research (AIMR). "In the same way, AI models depend on well-structured and carefully curated data to make sound predictions."

The evolution of materials science paradigms. ©Li et al.

Li and his team categorized computational databases into two main groups: those that focus on bulk material properties and those that focus on surfaces and interfaces. They also reviewed experimental databases that cover areas such as crystal structures, catalysis, energy storage, and materials characterization.

Further analysis revealed the growing importance of integrated platforms. These systems connect computational predictions with detailed experimental data, allowing scientists to test ideas, refine models, and validate results in a continuous cycle. This approach supports more efficient and reliable materials discovery.

Moreover, the researchers introduced a roadmap for combining databases, AI models, and experimental workflows. This includes the use of graph neural networks, machine learning interatomic potentials, and large language model-based AI agents to accelerate the discovery process while maintaining scientific rigor.

However, the researchers identified several challenges that must be addressed. These include the need for standardized data practices aligned with FAIR principles (Findable, Accessible, Interoperable, Reusable), better tracking of data origins, and improved reporting of negative results, which are often missing but are important for reducing bias.
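To make these challenges concrete, the following is a minimal sketch of what a database entry with explicit provenance and a negative-result flag might look like. It is an illustration only, not the schema of any database discussed in the paper: the class names, field names, and the example record are all hypothetical placeholders.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional
import json


@dataclass
class Provenance:
    """Where a data point came from (hypothetical fields)."""
    source: str        # e.g. "experiment" or "computation"
    method: str        # e.g. synthesis route or simulation method
    recorded_on: str   # ISO 8601 date string


@dataclass
class MaterialRecord:
    """One entry in an illustrative, FAIR-minded materials table."""
    record_id: str                 # persistent identifier (Findable)
    formula: str
    property_name: str
    value: Optional[float]         # None when no value was obtained
    unit: str                      # explicit unit (Interoperable)
    license: str                   # reuse terms (Reusable)
    provenance: Provenance
    negative_result: bool = False  # True for failed or null outcomes


def missing_fields(record: MaterialRecord) -> List[str]:
    """Return the names of required text fields that are empty."""
    flat = asdict(record)
    required = ["record_id", "formula", "property_name", "unit", "license"]
    missing = [name for name in required if not flat[name]]
    missing += [
        f"provenance.{key}"
        for key, val in flat["provenance"].items()
        if not val
    ]
    return missing


# Placeholder record for a failed experiment; the values are invented
# for illustration and do not come from the paper.
record = MaterialRecord(
    record_id="example-0001",
    formula="LiFePO4",
    property_name="synthesis_yield",
    value=None,
    unit="percent",
    license="CC-BY-4.0",
    provenance=Provenance("experiment", "solid-state synthesis", "2025-01-15"),
    negative_result=True,
)

# Serializing to JSON keeps the record machine-readable (Accessible).
serialized = json.dumps(asdict(record), sort_keys=True)
```

Storing the failed attempt as a first-class record, rather than discarding it, is one simple way a database could retain the negative results that the researchers say are often missing.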

Computational and integrated platform. ©Li et al.

"Materials databases are the foundation of trustworthy AI in science," adds Li. "If we want AI to guide discovery in a reliable way, we must first ensure that the data it learns from is complete, transparent, and well-structured. Without reliable data, AI-led discovery will itself become unreliable."

Looking ahead, the team plans to improve database quality and connectivity across fragmented data sources. They also aim to develop new AI systems that can learn from multiple types of data simultaneously and work alongside experiments and human researchers. These efforts are expected to support more dependable and efficient discovery of materials for energy, sustainability, and everyday applications.

Database-to-model-to-experiment roadmap for domain models and AI Agents. ©Li et al.

Publication Details:

Title: Materials Databases: Foundations of Modern Digital Materials

Authors: Yutian Zhuang, Xiaojin Yang, Chenyi Zhang, Xue Jia, Di Zhang, Mingzhe Li, Tongao Yao, Jiayu Peng, Zhengyang Gao, Weijie Yang, Hao Li

Journal: Precision Chemistry

DOI: 10.1021/prechem.5c00449

Contact:

Hao Li
Advanced Institute for Materials Research (WPI-AIMR)
Email: li.hao.b8@tohoku.ac.jp
Website: https://www.li-lab-cat-design.com/