In a world where Google can teach anyone about the aerodynamics of a golf ball and 5 million strangers can see the same Facebook post, it’s hard to imagine that scientists often struggle to process and share their data. Thanks to a fellowship from the University of Minnesota Informatics Institute, Associate Professor Adrian Hegeman is working to create a software platform called Galaxy-M to address this issue in the field of metabolomics.
A single plant can have tens of thousands of metabolites, and many of these metabolites can be measured using modern chemical analysis techniques. The abundance of data from these sorts of experiments has created new challenges and opportunities for researchers trying to figure out how to pull valuable information from big data sets. “Researchers in classical metabolomics look at pattern changes in thousands of chemicals. Then they make sure those changes are real and figure out which chemicals in those changes are interesting,” explained Hegeman. “It lets you focus in on what’s important and use your resources in a targeted way.” Detecting those changes can be a challenge, because analyzing the data with different processes can reveal different information about a single data set. “You can get some info on your data set from one process, but not enough,” said Hegeman.
Researchers often independently develop programs to meet their lab’s needs, but other scientists may struggle to access those programs. Through Galaxy-M, researchers can instead draw upon a multitude of programs that other researchers will also be able to access easily. As an added bonus, researchers can run their data through these various data processing tools and then display the results side by side for comparison. “If you can get those results next to the other results in your database, you can start doing more and start to learn which tools work best for different experiments,” said Hegeman.
Galaxy-M goes beyond simply gathering programs together. To reproduce an experiment, a researcher needs not only access to the program that the original experiment used, but also the exact settings that program was run with. Galaxy-M is designed to remove that obstacle. When data are run through Galaxy-M, the platform will capture information on the program and the settings used, which can then be published in a scientific journal and used by other researchers to replicate the analysis.
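To make the idea of capturing a program and its settings concrete, here is a minimal Python sketch. It is an illustration only and not Galaxy-M's actual code or interface: the function names (`run_with_provenance`, `replay`, `filter_peaks`), the record fields, and the sample intensity values are all hypothetical.

```python
import hashlib
import json


def run_with_provenance(tool, tool_name, tool_version, data, settings):
    """Run `tool` on `data` and return (result, provenance record).

    Hypothetical helper for illustration; not part of Galaxy-M.
    """
    result = tool(data, **settings)
    record = {
        "tool": tool_name,
        "version": tool_version,
        "settings": dict(settings),
        # Hash of the input, so a replicator can confirm the starting data match.
        "input_sha256": hashlib.sha256(
            json.dumps(data, sort_keys=True).encode("utf-8")
        ).hexdigest(),
    }
    return result, record


def replay(tool, data, record):
    """Re-run `tool` on `data` using the settings stored in `record`."""
    return tool(data, **record["settings"])


def filter_peaks(intensities, min_intensity):
    """Hypothetical processing step: keep values at or above a threshold."""
    return [x for x in intensities if x >= min_intensity]


# Made-up example values, not real measurements.
intensities = [120.0, 15.5, 980.2, 3.1]
result, record = run_with_provenance(
    filter_peaks, "filter_peaks", "0.1", intensities, {"min_intensity": 50.0}
)

# The record is plain JSON, so it could accompany a published paper.
print(json.dumps(record, indent=2))
```

A second researcher holding the same input data and the published record could call `replay(filter_peaks, intensities, record)` and should get the same result as the original run.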
Another important component of Galaxy-M is that it will be open source, which means that the original code is made public and can be changed by anyone. Hegeman pointed out, “We’re in a phase in metabolomics as a field where lots of people are developing tools, but they’re not sticking the info together or sharing data sets.” Making Galaxy-M open source allows users to add features and programs that are useful to them but that may not have been in the original program. When users make add-ons to Galaxy-M, those add-ons will then be easily accessible to other researchers replicating the experiment. This ensures that the program can grow and change as the needs of metabolomics shift.
A project like this requires time before it can be usable. The fellowship gave Hegeman the time needed to figure out what the program will look like, and he is currently seeking support to make Galaxy-M a reality. “We’re going to really start working in earnest next fall, and then it’s about a three-year project. Hopefully we should have a good workable version in three to four years.”