What is Biocuration?

By Marian Caballo

Biocuration is the practice of integrating biological information into resources such as online databases and data sets, and publishing it there. It gives researchers and data scientists a way to organize biological information so that knowledge is more accessible to both humans and computers. In the 21st century, biocuration is an integral part of scientific exploration.

Popular examples of primary databases—many of which we’ve explored in our Post-AP Genetics class—include DDBJ (Japan), GenBank (USA) and European Nucleotide Archive (Europe), all repositories for raw nucleotide sequence data. Other examples are the 1000 Genomes Project, HapMap, and 23andMe.

This broad and ever-expanding field is inherently interdisciplinary, requiring collaboration among biocurators, software developers, bioinformatics researchers, and marketers or social media managers. Uploading and understanding the science is important, but it is equally important that these websites be shared and promoted online. Each website must also be usable for scientists (and sometimes even for the general public, as in Bronx Science classrooms)!

But what exactly does the biocuration industry entail? How is biological data published without ethical and legal concerns for the participating subjects?

An essential part of data collection for these large genetic databases, or biobanks, is the informed consent process. Informed consent centers on the ethical principle of respect for persons: every subject of research or a clinical trial must be given the information needed to make a knowledgeable and self-assured decision about participating in the project. The first half of the process is informing the individuals; the second half, the consent form, records their consent.

Additionally, all samples must be collected in accordance with the laws that apply to the researchers collecting them, most importantly the laws of the country in which the samples are collected. The U.S. Department of Health and Human Services (hhs.gov) publishes an international compilation of human research protections: the laws, regulations, and guidelines governing research with human subjects. Its international program specifically works to ensure that human subjects outside the US who participate in research funded by the Department of Health and Human Services have the same level of protection as research participants inside the United States.

The most relevant office for biocuration, however, is the Office for Human Research Protections (OHRP), which provides leadership in the “protection of the rights, welfare, and wellbeing of human subjects involved in research.” Guidance under “CODED PRIVATE INFORMATION OR BIOSPECIMENS USED IN RESEARCH” specifically clarifies that the coding of human information aligns with the rules of 45 CFR 46.102(e). This means that biocuration doesn’t technically involve human subjects as defined under 45 CFR 46.102(f) if both of the following conditions are met:

  1. “the private information or specimens were not collected specifically for the currently proposed research project through an interaction or intervention with living individuals”

  2. “the investigator(s) cannot readily ascertain the identity of the individual(s) to whom the coded private information or specimens pertain”

The term “individually identifiable” indicates that the private information can indeed be matched with an individual’s identity.

While these guidelines are important, a number of independent not-for-profit efforts, such as the Public Population Project in Genomics, also support data collection and the infrastructure of biobanks. Their staff work at the intersection of “epidemiology, law, ethics, technology, and biomolecular science.”

Unfortunately, curation is not always seen as a critical role in the broader scientific community. But it is important to recognize the significance of curation for the advancement of science.

Interest in biocuration has skyrocketed over the past couple of years because of the COVID-19 pandemic. Online databases, such as the Broad Institute of MIT and Harvard’s Single Cell Portal and COVID CG, have helped support crucial COVID-19 research efforts all over the world. Just last week, researchers in Europe used UK Biobank (which contains genetic information from over 500,000 participants in the UK) to advance genetic disease research.

As our knowledge of genetics and biological data continues to expand, it is crucial that the biocuration industry expands with it.


