Wednesday, August 13, 2014

The Data Lives On - Biospecimen Immortality Through Annotation

Mark Collins, Ph.D., Director of Marketing, BioFortis, Inc.

The notion that a specimen can achieve something akin to immortality through annotation is a somewhat radical one, but it is gaining interest in the biorepository community. Sample collections are increasingly viewed as dynamic resources to be used, not merely preserved, so while the lifetime of the physical sample may be finite, its data can live on much longer, with some measure of immortality. The concept is of particular interest to organizations that consider their biorepository to be as much a knowledgebase for future biomedical research as a physical repository of samples. These organizations are establishing “next generation biobanks”, where the value of a sample to scientific research is directly related to the richness of its data. However, linking annotation data to specimens in the next generation biobank is not without challenges, namely:
  • What annotations to collect?
  • How best to collect annotations (e.g., vocabularies, ontologies, standards)?
  • How to manage consent, security and privacy issues?
  • How to maximize the benefit of the sample-linked data?
  • What kind of informatics infrastructure is needed?
At the 2013 IIR Biorepositories meeting I presented on some of these challenges and how to overcome them. Judging by the number of views the presentation has had on SlideShare, it’s a topic that resonates.

At this year’s meeting I will be presenting again and expanding further on the topic. Outlined below are some thoughts on how to respond to the challenges of annotation, with a view to making your samples “immortal”.

What annotations to collect: The rule of thumb is to collect as much data about the sample and the donor as possible, especially if samples are being collected for general use (Figure 1). However, even if samples are being collected as part of a specific trial or research study, you may still want to collect as much as possible. It helps to categorize the data into levels:

  • Level I: Operational data
      o Sample identifiers, storage location, collection dates, type, amounts, chain of custody, basic donor information, visits, etc.
  • Level II: Specimen and donor data
      o Sample assay data (basic biochemical and pathology), donor medical history/medical record
  • Level III: ’Omic data
      o Genomic information (SNP, CNV, NGS, etc.), proteomics, metabolomics, family history, imaging studies (MRI, PET, CAT, etc.)
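One way to make these levels concrete in software is to tag every annotation with its level, so that a specimen’s record can be filtered by level and its overall “richness” compared with other specimens. The sketch below is a minimal, hypothetical Python model assuming only the three levels listed above; the names `AnnotationLevel`, `Annotation`, `Specimen`, `annotate`, `by_level` and `richest_level` are illustrative inventions, not part of any particular biobanking product.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class AnnotationLevel(IntEnum):
    """Hypothetical encoding of the three annotation levels; higher = richer."""
    OPERATIONAL = 1      # Level I: identifiers, location, dates, chain of custody
    SPECIMEN_DONOR = 2   # Level II: assay data, donor medical history
    OMIC = 3             # Level III: genomic, proteomic, metabolomic, imaging


@dataclass
class Annotation:
    name: str
    value: object
    level: AnnotationLevel


@dataclass
class Specimen:
    specimen_id: str
    annotations: list[Annotation] = field(default_factory=list)

    def annotate(self, name: str, value: object, level: AnnotationLevel) -> None:
        # Every piece of data is stored together with the level it belongs to.
        self.annotations.append(Annotation(name, value, level))

    def by_level(self, level: AnnotationLevel) -> list[Annotation]:
        # Return only the annotations recorded at the requested level.
        return [a for a in self.annotations if a.level == level]

    def richest_level(self) -> Optional[AnnotationLevel]:
        # Highest level with at least one annotation, or None if unannotated.
        levels = [a.level for a in self.annotations]
        return max(levels) if levels else None
```

For example, a specimen annotated only with a storage location and a collection date reports `AnnotationLevel.OPERATIONAL` as its richest level, and adding a genomic annotation raises it to `AnnotationLevel.OMIC`. Using an ordered enum is simply one convenient way to make levels comparable across specimens.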



Thinking in terms of levels also helps with the next challenge:

How best to collect annotations: This topic could fill a series of blog posts, but in general the use of standard ontologies and vocabularies is critical to ensuring that the data can actually be searched later. Examples include SNOMED, ICD-9, OMOP and SPREC codes. Match the ontology or vocabulary to the level of data, and always try to collect data electronically rather than manually transcribing paper records. There are several ISBER working groups on standards and vocabularies that you should check out.

How to manage consent, security and privacy issues: Again, this is a topic for multiple blog posts. Donors are rightly concerned not just about how their sample will be used, but about what information will be generated and made available, to whom, and for what purpose. This recent article reinforces the need for education about biobanks, especially for next generation biobank knowledgebases, where consent, privacy and security are front and center. There is no consent “magic bullet” here, but a clear understanding of allowed use is a must, and consent records need to be updated dynamically as donors change their consent decisions. While it is an extreme example, many people are aware of the controversy surrounding Henrietta Lacks, which underpins the concern about potential sample misuse.
With regard to security and privacy, the informatics infrastructure must be able to ensure security (encryption and access permissions), and from a privacy perspective, systems managing PHI (Protected Health Information) must comply with regulatory rules (e.g., HIPAA and 21 CFR Part 11).
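Dynamically updated consent can be modeled as an append-only history of decisions, where the most recent decision for a given use wins. The following is a minimal Python sketch under that assumption; the `ConsentDecision` and `ConsentRecord` classes, the use-category strings, and the rule that a use with no recorded decision is not allowed are hypothetical choices for illustration, not a prescribed consent model.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class ConsentDecision:
    use: str             # hypothetical use category, e.g. "genomic_research"
    allowed: bool
    recorded_at: datetime


@dataclass
class ConsentRecord:
    donor_id: str
    decisions: list[ConsentDecision] = field(default_factory=list)

    def record(self, use: str, allowed: bool, when: datetime) -> None:
        # Decisions are appended, never overwritten, so the history stays auditable.
        self.decisions.append(ConsentDecision(use, allowed, when))

    def is_allowed(self, use: str, as_of: datetime) -> bool:
        """Apply the latest decision for `use` made at or before `as_of`.

        A use with no recorded decision is treated as not allowed.
        """
        relevant = [d for d in self.decisions
                    if d.use == use and d.recorded_at <= as_of]
        if not relevant:
            return False
        latest = max(relevant, key=lambda d: d.recorded_at)
        return latest.allowed
```

Keeping the full history, rather than a single overwritten flag, means the biobank can show which uses were permitted at the time a sample or its data was released, which matters when a donor later changes their decision.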

How to maximize the benefit of sample-linked data: The next generation biobank is expected to generate scientific insights from its rich annotations. This requires end-user tools that allow anyone with the appropriate permissions to ask questions of the biobank without having to know about databases or query languages. As far as possible, users should be able to self-serve complex questions (Figure 2).
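A self-service tool typically turns choices made in a form (a field, a comparison and a value) into a query, and checks the user’s permissions before running it. The Python sketch below illustrates that idea over in-memory records; the operator names, the `self_serve_query` function and the field-level permission set are hypothetical, and a real system would translate the same criteria into database queries behind the scenes.

```python
import operator

# Comparisons a point-and-click tool might expose; the names are hypothetical.
OPS = {
    "equals": operator.eq,
    "at_least": operator.ge,
    "at_most": operator.le,
}


def self_serve_query(samples, criteria, user_permissions):
    """Return IDs of samples matching every (field, op, value) criterion.

    Raises PermissionError if any criterion uses a field the user may not see.
    """
    for field_name, _, _ in criteria:
        if field_name not in user_permissions:
            raise PermissionError(f"no access to field {field_name!r}")
    results = []
    for sample in samples:
        # A sample matches only if it has every field and passes every comparison.
        if all(field_name in sample and OPS[op](sample[field_name], value)
               for field_name, op, value in criteria):
            results.append(sample["sample_id"])
    return results
```

Checking permissions before evaluating any criteria keeps restricted fields out of the query entirely, rather than filtering them from results afterwards.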



What kind of informatics infrastructure is needed: Powering the next generation biobank requires software that can harmonize all sample and sample-related data into a secure, flexible and compliant holistic view that allows researchers to use the biobank as a knowledgebase for future biomedical research.

For more information on the idea of sample immortality join BioFortis at the 7th Annual Biorepositories Conference, September 8-10, Boston, MA.

Download the agenda now to see what else is on tap. 

Reminder: You can SAVE $100 as a reader of this blog. Register here and use code XP1998BLOG.
