• A Cloud Framework for High Throughput Biological Data Processing

  • The molecular systems biology community has to deal with an ever-growing amount of data. MapReduce, a recent programming model that addresses this data deluge, facilitates the processing of huge data volumes on large sets of computing resources. However, the availability of appropriate local computing resources is often limited. Cloud computing addresses this issue by providing virtually infinite resources on demand, usually following a pay-per-use model. In this paper we present our cloud-based high-throughput computing infrastructure, which combines the Software as a Service approach with the MapReduce programming model for data-intensive applications and a configurable distributed file system, as provided by the Hadoop framework. Within this infrastructure we implemented an application in the field of molecular systems biology that matches tryptic peptide fragmentation mass spectra against a large-scale mass spectral reference database. We evaluate this application on a local cloud resource and study the effects of different configuration parameters provided by the application, the Hadoop framework, and the available computational and storage resources.

  • PDF

  • http://phaidra.univie.ac.at/o:243959

  • Contribution To Periodical

  • Published Version

  • 01.01.2011

  • English

  • Open access

  • 1824-8039

  • ÖFOS 2002 → NATURAL SCIENCES → Physics, Mechanics, Astronomy → Mass spectrometry

  • ÖFOS 2002 → NATURAL SCIENCES → Mathematics, Computer Sciences → Numeric computation

  • ÖFOS 2002 → NATURAL SCIENCES → Biology, Botany, Zoology → Molecular biology

  • ÖFOS 2002 → NATURAL SCIENCES → Mathematics, Computer Sciences →
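
The abstract describes matching query spectra against a large reference database with MapReduce. The following single-process Python sketch only illustrates that general data flow: each map task scores all query spectra against one split of the reference database, the shuffle groups candidate matches by query, and the reduce step keeps the best hits per query. Everything concrete here is a hypothetical assumption, not the authors' Hadoop implementation: the function names (bin_spectrum, map_phase, shuffle, reduce_phase, run_job), the 1.0 m/z bin width, the cosine similarity score, and the top-k reduction. In a real Hadoop job the reference splits would be HDFS input splits processed on different nodes.

```python
from collections import defaultdict
from heapq import nlargest
from math import sqrt
from typing import Dict, Iterable, Iterator, List, Tuple

# A spectrum is a list of (m/z, intensity) peaks (assumed representation).
Spectrum = List[Tuple[float, float]]


def bin_spectrum(peaks: Spectrum, bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (assumed preprocessing)."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two binned spectra (assumed scoring function)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def map_phase(reference_split: Iterable[Tuple[str, Spectrum]],
              queries: Dict[str, Spectrum]) -> Iterator[Tuple[str, Tuple[float, str]]]:
    """Map task: emit (query_id, (score, reference_id)) for one reference split."""
    binned_queries = {qid: bin_spectrum(p) for qid, p in queries.items()}
    for ref_id, ref_peaks in reference_split:
        ref_binned = bin_spectrum(ref_peaks)
        for qid, q_binned in binned_queries.items():
            yield qid, (cosine(q_binned, ref_binned), ref_id)


def shuffle(pairs: Iterable[Tuple[str, Tuple[float, str]]]) -> Dict[str, List[Tuple[float, str]]]:
    """Group intermediate values by key, as the framework's shuffle would."""
    groups: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(query_id: str, candidates: List[Tuple[float, str]],
                 top_k: int = 1) -> Tuple[str, List[Tuple[float, str]]]:
    """Reduce task: keep the top_k highest-scoring reference matches."""
    return query_id, nlargest(top_k, candidates)


def run_job(reference_splits: List[List[Tuple[str, Spectrum]]],
            queries: Dict[str, Spectrum], top_k: int = 1) -> Dict[str, List[Tuple[float, str]]]:
    """Run map, shuffle and reduce sequentially over all reference splits."""
    intermediate = (pair for split in reference_splits for pair in map_phase(split, queries))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(qid, cands, top_k) for qid, cands in grouped.items())


if __name__ == "__main__":
    # Two tiny hypothetical reference splits and one query spectrum.
    splits = [
        [("ref_A", [(100.1, 10.0), (200.2, 5.0)])],
        [("ref_B", [(150.0, 8.0), (300.0, 2.0)])],
    ]
    print(run_job(splits, {"q1": [(100.3, 9.0), (200.0, 6.0)]}))
```

Splitting the reference database rather than the queries mirrors the idea that the large data set is the one distributed across the cluster, while the comparatively small query set is shipped to every map task; the choice of scoring function and bin width would in practice be among the application-level configuration parameters.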