Similarity-Based Segmentation of Multi-Dimensional Signals

Rainer Machné; Douglas B. Murray; Peter F. Stadler

doi:10.1038/s41598-017-12401-8

Title

Similarity-Based Segmentation of Multi-Dimensional Signals

Authors

Rainer Machné

Institut für Theoretische Chemie, Fakultät für Chemie, Universität Wien

Douglas B. Murray

Institute for Advanced Biosciences, Keio University

Peter F. Stadler

Institut für Theoretische Chemie, Fakultät für Chemie, Universität Wien

Abstract

The segmentation of time series and genomic data is a common problem in computational biology. With increasingly complex measurement procedures, individual data points are often not just numbers or simple vectors in which all components are of the same kind. Analysis methods that capitalize on slopes in a single real-valued data track, or that make explicit use of the vectorial nature of the data, are not applicable in such scenarios. We develop here a framework for segmentation in arbitrary data domains that requires only a minimal notion of similarity. Unsupervised clustering of (a sample of) the input yields an approximate segmentation algorithm that is efficient enough for genome-wide applications. As a showcase application we segment a time series of transcriptome sequencing data from budding yeast, at high temporal resolution over ca. 2.5 cycles of the short-period respiratory oscillation. The algorithm is used with a similarity measure focussing on periodic expression profiles across the metabolic cycle rather than coverage per time point.

Keywords

Applied mathematics; Data processing; Genome informatics; Scientific data; Software

Object type

journal article

Language

English [eng]

Persistent identifier

https://phaidra.univie.ac.at/o:918375

DOI

10.1038/s41598-017-12401-8

Published in

Title