Abstract
Background: Genomic regulatory blocks (GRBs) are chromosomal regions spanned by highly
conserved non-coding elements (HCNEs), most of which serve as regulatory inputs of one target
gene in the region. The target genes are most often transcription factors involved in embryonic
development and differentiation. GRBs often contain extensive gene deserts, as well as additional
'bystander' genes that are intertwined with HCNEs but whose expression and function are unrelated to
those of the target gene. The tight regulation of target genes, the complex arrangement of regulatory
inputs, and the differential responsiveness of genes in the region call for an examination of the
fundamental rules governing transcriptional activity in GRBs. Here we use extensive CAGE tag
mapping of transcription start sites across different human tissues and differentiation stages,
combined with expression data and a number of sequence and epigenetic features, to discover these
rules and patterns.
Results: We show evidence that GRB target genes have properties that set them apart from their
bystanders as well as other genes in the genome: longer CpG islands, a higher number and wider
spacing of alternative transcription start sites, and a distinct composition of transcription factor
binding sites in their core/proximal promoters. Target gene expression correlates with the
acetylation state of HCNEs in the region. Additionally, target gene promoters have a distinct
combination of activating and repressing histone modifications in mouse embryonic stem cell lines.
Conclusions: GRB targets are genes with a number of unique features that are the likely cause of
their ability to respond to regulatory inputs from very long distances.