Biomolecular Engineering Research Center Papers
Recent Submissions
Transcription Factor Map Alignment of Promoter Regions (Public Library of Science, 2006-05-26)
Blanco, Enrique; Messeguer, Xavier; Smith, Temple F.; Guigó, Roderic

We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because the promoter regions of co-expressed genes often do not show discernible sequence conservation. In our approach, therefore, we have not compared the nucleotide sequences of promoters directly. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels, which we refer to here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm on a small but well-curated collection of human–mouse orthologous gene pairs. Results on this dataset, as well as on an independent, much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements that cannot be detected by typical sequence alignments.

Synopsis

Sequence comparisons and alignments are among the most powerful tools in biological research. Since similar sequences in general perform similar functions, identifying sequence conservation between two or more nucleotide or amino acid sequences is often used to infer common biological function. Sequence comparisons, however, have limitations: similar functions are often encoded by higher-order elements that do not bear a one-to-one relationship to the underlying primary sequence. Consequently, similar functions are frequently encoded by diverse sequences. Promoter regions are a case in point.
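The TF-map idea described in the abstract above can be illustrated with a toy global alignment of two label sequences. This is a minimal sketch under stated assumptions, not the authors' algorithm (which adapts a restriction-map alignment method and uses its own optimized parameters). Here each TF-map is assumed to be a list of (factor label, position) pairs, only sites with identical labels may be paired, and each pair earns a hypothetical reward `match` minus a penalty `dist_penalty` proportional to the difference in positions. Unmatched sites cost nothing, so the score behaves like a weighted longest common subsequence. The factor names in the usage lines are hypothetical.

```python
import math
from typing import List, Tuple

# A TF-map site: (transcription factor label, position in the promoter, in bp).
Site = Tuple[str, int]


def align_tf_maps(a: List[Site], b: List[Site],
                  match: float = 1.0,
                  dist_penalty: float = 0.01) -> Tuple[float, List[Tuple[int, int]]]:
    """Globally align two TF-maps; return (score, matched index pairs).

    Only identically labelled sites can be paired. A pair earns
    match - dist_penalty * |pos_a - pos_b|; unpaired sites score 0.
    Both parameters are hypothetical illustration values.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning the first i sites of a with the first j sites of b
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]

    def pair_score(i: int, j: int) -> float:
        return match - dist_penalty * abs(a[i][1] - b[j][1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])  # leave one site unmatched
            if a[i - 1][0] == b[j - 1][0]:
                best = max(best, dp[i - 1][j - 1] + pair_score(i - 1, j - 1))
            dp[i][j] = best

    # Trace back through the table to recover which sites were paired.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if (a[i - 1][0] == b[j - 1][0]
                and math.isclose(dp[i][j], dp[i - 1][j - 1] + pair_score(i - 1, j - 1),
                                 abs_tol=1e-12)):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(dp[i][j], dp[i - 1][j], abs_tol=1e-12):
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Usage with two hypothetical promoter TF-maps.
human = [("SP1", 10), ("TATA", 30), ("NFY", 50)]
mouse = [("SP1", 12), ("NFY", 48), ("TATA", 70)]
score, pairs = align_tf_maps(human, mouse)  # pairs -> [(0, 0), (2, 1)]
```

Because a global alignment preserves order, the SP1 and NFY sites are paired, while the TATA sites, which occur in a different order relative to NFY in the two maps, are left unmatched.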
Often, the promoter sequences of genes with similar expression patterns do not show conservation. This is because, even though their expression may be regulated by a similar arrangement of transcription factors, the binding sites for these factors may exhibit great sequence variability. To overcome this limitation, the authors obtain predictions of transcription factor binding sites on promoter sequences and annotate the predicted sites with the labels of the corresponding transcription factors. They develop an algorithm, inspired by an early algorithm for aligning restriction enzyme maps, to align the resulting sequences of labels, the so-called TF-maps (transcription factor maps). They show that TF-map alignments are able to uncover conserved regulatory elements common to the promoter regions of co-regulated genes that cannot be detected by typical sequence alignments.

Constraining Ribosomal RNA Conformational Space (Oxford University Press, 2005-09-09)
Favaretto, Paola; Bhutkar, Arjun; Smith, Temple F.

Despite the potential for many possible secondary-structure conformations, the native sequence of ribosomal RNA (rRNA) is able to find the correct and universally conserved core fold. This study reports a computational analysis investigating two mechanisms that appear to constrain the secondary-structure conformational space of rRNA: ribosomal proteins and rRNA sequence composition. The analysis was carried out using rRNA–ribosomal protein interaction data for the Escherichia coli 16S rRNA and free energy minimization software for secondary-structure prediction. The results indicate that selection pressures on rRNA sequence composition and ribosomal protein–rRNA interactions play a key role in constraining the rRNA secondary structure to a single stable form.

GTPases and the Origin of the Ribosome (BioMed Central, 2010-05-20)
Hartman, Hyman; Smith, Temple F.

BACKGROUND.
This paper is an attempt to trace the evolution of the ribosome through the evolution of the universal P-loop GTPases that are involved with the ribosome in translation and with the attachment of the ribosome to the membrane. The GTPases involved in translation in Bacteria/Archaea are the elongation factors EFTu/EF1, the initiation factors IF2/aeIF5b + aeIF2, and the elongation factors EFG/EF2. All of these GTPases also contain the OB fold, which is also found in the non-GTPase IF1 involved in initiation. The GTPase involved in the signal recognition particle (SRP) in most Bacteria and Archaea is SRP54.

RESULTS.
1) Based on structural considerations of their domains, the elongation factors of the Archaea have the following evolutionary path: EF1 → aeIF2 → EF2. The evolution of aeIF5b was a later event; 2) based on topological considerations of the GTPase domain, the elongation factors of the Bacteria have a similar evolutionary path: EFTu → IF2 → EFG. These evolutionary sequences reflect the evolution of the large subunit (LSU) followed by the small subunit (SSU) to form the ribosome; 3) the OB-fold IF1 is a mimic of an ancient tRNA minihelix.

CONCLUSION.
The evolution of the translational GTPases of both the Archaea and Bacteria points to the evolution of the ribosome. The elongation factors EFTu/EF1 began as a Ras-like GTPase bringing the activated minihelix tRNA to the LSU. The initiation factors and the elongation factors EFG/EF2 would then have evolved from EFTu/EF1 as the small subunit was added to the evolving ribosome. The SRP has an SRP54 GTPase and a specific RNA fold in its RNA component similar to the peptidyl transferase center (PTC). We consider the SRP to be a remnant of an ancient form of an LSU bound to a membrane.

REVIEWERS.
This article was reviewed by George Fox, Leonid Mirny and Chris Sander.

The Origin and Evolution of the Ribosome (BioMed Central, 2008-04-22)
Smith, Temple F.; Lee, Jung C.; Gutell, Robin R.; Hartman, Hyman

BACKGROUND.
The origin and early evolution of the active site of the ribosome can be elucidated through an analysis of the taxonomic block structures of the ribosomal proteins and their RNA interactions. A comparison between the two subunits, exploiting the detailed three-dimensional structures of the bacterial and archaeal ribosomes, is especially informative.

RESULTS.
The differences between the active sites of the two subunits can be summarized as follows: 1) there is no self-folding RNA segment that defines the decoding site of the small subunit; 2) there is one self-folding RNA segment encompassing the entire peptidyl transferase center of the large subunit; 3) the protein contacts with the decoding site are made by a set of universal, alignable sequence blocks of the ribosomal proteins; 4) the majority of the peptide contacts with the peptidyl transferase center are made by bacterial- or archaeal-specific sequence blocks.

CONCLUSION.
These clear distinctions between the two subunit active sites support an earlier origin for the large subunit's peptidyl transferase center (PTC), with the decoding site of the small subunit being a later addition to the ribosome. The main implication is that a single self-folding RNA, in conjunction with a few short stabilizing peptides, formed the precursor of the modern ribosomal large subunit in association with a membrane.

REVIEWERS.
This article was reviewed by Jerzy Jurka, W. Ford Doolittle, Eugene Shakhnovich, and George E. Fox (nominated by Jerzy Jurka).

Protein Docking by the Underestimation of Free Energy Funnels in the Space of Encounter Complexes (Public Library of Science, 2008-10-10)
Shen, Yang; Paschalidis, Ioannis Ch.; Vakili, Pirooz; Vajda, Sandor

As in protein folding, the association of two proteins is driven by a free energy funnel, determined by favorable interactions in some neighborhood of the native state.
We describe a docking method based on the stochastic global minimization of funnel-shaped energy functions in the space of rigid-body motions (SE(3)) while accounting for the flexibility of the interface side chains. The method, called semi-definite programming-based underestimation (SDU), employs a general quadratic function to underestimate a set of local energy minima and uses the resulting underestimator to bias further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its application to docking in the rotational and translational space SE(3) is not straightforward because of the geometry of that space. We introduce a strategy that uses separate independent variables for side-chain optimization, the center-to-center distance of the two proteins, and five angular descriptors of the relative orientation of the molecules. Removing the center-to-center distance turns out to vastly improve the efficiency of the search, because the resulting five-dimensional space exhibits a well-behaved energy surface suitable for underestimation. The algorithm explores the free energy surface spanned by encounter complexes that correspond to local free energy minima, and it resembles the model of macromolecular association that proceeds through a series of collisions. Results on standard protein docking benchmarks establish that in this space the free energy landscape is a funnel in a reasonably broad neighborhood of the native state, and that the SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared with Monte Carlo methods.

Author Summary

Protein–protein interactions play a central role in various aspects of the structural and functional organization of the cell, and their elucidation is crucial for a better understanding of processes such as metabolic control, signal transduction, and gene regulation.
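The underestimation step of SDU described in the abstract above can be illustrated in one dimension. This is a toy sketch under stated assumptions, not the SDU implementation. In 1-D, the positive-semidefinite condition on the quadratic's Hessian reduces to a ≥ 0, so the semidefinite program becomes a small linear program; we solve it here by enumerating vertices rather than with an SDP solver. We assume the underestimator is chosen to minimize the total gap Σ(f_i − q(x_i)) subject to q(x_i) ≤ f_i at every sampled local minimum. The function names and the Gaussian resampling rule are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Quad = Tuple[float, float, float]  # (a, b, c) for q(x) = a*x**2 + b*x + c


def _det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(rows, rhs) -> Optional[Quad]:
    """Solve a 3x3 linear system by Cramer's rule; None if it is singular."""
    d = _det3(rows)
    if abs(d) < 1e-12:
        return None
    sol = []
    for col in range(3):
        m = [list(r) for r in rows]
        for k in range(3):
            m[k][col] = rhs[k]
        sol.append(_det3(m) / d)
    return (sol[0], sol[1], sol[2])


def quadratic_underestimator(xs: Sequence[float], fs: Sequence[float]) -> Quad:
    """Convex quadratic q with q(x_i) <= f_i that minimizes sum(f_i - q(x_i)).

    This is a linear program in (a, b, c); its optimum sits at a vertex where
    three constraints are tight, so for a few points we simply try every triple.
    """
    # Constraint rows (coefficients, rhs): q(x_i) <= f_i, plus a >= 0 (tight when a == 0).
    rows = [((x * x, x, 1.0), f) for x, f in zip(xs, fs)]
    rows.append(((1.0, 0.0, 0.0), 0.0))
    best: Optional[Quad] = None
    best_total = float("-inf")
    for trio in combinations(rows, 3):
        sol = _solve3([r for r, _ in trio], [v for _, v in trio])
        if sol is None:
            continue
        a, b, c = sol
        if a < -1e-12 or any(a * x * x + b * x + c > f + 1e-9 for x, f in zip(xs, fs)):
            continue  # vertex violates convexity or lies above some sample
        total = sum(a * x * x + b * x + c for x in xs)  # maximizing this minimizes the gap
        if total > best_total + 1e-12:
            best, best_total = sol, total
    if best is None:
        raise ValueError("no feasible vertex; supply at least two distinct sample points")
    return best


def biased_samples(q: Quad, spread: float, n: int, seed: int = 0) -> List[float]:
    """Draw new starting points around the underestimator's minimum (hypothetical rule)."""
    a, b, _ = q
    # Fall back to 0.0 when q is linear; an arbitrary choice for this sketch.
    center = -b / (2 * a) if a > 1e-12 else 0.0
    rng = random.Random(seed)
    return [rng.gauss(center, spread) for _ in range(n)]
```

For example, local minima sampled from a noisy funnel centered at x = 1 yield an underestimator whose minimum lies at x = 1, and `biased_samples` then concentrates the next round of sampling there. In the method described above the search space has five angular dimensions, where the convexity condition on the Hessian genuinely requires semidefinite programming.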
Genome-wide proteomics studies, primarily yeast two-hybrid assays, will provide a growing list of interacting proteins, but only a small fraction of the potential complexes will be amenable to direct experimental analysis. Thus, it is important to develop computational docking methods that can elucidate the details of specific interactions at the atomic level. Protein–protein docking generally starts with a rigid-body search that generates a large number of docked conformations with good shape, electrostatic, and chemical complementarity. The conformations are clustered to obtain a manageable number of models, but current methods are unable to select the most likely structure among these models. Here we describe a refinement algorithm that, when applied to the individual clusters, improves the quality of the models. The better models are suitable for higher-accuracy energy calculations, increasing the chances that near-native structures can be identified; the refinement thus increases the reliability of the entire docking algorithm.

Inferring Genome-Scale Rearrangement Phylogeny and Ancestral Gene Order: A Drosophila Case Study (BioMed Central, 2007-11-08)
Bhutkar, Arjun; Gelbart, William M.; Smith, Temple F.

A simple, fast, and biologically inspired computational approach for inferring genome-scale rearrangement phylogeny and ancestral gene order has been developed and applied to eight Drosophila genomes. Existing techniques are limited either to a few hundred markers or to a small number of taxa. This analysis uses over 14,000 genomic loci and employs discrete elements consisting of pairs of homologous genetic elements. The results provide insight into evolutionary chromosomal dynamics and synteny analysis, and inform speciation studies.

Survey of Human Mitochondrial Diseases Using New Genomic/Proteomic Tools (BioMed Central, 2001-06-01)
Plasterer, Thomas N.; Smith, Temple F.; Mohr, Scott C.

BACKGROUND.
We have constructed Bayesian prior-based amino acid sequence profiles for the complete yeast mitochondrial proteome and used them to develop methods for identifying and characterizing the context of protein mutations that give rise to human mitochondrial diseases. (Bayesian priors are conditional probabilities that allow the likelihood of an event, such as an amino acid substitution, to be estimated on the basis of prior occurrences of similar events.) Because these profiles can assemble sets of taxonomically very diverse homologs, they enable identification of the structurally and/or functionally most critical sites in the proteins on the basis of the degree of sequence conservation. These profiles can also find distant homologs with determined three-dimensional structures that aid in interpreting the effects of missense mutations.

RESULTS.
This survey reports such an analysis for 15 missense mutations, one insertion, and three deletions involved in Leber's hereditary optic neuropathy, Leigh syndrome, mitochondrial neurogastrointestinal encephalomyopathy, Mohr-Tranebjaerg syndrome, iron-storage disorders related to Friedreich's ataxia, and hereditary spastic paraplegia. We present structural correlations for seven of the mutations.

CONCLUSIONS.
Of the 19 mutations analyzed, 14 involved changes in very highly conserved parts of the affected proteins. Five of the seven structural correlations provided reasonable explanations for the malfunctions. As additional genetic and structural data become available, this methodology can be extended. It has the potential to assist in identifying new disease-related genes. Furthermore, profiles with structural homologs can generate mechanistic hypotheses about the underlying biochemical processes and why they break down as a result of the mutations.
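Ranking alignment positions by their degree of conservation, as in the survey above, can be sketched with a simple position-specific profile. This is an assumption-laden illustration, not the authors' profile method. Adding a constant pseudocount to every residue count gives the posterior mean under a symmetric Dirichlet prior, the simplest kind of Bayesian prior, and conservation is measured here as the relative entropy (in bits) of each column against an assumed uniform background over the 20 standard amino acids. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(column: Sequence[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Residue frequencies for one alignment column, smoothed by a pseudocount (> 0).

    Gaps and non-standard characters are ignored.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def conservation(column: Sequence[str], pseudocount: float = 1.0) -> float:
    """Relative entropy (bits) of the smoothed column against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = column_profile(column, pseudocount)
    return sum(p * math.log2(p / background) for p in profile.values())


def rank_sites(alignment: List[str], pseudocount: float = 1.0) -> Tuple[List[int], List[float]]:
    """Return alignment positions ordered from most to least conserved, plus their scores."""
    length = len(alignment[0])
    scores = [conservation([seq[i] for seq in alignment], pseudocount) for i in range(length)]
    order = sorted(range(length), key=lambda i: scores[i], reverse=True)
    return order, scores


# Usage with a hypothetical four-sequence alignment.
toy_alignment = ["MKVL", "MRIL", "MKAL", "MEVL"]
order, scores = rank_sites(toy_alignment)
```

In this toy alignment the invariant first and last columns rank highest; under this crude model, a missense mutation at one of those positions would be flagged as hitting a highly conserved site.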