<p>Muppirala, Usha; Honavar, Vasant; Dobbs, Drena</p>
<p>2020-06-30; 2011-01-01</p>
<p>https://dr.lib.iastate.edu/handle/20.500.12876/38009</p>
<h3>Background</h3>
<p>RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High-throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but they are expensive and time-consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions.</p>
<h3>Results</h3>
<p>We propose <em>RPISeq</em>, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, <em>RPISeq</em> predicts whether the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of <em>RPISeq</em> are presented: <em>RPISeq-SVM</em>, which uses a Support Vector Machine (SVM) classifier, and <em>RPISeq-RF</em>, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), <em>RPISeq</em> achieved AUC values (area under the receiver operating characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of <em>RPISeq</em> was competitive with that of a published method that requires information about many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners.
In addition, <em>RPISeq</em> classifiers trained on the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from <em>E. coli</em>, <em>S. cerevisiae</em>, <em>D. melanogaster</em>, <em>M. musculus</em>, and <em>H. sapiens</em>.</p>
<h3>Conclusions</h3>
<p>Our experiments with <em>RPISeq</em> demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. <em>RPISeq</em> offers an inexpensive method for computational construction of RNA-protein interaction networks and should provide useful insights into the function of non-coding RNAs. <em>RPISeq</em> is freely available as a web-based server at <a href="http://pridb.gdcb.iastate.edu/RPISeq/">http://pridb.gdcb.iastate.edu/RPISeq/</a>.</p>
<p>Title: Predicting RNA-Protein Interactions Using Only Sequence Information</p>
<p>Type: article</p>
<p>Format: application/pdf</p>
<p>Language: en</p>
<p>Identifier: gdcb_las_pubs/95</p>
<p>Subjects: Bioinformatics; Computational Biology; Genetics; Genomics</p>
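<p>The sequence encoding described under Results can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions, because the abstract does not state them: the 7-class amino-acid grouping (a conjoint-triad style grouping is used here), normalization by dividing each k-mer count by the total number of counted k-mers, skipping k-mers that contain unrecognized characters, and concatenating the RNA and protein vectors into a single feature vector. All function and variable names are hypothetical.</p>

```python
from itertools import product

# ASSUMPTION: 7-class reduced amino-acid alphabet (conjoint-triad style grouping).
# The abstract only says "7-letter reduced alphabet"; this mapping is illustrative.
REDUCED_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(REDUCED_GROUPS) for aa in group}

# All possible k-mers, in a fixed order, define the vector positions.
RNA_KMERS = ["".join(p) for p in product("ACGU", repeat=4)]         # 4^4 = 256
PROTEIN_KMERS = ["".join(p) for p in product("0123456", repeat=3)]  # 7^3 = 343


def kmer_frequencies(seq, k, vocabulary):
    """Count overlapping k-mers and normalize so the vector sums to 1.

    ASSUMPTION: frequency normalization; k-mers outside the vocabulary
    (e.g. containing ambiguous characters) are skipped.
    """
    index = {kmer: i for i, kmer in enumerate(vocabulary)}
    counts = [0.0] * len(vocabulary)
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:
            counts[pos] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def encode_rna(rna):
    """256-dimensional normalized 4-mer composition of an RNA sequence."""
    return kmer_frequencies(rna.upper().replace("T", "U"), 4, RNA_KMERS)


def encode_protein(protein):
    """343-dimensional normalized 3-mer composition over the reduced alphabet."""
    # Residues outside the 20 standard amino acids map to "X" and are skipped.
    reduced = "".join(AA_TO_CLASS.get(aa, "X") for aa in protein.upper())
    return kmer_frequencies(reduced, 3, PROTEIN_KMERS)


def encode_pair(rna, protein):
    """ASSUMPTION: the pair is represented by concatenating both vectors."""
    return encode_rna(rna) + encode_protein(protein)
```

<p>Under these assumptions, <code>encode_pair(rna, protein)</code> yields a fixed-length vector (256 + 343 entries) regardless of sequence length, which is the kind of input an SVM or Random Forest classifier can be trained on.</p>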