ProGen AP: a Pipeline for Proteogenomic Annotation of Mycobacterium tuberculosis Seeking the Discovery of Potential Genes for Biotechnological Intervention.

Pinto, Beatriz Jeronimo

doi:10.11606/D.17.2013.tde-11062013-162047

DOI

https://doi.org/10.11606/D.17.2013.tde-11062013-162047

Document

Master's Dissertation

Author

Pinto, Beatriz Jeronimo (Catálogo USP)

Full name

Beatriz Jeronimo Pinto

Institute/School/College

Faculdade de Medicina de Ribeirão Preto

Knowledge Area

Genetics

Date of Defense

2013-05-03

Published

Ribeirão Preto, 2013

Supervisor

Giuliatti, Silvana (Catálogo USP)

Committee

Giuliatti, Silvana (President)
Faça, Vitor Marcel
Godoy, Lyris Martins Franco de

Title in Portuguese

ProGen AP: um Pipeline para Anotação Proteogenômica de Mycobacterium tuberculosis visando o Descobrimento de Genes com Potencial para Intervenção Biotecnológica.

Keywords in Portuguese

Mycobacterium tuberculosis
Pipeline
Banco de dados
Biomarcadores
Proteogenômica

Abstract in Portuguese

Anotação proteogenômica é uma abordagem que une a análise proteômica com a anotação genômica. O intuito de tal abordagem é prover uma anotação mais detalhada ao gene. Intuito esse, que nem sempre é possível quando se trata apenas de genes, uma vez que produtos gênicos, com funções importantes preditas, somente passam a ter papel na fisiologia do organismo quando expressos e traduzidos. Com todo o avanço atual de estudos na área proteogenômica, a geração de dados tem crescido de modo exponencial e, com esse crescimento, nota-se a necessidade cada vez maior da criação de sistemas capazes de processar, armazenar e gerenciar essas novas informações produzidas. Assim, é descrito nesse trabalho o desenvolvimento do ProGen AP, sendo constituído de uma interface web construída em HTML/PHP5, um banco de dados cujo SGBD é o MySQL e de módulos de processamento de dados proteômicos, neste caso o LabKey (com o core X!Tandem) e o QuickMod. Todos os módulos são open source e comunicam entre si através de scripts Perl. Nesse sistema, o pesquisador fornece dados de experimentos proteômicos e o sistema, então, os processa e retorna ao usuário informações sobre o gene expresso, a localização dos peptídeos dentro do gene aos quais pertencem e, ainda, informações quantitativas sobre o peptídeo e a proteína identificados. Além disso, o uso de um processamento esquematizado reduz a possibilidade de erro de entrada/saída de dados nos módulos intermediários do processamento. Aqui, o ProGen AP foi aplicado no estudo proteômico do Mycobacterium tuberculosis (MTb). Na literatura, o genoma do MTb cepa H37Rv contém apenas 4062 open reading frames (ORFs) preditos e o complemento funcional desse genoma, o proteoma, ainda não está totalmente elucidado. A análise do proteoma do MTb, com o uso do ProGen AP, resultou em uma lista total de 154.982 identificações de peptídeos, representando um total de 147.334 peptídeos únicos. Até o momento, foram identificadas 2.369 proteínas, cobrindo aproximadamente 58% de todo o genoma do MTb. É importante ressaltar que, dentre todas as proteínas identificadas até o momento, a maioria delas está anotada como proteínas hipotéticas em seu genoma, e, por consequência, os resultados obtidos nesse projeto confirmam e validam a existência de tais produtos gênicos. Além disso, 567 peptídeos foram identificados como N-terminal e 1229 como C-terminal, o que indica a correta predição do início e do término da tradução de tais genes. Todos esses resultados positivos confirmam que a abordagem utilizada no ProGen AP é eficiente e pode ser usada em vários outros organismos de interesse do pesquisador.

Title in English

ProGen AP: a Pipeline for Proteogenomic Annotation of Mycobacterium tuberculosis Seeking the Discovery of Potential Genes for Biotechnological Intervention.

Keywords in English

Mycobacterium tuberculosis
Pipeline
Biomarkers
Database
Proteogenomics

Abstract in English

Proteogenomic annotation is an approach that combines proteomic analysis with genomic annotation. Its aim is to provide a more detailed annotation of each gene, which is not always possible when working with genes alone, since gene products with important predicted functions only play a role in the organism's physiology once they are expressed and translated. With the recent advances in proteogenomic studies, data generation has been growing exponentially, and with this growth comes an increasing need for systems able to process, store, and manage the new information produced. This study describes the development of ProGen AP, a system consisting of an HTML/PHP5 web interface, a database managed by MySQL, and two proteomic data processing modules: LabKey (with the X!Tandem core) and QuickMod. All modules are open source and communicate with each other through Perl scripts. In this system, the researcher provides data from proteomic experiments; the system then processes them and returns information about the expressed gene, the location of each peptide within the gene it belongs to, and quantitative information about the identified peptides and proteins. In addition, the use of a structured, automated pipeline reduces the chance of data input/output errors in the intermediate processing modules. Here, ProGen AP was applied to the proteogenomic annotation of Mycobacterium tuberculosis (MTb). According to the literature, the genome of MTb strain H37Rv contains only 4,062 predicted open reading frames (ORFs), and the functional complement of this genome, the proteome, is not yet fully elucidated. The analysis of the MTb proteome with ProGen AP yielded a total of 154,982 peptide identifications, representing 147,334 unique peptides. So far, 2,369 proteins have been identified, covering nearly 58% of the whole MTb genome. Importantly, most of the proteins identified so far are annotated as hypothetical proteins in the MTb genome, so the results of this project confirm and validate the existence of these gene products. In addition, 567 peptides were identified as N-terminal and 1,229 as C-terminal, which indicates that the start and end of translation were correctly predicted for those genes. These positive results confirm that the approach used in ProGen AP is efficient and can be applied to studies of other organisms.
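
The abstract describes locating each identified peptide within its gene product and flagging N-terminal and C-terminal peptides. The Python sketch below illustrates one simple way such a step could work; it is not the ProGen AP implementation (which, per the abstract, relies on LabKey/X!Tandem, QuickMod, and Perl scripts). The names `PeptideHit`, `locate_peptide`, and `orf_coverage` are hypothetical, matching is exact-string only, and treating a peptide that starts at residue 2 as N-terminal (initiator methionine removed) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeptideHit:
    """One occurrence of a peptide inside a protein (hypothetical record)."""
    protein_id: str
    start: int      # 1-based position of the peptide's first residue
    end: int        # 1-based position of the peptide's last residue
    terminus: str   # "N-terminal", "C-terminal", or "internal"


def locate_peptide(protein_id: str, protein_seq: str, peptide: str) -> List[PeptideHit]:
    """Return every exact occurrence of `peptide` in `protein_seq`."""
    hits = []
    pos = protein_seq.find(peptide)
    while pos != -1:
        start, end = pos + 1, pos + len(peptide)
        # Assumption: a peptide starting at residue 2 is still N-terminal
        # when residue 1 is the initiator methionine (which may be cleaved).
        if start == 1 or (start == 2 and protein_seq[0] == "M"):
            terminus = "N-terminal"
        elif end == len(protein_seq):
            terminus = "C-terminal"
        else:
            terminus = "internal"
        hits.append(PeptideHit(protein_id, start, end, terminus))
        pos = protein_seq.find(peptide, pos + 1)
    return hits


def orf_coverage(identified_proteins: int, predicted_orfs: int) -> float:
    """Percentage of predicted ORFs with at least one identified protein."""
    return 100.0 * identified_proteins / predicted_orfs


if __name__ == "__main__":
    toy_protein = "MKTAYIAKQR"  # made-up sequence, not an MTb protein
    for pep in ("MKTAY", "TAYI", "AKQR"):
        print(pep, locate_peptide("toy1", toy_protein, pep))
    # Counts taken from the abstract: 2,369 proteins, 4,062 predicted ORFs.
    print(round(orf_coverage(2369, 4062), 1))
```

If the reported coverage is read as identified proteins over predicted ORFs, 2,369 / 4,062 gives roughly 58.3%, which is consistent with the "nearly 58%" stated in the abstract.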

WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is for private use only, in research and teaching activities. Reproduction for commercial use is forbidden. These rights cover all data about this document as well as its contents. Any use or copy of this document, in whole or in part, must include the author's name.

MestradoBeatrizJeronimoPinto.pdf (1.11 Mbytes)

Publishing Date

2013-08-19

Derived works

WARNING: The material described below relates to works resulting from this thesis or dissertation. The contents of these works are the author's responsibility.

PINTO, B. J., et al. ProGen AP - A Pipeline for the Automatic Proteogenomic Annotation to Aid in the Discovery of New Potential Genes for Biotechnological Intervention Applied to Health. In 7th International Conference of the Brazilian Association for Bioinformatics and Computational Biology - X-Meeting, Florianópolis, SC, 2011. 2011 Abstract book., 2011. Abstract. Available from: http://www.ab3c.org/content/x-meeting-2011.
PINTO, B. J., et al. Machine Learning's Techniques Applied in the Study of the Profiles of Particular and Common Gene Expression of Autoimmune Diseases and Cancer. In ISMB 2008, Toronto, 2008. ISMB 2008 Proceedings., 2008. Abstract. Available from: http://www.iscb.org/cms_addon/conferences/ismb2008/index.php.
PINTO, B. J., et al. A Study of SNPs in Sirtuins Structures. In X-meeting 2006: ISMB 2006 and 2nd Annual AB3C Conference, Fortaleza, CE, 2006. Papers e Posters., 2006. Abstract. Available from: http://ismb2006.cbi.cnptia.embrapa.br/.
VINCI, A., et al. Comparative Analysis of Human SIRT Proteins Tertiary Structures. In X-meeting 2006: ISMB 2006 and 2nd Annual AB3C Conference, Fortaleza, CE, 2006. Papers e Posters., 2006. Abstract. Available from: http://ismb2006.cbi.cnptia.embrapa.br/.