Team III Functional Annotation Group

From Compgenomics 2020
Revision as of 19:53, 9 April 2020

Group Members - Allison, Bengu, Cheng, Pallavi Misra

Introduction

Initial Pipeline

Homology Tools

CARD-RGI

The Comprehensive Antibiotic Resistance Database (CARD) is a rigorously curated collection of characterized, peer-reviewed antibiotic resistance genes that is updated monthly. The Resistance Gene Identifier (RGI) is a toolkit based on CARD for annotating antimicrobial resistance genes.

VFDB

The Virulence Factor Database (VFDB) is an integrated and comprehensive online resource for curating information about virulence factors of bacterial pathogens (most recently updated in 2019). The database covers the structural features of virulence factors, as well as the functions and mechanisms that pathogens use to circumvent host defenses and cause disease. The core dataset of DNA sequences, which includes only genes associated with experimentally verified virulence factors, was downloaded from the VFDB website. A BLAST database was built from this dataset and queried with BLASTN.

Ab-initio Tools

PILERCR

CRISPRs are a family of DNA sequences found in the genomes of prokaryotic organisms (bacteria and archaea). They are derived from DNA fragments of viruses that previously infected the prokaryote; they provide protection against those viruses and play a major role in the antiviral defense system. PILERCR identifies CRISPR repeats by using BLAST to find their fragmented or degraded copies. A CRISPR array is reported when a set of CRISPR repeats is separated by intervening unique sequences known as spacers. The program provides fast identification and classification of CRISPR repeats, with both high sensitivity and high specificity.
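The repeat-and-spacer structure described above can be illustrated with a small Python sketch. This is a toy example and not PILERCR's algorithm: it assumes the repeat sequence is already known and matches it exactly, whereas real tools discover the repeats themselves and tolerate degraded copies. The function names, the `min_repeats` threshold, and the example sequences are all made up for illustration.

```python
from typing import List


def extract_spacers(sequence: str, repeat: str, min_repeats: int = 3) -> List[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    An empty list is returned when fewer than `min_repeats` copies are found.
    """
    positions = []
    start = sequence.find(repeat)
    while start != -1:
        positions.append(start)
        # Resume the search after this copy so that copies never overlap.
        start = sequence.find(repeat, start + len(repeat))
    if len(positions) < min_repeats:
        return []
    return [
        sequence[left + len(repeat):right]
        for left, right in zip(positions, positions[1:])
    ]


def looks_like_crispr_array(sequence: str, repeat: str, min_repeats: int = 3) -> bool:
    """Toy criterion: enough repeats, separated by non-empty, unique spacers."""
    spacers = extract_spacers(sequence, repeat, min_repeats)
    return bool(spacers) and all(spacers) and len(set(spacers)) == len(spacers)


# Hypothetical sequences, chosen only to show the structure.
repeat = "GTTTTAGAGC"
spacer_1 = "AAACCCGGGT"
spacer_2 = "TTGACCATGA"
array = "ACGTACGT" + repeat + spacer_1 + repeat + spacer_2 + repeat + "ACGTACGT"

print(extract_spacers(array, repeat))
print(looks_like_crispr_array(array, repeat))
```

A plain tandem repeat with nothing between the copies fails the check, because its "spacers" are empty; that distinction between repeats with and without unique intervening sequences is what the array criterion captures.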

Final Pipeline

Results

CARD-RGI

$ rgi -i <input_file> -o <output_file>
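Once RGI has finished, its tab-delimited report can be summarized with a few lines of Python. The sketch below is a hedged illustration and not part of the group's pipeline: the column names `Cut_Off` and `Best_Hit_ARO` are assumptions about the report header and should be checked against the file actually produced, and the example rows use made-up placeholder values.

```python
import csv
import io
from collections import Counter


def summarize_rgi(handle, cutoff_col="Cut_Off", gene_col="Best_Hit_ARO"):
    """Count hits per cut-off category and collect the gene names in each.

    `cutoff_col` and `gene_col` are assumed column names; pass different
    values if the report header differs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    genes = {}
    for row in reader:
        category = row[cutoff_col]
        counts[category] += 1
        genes.setdefault(category, set()).add(row[gene_col])
    return counts, genes


# Made-up placeholder rows, only to show the expected layout.
example_report = io.StringIO(
    "ORF_ID\tCut_Off\tBest_Hit_ARO\n"
    "orf_1\tPerfect\tgeneA\n"
    "orf_2\tStrict\tgeneB\n"
    "orf_3\tStrict\tgeneC\n"
)
counts, genes = summarize_rgi(example_report)
print(dict(counts))
print({category: sorted(names) for category, names in genes.items()})
```

For a real report, open the output file and pass the file handle instead of the in-memory example.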

VFDB

$ makeblastdb -in <input_db> -parse_seqids -blastdb_version 5 -dbtype nucl -out <name_db>
$ blastn -db <name_db> -query <input_file> -out <output_file> -max_hsps 1 -max_target_seqs 1 -num_threads 4 -evalue 1e-5
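The BLASTN hits can then be reduced to one best virulence-factor match per query. The following Python sketch assumes that `-outfmt 6` is added to the blastn command above so that the output is the standard 12-column tabular format; the command as written does not set an output format, so this is an assumption. The function name and the example rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Keep the highest-bitscore hit per query from BLAST tabular output.

    Expects the default 12 columns of `-outfmt 6`:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Returns {query: (subject, percent_identity, bitscore)}.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue > max_evalue:
            continue  # same threshold as the -evalue flag, re-applied defensively
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best


# Made-up rows for illustration only.
example_rows = [
    "gene_1\tVF_A\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t520",
    "gene_1\tVF_B\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-20\t110",
    "gene_2\tVF_C\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t30",
]
print(best_hits(example_rows))
```

With `-max_hsps 1 -max_target_seqs 1` the output should already contain at most one line per query, so the per-query selection here mainly acts as a safeguard if those flags are relaxed.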

PILERCR

$ ./pilercr -in <input_file> -out <output_file>

References

Barrangou, Rodolphe. "The roles of CRISPR-Cas systems in adaptive immunity and beyond." Current opinion in immunology 32 (2015): 36-41. doi:10.1016/j.coi.2014.12.008

Edgar, Robert C. "PILER-CR: fast and accurate identification of CRISPR repeats." BMC bioinformatics 8.1 (2007): 18.

Alcock, Brian P., et al. "CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database." Nucleic acids research 48.D1 (2020): D517-D525.

Liu, Bo, et al. "VFDB 2019: a comparative pathogenomic platform with an interactive web interface." Nucleic acids research 47.D1 (2019): D687-D692.