Team I Webserver Group

Members: Devishi Kesar, Shuheng Gan, Winnie Zheng, Priya Narayanan, Aaron Pfennig

Introduction

Our final presentation for the webserver: File:Web-server presentation.pdf

Background

The primary goal of our team is to develop a pipeline that analyzes unassembled Escherichia coli (E. coli) sequencing reads from 50 raw datasets in order to predict pathogenicity and identify the most closely related strains. To reach this goal, we use computational genomics tools covering genome assembly, gene prediction, functional annotation and comparative genomics. We therefore developed a web server that not only accomplishes this goal for our specific read sets but also handles more general sequence data faster and more conveniently, by integrating these tools into a single pipeline instead of running them separately. In short, the web server analyzes sequence data to predict pathogenicity and visualize the most closely related strains in a more convenient, faster and accurate way.

Objective

Provide a comprehensive, automated platform to analyze E. coli isolates in order to predict virulence factors and outbreak clusters
Functionalities of the webserver:
- Identify virulence factors and antimicrobial resistance, and support outbreak response, for the provided isolates
- Allow data upload at each step of the outlined pipeline
- Visualize findings in a comprehensible way
Design principles:
- Intuitive usage
- Provide only essential options

WebServer

Structure
- To build a functional web server, we constructed the front end and the back end separately.
  - Front end: everything the user sees and interacts with in the web browser.
  - Back end: the server-side code that determines how the site works, updates and changes.

Access to Webserver

Link to our webserver: echo-Team1_webserver

Functionalities

Genome Assembly

Performs de novo assembly with FastQ files as input
Runs the following tools by default:
- fastp: read preprocessing
- Unicycler: genome assembly
Options:
- Perform read preprocessing
- k-mer size
- SPAdes as an alternative assembly method
The input FastQ files must be paired-end reads
Output: FASTA file
Visualisation: QUAST report
For more details, visit: Team1_Genome_Assembly
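The default path and the options listed above can be sketched as a small command builder. This is a minimal sketch, not the team's actual back-end code: the output file names, the FastQ suffix check and the function name `assembly_commands` are hypothetical, and the k-mer option is only passed to SPAdes here (via its `-k` flag), since the page does not say how it is forwarded to Unicycler.

```python
from pathlib import Path

# Assumed accepted suffixes for the paired-end read files.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def assembly_commands(r1, r2, outdir, preprocess=True, assembler="unicycler", kmer=None):
    """Build (but do not run) the command lines for one paired-end sample."""
    r1, r2, outdir = Path(r1), Path(r2), Path(outdir)
    if not (r1.name.endswith(FASTQ_SUFFIXES) and r2.name.endswith(FASTQ_SUFFIXES)):
        raise ValueError("two paired-end FastQ files are required")
    commands = []
    if preprocess:
        # fastp: -i/-I are the input read pair, -o/-O the trimmed outputs.
        t1, t2 = outdir / "trimmed_R1.fastq.gz", outdir / "trimmed_R2.fastq.gz"
        commands.append(["fastp", "-i", str(r1), "-I", str(r2), "-o", str(t1), "-O", str(t2)])
        r1, r2 = t1, t2
    if assembler == "unicycler":
        # Default: Unicycler on the (optionally trimmed) short-read pair.
        commands.append(["unicycler", "-1", str(r1), "-2", str(r2), "-o", str(outdir / "unicycler")])
    elif assembler == "spades":
        cmd = ["spades.py", "-1", str(r1), "-2", str(r2), "-o", str(outdir / "spades")]
        if kmer is not None:
            # SPAdes only accepts odd k-mer sizes.
            if kmer % 2 == 0:
                raise ValueError("k-mer size must be odd for SPAdes")
            cmd += ["-k", str(kmer)]
        commands.append(cmd)
    else:
        raise ValueError(f"unknown assembler: {assembler}")
    return commands
```

Each returned list can be executed in order with `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the option handling easy to test without the tools installed.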

Gene Prediction

Gene finding in assembled isolates or in a provided FASTA file
Takes FASTA files (assembled contigs) as input
Runs the following tools by default:
- CDS: Prodigal
- tRNA: Aragorn
- rRNA: barrnap
Options:
- GeneMarkS-2 as an alternative tool for CDS prediction
- tRNAscan-SE as an alternative tool for tRNA prediction
- RNAmmer as an alternative tool for rRNA prediction
Output: *.gff, *_cds.fna, *_protein.faa and *_rna.fna files
For more details, visit: Team1_Gene_Prediction
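As an illustration of how the default CDS step could yield the *.gff, *_cds.fna and *_protein.faa files, here is a hedged sketch using Prodigal's standard flags (`-i` input, `-f gff` output format, `-o` output, `-d` gene nucleotide sequences, `-a` protein translations). The prefix-based naming and the helper names are assumptions; the second function simply tallies feature types in a GFF3 file, for example after Prodigal and barrnap results have been combined.

```python
from collections import Counter
from pathlib import Path


def prodigal_command(contigs, prefix):
    """Prodigal call producing <prefix>.gff, <prefix>_cds.fna and <prefix>_protein.faa."""
    prefix = Path(prefix)
    return [
        "prodigal",
        "-i", str(contigs),
        "-f", "gff",
        "-o", f"{prefix}.gff",
        "-d", f"{prefix}_cds.fna",
        "-a", f"{prefix}_protein.faa",
    ]


def count_feature_types(gff_lines):
    """Count feature types (GFF3 column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts
```

A per-isolate summary such as "n CDS, m tRNA, k rRNA" is a quick sanity check before passing the protein FASTA on to functional annotation.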

Functional Annotation

Obtains functional information about the predicted genes
Input: FASTA file
Clustering tool: USEARCH
- Output: centroid.fasta
Homology-based tools:
- General annotation: InterProScan, eggNOG-mapper
- Antibiotic resistance genes: DeepARG
Ab initio tools:
- Signal peptides: SignalP 5.0
- Transmembrane proteins: TMHMM
- CRISPR sites: PilerCR
Output: *.tsv file
For more details, visit: Team1_Functional_Annotation
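Since every tool above reports on the same clustered proteins, the per-tool results can be joined into one *.tsv row per protein. The sketch below assumes each tool's output has already been reduced to a mapping from protein ID (for example a centroid ID from centroid.fasta) to a short annotation string; that intermediate format and the function name are hypothetical, not the tools' native output.

```python
import csv
import io


def merge_annotations(per_tool, protein_ids):
    """Combine per-tool annotations into one TSV with one row per protein.

    per_tool maps a tool name to {protein_id: annotation}; missing hits become '-'.
    """
    tools = sorted(per_tool)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["protein_id", *tools])
    for pid in protein_ids:
        writer.writerow([pid, *(per_tool[tool].get(pid, "-") for tool in tools)])
    return buf.getvalue()
```

Writing through `csv.writer` rather than joining strings by hand means annotation text containing tabs or quotes is escaped consistently.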

Comparative Genomics

Compares genomic features of the input files to identify outbreak clusters
Input: FASTA files, Prodigal training file (for chewBBACA)
Tools used:
- MUMmer 4.0
- chewBBACA
- kSNP 3.0
- FigTree
Options:
- Parsimony, maximum-likelihood and neighbour-joining trees as options for kSNP
- k-mer size option for kSNP
Output: .tsv files (chewBBACA, MUMmer), .pdf (kSNP)
Visualisation: phylogenetic tree of the identified SNPs, phylogenetic tree from MLST, graph for visualising epidemiological data
For more details, visit: Team1_Comparative_Genomics
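The kSNP tree options above can be mapped onto a command line roughly as follows. This sketch reflects our reading of the kSNP3 user guide: the input list holds one genome per line (full FASTA path, a tab, the genome ID), a parsimony tree is always produced, and `-ML` / `-NJ` add the other two tree types. Please verify the flags against the installed kSNP3 version; the helper names and the ID-from-filename rule are assumptions.

```python
from pathlib import Path


def ksnp_infile(fasta_paths):
    """One line per genome: absolute FASTA path, a tab, then the genome ID (file stem)."""
    lines = []
    for path in fasta_paths:
        path = Path(path).resolve()
        lines.append(f"{path}\t{path.stem}")
    return "\n".join(lines) + "\n"


def ksnp_command(infile, outdir, k, ml=False, nj=False):
    """Parsimony tree by default; optionally add maximum-likelihood and neighbour-joining."""
    cmd = ["kSNP3", "-in", str(infile), "-outdir", str(outdir), "-k", str(k)]
    if ml:
        cmd.append("-ML")
    if nj:
        cmd.append("-NJ")
    return cmd
```

The value passed as `k` would normally come from Kchooser, as described in the Results section.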

Method

WebServer Building
- We built an Apache web server by installing Apache and mod_wsgi.
  - Apache HTTP Server: Apache is free, open-source, cross-platform web server software that can be extended with many types of modules.
  - mod_wsgi: Among these modules, we chose mod_wsgi, which provides a WSGI-compliant interface for hosting Python-based web applications (Python 2 and 3) under Apache.
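To show what "hosting a Python application under mod_wsgi" means in practice, here is a minimal WSGI entry point. It is a generic sketch, not the team's application: mod_wsgi looks for a callable named `application` by default, which receives the request environment and a `start_response` function and returns an iterable of byte strings. The page content and routing are placeholders.

```python
def application(environ, start_response):
    """WSGI callable; mod_wsgi looks for the name `application` by default."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        status = "200 OK"
        content_type = "text/html; charset=utf-8"
        body = b"<html><body><h1>E. coli isolate analysis</h1></body></html>"
    else:
        status = "404 Not Found"
        content_type = "text/plain; charset=utf-8"
        body = b"Not found"
    start_response(status, [("Content-Type", content_type), ("Content-Length", str(len(body)))])
    # WSGI requires an iterable of byte strings as the response body.
    return [body]
```

Saved as a script file, it would be mapped to a URL in the Apache configuration with a `WSGIScriptAlias` directive pointing at that file (the file path depends on the installation).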

Data visualization
- We render the final results as web pages using basic HTML and CSS.
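A hedged sketch of how result rows could be turned into plain HTML that a basic CSS stylesheet then formats. The `results` class name and the row layout are assumptions; the key detail is escaping every cell with `html.escape`, since isolate names and annotations come from user-uploaded files.

```python
from html import escape


def results_table(rows, columns):
    """Render result rows (dicts) as an HTML table, escaping every header and cell."""
    head = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row.get(c, '')))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return f'<table class="results"><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>'
```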

Webserver Demo

Choice One: Running General Pipeline

Users can run their raw genome sequence data through all four stages of analysis and interpretation with a single click.

1. Click Analyze and choose General Pipeline.
2. Upload a compressed folder and, optionally, a metadata file. Check the file types: the metadata file is optional and cannot replace the compressed folder.
3. Enter the email address at which you want to receive the final result images and datasets.
4. Click "RUN"; you will be notified by email.
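The rules in the steps above (a compressed folder is required, metadata is optional, an email address is needed for the notification) could be enforced on the back end with a check like the following. This is a hypothetical sketch: the accepted archive and metadata suffixes, the loose email pattern and the function name are assumptions, not the server's documented behaviour.

```python
import re

ARCHIVE_SUFFIXES = (".zip", ".tar.gz", ".tgz")          # assumed accepted archive types
METADATA_SUFFIXES = (".csv", ".tsv", ".txt")            # assumed accepted metadata types
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")    # deliberately loose format check


def validate_submission(archive_name, email, metadata_name=None):
    """Return a list of problems; an empty list means the job can be queued."""
    errors = []
    if not archive_name:
        errors.append("a compressed folder with the reads is required")
    elif not archive_name.lower().endswith(ARCHIVE_SUFFIXES):
        errors.append(f"unsupported archive type: {archive_name}")
    # Metadata is optional, but if present it must have an accepted type.
    if metadata_name and not metadata_name.lower().endswith(METADATA_SUFFIXES):
        errors.append(f"unsupported metadata type: {metadata_name}")
    if not EMAIL_RE.match(email or ""):
        errors.append("a valid email address is required for the result notification")
    return errors
```

Returning all problems at once lets the form show every issue in one round trip instead of failing on the first.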

Choice Two: Running each step separately

Users can also run the four stages of analysis and interpretation separately, which lets them choose their preferred tools or obtain the results of each stage individually.

1. Click Analyze and choose the tools you want to use.
2. Upload a compressed folder and, optionally, a metadata file. Check the file types: the metadata file is optional and cannot replace the compressed folder.
3. Set each option according to your requirements.
4. Enter the email address at which you want to receive the result images and datasets for each tool.
5. Click "RUN"; you will be notified by email.
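When steps are run separately, each step needs a resolved tool choice. The sketch below fills in the defaults and alternatives listed in the Functionalities section and rejects anything else; the step keys and the function name are hypothetical.

```python
# Default tool per step first, followed by the alternatives listed on this page.
TOOL_CHOICES = {
    "assembly": ["Unicycler", "SPAdes"],
    "cds": ["Prodigal", "GeneMarkS-2"],
    "trna": ["Aragorn", "tRNAscan-SE"],
    "rrna": ["barrnap", "RNAmmer"],
}


def resolve_tools(selection):
    """Fill in defaults for steps the user left empty and reject unknown steps or tools."""
    unknown = set(selection) - set(TOOL_CHOICES)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    resolved = {}
    for step, options in TOOL_CHOICES.items():
        choice = selection.get(step) or options[0]
        if choice not in options:
            raise ValueError(f"{choice!r} is not available for step {step!r}")
        resolved[step] = choice
    return resolved
```

Keeping the choices in one table means the web form and the validation logic cannot drift apart.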

Results

We analyzed 50 E. coli isolates, together with epidemiological data, from a foodborne outbreak. The data were analyzed with this web server, and the final results are presented below. We performed de novo genome assembly, gene prediction, functional annotation and comparative genomics, paying special attention to virulence factors, possible food sources and the outbreak location.

First, we chose one of our isolates as the reference and determined the average nucleotide identity (ANI) of all other isolates relative to it. Three isolates have a relatively low ANI of approximately 84%. Three other isolates have an ANI of 97 to 98%, indicating that some regions of their genomes differ from the reference. All other 44 isolates are closely related to the reference genome, with an ANI of approximately 99%. ANI was determined with MUMmer 4.0, which has limited resolution and does not reveal finer differences between highly similar genomes. The plot below shows the ANI of each isolate relative to the reference genome:
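The page does not state exactly how ANI was derived from the MUMmer output. One common approximation is the length-weighted mean identity of the alignments, sketched below under the assumption that the alignments were exported with `show-coords -T -H` and keep its default column order (S1, E1, S2, E2, LEN1, LEN2, %IDY, ...). Note that such an estimate only covers aligned regions.

```python
def parse_show_coords(lines):
    """Read `show-coords -T -H` rows into (reference alignment length, % identity) pairs.

    Assumes the default column order S1, E1, S2, E2, LEN1, LEN2, %IDY, ...
    """
    pairs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7:
            pairs.append((int(fields[4]), float(fields[6])))
    return pairs


def weighted_ani(pairs):
    """Length-weighted mean percent identity over the aligned regions."""
    total = sum(length for length, _ in pairs)
    if total == 0:
        raise ValueError("no aligned bases")
    return sum(length * identity for length, identity in pairs) / total
```

For two alignments of 1,000 bp at 99% and 3,000 bp at 98%, this gives (1000 × 99 + 3000 × 98) / 4000 = 98.25%.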

Next, we performed MLST analysis with chewBBACA, creating a schema and calling alleles on the assembled genomes of the 50 isolates. This identified the isolates forming the outbreak cluster. Combined with the epidemiological data, these preliminary results allowed us to narrow down the outbreak locations. The results were visualized with GrapeTree and are shown below:
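Gene-by-gene clustering of this kind rests on counting allele differences between isolates. The sketch below is a generic illustration, not chewBBACA's or GrapeTree's implementation: it assumes allele profiles have already been loaded as lists of allele numbers with `None` for missing loci, ignores loci missing in either isolate, and groups isolates by single linkage at a hypothetical allele-difference threshold.

```python
from itertools import combinations


def allele_distance(a, b):
    """Loci with different alleles, ignoring loci missing (None) in either profile."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def single_linkage_clusters(profiles, threshold):
    """Group isolates that are linked, directly or transitively, by distance <= threshold."""
    parent = {name: name for name in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

GrapeTree draws a minimum spanning tree over the same kind of pairwise distances; single linkage at a threshold corresponds to cutting that tree's long edges.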

In our strain analysis we incorporated the functional annotation results. These supported the MLST results and pointed to three food sources: melons, chorizo and bananas, all of which are served at certain brunch places. We also asked whether these strains, given their clear genetic relatedness, can be treated in a similar fashion. We therefore analyzed them with DeepARG; fortunately, their profiles are identical on this basis and indicate that they are susceptible to phenicols and sulfonamides. The insights gained from the MLST analysis, strain analysis and epidemiological data are depicted below:
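Checking whether related strains share a resistance profile amounts to grouping isolates by their set of predicted resistance classes. The sketch assumes those sets have already been extracted from the DeepARG output for each isolate; the extraction step and the function name are not from this page.

```python
from collections import defaultdict


def group_by_resistance(arg_classes):
    """Group isolates whose predicted antibiotic-resistance classes are identical.

    arg_classes maps isolate -> iterable of predicted classes (order-insensitive).
    """
    groups = defaultdict(list)
    for isolate, classes in arg_classes.items():
        groups[frozenset(classes)].append(isolate)
    return {tuple(sorted(key)): sorted(members) for key, members in groups.items()}
```

If all outbreak isolates fall into one group, they share the same predicted resistance profile.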

In addition, SNP analysis was performed with kSNP 3.0. The optimal k-mer size was determined with Kchooser, which also reports the fraction of core k-mers (FCK), i.e. the fraction of k-mers present in all genomes. FCK is a measure of sequence diversity and hence of relatedness: the lower the FCK, the more diverse and thus more distantly related the genomes. For our data set, FCK is 0.422. Studies have shown that when FCK ≥ 0.1, SNP detection efficiency is adequate and the accuracy of parsimony trees estimated by kSNP3 exceeds 97%, i.e. the trees can be considered reliable. The tree is shown below:
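The idea behind FCK can be illustrated with a simplified calculation: the number of distinct k-mers present in every genome divided by the number of distinct k-mers overall. This is only a sketch of the concept; it uses forward-strand k-mers and the union as denominator, and Kchooser's exact procedure may differ.

```python
def kmer_set(seq, k):
    """All distinct k-mers on the forward strand of one genome."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_core_kmers(genomes, k):
    """Share of all distinct k-mers that occur in every genome (simplified FCK)."""
    sets = [kmer_set(g, k) for g in genomes]
    union = set().union(*sets)
    core = set.intersection(*sets)
    return len(core) / len(union) if union else 0.0
```

For the toy genomes "ACGTAC" and "ACGTTT" with k = 3, two of six distinct 3-mers (ACG, CGT) are shared, so FCK = 1/3; by the criterion above, a value of at least 0.1 would indicate reliable trees.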

Reference

1. Maiden MC, Jansen van Rensburg MJ, Bray JE, et al. MLST revisited: the gene-by-gene approach to bacterial genomics. Nat Rev Microbiol. 2013;11(10):728-36.

2. Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018;14(1):e1005944. doi:10.1371/journal.pcbi.1005944

3. Perez-Losada M, Arenas M, Castro-Nallar E. Microbial sequence typing in the genomic era. Infect Genet Evol. 2018;63:346-359. doi:10.1016/j.meegid.2017.09.022

4. Strockbine N, Bopp C, Fields P, Kaper J, Nataro J. Escherichia, Shigella, and Salmonella. In: Jorgensen J, Pfaller M, Carroll K, Funke G, Landry M, Richter S, Warnock D, editors. Manual of Clinical Microbiology. 11th ed. Washington, DC: ASM Press; 2015. p. 685-713. doi:10.1128/9781555817381.ch37

5. Sultan I, Rahman S, Jan AT, Siddiqui MT, Mondal AH, Haq QMR. Antibiotics, Resistome and Resistance Mechanisms: A Bacterial Perspective. Front Microbiol. 2018;9:2066. doi:10.3389/fmicb.2018.02066

6. Trees E, Rota P, MacCannell D, Gerner-Smidt P. Molecular Epidemiology. In: Jorgensen J, Pfaller M, Carroll K, Funke G, Landry M, Richter S, Warnock D, editors. Manual of Clinical Microbiology. 11th ed. Washington, DC: ASM Press; 2015. p. 131-159. doi:10.1128/9781555817381.ch10

7. Silva M, Machado M, Silva D, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço J. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. Microb Genom. 2018;4(3). doi:10.1099/mgen.0.000166

8. Zhou Z, Alikhan NF, Sergeant MJ, Luhmann N, Vaz C, Francisco AP, Carriço JA, Achtman M. GrapeTree: Visualization of core genomic relationships among 100,000 bacterial pathogens. Genome Res. 2018. doi:10.1101/gr.232397.117
