Team I Webserver Group

From Compgenomics 2020

Members: Devishi Kesar, Shuheng Gan, Winnie Zheng, Priya Narayanan, Aaron Pfennig

Introduction

Background

Objective

  • Provide a comprehensive, automated platform to analyze E. coli isolates in order to predict virulence factors and outbreak clusters
  • Functionalities of the webserver:
    • Identify virulence factors and antimicrobial resistance, and support outbreak response, for provided isolates
    • Allow data upload at each step of the outlined pipeline
    • Visualize findings in a comprehensible way
  • Design
    • Intuitive usage
    • Provide only essential options

WebServer

  • Structure
  • Access to Webserver

Here is the link to access our webserver:

Functionalities

Genome Assembly

  • Performs de novo assembly with FastQ files as input
  • Runs the following tools by default:
    • fastp: read pre-processing
    • Unicycler: genome assembly
  • Options:
    • Perform read pre-processing
    • k-mer size
    • SPAdes as alternative assembly method
  • The input FastQ files must be paired-end reads
  • Output: FASTA file
  • Visualisation: QUAST output
  • For more details, visit: Team1_Genome_Assembly
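
The defaults and options above can be sketched as a small command builder. This is a minimal illustration, not the webserver's actual code: the function name, the output layout and the decision to apply the k-mer option only to SPAdes are assumptions made for the example. Only basic, documented flags of fastp, Unicycler and SPAdes are used.

```python
def assembly_commands(reads_1, reads_2, out_dir, preprocess=True,
                      assembler="unicycler", kmer_sizes=None):
    """Return the command lines (as argument lists) for one paired-end isolate.

    Hypothetical helper: names and output paths are illustrative only.
    """
    commands = []
    if preprocess:
        # fastp: -i/-I are the paired inputs, -o/-O the paired outputs.
        trimmed_1 = f"{out_dir}/trimmed_1.fastq.gz"
        trimmed_2 = f"{out_dir}/trimmed_2.fastq.gz"
        commands.append(["fastp", "-i", reads_1, "-I", reads_2,
                         "-o", trimmed_1, "-O", trimmed_2])
        reads_1, reads_2 = trimmed_1, trimmed_2

    assembly_dir = f"{out_dir}/assembly"
    if assembler == "unicycler":
        commands.append(["unicycler", "-1", reads_1, "-2", reads_2,
                         "-o", assembly_dir])
    elif assembler == "spades":
        cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", assembly_dir]
        if kmer_sizes:
            # Assumption: the k-mer option is passed to SPAdes as a
            # comma-separated list via -k.
            cmd += ["-k", ",".join(str(k) for k in kmer_sizes)]
        commands.append(cmd)
    else:
        raise ValueError(f"unknown assembler: {assembler}")
    return commands


if __name__ == "__main__":
    for command in assembly_commands("iso_1.fastq.gz", "iso_2.fastq.gz", "out"):
        print(" ".join(command))
```

Returning argument lists rather than shell strings keeps the sketch easy to test and lets the caller pass each list directly to `subprocess.run`.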

Gene Prediction

  • Gene finding in assembled isolates or a provided FASTA file
  • Takes FastQ files as input
  • Runs the following tools by default:
    • CDS: Prodigal
    • tRNA: Aragorn
    • rRNA: barrnap
  • Options:
    • GeneMarkS-2 as alternative tool for CDS predictions
    • tRNAscan-SE as alternative tool for tRNA predictions
    • RNAmmer as alternative tool for rRNA predictions
  • Outputs: *.gff, *_cds.fna, *_protein.faa and *_rna.fna files
  • For more details, visit: Team1_Gene_Prediction
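
As a rough illustration of the default CDS step and its outputs, the sketch below builds a Prodigal call that writes the *.gff, *_cds.fna and *_protein.faa files, and tallies feature types in a GFF file. The helper names and the file prefix are hypothetical. The Aragorn and barrnap steps are omitted, and this is not the webserver's own implementation.

```python
from collections import Counter


def prodigal_command(genome_fasta, prefix):
    """Build a Prodigal call producing GFF, nucleotide CDS and protein files."""
    return ["prodigal",
            "-i", genome_fasta,           # assembled genome (FASTA)
            "-f", "gff",                  # output format for the -o file
            "-o", f"{prefix}.gff",
            "-d", f"{prefix}_cds.fna",    # nucleotide sequences of the CDS
            "-a", f"{prefix}_protein.faa"]  # protein translations


def count_features(gff_text):
    """Count GFF records by feature type (third tab-separated column)."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break                         # an embedded sequence section follows
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        columns = line.split("\t")
        if len(columns) >= 3:
            counts[columns[2]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(" ".join(prodigal_command("isolate.fasta", "isolate")))
```

A tally like this is a quick sanity check that the merged annotation contains CDS, tRNA and rRNA records before it is passed on.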


Functional Annotation

  • Obtain functional information about predicted genes
  • Input: FASTA file
  • Clustering tool: USEARCH
    • Output: centroid.fasta
  • Homology tools:
    • General annotation: InterProScan, eggNOG-mapper
    • Antibiotic resistance genes: DeepARG
  • Ab initio tools:
    • Signal peptides: SignalP 5.0
    • Transmembrane proteins: TMHMM
    • CRISPR sites: PILER-CR
  • Output: *.tsv file
  • For more details, visit: Team1_Functional_Annotation
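
The page does not describe how the per-tool results end up in one *.tsv file. The sketch below shows one possible way to merge them, assuming each tool's hits have already been parsed into a mapping from protein ID to annotation. The function name, the column layout and the "-" placeholder are all assumptions made for the example.

```python
import csv
import io


def merge_annotations(per_tool):
    """Merge {tool: {protein_id: annotation}} into tab-separated text.

    One row per protein, one column per tool; "-" marks a missing hit.
    """
    tools = sorted(per_tool)
    protein_ids = sorted({pid for hits in per_tool.values() for pid in hits})

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["protein_id"] + tools)
    for pid in protein_ids:
        writer.writerow([pid] + [per_tool[tool].get(pid, "-") for tool in tools])
    return buffer.getvalue()


if __name__ == "__main__":
    example = {
        "SignalP": {"p1": "SP(Sec/SPI)"},
        "DeepARG": {"p2": "beta-lactam"},
    }
    print(merge_annotations(example), end="")
```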


Comparative Genomics

  • Comparison of genomic features of input files to identify outbreak clusters
  • Input: FASTA file, Prodigal training file (for chewBBACA)
  • Tools used:
    • MUMmer 4.0
    • chewBBACA
    • kSNP 3.0
    • FigTree
  • Options:
    • Parsimony, maximum-likelihood and neighbour-joining trees as options for kSNP
    • k-mer size option for kSNP
  • Output: .tsv file (for chewBBACA, MUMmer), .png (kSNP)
  • Visualisation: Phylogenetic tree for identified SNPs, phylogenetic tree for MLST, graph for epidemiological data visualisation
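
The kSNP options listed above could map onto a command line roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and the input-list layout (path, tab, genome name) and the `-in`, `-outdir`, `-k`, `-ML` and `-NJ` flags follow our reading of the kSNP3 user guide, so they should be checked against the installed version. The parsimony tree is assumed to be produced by default.

```python
def ksnp_input_list(genomes):
    """Render {isolate_name: fasta_path} as kSNP input-list text.

    Assumed layout: one genome per line, "<path>\t<name>".
    """
    return "".join(f"{path}\t{name}\n" for name, path in sorted(genomes.items()))


def ksnp_command(in_list, out_dir, kmer_size, tree_methods=("parsimony",)):
    """Build a kSNP3 call for the chosen k-mer size and extra tree types."""
    cmd = ["kSNP3", "-in", in_list, "-outdir", out_dir, "-k", str(kmer_size)]
    if "ml" in tree_methods:
        cmd.append("-ML")   # assumption: requests a maximum-likelihood tree
    if "nj" in tree_methods:
        cmd.append("-NJ")   # assumption: requests a neighbour-joining tree
    return cmd


if __name__ == "__main__":
    print(ksnp_input_list({"iso1": "/data/iso1.fasta", "iso2": "/data/iso2.fasta"}), end="")
    print(" ".join(ksnp_command("in_list.txt", "ksnp_out", 19, ("parsimony", "ml"))))
```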

Method

Webserver Demo

  • Choice One: Running General Pipeline
  • Choice Two: Running each step separately

Results

Reference