Team I Webserver Group

From Compgenomics 2020

Members: Devishi Kesar, Shuheng Gan, Winnie Zheng, Priya Narayanan, Aaron Pfennig

Introduction

Background

The primary purpose of our team is to develop a pipeline that analyzes unassembled Escherichia coli (E. coli) sequence reads from 50 raw datasets in order to predict their pathogenicity and identify the closest related strain. To reach this goal, we use tools from four areas of computational genomics: Genome Assembly, Gene Prediction, Functional Annotation, and Comparative Genomics. We therefore developed a web server that not only accomplishes this goal for our specific sequence reads but also lets users analyze other sequences faster and more conveniently, because the tools are combined in one pipeline rather than run separately. In other words, the web server analyzes sequences to predict pathogenicity and visualize the closest related strain in a more convenient, faster, and more accurate way.

Objective

  • Provide a comprehensive, automated platform to analyze E. coli isolates in order to predict virulence factors and outbreak clusters
  • Functionalities of the webserver:
    • Identify virulence factors, antimicrobial resistance, and outbreak response for provided isolates
    • Allow data upload at each step of the outlined pipeline
    • Visualize findings in a comprehensible way
  • Design
    • Intuitive usage
    • Provide only essential options

WebServer

  • Structure
    • To build a functional web server, we need to construct the front end and the back end separately.
      • Front End: Everything the user sees (in the web browser).
      • Back End: How the site works, updates, and changes.

  • Access to Webserver

Link to access our webserver: [link needed here]

Functionalities

Genome Assembly

  • Performs de novo assembly with FastQ files as input
  • Runs the following tools by default:
    • fastp: Read pre-processing
    • Unicycler: Genome assembly
  • Options:
    • Perform read pre-processing
    • k-mer size
    • SPAdes as alternative assembly method
  • The input FastQ files must be paired-end reads
  • Outputs a FASTA file
  • Visualisation: QUAST output
  • For more details, visit: Team1_Genome_Assembly
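
The assembly step above can be sketched as a small Python function that only builds the command lines, so it can be inspected without the tools installed. This is a minimal sketch, not the webserver's actual code: the function name `assembly_commands`, the file names such as `trimmed_1.fastq.gz`, and the directory layout are assumptions made for illustration. The flags shown are the commonly documented ones for fastp, Unicycler, SPAdes, and QUAST and should be checked against the installed versions.

```python
def assembly_commands(reads_1, reads_2, out_dir, preprocess=True,
                      kmers=None, assembler="unicycler"):
    """Build the argv lists for one paired-end isolate (hypothetical layout)."""
    commands = []
    if preprocess:
        # fastp writes trimmed copies of both paired-end read files.
        trimmed_1 = f"{out_dir}/trimmed_1.fastq.gz"  # assumed file name
        trimmed_2 = f"{out_dir}/trimmed_2.fastq.gz"  # assumed file name
        commands.append(["fastp", "-i", reads_1, "-I", reads_2,
                         "-o", trimmed_1, "-O", trimmed_2])
        reads_1, reads_2 = trimmed_1, trimmed_2

    assembly_dir = f"{out_dir}/assembly"
    if assembler == "unicycler":
        cmd = ["unicycler", "-1", reads_1, "-2", reads_2, "-o", assembly_dir]
        if kmers:
            cmd += ["--kmers", ",".join(str(k) for k in kmers)]
        contigs = f"{assembly_dir}/assembly.fasta"  # Unicycler's final FASTA
    elif assembler == "spades":
        cmd = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", assembly_dir]
        if kmers:
            cmd += ["-k", ",".join(str(k) for k in kmers)]
        contigs = f"{assembly_dir}/contigs.fasta"  # SPAdes contig FASTA
    else:
        raise ValueError(f"unknown assembler: {assembler}")
    commands.append(cmd)

    # QUAST summarises assembly quality for the visualisation step.
    commands.append(["quast.py", contigs, "-o", f"{out_dir}/quast"])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in order. Building the commands separately from running them keeps the option handling easy to test.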

Gene Prediction

  • Finds genes in isolates assembled from FastQ input or in a provided FASTA file
  • Runs the following tools by default:
    • CDS: Prodigal
    • tRNA: Aragorn
    • rRNA: barrnap
  • Options:
    • GeneMarkS-2 as alternative tool for CDS predictions
    • tRNAscan-SE as alternative tool for tRNA predictions
    • RNAmmer as alternative tool for rRNA predictions
  • Outputs a *.gff file, a *_cds.fna file, a *_protein.faa file, and a *_rna.fna file
  • For more details, visit: Team1_Gene_Prediction
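
The default and alternative tools listed above can be expressed as a small lookup. The sketch below is illustrative only: the names `select_tools` and `output_files` and the way options are passed are assumptions, not the webserver's actual interface. The tool names and output suffixes are taken from the list above.

```python
# Default tool per feature type, with the alternative offered as an option.
DEFAULT_TOOLS = {"CDS": "Prodigal", "tRNA": "Aragorn", "rRNA": "barrnap"}
ALTERNATIVE_TOOLS = {"CDS": "GeneMarkS-2", "tRNA": "tRNAscan-SE", "rRNA": "RNAmmer"}


def select_tools(use_alternative=()):
    """Return one tool per feature type.

    `use_alternative` lists the feature types ("CDS", "tRNA", "rRNA")
    for which the alternative tool should replace the default.
    """
    unknown = sorted(set(use_alternative) - set(DEFAULT_TOOLS))
    if unknown:
        raise ValueError(f"unknown feature type(s): {', '.join(unknown)}")
    return {
        feature: ALTERNATIVE_TOOLS[feature] if feature in use_alternative
        else DEFAULT_TOOLS[feature]
        for feature in DEFAULT_TOOLS
    }


def output_files(prefix):
    """List the four output files for one isolate (prefix is hypothetical)."""
    return [f"{prefix}.gff", f"{prefix}_cds.fna",
            f"{prefix}_protein.faa", f"{prefix}_rna.fna"]
```

For example, `select_tools(["tRNA"])` keeps Prodigal and barrnap but switches tRNA prediction to tRNAscan-SE.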

Functional Annotation

  • Obtain functional information about predicted genes
  • Input: FASTA file
  • Cluster Tool: USEARCH
    • Output: centroid.fasta
  • Homology Tools:
    • General annotation: InterProScan, eggNOG-mapper
    • Antibiotic resistance genes: DeepARG
  • Ab initio Tools:
    • Signal Peptides: SignalP 5.0
    • Transmembrane Proteins: TMHMM
    • CRISPR Sites: PILER-CR
  • Output: *.tsv file
  • For more details, visit: Team1_Functional_Annotation
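
The flow above (cluster first, annotate the centroids, report one table) can be sketched as follows. This is an assumption-laden illustration rather than the team's implementation: the function names, the 0.9 identity threshold, the column layout of the TSV, and the example annotation values in the usage note are all hypothetical. The USEARCH flags shown (`-cluster_fast`, `-id`, `-centroids`) are its standard clustering options.

```python
import csv
import io


def cluster_command(fasta, identity=0.9):
    """Build a USEARCH clustering command; the identity threshold is assumed."""
    return ["usearch", "-cluster_fast", fasta,
            "-id", str(identity), "-centroids", "centroid.fasta"]


def merge_annotations(per_tool):
    """Merge per-tool results into one TSV string.

    `per_tool` maps a tool name to {gene_id: annotation}. Genes that a
    tool did not annotate get "-" in that tool's column.
    """
    tools = sorted(per_tool)
    gene_ids = sorted({gene for hits in per_tool.values() for gene in hits})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id"] + tools)
    for gene_id in gene_ids:
        writer.writerow([gene_id] + [per_tool[t].get(gene_id, "-") for t in tools])
    return buffer.getvalue()
```

With made-up input such as `{"TMHMM": {"g1": "2 helices"}, "SignalP": {"g2": "SP(Sec/SPI)"}}`, the function returns a three-line table with one row per gene and one column per tool.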

Comparative Genomics

  • Compares genomic features of the input files to identify outbreak clusters
  • Input: FASTA file, Prodigal training file (for chewBBACA)
  • Tools used:
    • MUMmer 4.0
    • chewBBACA
    • kSNP 3.0
    • FigTree
  • Options:
    • Parsimony, maximum likelihood, or neighbour-joining tree as options for kSNP
    • k-mer size option for kSNP
  • Output: .tsv file (for chewBBACA and MUMmer), .png (for kSNP)
  • Visualisation: Phylogenetic tree for identified SNPs, phylogenetic tree for MLST, graph for epidemiological data visualisation
  • For more details, visit: Team1_Comparative_Genomics
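
The two kSNP options above (tree type and k-mer size) could be validated and turned into a command line roughly as follows. This is a hedged sketch: the function name `ksnp_command` and the option keywords are hypothetical, and the flags (`-in`, `-outdir`, `-k`, `-ML`, `-NJ`) follow the kSNP3 user guide as commonly cited and should be verified against the installed version. The check for an odd k-mer size is an assumption based on kSNP placing the SNP at the central base of each k-mer.

```python
# Extra flags per tree type; kSNP3 builds the parsimony tree by default.
TREE_FLAGS = {
    "parsimony": [],
    "maximum_likelihood": ["-ML"],
    "neighbour_joining": ["-NJ"],
}


def ksnp_command(input_list, out_dir, kmer_size, tree="parsimony"):
    """Build a kSNP3 command from the user's options (names are hypothetical)."""
    if kmer_size < 1 or kmer_size % 2 == 0:
        # Assumption: k must be odd so that a central base exists.
        raise ValueError("k-mer size must be a positive odd integer")
    if tree not in TREE_FLAGS:
        raise ValueError(f"unknown tree type: {tree}")
    return ["kSNP3", "-in", input_list, "-outdir", out_dir,
            "-k", str(kmer_size)] + TREE_FLAGS[tree]
```

Rejecting invalid options before the job is queued gives the user immediate feedback instead of a failed run.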

Method

How to build web server

Webserver Demo

  • Choice One: Running General Pipeline
    • 1. Upload file:
      • Click Analyze and choose General Pipeline.
      • Upload a compressed folder or metadata here (be careful about the file type).
      • Enter the email address at which you want to receive the final result images and datasets.
  • Choice Two: Running each step separately

Results

Reference