Team II Comparative Genomics Group

From Compgenomics 2020

Team 2: Comparative Genomics

Team Members: Kara Keun Lee, Courtney Astore, Kristine Lacek, Ujani Hazra, Jayson Chao


Class Presentations

Introduction

What is Comparative Genomics?

Once genomes are fully assembled and annotated, outbreak analysis can begin through comparative genomics. In general, information derived from gene prediction and annotation can be used to map the relatedness of multiple isolates. Combined with epidemiological data, this lets an outbreak be traced back to a particular source (patient zero). It also lets the outbreak be tracked to determine which strains are outbreak isolates and which are sporadic cases. Phenotypic features such as virulence, antibiotic resistance, and pathogenicity can also be determined. Together, these data support recommendations on human impact, treatment strategy, and management methods to prevent further spread.

Our Data

Our genomic data come from 50 isolates of Campylobacter jejuni (C. jejuni) collected during an outbreak of foodborne illness.


Our epidemiological data includes

Pipeline Overview

Objectives

  • Classify isolates as outbreak or sporadic strains
  • Construct a phylogeny showing which isolates are related and which differ
  • Determine the source of the outbreak
  • Map virulence and antibiotic resistance features of outbreak isolates
  • Compile recommendations for outbreak response and treatment
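The first objective, separating outbreak isolates from sporadic ones, is often approached by grouping isolates that lie within some SNP distance of one another. The sketch below is a minimal illustration under stated assumptions. It assumes a pairwise SNP-distance table is already available, for example from a SNP-based tool. It uses single-linkage grouping at a cutoff (`threshold`) that is hypothetical here and is not a published C. jejuni value. The function names `snp_clusters` and `label_isolates` are also hypothetical. In practice, any genomic cluster would still need to be checked against the epidemiological data.

```python
def snp_clusters(dist, threshold):
    """Single-linkage groups: isolates are joined if a chain of pairs lies within threshold SNPs.

    dist maps (isolate_a, isolate_b) -> pairwise SNP distance.
    threshold is a hypothetical cutoff chosen for illustration only.
    """
    parent = {name: name for pair in dist for name in pair}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), snps in dist.items():
        if snps <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in sorted(parent):
        groups.setdefault(find(name), []).append(name)
    # Largest groups first, ties broken alphabetically
    return sorted(groups.values(), key=lambda g: (-len(g), g))


def label_isolates(dist, threshold):
    """Label isolates in multi-isolate groups 'outbreak' and singletons 'sporadic'."""
    labels = {}
    for cluster in snp_clusters(dist, threshold):
        kind = "outbreak" if len(cluster) > 1 else "sporadic"
        for name in cluster:
            labels[name] = kind
    return labels


pairwise = {("A", "B"): 2, ("A", "C"): 40, ("B", "C"): 38}
labels = label_isolates(pairwise, threshold=10)
print(labels)  # {'A': 'outbreak', 'B': 'outbreak', 'C': 'sporadic'}
```

Single linkage is chosen here because it is simple: an isolate joins a group if it is close to any member. The trade-off is that long chains of small differences can merge distinct groups.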

Overview of Techniques

Phylogenomics offers many ways to classify similarities and differences across genomes. Our approach uses tools from three techniques: hierarchical clustering, MLST, and SNP-based analysis.

Hierarchical Clustering

MLST

SNP-based

SNP stands for single nucleotide polymorphism. A SNP is a single base position at which samples carry different nucleotides; usually only two, occasionally three, alternative bases are observed at such a locus. SNPs arise through de novo mutations and are passed down through generations. As a result, comparing a given isolate's SNPs with those of other isolates and a reference genome allows the phylogenetic distance between samples to be estimated (1). Tools have been developed to compare bases position by position (SNP calling) and to build matrices that quantify relatedness between samples based on shared SNPs.
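To make the position-by-position comparison concrete, here is a minimal sketch of a pairwise SNP-distance matrix. It assumes the isolate sequences are already aligned to the same length. It also assumes that ambiguous bases (`N`) and gaps (`-`) should be skipped rather than counted. The helper names are hypothetical, and the code does not reproduce the internal logic of any particular SNP-calling tool.

```python
def snp_distance(seq_a, seq_b, ignore=frozenset("N-")):
    """Count positions where two aligned sequences carry different bases.

    Ambiguous bases ('N') and gaps ('-') are skipped rather than counted as SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def snp_matrix(alignment):
    """Return (names, matrix) of pairwise SNP distances for {name: aligned_sequence}."""
    names = sorted(alignment)
    matrix = [[snp_distance(alignment[x], alignment[y]) for y in names] for x in names]
    return names, matrix


alignment = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "TCGTACNA",
}
names, matrix = snp_matrix(alignment)
print(names)   # ['iso1', 'iso2', 'iso3']
print(matrix)  # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
```

Note that `iso1` and `iso3` differ at three columns, but their distance is 2. The `N` in `iso3` is treated as missing data rather than a mutation.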

Generalized Algorithm Overview:

  • Pre-processing and read cleaning
  • Mapping
  • SNP calling against reference genome
  • Phylogeny generation based on SNP profiles
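The last two steps above (SNP calling against a reference, then phylogeny generation) can be sketched in a toy form under several assumptions. Reads are assumed to be already cleaned, mapped, and collapsed into one consensus sequence per isolate in reference coordinates. The tree method is UPGMA, an average-linkage hierarchical clustering method swapped in here for illustration. It assumes a roughly constant mutation rate and is not what kSNP3.0 or Lyve-SET necessarily use internally. The names `call_snps`, `profile_distance`, and `upgma` are hypothetical.

```python
def call_snps(reference, consensus, ignore=frozenset("N-")):
    """Return {position: base} wherever the consensus differs from the reference.

    Assumes the consensus is already in reference coordinates; 'N' and '-' mean no call.
    """
    if len(reference) != len(consensus):
        raise ValueError("consensus must be in reference coordinates")
    return {
        pos: base
        for pos, (ref_base, base) in enumerate(zip(reference, consensus))
        if base != ref_base and base not in ignore
    }


def profile_distance(p, q):
    """Positions where two SNP profiles disagree; a missing position means 'reference base'."""
    return sum(1 for pos in set(p) | set(q) if p.get(pos) != q.get(pos))


def upgma(names, dist):
    """Rooted UPGMA (average-linkage) tree in Newick format from a square distance matrix."""
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}  # id -> (newick, size, height)
    d = {(i, j): float(dist[i][j]) for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        (ni, si, hi), (nj, sj, hj) = clusters.pop(i), clusters.pop(j)
        height = dij / 2
        newick = f"({ni}:{height - hi:g},{nj}:{height - hj:g})"
        del d[(i, j)]
        for k in clusters:  # size-weighted average distance to the merged cluster
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        clusters[next_id] = (newick, si + sj, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


reference = "ACGTACGTAC"
consensus = {
    "iso1": "ACGAACGTAC",
    "iso2": "ACGAACGTAG",
    "iso3": "TCGTACGNAC",
}
names = sorted(consensus)
profiles = {n: call_snps(reference, consensus[n]) for n in names}
dist = [[profile_distance(profiles[a], profiles[b]) for b in names] for a in names]
tree = upgma(names, dist)
print(tree)  # (iso3:1.25,(iso1:0.5,iso2:0.5):0.75);
```

Here `iso1` and `iso2` share the SNP at position 3, so they are joined first. `iso3` carries an unrelated SNP and branches off on its own. The Newick string can be opened in standard tree viewers.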

Tools to be tested:

  • kSNP3.0 (2)
  • Lyve-SET (3)

Outbreak Analysis Results

Source of Outbreak

Human Impact

Treatment Strategy

CDC Recommendations

Works Cited

1. https://cba.anu.edu.au/news-events/snps-population-and-phylo-genomics

2. https://academic.oup.com/bioinformatics/article/31/17/2877/183216

3. Katz et al. (2017). A comparative analysis of the Lyve-SET phylogenomics pipeline for genomic epidemiology of foodborne pathogens. Frontiers in Microbiology 8: 375. https://github.com/lskatz/lyve-SET