Modules for Escherichia species#
--preset escherichia
These modules run if the enterobacterales__species module confirms the input assembly as a member of the Escherichia genus.
E. coli MLST#
-m escherichia__mlst_achtman, escherichia__mlst_pasteur
Genomes identified as belonging to the Escherichia genus are subjected to MLST using the Achtman 7-locus scheme.
The Achtman scheme is hosted on EnteroBase.
Users can also run MLST with the Pasteur scheme alone:
-m escherichia__mlst_pasteur
The Pasteur scheme is described in the Escherichia coli Database maintained by the Pasteur Institute. For more information and references, see BIGSdb.
The genes included in each scheme are noted in the Outputs table below.
A copy of the MLST alleles and ST definitions used in each module is stored in the /data directory of the module.
E. coli MLST parameters#
--escherichia_mlst_achtman_min_identity
Minimum alignment percent identity for Escherichia-Achtman MLST (default: 90.0)
--escherichia_mlst_achtman_min_coverage
Minimum alignment percent coverage for Escherichia-Achtman MLST (default: 80.0)
--escherichia_mlst_achtman_required_exact_matches
Minimum number of exact allele matches required to call an ST (default: 3).
--escherichia_mlst_pasteur_min_identity
Minimum alignment percent identity for Escherichia-Pasteur MLST (default: 90.0)
--escherichia_mlst_pasteur_min_coverage
Minimum alignment percent coverage for Escherichia-Pasteur MLST (default: 80.0)
--escherichia_mlst_pasteur_required_exact_matches
Minimum number of exact allele matches required to call an ST (default: 4).
E. coli MLST outputs#
Output of the Achtman E. coli MLST module includes the following columns:
|
Sequence type |
|
Allele numbers for the Achtman scheme loci. |
Output of the Pasteur E. coli MLST module includes the following columns:
|
Sequence type. |
|
Allele numbers for the Pasteur scheme loci. |
Notes#
Kleborate attempts to report the closest matching ST if a precise match is not found.
Imprecise allele matches are indicated with a *.
Imprecise ST calls are indicated with -nLV, where n indicates the number of loci that differ from the ST reported. For example, 258-1LV indicates a single-locus variant (SLV) of ST258, i.e. 6/7 loci match ST258.
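The notation above can be unpacked programmatically when post-processing results. The following is a minimal sketch, assuming the ST is reported as a plain string such as `258-1LV` and each allele as a string such as `3*`. The names `STCall`, `parse_st_call` and `parse_allele_call` are hypothetical helpers for illustration and are not part of Kleborate.

```python
import re
from typing import NamedTuple, Tuple


class STCall(NamedTuple):
    st: str          # ST label as reported, without the -nLV suffix
    mismatches: int  # number of loci that differ from the reported ST


# Optional "-nLV" suffix at the end of the ST string (assumed format).
_LV_SUFFIX = re.compile(r"^(?P<st>.+?)-(?P<n>\d+)LV$")


def parse_st_call(value: str) -> STCall:
    """Split an ST string into its label and locus-variant count."""
    value = value.strip()
    match = _LV_SUFFIX.match(value)
    if match:
        return STCall(match.group("st"), int(match.group("n")))
    return STCall(value, 0)  # no suffix: treated as a precise ST call


def parse_allele_call(value: str) -> Tuple[str, bool]:
    """Return (allele number, is_imprecise) for an allele string."""
    value = value.strip()
    return value.rstrip("*"), value.endswith("*")
```

For instance, `parse_st_call("258-1LV")` yields the label `258` with one mismatching locus, which corresponds to the 6/7 match described above.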
E. coli Pathotyping#
-m escherichia__pathovar
Escherichia coli is broadly divided into two groups: intestinal diarrheagenic E. coli (DEC) and extra-intestinal pathogenic E. coli (ExPEC) (see paper). DEC encompasses several clinically relevant pathotypes: enteropathogenic E. coli (EPEC), enterotoxigenic E. coli (ETEC), enterohaemorrhagic E. coli (EHEC), Shiga toxin-producing E. coli (STEC), enteroaggregative E. coli (EAEC), enteroinvasive E. coli (EIEC), and diffusely adherent E. coli (DAEC) (see paper). Additionally, Shigella is considered a DEC pathotype due to its genetic and pathogenetic similarity to EIEC.
The majority of DEC pathotypes are defined by specific virulence markers. However, for EAEC, DAEC and adherent-invasive E. coli (AIEC), the pathogenic role of proposed markers is not well established.
Virulence markers of diarrheagenic E. coli#
| Pathotype | Defining marker | Virulence determinants | Location of determinants | PCR diagnostic targets | Other diagnostic targets |
|---|---|---|---|---|---|
| EPEC | LEE pathogenicity island | LEE pathogenicity island | Pathogenicity island | | |
| EIEC/Shigella | pINV | pINV | Plasmid | | Other |
| ETEC | ST or LT | ST or LT, plus colonisation factors | Plasmid; transposon | | - |
| EHEC | Shiga toxin | Stx1 and/or Stx2 | Prophages | | |
| EAEC | pAA; aggregative adhesion | Not known | Plasmid | | - |
| DAEC | Afa/Dr adhesins | Not known | Not known | | - |
| AIEC | Adherent-invasive phenotype | Not known | Not known | none | - |
How it works#
This module classifies E. coli genomes into DEC pathotypes based on the presence or absence of virulence marker genes, using a curated database, VirulenceFinder DB. Input assemblies are aligned to the database using Minimap2, and Kleborate assigns pathotypes based on logic adapted from EnteroBase.
Additionally, Kleborate distinguishes Shigella species based on the serotype-specific O-antigen biosynthetic gene cluster. The module uses Minimap2 to align input genomes against curated reference sequences derived from the Shigella serotyping pipeline ShigaTyper.
All reference sequences and marker definitions used by this module are included in the /data directory of this module.
E. coli Pathovar parameters#
--escherichia__pathovar_min_identity
Minimum alignment percent identity for pathotype (default: 90.0).
--escherichia__pathovar_min_coverage
Minimum alignment percent coverage for pathotype (default: 80.0).
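To illustrate how the two thresholds interact, here is a minimal sketch of filtering alignment hits by identity and coverage. It is not Kleborate's implementation: the `Hit` record and `filter_hits` function are hypothetical, the marker names in the usage note are placeholders, and treating the thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    marker: str      # name of the reference marker that was hit
    identity: float  # alignment percent identity (0-100)
    coverage: float  # percent of the reference covered (0-100)


def filter_hits(
    hits: Iterable[Hit],
    min_identity: float = 90.0,  # mirrors the documented default
    min_coverage: float = 80.0,  # mirrors the documented default
) -> List[Hit]:
    """Keep only hits that meet both thresholds (assumed inclusive)."""
    return [
        h for h in hits
        if h.identity >= min_identity and h.coverage >= min_coverage
    ]
```

With the defaults, a hit at 99.5% identity and 100% coverage is kept, while a hit at 85% identity or at 60% coverage is dropped. A marker must pass both thresholds before it can contribute to a pathotype call.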
E. coli Pathovar outputs#
|
Predicted pathotype |
|
Virulence markers |
Typing the LEE pathogenicity island of E. coli#
-m escherichia__mlst_lee
The locus of enterocyte effacement (LEE) is a ~40 kb chromosomal pathogenicity island composed of 41 core genes organized into five operons (Elliott et al., 1998). It encodes (i) an outer membrane adhesin known as intimin, encoded by the eae gene; (ii) a type III secretion system (T3SS); and (iii) the translocated intimin receptor (Tir), as well as translocons, chaperones, regulators and secreted effector proteins that are linked to virulence.
Kleborate includes a module for subtyping of the LEE pathogenicity island. Details of the LEE subtypes and lineages can be found in this Nature Microbiology paper.
The LEE typing database is based on analysis of >250 LEE-containing E. coli genomes and includes 7 loci: eae (intimin), tir, espA, espB, espD, espH and espZ. The data are provided as an MLST-style database, in which combinations of alleles are assigned to a LEE subtype, to facilitate a common nomenclature for LEE subtypes. Each sequence in the database represents a cluster of closely related alleles that have been assigned to the same locus type. The LEE scheme includes three distinct lineages: Lineage 1 consists of LEE subtypes 1-2; Lineage 2 consists of LEE subtypes 3-8; Lineage 3 consists of LEE subtypes 9-30.
The reference sequences and MLST-style profile definitions are included in the /data directory of this module.
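The subtype-to-lineage relationship described above is a simple range lookup. The sketch below encodes only the ranges stated in this section; the function name `lee_lineage` is hypothetical, and in practice the lineage comes from the profile definitions shipped with the module rather than from code like this.

```python
def lee_lineage(subtype: int) -> str:
    """Map a LEE subtype number to its lineage, per the ranges above."""
    if 1 <= subtype <= 2:
        return "Lineage 1"
    if 3 <= subtype <= 8:
        return "Lineage 2"
    if 9 <= subtype <= 30:
        return "Lineage 3"
    # Subtypes outside 1-30 are not described in this section.
    raise ValueError(f"Unknown LEE subtype: {subtype}")
```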
Parameters#
--escherichia__mlst_LEE_min_identity
Minimum alignment percent identity for escherichia__mlst_LEE. Default: 90.0
--escherichia__mlst_LEE_min_coverage
Minimum alignment percent coverage for escherichia__mlst_LEE. Default: 80.0
--escherichia__mlst_LEE_mlst_required_exact_matches
Minimum number of exact allele matches required to assign an ST. Default: 3
E. coli LEE MLST outputs#
The output of the E. coli LEE MLST module includes the following columns:
|
Assigned LEE sequence type. |
|
Lineage associated with the LEE ST. |
|
Allele numbers for each LEE locus. |
Additional Notes#
Kleborate attempts to report the closest matching ST if an exact match is not found.
Imprecise allele matches are indicated with a *.
Imprecise ST calls are indicated with -nLV, where n indicates the number of loci that disagree with the ST reported. For example, ST10-3LV indicates a three-locus variant of ST10 (i.e. 4/7 loci match ST10).
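Closest-match reporting can be pictured as a nearest-profile search over the scheme's ST definitions. The sketch below is an illustration under stated assumptions, not Kleborate's actual algorithm: `closest_st` is a hypothetical function, profiles are assumed to map ST labels to per-locus allele numbers, ties go to the first profile encountered, and `None` stands in for "no call" when fewer than the required number of loci match exactly.

```python
from typing import Dict, Optional


def closest_st(
    alleles: Dict[str, str],
    profiles: Dict[str, Dict[str, str]],
    required_exact_matches: int = 3,
) -> Optional[str]:
    """Return the best-matching ST, with an -nLV suffix if imprecise."""
    best_st, best_matches = None, -1
    for st, profile in profiles.items():
        # Count loci whose allele agrees exactly with this profile.
        matches = sum(
            1 for locus, allele in profile.items()
            if alleles.get(locus) == allele
        )
        if matches > best_matches:
            best_st, best_matches = st, matches
    if best_st is None or best_matches < required_exact_matches:
        return None  # too few exact matches to report any ST
    mismatches = len(profiles[best_st]) - best_matches
    return best_st if mismatches == 0 else f"{best_st}-{mismatches}LV"
```

With a toy 7-locus profile for ST10 and a query that agrees at four loci, the function returns `ST10-3LV`, matching the example in the notes above.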
StxTyper#
-m escherichia__stxtyper
Shiga toxins (Stxs) are key virulence factors of Shiga toxin-producing Escherichia coli (STEC). They are also found in Shigella dysenteriae type 1. Stxs belong to the AB-type toxin family and are divided into two antigenically distinct groups, Stx1 and Stx2. Each group contains several variants/subtypes: six for Stx1 (a, b, c, d, e, f) and seven for Stx2 (a, b, c, d, e, f, and g) [Yano et al., 2023, Melton-Celsa, 2014]. These toxins are encoded by lysogenic bacteriophages (Stx phages), and STEC strains may produce either a single Stx subtype or a combination of subtypes.
This module will run StxTyper to determine the stx type. See the StxTyper documentation for more details of how it works.
StxTyper Outputs#
StxTyper results are output in the following columns:
Column Name |
Description |
|---|---|
|
The Shiga toxin type. If the operon is complete, the subtype will be reported (e.g., |
|
Status of the detected operon. Possible values:
|
|
Percent identity for both A and B subunits. |
|
Start position of the alignment. |
|
End position of the alignment. |
|
Strand orientation of the target sequence. |
|
Closest reference protein for the A subunit. |
|
Percent identity to the reference for the A subunit. |
|
Subtype assigned to the reference sequence for the A subunit. |
|
Percentage of the A subunit reference sequence covered by the alignment. |
|
Closest reference protein for the B subunit. |
|
Subtype assigned to the reference sequence for the B subunit. |
|
Percent identity to the reference for the B subunit. |
|
Percentage of the B subunit reference sequence covered by the alignment. |
E. coli O:H serotyping#
-m escherichia__ectyper
E. coli serotypes are defined by combinations of O (lipopolysaccharide) and H (flagellar) antigens. Currently, ~183 O-groups and 53 H-types have been defined serologically (Ørskov and Ørskov, 1984).
O-antigen#
The O-antigen is an integral component of the lipopolysaccharide (LPS) found in the bacterial outer membrane. LPS comprises three components: lipid A, a core oligosaccharide, and the O-specific polysaccharide chain (O-antigen). The O-antigen is highly variable and consists of 10 to 25 repeating oligosaccharide units, each containing two to seven sugar residues (Liu et al., 2020). The genes responsible for O-antigen synthesis are usually clustered between the two chromosomal housekeeping genes galF and gnd/ugd (Iguchi et al., 2014). The major pathways for O-antigen synthesis, assembly and transport are the Wzx/Wzy-dependent pathway, encoded by the wzx (O-antigen flippase) and wzy (O-antigen polymerase) genes, and the ABC transporter pathway, encoded by wzm and wzt. These genes are ideal biomarkers for predicting O-antigen types.
H antigens#
H (flagellar) antigens are surface structures composed of repeating subunits of the protein flagellin, which facilitate bacterial motility. These antigens are numbered from H1 to H56 (H13, H22, and H50 are not used) and are distinct from the O and K antigens. Flagellin is encoded by the chromosomal fliC gene or by its homologues (non-fliC flagellin-coding genes such as flkA, fllA, and flmA). Of the 53 recognised H antigen types, 44 are conferred by expression of the fliC gene, and the remaining 9 are encoded by non-fliC flagellin genes. Specifically, H3, H35, H36, H47, and H53 are encoded by flkA, H44 and H55 by fllA, H54 by flmA, and H17 by flnA.
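The gene assignments listed above amount to a small lookup table. The sketch below encodes only the facts stated in this section: the numbering range, the three unused numbers, and the nine non-fliC types. The names `NON_FLIC_H_TYPES` and `flagellin_gene` are hypothetical and are not part of ECTyper or Kleborate.

```python
# H types encoded by non-fliC flagellin genes, as listed in the text.
NON_FLIC_H_TYPES = {
    "flkA": ("H3", "H35", "H36", "H47", "H53"),
    "fllA": ("H44", "H55"),
    "flmA": ("H54",),
    "flnA": ("H17",),
}
UNUSED_H_NUMBERS = {13, 22, 50}  # numbers not used in the H1-H56 range

# Invert the table so each H type points at its flagellin gene.
_H_TO_GENE = {h: gene for gene, hs in NON_FLIC_H_TYPES.items() for h in hs}


def flagellin_gene(h_type: str) -> str:
    """Return the flagellin gene for an H type such as 'H7'."""
    number = int(h_type.strip().lstrip("Hh"))
    if not 1 <= number <= 56 or number in UNUSED_H_NUMBERS:
        raise ValueError(f"Not a recognised H type: {h_type}")
    return _H_TO_GENE.get(f"H{number}", "fliC")  # all others are fliC
```

Enumerating H1 to H56 with this function gives 53 valid types, 44 of which map to fliC, consistent with the counts in the paragraph above.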
Kleborate uses ECTyper for in silico serotyping. See the ECTyper paper for more details.
Outputs#
The ECTyper module outputs the following columns:
|
Predicted O antigen. |
|
Predicted H antigen. |
|
Combined prediction of O and H antigens. |
|
Quality control values summarising the overall confidence of the serotype prediction. |
|
Total number of alleles used to call both O and H antigens. |
|
ECTyper gene scores for O and H antigens, ranging from 0 to 1. |
|
Best-matching allele keys from the ECTyper database used for serotype assignment. |
|
Percent identity values of the query alleles. |
|
Percent coverage values for the query alleles. |
|
Gene lengths (in base pairs) of the query alleles. |
|
Additional messages related to QC status or other issues affecting serotype prediction. |
ClermonTyping#
-m escherichia__ezclermont
The Escherichia genus comprises several clades, including Escherichia albertii, E. fergusonii, five cryptic Escherichia clades (I–V) and E. coli sensu stricto. Within E. coli, strains can be further divided into seven main phylogroups: A, B1, B2, C, D, E and F.
Kleborate assigns genomes to these phylogroups and clades using the EzClermont tool, which applies the logic of the in vitro PCR assay to the assembly in silico.
Parameters#
--escherichia__ezclermont_min_length
Minimum contig length to consider. Default: 500
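A minimum contig length filter of this kind can be sketched as follows. This is an illustration only, not EzClermont's code: `read_fasta` and `filter_contigs` are hypothetical helpers, and treating the threshold as inclusive is an assumption.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (contig name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(text: str, min_length: int = 500) -> Dict[str, str]:
    """Keep contigs at least min_length bases long (assumed inclusive)."""
    return {
        name: seq for name, seq in read_fasta(text)
        if len(seq) >= min_length
    }
```

Short contigs are excluded before the PCR-style search, so raising the threshold on a fragmented assembly can remove contigs that carry target sites.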
Outputs#
|
Assigned phylogroup or clade. |
|
Presence or absence pattern of PCR products. |
Escherichia AMR#
-m escherichia__amr
This module screens input genomes for acquired antimicrobial resistance genes and known resistance-associated point mutations using the AMRFinderPlus tool. Identified determinants are grouped by drug class.
AMR parameters#
--organism
Used to screen for point mutations in species-specific resistance markers.
-t, --threads
Number of threads to use for alignment.
AMR outputs#
Results of the Escherichia AMR module are grouped by drug class:
|
Aminoglycoside resistance genes. |
|
Fluoroquinolone resistance genes. |
|
Fosfomycin resistance genes. |
|
Sulfonamide resistance genes. |
|
Tetracycline resistance genes. |
|
Glycopeptide resistance genes. |
|
Colistin resistance genes. |
|
Phenicol resistance genes. |
|
Macrolide resistance genes. |
|
Rifampin resistance genes. |
|
Trimethoprim resistance genes. |
|
Beta-lactamase genes. |
|
Carbapenemase genes. |
|
Third-generation Cephalosporin resistance genes. |
|
Methicillin resistance genes. |
|
Resistance genes in other antimicrobial categories. |