Modules for Escherichia species#

--preset escherichia

These modules will be deployed if the enterobacterales__species module confirms the input assembly as a member of the Escherichia genus.

E. coli MLST#

-m escherichia__mlst_achtman, escherichia__mlst_pasteur

Genomes identified as belonging to the Escherichia genus are subjected to MLST using the Achtman 7-locus scheme.

The Achtman scheme is hosted on EnteroBase.

Users can also run MLST with the Pasteur scheme by specifying: -m escherichia__mlst_pasteur

The Pasteur scheme is described in the Escherichia coli Database maintained by the Pasteur Institute. For more information and references, see BIGSdb.

The genes included in each scheme are noted in the Outputs table below.

A copy of the MLST alleles and ST definitions used in each module is stored in the /data directory of the module.

E. coli MLST parameters#

--escherichia_mlst_achtman_min_identity

Minimum alignment percent identity for Escherichia-Achtman MLST (default: 90.0)

--escherichia_mlst_achtman_min_coverage

Minimum alignment percent coverage for Escherichia-Achtman MLST (default: 80.0)

--escherichia_mlst_achtman_required_exact_matches

Minimum number of exact allele matches required to call an ST (default: 3).

--escherichia_mlst_pasteur_min_identity

Minimum alignment percent identity for Escherichia-Pasteur MLST (default: 90.0)

--escherichia_mlst_pasteur_min_coverage

Minimum alignment percent coverage for Escherichia-Pasteur MLST (default: 80.0)

--escherichia_mlst_pasteur_required_exact_matches

Minimum number of exact allele matches required to call an ST (default: 4).
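As a rough illustration of how these thresholds could interact, the sketch below filters hypothetical alignment hits by identity and coverage, then checks whether enough exact allele matches remain to call an ST. The `AlleleHit` class and both function names are invented for this example and are not part of Kleborate; treating an exact match as 100% identity and 100% coverage is also an assumption. The defaults mirror the Achtman values listed above.

```python
from dataclasses import dataclass


@dataclass
class AlleleHit:
    """Hypothetical summary of one locus alignment (not a Kleborate class)."""
    locus: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the reference allele covered


def passes_thresholds(hit, min_identity=90.0, min_coverage=80.0):
    """Keep a hit only if it meets both minimum thresholds."""
    return hit.identity >= min_identity and hit.coverage >= min_coverage


def can_call_st(hits, required_exact_matches=3,
                min_identity=90.0, min_coverage=80.0):
    """Return True if enough loci have exact matches (assumed 100%/100%)."""
    kept = [h for h in hits if passes_thresholds(h, min_identity, min_coverage)]
    exact = [h for h in kept if h.identity == 100.0 and h.coverage == 100.0]
    return len(exact) >= required_exact_matches


hits = [
    AlleleHit("adk", 100.0, 100.0),
    AlleleHit("fumC", 100.0, 100.0),
    AlleleHit("gyrB", 99.2, 100.0),  # inexact, but above both thresholds
    AlleleHit("icd", 85.0, 100.0),   # below the identity threshold
    AlleleHit("mdh", 100.0, 100.0),
]
print(can_call_st(hits))
```

Raising `required_exact_matches` makes ST calls more conservative, which is the trade-off the Pasteur default of 4 reflects for its 8-locus scheme.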

E. coli MLST outputs#

The output of the Achtman E. coli MLST module includes the following columns:

ST

Sequence type.

adk, fumC, gyrB, icd, mdh, purA, recA

Allele numbers for the Achtman scheme loci.

Output of the Pasteur E. coli MLST module includes the following columns:

ST

Sequence type.

dinB, icdA, pabB, polB, putP, trpA, trpB, uidA

Allele numbers for the Pasteur scheme loci.

Notes#

  • Kleborate attempts to report the closest matching ST if a precise match is not found.

  • Imprecise allele matches are indicated with a *.

  • Imprecise ST calls are indicated with -nLV, where n indicates the number of loci that differ from the ST reported. For example, 258-1LV indicates a single-locus variant (SLV) of ST258, i.e. 6/7 loci match ST258.
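When post-processing results, the notation described in these notes can be unpacked with a small parser. The following is a minimal sketch; the function names are hypothetical, and it assumes ST calls look like `258`, `258-1LV` or `ST10-3LV`, and that alleles are plain values with an optional trailing `*`.

```python
import re

# Matches calls such as "258", "258-1LV" or "ST10-3LV".
_ST_PATTERN = re.compile(r"^(?:ST)?(\d+)(?:-(\d+)LV)?$")


def parse_st_call(call):
    """Split an ST call into (st_number, differing_loci).

    Returns None if the string does not look like an ST call.
    """
    match = _ST_PATTERN.match(call.strip())
    if match is None:
        return None
    st_number = int(match.group(1))
    differing = int(match.group(2)) if match.group(2) else 0
    return st_number, differing


def parse_allele(value):
    """Split an allele value into (allele, is_exact); '*' marks an imprecise match."""
    value = value.strip()
    return value.rstrip("*"), not value.endswith("*")


print(parse_st_call("258-1LV"))
print(parse_allele("10*"))
```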

E. coli Pathotyping#

-m escherichia__pathovar

Escherichia coli is broadly divided into two groups: intestinal diarrheagenic E. coli (DEC) and extra-intestinal pathogenic E. coli (ExPEC) (see paper). DEC encompasses several clinically relevant pathotypes: enteropathogenic E. coli (EPEC), enterotoxigenic E. coli (ETEC), enterohaemorrhagic E. coli (EHEC), Shiga toxin-producing E. coli (STEC), enteroaggregative E. coli (EAEC), enteroinvasive E. coli (EIEC), and diffusely adherent E. coli (DAEC) (see paper). Additionally, Shigella is considered a DEC pathotype due to its genetic and pathogenetic similarity to EIEC.

The majority of DEC pathotypes are defined by specific virulence markers. However, for EAEC, DAEC and adherent-invasive E. coli (AIEC), the pathogenic role of the proposed markers is not well established.

Virulence markers of diarrheagenic E. coli#

| Pathotype | Defining marker | Virulence determinants | Location of determinants | PCR diagnostic targets | Other diagnostic targets |
|---|---|---|---|---|---|
| EPEC | LEE pathogenicity island | LEE pathogenicity island | Pathogenicity island | eae | bfpA |
| EIEC/Shigella | pINV | pINV | Plasmid | ipaH | Other ipa genes |
| ETEC | ST or LT | ST or LT, plus colonisation factors | Plasmid; transposon | elt, est | - |
| EHEC | Shiga toxin | Stx1 and/or Stx2 | Prophages | stx1, stx2 | eae, ehxA |
| EAEC | pAA; aggregative adhesion | Not known | Plasmid | aggR, aatA, aaiC | - |
| DAEC | Afa/Dr adhesins | Not known | Not known | afa/Dr adhesins | - |
| AIEC | Adherent-invasive phenotype | Not known | Not known | none | - |

How it works#

This module classifies E. coli genomes into DEC pathotypes based on the presence or absence of virulence marker genes, using a curated database (VirulenceFinder DB). Input assemblies are aligned to the database using Minimap2, and Kleborate assigns pathotypes based on logic adapted from EnteroBase.

Additionally, Kleborate distinguishes Shigella species based on the serotype-specific O-antigen biosynthetic gene cluster. The module uses Minimap2 to align input genomes against curated reference sequences derived from the Shigella serotyping pipeline ShigaTyper.

All reference sequences and marker definitions used by this module are included in the /data directory of this module.
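To make the idea of marker-based assignment concrete, here is a deliberately simplified sketch that maps a set of detected markers to a pathotype label. It does not reproduce the EnteroBase-derived logic that Kleborate uses: the function name, the rule order, the returned labels and the `-` placeholder are all assumptions chosen for illustration, using only the marker names reported by this module (Stx1, Stx2, ST, LT, eae, ipaH).

```python
def sketch_pathotype(markers):
    """Assign a pathotype label from a set of detected marker names.

    Simplified, assumed rule order for illustration only; the real
    decision logic may differ.
    """
    has_stx = "Stx1" in markers or "Stx2" in markers
    if "ipaH" in markers:
        return "EIEC/Shigella"
    if has_stx:
        # Assumption: Shiga toxin plus eae is labelled EHEC, otherwise STEC.
        return "EHEC" if "eae" in markers else "STEC"
    if "ST" in markers or "LT" in markers:
        return "ETEC"
    if "eae" in markers:
        return "EPEC"
    return "-"  # no defining marker detected


print(sketch_pathotype({"Stx2", "eae"}))
print(sketch_pathotype({"LT"}))
```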

E. coli Pathovar parameters#

--escherichia__pathovar_min_identity

Minimum alignment percent identity for pathotype (default: 90.0).

--escherichia__pathovar_min_coverage

Minimum alignment percent coverage for pathotype (default: 80.0).

E. coli Pathovar outputs#

Pathotype

Predicted pathotype

Stx1, Stx2, ST, LT, eae, ipaH

Virulence markers

Typing the LEE pathogenicity island of E. coli#

-m escherichia__mlst_lee

The locus of enterocyte effacement (LEE) is a ~40 kb chromosomal pathogenicity island composed of 41 core genes organized into five operons (Elliott et al., 1998). It encodes (i) intimin, an outer membrane adhesive protein encoded by the eae gene, (ii) a type III secretion system (T3SS), and (iii) the translocated intimin receptor (Tir), as well as translocons, chaperones, regulators and secreted effector proteins that are linked to virulence.

Kleborate includes a module for subtyping of the LEE pathogenicity island. Details of the LEE subtypes and lineages can be found in this Nature Microbiology paper.

The LEE typing database is based on analysis of >250 LEE-containing E. coli genomes and includes 7 loci: eae (intimin), tir, espA, espB, espD, espH and espZ. The data are provided as an MLST-style database, in which combinations of alleles are assigned to a LEE subtype, to facilitate a common nomenclature for LEE subtypes. Each sequence in the database represents a cluster of closely related alleles that have been assigned to the same locus type. The LEE scheme includes three distinct lineages: Lineage 1 consists of LEE subtypes 1-2; Lineage 2 consists of LEE subtypes 3-8; Lineage 3 consists of LEE subtypes 9-30.
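The subtype-to-lineage relationship described above can be written as a small lookup. This is only a sketch of the stated ranges; the function name is hypothetical and Kleborate reports the lineage itself in the LEE_lineage column.

```python
def lee_lineage(subtype):
    """Map a LEE subtype number (1-30) to its lineage (1, 2 or 3)."""
    if 1 <= subtype <= 2:
        return 1
    if 3 <= subtype <= 8:
        return 2
    if 9 <= subtype <= 30:
        return 3
    raise ValueError(f"LEE subtype out of range: {subtype}")


for st in (2, 8, 30):
    print(st, lee_lineage(st))
```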

The reference sequences and MLST-style profile definitions are included in the /data directory of this module.

Parameters#

--escherichia__mlst_LEE_min_identity

Minimum alignment percent identity for escherichia__mlst_LEE. Default: 90.0

--escherichia__mlst_LEE_min_coverage

Minimum alignment percent coverage for escherichia__mlst_LEE. Default: 80.0

--escherichia__mlst_LEE_mlst_required_exact_matches

Minimum number of exact allele matches required to assign an ST. Default: 3

E. coli LEE MLST outputs#

The output of the E. coli LEE MLST module includes the following columns:

LEE_ST

Assigned LEE sequence type.

LEE_lineage

Lineage associated with the LEE ST.

LEE_eae, LEE_tir, LEE_espA, LEE_espB, LEE_espD, LEE_espH, LEE_espZ

Allele numbers for each LEE locus.

Additional Notes#

  • Kleborate attempts to report the closest matching ST if an exact match is not found.

  • Imprecise allele matches are indicated with a *.

  • Imprecise ST calls are indicated with -nLV, where n indicates the number of loci that disagree with the ST reported. For example, ST10-3LV indicates a three-locus variant of ST10 (i.e. 4/7 loci match ST10).

Stxtyper#

-m escherichia__stxtyper

Shiga toxins (Stxs) are key virulence factors of Shiga toxin-producing Escherichia coli (STEC). They are also found in Shigella dysenteriae serotype 1. Stxs belong to the AB-type toxin family and are divided into two antigenically distinct groups: Stx1 and Stx2. Each group contains several variants/subtypes: six for Stx1 (a, b, c, d, e, f) and seven for Stx2 (a, b, c, d, e, f, and g) [Yano et al., 2023, Melton-Celsa, 2014]. These toxins are encoded by lysogenic bacteriophages (Stx phages), and STEC strains may produce either a single Stx subtype or a combination of subtypes.

This module will run StxTyper to determine the stx type. See the StxTyper documentation for more details of how it works.

StxTyper Outputs#

StxTyper results are output in the following columns:

Column Name

Description

Stx_type

The Shiga toxin type. If the operon is complete, the subtype is reported (e.g., stx1a). If the operon is incomplete or ambiguous, a broader designation is used: stx1, stx2, or simply stx if the algorithm cannot resolve it further.

operon

Status of the detected operon. Possible values: COMPLETE – Full operon found. PARTIAL – Operon incomplete. PARTIAL_CONTIG_END – Partial operon likely truncated at a contig boundary. EXTENDED – Coding sequence extends beyond the reference stop codon for one or both subunits. INTERNAL_STOP – A subunit contains a nonsense mutation. FRAMESHIFT – Indel detected in the coding sequence. AMBIGUOUS – Ambiguous base(s) found in the sequence. COMPLETE_NOVEL – Full-length operon that cannot be typed.

identity

Percent identity for both A and B subunits.

target_start

Start position of the alignment.

target_stop

End position of the alignment.

target_strand

Strand orientation of the target sequence.

A_reference

Closest reference protein for the A subunit.

A_identity

Percent identity to the reference for the A subunit.

A_reference_subtype

Subtype assigned to the reference sequence for the A subunit.

A_coverage

Percentage of the A subunit reference sequence covered by the alignment.

B_reference

Closest reference protein for the B subunit.

B_reference_subtype

Subtype assigned to the reference sequence for the B subunit.

B_identity

Percent identity to the reference for the B subunit.

B_coverage

Percentage of the B subunit reference sequence covered by the alignment.
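When summarising results, it can help to separate subtype-level calls from the broader designations described for the Stx_type column. The sketch below classifies a single Stx_type value; the function name and the returned labels (`subtype`, `group`, `unresolved`, `unknown`) are invented for this example, and it assumes subtypes are written as `stx1` or `stx2` followed by one lowercase letter.

```python
import re


def stx_resolution(stx_type):
    """Classify an Stx_type value by how specifically it was typed."""
    value = stx_type.strip()
    if re.fullmatch(r"stx[12][a-z]", value):
        return "subtype"      # e.g. stx1a: operon typed to subtype level
    if value in ("stx1", "stx2"):
        return "group"        # only the Stx1/Stx2 group could be assigned
    if value == "stx":
        return "unresolved"   # Shiga toxin detected but not resolved further
    return "unknown"          # anything else, e.g. an empty placeholder


for call in ("stx1a", "stx2", "stx"):
    print(call, stx_resolution(call))
```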

E. coli O:H serotyping#

-m escherichia__ectyper

E. coli serotypes are defined by combinations of O (lipopolysaccharide) and H (flagellar) antigens. Currently, ~183 O-groups and 53 H-types have been defined serologically (Ørskov and Ørskov, 1984).

O-antigen#

The O-antigen is an integral component of the lipopolysaccharide (LPS) found in the bacterial outer membrane. LPS comprises three components: lipid A, a core oligosaccharide, and the O-specific polysaccharide chain (O-antigen). The O-antigen is highly variable, consisting of 10 to 25 repeating oligosaccharide units, each containing two to seven sugar residues (Liu et al., 2020). The genes responsible for O-antigen synthesis are usually present as a gene cluster located between the two chromosomal housekeeping genes galF and gnd/ugd (Iguchi et al., 2014). The major pathways involved in the synthesis, assembly and transport of the O-antigen are the Wzx/Wzy-dependent pathway, encoded by the wzx (O-antigen flippase) and wzy (O-antigen polymerase) genes, and the ABC transporter pathway, encoded by wzm and wzt. These genes are ideal biomarkers for predicting O-antigen types.

H antigens#

H (flagellar) antigens are surface structures composed of repeating subunits of the protein flagellin, and they facilitate bacterial motility. These antigens are numbered from H1 to H56 (H13, H22, and H50 are not used) and are distinct from the O and K antigens. Flagellin is encoded by the chromosomal fliC gene or by its homologues (non-fliC flagellin-coding genes such as flkA, fllA, flmA, and flnA). Of the 53 recognised H antigen types, 44 are conferred by expression of the fliC gene; the remaining 9 are encoded by non-fliC flagellin genes. Specifically, H3, H35, H36, H47, and H53 are encoded by flkA, H44 and H55 by fllA, H54 by flmA, and H17 by flnA.
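The gene assignments listed above can be captured in a lookup table, which also makes the 44 fliC / 9 non-fliC split easy to check. This is a sketch built only from the numbers in this section; the names `NON_FLIC_GENES`, `UNUSED_H_NUMBERS` and `flagellin_gene` are hypothetical and not part of ECTyper or Kleborate.

```python
# Non-fliC flagellin genes and the H types they encode, as listed above.
NON_FLIC_GENES = {
    "flkA": (3, 35, 36, 47, 53),
    "fllA": (44, 55),
    "flmA": (54,),
    "flnA": (17,),
}
# H numbers in the range H1-H56 that are not used.
UNUSED_H_NUMBERS = {13, 22, 50}

# Invert the table: H number -> encoding gene.
H_TO_GENE = {n: gene for gene, numbers in NON_FLIC_GENES.items() for n in numbers}


def flagellin_gene(h_number):
    """Return the flagellin gene encoding a given H type (e.g. 7 for H7)."""
    if not 1 <= h_number <= 56 or h_number in UNUSED_H_NUMBERS:
        raise ValueError(f"H{h_number} is not a defined H type")
    return H_TO_GENE.get(h_number, "fliC")


valid = [n for n in range(1, 57) if n not in UNUSED_H_NUMBERS]
flic_count = sum(1 for n in valid if flagellin_gene(n) == "fliC")
print(len(valid), flic_count, len(H_TO_GENE))
```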

Kleborate uses ECTyper for in silico serotyping. See the ECTyper paper for more details.

Outputs#

The output of the ECTyper module includes the following columns:

O-type

Predicted O antigen.

H-type

Predicted H antigen.

Serotype

Combined prediction of O and H antigens.

QC

Quality control values summarising the overall confidence of the serotype prediction.

Evidence

Total number of alleles used to call both O and H antigens.

GeneScores

ECTyper gene scores for O and H antigens, ranging from 0 to 1.

AllelesKeys

Best-matching allele keys from the ECTyper database used for serotype assignment.

GeneIdentities(%)

Percent identity values of the query alleles.

GeneCoverages(%)

Percent coverage values for the query alleles.

GeneLengths

Gene lengths (in base pairs) of the query alleles.

Warnings

Additional messages related to QC status or other issues affecting serotype prediction.

ClermonTyping#

-m escherichia__ezclermont

The Escherichia genus comprises several clades, including Escherichia albertii, E. fergusonii, five cryptic Escherichia clades (I–V) and E. coli sensu stricto. Within E. coli, strains can be further divided into seven main phylogroups: A, B1, B2, C, D, E and F.

Kleborate assigns genomes to these phylogroups and clades using the EzClermont tool, which is based on the logic of the in vitro PCR assay.

Parameters#

--escherichia__ezclermont_min_length

Minimum contig length to consider. Default: 500

Outputs#

Clermont_type

Assigned phylogroup or clade.

Clermont_profile

Presence or absence pattern of PCR products.

Escherichia AMR#

-m escherichia__amr

This module screens input genomes for acquired antimicrobial resistance genes and known resistance-associated point mutations using the AMRFinderPlus tool. Identified determinants are grouped by drug class.

AMR parameters#

--organism

Used to screen for point mutations in species-specific resistance markers.

-t, --threads

Number of threads to use for alignment.

AMR outputs#

Results of the Escherichia AMR module are grouped by drug class:

Aminoglycoside

Aminoglycoside resistance genes.

Fluoroquinolone

Fluoroquinolone resistance genes.

Fosfomycin

Fosfomycin resistance genes.

Sulfonamide

Sulfonamide resistance genes.

Tetracycline

Tetracycline resistance genes.

Glycopeptide

Glycopeptide resistance genes.

Colistin

Colistin resistance genes.

Phenicol

Phenicol resistance genes.

Macrolide

Macrolide resistance genes.

Rifamycin

Rifamycin resistance genes.

Trimethoprim

Trimethoprim resistance genes.

BetaLactam

Beta-lactamase genes.

Carbapenem

Carbapenemase genes.

Cephalosporin

Third-generation cephalosporin resistance genes.

Methicillin

Methicillin resistance genes.

Other Classes

Resistance genes in other antimicrobial categories.