
The structures of Escherichia coli O-polysaccharide antigens

Roland Stenutz, Andrej Weintraub, Göran Widmalm
DOI: http://dx.doi.org/10.1111/j.1574-6976.2006.00016.x 382-403 First published online: 1 May 2006


Escherichia coli is usually a non-pathogenic member of the human colonic flora. However, certain strains have acquired virulence factors and may cause a variety of infections in humans and in animals. There are three clinical syndromes caused by E. coli: (i) sepsis/meningitis; (ii) urinary tract infection and (iii) diarrhoea. Furthermore, the E. coli causing diarrhoea are divided into different ‘pathotypes’ depending on the type of disease, i.e. (i) enterotoxigenic; (ii) enteropathogenic; (iii) enteroinvasive; (iv) enterohaemorrhagic; (v) enteroaggregative and (vi) diffusely adherent. The serotyping of E. coli based on the somatic (O), flagellar (H) and capsular polysaccharide (K) antigens is used in epidemiology. The different antigens may be unique for a particular serogroup or antigenic determinants may be shared, resulting in cross-reactions with other serogroups of E. coli or even with other members of the family Enterobacteriaceae. To establish the uniqueness of a particular serogroup or to identify the presence of common epitopes, a database of the structures of O-antigenic polysaccharides has been created. The E. coli database (ECODAB) contains structures, nuclear magnetic resonance chemical shifts and to some extent cross-reactivity relationships. All fields are searchable. A ranking is produced based on similarity, which facilitates rapid identification of strains that are difficult to serotype by classical agglutination methods, provided that the corresponding structure is known. In addition, results pertinent to the biosynthesis of the repeating units of O-antigens are discussed. The ECODAB is accessible to the scientific community at http://www.casper.organ.su.se/ECODAB/.

  • Enterobacteriaceae
  • serotype
  • O-antigen
  • structure
  • NMR
  • database


Escherichia coli is the type species of the genus Escherichia, which contains mostly motile Gram-negative bacilli that fall within the family Enterobacteriaceae. It is the predominant facultative anaerobe of the human colonic flora. The organism typically colonizes the infant gastrointestinal tract within hours after birth, and E. coli and the host derive mutual benefit for the rest of the host's life (Kaper, 2004). However, several E. coli clones have acquired specific virulence factors which increase their ability to adapt to new niches and allow them to cause a broad spectrum of diseases. Three general clinical syndromes can result from infection with pathogenic E. coli strains: enteric/diarrhoeal disease; urinary tract infection; and sepsis/meningitis (Nataro & Kaper, 1998). As long as these bacteria do not acquire genetic elements encoding virulence factors, they remain benign commensals. Strains that acquire bacteriophage or plasmid DNA encoding enterotoxins or invasion factors become virulent. Among the E. coli causing intestinal diseases, there are six well-described pathotypes: enteropathogenic E. coli (EPEC), enterotoxigenic E. coli (ETEC), enteroinvasive E. coli (EIEC), enterohaemorrhagic E. coli (EHEC), enteroaggregative E. coli (EAEC) and diffusely adherent E. coli (DAEC) (Nataro & Kaper, 1998). These pathotypes have virulence attributes that help bacteria to cause diseases by different mechanisms.

Enteric/diarrhoeal Escherichia coli

Enteropathogenic Escherichia coli (EPEC)

Enteropathogenic Escherichia coli was the first pathotype of Escherichia coli to be described. Large outbreaks of infant diarrhoea in the UK led Bray, in 1945, to describe a group of serologically distinct E. coli strains that were isolated from children with diarrhoea but not from healthy children (Kaper, 2004). The hallmark of infections due to EPEC is the attaching-and-effacing histopathology, which can be observed in intestinal biopsy specimens from patients or infected animals (Nataro & Kaper, 1998). The most prevalent serogroups within this group of E. coli are: O18ac, O20, O25, O26, O44, O55, O86, O91, O111, O114, O119, O125ac, O126, O127, O128, O142 and O158 (Nataro & Kaper, 1998).

Enteropathogenic Escherichia coli infection is primarily a disease of infants younger than 2 years (Nataro & Kaper, 1998). EPEC primarily causes acute diarrhoea, although many cases of persistent EPEC diarrhoea have been reported (Nataro & Kaper, 1998; Scaletsky, 1996). In addition to watery diarrhoea, vomiting and low-grade fever are common symptoms of EPEC infection. EPEC plays a more important role in developing countries where it is the foremost cause of diarrhoea. Many case-control studies have found EPEC to be more frequently isolated from children with diarrhoea than from the controls. Studies in Brazil, Mexico, and South Africa have shown that 30–40% of infant diarrhoea can be attributed to EPEC (Robins-Browne, 1980; Cravioto, 1988, 1990; Gomes, 1989, 1991). The pathogenesis of EPEC has recently been reviewed from a historical point of view; although the pathotype was described in the 1940s, the exact mechanism of the disease is not completely understood (Chen & Frankel, 2005).

Enterotoxigenic Escherichia coli (ETEC)

Enterotoxigenic Escherichia coli is a common cause of infectious diarrhoea (Black, 1993), especially in tropical climates, where uncontaminated water is not readily available. Most of the illnesses, in terms of both numbers of cases and severity of symptoms, occur in infants and young children after weaning. This pathogen may express heat-labile and/or heat-stable toxins. Heat-labile toxins are a class of enterotoxins that are closely related in structure and function to cholera enterotoxin, which is expressed by Vibrio cholerae O1 and O139 (Sixma, 1993). The genes encoding heat-labile and heat-stable toxins are carried on plasmids. ETEC colonizes the surface of the small bowel mucosa and elaborates enterotoxins, which give rise to intestinal secretion. Colonization is mediated by one or more proteinaceous fimbrial or fimbrillar adhesins termed colonization factor antigens (CFA) (Kaper, 2004). A single plasmid often carries a toxin and CFA, for example, heat-stable toxin and CFA/I (Reis, 1980; McConnell, 1981; Murray, 1983), heat-labile and heat-stable toxins and CFA/II (Penaranda, 1983; Smith, 1983), and heat-stable toxin and CFA/IV (Thomas, 1987). The clinical features of ETEC diarrhoea are consistent with the pathogenic mechanism of ETEC enterotoxins. ETEC diarrhoea may be mild, brief, and self-limiting or may be as severe as that seen in V. cholerae infection (Levine, 1977; Wolf, 1997). The percentage of ETEC in children with diarrhoea varies from 10% to 30% (Albert, 1992; Mangia, 1993; Hoque, 1994; Flores Abuxapqui, 1999). Several studies suggest that 20–60% of travellers from developed countries experience diarrhoea when visiting areas where ETEC infection is endemic; 20–40% of the cases are due to ETEC (Black, 1990; Arduino & DuPont, 1993; DuPont & Ericsson, 1993). The most common ETEC serogroups are: O6, O8, O11, O15, O20, O25, O27, O78, O128, O148, O149, O159 and O173.

Enteroinvasive Escherichia coli (EIEC)

Enteroinvasive Escherichia coli is a pathogenic form of E. coli that can cause dysentery (Nataro & Kaper, 1998). EIEC strains are biochemically, genetically and pathogenically closely related to Shigella spp. The precise pathogenic scheme of EIEC has yet to be elucidated. However, pathogenesis studies of EIEC suggest that its pathogenic features are virtually identical to those of Shigella spp. (Goldberg & Sansonetti, 1993; Parsot & Sansonetti, 1996). Genes necessary for invasiveness are carried on a 120-MDa plasmid in Shigella sonnei and a 140-MDa plasmid in other Shigella species and in EIEC (Baudry, 1987; Small & Falkow, 1988; Sasakawa, 1992). EIEC penetrates the intestinal mucosa, predominantly that lining the large intestine, to cause inflammation and mucosal ulceration that are characteristic of bacillary dysentery.

The most severe manifestation of infection with Shigella spp. and EIEC is bacillary dysentery, a syndrome characterized by frequent small-volume stools with blood and mucus. The disease is responsible for a substantial proportion of acute diarrhoeal diseases worldwide. However, most persons infected with Shigella spp. or EIEC experience watery diarrhoea that may or may not be followed by dysentery (Snyder, 1984; Nataro, 1998; Taylor, 1988). In most cases, EIEC elicits watery diarrhoea that is indistinguishable from that caused by other E. coli pathotypes (Nataro, 1998). EIEC can cause outbreaks of gastroenteritis. In sporadic cases, EIEC may be misidentified as Shigella spp. or non-pathogenic E. coli strains. EIEC outbreaks are usually food-borne or waterborne (Nataro, 1998). The most common EIEC serogroups are: O28ac, O29, O112ac, O124, O136, O143, O144, O152, O159, O164 and O167.

Enterohaemorrhagic Escherichia coli (EHEC)

Enterohaemorrhagic Escherichia coli is an etiological agent of diarrhoea with life-threatening complications. EHEC belongs to a group of E. coli called VTEC (‘verotoxigenic E. coli’ or ‘Vero cytotoxin-producing E. coli’) or STEC (‘Shiga toxin-producing E. coli’), formerly SLTEC (‘Shiga-like toxin producing E. coli’). It is believed that this pathotype adheres to the colon and distal small intestine; however, typical lesions have not been demonstrated (Kehl, 2002). The best-characterized adherence phenotype is the intimate or attaching and effacing adherence mediated by the eaeA gene. STEC isolates that possess the eaeA gene are capable of producing diarrhoea. However, the pathological lesions associated with haemorrhagic colitis and haemolytic uraemic syndrome are due to the action of Shiga toxin (Stx) on endothelial cells. The term ‘enterohaemorrhagic E. coli’ (EHEC) was originally coined to denote strains that cause haemorrhagic colitis and haemolytic uraemic syndrome, express Stx, cause attaching-and-effacing lesions on epithelial cells, and possess a c. 60-MDa plasmid (Levine & Edelman, 1984; Levine, 1987). Thus, EHEC denotes a subset of STEC. Whereas not all STEC strains are believed to be pathogens, all EHEC strains by the above definition are considered to be pathogens. EHEC can cause nonbloody diarrhoea, bloody diarrhoea, and haemolytic uraemic syndrome in all age groups, but the young and the elderly are the most susceptible. The most notorious E. coli serotype associated with EHEC is O157:H7, which has been the cause of several large outbreaks of disease in North America, Europe and Japan (Boyce, 1995; Grimm, 1995; Kaper, 1998; Ozeki, 2003; Ezawa, 2004). The most common EHEC serogroups are: O4, O5, O16, O26, O46, O48, O55, O91, O98, O111ab, O113, O117, O118, O119, O125, O126, O128, O145, O157 and O172. Recently, several new EHEC serogroups have been described: O176, O177, O178, O179, O180 and O181 (Scheutz, 2004). In addition, many of the EHEC serogroups are also identified as EPEC.

Enteroaggregative Escherichia coli (EAEC)

Enteroaggregative Escherichia coli is defined as E. coli that do not secrete heat-labile or heat-stable enterotoxins and adhere to HEp-2 cells in an aggregative pattern (Nataro & Kaper, 1998; Nataro, 1998). The basic strategy of EAEC seems to comprise colonization of the intestinal mucosa, probably predominantly that of the colon, followed by secretion of enterotoxins and cytotoxins (Nataro, 1998). Studies on human intestinal specimens indicate that EAEC induces mild, but significant, mucosal damage (Hicks, 1996). The clinical features of EAEC diarrhoea are increasingly well defined in outbreaks, sporadic cases and the volunteer model. A growing number of studies have supported the association of EAEC with diarrhoea in developing countries, most prominently in association with persistent diarrhoea (Bhan, 1989ac; Fang, 1995; Lima, 1992). Previous studies in children less than 5 years of age, all with diarrhoea or acute diarrhoea, have shown a significant difference in the EAEC prevalence compared to the controls (Nataro, 1987; Cravioto, 1991; Bhatnagar, 1993; Bouzari, 1994; Gonzalez, 1997). The increasing number of such reports and the rising proportion of diarrhoeal cases in which EAEC is implicated suggest that this pathotype is an important emerging agent of paediatric diarrhoea. The serogroups that have been identified within the EAEC group are O3, O7, O15, O44, O77, O86, O111, O126 and O127.

Diffusely adherent Escherichia coli (DAEC)

Diffusely adherent Escherichia coli is a category of E. coli that produces a diffuse adherence pattern in the HEp-2 cell assay (Nataro, 1998). Little is known about the pathogenesis of DAEC. A surface fimbria that mediates the diffuse adherence phenotype has been cloned and characterized (Bilge, 1993a, b, 1989; Kerneis, 1991). The gene encoding the fimbria can be found on either the bacterial chromosome or a plasmid. Too few epidemiological and clinical studies have been carried out to describe adequately the epidemiology and clinical aspects of diarrhoea caused by DAEC. In one study, the patients with DAEC had watery diarrhoea without blood and faecal leukocytes (Poitrineau, 1995). The association of DAEC with diarrhoea has been shown in some studies (Giron, 1991; Jallat, 1993; Levine, 1993) but not in others (Gunzburg, 1993; Germani, 1996; Scaletsky, 2002).

Urinary tract infections

Uropathogenic Escherichia coli (UPEC)

The urinary tract is among the most common sites of bacterial infection and Escherichia coli is by far the most common infecting agent at this site. The subset of E. coli that causes uncomplicated cystitis and acute pyelonephritis is distinct from the commensal E. coli strains that make up most of the E. coli populating the lower colon of humans. E. coli from a small number of O serogroups – O4, O6, O14, O22, O75 and O83 – cause 75% of these urinary tract infections. Furthermore, they have phenotypes that are epidemiologically associated with cystitis and acute pyelonephritis in the normal urinary tract. Clonal groups and epidemic strains that are associated with urinary tract infections have been identified (Phillips, 1988; Manges, 2001). Although many urinary tract infection isolates seem to be clonal, there is no single phenotypic profile that causes urinary tract infections. Specific adhesins, including P (Pap), type 1 and other fimbriae, seem to aid in colonization (Phillips, 1988; Nowicki, 1989; Johnson, 1991; Manges, 2001).


Meningitis/sepsis associated Escherichia coli (MNEC)

This Escherichia coli pathotype is the most common cause of Gram-negative neonatal meningitis, with a case fatality rate of 15–40% and severe neurological defects in many of the survivors (Unhanand, 1993; Dawson, 1999). A majority (80%) of the E. coli strains that cause meningitis possess the K1 capsular polysaccharide.

Other potential Escherichia coli pathotypes

Several other potential E. coli pathotypes have been described, but none of these is as well established as the pathotypes described above. Among the most intriguing of these potential pathogens are strains of E. coli that are associated with Crohn's disease and are known as adherent-invasive E. coli (Darfeuille-Michaud, 2002). An inflammatory process and necrosis of the intestinal epithelium are characteristics of necrotizing enterocolitis (NEC), an important cause of mortality and long-term morbidity in pre-term infants. Necrotoxic E. coli (NTEC) have been associated with disease in both humans and animals (De Rycke, 1999). The relationships among the NEC-associated strains, NTEC and strains associated with Crohn's disease have not yet been clearly established. A poorly characterized subset of E. coli infections outside the gastrointestinal or urinary tract is a group implicated in intra-abdominal infections, including abscesses, wounds, appendicitis and peritonitis.

Typing of Escherichia coli

Several assays are available to identify the different categories of diarrhoeagenic Escherichia coli. Isolation and identification of E. coli based on biochemical properties are widely used in most microbiological laboratories as they do not require sophisticated equipment or complicated protocols. E. coli can be easily recovered from clinical samples on general or selective media at 37°C under aerobic conditions. E. coli are usually identified by biochemical reactions. In general, the different pathotypes cannot be identified based on biochemical criteria alone, as in most cases they are indistinguishable from non-pathogenic E. coli.

In addition to the biochemical tests, serology is commonly used. It is based on Kauffmann's scheme for the serologic classification of E. coli, which has been extensively reviewed elsewhere (Orskov & Orskov, 1984; Ewing, 1986). Serotyping of E. coli is performed on the basis of their O (somatic), H (flagellar), and K (capsular) surface antigen profile. More than 180 O, 60 H, and 80 K antigens have been proposed (Whitfield & Roberts, 1999; Robins-Browne & Hartland, 2002). Each O antigen defines a serogroup. E. coli of specific serogroups can be associated with certain clinical syndromes (Nataro & Kaper, 1998; Campos, 2004). A specific combination of O and H antigens defines the ‘serotype’ of an isolate. One pathotype can comprise several serogroups and one serogroup may belong to several pathotypes and even to non-pathogenic E. coli (Nataro & Kaper, 1998; Campos, 2004). Due to the limited sensitivity and specificity, and the various combinations of antigens, serotyping is tedious and expensive and is performed reliably only by a small number of reference laboratories.
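The nomenclature described above can be illustrated with a minimal sketch that splits a serotype string such as O157:H7 into its antigen fields and reads the serogroup from the O antigen alone. The function names `parse_serotype` and `serogroup`, and the accepted label pattern, are assumptions made for this example and do not correspond to any established typing tool.

```python
import re

# Hypothetical label pattern: a letter (O, H or K), a number and an optional
# lowercase subgroup suffix, e.g. "O157", "H7", "O18ac".
ANTIGEN = re.compile(r"^([OHK])(\d+)([a-z]*)$")

def parse_serotype(text):
    """Split 'O157:H7' into {'O': 'O157', 'H': 'H7'}; unmatched parts are skipped."""
    fields = {}
    for part in text.split(":"):
        label = part.strip()
        match = ANTIGEN.match(label)
        if match is None:
            # e.g. an absent antigen written as 'K-'; ignored in this sketch
            continue
        fields[match.group(1)] = label
    return fields

def serogroup(text):
    """The serogroup is defined by the O antigen alone."""
    return parse_serotype(text).get("O")
```

With this sketch, `serogroup("O157:H7")` yields the O label only, whereas the serotype is represented by the O and H fields together.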

Among the most useful methods to diagnose different pathotypes of E. coli are phenotypic assays, which are based on the virulence characteristics. Of them, the HEp-2 adherence assay is useful to identify the adherence patterns of diarrhoeagenic E. coli. It remains the ‘gold standard’ for the diagnosis of EAEC and DAEC (Vial, 1990; Nataro, 1998; Donnenberg & Nataro, 1995). Identification of ETEC has relied on the detection of heat-labile and/or heat-stable enterotoxins. The classical phenotypic assay for EIEC identification is the Sereny (guinea pig keratoconjunctivitis) test, which correlates with the ability of the strain to invade epithelial cells and spread from cell to cell (Kopecko, 1994).

Molecular genetic methods remain the most popular and most reliable techniques for differentiating pathogenic strains from non-pathogenic members. The assays are based on nucleic acid probes and PCR and have been extensively used. The advantages of PCR include its high sensitivity in detection of target templates and both rapid and reliable results due to its high specificity (Schultsz, 1994; Ramotar, 1995; Stacy-Phipps, 1995; Kai, 2000; Dutta, 2001; Pulz, 2003; Gioffre, 2004).


Shigellae are Gram-negative, non-motile, facultative anaerobic rods. Shigella are differentiated from the closely related E. coli on the basis of pathogenicity, physiology (failure to ferment lactose or decarboxylate lysine) and serology (Samuel, 1996). The genus is divided into four species with multiple serotypes: Shigella dysenteriae (12 serotypes), Shigella flexneri (6 serotypes), Shigella boydii (18 serotypes) and S. sonnei (1 serotype) (Samuel, 1996). Shigella enterotoxin 1 (ShET1) is found in S. flexneri 2a, but it is only occasionally found in other serotypes. In contrast, ShET2 is more widespread and detectable in 80% of Shigella representing all four species. Shigella dysenteriae serotype 1 expresses Shiga toxin, an extremely potent, ricin-like cytotoxin that inhibits protein synthesis in susceptible mammalian cells. This toxin also has enterotoxic activity in rabbit ileal loops, but its role in human diarrhoea is unclear. Shiga toxin is associated with haemolytic uraemic syndrome, a complication of infections with S. dysenteriae serotype 1. Closely related toxins are expressed by EHEC strains including the potentially lethal, food-borne O157:H7 serotype (Samuel, 1996).

The four Shigella species cause varying degrees of dysentery, characterized by fever, abdominal cramps and diarrhoea containing blood and mucus. Shigellosis is endemic in developing countries where sanitation is poor. In developed countries, single-source, food or water-borne outbreaks occur sporadically, and pockets of endemic shigellosis can be found in institutions and in remote areas with substandard sanitary facilities. Isolation and identification of Shigella spp. is usually based on culture, biochemical tests, and serotyping. Molecular methods can be used to detect certain target genes.


Lipopolysaccharide (LPS), also known as endotoxin, is anchored in the outer membrane of the Gram-negative bacterium. It consists of three parts: lipid A, which is the toxic component; the core region, which can be divided into an inner and an outer part; and finally the O-antigen polysaccharide, which is specific for each serogroup (Fig. 1) (Brade, 1999). The sugar residues in lipid A and the core region are decorated to a varying extent with phosphate groups or phosphodiester-linked derivatives, which ensures microheterogeneity in each strain. The lipid A part is highly conserved in Escherichia coli. The core, however, contains five different basic structures, denoted R1 to R4 and K12. The O-polysaccharide is linked to a sugar in the outer core. The O-antigen usually consists of 10–25 repeating units containing two to seven sugar residues. Thus, the molecular mass of the LPS present in smooth strains will be up to ∼25 kDa.


Schematic structure of an enterobacterial lipopolysaccharide molecule. The lipids are depicted by curved lines and the sugar residues are as follows: GlcN (▪), Kdo (▾), heptose (▴), hexose (◆), and O-antigen components (•), most commonly hexose.

The present scheme of E. coli O-antigens comprises O1 to O181. The following O groups have been removed: O31, O47, O67, O72, O93, O94 and O122. The O93 strain, however, will probably be re-introduced (Scheutz, 2004). Escherichia coli strain 73-1 has been typed as E. coli O73:K−:H33 and strain 62D1 was suggested to belong to the species Erwinia herbicola (Scheutz, 2004). In several cases the O-antigens of E. coli are identical or nearly identical to those of other bacteria (Table 1).
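The arithmetic of the scheme can be made explicit with a short sketch that lists the O-group numbers remaining after the removals named above. The helper `current_o_groups` is a hypothetical name introduced here; the sketch counts group numbers only and ignores subgroups such as O18ac.

```python
# O groups run from O1 to O181; the text lists seven that have been removed.
REMOVED = {31, 47, 67, 72, 93, 94, 122}

def current_o_groups(first=1, last=181, removed=REMOVED):
    """Return the O-group numbers still in the scheme, in ascending order."""
    return [n for n in range(first, last + 1) if n not in removed]

groups = current_o_groups()
print(len(groups))  # 181 - 7 = 174
```

If O93 were re-introduced, passing `removed=REMOVED - {93}` would give the updated list.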

Table 1.

Escherichia coli O-antigens identical or nearly identical to other bacterial polysaccharides

Structural determination of O-antigens from strains that are difficult to type or of nontyped strains

The serotyping of clinical isolates of Escherichia coli is under constant development and it is usually possible to identify the isolated strains. In some cases, however, the strain cannot be properly characterized with the available monospecific polyclonal antisera, either because of auto-agglutination or because the isolated E. coli strain is novel and appropriate antisera have not been raised. Under such circumstances it is of great interest to have a procedure that could rapidly indicate, independently of immunological tests, the serotype of the isolated strain.

Since immunochemical tests require cultivation of the strain, we obtain sufficient material for analysis by other methods.

Nuclear magnetic resonance spectroscopy is a powerful tool that is used for studying biomolecules, including bacterial polysaccharides. In structural studies of these polysaccharides, NMR signals from the polymers may be observed in preparations of live bacteria or in the extracted LPS. In the structural determination of the O-antigen polysaccharide part of an LPS, the O-polysaccharide is often released from the lipid A part by treatment with dilute acid and purified by gel permeation chromatography. These steps are laborious and should be omitted, particularly if only typing of the strain is required.

As it is easy to perform the phenol-water extraction from the cultivated bacterial isolates to obtain LPS, we focus on a procedure that can rapidly identify the most probable O-antigenic formula from a crude LPS preparation. A 1H NMR spectrum of the LPS in D2O can be obtained in a few minutes. Such a spectrum contains a number of characteristic signals, even though most of them are not resolved (Fig. 2). To utilize the information contained in a 1H NMR spectrum, a database with the structures of the E. coli O-antigen polysaccharides was implemented. Each structure has published NMR data associated with it as well as described cross-reactivity when present. Links to the original publications are also provided in the web-based implementation. In the following approach we often enter the sugar components of the O-polysaccharide, as some or all of them can be determined in a few hours from a hydrolysate of the polymer by chemical derivatization and analysis with gas-liquid chromatography/mass spectrometry (GLC-MS), high performance liquid chromatography (HPLC) or electrophoresis techniques; for practical reasons, the technique chosen is the one in use in each investigator's laboratory.


1H nuclear magnetic resonance spectrum of the lipopolysaccharide from Escherichia coli strain 97RN in D2O solution.

We will now exemplify the approach by analysis of E. coli isolates that could not be serotyped. Two clinical isolates of E. coli from children with diarrhoea in León, Nicaragua, termed strains 97RN and 121RN, showed identical 1H NMR spectra (cf. Fig. 2) and contained glucose, galactose and glucosamine according to GLC analysis. These sugar components together with selected 1H NMR data were entered into the web-based search interface, which then selects a best fit to the records in the database. The results of this search gave a close match to the O-antigen structure of E. coli O21 (and E. coli strain 105). Further inspection and comparison of NMR data confirmed the identity between the strains. Thus, the procedure rapidly revealed the serogroup of these two strains and no further structural investigation was necessary.
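The kind of best-fit search described above can be sketched as follows. This is not the scoring used by ECODAB, which is not specified here; it is a minimal illustration that assumes a score combining the overlap of sugar components (Jaccard index) with the fraction of query 1H shifts matched within a tolerance. The function names, the equal weighting, the 0.05 ppm tolerance and the two placeholder records are all assumptions, and the numbers in the records are invented for the example only.

```python
def sugar_similarity(query, record):
    """Jaccard index between two sets of sugar component names."""
    query, record = set(query), set(record)
    if not query and not record:
        return 0.0
    return len(query & record) / len(query | record)

def shift_similarity(query_shifts, record_shifts, tol=0.05):
    """Fraction of query 1H shifts (ppm) that have a record shift within tol."""
    if not query_shifts:
        return 0.0
    hits = sum(
        any(abs(q - r) <= tol for r in record_shifts) for q in query_shifts
    )
    return hits / len(query_shifts)

def rank(query_sugars, query_shifts, records):
    """Return (score, name) pairs, best match first (assumed equal weights)."""
    scored = []
    for name, rec in records.items():
        score = (0.5 * sugar_similarity(query_sugars, rec["sugars"])
                 + 0.5 * shift_similarity(query_shifts, rec["shifts"]))
        scored.append((score, name))
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Placeholder records: names and numbers are invented, not database entries.
records = {
    "record_A": {"sugars": {"Glc", "Gal", "GlcN"}, "shifts": [5.10, 4.60, 2.05]},
    "record_B": {"sugars": {"Man", "Rha"}, "shifts": [5.30, 1.30]},
}
best = rank({"Glc", "Gal", "GlcN"}, [5.12, 2.04], records)[0]
```

A ranked list rather than a single hit leaves the final decision to a direct comparison of the NMR data, as was done for strains 97RN and 121RN.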

Biosynthesis considerations

The biosynthesis of an LPS molecule and its transport to the outer membrane of Gram-negative bacteria depend on several complex events taking place at different locations in the bacterium (Raetz & Whitfield, 2002; Samuel & Reeves, 2003). For the synthesis of the O-chain part, two of the three reported pathways are present in Escherichia coli, namely, the Wzy-polymerase-dependent pathway present in most cases and typical for heteropolysaccharides and the ABC-transporter-dependent pathway, typical for homopolymers. Once the nucleotide sugars have been synthesized they can be incorporated into the growing O-chain. In the Wzy-dependent pathway a glycosyl-1-phosphoryl residue is transferred to an undecaprenyl phosphate acceptor to form an undecaprenyl-PP-sugar intermediate. Subsequent transfer of additional sugars to this acceptor results in an undecaprenyl-PP-oligosaccharide intermediate in which the sequence of sugars is related to the biological repeating unit to be formed in the O-chain. Translocation of this intermediate occurs from the cytoplasmic side of the membrane to the periplasmic side in a Wzx-dependent process. The Wzy-dependent polymerization of the O-antigen occurs at the reducing end of the nascent chain being formed, meaning that the O-chain on the undecaprenol-PP carrier is transferred to the most recently synthesized undecaprenol-PP-oligosaccharide. The extent of polymerization, i.e. the chain-length modality, is determined by the Wzz product. The action of the Wzy-polymerase from a linear undecaprenol-PP-oligosaccharide to produce a branched structure with a side-chain offers several possibilities just at this step to produce different structures with regard to anomeric configuration, linkage position and sugar residue.

The ABC-transporter pathway utilizes the β-d-GlcNAc-PP-undecaprenol entity as a primer for the chain elongation taking place on the cytoplasmic side of the membrane. In E. coli O9, a homopolymer of mannose, an adaptor (α-d-Man) is (1→3)-linked to the N-acetylglucosamine residue. Subsequent chain growth occurs by processive glycosyl transfer to the non-reducing terminus. In E. coli O8, also a homopolymer of mannose, the O-chain is terminated by a 3-O-methyl-d-Man residue. Although the sugars are added one by one, sodium dodecyl sulphate-polyacrylamide electrophoresis (SDS-PAGE) analysis of these LPS molecules reveals distributions of distinct bands. It is therefore reasonable to describe, also in this case, the repeating units of the O-chain in the context of biological repeating units. The undecaprenyl-linked polymer depends on Wzm and Wzt for transfer to the periplasmic face of the membrane. For both these pathways the O-chain-PP-undecaprenyl entity is ligated to the Lipid A-core acceptor and subsequently translocated to the outer membrane.
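The difference in the direction of chain growth between the two pathways described above can be shown with a toy model. It is a deliberately simplified sketch: chains are plain lists ordered from the reducing end (index 0, next to the lipid carrier) to the non-reducing end, and the function names and residue labels are assumptions made for illustration only.

```python
def wzy_step(nascent_chain, new_unit):
    """Wzy-dependent growth: the nascent chain is transferred onto the most
    recently made carrier-linked unit, so that unit sits at the reducing end."""
    return [new_unit] + nascent_chain

def abc_step(chain, residue):
    """ABC-transporter-dependent growth: residues are added one at a time
    to the non-reducing terminus."""
    return chain + [residue]

# Polymerase-dependent pathway: three repeating units made in this order.
wzy_chain = []
for label in ["unit1", "unit2", "unit3"]:
    wzy_chain = wzy_step(wzy_chain, label)

# ABC-transporter pathway: a GlcNAc primer extended by mannose residues.
abc_chain = ["GlcNAc"]
for label in ["Man1", "Man2", "Man3"]:
    abc_chain = abc_step(abc_chain, label)
```

In this model the first unit made by the polymerase-dependent route ends up furthest from the carrier, whereas in the ABC-transporter route the primer stays at the reducing end throughout.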

In Shigella flexneri the O-antigens have different structures as a result of acquisition of genetic material from bacteriophages via transduction (Lerouge & Vanderleyden, 2001). The glucosyl residues, present as side-chains in the repeating unit, are proposed to be transferred to the growing O-antigen chain on the periplasmic side of the membrane. A similar pathway could be possible for some of the E. coli O-antigens, as indicated by their substituent sugars and their location within the repeating unit of the polymer (vide infra).

We also note that a gene has been identified for a glucosylphosphate transferase, which then is responsible for the formation of the phosphodiester-linked glycosyl residue within the repeating unit of the O-antigen of E. coli O172 (Guo, 2004). Thus, this finding indicates that the ‘phospho-sugar’ is transferred en bloc in the biosynthesis.

O-antigen repeating units: characteristics and statistics of the structures

In humans, only a handful of different sugar residues are utilized in most glycoconjugates such as glycolipids and glycoproteins (Varki, 1999). In bacteria, however, a large number of different sugars are found and the O-antigens of Escherichia coli contain a great variety of them (Table 2). In addition, a number of unusual sugars are found in these polymers (Scheme 1), including pentoses, deoxyhexoses, lactyl substituted hexoses, heptoses and nonuloses. The number of sugar residues in the O-antigen repeating unit ranges from two to seven and the topology of the repeats may be described as linear, branched or double branched. We have analysed the topology based on the number of sugar residues in the backbone (Table 3). By far the most common topology has four sugars in the backbone, either linear or with a single terminal residue in the side-chain. The 3- and 5-residue backbones are also common, whereas the 2- and 6-residue backbones are only present in a few cases.
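One way to represent the topology classes mentioned above is sketched below. The `RepeatingUnit` class, its field names and the single-letter residue labels are hypothetical and serve only to show how backbone length and branching could be recorded; they are not taken from the database.

```python
from dataclasses import dataclass, field

@dataclass
class RepeatingUnit:
    backbone: list                                    # residues in the main chain
    side_chains: list = field(default_factory=list)   # each side chain is a list of residues

    def residue_count(self):
        """Total number of sugar residues in the repeating unit."""
        return len(self.backbone) + sum(len(chain) for chain in self.side_chains)

    def topology(self):
        """Classify by the number of side chains attached to the backbone."""
        branches = len(self.side_chains)
        if branches == 0:
            return "linear"
        if branches == 1:
            return "branched"
        if branches == 2:
            return "double branched"
        return "other"

# Placeholder residues: a 4-residue backbone with one terminal side-chain residue.
unit = RepeatingUnit(backbone=["A", "B", "C", "D"], side_chains=[["E"]])
```

Grouping such objects by `len(unit.backbone)` and `unit.topology()` would give a tabulation of the kind summarized in the text.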

Table 2.

Abundance of glycosyl residues in Escherichia coli O-antigens

Scheme 1.

Unusual glycosyl residues in Escherichia coli O-antigens.

Table 3.

Topology of the O-antigen repeating units

Each sugar residue is found in either the α- or the β-configuration at the anomeric centre. The common sugars (including ring form) of E. coli O-antigens, viz., d-Glcp, d-GlcpNAc, d-Galp, d-GalpNAc, d-Manp, and l-Rhap are all found with both anomeric configurations. Other sugars, e.g. l-FucpNAc or d-Quip4NAc, have hitherto only been found in one of the anomeric configurations, namely the α- or the β-configuration, respectively. Some of the sugars in the side-chains are present only as nonterminal residues, e.g. d-GlcpNAc, whereas others are only found at a terminal position in the biological repeating unit, e.g. Colp. The unusual groups are then highly accessible and consequently specific for that particular E. coli serogroup.

The O-antigens synthesized by the ABC-transporter-dependent pathway (see above) or herein tentatively assigned to that pathway are homopolymers or have only two sugar residues in the backbone of the repeating unit (Table 4). In 1994 it was shown that in E. coli O7 (Table 5), having a Wzy-dependent pathway, the repeating unit of the O-antigen had an N-acetylglucosamine residue at its reducing end (Alexander & Valvano, 1994) and the authors proposed that this pattern should also be found in other O-antigen structures. By arranging the E. coli O-antigen structures hitherto determined (Tables 5 and 6) with the d-GlcNAc residue at the reducing end one readily observes that this pattern is quite reasonable. In cases when d-GlcNAc is not present in the polymer, d-GalNAc takes its place, in agreement with the observation that WecA can transfer either of the N-acetylhexosamine sugars (Marolda, 2004). In several of the O-antigens both amino sugars are components of the repeating unit. In just two strains, d-FucNAc has been found and is expected to be the sugar at the reducing end of the repeating unit. As noted above, one of these, strain 62D1, was recently identified as a non-E. coli species. In all but a few cases it is possible to identify that the amino sugar at the reducing end is 3-substituted. The exceptions are the O1A, O2 and possibly O149 antigens, where d-GlcNAc is 4-substituted by a β-l-Rhap residue, and O83 and O136, where it is substituted by a β-d-Galp residue, i.e. the structural element is N-acetyl-lactosamine. These results are in good agreement with the few examples in which the biological repeating unit has been determined by NMR spectroscopy, e.g. in semi-rough LPS containing only one repeating unit, as for E. coli O6 (Grozdanov, 2002). The biological repeating unit has also been determined on medium-sized O-antigens with a degree of polymerization of ∼13 for E. coli O126 and ∼10 for E. coli O91 (Larsson, 2004; Lycknert & Widmalm, 2004). The 3-substituted d-GlcNAc residue was present at the reducing end of the repeating unit in these three O-polysaccharides.

Table 4. O-antigens synthesized by the ABC-transporter-dependent pathway

Table 5. O-antigens synthesized by the polymerase-dependent pathway with four or fewer residues in the backbone

Table 6. O-antigens synthesized by the polymerase-dependent pathway with five or six residues in the backbone

Genetic analysis of E. coli O26 and O172 has revealed that the second sugar is added to the d-GlcNAc-PP-undecaprenol carrier by a UDP-l-FucNAc transferase to form an α-(1→3)-linkage (Guo, 2004; D'Souza, 2002). Analysis of the O-antigen structures hitherto determined indicates that in the serogroups O4, O25 and O172 the third sugar to be added is an α-(1→3)-linked glucosyl residue, i.e. the backbone, or part of it, has the following structure: →X)-α-d-Glc-(1→3)-α-l-FucNAc-(1→3)-β-d-GlcNAc(1→, where X represents different linkage positions. Further genetic similarities may be present, e.g. in O4:K52, →2)-α-l-Rha-(1→6)-α-d-Glc-(1→3)-α-l-FucNAc-(1→3)-β-d-GlcNAc(1→, and O26 (e.g. without the glucosyl residue), where the last sugar is an α-linked rhamnosyl residue, which is possibly also the case for O25. The latter strain carries an additional d-Glc residue that forms a substituted branch-point residue. In analogy to the hypothesis described above, close structural relationships are observed between, for example, O6, O17, O44, O58, O77, O78 and O88, having a Man-Man-GlcNAc sequence at the reducing end. Although in E. coli the numbering of serogroups is chronological, at least with newly described strains, with the most recent ones covering O174–O181 (Scheutz, 2004), subgroups are present in some cases based on cross-reactivity, e.g. in O1, O18 and most recently in O5 (Urbina, 2005). This is similar to the Danish serotyping scheme for Streptococcus pneumoniae capsular polysaccharides, which is based on cross-reactivity, in contrast to the American system for which up to almost 100 different CPS serotypes have been described (Tomasz, 2000). In the future one may also type E. coli based on genetic resemblance between the strains, which should then explain both structural and cross-reactivity relationships. Furthermore, other structural similarities such as those of blood-group determinants are present for the O86, O90, O127 and O128 O-antigens, and these strains presumably utilize the concept of molecular mimicry, thereby evading the immune system of the human host (Moran, 1996).
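The shared backbone element described for serogroups O4, O25 and O172 can be searched for mechanically in a linear structure string. The following is only a minimal sketch: the function name glc_linkage_position and the use of a regular expression are illustrative assumptions, not part of ECODAB, and the only structure used is the O4:K52 backbone quoted in the text.

```python
import re
from typing import Optional

# Backbone element quoted in the text. The linkage position X on the glucosyl
# residue is left open and captured as a single digit. The pattern is anchored
# at the end of the string, i.e. at the reducing end of the repeating unit.
MOTIF = re.compile(
    r"→(?P<x>\d)\)-α-d-Glc-\(1→3\)-α-l-FucNAc-\(1→3\)-β-d-GlcNAc\(1→$"
)


def glc_linkage_position(backbone: str) -> Optional[int]:
    """Return X if the backbone ends with the shared motif, else None.

    Hypothetical helper for illustration; it assumes the backbone is written
    in the same linear notation as in the text.
    """
    match = MOTIF.search(backbone)
    return int(match.group("x")) if match else None


# O4:K52 backbone exactly as written in the text.
O4_K52 = "→2)-α-l-Rha-(1→6)-α-d-Glc-(1→3)-α-l-FucNAc-(1→3)-β-d-GlcNAc(1→"

if __name__ == "__main__":
    print(glc_linkage_position(O4_K52))
```

For the O4:K52 backbone the captured linkage position is 6, corresponding to the (1→6)-linked rhamnosyl residue on the glucosyl residue. A backbone lacking the glucosyl residue does not match.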

In some of the E. coli strains the O-antigen structures contain terminal glucosyl or N-acetylglycosamine residues, e.g. in O23A, O139 and O142, as side-chains. Whether these residues are added by a phage-induced glycosyl transferase machinery or by another mechanism is of great interest for future genetic studies as the positioning of the side-chain onto the structure differs and sometimes leads to a doubly branched residue, e.g. in O141. In many cases the repeating unit is formed and structurally determined by the polymerization process, e.g. in O55 and O164, often occurring at the penultimate sugar residue of the linear undecaprenol-PP-oligosaccharide leading to a single sugar residue in the side-chain, e.g. in O35, O113, O152, O159 and O167.

The O-antigen of Shigella boydii type 13 was recently both structurally and genetically characterized (Feng, 2004b). Although this strain is more distantly related to E. coli and other Shigella species, its O-antigen shows a quite close structural resemblance to that of E. coli O172. The linear pentasaccharide of S. boydii type 13 has the following chemical structure: →3)-α-l-QuipNAc-(1→4)-α-d-Glcp-(1→P-4)-α-d-GlcpNAc-(1→3)-α-l-QuipNAc-(1→3)-α-d-GlcpNAc-(1→, in which the 4-linked GlcNAc residue is 6-O-acetylated to ∼ 15%. Based on biosynthetic considerations where an N-acetylglycosamine residue should be present at the reducing end, two possibilities were suggested for the biological repeating unit, i.e. the one above or the frame-shifted one with the 4-linked GlcNAc residue at the reducing end. From the structural results presented in this article we propose that the biological repeating unit of the O-antigen from S. boydii type 13 has a 3-linked GlcNAc residue at the reducing end as presented in the above structure. In both the E. coli O172 and the S. boydii type 13 O-antigens the glucosyl-1-phosphoryl residue is the penultimate one (as presented) in the assembled linear undecaprenol-PP-oligosaccharide. The O-antigens of E. coli O152 and O173 have branched structures with one sugar residue as the side-chain and, most notably, the branching sugar is a glycosyl-1-phosphoryl residue being the penultimate one, suggesting similar biosynthetic pathways. Future detailed investigations will clarify the whole assembly of these E. coli O-antigen units and that of S. boydii type 13.
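The choice between the two possible biological repeating units of S. boydii type 13 can be illustrated by enumerating the cyclic frame shifts of the chemical repeating unit. The sketch below is an illustration under stated assumptions, not the procedure used by the authors: the Residue type and the function names are hypothetical, only d-GlcpNAc and d-GalpNAc are accepted at the reducing end, and a 3-substituted reducing-end residue is preferred, following the pattern described in this review.

```python
from typing import List, NamedTuple, Optional


class Residue(NamedTuple):
    name: str                    # e.g. "α-d-GlcpNAc"
    position: int                # position substituted by the preceding residue
    via_phosphate: bool = False  # True if the incoming linkage is a 1-phosphate


# Chemical repeating unit of S. boydii type 13 as written in the text,
# with the reducing end on the right.
S_BOYDII_13 = [
    Residue("α-l-QuipNAc", 3),
    Residue("α-d-Glcp", 4),
    Residue("α-d-GlcpNAc", 4, via_phosphate=True),
    Residue("α-l-QuipNAc", 3),
    Residue("α-d-GlcpNAc", 3),
]

# Assumption: only these amino sugars are accepted at the reducing end.
REDUCING_END_SUGARS = ("GlcpNAc", "GalpNAc")


def frames(unit: List[Residue]) -> List[List[Residue]]:
    """All cyclic frame shifts of a repeating unit."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def candidate_frames(unit: List[Residue]) -> List[List[Residue]]:
    """Frames with an accepted N-acetylhexosamine at the reducing end."""
    return [f for f in frames(unit) if f[-1].name.endswith(REDUCING_END_SUGARS)]


def preferred_frame(unit: List[Residue]) -> Optional[List[Residue]]:
    """Prefer a 3-substituted reducing-end residue, else any candidate."""
    candidates = candidate_frames(unit)
    three_linked = [f for f in candidates if f[-1].position == 3]
    if three_linked:
        return three_linked[0]
    return candidates[0] if candidates else None


def render(frame: List[Residue]) -> str:
    """Write a frame in the linear notation used in the text."""
    parts = []
    for r in frame:
        link = f"P-{r.position}" if r.via_phosphate else str(r.position)
        parts.append(f"→{link})-{r.name}-(1")
    return "".join(parts) + "→"


if __name__ == "__main__":
    for frame in candidate_frames(S_BOYDII_13):
        print(render(frame))
```

Two frames have a GlcNAc residue at the reducing end, one 3-linked and one 4-linked. The preference rule selects the 3-linked frame, which is the structure as written in the text.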

Concluding remarks

Members of the species Escherichia coli range from completely harmless to life-threatening microorganisms. The differences are based on particular virulence factors that certain strains may have acquired. These factors may be toxins or surface structures that enable the bacterium to adhere to mammalian cells or to evade the immune system. The typing of E. coli is often based on detection of different surface molecules using specific antibodies. The O-antigen present in the lipopolysaccharide is one of the molecules used in serotyping. As of today, more than 180 different O-serotypes have been described but not even half of them have been structurally elucidated.

The E. coli database that has been implemented facilitates more rapid identification of strains that are difficult to type, and suggests similarities to previously determined O-antigens in the case of novel isolates. Analysis of the O-antigen structures revealed that 3-linked N-acetylglucosamine or N-acetylgalactosamine residues should be present at the reducing end of the biological repeating unit, in accordance with NMR spectroscopy and genetic data. In a limited number of cases, a 4-linked N-acetylglucosamine residue is instead observed. The topology with four sugar residues in the backbone of the O-antigen is present in half of the hitherto determined structures. Future structural studies should be combined with genetic analysis of the O-antigen cluster to facilitate insight into structural patterns and biosynthetic pathways. As part of this effort, amino acid sequences of flippases and polymerases are being added as elements of the entries in the database.


This review covers structures reported up to early 2005. In addition, E. coli O-antigen structures have recently been reported for serogroups O178 (Ali, 2005) and O145 (Feng, 2005). It is also noteworthy that phenotypically rough E. coli K12 has been genetically complemented to produce its O-antigen, an O16 variant with cross-reactivity to O17 (Liu & Reeves, 1994; Stevenson, 1994).


This work was supported by grants from the Swedish Research Council and from the Swedish Agency for Research Cooperation with Developing Countries SIDA/SAREC.


  • Editor: Simon Cutting

