
Lateral genetic transfer and the construction of genetic exchange communities

Elizabeth Skippington, Mark A. Ragan
DOI: http://dx.doi.org/10.1111/j.1574-6976.2010.00261.x. Pages 707-735. First published online: 1 September 2011


Lateral genetic transfer (LGT) is a major source of phenotypic innovation among bacteria. Determinants for antibiotic resistance and other adaptive traits can spread rapidly, particularly by conjugative plasmids, but also phages and natural transformation. Each successive step from the uptake of foreign DNA, its genetic recombination and regulatory integration, to its establishment in the host population presents differential barriers and opportunities. The emergence of successive multidrug-resistant strains of Staphylococcus aureus illustrates the ongoing role of LGT in the combinatorial assembly of pathogens. The dynamic interplay among hosts, vectors, DNA elements, combinations of genetic determinants and environments constructs communities of genetic exchange. These relations can be abstracted as a graph, within which an exchange community might correspond to a path, transitively closed set, clique or near-clique. We provide a set-based definition, and review the features of actual genetic exchange communities (GECs), adopting first a knowledge-driven approach based on literature, and then a synoptic data-centric bioinformatic approach. GECs are diverse, but share some common features.

  • antibiotic resistance
  • lateral genetic transfer
  • horizontal genetic transfer
  • genetic exchange communities


It has long been known that phenotypic features can be transmitted between unrelated strains of bacteria. This year marks the one-hundredth anniversary of Schmitt's report that the human paratyphoid bacillus can take on the agglutination properties of the calf paratyphoid bacillus during in vivo passage through a calf (Schmitt, 1911). Although this and other early experiments admit to other possible explanations (Gurney-Dixon, 1919), Griffith (1928) demonstrated that a nonvirulent strain of Streptococcus pneumoniae could be rendered virulent by a heat-stable substance from a virulent strain, and Avery et al. (1944) identified this transforming substance as DNA. The ability of bacteria to accept and express genetic material transmitted not only from parent to offspring (vertical transfer) but also from sources external to the cellular lineage (lateral or horizontal transfer) remains a cornerstone of experimental molecular genetics and biotechnology.

Lateral genetic transfer (LGT) is equally significant outside the laboratory. Since the mid-twentieth century, genetic determinants specifying resistance to successive antimicrobial drugs have been spreading in strongly selective environments, notably among pathogenic bacteria in hospitals, but also in the community and along the commercial food chain, with obvious implications for public health. The spread of antibiotic resistance has become the poster-child of LGT, as it is well documented clinically, relatively well understood at the molecular level, of undeniable societal concern in developing as well as developed countries and easily translated into headlines (superbugs).

Over the last decade, it has become increasingly apparent that LGT is widespread among bacteria and drives metabolic innovation well beyond the context of antibiotic resistance (Ochman et al., 2000; Woese et al., 2000; Jain et al., 2003; Nakamura et al., 2004). Strains within a species typically share a set of core genes, but can differ substantially in their inventories of variable genes, the presence or absence of which is due principally to LGT and gene loss (Lerat et al., 2005). Twenty sequenced strains of Escherichia coli, for example, share a common core of about 1976 orthologous genes, while each strain individually possesses, in addition, a further 2092–3403 genes from a collective pool of some 15 862 distinct genes comprising the variable set. Altogether, the gene repertoire of E. coli (the species pan-genome) totals some 17 838 genes (Touchon et al., 2009).
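The relationship between core, variable and pan-genome can be made concrete with set operations. The following is a minimal sketch, not the method of Touchon et al. (2009): the function name `core_and_pan`, the strain labels and the gene names are all hypothetical, and real analyses first require orthologue clustering, which is not shown. The final check simply confirms that the figures quoted above are mutually consistent (core plus variable pool equals pan-genome).

```python
def core_and_pan(strains):
    """Split gene inventories into core, variable and pan-genome sets.

    `strains` maps a strain label to the set of gene-family identifiers
    found in that strain (assumed non-empty; labels are hypothetical).
    """
    gene_sets = [set(genes) for genes in strains.values()]
    core = set.intersection(*gene_sets)   # present in every strain
    pan = set.union(*gene_sets)           # present in at least one strain
    variable = pan - core                 # present in some but not all strains
    return core, variable, pan


# Toy inventories with invented gene names, for illustration only.
toy_strains = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneB", "geneD"},
    "strain3": {"geneA", "geneB", "geneC", "geneE"},
}
core, variable, pan = core_and_pan(toy_strains)
print(sorted(core), sorted(variable), len(pan))

# Consistency check on the E. coli figures quoted in the text.
assert 1976 + 15862 == 17838
```

By construction the core and variable sets are disjoint, so their sizes always sum to the size of the pan-genome.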

Although certain issues remain open (Ragan & Beiko, 2009), the succession of steps comprising a successful instance of LGT is now adequately known. Exogenous genetic material is presented to a cell as free DNA in the environment, as a plasmid or similar element or packaged in a phage. Once inside the cell, if the DNA survives cellular defence mechanisms and becomes established either via recombination into the main chromosome or on an extrachromosomal element, depending on the fortunes of its new host, it may become abundant in the host-cell population. Through mutations, its expression may be further regulated and tuned, and the encoded protein(s) better integrated into cellular networks. As a determinant is selected and spreads in a population, other traits linked to it – other resistance genes, for example, or virulence – become prevalent at the same time. Each of these steps throws up differential opportunities and differential barriers that, together with environmental heterogeneity and situational contingencies, dynamically construct the diverse microbial genetic-exchange communities around us.

In this review, we consider the role of LGT in the spread of antibiotic resistance. In the second section, we present a graph-theoretic framework that allows us to define genetic exchange communities (GECs) with flexibility, rigour and precision. In the third section, we consider how genetic material is transferred from one bacterial lineage to another, calling attention to the differential opportunities and barriers presented by the mechanisms of DNA transmission, recombination and system-level integration that together give rise to the diversity of GECs. Finally, in the fourth section, we survey some actual GECs, first adopting a knowledge-based approach and then introducing data-driven methods based on large-scale sequence comparison and bioinformatics.

What is a GEC?

Exchange communities

Although perhaps implicit across the literature on bacterial population structure, recombination and LGT, so far as we can determine, the term exchange community was first introduced in the context of lateral transfer by Jain et al. (2003), who defined it as ‘a collection of organisms that can share genes by (LGT), but need not be in physical proximity’. These investigators examined the association of eight types of factors (temperature, oxygen level, pH, salinity, pressure, genome size, G+C content, mode of carbon utilization) with LGT as indicated by the patterns of topological discordance among phylogenetic trees inferred for gene families represented in eight ecologically and phyletically diverse prokaryotes. All factors were found to associate with genetic exchange, with the ‘internal’ determinants genome size and G+C content most strongly associated, followed by mode of carbon utilization; external factors including pressure, pH, salinity and growth temperature were the least strongly associated. As the strongly associated factors are features of the microorganisms themselves and not of their habitats (and indeed can vary widely within actual microbial communities), Jain et al. (2003) concluded that exchange communities are not limited by physical proximity; instead, it is primarily internal parameters that ‘affect (LGT) and thereby delineate exchange community boundaries’. Jain and colleagues described these communities as potentially extending across major phyletic and ecological barriers at a complete ecosystem scale, some 10^28 individuals.

Their formulation captures key concepts, notably that exchange communities can extend across location, habitat and taxon identities, and are actively constructed by LGT. It bears further consideration, however, whether we should restrict membership in exchange communities to cellular organisms only, and whether and how to integrate vertical inheritance. Further, as we demonstrate in the following section, the Jain and colleagues formulation is operationally ambiguous. We may need not a single definition, but rather a framework for thinking precisely and operationally about GECs, together with a series of settings appropriate for different problems.

GECs: conceptual framework and parameters

Many problems in genetics and molecular bioscience can be abstracted as graphs; trees (e.g. Darwin's tree of life) and networks are special cases of graphs. As discussed below (GECs: data-driven approach), vertical and lateral genetic transfer, and the description of exchange communities, are well suited to graphical treatment. Graphs contain nodes (vertices) connected by edges (arcs). Here, nodes depict entities that carry and can potentially exchange genetic material, and edges represent the transmission of genetic material between them. If we could take a snapshot of a habitat (say, of a specific biofilm or rumen) and abstract its gene-sharing relationships as a graph, individual microorganisms might be the nodes and the most recent instances of LGT the edges.

This formulation immediately raises several issues. It is usually (although not always) impractical to observe microbial cells individually; more typically, we aggregate many millions of individuals from a strain or a clone for study (e.g. by genome sequencing), and our inferences about vertical and lateral transmission integrate over a series of temporal snapshots potentially corresponding to many millions of generations (Fig. 1). For example, we infer that a region of genetic material G has been transferred laterally between strains A and B if both contain a copy of G so similar in sequence that it could not have arisen by vertical descent from their common ancestor (even more, if G is borne on a phage or a plasmid, or has characteristics of a mobile element). If the two instances of G are 100% identical, G was almost certainly transferred between A and B (or from an unknown third entity C to both A and B) very recently, and if G is widely distributed among the close relatives of A, but not among the relatives of B, then G was likely transferred from A to B, not vice versa. On the other hand, if the two instances of G are only 80% identical, then either an ancestral version of G was transferred between the ancestors of A and B many generations ago, or alternatively, the donor was an unknown source not in our analysis. It is appropriate to abstract the former case (100% identity) as a graph with nodes labelled as present-day strains A and B connected by an edge representing the transfer of G, even though we may not be able to rule out the possibility that an unknown C lies along this edge. In the latter case (80% identity), neither present-day strain A nor B is the donor, and particularly where we cannot infer directionality of transfer, it is better to label the nodes not with the present-day strain, but more generally (e.g. by species).
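The reasoning in this paragraph can be summarized as a small decision rule. The sketch below is illustrative only: the function name `interpret_shared_region`, its parameters and the single cut-off `recent_threshold` are assumptions introduced here, and the 100% and 80% figures in the text are examples rather than calibrated thresholds. It also presupposes that the region has already been judged too similar to be explained by vertical descent.

```python
def interpret_shared_region(identity, common_in_relatives_of_a,
                            common_in_relatives_of_b, recent_threshold=1.0):
    """Suggest how to represent a laterally shared region G in a graph.

    identity: fractional sequence identity between the copies in A and B.
    common_in_relatives_of_*: whether G is widespread among close relatives.
    recent_threshold: hypothetical cut-off for calling a transfer recent.
    """
    if identity >= recent_threshold:
        # Near-identical copies: transfer was very recent, so present-day
        # strains are reasonable node labels (an unsampled intermediary C
        # cannot be excluded).
        if common_in_relatives_of_a and not common_in_relatives_of_b:
            direction = "A->B"
        elif common_in_relatives_of_b and not common_in_relatives_of_a:
            direction = "B->A"
        else:
            direction = "undetermined"
        return {"timing": "recent", "direction": direction,
                "node_label": "present-day strain"}
    # Diverged copies: an ancestral transfer or an unsampled donor, so a
    # more general node label is safer. Direction is left unassigned in
    # this sketch.
    return {"timing": "ancient or unsampled donor",
            "direction": "undetermined",
            "node_label": "broader taxon (e.g. species)"}


print(interpret_shared_region(1.0, True, False))
print(interpret_shared_region(0.8, True, False))
```

In practice the cut-off would depend on mutation rates and sampling density, which is why it is exposed as a parameter rather than fixed.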

Figure 1

Representing exchange communities. Cellular or genomic lineages (depicted here as tubes) evolve in time (here, left to right), with genetic regions continually gained and lost (for clarity, not shown here). The orthogonal planes represent successive temporal snapshots. Within each plane, the intersections with lineages represent biological entities as they existed at that point in time; thus, in the right-hand-most plane, A, B and C represent groups of bacteria, and P and V types of plasmids or phage, as they exist and are delineated today. A region of DNA 100% identical in sampled members of A, B and P and inferred to be of lateral origin is likely to have been transferred very recently, as represented by solid lines. A region 80% identical in A and B (and/or C) might be inferred to have been transferred at some point in the past (the dashed line in the middle plane) into or between A′ and B′, earlier biological groups that, especially due to LGT, may have been quite different from present-day A and B (and/or C) and should not (without cause) be labelled with present-day strain designations. Alternatively, a region of 80% identity in present-day A and B might have been transferred to both very recently from an unknown source not available to our analysis; the situation in C might favour one of these scenarios. In general, precision and density of sampling are helpful in reconstructing pathways of LGT and in delineating GECs.

Further complications arise because LGT itself makes it difficult to extrapolate genome contents back in time within bacterial lineages. Node labels typically imply membership in more-inclusive node-label classes, for example node E. coli MG1655 simultaneously belongs to Escherichia, Gammaproteobacteria and Bacteria, but given the temporal and strain-to-strain variation in gene content, it may be impossible to reconstruct the ancestral gene content and we may need to use the appropriate pan-genome (Medini et al., 2005). In contrast to cellular chromosomes, individual plasmids are temporally shallow and at some temporal depth merge indistinguishably into a pool of plasmid-borne genetic material. We may wish to annotate the nodes of our graph further, for example as an enterohaemorrhagic pathogen or a soil bacterium, terms that likewise may belong to a more inclusive class within an ontology. Hooper et al. (2009) caution that habitat annotations may be suspect, as microorganisms can have broader environments than we currently appreciate.

Delineation of nodes must be informed by a problem-specific perspective. It may matter (e.g. for clinical diagnosis) that antibiotic resistance is being transferred among identifiable bacterial strains or isolates, in which case nodes are labelled accordingly and interpreted to include the genetic material normally inherited vertically within those strains or isolates, i.e. the chromosomal genome with prophages, plus stable plasmids. We term this perspective cell-centric. Alternatively, the problem may concern the flow of genetic material through each type of vehicle (chromosomes, phages, plasmids), in which case nodes could represent specific vehicles even when these happen to be colocated within a cell. We discuss examples of this vehicle-centric perspective in GECs: data-driven approach. Finally, it may be possible to adopt a purely DNA-centric perspective without reference to residence in a particular cell or vehicle, although it is not immediately clear that smallish stretches of DNA can be said to exchange genetic material, and we suspect that labelling of nodes and/or edges would in practice largely reduce this to one of the other two perspectives.

Edges depict the transfer of genetic material between nodes. An edge must be supported by evidence, to which we can sometimes apply a statistical threshold and/or a quantitative estimate of confidence. Edges can be signed if the donor–recipient relationship is known and stable, or unsigned. Signed edges can be unidirectional (in direct exchange, one node always donates) or bidirectional (each node can both donate to, and accept from, the other). A signed edge is typically drawn as an arrow pointing from donor to recipient. Edges may be annotated by mechanism (e.g. conjugative transfer of plasmid) and/or by frequency of transfer. As GECs are constructed by differential frequencies of transfer, annotation by frequency is highly relevant and could support computational analysis and modelling. Finally, it is important to emphasize that, in general, there is little reason to imagine that intact genes are typically the units of transfer and recombination (see Recombination per se); for this reason, we refer to lateral genetic (not gene) transfer (Chan et al., 2009).
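One way to hold these edge properties in software is a small record type. The sketch below is an assumption-laden illustration, not a published schema: the class name `TransferEdge` and all field names are invented here, and the choice to report no directed pairs for an unsigned edge is a design decision of this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TransferEdge:
    """An inferred transfer of genetic material between two nodes."""
    source: str
    target: str
    signed: bool = False               # donor-recipient relationship known
    bidirectional: bool = False        # each node both donates and accepts
    mechanism: Optional[str] = None    # e.g. "conjugative transfer of plasmid"
    frequency: Optional[float] = None  # estimated frequency of transfer
    confidence: Optional[float] = None # quantitative support for the edge

    def __post_init__(self):
        # In the text, direction is a property of signed edges only.
        if self.bidirectional and not self.signed:
            raise ValueError("a bidirectional edge must be signed")

    def directed_pairs(self) -> List[Tuple[str, str]]:
        """Return the (donor, recipient) pairs this edge asserts."""
        if not self.signed:
            return []  # direction unknown, so nothing is asserted
        pairs = [(self.source, self.target)]
        if self.bidirectional:
            pairs.append((self.target, self.source))
        return pairs


edge = TransferEdge("A", "B", signed=True, bidirectional=True,
                    mechanism="conjugative transfer of plasmid")
print(edge.directed_pairs())
```

Keeping frequency and confidence as optional numeric fields leaves room for the thresholding and modelling that the paragraph anticipates.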

Graphs are analysed by carrying out operations that probe their structure, for example their connectedness and modularity. Of particular relevance here are paths or walks (series of nodes connected by edges, following any arrows), transitively closed sets (sets of nodes, each member of which is reachable from every other node in the set), cliques (sets of nodes, each of which shares an edge with every other member of that set) and near-cliques (Fig. 2). Any of these four structures could in principle define a GEC. We disfavour GEC-as-path because exchange implies both donating and receiving, while given current methods for probing LGT in natural communities, cliques and near-cliques may set too high an evidentiary standard. Thus, operationally, we recommend the following definition: a GEC is a set of entities, each of which has over time both donated genetic material to, and received genetic material from, every other entity in that GEC, via a path of lateral transfer.
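Under the recommended definition, every member of a GEC must be reachable from every other member along directed transfer edges. In graph terms, the maximal sets with this property are the strongly connected components of the directed graph. The following sketch finds them with Kosaraju's two-pass algorithm; the function name `maximal_gecs`, the node labels and the example edges are hypothetical, and the restriction to components of at least two nodes is an assumption of this example.

```python
from collections import defaultdict


def maximal_gecs(transfers):
    """Return maximal sets of mutually reachable nodes (size >= 2).

    `transfers` is an iterable of (donor, recipient) pairs.
    """
    graph, reverse, nodes = defaultdict(set), defaultdict(set), set()
    for donor, recipient in transfers:
        graph[donor].add(recipient)
        reverse[recipient].add(donor)
        nodes.update((donor, recipient))

    # Pass 1: iterative depth-first search, recording completion order.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(sorted(graph[start])))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(sorted(graph[nxt]))))
                    break
            else:
                order.append(node)  # all neighbours explored
                stack.pop()

    # Pass 2: search the reversed graph in decreasing completion order;
    # each search collects exactly one strongly connected component.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = set(), [start]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        if len(component) >= 2:
            components.append(frozenset(component))
    return components


# Hypothetical transfers: A, B and plasmid P exchange in a cycle;
# C only receives and phage V only donates.
example = [("A", "B"), ("B", "P"), ("P", "A"), ("P", "C"), ("V", "C")]
print(maximal_gecs(example))
```

Here C and V fall outside the community because neither both donates and receives, which is the behaviour the definition asks for. Cliques and near-cliques would need different, typically more expensive, searches.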

Figure 2

Graph operations and structures for delineating genetic exchange communities.

Abstracting genetic-exchange relationships as a graph thus provides a framework for thinking about and delineating GECs. Each of the four structures we identify, and indeed others, can be defined rigorously, has well-understood properties and opens for us a rich algorithmic and computational toolkit. GECs as defined can overlap each other, and can be contained in larger GECs; depending on the question, it may be useful to work with only maximal GECs. Requiring each entity (over time) to have donated and accepted genetic material delineates GECs much more tightly than in the ecosystem-wide exchange communities of Jain et al. (2003). Here, we do not specifically limit the period of time during which this exchange must have taken place, although the quality of evidence tends to diminish as we look farther back in time. Against these advantages must be weighed risks, for example that graphical abstraction may flatten out the diversity of GECs within which different genes circulate (Gogarten & Townsend, 2005), in the same way that molecular-interaction network diagrams tend to lose touch with cell type- and tissue-specific interactions. Nonetheless, a graphical framework allows GECs to be identified, enumerated, analysed and perhaps situated within a more global map of LGT that might depict the complete spectrum of exchange relationships, from active mutual-exchange communities to the underlying gossamer of one-off transformations by environmental DNA.

Constructing GECs

Groups of bacteria traditionally recognized as species, except the most rigorously clonal, will fall within GECs, but GECs are potentially much larger than species. Definitions of biological species in general, and of prokaryotic species in particular, lie beyond the scope of this review. Models that emphasize descent with modification within clones and periodic selective sweeps (Levin, 1981) fail to account for genetic exchange and recombination within groups such as Helicobacter pylori, Neisseria meningitidis or S. pneumoniae (Maynard Smith et al., 1993; Feil et al., 2003). On the other hand, so-called biological species concepts map uncomfortably into situations in which recombination is relatively infrequent, uncoordinated with replication or cell division, integrates genetic material from phyletically diverse sources (including plasmids, phages and environmental DNA) unpredictably into different regions of the chromosome and occasionally results in drastic modification of phenotype. Opinions differ on how much LGT might be required to render ideas of species (indeed the tree-like nature of descent and hierarchical classification systems in general) meaningless in the prokaryotic context (Doolittle, 1999, 2009; Kurland, 2000; Berg & Kurland, 2002; Gogarten et al., 2002), and whether and how some or all of these concepts might be rescued (Gevers et al., 2005; Cohan, 2006; Doolittle & Zhaxybayeva, 2009). Even among eukaryotes, reproductive isolation can be incomplete or contingent: evolution remains a work in progress. Bacteria that are members of a species by virtue of genetic recombination will for the same reason belong to a common GEC, but GECs are potentially, and often in reality, much broader than species.

LGT complicates species definitions, but constructs GECs. GECs are actively fashioned (and continually refashioned) by the complex ongoing interplay among habitats, donors, vectors, recipients, mechanisms, sequences, population structures and selection. In this sense, GECs are analogous to ecological niches: except perhaps in the broadest sense, niches do not exist a priori in the physical world, but are constructed dynamically by organisms through diverse physical, chemical and biological interactions with their environment and with each other. Microorganisms similarly construct GECs, in the process altering the genomes and physiologies of their interaction partners and reciprocally being altered by them (including the ability to differentially accommodate or resist LGT). The recombinant microorganisms may then alter their physical environment or spread to a new one.

In the following section, we review how each step in lateral transfer, recombination, integration and establishment of genetic material offers opportunities for the establishment of GECs involving some, but not other exchange partners, vectors and genes. In keeping with the theme of this issue, we focus on antibiotic (antimicrobial) resistance. For the purposes of this review, bacteria should be read to include archaea where indicated by context, although the latter are not recognized as human pathogens (Eckburg et al., 2003) and to our knowledge do not harbour antibiotic-resistance determinants. We refer to exchange where genetic material can (potentially) move in both directions. Lateral (LGT) and horizontal (HGT) genetic transfer are synonyms.

LGT and the construction of GECs

By the time we detect antibiotic resistance in a previously susceptible strain, the DNA conferring that resistance has come into contact with a host cell, entered its cytoplasm, evaded the host defences, become established typically by recombination into the host genome and integrated itself sufficiently into host systems to express one or more protein products, with the result that recombinant cells are now abundant in the population. We examine these steps in turn, focusing on the opportunities and barriers that can be differentially exploited to construct GECs.

Availability of DNA to the potential host cell

Pathogens and their nonpathogenic relatives often share a common habitat that encompasses DNA in the environment, agents of DNA mobilization and packaging, sites of concentration, and vectors such as insects that can intermediate between the microbial and the eukaryotic worlds. Brüssow (2009) and Norman et al. (2009) encourage us to include viruses, conjugative plasmids and other mobile genetic elements in our view of the living world. In this broader ‘eco-evo’ perspective, distinctions between pathogen and nonpathogen, and between factors of virulence and colonization, become less clear-cut (Pallen & Wren, 2007). Antibiotic-resistance genes occur naturally among some bacteria and fungi, quite apart from antibiotic production and use by humans (Allen et al., 2010). Humans encounter this broader microbial world at distinct interfaces, and from our perspective, describe regions of it as reservoirs, for example of antibiotic resistance.

DNA can be surprisingly abundant in the natural environment. Free DNA is released into soil, water and other natural environments by live bacteria (Lorenz & Wackernagel, 1994; Niemeyer & Gessler, 2002; Moscoso & Claverys, 2004) and plant root cells, pollen, dying bacteria and decomposing biomass (Levy-Booth et al., 2007; Nielsen et al., 2007), and can persist there for days to years (Vlassov et al., 2007), indeed under some circumstances (e.g. permafrost), very much longer. Similarly, DNA can persist in natural water bodies and deep-sea sediments (Vlassov et al., 2007), with some polymers exceeding 10 kb in size (Corinaldesi et al., 2005). The theoretical upper limit of persistence of PCR-able DNA has been estimated at 400–600k years (Willerslev et al., 2004). In these environments, it is available for natural transformation (Lorenz & Wackernagel, 1994; Paget & Simonet, 1994).

In humans, DNA enters the blood by active secretion from cells or from cell death. Blood plasma can contain DNA of high molecular weight (21–80 kb); although partly resistant to DNase degradation, it is reasonably quickly taken up by macrophages and cleared by the liver. Small plasmids may be more stable in the blood than large plasmids or chromosomal DNA (Rozenberg-Arska et al., 1984). DNA bound to proteins on the cell surface likewise shows molecular weights >20 kbp (Morozkin et al., 2004).

Viruses too are abundant and diverse in the natural environment (Brabban et al., 2005; Breitbart & Rohwer, 2005). Estimates point to 10^30–10^32 phages mediating >10^16 gene transfer events per second (Brabban et al., 2005; Rohwer et al., 2009). Intact viruses and/or viral DNA can move rapidly among biomes including soil, sediments, freshwater and marine waters (Breitbart & Rohwer, 2004; Sano et al., 2004). Although a global picture of viral host specificity has not yet emerged, cross-infecting viruses may not be uncommon (Breitbart & Rohwer, 2005). The human gastrointestinal tract may host between 160 and 1200 distinct viral genome types (Chibani-Chennoufi et al., 2004). Lambdoid coliphages themselves are mosaics (Juhala et al., 2000), resulting from genetic exchange both within the lambdoid phage family and with other families, and involving not only homologous but also extensive nonhomologous recombination (Canchaya et al., 2003a, b; Brüssow, 2009).

Potential host cells also encounter other cells of the same or different species that represent potential LGT donors. Some bacteria, including members of Streptococcus, use quorum sensing of a competence-stimulating peptide to ensure that most LGT is within species. In Plasmids and conjugation, we consider some of the diverse genetic elements that, under certain conditions, can be mobilized for lateral transfer including plasmids, conjugative transposons, other integrative conjugative elements (ICEs) and other chromosomal regions. There is growing realization that commensal and environmental bacteria can serve as important reservoirs of antibiotic resistance (Boerlin & Reid-Smith, 2008), and in particular that ‘Gram-negative bacteria have access to the gene pool of Gram-positive cocci’ (Courvalin, 1994). Strains commensal in animals have been identified as reservoirs for the spread of antibiotic resistance to humans via integrons located on plasmids and mobilized by transposons in E. coli (Singh et al., 2005) and via phage transduction in Salmonella (Brabban et al., 2005). Similarly, the rhizosphere (Berg et al., 2005) and the environment more generally (Henriques Normark & Normark, 2002; Jang et al., 2008) can serve as a reservoir for opportunistic pathogens in humans.

Specialized environments that bring together potential donors, recipients, vectors and selective pressure favouring LGT are known as recombination hotspots. Examples include the digestive tracts of insects, the cytoplasm of amoebae, surfaces in soil or around bodies of water, leaf and root surfaces of plants, and within decomposing biomass (Davison et al., 1999; Crawford et al., 2005; Akhtar et al., 2009; Ragan & Beiko, 2009; Moliner et al., 2010), and for antibiotic resistance, the rumen of cattle, milk products, various foodstuffs, biofilms on food-processing equipment and the oral cavity (reviewed by Kelly et al., 2009b).

Biofilms are a particularly important type of recombination hotspot. In a biofilm, large numbers of bacteria are enclosed in a hydrated polymeric matrix composed of polysaccharides, proteins, double-stranded DNA (dsDNA) and single-stranded RNAs (ssRNAs). Bacteria release DNA into this matrix not only upon death but also actively via a type IV secretion system (Hall-Stoodley et al., 2004; Vlassov et al., 2007), and this helps to stabilize the biofilm (Whitchurch et al., 2002). The efficiency of gene transfer by bacterial conjugation is enhanced in biofilms; the E. coli F-conjugative plasmid encodes factors that induce biofilm development and, more generally, LGT enhances the stability of biofilm structure (Ghigo, 2001; Molin & Tolker-Nielsen, 2003). All these factors enhance biofilms as hotspots for LGT. The Staphylococcus aureus biofilm-associated protein Bap induces an alternative method of biofilm production and can be transferred laterally among Staphylococcus species (Tormo et al., 2005). Not only may antibiotic-resistant bacteria flourish inside, but the biofilm structure itself offers a measure of physical protection against antibiotics. Biofilms are ubiquitous in natural environments, on and in living bodies (e.g. dental plaque), and on artificial surfaces including contact lenses, sutures, catheters, prostheses, artificial heart valves and medical devices (Fux et al., 2005; Sørensen et al., 2005; Ong et al., 2009).

Individual humans too can be LGT hotspots. Salmonella enterica can transfer plasmid-borne antibiotic-resistance genes with high frequency inside human epithelial cells (Ferguson et al., 2002), and evidence is accumulating that the intestinal tract can be an LGT hotspot and reservoir for multidrug resistance (Salyers et al., 2004; Kelly et al., 2009b). In populations of the highly recombinogenic obligate human pathogen Neisseria gonorrhoeae, antibiotic resistance can spread at a high frequency via natural transformation, potentially from strain to strain in mixed infections (Ohnishi et al., 2010).

Entry of DNA into the host cytoplasm

DNA and transformation

Diverse bacteria are naturally competent (able to take up DNA from their environment), or can be stimulated to become so (Nielsen et al., 1998; de Vries & Wackernagel, 2005; Johnsborg et al., 2007), including clinical isolates of Campylobacter, Cardiobacterium, Haemophilus, Helicobacter, Legionella, Moraxella, Neisseria, Pseudomonas, Staphylococcus, Streptococcus and Vibrio (Lorenz & Wackernagel, 1994; Johnsborg et al., 2007). Competence-related genes are even more broadly distributed (Ambur et al., 2009). Because ssDNA enters the cytoplasm, plasmids must be reconstituted as dsDNA (Chen & Dubnau, 2004) by interaction with complementary sequence, for example on a preexisting copy of the same plasmid, another newly arrived copy or (in the case of multimeric plasmids) itself. As a consequence, natural transformation with plasmids tends to be inefficient and relatively unimportant as an entry mechanism leading to the lateral transfer of antibiotic-resistance genes.

Uptake of DNA is usually sequence-independent, although in some species (e.g. Haemophilus influenzae, N. gonorrhoeae), it involves the recognition of short nucleotide motifs that are interspersed along the chromosomal DNA. This mechanism not only biases uptake in favour of transformation by DNA of the same species, but the motifs moreover promote subsequent homologous recombination into the chromosome (Ambur et al., 2007).

Phage and transduction

Bacteriophages can serve as vectors for the lateral transmission of antibiotic-resistance genes among bacteria. The mechanism of phage transduction is well understood (Brabban et al., 2005): following infection of a host cell by a temperate phage, phage DNA integrates into that of the host at a specific point, or less specifically, depending on the type of phage. The lysogenic conversion that often results may render the host bacterium less susceptible to invasion by other phages. The integrated phage genome (prophage) is then transmitted vertically within the host lineage until the lytic cycle is induced, during which an adjacent region of the host genome is sometimes excised and packaged together with that of the phage (specialized transduction). More rarely, a nonadjacent region of host DNA is packaged and delivered to a new bacterial host (generalized transduction). Lytic viruses, which do not integrate into the host genome, can similarly be agents of generalized transduction.

Several lines of evidence point to a role for temperate phages in the assembly and spread of antibiotic resistance within Salmonella species (Brabban et al., 2005). Schmieger & Schicklmaier (1999) documented the transduction of ampicillin, chloramphenicol and tetracycline resistance among strains of S. typhimurium DT104. Genes specifying resistance to five drug classes are clustered in a genomic island (GI) that contains both phage- and plasmid-related genes (Cloeckaert & Schwarz, 2001; Hermans et al., 2006). Zhang & LeJeune (2008) demonstrated phage-mediated transfer of the extended-spectrum cephalosporin-resistance gene blaCMY-2 and tetracycline-resistance genes tet(A) and tet(B) from a multidrug-resistant Salmonella to an antibiotic-susceptible S. typhimurium. Inducible phages have been observed in 75% of antimicrobial-resistant Salmonella, compared with 53% of non-antimicrobial-resistant isolates (Zhang & LeJeune, 2008). Phages are likewise involved in the transduction of multidrug resistance in Pseudomonas aeruginosa (Blahova et al., 2001).

In enteric bacteria, virulence factors such as the Shiga-like dysentery toxins Stx1 and Stx2 are often encoded in prophages. Prophages can be abundant in a bacterial genome, with different strains containing different types and unique combinations. Temperate phages encoding Stx1 or Stx2 have been isolated from the environment and show considerable variability in sequence, host range and infection characteristics (Brabban et al., 2005). Very recent lateral acquisitions into E. coli genomes (although not into Shigella) are highly enriched in phage-related genes (Touchon et al., 2009), reinforcing the picture of frequent, ongoing LGT. Frequent deletions and point mutations have reinforced the view that prophages are ‘dead’ genetic elements. However, most of the 18 prophages in the Sakai strain of E. coli O157 are spontaneously inducible, can excise themselves, replicate, be packaged and released; moreover, they can supply virion proteins for the assembly of other lambdoid phages, and even recombine among themselves (and likely with other lambdoid phages) to generate new transducing phages able to spread virulence determinants to other strains (Asadulghani et al., 2009). Stx1 production in the enteric bacteria Citrobacter freundii and Enterobacter cloacae (Herold et al., 2004) suggests transduction beyond genus Escherichia.

Treatment with certain antibiotics can induce prophages to enter the lytic phase (Brabban et al., 2005). Thus, antibiotic use can not only act as a selective factor that favours resistant strains, but may increase the number of transduction events, thereby promoting LGT. This has been demonstrated for transfer of the S. aureus pathogenicity island (Úbeda et al., 2005; Maiques et al., 2006).

LGT is an important driver of evolution among viruses as well. Comparative studies, both within individual families of viral genes or proteins and at the genome scale, reveal limited vertical descent within a broader picture of molecular diversity, multiple origins and recombination (Brüssow, 2009). Viruses sharing a common habitat can undergo extensive recombination (Marston & Amrich, 2009). Where present, verticality may arise from geographical separation (Lee et al., 2007) or follow barriers (e.g. restriction-enzyme specificities) imposed by their host genera (Goerke et al., 2009). The canonical bacteriophages are considered to have narrow host specificities, but this may be less true of phages assembled from proteins of heterogeneous origin (Asadulghani et al., 2009). Tailed dsDNA phages and the corresponding prophages may all share, albeit nonuniformly, a common gene pool (Hendrix et al., 1999; Fraser et al., 2006).

Plasmids and conjugation

Plasmids encode a range of phenotypic traits and are important agents of LGT among bacteria (Frost et al., 2005; Thomas & Nielsen, 2005; Schlüter et al., 2007). Plasmids are the most common vectors of acquired resistance to antibiotics (Barlow, 2009) and indeed to many other factors. Some antibiotic-resistance determinants appear to have resided on plasmids for millions of years (Barlow & Hall, 2002), whereas others are mobilized from chromosomes perhaps at an increasing rate (Barlow et al., 2008). As a class of mobile genetic elements, plasmids are defined by three key features: the capacity to exist and replicate extrachromosomally, the ability to be transferred between distinct hosts and absence of genes essential to their hosts. Plasmids are highly diverse in size, structure, transmission, evolutionary histories and accessory phenotypes (Slater et al., 2008; Carattoli et al., 2009). This diversity is due in part to the successive layering of LGT events, resulting in size variation and deeply mosaic structures (Mellata et al., 2009). Plasmids carrying the newly recognized NDM-1 resistance to carbapenem, for example, range in size from 50 to 500 kb (Kumarasamy et al., 2010). Rather than providing a comprehensive review of plasmid diversity, here we focus on features of plasmids and their transfer that contribute to or constrain their movement within GECs.

Plasmids are typically mobilized by conjugation. While not all plasmids encode the functions essential for cell-to-cell DNA transfer, nonconjugative plasmids can be mobilized by coresident conjugative plasmids. Not only plasmids but also ICEs are transferred from donor to recipient cells by conjugation. Unlike conjugative plasmids, which can be maintained and replicate autonomously in their host, ICEs can be maintained only through integration into the host replicon (Wozniak & Waldor, 2010) and thus are not considered to be plasmids (Norman et al., 2009). We discuss ICEs in more detail in Mobile genetic elements and LGT. Partial or even entire chromosomes can also be transferred via conjugation if the interbacterial junction is stable for long enough – up to an hour or more (Thomas & Nielsen, 2005). Transfer of large chromosomal blocks via conjugation, driven by origins of transfer in mobile GIs, has been proposed for Streptococcus agalactiae strains (Brochet et al., 2008) and Clostridium difficile (He et al., 2010).

The remarkable ability of conjugation to mediate plasmid transfer between taxonomically and genetically unrelated bacterial hosts facilitates gene sharing within broad GECs (Ochman et al., 2000). Conjugation commonly crosses species and genus boundaries (Davison, 1999) and, as we discuss further in GECs: knowledge-driven approach, can extend across biological domains (Buchanan-Wollaston et al., 1987; Heinemann & Sprague, 1989). Because of the existence of general mechanisms such as exclusion, which constrain the conjugative transfer of plasmids, not all strains or species within a community are equally efficient as transfer donors. Some subpopulations of bacteria, including bacteria hosting plasmids bearing antibiotic-resistance genes, have high donor activity; these so-called amplifiers (Dionisio et al., 2002) can accelerate the spread of plasmids within their GEC.

Successful conjugation requires donor and recipient cells to be compatible, as determined by surface proteins on the recipient (Thomas & Nielsen, 2005). Donor cells encode proteins involved in processes that comprise a conjugation event. In the case of highly complex systems such as the large self-transmissible plasmids of Gram-negative bacteria that use a type IV secretion apparatus to produce a pilus (Juhas et al., 2008), these processes include pilus assembly and retraction, identification of compatible recipient cells and signalling the commencement of DNA processing and transfer. In some Gram-positive enterococci, identification of compatible recipient cells is pheromone-activated and mediated by an activator–antagonist relationship, ensuring donor–recipient specificity (Hirt et al., 2002). The frequency of conjugation, however, likely depends primarily on the donor bacterium; in recipient E. coli cells, for example, none of the nonessential genes is required for conjugation (Pérez-Mendoza & De La Cruz, 2009).

Bacteria have nonetheless evolved strategies to inhibit conjugation, and thus limit the production of transconjugants. Exclusion mechanisms limit conjugative transfer into bacterial cells in which plasmids encoding similar transfer systems already reside. The F plasmid, for example, does this by two mechanisms, surface and entry exclusion, mediated by plasmid genes traT and traS, respectively (Achtman et al., 1977). The outer membrane protein TraT hinders contact between the surface of the cell and the F pilus about 10-fold, and the inner membrane protein TraS impedes DNA entry about 100-fold (Frost et al., 1994). Mechanisms of entry exclusion are widely distributed and may be an essential feature of conjugative plasmids (Garcillán-Barcia & de la Cruz, 2008). While exclusion can create an effective barrier against conjugative transfer, it is by no means impermeable. Among F plasmids, for example, extensive interplasmid recombination suggests that some plasmids have evaded exclusion mechanisms to enter cells that carried closely related elements, and were subsequently able to replicate and be stably maintained (Boyd et al., 1996).

Plasmids can be classed into incompatibility (Inc) groups (Novick et al., 1987). Incompatibility has been described as a manifestation of relatedness: plasmids that utilize common mechanisms for replication or inheritance cannot proliferate in the same cell line (Carattoli et al., 2005). Inc groups thus constrain plasmid host range. Plasmids must nonetheless adapt to unfavourable hosts if they are to persist long term within a GEC. While some plasmids can be maintained only in one or a few bacterial hosts, others replicate in diverse bacterial genera. Broad-host-range plasmids have evolved diverse replication strategies including versatile replication systems, self-sufficiency in encoding proteins necessary to establish the replisome after conjugation and multiple replicons (Kramer et al., 1998; Toukdarian et al., 2004). Plasmids of the self-transmissible incompatibility group IncP-1 carry a wide range of resistance genes and are found in environmental (particularly wastewater) as well as clinical settings; they not only utilize mechanisms for transfer, replication and maintenance in diverse Gram-negative hosts, but can also mobilize the transfer of other plasmids into an even broader range of hosts (Schlüter et al., 2007).
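The incompatibility rule described above can be sketched in a few lines of Python. This is a toy illustration only: the plasmid names pA, pB and pC and the function name are hypothetical, the Inc labels are placeholders, and the model assumes that sharing an Inc group is the sole reason two plasmids cannot be stably co-maintained.

```python
from itertools import combinations

def incompatible_pairs(plasmids):
    """Return pairs of plasmids that share an incompatibility group.

    plasmids: dict mapping a plasmid name to its Inc group label.
    Toy assumption: two plasmids in the same Inc group cannot be
    stably maintained together in one cell line.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(plasmids), 2)
        if plasmids[a] == plasmids[b]
    ]

# Hypothetical cell carrying three plasmids.
cell = {"pA": "IncP-1", "pB": "IncP-1", "pC": "IncQ"}
print(incompatible_pairs(cell))
```

In this sketch pA and pB would compete for the same replication or inheritance functions, whereas pC could coexist with either.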

Broad-host-range plasmids may not be equally stable in all hosts, particularly as their ability to persist in a bacterial population is determined in part by host-encoded traits (De Gelder et al., 2007). Under certain selective conditions, plasmids can expand their host range, often via a relatively small number of genetic changes (De Gelder et al., 2008). Plasmids in the same plasmid family can exhibit very different host ranges (Wu & Tseng, 2000), suggesting that broad-host-range plasmids can probably arise selectively from those of narrower host ranges (Thomas & Nielsen, 2005).

Fondi et al. (2010) introduced the concept of the pan-plasmidome, the set of all plasmids harboured by members of a taxonomic group. Based on analysis of 493 protein-coding sequences, the 29 plasmids found in strains of genus Acinetobacter were separated into two main groups: the pKLH family of eight plasmids from a number of Acinetobacter species and a group of 15 plasmids from Acinetobacter baumannii strains. Six further plasmids fell within neither group. Patterns of identity reveal extensive gene sharing within groups (more so within the pKLH plasmid family), but less sharing between groups; thus, in this case, as presumably in many others, plasmids mediate preferential flow of genetic information within and between GECs.

Evasion of host defence systems

Bacteria mount multilayered defences against foreign DNA (Horvath & Barrangou, 2010). In transformation, conjugative transfer and transduction by ssDNA phages, DNA enters the bacterial cytoplasm in a single-stranded form and as such is available for recombination into the host chromosome. Host defences against these processes must therefore target ssDNA. Restriction–modification systems act only against dsDNA, and thus in principle provide defence only against reconstituted plasmids and dsDNA phages.

Restriction–modification systems identify and destroy foreign DNA: the modification component recognizes a specific short oligonucleotide sequence and methylates a defined nucleotide within it, protecting that site from cleavage by a restriction endonuclease (Wilkins, 2002). If matters remained so simple, host DNA would be protected while reconstituted foreign plasmids would be prevented from gaining a foothold. Plasmids can, however, counteract host defence systems in several ways. Some plasmids encode proteins that are rapidly expressed upon entry and inhibit host restriction–modification systems; in certain other cases, not only plasmids but also plasmid-encoded inhibitory proteins enter the host during conjugation (Wilkins, 2002). Many plasmids encode distinct restriction–modification systems that protect both themselves and their new host. Yet another strategy is seen with IncP-1 plasmids, from which most restriction–modification sites have been eliminated by selection (Wilkins et al., 1996). As the success of restriction–modification as a barrier is approximately proportional to the number of target sites (Thomas & Nielsen, 2005), these mechanisms will have greater or lesser success against foreign dsDNA from one host–plasmid system to another. Differential outcomes likewise arise from the diversity of host–plasmid interactions that determine replicability and copy-number control (Thomas & Nielsen, 2005).
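Because the barrier scales roughly with the number of target sites, the exposure of an incoming element to a given restriction enzyme can be approximated by counting recognition sites on both strands. The following Python sketch does this; the function names are hypothetical, the two sequences are invented, and the EcoRI recognition sequence GAATTC is used only as a familiar example, not as a site discussed in the studies cited above.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def count_target_sites(dna, site):
    """Count positions where the recognition site occurs on either strand.

    Overlapping occurrences are counted; a palindromic site is counted
    once per position rather than once per strand.
    """
    dna = dna.upper()
    motifs = {site.upper(), reverse_complement(site)}
    k = len(site)
    return sum(1 for i in range(len(dna) - k + 1) if dna[i:i + k] in motifs)

# Invented sequences: one rich in sites, one from which sites are absent.
site_rich = "GAATTCAAGAATTCTTGAATTC"
site_poor = "GACTTCAAGACTTCTTGACTTC"
print(count_target_sites(site_rich, "GAATTC"))
print(count_target_sites(site_poor, "GAATTC"))
```

Under this simplification, a plasmid purged of recognition sites, as described for IncP-1, would present few or no targets to the corresponding endonuclease.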

Most human strains of S. aureus can be placed into one of 10 clonal complexes (Feil et al., 2003) characterized inter alia by unique combinations of surface and virulence genes and by a unique specificity of the main restriction–modification system Sau1 (Lindsay et al., 2006). Genetic material is exchanged within each complex, but DNA originating from a different complex is recognized as foreign and cleaved (Waldron & Lindsay, 2006). Staphylococcus phages have evolved a degree of specificity for host complex group (Goerke et al., 2009) and conversely, strains in each complex accept only a few phage types. The Sau1 system likewise presents a (partial) barrier to conjugative transfer into S. aureus from enterococci (Waldron & Lindsay, 2006).

More recently, an ‘immune system’ based on clustered, regularly interspaced, short palindromic repeat (CRISPR) loci, together with CRISPR-associated (cas) proteins, has been recognized in diverse bacteria (including human pathogens) and many archaea. CRISPR loci interfere with infection by foreign phages and plasmids by specifying small RNAs, which, mobilized in a Cas complex, target complementary sequences on foreign DNA, with the result that the foreign DNA is degraded. Each CRISPR locus contains an array of direct repeats separated by sequences that originate from viral and plasmid genomes ‘captured’ by other components of the system. The small RNAs generated by transcription are thus complementary to foreign DNA previously presented to the cytoplasm, enabling the CRISPR/Cas system to present a programmable barrier to LGT analogous in this respect to the mammalian immune system (Horvath & Barrangou, 2010; Marraffini & Sontheimer, 2010a, b). Much remains to be learned about Cas complexes in diverse bacteria, including whether they can target ssDNA and/or RNA as well as dsDNA (Hale et al., 2009; Horvath & Barrangou, 2010); thus, it is not yet known whether CRISPR/Cas offers protection against transformation (Marraffini, 2010). There is evidence that phages can circumvent the CRISPR/Cas system by mutation or deletion of genomic regions complementary to the CRISPR spacers (Horvath & Barrangou, 2010).
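The logic of spacer-based targeting, and of escape by mutation, can be illustrated with a deliberately simplified Python sketch. It assumes that interference requires an exact match between a spacer and the incoming DNA on either strand; real CRISPR/Cas systems have additional requirements that this toy model ignores. The function names and sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def is_targeted(foreign_dna, spacers):
    """Return True if any spacer exactly matches the foreign DNA.

    Toy assumption: an exact match on either strand is necessary and
    sufficient for interference.
    """
    foreign = foreign_dna.upper()
    for spacer in spacers:
        s = spacer.upper()
        if s in foreign or reverse_complement(s) in foreign:
            return True
    return False

# Invented spacer array and phage sequences.
spacers = ["ACGTACGTTGCA"]
phage = "TTTT" + "ACGTACGTTGCA" + "GGGG"
escape_mutant = "TTTT" + "ACGTACCTTGCA" + "GGGG"  # one substitution in the matching region

print(is_targeted(phage, spacers))
print(is_targeted(escape_mutant, spacers))
```

In this model a single substitution in the region complementary to the spacer is enough for the phage to evade recognition, which mirrors the circumvention by mutation or deletion noted above.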

CRISPR sequences have been recognized not only in genomes of diverse prokaryotes but also in bean mitochondria (Mojica et al., 2000) and the ocean metagenome (Sorokin et al., 2010), where they ‘retain the memory of the local virus population and a particular ocean location’. The system is absent from S. pneumoniae and H. pylori, but is found in some other readily transformable bacteria (Marraffini et al., 2010). The disjunct phyletic distributions of some CRISPR loci, and phylogenetic trees inferred from sequences of Cas proteins, provide compelling evidence for lateral dispersion of the CRISPR/Cas system itself (Horvath & Barrangou, 2010) mediated by plasmids, megaplasmids and possibly prophage (Sorek et al., 2008). Diversity can be substantial within species: four main groups of CRISPR loci, and one minor group, have been recognized among 100 diverse strains of E. coli (Díez-Villaseñor et al., 2010).

Not only CRISPR/Cas but also restriction–modification systems can spread by LGT (Kobayashi et al., 1999; Sibley & Raleigh, 2004; Zhao et al., 2006), again illustrating the potential for the ongoing dynamic construction of GECs.


Recombination per se

Following the physical transfer of exogenous DNA into a new host cell, foreign DNA can be integrated into the recipient genome through homologous recombination, illegitimate recombination or a combination of the two. Escherichia coli can also integrate exogenous DNA into its chromosome or plasmids via an end-joining repair mechanism known as ‘alternative end-joining’ (Chayot et al., 2010) that does not involve recombination. Nonhomologous sequences, including those specifying antibiotic resistance, can be integrated in the absence of large-scale sequence similarity: three to eight consecutive nucleotides are required for ligation. Here, we focus on mechanisms of integration that involve recombination into the host genome.

The incorporation of foreign DNA via homologous recombination is a highly efficient process that occurs at an appreciable frequency and contributes to the propagation of alleles in a population (Lawrence & Retchless, 2009). To be recombined by this process, incoming sequences must contain a region of sufficient length (typically 25–200 bp) and similarity to the recipient genome (Thomas & Nielsen, 2005). In Bacillus subtilis, E. coli, Pseudomonas stutzeri and S. pneumoniae, the frequency of homologous recombination has been shown to decrease with increased sequence divergence in a log-linear relationship (Zawadski et al., 1995; Vulić et al., 1997; Majewski et al., 2000; Meier & Wackernagel, 2005).
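A log-linear relationship of this kind means that the logarithm of recombination frequency falls in proportion to sequence divergence. The Python sketch below makes the shape of that relationship concrete; the intercept f0 and the slope are illustrative placeholders chosen for readability, not values reported in the cited studies, and the function name is hypothetical.

```python
def recombination_frequency(divergence, f0=1.0, slope=-20.0):
    """Relative recombination frequency under a log-linear model.

    Model: log10(f) = log10(f0) + slope * divergence
    divergence: fraction of differing sites between donor and recipient
    (0.0 to 1.0). f0 and slope are illustrative assumptions only.
    """
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a fraction between 0 and 1")
    return f0 * 10 ** (slope * divergence)

for d in (0.0, 0.05, 0.10, 0.20):
    print(f"divergence {d:.2f}: relative frequency {recombination_frequency(d):.1e}")
```

With these placeholder parameters, each additional 5% of divergence lowers the relative frequency by one order of magnitude; the actual slope differs among the species studied.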

In contrast, nonhomologous or illegitimate recombination results in the incorporation of less-similar genetic material, i.e. from distantly related donors, although at a lower frequency (Ochman et al., 2000). Its adaptive benefits can differ from those of homologous recombination (Vos, 2009). Homology-facilitated illegitimate recombination combines the features of both homologous and illegitimate recombination (Meier & Wackernagel, 2003). Introduction of a low-similarity sequence into a recipient genome can be stimulated up to 10^5-fold if it contains a high-similarity region that can initiate recombination and anchor its extension into the adjacent lower-similarity segment (de Vries & Wackernagel, 2002; Prudhomme et al., 2002).

Although homologous DNA can be integrated with considerable efficiency, the frequency of recombination is very low at an overall genome sequence divergence >25% (Matic et al., 1997; Vulić et al., 1997; Majewski & Cohan, 1998; Majewski et al., 2000; de Vries et al., 2001). Evolutionarily unrelated species do nonetheless exchange genetic material (Matic et al., 1996). Mismatch repair systems represent the principal barrier to recombination between divergent species, but this barrier is permeable to some genetic fragments (Gogarten et al., 2002) and can itself be subject to LGT (Denamur et al., 2000; Lin et al., 2007).

The efficiency of recombination correlates not only with sequence divergence but also with the length of the transferred region and with the genomic location of integration. This was illustrated particularly clearly by a genome-wide study of genetic gain and loss events across 20 E. coli strains (Touchon et al., 2009). These investigators found that the length of acquired regions varies widely; only 8% of gains involve >10 genes in a single event. The latter include pathogenicity islands and prophages. Recombination and genetic loss preferentially occur at sites that are highly conserved across E. coli genomes (integration hotspots), and 61% of all synteny breakpoints identified across these genomes fell within 133 integration hotspots, implying that these are hotspots of genomic rearrangement as well.

Recombination need not involve the integration of regions exactly delineated by gene boundaries. In many gene families, regions sufficiently conserved to promote homologous or homology-facilitated illegitimate recombination occur internal to the sequence (Gogarten et al., 2002), providing the possibility that portions of genes might be integrated. Indeed, evidence has been presented for LGT of DNA regions ranging in length from seven nucleotides (Denamur et al., 2000) to entire bacterial chromosomes (Lin et al., 2008), encompassing partial and entire genes, intergenic regions, multigene clusters, transposable elements, prophages and GIs (Ragan & Beiko, 2009). Chan et al. (2009) reported that among 1462 sets of orthologous genes, 286 (19.6%) show clear evidence of at least one recombination breakpoint inside the ORF; they found some evidence that recombination breakpoints avoided gene regions encoding protein structural domains (domons) if the domains were small, whereas domons corresponding to large domains were interrupted by recombination breakpoints uniformly at random.

Mobile genetic elements and LGT

The reassortment of resistance genes from different donors to create multidrug-resistant strains is a clear example of how recombination can allow bacterial populations to adapt to selective pressure over the short term (see Box 1). Here, we review the molecular mechanisms and elements that contribute to LGT through recombination with a host genome, focusing on how these elements contribute to the spread and emergence of new combinations of chromosomally encoded resistance.

Box 1.

LGT and the emergence of antibiotic resistance in Staphylococcus aureus

Staphylococcus aureus has become known for developing resistance to antibiotics. Chambers & DeLeo (2009) describe the emergence of multidrug-resistant S. aureus strains as a series of epidemic waves that have successively given rise to epidemic penicillin-resistant S. aureus in hospitals and the community, methicillin-resistant S. aureus (MRSA) in hospitals, community-associated MRSA, vancomycin-intermediate S. aureus (VISA) and fully vancomycin-resistant MRSA (VRSA). LGT has played a central role in most or all of these waves. The penicillin resistance that appeared in the mid-1940s was specified by a plasmid-borne β-lactamase with narrow specificity for penicillin. MRSA was first seen in 1960, shortly after methicillin was introduced (Barber, 1961; Jevons et al., 1961). The origins of mecA, which encodes a protein that binds to and inactivates β-lactam antibiotics generally, are uncertain (Hiramatsu et al., 2001) although a mecA homologue 80% identical at the amino acid level occurs in methicillin-sensitive Staphylococcus sciuri and represents a potential precursor (Couto et al., 1996). mecA occurs in a mobile chromosomal cassette (SCCmec) that also encodes genes specifying recombination and excision; variants of SCCmec differ in recombination potential and hence in the ability to spread. Community-associated MRSA strains seem not to have arisen directly from hospital strains, but have been (and are being) assembled laterally based on different combinations of the SCCmec cassette, plasmids, prophages, pathogenicity islands and transposons carrying a variety of resistance and virulence determinants (Chambers & DeLeo, 2009). Whereas resistance in VISA appears to be chromosomally mediated, VRSA strains have emerged via plasmid-mediated conjugational transfer of the vanA operon into MRSA from a vancomycin-resistant Enterococcus faecalis (Showsh et al., 2001; Lowy et al., 2003; Weigel et al., 2003).
Methicillin and vancomycin are widely used in clinical practice, but resistance determinants have spread so far only within certain S. aureus lineages (Lindsay, 2010). Robinson & Enright (2003) estimated that methicillin-resistance determinants had at that point been transferred into S. aureus about 20 times. Similarly, the lateral dissemination of vancomycin resistance has been relatively slow; only about 10 VRSA strains have been isolated exclusively in health-care settings (Chambers & DeLeo, 2009), a modest number given the high incidence of vancomycin-treated patients who harbour both vancomycin-resistant enterococci and S. aureus (Weigel et al., 2003). The slow spread of these resistance determinants so far may be due in part to specific genetic mechanisms within the S. aureus lineage that constrain both intra- and interspecies exchange.
Waldron & Lindsay (2006) characterized Sau1, a type I restriction–modification system composed of the hsdR (restriction) gene and two copies of the hsdM (modification) and hsdS (sequence specificity) genes. The system recognizes and digests foreign DNA entering the cell, providing a barrier to the acquisition of foreign DNA by S. aureus isolates. The hsdS genes, responsible for recognizing specific foreign DNA sequences, differ significantly among S. aureus of different lineages. These differences constrain within-species transfer and have contributed to the emergence of the clonal structure of MRSA (Lindsay et al., 2010). More recently, Corvaglia et al. (2010) identified and characterized in clinical S. aureus a type III-like restriction endonuclease, a previously unidentified barrier that prevents transformation by DNA from other species. Critically, some clinical MRSA strains are deficient in this system, potentially rendering them more susceptible to acquisition of DNA from other bacterial species; this system is of particular clinical relevance because it may represent the primary barrier to acceptance of vancomycin resistance from E. faecalis. A clinical isolate of Staphylococcus epidermidis has been found to have a CRISPR locus identical in sequence to a region of the nickase gene nes found in all sequenced staphylococcal conjugative plasmids including those in MRSA and VRSA (Marraffini & Sontheimer, 2008); this locus is not known in strains of S. epidermidis in ATCC.
Not only do S. aureus strains limit participation in genetic exchange, they also encode mechanisms that promote the uptake of DNA. For example, S. aureus produces peptides that induce enterococci to enter the aggregative state, and this facilitates cross-generic exchange (Clewell et al., 2002; Flannagan & Clewell, 2002). Genetic exchange involving the S. aureus lineage appears to be tightly regulated by a balance between mechanisms that limit or promote transfer.

Mobilization of DNA within genomes (transposition) plays an important role in the intracellular and intercellular movement of genes. Transposons have long been associated with the dissemination of antibiotic resistance (Bennett et al., 2008). While their structure and genetic relatedness varies widely, in general, they are composed of a central DNA sequence flanked by inverted insertion sequences (IS) or other elements involved in transposition.

Recently, a new class of recombination system requiring only one insertion element for gene mobilization has been recognized (Toleman et al., 2006a, b). These mobile elements, known as insertion sequence common regions (ISCRs), transpose by a rolling-circle mechanism that is distinct from that used by transposons (Tavakoli et al., 2000). ISCRs lack the terminal repeats present in most IS elements, and instead are bounded by terminal sequences called oriIS and terIS for the origin and the termination of replication, respectively. Recognition of the terIS has been shown to be somewhat inaccurate (Tavakoli et al., 2000), with the outcome that sequences adjacent to an ISCR, including antibiotic-resistance determinants, can be mobilized (Bennett et al., 2008). Numerous families of ISCR elements are now recognized, many associated with antibiotic-resistance genes or other selectable determinants (Toleman et al., 2006b). The tendency of ISCR elements to mobilize sequence beyond the terIS may be involved in the construction of complex class 1 integrons that harbour new assemblies of resistance gene arrays not seen in classical integrons (Toleman et al., 2006a, b).

Integrons are elements that capture mobile gene cassettes by site-specific recombination rather than by transposition (Hall & Collis, 1995; Mazel, 2006). Integrons share a common recombination system composed of a gene (intI) that encodes a site-specific recombinase (IntI) and an adjacent primary recombination site (attI) into which gene cassettes can be integrated. IntI-catalysed recombination events move gene cassettes within and between integrons, allowing these cassettes to be sorted into novel combinations. While one or more gene cassettes are found in most integrons, they are not a definitive part of the integron structure, and indeed, integrons lacking cassettes have been found in the environment (Bissonnette & Roy, 1992). Integrons can be chromosomally encoded and are not independently mobile; they are, however, often mobilized on plasmids and transposons, in this way moving among diverse genomes and taking on their observed broad phyletic distribution (Hall et al., 1999; Boucher et al., 2007; Nemergut et al., 2008). Movement of gene cassettes encoding antibiotic resistance within and between integrons, together with clonal expansion, plays an important role in the emergence of multidrug resistance (Krauland et al., 2009).

ICEs are self-transmissible mobile genetic elements that are increasingly recognized as contributing to lateral genetic flow within GECs. In particular, the role of ICEs in the dissemination of antibiotic resistance has been recognized in pathogens (Hochhut et al., 2001b; Whittle et al., 2002; Mohd-Zain et al., 2004). The term ICE was introduced by Burrus et al. (2002) to encompass all mobile genetic elements with conjugative or integrative properties, independent of their mechanism of integration or conjugation (Burrus & Waldor, 2004a; Burrus et al., 2006; Wozniak & Waldor, 2010). ICEs resemble conjugative plasmids in that they disseminate via conjugation, but unlike plasmids, may not be able to replicate autonomously, although this is still under investigation (for discussion, see Wozniak & Waldor, 2010). ICEs cross-circulate among a diverse range of hosts: Tn916, for example, one of the first ICEs recognized, has been identified in Proteobacteria, Actinobacteria and Firmicutes (Roberts & Mullany, 2009). Another well-studied example is the Vibrio cholerae-derived ICE known as SXT, which, together with related elements such as R391, has now been identified in Gammaproteobacteria worldwide (Burrus et al., 2006). In the laboratory, the conjugative range of SXT includes V. cholerae and strains of E. coli (Waldor et al., 1996).

To date, there has been limited exploration of the mechanisms that regulate and control the dissemination of ICEs within such diverse GECs. However, recent comparative analyses of the genomes of 13 SXT/R391 elements provide evidence not only of extensive recombination among ICEs but also of recombination between ICEs and other mobile elements such as plasmids and phage (Garriss et al., 2009; Wozniak et al., 2009). The reassortment of genetic regions encoding site-specific integration mechanisms among ICEs and other mobile elements may explain the expansion of ICE host ranges (Wozniak & Waldor, 2010).

SXT ICEs carry a conserved core set of sequences shared among all SXT elements, as well as highly mosaic sets of sequences shared only within subsets of these elements. Antibiotic-resistance genes are encoded in the latter and appear to undergo significant flux into and out of the elements (Burrus et al., 2006). This extensive swapping of constituent genetic regions among SXT ICEs may arise from their ability to form tandem arrays of closely related elements, as this would offer increased opportunities for homologous recombination (Hochhut et al., 2001a; Burrus & Waldor, 2004b). Garriss et al. (2009) assert that this inherent ability to reassort their variable gene content makes it possible for ICEs to contribute to the dissemination of new combinations of antibiotic-resistance genes.

GIs are large chromosomal regions that have been acquired by LGT. Unlike ICEs, they are no longer (or may never have been) self-transmissible (Hentschel & Hacker, 2001). GIs encode their own excision and integration, and can be mobilized by plasmids, phages and ICE conjugation systems (Dobrindt et al., 2004; Boyd et al., 2008; Novick et al., 2010). They typically exhibit an aberrant base composition vis-à-vis the host bacterial genome, a feature of recently introgressed regions (Lawrence & Ochman, 1997; Ragan et al., 2001b). Although otherwise diverse, GIs encode an integrase (and often other mobility determinants, for example enzymes for excision and transposition) and primary attachment sites, and are flanked by direct perfect or near-perfect repeats. Phylogenetic analysis of integrase proteins (Boyd et al., 2008) reveals GIs to be an evolutionarily distinct class: integrase proteins encoded by GIs and prophages (IntG and IntP, respectively) form distinct, divergent subtrees within the overall integrase tree. The deeply branched IntG subtree is in general agreement with accepted organismal relationships among Proteobacteria, implying that GIs have been stably associated with proteobacterial lineages over evolutionary time. On the other hand, S. aureus pathogenicity islands can be transduced to Listeria monocytogenes (Chen & Novick, 2009) and ‘it is probably only a matter of time’ until they are recognized more broadly among Firmicutes and perhaps even archaea (Novick et al., 2010).

The earliest-recognized GIs encoded virulence factors (Blum et al., 1994; Hacker et al., 1997; Dobrindt & Reidl, 2000; Hacker & Kaper, 2000), but the term now encompasses regions encoding genes of diverse functions including antibiotic resistance, superantigens, transporters and metabolic genes (Juhas et al., 2009). Salmonella genomic island 1 (SGI1) illustrates the role of GIs in the dissemination of resistance (Boerlin & Reid-Smith, 2008). SGI1, first described in strains of epidemic multidrug-resistant S. enterica serovar Typhimurium phage type DT104, contains a cluster of genes encoding resistance to ampicillin, chloramphenicol/florfenicol, streptomycin/spectinomycin, sulphonamides and tetracyclines (Boyd et al., 2001; Mulvey et al., 2006). This cluster is essentially a complex integron composed of adjacent tetracycline and chloramphenicol–florfenicol resistance genes of plasmid origin, flanked by two class 1 integrons (Boyd et al., 2001). Although SGI1 itself does not encode self-mobilization, Doublet et al. (2005) demonstrated in vitro that it is readily transferrable via a helper plasmid during conjugation between Salmonella strains and from Salmonella to E. coli. The discovery of SGI1 in Salmonella serovars other than Typhimurium (Velgea et al., 2005) strongly suggests that the association of SGI1 with other mobile genetic elements has contributed to the spread of antibiotic resistance within Salmonella.

Integration into host-regulatory and molecular interaction networks

If laterally acquired genetic material is to persist in its new host, its expression must be regulated and the gene products it encodes must interact successfully with host systems. Newly introgressed DNA must therefore avoid being silenced, and over time establish appropriate interaction partners by recruiting transcriptional regulators (Navarre et al., 2007; Wellner et al., 2007; Lercher & Pál, 2008), recognizing the host's signals for transcription, translation, folding and assembly (Lercher & Pál, 2008), and otherwise undergoing changes that fine-tune the kinetic and thermodynamic interactions of the encoded protein. Inappropriate expression would likely impose a significant upfront cost on host competitive fitness; enteric bacteria limit these costs by silencing the expression of foreign genes via the DNA-binding protein H-NS (Dorman, 2007; Navarre et al., 2007).

H-NS preferentially recognizes and silences sequences that have a G+C content lower than that of the host genome; this includes many lateral sequences, particularly those formerly resident in phage (Daubin & Ochman, 2004; Lucchini et al., 2006; Oshima et al., 2006; Navarre et al., 2007). Indeed, approximately 90% of H-NS-repressed genes in Salmonella show evidence of lateral origin (Navarre et al., 2007). Nonetheless, some laterally acquired genes confer relatively immediate functionality (e.g. for antibiotic resistance), suggesting that bacteria can counteract transcriptional downregulation by H-NS, where the newly introgressed DNA confers selective advantage. Amelioration to host G+C content and recruitment of antagonists of H-NS action have been put forward as strategies by which host organisms ‘fight back’ against H-NS silencing (Navarre et al., 2007). Different strains within a species may be differentially able to counteract H-NS silencing (Sankar et al., 2009), potentially yielding a competitive advantage under some circumstances.

While there is a growing understanding of the mechanisms that underpin the acquisition of antibiotic-resistance genes via LGT (Boerlin & Reid-Smith, 2008), the network and evolutionary dynamics that allow their efficient stoichiometric participation in cellular networks remain relatively unexplored. Genes encoding antibiotic resistance, especially single-function resistance determinants such as β-lactamases and aminoglycoside-modifying enzymes, tend to belong to simple sets of functional networks, and probably for this reason, have received limited attention in the network literature. In particular, little is known of the evolutionary histories of antibiotic-resistance genes resident on plasmids, making it difficult to predict how readily they (and their products) might join and interact with specific cellular networks. Plasmids carry resistance determinants against practically every type of antimicrobial agent, but except for genes encoding replication and transfer functions (Fernández-López et al., 2006; Cevallos et al., 2008), these determinants are diverse in sequence and structure, rendering traditional phylogenetic approaches ineffective (Bentley & Parkhill, 2004).

According to the complexity hypothesis (Jain et al., 1999), the likelihood that a gene can be established successfully in a new host is inversely correlated with the number of partners with which the corresponding protein must interact. The extended complexity hypothesis postulates that genes that encode proteins with many interaction partners are relatively less likely to be under adaptive evolution (Aris-Brosou et al., 2005). Thus, genes encoding proteins that function in large complexes, including those responsible for translation and transcription, are infrequently of lateral origin, whereas proteins that function more autonomously (e.g. β-lactamases) can immediately affect phenotype and confer a selective advantage to the host. Genes newly recombined into the chromosome may or may not have to establish transcriptional regulation: those replacing (part of) a preexisting similar gene via homologous recombination, for example, may be successfully controlled by existing promoters and require only fine-tuning, for example to optimize codon usage. Another strategy is seen with the IC element SXT-R391, which encodes a regulatory module that modulates its gene expression in response to environmental stimuli (Wozniak & Waldor, 2010).

More generally, genes of lateral origin exhibit more complex regulation (tend to be regulated by more regulators) than native genes (Price et al., 2008). Lercher & Pál (2008) found that transferred genes are generally integrated into the regulatory network of their host over millions of years. The rapid and broad dissemination of resistance determinants clearly occurs very much more quickly than this (see Box 1), suggesting a weaker integration into host-regulatory networks.

Selection and spread in the population

Antibiotic resistance is clinically significant to the extent that it allows pathogens to evade therapeutic intervention. Under a selective regime (e.g. the application of antibiotics), resistant cells remain reproductively successful while their susceptible counterparts do not. If the selective pressure persists, resistant cells will eventually become predominant in the population. This simple scenario is complicated by the costs that the capture, integration, maintenance and expression of antibiotic-resistance determinants may impose on host fitness; by linkage among selectable traits; by the genetic structure and dynamics of bacterial populations; and of course by contingencies of an individual selective regime, for example its dose, uniformity and duration. Here, we consider the cost–benefit equation with regard to host fitness, and the effect of multiple selectable traits.
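The simple scenario described above, together with a carriage cost, can be sketched as a deterministic two-type competition. This is a minimal illustration only: the function name, the starting frequency and the relative fitness values are hypothetical choices, not measurements from any of the studies cited here, and the sketch ignores linkage, population structure and variation in the selective regime.

```python
def resistant_frequency(p0, w_resistant, w_susceptible, generations):
    """Deterministic haploid selection between two cell types.

    Returns the frequency of resistant cells at generation 0, 1, ..., generations,
    given relative fitnesses w_resistant and w_susceptible.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        mean_fitness = p * w_resistant + (1 - p) * w_susceptible
        p = p * w_resistant / mean_fitness
        freqs.append(p)
    return freqs


# Hypothetical values: resistance carries a 10% cost in the absence of
# antibiotic, while antibiotic halves the fitness of susceptible cells.
without_drug = resistant_frequency(0.01, 0.9, 1.0, 100)
with_drug = resistant_frequency(0.01, 0.9, 0.5, 100)
```

Under these assumed values the resistant type approaches fixation when the antibiotic is present and declines when it is absent; how quickly either happens depends entirely on the chosen fitness values.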

The cost–benefit equation for LGT

It has long been established that the presence of phages (Lenski et al., 1988a) or plasmids (Godwin & Slater, 1979; Helling et al., 1981) can reduce the reproductive fitness of the bacterial host. Plasmids that carry antibiotic resistance usually impose a fitness cost, although its magnitude can vary over at least two orders of magnitude (De Gelder et al., 2008; Andersson & Hughes, 2010), while chromosomal resistance mutations may impose a cost (Gagneux et al., 2006) or be largely ‘free’ (Böttger et al., 1998). Novel DNA of lateral origin might likewise reduce host fitness through increased mutational load, direct and indirect metabolic costs of replication and expression, and/or increased regulatory overhead. While dissecting these factors below, we bear in mind that selection acts at the level of organism or local population.

Mutational load refers, inter alia, to the interruption of genes or control regions (for example by transposon insertion), suboptimal codon usage, deviation from replichore balance (Darling et al., 2008), pleiotropic detuning of regulatory and molecular-interaction networks, and reduced catalytic efficiency. It is inherently risky to express a foreign gene: experience from microbial genome-sequencing projects shows that many genes cannot be cloned in E. coli, often due to the toxicity of the protein product (Sorek et al., 2007). Perhaps not coincidentally, most genes unclonable in E. coli were single copy in their original genome. This risk may be mitigated by gene silencing, for example via the H-NS system (Dorman, 2007; Navarre et al., 2007).

Gene transmission and expression incur metabolic costs

Direct costs of substrate utilization, measured as phosphate groups per residue, can be under strong selective pressure (Nogueira et al., 2009). Costs might arise indirectly via slower replication or the requirement for larger cells (Kurland, 2005). In an E. coli–pBR322 system, fitness was reduced only if the resistance protein (later characterized as a tetracycline-H+ antiporter) was expressed in active form; expression of defective protein did not noticeably reduce fitness, perhaps because the cost arises from localization or activity in the membrane, i.e. not from substrate or energy costs per se (Lee & Edlin, 1985).

Are these costs significant in natural environments? Gene content can vary substantially within closely related groups (e.g. species) of bacteria: among 20 strains of E. coli isolated from mammalian intestinal or urinary tracts, for example, gene number per genome ranges from 4627 to 5129, of which only 1976 are core in the sense of being represented in all strains. A discrete cluster of several hundred marine Vibrio isolates >99% identical by 16S rRNA gene sequence exhibits extreme heterogeneity of genome size, genotype and Hsp60 allele type; as their environment, averaged over time, is essentially homogeneous at the cellular scale, these differences must be selectively neutral (Thompson et al., 2005). Among the genes present in both E. coli and S. enterica, those lacking a significant match elsewhere among prokaryotes (ORFans) show a Ka/Ks ratio (0.19±0.030) consistent with weak purifying selection, whereas those with sporadic matches exhibit a Ka/Ks only slightly greater (0.08±0.005) than that of core genes (0.05±0.001). Genes in the two former categories are likely of lateral origin, the ORFans probably via phage transduction (Daubin & Ochman, 2004).

Some costs to fitness may be transient

Two classes of mutations that confer resistance to phage have been recognized, reducing fitness by 15% and 45%, respectively, via maladaptive pleiotropic effects (Lenski, 1988a); however, these effects were largely compensated by subsequent mutations over 400 generations (Lenski et al., 1988b). In Mycobacterium tuberculosis, resistance to rifampin is mediated by missense mutations in rpoB, the gene encoding the β subunit of RNA polymerase (Gagneux et al., 2006); most mutations conferring resistance incur a cost (assessed by competition assays) in the range 10–40%, although S531L has little to no cost, and clinical S531L mutants have 4±4% higher competitive fitness than their rifampin-susceptible ancestors. Many instances are known in which the fitness costs of antibiotic resistance are subsequently mitigated by compensatory mutations (Andersson & Levin, 1999; Johnsen et al., 2009; Andersson & Hughes, 2010), and strains with lower-cost mutations will tend to be selected in populations (Gagneux et al., 2006).

Costs also arise from regulatory overhead. Gene expression must be regulated if genes and their products are to act in concert. Biologically plausible models of network growth require regulator number to scale quadratically with number of genes, in the case of prokaryotes constraining genomes to encode no more than about 10 000 gene products (Gagen & Mattick, 2005). Such models imply that antibiotic-resistance genes of lateral origin should be either (semi-)autonomously regulated (e.g. as in prophages) or only weakly connected to the cellular network (Integration into host regulatory and molecular interaction networks).
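As a rough illustration of why quadratic scaling implies a ceiling, the sketch below assumes a regulator count of R = c·N² for a genome of N genes. The constant c is hypothetical, chosen only so that regulators would account for every gene near the figure of about 10 000 quoted above; it is not a fitted value, and the sketch is not the derivation of Gagen & Mattick (2005).

```python
def regulator_count(n_genes, c=1e-4):
    """Number of regulators under an assumed quadratic law R = c * N**2."""
    return c * n_genes ** 2


def regulator_fraction(n_genes, c=1e-4):
    """Share of the genome devoted to regulators; grows linearly with N."""
    return regulator_count(n_genes, c) / n_genes


# With the hypothetical c = 1e-4, the regulatory share rises from a small
# fraction of a 1000-gene genome towards the whole of a 10 000-gene genome.
for n in (1000, 4000, 10000):
    print(n, round(regulator_count(n)), f"{regulator_fraction(n):.0%}")
```

Because the regulatory share grows in proportion to genome size under this assumption, each added gene is more expensive to regulate than the last, which is the intuition behind favouring autonomously regulated or weakly connected lateral genes.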

The cost–benefit equation extends to the population level. Genes of lateral origin, and/or those associated with mobile genetic elements, are more likely than others to specify proteins that are secreted and modulate cooperative traits. Nogueira et al. (2009) suggest that laterally mobile elements such as plasmids, ICEs and temperate phages are so prevalent in bacterial populations because they code for factors that are ‘powerful generators of microbial social networks’. These social networks, in turn, can promote the stability of biofilms (Xavier & Foster, 2007), which, as we have seen, can be hotspots of lateral transfer and recombination.


Antibiotic-resistance determinants are often physically proximate to genes that specify other selectable traits, including resistance to heavy metals or detergents, transmission between hosts, colonization of substrates or production of biofilms, and can thereby spread in populations even in the absence of antibiotic use. Given the recombination frequencies typical of bacteria (Gogarten & Townsend, 2005), simple co-occurrence on a bacterial chromosome offers considerable linkage. Plasmids likewise often specify multiple selectable traits, and moreover can be maintained, in the absence of ongoing selection, in host populations by stability systems such as post-segregational killing (PSK) (Thomas, 2000; Kroll et al., 2010). Linkage to other selected traits, together with compensatory mutations that reduce the carrying cost of resistance, is presumably largely responsible for the maintenance of antibiotic resistance in communities for years after the use of that antibiotic has ceased (Bean et al., 2005; Johnsen et al., 2009).

Linkage of multiple selectable traits to virulence genes can prove particularly problematic, as selection pressure from different fronts can then drive the spread of virulent clones. For example, S. pneumoniae serotype 14, commonly responsible for invasive diseases including necrotizing pneumonia and haemolytic uremic syndrome, has acquired two large conjugative transposons and a resistance island. The larger conjugative transposon is a composite of three other transposons, and carries five genes specifying resistance to chloramphenicol, erythromycin (two genes), streptothricin and kanamycin. The smaller conjugative transposon is likewise a composite of two transposons, and carries genes specifying tetracycline and erythromycin resistance. Two of 16 other S. pneumoniae genomes contain variants of both composite transposons, although with some variation in specific gene content. The resistance island carries another chloramphenicol-resistance gene, a site-specific recombinase and several IS elements. Further to these three regions, the genome encodes penicillin-binding proteins, a multidrug-resistance efflux pump, two β-lactam-resistance factors, three metallo-β-lactamases, a bacitracin-resistance protein, various heavy-metal resistance proteins and a large number of virulence determinants associated with the capsule and cell surface, lytic activity, hydrogen peroxide production, lantibiotic synthesis and other functions. Together, these present a worrisome picture of stepwise lateral transfer, recombination and gene deletion events combining to yield a multidrug-resistant, highly virulent pathogen (Ding et al., 2009).

Other species in which stepwise lateral transfer and recombination involving different mobile genetic elements has produced multidrug-resistant, virulent strains include enterohaemorrhagic E. coli (Venturini et al., 2010) and Legionella pneumophila (Cazalet et al., 2008; D'Auria et al., 2010). Stepwise LGT involving multiple plasmids and integrons has led to the emergence of multidrug resistance in a clinical isolate of Vibrio fluvialis (Rajpara et al., 2009), while stepwise acquisition of some 137 genes, including virulence determinants, has been mediated by phages and plasmids in the pathogen M. tuberculosis (Veyrier et al., 2009).


In LGT and the construction of GECs, we surveyed the diverse opportunities and barriers that can be differentially exploited to construct GECs. Here, we consider features of actual GECs, first adopting a knowledge-driven approach based on the scientific and medical literature. Although each report is necessarily local (e.g. to a host–pathogen system, marker set, vector type, analytical method and/or time scale) and contingent, from many such fragmentary and disconnected glimpses we can hope to infer the general properties of actual exchange communities. Thereafter (GECs: data-driven approach), we describe data-centric bioinformatic approaches that aim for a more-synoptic view of LGT across the biosphere. A third approach, based on experimental laboratory or field biology, may be possible in principle, if perhaps impractical at the scale necessary: leading references are Gamage et al. (2004) for laboratory-based investigation; Sørensen et al. (2005) and Babić et al. (2008) for direct visualization; and Ragan (2001a), van Elsas et al. (2003) and Kelly et al. (2009b) for field or mesocosm studies.

GECs: knowledge-driven approach

GECs can be a single species, strain or clonal complex in a single type of host

This is the model for the exchange of an antibiotic-resistance determinant within a bacterial population, and may be combined with selective sweeps (e.g. encounters with antibiotic) that strongly disadvantage nonresistant individuals. Near-identity of donor and host genome sequences, and of genomic environment more generally, facilitates homologous recombination, expression and subsequent regulation (Zawadski et al., 1995; Majewski & Cohan, 1998; Didelot & Maiden, 2010). As described above, within S. aureus the SauI restriction–modification system limits exchange of genetic material across the clonal complexes that comprise the species as recognized. The limiting case is provided by certain obligate bacterial symbionts, which, depending on lifestyle, may only rarely encounter foreign DNA (Bordenstein & Reznikoff, 2005; Moran et al., 2008).

GECs can link different species in a common host or environment

Antibiotic-resistance plasmids can variously be transferred among strains of Escherichia, Salmonella, Klebsiella and Pseudomonas when these bacteria share a common environment (e.g. Gebreyes & Altier, 2002; Schjørring et al., 2008; Kelly et al., 2009a; Shakibaie et al., 2009). Mathew et al. (2009) report identical class 1 integron variable regions in identically sized plasmids of E. coli and Salmonella spp. from a single swine farm, consistent with recent lateral transfer; one plasmid confers resistance to streptomycin and spectinomycin, and the other to trimethoprim. Hinnebusch et al. (2002) describe the transfer of a resistance plasmid from E. coli to Yersinia pestis in an insect-gut model.

GECs can be a single species or strain living in diverse hosts and/or environments

In principle, a distinction can be made between an antibiotic-resistant strain that, on the one hand, infects a new kind of host or colonizes a new environment or, on the other, transfers a resistance plasmid to a related strain already established in a different host or environment. As an example of the former, methicillin-resistant S. aureus (MRSA) strains can cross-infect humans, domestic animals and cattle (Juhász-Kaszanyitzky et al., 2007; Monecke et al., 2007; Brody et al., 2008). Alternatively, virulence factors can be transferred by a plasmid between hospital-acquired human MRSA and strains of bovine S. aureus (Brody et al., 2008), and genes are exchanged among strains of E. coli commensal in humans, animals and birds (Grasselli et al., 2008; Mellata et al., 2009). Resistance to vancomycin and to other antibiotics is transferred readily, in the absence of selective pressure, from porcine to human strains of Enterococcus faecium during experimental infection of the mouse intestinal tract (Moubareck et al., 2003).

GECs can link different genera across diverse hosts and/or environments

Strains of the Gram-positive bacteria Streptococcus, Enterococcus (formerly a section of Streptococcus), Staphylococcus and Listeria form exchange communities in various environments. Conjugative plasmids encoding antibiotic resistance can be transferred between Streptococcus and Enterococcus (Sedgley et al., 2008), from Enterococcus to Staphylococcus (Noble et al., 1992), from Enterococcus and Streptococcus to Listeria (Charpentier & Courvalin, 1999; Zhang et al., 2007), among diverse Listeria (Charpentier & Courvalin, 1999), and from Listeria to Enterococcus (Bertrand et al., 2005). A tetracycline-resistance plasmid has moved from a piscine Lactococcus (formerly Streptococcus) to human Listeria (Guglielmetti et al., 2009). More generally, genetic material can flow from Enterococcus, Streptococcus and Staphylococcus into Gram-negative bacteria including Campylobacter, Escherichia, Haemophilus and Klebsiella (Courvalin, 1994; Wagner & de la Chaux, 2008).

Conjugatively self-transferrable IncA/C family plasmids transfer antibiotic resistance efficiently between unrelated bacteria from different environments. Structurally similar plasmids have been characterized in the fish pathogens Aeromonas hydrophila and Photobacterium damselae, agricultural S. enterica and multidrug-resistant V. cholerae and Yersinia species. In the laboratory, IncA/C plasmids transfer conjugatively from Pseudomonas putida to diverse marine bacteria including the phylogenetically distant Planctomyces maris (Dahlberg et al., 1998), and from E. coli to diverse Gram-negative (Guiney, 1993) and Gram-positive (Trieu-Cuot et al., 1987) bacteria and to Saccharomyces cerevisiae (Heinemann & Sprague, 1989). TraBDF transfer proteins from IncA/C plasmids have homologues on certain ICEs from Photobacterium, Proteus, Providencia, Shewanella and Vibrio species (Fricke et al., 2009). The transfer efficiency of different IncA/C plasmids can vary over four orders of magnitude; instances of nontransferability may be due to chromosomal features or the absence of helper plasmids (Fricke et al., 2009).

Genetic exchange may moreover allow bacteria to extend their range of hosts and environments. The plant pathogen Erwinia carotovora ssp. atroseptica, for example, shares the common enterobacterial genomic backbone, but has acquired from other plant-associated bacteria numerous genes that support its plant-pathogenic lifestyle (Toth et al., 2006).

Environments can delimit GECs

Phylogenetic analysis reveals a nonrandom association of integron integrase gene lineage with the type of environment. All intI genes from soil and freshwater bacteria, together with those from the IntI1 and IntI3 families of mobile integrons, form a monophyletic group, whereas all intI genes from marine environments constitute an older, paraphyletic assemblage (interestingly, a different outgroup rooting could render the marine integrases monophyletic as well). By contrast, taxa map much less cohesively onto the intI tree, with beta-, gamma- and deltaproteobacterial sequences admixed (Mazel, 2006).

GECs sometimes cross boundaries of biological domains

Agrobacterium tumefaciens, commonly considered a plant pathogen, has a surprisingly broad host range extending well beyond plants; under laboratory conditions, it can form a conjugative structure (type IV secretion system), transfer T-DNA (derived from its Ti plasmid) and plasmid-encoded virulence proteins, and genetically transform the nuclear genomes of a wide range of eukaryotes including green plants, S. cerevisiae, filamentous fungi, mushrooms and cultured human cells (Lacroix et al., 2006). Transformation of eukaryotes by T-DNA is arguably the clearest example of a unidirectional transfer mechanism, as eukaryotes are not known to initiate conjugation with prokaryotes. Likewise, E. coli can conjugatively transform yeast (Heinemann & Sprague, 1989; Nishikawa et al., 1992). The intracellular parasite Wolbachia has transferred much of its genome to arthropod and nematode nuclei (Dunning Hotopp et al., 2007), and nuclear genomes of bacteriophagic protozoa show evidence of recent LGT from bacteria (Gomez-Valero et al., 2009). Other putative examples of prokaryote-to-eukaryote transfer are summarized by Ragan & Beiko (2009). Transfer in the other direction, from eukaryote to bacterium, seems less frequent (Ragan & Beiko, 2009). Particularly interesting is the abundance of eukaryotic-like proteins encoded by Legionella genomes; several lines of circumstantial evidence point to multiple origins by LGT, presumably via transformation (Gomez-Valero et al., 2009).

GECs: data-driven approach

Since early in the multigenome age, computational analyses have been applied to multigenome datasets with the aim of identifying minimal (Mushegian & Koonin, 1996) or universal (Gaasterland & Ragan, 1998b) ORF sets, predicting protein function (Koonin et al., 1997; Tatusov et al., 1997; Pellegrini et al., 1999), delineating sets of orthologous genes (Bansal et al., 1998; Koonin et al., 2005) or gene regions (Wong & Ragan, 2008), mapping patterns of gene conservation and innovation (Gaasterland & Ragan, 1998a; Huynen & Bork, 1998), exploring the origins of eukaryotes (Ragan & Gaasterland, 1998), inferring genome phylogenies (Snel et al., 1999; Clarke et al., 2002) or dynamics (Dagan & Martin, 2007) and examining the distribution of genomes in the biosphere (Chaffron et al., 2010). Typically in studies of this nature, each genome is treated as a set of gene (or protein) sequences and these are compared pairwise in all combinations, for example using blast (Altschul et al., 1990) or ssearch (Pearson et al., 1991), to yield a matrix of match scores. This matrix can be filtered, clustered and/or otherwise analysed to identify profiles or groups. Groups of sequences delineated in this way can then be aligned, or the underlying pairwise match scores used directly, for phylogenomic inference of gene or protein trees or networks (Beiko et al., 2005; Ge et al., 2005; Kunin et al., 2005; Dagan et al., 2008). With certain caveats, conflicting (topologically incongruent) phylogenetic signals can be interpreted as prima facie evidence of LGT. The breadth and detail of GECs identified by these multigenome approaches seem to be limited only by the input data, and by computability.
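The generic workflow just described (all-against-all comparison, a matrix of match scores, then filtering and clustering into groups) can be sketched as follows. The sketch rests on loud assumptions: a Jaccard index over shared 3-mers stands in for a blast or ssearch score, single-linkage grouping stands in for whatever clustering a given study used, and the sequence identifiers and toy sequences are invented for illustration.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard index of k-mer sets: a crude stand-in for an alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def score_matrix(seqs, k=3):
    """All-against-all comparison: {(id1, id2): score} for every unordered pair."""
    return {(i, j): similarity(seqs[i], seqs[j], k)
            for i, j in combinations(sorted(seqs), 2)}


def single_linkage_groups(ids, scores, threshold):
    """Group ids joined by any chain of pairs scoring at or above threshold."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), s in scores.items():
        if s >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=sorted)


# Hypothetical protein sequences from three hypothetical genomes.
seqs = {
    "genomeA_p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "genomeB_p1": "MKTAYIAKQRQISFVKSHFSRE",
    "genomeA_p2": "GSSHHHHHHSSGLVPRGSH",
    "genomeC_p2": "GSSHHHHHHSSGLVPRGSM",
}
scores = score_matrix(seqs)
groups = single_linkage_groups(seqs, scores, threshold=0.5)
```

Each resulting group is a candidate family whose members would then be aligned and used for tree or network inference; the threshold of 0.5 is arbitrary and would need tuning for any real score.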

The first such multigenome analyses focused only on sets of chromosomal genes. Kunin et al. (2005) found most gene flow to follow a tree of vertical inheritance, but with numerous tiny ‘vines’ of LGT entangling its branches; they identified species of Bradyrhizobium, Erwinia and Pirellula as most involved in lateral gene exchange with other genomes, and the reduced genomes of Chlamydia, Rickettsia and Treponema as among the least connected. Closely related organisms were more connected laterally than distant ones, probably because incoming DNA could be integrated via homologous gene replacement. Beiko et al. (2005) found that the extent of LGT was usually much greater within than between high-level taxa; that the Alpha-, Beta- and Gammaproteobacteria (perhaps not coincidentally the best-represented genomes in their analysis) were particularly active in gene exchange; and that ecologically versatile groups such as cyanobacteria, and species such as P. aeruginosa and Ralstonia solanacearum, were the most involved in lateral transfer. These investigators identified substantial exchange between an ancestor of Y. pestis and the common ancestor of E. coli and Salmonella.

With increasing numbers of genomes available for analysis, data management, computation and visualization have become more challenging, necessitating tradeoffs between coverage and resolution. All large-scale analyses so far, for example, have used genes (or proteins) as the unit of analysis, although many genes are mosaics of regions with conflicting evolutionary histories (Chan et al., 2009). In visualizing their results, Beiko et al. (2005) and Kunin et al. (2005) aggregate by taxon, thereby losing the details of the actual GECs. One way forward is to focus on a specific type of gene, for example transposases that transfer via IS elements (Hooper et al., 2009). Working with annotations of bacterial species as generalists (found in multiple habitats) or specialists (aquatic, marine, soil or living in a host), Hooper and colleagues found that most (but not all) lateral movement of transposases has taken place within a habitat type, and that generalists often had a narrow range of exchange partners. Streptococcus pneumoniae and two Bacillus species were identified as bridges between groups of taxa that do not exchange directly with each other.

The same computational approach has been used to map exchange networks among vectors. Comparing 47 completely sequenced plasmids from strains of Escherichia, Shigella and Salmonella, Brilli et al. (2008) found that plasmids rarely form tight clusters by host species, but instead show complex evolutionary histories that reflect ‘massive’ LGT and gene rearrangement. A single GEC may extend across these three genera, while E. coli and Shigella form a subset within which certain plasmids and genes are shared uniquely. Transposases are among the most highly connected proteins. Focusing on antibiotic resistance, Fondi & Fani (2010) analysed 5030 resistance-associated sequences encoded in 956 plasmids representing 364 organisms and 134 distinct bacterial genera, using a normalized similarity ratio approach (cf. Clarke et al., 2002) to identify prima facie instances of LGT. As intrageneric LGT could not confidently be distinguished from vertical transmission, their unit of analysis became the bacterial genus. In this analysis, most plasmid proteins are seen to be connected to the rest of the network via only a few links; relatively few are highly connected, and these represent most main functional classes. More than half of these bridging proteins are from bacteria (especially Staphylococcus) normally associated with a eukaryotic host; fewer come from bacteria living in multiple habitats (Corynebacterium, Enterococcus) and the fewest from soil or water bacteria.

Lima-Mendez et al. (2008) used a similar computational approach to reconstruct a phage network. In this analysis, phages were also revealed as extensively mosaic due to successive LGT. These investigators grouped phage proteins with similar phylogenetic profiles into evolutionarily cohesive modules; among temperate phages, the proteins in each of these modules typically contribute to a common function (e.g. replication), whereas among virulent phages, each evolutionary module tends to be functionally heterogeneous.

It remained for Halary et al. (2010) to move beyond the chromosome/vector dichotomy, using pairwise BLAST to compare 119 381 families representing 98 bacterial and nine archaeal chromosomes, four protistan nuclear genomes and 165 529 phage, plasmid and environmental virome sequences – 578 527 sequences in all. The resulting similarity network is partly disconnected: at this level of resolution, it is not the case that ‘everything talks to everything’. The network is highly structured by vehicle type: across some 98.5% of the network, chromosome is connected to chromosome, plasmid to plasmid or phage to phage; thus, ‘when a DNA family enters a type of DNA vehicle … it mainly evolves in it’. In general, plasmids, not viruses, mediate LGT among bacterial chromosomes.
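The notion of a network 'structured by vehicle type' can be made concrete with a small calculation. The sketch below is ours, not the procedure of Halary et al. (2010): it assumes that the proportion is measured over edges of the similarity network, and the node names, vehicle labels and links are invented toy data.

```python
def same_vehicle_fraction(edges, vehicle_of):
    """Return the fraction of edges whose two endpoints share a vehicle type."""
    if not edges:
        raise ValueError("need at least one edge")
    same = sum(1 for a, b in edges if vehicle_of[a] == vehicle_of[b])
    return same / len(edges)


# Hypothetical toy network: names and links are invented for illustration.
vehicle_of = {
    "chrA": "chromosome",
    "chrB": "chromosome",
    "p1": "plasmid",
    "p2": "plasmid",
    "ph1": "phage",
}
edges = [
    ("chrA", "chrB"),  # chromosome to chromosome
    ("p1", "p2"),      # plasmid to plasmid
    ("p1", "chrA"),    # plasmid to chromosome
    ("ph1", "p2"),     # phage to plasmid
]

print(same_vehicle_fraction(edges, vehicle_of))
```

A value close to 1 would indicate that DNA families rarely cross between vehicle types; the toy data are deliberately much more mixed than the real network.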

As the sequence similarity threshold is decreased, networks of increasingly ancient transfers come into view. At 100% identity – the most recent transfers – strains of Legionella, Xanthomonas and Yersinia share DNA with plasmids, whereas Streptococcus preferentially exchanges with phage. At 85% identity, a cluster of plasmids unites Xanthomonas, Prochlorococcus, Synechococcus, Streptococcus, Rhodopseudomonas, Burkholderia and Yersinia, while other plasmids mediate transfer between Burkholderia/Xanthomonas and Streptococcus. In general, DNA tends to reside long-term in plasmids, but short-term in phages.
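The effect of relaxing the identity threshold can be illustrated as follows. This is a minimal sketch under our own assumptions: each link carries a single percentage identity, a link is retained when its identity meets or exceeds the threshold, and the links and identity values shown are invented rather than taken from the Halary and colleagues dataset.

```python
def edges_at(weighted_edges, threshold):
    """Keep only links whose percentage identity is at least `threshold`."""
    return [(a, b) for a, b, identity in weighted_edges if identity >= threshold]


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacent = {n: set() for n in nodes}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacent[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical links (node, node, % identity); values are invented.
weighted_edges = [
    ("Yersinia", "pA", 100.0),
    ("Streptococcus", "phiB", 100.0),
    ("Xanthomonas", "pA", 90.0),
    ("Streptococcus", "pA", 86.0),
]
nodes = sorted({n for a, b, _ in weighted_edges for n in (a, b)})

for threshold in (100, 85):
    kept = edges_at(weighted_edges, threshold)
    print(threshold, len(connected_components(nodes, kept)))
```

In the toy data, separate components seen at 100% identity merge into one at 85%, mirroring the way older transfers join otherwise separate groups as the threshold is lowered.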

Halary and colleagues identify 106 central nodes that bridge regions of the similarity graph that are locally well connected, but relatively isolated from each other. Most of these central nodes represent plasmids, i.e. plasmids are key in redistributing genetic material between GECs. These plasmids tend to be phylogenetic mosaics; many carry drug- and/or metal-resistance determinants that likely contribute to their ability to be successful in different taxa and/or habitats. What taxa and habitats do these nodes link? Through the generosity of Halary and colleagues, we were able to examine the underlying data (E. Skippington & M.A. Ragan, unpublished data). Among the 60 most central plasmids, 48% are annotated as resident in Gammaproteobacteria, 32% in Firmicutes, 7% in other Proteobacteria and 7% in Actinobacteria. Although these proportions may, to some extent, simply reflect taxon coverage, they also suggest key roles for Proteobacteria and Firmicutes in acquiring and redistributing genes between otherwise genetically separate communities.

While most of these highly central nodes are plasmids, phages and chromosomes are also represented, particularly at lower sequence-identity thresholds. We introduce the term central-node neighbourhood to describe the set of nodes directly connected to a central node (excluding the central node itself). In this dataset, central nodes that are plasmids most frequently connect to other plasmids. Of 84 instances (60 unique plasmids, some with neighbourhoods that vary by identity threshold), more than half have neighbourhoods composed exclusively of other plasmids. In contrast, phage and chromosomal central nodes more frequently connect to a different type of vehicle, although the numbers are small. Regardless, the data of Halary and colleagues contain examples of plasmid, phage and chromosomal central nodes connected to all other vehicle types: no vehicle type invariably presents an impermeable barrier to dissemination.
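A central-node neighbourhood as defined above is straightforward to compute from an edge list. The following sketch assumes an undirected similarity network; the function names and the toy nodes are hypothetical and serve only to illustrate the definition.

```python
from collections import Counter


def neighbourhood(edges, centre):
    """Nodes directly connected to `centre`, excluding `centre` itself."""
    neighbours = set()
    for a, b in edges:
        if a == centre and b != centre:
            neighbours.add(b)
        elif b == centre and a != centre:
            neighbours.add(a)
    return neighbours


def vehicle_profile(edges, centre, vehicle_of):
    """Count the vehicle types present in the neighbourhood of `centre`."""
    return Counter(vehicle_of[n] for n in neighbourhood(edges, centre))


def is_exclusively(edges, centre, vehicle_of, vehicle):
    """True if the neighbourhood is non-empty and every member is `vehicle`."""
    members = neighbourhood(edges, centre)
    return bool(members) and all(vehicle_of[n] == vehicle for n in members)


# Hypothetical toy data: pC is a central plasmid, chrZ a central chromosome.
vehicle_type = {"pC": "plasmid", "p1": "plasmid", "p2": "plasmid",
                "chrZ": "chromosome", "ph1": "phage"}
links = [("pC", "p1"), ("pC", "p2"), ("chrZ", "p1"), ("chrZ", "ph1")]

print(sorted(neighbourhood(links, "pC")))
print(is_exclusively(links, "pC", vehicle_type, "plasmid"))
print(dict(vehicle_profile(links, "chrZ", vehicle_type)))
```

Applied to each central node at each identity threshold, a profile of this kind would support tallies such as the proportion of plasmid neighbourhoods composed exclusively of other plasmids.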

The central-node neighbourhoods in the Halary and colleagues dataset are taxonomically diverse. Indeed, we find neighbourhoods that differ at every level of taxonomic rank. One that differs at the rank of phylum, for example, contains nodes that all fall within the same domain, but represent more than one phylum. Of 106 neighbourhoods, only 10 contain nodes that all belong to the same order; of these 10, only seven contain nodes that all belong to the same family. Only three neighbourhoods contain nodes that all belong to the same genus and in only one do all nodes represent the same species. Thus, even among central nodes that connect the least diverse neighbourhoods, transfer is almost always intergeneric or greater. If we accord viruses a separate domain, 35% of central-node neighbourhoods have members from two or more biological domains. These central nodes construct exchange communities that are usually orthogonal to, and inter-relate, accepted taxa.
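The rank at which a neighbourhood 'differs' can be stated operationally as the highest rank at which its members do not all carry the same label. The sketch below assumes that each node is annotated with a complete lineage from domain to species; the seven-rank list and the example lineages are illustrative choices of ours, not the annotation scheme of the underlying dataset.

```python
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def differs_at(lineages):
    """Return the highest rank at which the lineages disagree, or None.

    Each lineage is a tuple of labels ordered as in RANKS. None means that
    every member of the neighbourhood belongs to the same species.
    """
    if not lineages:
        raise ValueError("need at least one lineage")
    for index, rank in enumerate(RANKS):
        if len({lineage[index] for lineage in lineages}) > 1:
            return rank
    return None


# Illustrative lineages for two neighbourhood members.
e_coli = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
          "Enterobacterales", "Enterobacteriaceae", "Escherichia",
          "Escherichia coli")
s_aureus = ("Bacteria", "Firmicutes", "Bacilli", "Bacillales",
            "Staphylococcaceae", "Staphylococcus", "Staphylococcus aureus")

print(differs_at([e_coli, s_aureus]))
print(differs_at([e_coli, e_coli]))
```

Here the two members share a domain but not a phylum, so the neighbourhood differs at the rank of phylum in the sense used above.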

Summary and prospectus

Given the ubiquity of DNA and phages in natural environments, the diversity of transfer mechanisms and the extent of the variable gene set in many bacterial genomes, the microbial biosphere may at some minimal level constitute a single exchange community. At a finer scale, of course, the microbial world is highly heterogeneous both with regard to the groups of bacteria and vectors that exchange at appreciable frequency and the specific genetic material that circulates therein. We began this review by setting out a framework, based on abstraction of genetic exchange as a graph, within which criteria for delineating GECs arise naturally (GECs: conceptual framework and parameters). We defined a GEC as a set of entities, each of which has over time both donated genetic material to, and received genetic material from, every other entity in that GEC, via LGT. This definition avoids biome-scale GECs while not setting impossibly high evidentiary standards.
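The set-based definition can be checked mechanically on a directed graph of transfers. The sketch below reads the definition in its strictest sense, counting direct donor-to-recipient transfers only; whether multi-step transfer should also count is outside its scope. It enumerates maximal GECs with the Bron–Kerbosch procedure applied to the graph of reciprocal links, which is our choice of algorithm, and the entities A to E are hypothetical.

```python
def is_gec(members, transfers):
    """True if every member has donated to and received from every other."""
    transfers = set(transfers)
    return all((u, v) in transfers
               for u in members for v in members if u != v)


def reciprocal_graph(transfers):
    """Map each entity to the set of partners with transfer in both directions."""
    transfers = set(transfers)
    graph = {}
    for donor, recipient in transfers:
        graph.setdefault(donor, set())
        graph.setdefault(recipient, set())
        if donor != recipient and (recipient, donor) in transfers:
            graph[donor].add(recipient)
            graph[recipient].add(donor)
    return graph


def maximal_gecs(transfers):
    """Enumerate maximal GECs of two or more entities (Bron-Kerbosch)."""
    graph = reciprocal_graph(transfers)
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if len(current) >= 2:
                found.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return found


# Hypothetical transfers as (donor, recipient) pairs.
transfers = [
    ("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("C", "A"),          # A, B and C exchange reciprocally
    ("C", "D"),                      # one-way only: C and D do not share a GEC
    ("D", "E"), ("E", "D"),
]

print(sorted(sorted(gec) for gec in maximal_gecs(transfers)))
```

Under this strict reading, a one-way link such as C to D does not merge two communities, which is how the definition avoids biome-scale GECs.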

At each successive step, LGT offers opportunities and barriers that can be differentially exploited: DNA uptake, phage and plasmid host range, plasmid exclusion, restriction–modification and CRISPR/Cas systems, combinatorial association with systems for transposition and recombination, homologous and illegitimate recombination, gene silencing, integration into genetic regulatory and biomolecular interaction networks, the cost–benefit equation on host fitness, genetic linkage with selectable traits and host population structure. GECs are constructed through the contingencies and stochasticities of their interplay in dynamic environments.

Much remains to be understood about the number, structure, dynamics and inter-relationships of GECs as they exist in natural habitats and in clinical, agricultural and other settings. To the extent known, GECs can vary widely in spatial extent, taxonomic diversity, density of internal connectivity and involvement of vector types. Determinants that may be benign in one part of the GEC may be pathogenic in another. Large-scale computational analyses confirm that LGT can be successful at different levels of granularity, from physically proximate exchange among closely related strains to long-distance transfer crossing biological domains. Plasmids are key agents of transfer within and across many GECs. New DNA sequencing technologies are poised to increase the available data on real bacterial communities – both natural and in clinical settings – by orders of magnitude. From this will arise more complete maps of the highways, roads, streets and footpaths of DNA transfer within and among GECs. These maps, converted to computational models, will guide us toward being able to reduce or block the spread of antibiotic resistance and other unwelcome traits.


Re-use of this article is permitted in accordance with the Terms and Conditions set out at http://wileyonlinelibrary.com/onlineopen#OnlineOpen-Terms


We acknowledge the support of the Australian Research Council grant CE0348221. E.S. is supported by an Australian Postgraduate Award and a Queensland Government Smart State PhD Scholarship.


  • Editor: Fernando Baquero

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/3.0/) which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

