
A classification scheme for mobilization regions of bacterial plasmids

M. Victoria Francia, Athanasia Varsaki, M. Pilar Garcillán-Barcia, Amparo Latorre, Constantin Drainas, Fernando de la Cruz
DOI: http://dx.doi.org/10.1016/j.femsre.2003.09.001. Pages 79-100. First published online: 1 February 2004

Abstract

Transmissible plasmids can be classified according to their mobilization ability, as being conjugative (self-transmissible) or mobilizable (transmissible only in the presence of additional conjugative functions). Naturally occurring mobilizable plasmids carry the genetic information necessary for relaxosome formation and processing, but lack the functions required for mating pair formation. Mobilizable plasmids have a tremendous impact on horizontal gene transfer in nature, including the spread of antibiotic resistance. However, analysis of their promiscuity and diversity has attracted less attention than that of conjugative plasmids. This review focuses on the analysis of the diversity of mobilizable plasmids. For this purpose, we primarily compared the amino acid sequences of their relaxases and, when pertinent, we compared these enzymes with conjugative plasmid relaxases. In this way, we established phylogenetic relationships among the members of each superfamily. We conducted a database and literature analysis that led us to propose a classification system for small mobilizable plasmids in families and superfamilies according to their mobilization regions. This review outlines the genetic organization of each family of mobilization regions, as well as the most relevant properties of, and relationships among, their encoded proteins. In this respect, the present review constitutes a first approach to the characterization of the global gene pool of mobilization regions of small mobilizable plasmids.

Keywords
  • Bacterial conjugation
  • Mobilizable plasmids
  • Relaxases
  • Horizontal gene pool

1 Introduction

Identification and classification of plasmids has attracted considerable scientific and technological attention for several reasons: (a) analysis of their distribution in nature and their relationship to host cells, (b) discovery of their genetic relatedness and evolutionary origins and (c) analysis of horizontal gene transfer, a process with tremendous impact on risk assessment of the release of genetically modified organisms. A plasmid classification scheme should be based on genetic traits that are universally present and constant. One of the loci always present in plasmids is the basic replicon. Not surprisingly, conjugative plasmids were classically classified according to their incompatibility, a property directly related to replication (for review see [1]). However, the number of known plasmids increased exponentially, making incompatibility testing no longer feasible, if only because many plasmids cannot reside in a universally common host. Furthermore, we now know that a single base change in the element conferring incompatibility (i.e., an antisense RNA) in two closely related plasmids may render them compatible [2]. Thus, an alternative classification scheme was widely accepted that used similarity of plasmid-encoded replication (rep) region DNAs as the classificatory rule [1]. This approach has been used with relative success, and was extended by direct DNA sequencing of rep regions, which uncovered some remote relationships among systems previously considered unrelated [2–4]. Although the rep region is considered an essential criterion for plasmid classification, rep regions are frequently unrelated, since bacterial plasmids have found several biochemical solutions to the problem of plasmid replication. Besides, many plasmids contain multiple, or even recombinant, rep regions, a situation that complicates matters even further (see the reviews cited above for more details).

During bacterial evolution, the ability of bacteria to exploit new environments and to respond to new selective pressures can often be more easily explained by the acquisition of new genes by horizontal transfer than by sequential modification of gene function following accumulation of point mutations. For example, the spread of antibiotic resistance and xenobiotic degradation genes is a consequence of horizontal gene transfer coupled to selective pressures caused by the presence of increasing amounts of these substances in the environment (for reviews see [5, 6]). Gene transfer takes place by transformation, transduction and conjugation. Bacterial conjugation is a unique process that allows the transfer of plasmid DNA from a donor to a recipient through cell-to-cell contact. It has been observed in all kinds of bacteria and occurs even between bacteria of different divisions, for instance between proteobacteria and gram-positive bacteria or cyanobacteria [7–11]. Conjugative plasmids encode a self-sufficient conjugative transfer system and have monopolized the attention of several studies [12–15], in contrast to small mobilizable plasmids, for which no comparative reviews are presently available. Mobilizable plasmids usually carry a mobilization region (mob) encoding specific relaxosome components and the origin of transfer (oriT). Since plasmid mobilization is an almost universal mechanism of gene spread among bacteria, a classification of plasmids according to their mobilization properties could be universal and of paramount importance, since it could provide a more suitable alternative to classification by rep regions. The diversity of several classes of mobilizable plasmids was previously dealt with in separate reviews: RSF1010 [11], ColE1 [12], pMV158 [16, 17] and CloDF13 [18].

Relaxases are the crucial relaxosome component for initiation of DNA transfer in both conjugative and mobilizable plasmids. Similarities among relaxases encoded by different conjugative systems were found early, and suggested a shared DNA relaxation mechanism. According to those similarities, three common motifs were defined [12, 19]. Motif I contains the catalytic Tyr residue involved in DNA cleavage-joining activity (it remains transiently covalently linked to the 5′-terminus of the nic site). Motif II was reported to be involved in DNA–protein contacts through the 3′ end of the nic region, and a Ser residue is usually present [20]. Motif III contains three conserved His residues and is known as the 3H motif. The His residues are proposed to assist the nucleophilic activity of the Tyr residue in Motif I by coordination of the required Mg2+ ions and direct activation of the active Tyr [20]. These three motifs would form part of the catalytic centre of the relaxase, although the precise cleavage mechanism remains unknown. Interestingly, rolling-circle replication initiation proteins contain motifs analogous to relaxase Motifs I and III, albeit in inverse orientation [16, 21].

In this study, we use the amino acid sequences of plasmid relaxases as the classification criterion. We suggest that most mobilizable plasmid relaxases are evolutionarily related, and thus we propose a broadly valid classification scheme of small mobilizable plasmids according to the similarity of their relaxases and the phylogenetic relationships among them. Besides, we review the information available on the mobilization properties of each of the four main mobilizable plasmid superfamilies, as well as their relationships to well studied conjugative plasmids. We believe that this method of classification systematizes most mobilizable plasmids known so far, and will offer a useful tool for the rapid and accurate classification of new mobilizable plasmids.

2 Classification of small mobilizable plasmids

To attempt a new classification of small mobilizable plasmids, we conducted a database and literature analysis using the amino acid sequence similarity of their relaxases as a first criterion. As prototype plasmids we used RSF1010, ColE1, CloDF13 and pMV158, which are among the best characterized mobilizable plasmids with respect to their mobilization properties and functions. Additionally, their relaxases had been purified and characterized in vitro, allowing a comparison of their properties.

In order to classify mobilizable plasmids, we used the iterative PSI-BLAST program [22], setting a score value of P=0.0001 (unless otherwise indicated) as a threshold for successive iterations. This program has been successfully used for the analysis of many protein superfamilies [22]. Then, CLUSTAL W [23] was used to align the selected relaxases. Since plasmid relaxases are often large multidomain proteins, only the N-terminal 300 amino acids, corresponding to the relaxase domain, were used in all analyses. Neighbour-joining phylogenetic analysis [24] with bootstrap values (1000 replicates) [25] was carried out using the software Molecular Evolutionary Genetics Analysis (MEGA) version 2.1 (http://www.megasoftware.net/) [26, 27]. Pairwise genetic distance matrices were calculated using the p-distance parameter. All figures and tables were updated as of November 2002.
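As an illustration of the distance step, the following minimal Python sketch truncates each protein to its N-terminal 300 residues and computes pairwise p-distances (the proportion of aligned positions at which two sequences differ). It is not the MEGA implementation: the function names, the pairwise handling of gapped columns and the toy alignment are assumptions made for this example only.

```python
from itertools import combinations
from typing import Dict, Tuple

RELAXASE_DOMAIN_LENGTH = 300  # N-terminal residues kept, as stated in the text


def relaxase_domain(protein: str, length: int = RELAXASE_DOMAIN_LENGTH) -> str:
    """Return the N-terminal segment that would be passed on to the alignment."""
    return protein[:length]


def p_distance(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Proportion of differing residues between two aligned sequences.

    Assumption: columns with a gap in either sequence are skipped
    (pairwise deletion); the source does not state how gaps were handled.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every pair of sequences in the alignment."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


# Hypothetical toy alignment (not real relaxase sequences).
toy_alignment = {
    "seqA": "MKYLH-HR",
    "seqB": "MKYIHAHR",
    "seqC": "MRFLH-HK",
}

for pair, distance in distance_matrix(toy_alignment).items():
    print(pair, round(distance, 3))
```

A matrix of this kind is the input that a neighbour-joining procedure would consume; tree building and bootstrapping are left to dedicated software.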

3 Main superfamilies of small mobilizable plasmids

3.1 The MOBQ family

Plasmid RSF1010, the prototype of this family, has received considerable attention because of its extremely broad host range [28, 29]. Plasmid R1162, although isolated from a different host (Pseudomonas aeruginosa instead of Escherichia coli), is nearly identical in sequence [28] and has also been the subject of considerable analysis (see below and [12]). An important characteristic of RSF1010, and probably of many other plasmids from this family, is that they can be mobilized efficiently by helper plasmids from different incompatibility groups, such as IncIα, IncM, IncX and particularly by the broad host range IncP plasmids [30]. RSF1010 mobilization can also be brought about by chromosomally located Tra functions, such as the icm-dot macrophage killing virulence system of Legionella pneumophila. By using helper plasmids, IncQ plasmids could be mobilized to a large number of hosts, including many phyla of gram-negative and gram-positive bacteria, as well as yeast, plant and animal cells [11]. A PSI-BLAST search starting with RSF1010 relaxase resulted in 44 non-redundant hits above the P=0.0001 cut-off after three iterations, when the search converged. All 44 hits corresponded to plasmid relaxases. Of these, 23 corresponded to Tra-Ti relaxases (relaxases of the transfer region of Ti plasmids) [31, 32], six to large conjugative plasmids of gram-positive bacteria such as pIP501, pMRC01, pGO1 and pSK41 [33–36], and finally 15 hits corresponded to relaxases of mobilizable plasmids. The corresponding proteins are shown in Table 1. Since no additional proteins appeared before the search converged, all these relaxases form a coherent protein family, as previously reported [11].

Table 1

The MOBQ family

Plasmid name | Plasmid accession No. | Size (bp) | Bacterial source | Relaxase name | % identity to RSF1010 MobA | Relaxase accession No. or reference
RSF1010 | NC_001740 | 8684 | Escherichia coli | MobA | 100 | AAA26445
R1162 | M13380 | | Pseudomonas aeruginosa | MobA | 100 | [14]
pIE1130 | AJ271879 | 10,687 | Environmental sample | MobA | 87 | CAB75594
pIE1115 | NC_002524/AJ293027 | 10,687 | Environmental sample | MobA | 83 | NP_065281
pDN1 | NC_002636 | 5112 | Dichelobacter nodosus | MobA | 82 | NP_073212
pAB6 | AF126482 | 5597 | Neisseria meningitidis | MobA | 49 | AAD31795.1
pSC101 | NC_002056 | 9263 | Salmonella typhimurium | Mob | 46 | P14492
pP | NC_003455 | 4301 | Salmonella enteritidis | Orf1 | 38 | NP_604396.1
pXF5823 | AF322908 | 5823 | Xylella fastidiosa | MobA | 36 | AAK13432
pDOJH10S | NC_004253 | 3661 | Bifidobacterium longum | Mob | 32 | NP_694605.1
pTF1 | X52699 | 6700 | Thiobacillus ferrooxidans | MobL | 31 | P20085
pMG160 | NC_002774 | 3431 | Rhodobacter blasticus | MobL | 30 | NP_775695
pKJ50 | U76614 | 4960 | B. longum | MobA | 30 | AAD00257.1
pKJ36 | NC_002635 | 3625 | B. longum | MobB | 28 | NP_072178.1
pDOJH10L | NC_004252 | 10,073 | B. longum | MobA | 27 | NP_694596.1

The amino acid sequences of the relaxase domain of the proteins shown in Table 1 were aligned and their most conserved features are shown in Fig. 1. As seen in the figure, RSF1010 family relaxases display the three common relaxase motifs with some invariant amino acids, including the catalytic Tyr in Motif I and the three His in the 3H Motif III. Interestingly, there is no conserved Ser in Motif II. Instead, we have found (see also the discussion of other families) that most relaxases conserve either a Glu (as in this case) or an Asp residue in Motif II. From this alignment, a phylogenetic tree was obtained as shown in Fig. 2, which also contains some MOBP relaxases (the prototype is conjugative plasmid RP4 relaxase TraI) for comparison. MOBQ relaxases form a well-differentiated clade, which verifies the existence of the previously reported MOBQ and MOBP families. Interestingly, the monophyletic MOBQ family also contains relaxases from gram-positive conjugative plasmids such as pGO1 or pMRC01 and the Tra regions of the autotransmissible Agrobacterium and Rhizobium Ti plasmids (underlined in Fig. 2).

Figure 1

Amino acid sequence alignment of representative MOBQ family relaxases. The CLUSTAL W alignment considered only the 300 N-terminal residues of each protein, containing the relaxase domain. MOBP plasmids RP4 and pRA2 are also included for comparison. Red on yellow=invariant amino acids. Blue on blue=strongly conserved. Black on green=similar. Green on white=weakly similar. Black on white=not conserved.

Figure 2

Phylogenetic tree of MOBQ family relaxases. Names of the relaxase-containing plasmids are shown to the right of each branch. IncP (RP4) and IncQ (pRA2, pTF-FC2 and pTC-F14) MOBP plasmids are shown for comparison. Bootstrap percentages (over 50%) are shown adjacent to the node being considered. Clades including MOBQ and MOBP plasmids are shadowed in grey. The moment of acquisition of additional mob genes is indicated on the tree. The taxonomical family (or genus when family was unclassified) of the plasmid host is indicated by bars on the right. Relaxases from some conjugative plasmids (not included in Table 1) are also analyzed in this figure for better understanding, and are shown underlined. Accession numbers of relaxases from conjugative plasmids: pTiC58: S11839; pGO1: AAB09712.1; pMRC01: T43077; pNGR234: AAB91648; RP4: Q00191; R751: NP_044272.

3.1.1 MOBQ relaxases

MOBQ relaxases have a common domain structure consisting of an N-terminal relaxase domain and a C-terminal primase domain, as reported for RSF1010/R1162 [37, 38], with the exception of plasmid pTF1 [39]. The primase domain is active in vegetative plasmid replication but apparently is also required for optimal transfer [40, 41]. Linkage between both domains would appear to promote an efficient initiation of complementary strand synthesis in the recipient cell [40, 41].

It is noteworthy that, within the MOBQ family, relaxases of conjugative plasmids do not constitute an isolated phylogenetic group, but are embedded within those of mobilizable plasmids. This observation suggests that relaxase genes diversified by being incorporated into both conjugative and mobilizable systems, without this property being the primary force of evolutionary divergence. Besides, as shown in Fig. 2, MOBQ-family plasmids are distributed in several classes of the Phylum proteobacteria as well as in actinobacteria. Evolutionarily more recent monophyletic groups are usually restricted to a taxonomical class or even a family. For example, the clade enclosing pIE1130, pIE1115 and pDN1 has family Cardiobacteriaceae as host. If plasmid RSF1010 relaxase is included in the previous clade, the host range is broader, including family Enterobacteriaceae. Going one step further back, the next clade (which also comprises pAB6, pP and pSC101) is not restricted to the γ-proteobacteria class, but includes species of different bacterial classes. Thus, it seems that the broad host range of the plasmids belonging to this family [11] has contributed to the spread of the mobilization region. Nevertheless, this promiscuity is not unbounded, since recent clades are usually confined to lower bacterial divisions. Thus, it takes a considerable time (on an evolutionary scale) for a mobilizable plasmid to colonize and become prevalent in a new bacterial host.

3.1.2 Mobilization region organization

Three genes (mobA, mobB and mobC) comprise the RSF1010/R1162 mobilization region (Fig. 3). Genes mobA and mobB overlap, while mobC is transcribed in the opposite direction [37, 39, 42, 43] and lies on the other side of oriT. Promoters were located by S1 mapping in the inter-cistronic region between the divergently transcribed genes. Derbyshire et al. [43] proposed that these promoters may be regulated by MobA and MobC binding to the oriT site. Within the MOBQ family there are many exceptions to this basic organization, which differ in the number of genes composing the mobilization region (Fig. 3). Only two mob genes are present in the Mob region of plasmids pSC101, pTF1 and pMG160, while just one mob gene is present in pAB6, pKJ50 or pDOJH10S. The only gene that is always present is the relaxase-encoding gene (mobA). Interestingly, Motif III of MOBQ family relaxases contains additional invariant amino acid residues, forming the consensus sequence (N/Q)xHxHxxxxR (Fig. 1). It should be remembered that the MOBQ family includes the relaxases from most large gram-positive conjugative plasmids (such as pIP501, pMRC01, pGO1 and pSK41) and those of the Agrobacterium Ti plasmids [31–36].
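The Motif III consensus (N/Q)xHxHxxxxR can be written as a regular expression, which gives a quick way to scan a protein sequence for candidate matches. The sketch below is only an illustration: the function name and the toy sequences are hypothetical, and "x" is assumed to mean any single residue.

```python
import re

# (N/Q)xHxHxxxxR written as a regular expression; "." stands for "x"
# (assumed to be any single residue). The lookahead allows overlapping
# candidates to be reported as well.
MOTIF_III_PATTERN = re.compile(r"(?=([NQ].H.H.{4}R))")


def find_motif_iii(protein: str):
    """Return (1-based start position, matched segment) for each candidate."""
    return [
        (match.start() + 1, match.group(1))
        for match in MOTIF_III_PATTERN.finditer(protein.upper())
    ]


# Hypothetical toy sequences, not taken from any real relaxase.
print(find_motif_iii("MAAANGHSHLLLLRKK"))  # candidate starting at residue 5
print(find_motif_iii("MAAASGHSHLLLLRKK"))  # no N/Q at the first position
```

A pattern match is only a hint; confirming a Motif III assignment would still require the alignment context shown in Fig. 1.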

Figure 3

Genetic organization of MOBQ family mobilization regions. All DNA sequences were individually inspected and the corresponding orfs annotated in the figure. Orfs are indicated by arrowed boxes filled in different textures to indicate families of homologous proteins. Names of the respective genes are indicated above the arrows. Experimentally determined oriT=black arrowhead. Putative oriT=black vertical line.

3.1.3 In vitro characterization

MobA, MobB and MobC from plasmid RSF1010 were purified by Scherzinger et al. [44], while plasmid R1162 MobA was purified by Bhattacharjee and Meyer [45]. The corresponding relaxases were used to reproduce nicking reactions in vitro [37, 44, 45] and were shown to promote strand- and site-specific cleavage within their respective oriT site [45]. MobA is able to form a specific complex with oriT DNA, which is remarkably stable, with a half-life of approximately 95 min [46]. Although MobA was self-sufficient for cleavage of an ssDNA substrate [37], it required MobC as an accessory protein for dsDNA nicking activity and was stimulated by the presence of MobB [44]. The requirement for a divalent cation could be fulfilled by Mg2+, Mn2+, Ca2+ or Ba2+ [44]. MobC assists in strand opening, extending the separation of the DNA strands around nic, and thereby increasing the efficiency of cleavage [47]. R1162 is mobilized at a low frequency in the absence of MobC [47]. A third mob-encoded protein, MobB, is also required for efficient mobilization, as it increases the proportion of plasmid molecules nicked at oriT and stabilizes the relaxosome [48]. A model was published in which MobB regulates expression of R1162 genes by altering the stability of the relaxosome, in a manner that involves the coupling of plasmid molecules [49].

3.1.4 Origin of transfer

The oriTs of MOBQ family plasmids (Fig. 4) consist of no more than 38 bp of DNA, characterized by a 10 bp imperfect inverted repeat sequence (IR) and an adjacent AT-rich region [43]. The exact position of the nic site was determined by primer extension and site-specific recombination studies and found to lie between nucleotides 3138 and 3139 of the RSF1010/R1162 DNA sequence (Fig. 4), adjacent to the IR sequence [44, 50]. In vitro, MobA binds tightly to the oriT strand that is transferred but poorly to the complementary strand, or to linear, duplex oriT DNA. The IR and the adjacent TAA sequence are required for MobA to shift the mobility of oriT DNA during electrophoresis, and they are also necessary for efficient termination of a round of transfer. It is the structure of the IR, rather than a particular sequence, that is important for binding MobA to single-stranded oriT [51]. In contrast, only the IR arm proximal to the cleavage site is required for cleaving oriT in the relaxosome. oriTs from other MOBQ plasmids, such as the gram-positive conjugative plasmids mentioned above, are also homologous to the RSF1010/R1162 oriT, showing a greater level of divergence in sequence within the IR than in the sequence more proximal to the cleavage site [35]. It appears that the IR sequence provides additional specificity in the interaction of the oriT with MobA [52].
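To make the notion of a 10 bp imperfect inverted repeat concrete, the following Python sketch scans a DNA string for two arms that are reverse complements of each other, tolerating a few mismatches and a short spacer. The mismatch and spacer limits, the function names and the toy sequence are all assumptions for illustration; the toy sequence is not the RSF1010 oriT.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return dna.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(dna, arm=10, max_mismatches=2, max_spacer=6):
    """Return (left_start, right_start, mismatches) for each candidate IR.

    Positions are 0-based. The right arm is compared with the reverse
    complement of the left arm, so zero mismatches means a perfect IR.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 2 * arm + 1):
        target = reverse_complement(dna[i:i + arm])
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer
            right = dna[j:j + arm]
            if len(right) < arm:
                break  # right arm would run past the end of the sequence
            mismatches = sum(a != b for a, b in zip(target, right))
            if mismatches <= max_mismatches:
                hits.append((i, j, mismatches))
    return hits


# Hypothetical toy sequence: a 10 bp left arm, a 3 bp spacer and a right
# arm carrying one mismatch relative to the perfect reverse complement.
toy = "AT" + "GGCCAGTTTC" + "TCG" + "GAAACTGGCA" + "TAA"
print(find_inverted_repeats(toy, max_mismatches=1))
```

Since the text stresses that the structure of the IR matters more than its exact sequence, a tolerant search of this kind is a more natural fit than exact string matching.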

Figure 4

Alignment of MOBQ family nic sites. (a) OriT region of plasmid RSF1010. The characteristic IR is illustrated by arrows above the nucleotide sequence. (b) Comparison of the nic sites of representative MOBQ plasmids. Experimentally determined nic-cleavage sites are indicated with vertical arrows. Nucleotides conserved in at least 50% of the sequences are shown in black.

3.2 The ColE1-superfamily

The second group of mobilizable plasmids forms a superfamily that comprises members found in gram-negative organisms along with members from gram-positive bacteria. It has the well-known ColE1 plasmid as a prototype. ColE1 is a 6650 bp, multicopy, colicinogenic plasmid, originally found in E. coli and widely used for the construction of bacterial cloning vectors. It can be mobilized by members of many incompatibility groups, including IncIα, IncFI, IncW and IncP [53].

A PSI-BLAST search starting with ColE1 relaxase resulted in 76 non-redundant hits above the P=0.0001 cut-off, when the search converged after nine iterations. All 76 hits corresponded to relaxase proteins. Of these, five corresponded to relaxases of large conjugative plasmids of Lactococci, such as pRS01 [54, 55], 14 to diverse conjugative and mobilizable transposons (e.g., Tn1549, Tn5252 or Tn4399) [56–58], eight to relaxases of IncP and IncI conjugative plasmids [59–61], six to the VirD2 proteins in the Vir-Ti plasmid subfamily [62, 63], and finally 43 hits corresponded to relaxases of mobilizable plasmids. Proteins of this last group are shown in Tables 2 and 3. Since no additional proteins appeared before the search converged, these relaxases form a coherent protein family. It is important to underline that relaxases of the ColE1-superfamily intersect with MOBP family relaxases and thus link both superfamilies to a common origin.
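The breakdown of hits can be checked with simple arithmetic. The short Python sketch below sums the classes quoted in the paragraph above and compares the mobilizable-plasmid count with the number of entries listed in Tables 2 and 3 (26 and 17 rows, counted from the tables as reproduced in this text); the dictionary keys are merely descriptive labels chosen for this example.

```python
# Hit classes and counts as quoted in the text for the ColE1 PSI-BLAST search.
hit_classes = {
    "conjugative plasmids of Lactococci": 5,
    "conjugative and mobilizable transposons": 14,
    "IncP and IncI conjugative plasmids": 8,
    "VirD2 proteins (Vir-Ti subfamily)": 6,
    "mobilizable plasmids": 43,
}

total_hits = sum(hit_classes.values())

# Number of plasmid entries per table, counted from the tables in this text.
table_rows = {"Table 2 (MOBHEN)": 26, "Table 3 (MOBP)": 17}
listed_mobilizable = sum(table_rows.values())

print("total hits:", total_hits)
print("mobilizable plasmids listed in Tables 2 and 3:", listed_mobilizable)
```

Both totals agree with the figures given in the text: 76 hits overall, of which 43 are relaxases of mobilizable plasmids.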

Table 2

The ColE1 superfamily: MOBHEN family relaxases

Plasmid name | Plasmid accession No. | Size (bp) | Bacterial source | Relaxase name | Relaxase accession No. | % identity to ColE1 MbeA
ColE1 | NC_001371 | 6650 | Escherichia coli | MbeA | CAA33883 | 100
pUB2380 | AJ008006 | 8500 | E. coli | Mob3 | CAA07828 | 97
pC | AY079201 | 5269 | Salmonella enteritidis | MbeA | AAL87017.1 | 97
pLG13 | AF251289 | 6293 | E. coli | MobA | AAG18128 | 96
pECL18 | Y16897 | 5571 | Enterobacter cloacae | MbeA | CAA76523 | 96
pWQ799 | L39794 | 6915 | Salmonella enterica | MbeA | AAC98398 | 95
pSFD10 | NC_003079 | 4091 | Salmonella choleraesuis | MobA | NP_203138.1 | 94
pCOLD-157 | Y10412 | 6675 | E. coli | MbdA | CAA71435 | 77
ColK | M29821 | | E. coli | MobA | S04789 | 75
pK | NC_003456 | 4245 | S. enteritidis | MbeA | NP_604399.1 | 71
ColA | NC_001373 | 6720 | Citrobacter freundii | MobA | S04790 | 69
pEC3 | D45188 | 3000 | Erwinia carotovora | MobA | JC4727 | 69
pSW200 | L42525 | 4367 | Erwinia stewartii | MobA | AAA69498 | 69
pUCD5000 | AF022806 | 5229 | Pantoea citrea | MobA | NP_046807 | 66
pAsal3 | AJ508384 | 5249 | Aeromonas salmonicida | MobA | CAD48436.1 | 58
p11184 | AJ249644 | 5804 | Plesiomonas shigelloides | MobA | CAB56515 | 56
pHE1 | AJ243735 | 4200 | Halomonas elongata | MobA | CAB96959.1 | 56
pJD1 | NC_001377 | 4207 | Neisseria gonorrhoeae | CppC | P07047 | 50
pYFC1 | M83717 | 4231 | Pasteurella haemolytica | MbeA | D56649 | 46
pRAY | NC_000923 | 6076 | Acinetobacter sp. | Orf3 | NP_049459 | 46
pMHSCS1 | NC_002637 | 4992 | Mannheimia haemolytica | MobA | NP_073219 | 43
pUB6060 | AJ249644 | 5804 | Plesiomonas shigelloides | MobA | CAB56515.1 | 42
pIG1 | NC_001774 | 5360 | Pasteurella multocida | MbeAy | NP_054475 | 41
pPvu1 | NC_002632 | 4675 | Proteus vulgaris | MobA | NP_072076.1 | 41
pHel4 | AF469112 | 10,970 | Helicobacter pylori | Orf4C | AF469112.3 | 31
pLQ510 | AF129811 | 12,082 | Moraxella catarrhalis | Orf3 | AAF61811 | 30

Table 3

The ColE1 superfamily: MOBP family relaxases

Plasmid name | Plasmid accession No. | Size (bp) | Bacterial source | Relaxase name | % identity to ColE1 MbeA | Relaxase accession No.
pAsal2 | AJ508383 | 5424 | Aeromonas salmonicida | MobA | 27 | CAD48428.1
pAsal1 | AJ508382 | 6371 | A. salmonicida | MobA | 25 | CAD48422.1
pSK639 | U40259 | 8013 | Staphylococcus epidermidis | Orf337 | 18 | AAC18947
pC221 | NC_002129 | 4600 | Staphylococcus aureus | OrfA | 16 | NP_052695.1
pS194 | X06627 | 4397 | S. aureus | Rlx | 16 | P12054
pIP1630 | AF045472 | 14,400 | S. epidermidis | MobA2 | 16 | AAD02405
pC223 | X12831 | 1823 | S. aureus | Orf1 | 16 | CAA31314.1
pMD136 | NC_001277 | 19,515 | Pediococcus pentosaceus | ORF6 | 15 | NP_037562.1
pIP1629 | AF045240 | 7500 | S. epidermidis | MobA1 | 15 | AAD02378
pAH82 | AF243383 | 20,331 | Lactococcus lactis | MobDEI | 15 | AAF98309
pNZ4000 | NC_002137 | 42,180 | L. lactis | MobA | 15 | NP_053037
pFN1 | AF159249 | 5887 | Fusobacterium nucleatum | RlxA | 14 | AF159249_1
pCI528 | L08601 | 2006 | L. lactis | Orf2 | 14 | AAB28188
pRAS3 | NC_003124 | 11,823 | A. salmonicida | RepB/MobA | 14 | NP_387464.1
pRA2 | U88088 | 32,743 | Pseudomonas alcaligenes | MobA | 14 | AAD40339
pTC-F14 | NC_004734 | 14,155 | Acidithiobacillus caldus | MobA | 13 | NP_835376
pTF-FC2 | M57717 | 5317 | Thiobacillus ferrooxidans | MobA | 10 | AAA27389

The amino acid sequences of the relaxases shown in Tables 2 and 3 were aligned and their most conserved features are shown in Fig. 5. ColE1-superfamily relaxases display the catalytic Tyr in Motif I and the conserved Ser plus either Glu or Asp residues in Motif II, as shown in the figure. Interestingly, with regard to the amino acid residues conserved in Motif III, ColE1-superfamily relaxases can be divided into two families: one possessing the recently described HEN motif [64] (Table 2 and Fig. 5(a)) and a second possessing the 3H motif (Table 3 and Fig. 5(b)). These two families are also clearly differentiated on the basis of amino acid sequence similarity, as illustrated by the three motifs shown in Fig. 5. According to the phylogenetic tree obtained for this relaxase superfamily (Fig. 6), relaxases belonging to the HEN-family are clustered in a single monophyletic group (grey shadowed in Fig. 6). It would appear that the canonical histidine triad is the original Motif III, and that acquisition of the HEN motif is a more recent event that could have taken place at * in the tree of Fig. 6. This idea is supported by experiments carried out with mutant ColE1 relaxases, in which the double substitution of the essential ColE1 amino acids E104 and N106 of the HEN motif by histidines did not abolish relaxase activity [64].

Figure 5

Alignment of representative ColE1-superfamily relaxases. Relaxases belonging to MOBHEN (a) and MOBP (b) families were aligned separately. Codes are the same as in Fig. 1.

Figure 6

Neighbour-joining dendrogram of aligned ColE1-superfamily relaxases. Names of the relaxase-containing plasmids or transposons are presented to the right of each branch. Conjugative plasmids and transposons are underlined. Bootstrap percentages (over 50%) are indicated above the relevant branches (or arrows). The monophyletic group containing motif HEN is grey shadowed. Branches including different kinds of mob region organization are called A–K (as in Fig. 7), and are encompassed by bars on the right. Taxonomical family and phylum (in bold letters) of the plasmid/transposon host are respectively indicated by bars and keys on the right. * and • denote ancestors that are referred to in the text. Probable acquisition and loss of certain genes (related only to mobilizable plasmids) is specified above some branches. Accession numbers of relaxases from conjugative plasmids and transposons: Tn1549: AAF72355; RP4: Q00191; R64: B38529; ColIb-P9: NP_052501.1; pTiC58: S11839; pTi-SAKURA: NP_053396; Tn5252: NP_358551; pRS01: AAB06502.1.

Furthermore, the same classification of HEN and 3H families as derived from the amino acid sequences might be inferred from the organization of the mobilization regions (shown in Fig. 7) and the alignments of their putative oriT sites (see Fig. 8) as will be discussed below.

Figure 7

Genetic organization of ColE1-superfamily mobilization regions. Codes follow the conventions of Fig. 3. Gray=genes with no similarity to ColE1 mob genes. Letters on the right of the figure indicate the different genetic organizations and are used as indicators in Fig. 6.

Figure 8

Alignment of ColE1-superfamily nic sites. (a) OriT region of plasmid ColE1. (b) Comparison of the putative nic sites of some of the MOBHEN plasmids. (c) Comparison of the nic sites of the lactococcal plasmids pNZ4000 and pCI528. The boxed area indicates the conserved hexanucleotide proposed as the putative nic site for these plasmids. The IR homologous to the one present close to the nic site in R64 is also shown. (d) Comparison of the nic sites of representative MOBP plasmids as described by [11]. Codes as in Fig. 4.

3.2.1 MOBHEN family

ColE1 plasmid is the best characterized member of this family. It can be isolated as a “relaxation complex” (produced by loss of supercoiling of in vivo formed relaxosomes), comprising three main proteins of 60, 16 and 11 kDa, probably encoded by mbeA, mbeB and mbeC genes, respectively [65–67]. The ColE1 relaxation complex was the first to be described and is formed even in the absence of a conjugative plasmid [68].

3.2.1.1 Mobilization region organization

Genetic studies confirmed that the ColE1 mobilization region is composed of five genes (mbeA, mbeB, mbeC, mbeD and mbeE), two of which (mbeB and mbeD) entirely overlap a third one (mbeA). Four of the mob genes (mbeA, mbeB, mbeC and mbeD) are essential for plasmid mobilization, while the fifth (mbeE) plays no essential role [67]. As shown in Fig. 7, a majority of the plasmids belonging to the MOBHEN family show conservation in gene organization with the mobilization region of ColE1, indicating a common origin. The only exception is mbeE, which is not found as a complete orf in most sequences. Since it does not apparently play any role in mobilization, it was not analyzed further. The phylogenetic tree (Fig. 6) obtained for the aligned relaxases is also in agreement with the observed similarity in mob region organization: more closely related relaxases form part of highly similar mobilization regions. Such similarity is not restricted to the general organization of mob, but extends to sequence similarity of the corresponding Mob proteins. Thus, parallel BLAST searches were carried out with proteins MbeC, MbeB and MbeD. Results indicate a wide distribution of MbeC-like proteins among the ColE1 superfamily, appearing in both HEN and 3H families. Results obtained for MbeB and MbeD are shown in Fig. 7, which contains an updated gene organization for many ColE1-like plasmids. MobC and MobA phylogenetic trees are highly congruent, underscoring the common phylogeny of both genes and thus suggesting the existence of an ancestral ColE1 mob region encoding both MobA and MobC-like proteins. The original operon could have acquired new genes (see organizations A, B, C, D and I in Figs. 6 and 7), and/or lost MobC (see F). MbeB is conserved in the HEN family (A, B, C and D), while a new Mob protein is encoded by some plasmids belonging to the 3H family (I and J). Conjugative plasmids of the MOBP family belong to the K group.

All plasmids containing genes mobA, mobB and mobC are enclosed in the HEN family (grey shadowed in Fig. 6), with the exception of pAsal1 and pAsal2 (belonging to the 3H family). The most parsimonious interpretation is that the ancestor of the HEN family (see the corresponding indication (*) in Fig. 6) had a mob region with three genes (mobA, mobB and mobC-like). Branches in group C retained the same structure, those in group E lost mobB, and others, like those in groups A, B and D, gained new genes.

Another important observation from Fig. 6 is that all plasmids containing the HEN motif belong to Phylum proteobacteria, most of them to Class γ-proteobacteria. Furthermore, most clades of this group enclose plasmids belonging to the same taxonomical family. This also reinforces the idea of the relatively recent acquisition of the HEN motif by divergence from the 3H motif, since plasmids containing the canonical triad are dispersed among a greater number of bacterial classes. There is also a correlation between mob region organization and host distribution of the plasmids. The more recent the acquisition of a gene, the smaller the number of bacterial families to which that mob region type is confined, indicating that horizontal transfer takes a long evolutionary time.

3.2.1.2 In vitro characterization

Although ColE1 MbeA protein did not contain the canonical 3H motif, it was considered to be a relaxase since a protein with similar molecular weight remained associated specifically with the nicked strand of the open circular ColE1 DNA after treatment with SDS [66]. Recently, MbeA was purified and in vitro experiments confirmed that it is indeed the ColE1 relaxase [64]. These studies also showed an optimum temperature of 45 °C for DNA cleavage and an absolute requirement for divalent cations. Mg2+, Co2+ and Ni2+ were accepted with similar efficiency [64]. Site-directed mutagenesis revealed four amino acids essential for maintenance of the relaxase activity, Y19, H97, E104 and N106, reflecting once more the importance of Motifs I and III in the relaxase catalytic centre. Therefore, MbeA contains a divergent Motif III, with a HEN signature instead of the 3H [12, 64].

The roles of MbeB and MbeC during ColE1 mobilization are unknown. An “entry-exclusion” function was proposed for MbeD since it could bind the inner membrane via its N-terminal amphipathic α-helix [69].

3.2.1.3 Origin of transfer

The only plasmid of this family for which in vitro studies have been performed is ColE1 [64, 67]. The ColE1 nic site was located upstream of the mob region (Figs. 7 and 8(a)). Based on sequence similarity and location in the mobilization region, it was possible to assign putative oriT regions for other members of the HEN-subfamily (Fig. 8(b)). The clue was the high sequence similarity shown by a 100 bp region including the putative nic site and one IR located about 30 bp upstream from nic in all cases. Such an IR could represent part of the relaxase binding site, by analogy to the organization of nic in the MOBQ-family.

3.2.2 MOBP family

ColE1-3H family relaxases show low identity with ColE1 MbeA (15–20%). A majority show >30% identity among themselves, although there are significant exceptions, such as the Aeromonas relaxases. Concordant results were observed for the MbeC proteins, implying that the 3H-family is large and widely diverged. Since these 3H relaxases also appear in BLAST searches started with TraI of plasmid RP4 (the prototype of conjugative MOBP relaxases), we believe they should be included within the MOBP family. It is interesting to note that several members of this family were already included in the MOBP family just by sequence similarity of their respective relaxases [11, 12]. The main difference between ColE1-3H relaxases and conjugative MOBP relaxases is that the former are almost always associated with a MobC protein homologous to ColE1 MbeC. Four ColE1-3H plasmids (pTF-FC2, pRA2, pRAS3 and pTC-F14) deserve a special mention here. They are the only mobilizable plasmids containing a MOB region with exactly the same gene organization as conjugative MOBP plasmids. Besides, two of them contain relaxases showing relaxase and primase domains, similar to MOBQ relaxases (see Rawlings and Tietze [11] for further details).

ColE1-3H relaxases show the signature Hx(D/E)…HxH in Motif III (Fig. 5(b)). This conserved acidic residue was implicated in RP4 TraI catalysis mechanism [20]. The same residue appears to be conserved also in the previously discussed MOBHEN family (Fig. 5(a)) and in the pMV158 superfamily (Fig. 9).

Figure 9

Alignment of pMV158-superfamily relaxases. Color codes as in Fig. 1. (a) Clade A. (b) Clade B. (c) Consensus Motifs I and III for all relaxases in pMV158 superfamily.

3.2.2.1 Mobilization region organization

ColE1-3H type mobilization regions show wider variety than those of the HEN-family, in agreement with a wider dispersion in relaxase sequences (see evolutionary distances in Fig. 6). Individual members differ in the number and orientation of the constituent genes, as already discussed (see Fig. 7). The only gene that is always present is the relaxase. Plasmid pMD136 deserves special mention: its mobilization region is composed of five genes. One of them, orf1, shows homology to conjugative coupling proteins (Fig. 7). Coupling proteins are only very rarely encoded by small mobilizable plasmids, with just one other exception, plasmid CloDF13, as we will discuss below [70].

Interestingly, pRAS3, pTF-FC2, pTC-F14 and pRA2 plasmids encode five genes in their mobilization region, mobA, mobB, mobC, mobD and mobE (Fig. 7). They are the equivalent of genes traI, traJ, traK, traL and traM of TRAP conjugative systems, respectively [59, 71]. Similarity is not restricted to homology among gene products, but extends to the order and direction of transcription of these genes [72]. Three genes (mobA, mobB and mobC) are essential for plasmid mobilization, while the other two (mobD and mobE), although not essential, affect the mobilization frequency.

3.2.2.2 In vitro characterization

Although no plasmid from the 3H-ColE1 family has been subjected to in vitro analysis, a lot is known about the MOBP region of plasmid RP4, perhaps the best known of all conjugative plasmids, and thus the prototype of this group [12, 14]. In MOBP, TraI is the relaxase while TraJ and TraK are two accessory, DNA binding proteins, which are believed to assist TraI to access its binding site through different processes, involving alteration of the local DNA structure and DNA superhelicity changes, respectively. To our knowledge, TraL and TraM have unknown functions. A detailed analysis of the genetics and biochemistry of MOBP can be found in [12].

3.2.2.3 Origin of transfer

As indicated above, no plasmid from the 3H-ColE1 family has been subjected to in vitro analysis. Nevertheless, two IRs homologous to the NikB relaxase binding site of plasmid R64 were located in two of the pNZ4000 oriTs [73, 74]. Thus, and in agreement with the discussion above, a MOBP-type oriT was proposed for the multiple oriT-containing plasmid pNZ4000 (Fig. 8(c)). In addition, a perfectly conserved nucleotide hexamer, also found in plasmids RP4 and R64 close to an IR, has been proposed as the nic site for the plasmids pTF-FC2, pTC-F14 and pRA2 [11, 75] (Fig. 8(d)). To our knowledge, just one other member of this family has been analyzed for mobilization in vivo. A 700-bp DNA fragment was shown to contain the functional oriT of pC221 [76]. When this sequence was checked for the presence of a putative ColE1-like or pNZ4000-like nic site, no convincing matches were found.

3.3 The pMV158-superfamily

The third superfamily to be described has pMV158, isolated from Streptococcus agalactiae, as a prototype [77]. It is 5536 bp, non-conjugative and could be established in a variety of hosts, both gram-positive and gram-negative. It can be mobilized by conjugative plasmids of the pIP501/pAMβ1 family as well as by broad host range plasmids such as RP4 (IncP) and R388 (IncW).

A PSI-BLAST search starting with pMV158 relaxase resulted in 69 non-redundant hits above the P=0.00001 cut-off after six iterations. Of these, five corresponded to relaxases of diverse mobilizable transposons from the genera Clostridium and Bacteroides (e.g., Tn4451, Tn4453, Tn4545, Tn5520 or NBUs) [78–80], nine to relaxases encoded by bacterial chromosomes (possibly also from inserted mobilizable transposons) and finally 55 hits corresponded to relaxases of small plasmids. The corresponding proteins are shown in Table 4. Since no additional proteins appeared before the search converged, these relaxases form a coherent protein family.

Table 4

The pMV158 superfamily

Plasmid | Accession No. | Size (bp) | Bacterial source | Relaxase | % identity to pMV158 MobM | Relaxase accession No.
pMV158 | X15669 | 5536 | Streptococcus agalactiae | MobM | 100 | AAA25387
pVA380 | L23803 | 2343 | Streptococcus ferus | Mob | 97 | AAA19677.1
pSSU1 | NC_002140 | 4975 | Streptococcus suis | Mob, ORF5 | 70 | BAA83679
pSMQ172 | AF295100 | 4230 | Streptococcus thermophilus | Mob, ORF3 | 70 | AAK83121
pER13 | NC_002776 | 4139 | S. thermophilus | Mob | 70 | NP_115336.1
pF8801 | AF196967 | 600 | Pediococcus damnosus | Mob | 57 | AAL15563.1
pI4 | AF300457 | 14,000 | Bacillus coagulans | Mob/Pre | 52 | AF300457-5
pIP823 | U40997 | 3712 | Listeria monocytogenes | Mob/Pre | 47 | AAA93296
pLB4 | M33531 | 3547 | Lactobacillus plantarum | RepC | 42 | AAA25252
pLAB1000 | A14660 | 3300 | Lactobacillus hilgardii | Mob | 42 | A35390
pS86 | AJ223161 | 5149 | Enterococcus faecalis | Mob, ORF4 | 39 | CAA11139
pBM02 | AY026767 | 3854 | Lactococcus lactis | Mob, Orf1 | 39 | AAK13009.1
pK214 | X92946 | 29,871 | L. lactis | Mob | 39 | CAA63521.1
pBC16 | NC_001705 | 4630 | Bacillus cereus | Mob, ORF-β | 37 | AAA84921
pIP1714 | AF015628 | 4978 | Staphylococcus cohnii | Mob/Pre | 37 | AAC61672
pUB110 | NC_001384 | 4548 | Staphylococcus aureus | Mob, ORF-β | 37 | AAF85649
pLC88 | U31333 | 3501 | Lactobacillus casei | Mob | 37 | AAA74581.1
pLA106 | D88438 | 2800 | Lactobacillus acidophilus | Pre | 34 | BAA21093
pBM5 | AJ429478 | 491 | Bacillus mojavensis | Mob | 33 | CAD22322.1
pGI2 | X13481 | 9672 | Bacillus thuringiensis | Mob/Pre | 32 | P10025
pTX14-2 | NC_004334 | 6829 | B. thuringiensis | Mob14-2 | 32 | NP_795748.1
pTB19 | M63891 | 11,887 | Geobacillus stearothermophilus | Mob | 31 | AAA98305.1
pTB53 | D14852 | 2083 | G. stearothermophilus | Pre | 31 | BAA03580.1
pTB913 | M63891 | 11,887 | G. stearothermophilus | Mob | 31 | AAA98307
pGI1 | NC_004335 | 8254 | B. thuringiensis | Mob1 | 31 | NP_705753.1
pTA1015 | NC_001765 | 5807 | Bacillus subtilis | Mob15 | 30 | NP_053784
pTA1060 | NC_001766 | 8737 | B. subtilis | Mob60 | 30 | NP_053788
pUH1 | M76715 | 2044 | B. subtilis | Pre | 29 | A48371
p1414 | NC_002075 | 7950 | B. subtilis | Mob | 29 | AAD22622
p22R | NC_004528 | 9935 | Leuconostoc citreum | Mob | 28 | NP_775704
pSBK203 | U35036 | 3780 | S. aureus | Pre | 27 | AAA79055.1
pE194 | J01755 | 3728 | S. aureus | Pre, C-403 | 27 | QQSA4E
pOM1 | L31579 | 4442 | Butyrivibrio fibrisolvens | Pre | 26 | AAB57761.1
pT181 | NC_001393 | 4439 | S. aureus | Pre | 26 | NP_040472
pKH6 | NC_001767 | 4439 | S. aureus | Pre | 26 | NP_053796.1
pBBR1 | X66730 | 2600 | Bordetella bronchiseptica | Mob | 25 | S25246
pRS2 | NC_003201 | 2544 | Oenococcus oeni | Pre | 25 | NP_443752.1
pRS3 | NC_003099 | 3948 | O. oeni | Pre, ORF2 | 25 | NP_254269
pBMY1 | AJ243967 | 3377 | Bacillus mycoides | Mob | 25 | CAB88024
pCC7120α | NC_003276 | 408,101 | Nostoc sp. | Pre | 25 | NP_490305.1
pLo13 | M95954 | 3948 | O. oeni | Mob13 | 25 | AAA19673.1
pBGR1 | NC_004308 | 2723 | Bartonella grahamii | Mob | 24 | NP_696963.1
pRAO1 | AB022866 | 2140 | Ruminobacter amylophilus | Mob, ORF3 | 23 | BAA74512
pTX14-1 | NC_002091 | 5415 | B. thuringiensis | Mob | 23 | NP_054010
pCC7120Δ | NC_003273 | 55,414 | Nostoc sp. | Pre | 23 | NP_489420.1
pUIBI-1 | NC_004059 | 4671 | B. thuringiensis | Mob | 23 | NP_660266
pBMYdx | AJ272266 | 3376 | Bacillus mycoides | Mob | 22 | CAB88025
pRRI2 | AJ278872 | 3240 | Prevotella ruminicola | Pre | 22 | CAC38004
pFL1 | NC_002132 | 2311 | Flavobacterium sp. | Pre, ORFII | 21 | NP_052877
pTS1 | NC_002650 | 4200 | Treponema denticola | Mob | 21 | NP_073756
pWKS1 | NC_004160 | 2697 | Paracoccus pantotrophus | Mob | 21 | NP_690578.1
pTX14-3 | NC_001446 | 7649 | B. thuringiensis | Mob14-3 | 18 | S16658
pPL1 | NC_002094 | 3874 | Marinococcus halophilus | Mob | 17 | NP_054019
pZM03 | X14438 | 2749 | Zymomonas mobilis | ORF1 | 26 | CAA32611.1
pYHBi1 | AF454701 | 5059 | Prevotella intermedia | Mob | 17 | AAL73040.1

The pMV158 superfamily includes Mob proteins along with proteins called Pre (for plasmid recombination enzyme), mainly from gram-positive bacteria. Recently, some members have been identified in gram-negative bacteria [80, 81]. Members of this family, including pMV158, are able to replicate both in gram-positive and in gram-negative bacteria. Besides, they can be mobilized by a great variety of helper plasmids, including gram-positive conjugative plasmids such as pIP501 and pAMβ1 [82], pLS20 [83], pXO11 and pXO12 [84], and pAD1 [85], broad host range plasmids of gram-negative bacteria from incompatibility groups IncW [86], IncF [10] and IncP [10, 86], and conjugative transposons such as Tn916 [87], Tn925 [88] and Tn1545 [10]. This may be a reason for the wide distribution of these mob regions.

pMV158-superfamily relaxase amino acid sequences were aligned and their most conserved features are shown in Fig. 9. The most striking result is that the invariant motifs in this superfamily clearly deviate from those reported previously. There is a clear N-proximal motif (HxxR), which is new among relaxases, and a probable variant of the 3H motif, of signature HxDE … Phxh (Fig. 9(c)). The presence of these two common signatures is a strong indication of a specific phylogenetic origin of this superfamily. Within the pMV158 superfamily, two clades stand out in the phylogenetic tree of Fig. 10. Clade A relaxases, of which pMV158 is the prototype, show extensive sequence conservation. The most salient features are shown in Fig. 9(a). Apart from the shared HxxR and HxDE motifs, a third motif, which shows the signature NY(D/E)L, is found. It was suggested that it contains the catalytic Tyr [16], although its absence from other clades in the superfamily, as well as the lack of experimental evidence, make this claim arguable. Clade B (Fig. 10) has plasmid pBBR1 as a prototype, the only one biochemically characterized. Interestingly, this clade shows a clear but deviant 3H motif (signature HxDExxPHxh), in which the third His is sometimes substituted by Ser or Thr. Besides, there is no remnant of a motif that could contain a catalytic Tyr (see below). An overview of the relatedness between pMV158 superfamily relaxases is shown in the phylogenetic tree of Fig. 10. As seen in the figure, apart from the well-populated clades A and B there are other obvious clades that underscore the wide diversity of this superfamily. These relaxases are broadly distributed among many taxonomic families, and even phyla, of gram-positive and gram-negative bacteria. We have tried to represent in Fig. 10 an important new concept, that the classical 3H motif contains several variants in the different clades.
As a result, only the first His of the triad seems to be really invariant, while the other two can accommodate various substitutions. Perhaps more importantly, the classical Motif I (containing the catalytic Tyr) seems to disappear.

Figure 10

Neighbour-joining dendrogram of aligned pMV158-superfamily relaxases. Codes as in Fig. 2. Clades A and B are grey-shadowed. Changes of relaxase motifs are indicated for the relevant branches and nodes. * indicates partial sequence of the corresponding relaxases. Accession numbers of relaxases from elements other than mobilizable plasmids included in the tree: Tn4453: AAF66230; Tn4555: AAD43599; NBU1: A49901; Nostoc: ZP_00105740; Magnetospirillum: ZP_00056499.

3.3.1 Mobilization region organization

The organization of the mobilization region is similar in all superfamily members. It is composed of only one gene, encoding the relaxase, and the oriT site upstream, overlapping the relaxase promoter, as shown in Fig. 11(a). It was demonstrated for pMV158 that only the oriT region and mobM, encoding the relaxase, were indispensable for mobilization [82]. Furthermore, mobilization of a plasmid containing pMV158 oriT has been reported in the presence of a second plasmid encoding mobM under the control of an inducible promoter [86]. Also, Selinger et al. [89] showed that a plasmid containing plasmid pUB110 oriT could be mobilized by pUB110 Pre protein provided in trans. The Pre/Mob and RSa/oriT identities have been demonstrated for two new members of this family [85]. Interestingly, relaxases from members of this superfamily are involved in site-specific recombination between oriTs that can give rise to gene amplification [85].

Figure 11

Alignment of pMV158-superfamily nic sites. (a) Genetic organization of the pMV158-superfamily plasmids. (b) Comparison of the nic sites of some of the pMV158-superfamily plasmids. Codes as in Figs. 3 and 4.

3.3.2 In vitro characterization

pMV158 relaxase MobM (clade A) was purified and its in vitro relaxase activity demonstrated [16]. MobM cleaves pMV158 DNA at the nic site and remains tightly associated with its target DNA. It is able to cleave its DNA target, supercoiled or single-stranded, in the presence of Mg2+ [86, 90]. Although Tyr49 has been presumed to be responsible for the cleavage reaction by similarity to a related motif in RCR initiator proteins, no rigorous evidence is presently available.

Besides the pMV158-encoded mobilization system, only the Mob region of pBBR1 (clade B), a broad host range plasmid isolated from Bordetella bronchiseptica, has been biochemically characterized. Interestingly, it resides in gram-negative bacteria and does not replicate via a rolling-circle mechanism [91]. Its relaxase (Orf1) was purified and shown to specifically recognize a 52 bp sequence in the absence of any other protein [81]. This 52 bp sequence contains oriT and the promoter of the autoregulated orf1 gene [81], suggesting that binding of the relaxase to the 52 bp sequence may fulfil two functions, relaxosome formation and mob gene regulation. Extensive site-directed mutagenesis was carried out on orf1 to identify the catalytic residues. Mutagenesis included each of the seven Tyr, two Phe (F94 and F95), one Asp (D120) and one Glu (E121). Surprisingly, none of the Tyr mutations had an effect on pBBR1 mobilization, suggesting that no Tyr is implicated in the catalytic mechanism. On the other hand, the D120L and E121G mutations abolished mobilization [81]. Since these two residues are invariant in the pMV158 superfamily, the authors suggest that both residues could be part of the relaxase active centre, maybe providing part of a coordination site for Mg2+ or perhaps being directly involved in the attack of the scissile phosphodiester bond at the nic site. It should also be pointed out that a similarly located acidic residue exists in MOBP relaxases (see above).

3.3.3 Origin of transfer

The oriT/RSa sites in many of the pMV158 superfamily members display broad sequence conservation, so it was possible to suggest potential nic sites for a number of them (Fig. 11(b)). All these potential oriT regions present a similar configuration, including an IR with a stem 7–10 nucleotides long and a loop of usually 6 nucleotides. The nic site is known to be located in the loop of this IR for plasmids pMV158 and pBBR1 [16, 81]. Highly similar oriT sites are present even in plasmids that do not contain mob or pre genes [92–94]. Recognition of oriTs from different plasmids by one specific relaxase has been demonstrated in several cases, suggesting that co-mobilization could be a common feature. This could explain why sequence similarity among clearly homologous sequences disappears in some cases right at the nic site.
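The oriT configuration described above (an inverted repeat with a 7–10 nucleotide stem and a loop of usually 6 nucleotides) can be scanned for with a short script. The following Python sketch is only an illustration and is not the procedure used in this review: the function names are hypothetical, the stem is assumed to be perfectly complementary, the loop is fixed at 6 nucleotides, and the example sequence is invented.

```python
# Minimal sketch: scan a DNA string for perfect inverted repeats (hairpins)
# with a stem of 7-10 nt and a loop of 6 nt.
# Assumptions (not from the review): perfect stem pairing, fixed loop length.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(dna, stem_range=(7, 10), loop_len=6):
    """Return (start, stem_length, loop_sequence) for each hairpin found.

    For every start position the longest perfect stem within stem_range
    is reported; shorter stems at the same start are skipped.
    """
    dna = dna.upper()
    hits = []
    for start in range(len(dna)):
        for stem in range(stem_range[1], stem_range[0] - 1, -1):
            loop_start = start + stem
            right_start = loop_start + loop_len
            end = right_start + stem
            if end > len(dna):
                continue  # candidate would run past the end of the sequence
            left_arm = dna[start:loop_start]
            right_arm = dna[right_start:end]
            if right_arm == reverse_complement(left_arm):
                hits.append((start, stem, dna[loop_start:right_start]))
                break
    return hits


# Invented toy sequence: 7-nt stem, 6-nt loop, flanked by two A residues.
toy = "AA" + "GCATGCA" + "TTTTTT" + "TGCATGC" + "AA"
print(find_hairpins(toy))
```

Since the nic site lies in the loop for pMV158 and pBBR1, the loop sequence returned for each hit is the natural place to look for a candidate nic site. Real oriT stems may contain mismatches, so a scan of this kind would under-report them.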

3.4 The CloDF13 family

CloDF13 is a 9957 bp bacteriocinogenic plasmid isolated from Enterobacter cloacae [95]. Its genetic organization, gene expression and DNA replication have attracted attention during the past three decades [96]. CloDF13 exhibits broad host range for mobilization. It can be mobilized efficiently by plasmids from many incompatibility groups, including IncF, I, N, P and W [70]. CloDF13 mobilization displayed the unique property of being independent of the coupling protein encoded by the helper conjugative plasmid. It only requires the mating pair formation gene products. Thus, it was suggested that CloDF13 codes for its own coupling protein [70], a unique characteristic of this plasmid. Moreover, despite the fact that the CloDF13 sequence is known [95], no protein-encoding sequence with the conserved relaxase signatures could be identified [12]. Two orfs were found in its mobilization region (Fig. 12(a)), which encode proteins of 59.7 and 27.7 kDa, corresponding to MobB and MobC, respectively [96]. These proteins were purified and their in vitro activity was studied ([18], Núñez and de la Cruz, unpublished data), identifying MobC as the CloDF13 relaxase. MobB was required to enhance MobC relaxase activity. A possible relationship of MobC to relaxases of the pheromone-responding conjugative plasmids pAD1 and pAM373 (TraX and Orf8 proteins, respectively) has recently been suggested [97]. Interestingly, none of these proteins shows the 3H relaxase signature or the alternative HEN motif. When MobC is compared to TraX and Orf8, there is a 21-residue segment that is 65% identical, although this level of identity does not extend to the rest of the sequences. It is noteworthy that a 15 amino acid in-frame deletion involving this region eliminated TraX nicking function, consistent with a relevant function for these amino acids in the mechanism of action of these relaxases [97].
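A locally conserved segment of this kind, a short window of high identity in otherwise dissimilar proteins, can be located with a simple ungapped window scan. The Python sketch below is a hypothetical illustration and not the method used in [97]. It assumes an ungapped comparison with identity counted over the window width, and the sequences in the example are invented, not MobC, TraX or Orf8.

```python
# Minimal sketch: find the pair of ungapped windows (one per protein) with
# the highest fraction of identical residues.
# Assumptions (not from the review): no gaps, identity = matches / width.


def window_identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_shared_window(seq1, seq2, width=21):
    """Return (identity, start_in_seq1, start_in_seq2) of the best window pair.

    Returns (0.0, -1, -1) when no window pair shares any residue or when
    either sequence is shorter than the window.
    """
    best = (0.0, -1, -1)
    for i in range(len(seq1) - width + 1):
        window1 = seq1[i:i + width]
        for j in range(len(seq2) - width + 1):
            score = window_identity(window1, seq2[j:j + width])
            if score > best[0]:
                best = (score, i, j)
    return best


# Invented toy data: the same 21-residue segment embedded in unrelated flanks.
segment = "ACDEFHIKLMNQRSTVWYACD"
print(best_shared_window("GGGGG" + segment + "GGGGG", "PPP" + segment + "PPP"))
```

The scan is quadratic in sequence length, which is acceptable for proteins of a few hundred residues. A proper local alignment would be needed if the conserved segment contained insertions or deletions.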

Figure 12

The CloDF13-family. (a) Genetic organization of CloDF13 mobilization region, showing the approximate position and sequence of a 21 amino acid sequence region of high similarity among the three relaxases shown. (b) Nucleotide sequence of CloDF13, pAD1 and pAM373 nic sites. Codes as in Figs. 3 and 4.

3.4.1 Origin of transfer

A relationship among the oriT sites of CloDF13, pAD1 and pAM373 has also been suggested by the same authors [97]. In the three cases, the nic site would be located in the stem of a long IR (140 bp), proximal to a series of short DRs that have been proposed as binding sites for the respective relaxases [98]. A similar genetic organization is also present in the dso replication origins of the pMV158 family of RCR plasmids [2]. Only a small region close to the nic sites of pAD1 and pAM373 exhibits sequence similarity to the CloDF13 nic sequence (Fig. 12(b)). The CloDF13 oriT is not similar to any of the oriT groups described by Zechner and colleagues [12] or to the families described above, and neither are the oriTs of plasmids pAD1 and pAM373. It has been suggested that this group could represent a new family of oriT sites, also reflecting the existence of specific relaxases as indicated above.

4 Concluding remarks

We scrutinized the NCBI database of plasmid sequences (Microbial Genomes section of GenBank; 250 plasmid entries in May 2003), and all plasmids contained in it belong to one or another of the relaxase superfamilies described in this review. Thus, we are confident that our survey covers the diversity of plasmids known up to now. All known relaxases, with the only exceptions of those of the three plasmids in the CloDF13 family, show either the 3H signature or variants of it, suggesting that the architecture of the catalytic site admits more variation than anticipated and thus that there is no absolute requirement for the 3H constellation. The most important variation is the HEN motif shown by ColE1 and members of the MOBHEN family. Other variants are included in the pMV158 superfamily. Thus, several proteins lack the third His (pBBR1, pWKS1 and pZMO3 relaxases carry Thr, while pTS1 and pBGR1 relaxases carry Ser), while several others lack the second His (pRS3 and pLo13 relaxases carry Gly and pYHBi1 carries Arg). It thus seems that only the first His is invariant, the second can be substituted by Gly/Glu/Arg, and the third by Ser/Thr/Asn.

Besides variation in the 3H Motif III, the other important caveat refers to the catalytic Tyr in Motif I. While a probable catalytic Tyr is always identified in the MOBF, MOBQ, MOBP and MOBHEN families, it is clearly not found in some members of the pMV158 superfamily. In fact, in the case of pBBR1 relaxase, mutagenesis analysis practically excludes the possibility of a catalytic Tyr [81]. Thus we should be aware that some plasmid relaxases may use other residues (probably Ser or Thr) for catalysis of the DNA strand-transfer reactions that must accompany conjugative DNA processing. In view of this fact, it can be argued that the mere existence of an expanded 3H motif is no guarantee of common ancestry, and that it could have been produced by convergent evolution.

It is also remarkable that evolutionarily related conjugative DNA processing functions are not restricted to plasmids, since conjugative and mobilizable transposons and other integrating elements (ICEs and IMEs) [99, 100] also encode identifiable relaxosome components that can be unequivocally included in one or another of the previously described oriT/relaxase families. It would appear as if all these elements evolved from a pool of genes that were exchanged and adapted to particular situations. Such evolution by accumulation of modular units has been recently proposed, based on the sequence data, for some staphylococcal and lactococcal conjugative plasmids [36, 101] and for the integrating conjugative and mobilizable elements [99, 100, 102], and is well established for rolling-circle replication plasmids [103, 104] and phage genomes [105].

In spite of the caveats of possible convergent evolution and modular accretion, we still would like to propose a plasmid classification scheme by conjugative relaxases, since they constitute the most universal common feature in plasmid evolution. We are obviously aware that plasmids are subject to modular evolution, and thus relaxase genealogy does not necessarily apply to the whole plasmid backbone. But we believe it introduces a useful notion that allows us to describe plasmid evolution by looking first at the relaxase and then focusing on the genetic modules that were gained or lost during evolution. This method was applied to the evolution of the RSF1010 and ColE1 superfamilies, and the results summarized in Figs. 2 and 6 provide good examples of the success of this approach. Obviously, a relaxase-based classification will be usefully complemented by a rep-based approach, whenever this is possible (see Introduction).

A previous relaxase classification by Zechner et al. [12] focused mainly on autotransmissible plasmids. It already defined four relaxase families although, due to the scope of that review, it did not include an attempt to put forward a rigorous classification scheme. Its analysis was mostly biochemical, and centred basically on the MOBP and MOBF superfamilies, the best known conjugative plasmid superfamilies. Thus we have not analyzed these families in detail again, since their analysis and conclusions still hold (perhaps interestingly, there are no MOBF members among mobilizable plasmids). From this starting point, we have systematically searched for relaxase families, being exhaustive in the screening of the database sources. We not only classify the relaxases, but also try to put this information in the context of the genetic organization of the MOB regions, to have a broader picture of the layout of the different mobilization systems. It should be emphasized that we checked the original DNA sequences for all the plasmids included in this review, and in many cases the genetic organizations shown in Figs. 3, 4, 7, 8 and 11 include orfs and sites not previously annotated, too numerous to mention individually.

As a whole, analysis of relaxase superfamilies results in five groups of plasmid mobilization regions: MOBQ, MOBP, MOBF, MOBHEN and MOBV. These five families contain relaxases related to the 3H signature (signatures that can be used as preliminary tree-determinants are shown in Table 5, and may be useful for a rapid approximate relaxase classification), and probably use structurally and mechanistically related DNA processing reactions. The MOBP and MOBHEN families clearly intersect, as discussed in this review. The pMV158 superfamily is still insufficiently analyzed and may contain several relaxase families in the sense described in this review (pMV158 and pBBR1 could be the prototypes of two such families). The only three “extraneous” plasmids are those belonging to the CloDF13 family, which may undergo a completely different mechanism of DNA processing for conjugation. Additionally, we should not forget the Streptomyces plasmids, which probably also transfer DNA in a completely different fashion [106]. Nevertheless, since this form of DNA transfer (in mycelial cells) could be rather different from classical conjugation in many respects, we did not analyze these plasmids.

Table 5

3H-motif signatures of the relaxase families

Family | 3H-Motif (a)
MOBQ | H(xn)HxH(x5)r
MOBP | Hx(D/E)(x4–6)HxH(x3)n
MOBHEN | HxDK(x1–8)RLELNFxxP
MOBV | HxDE(xn)phxh
MOBF | H(x3)R(x3)PxxHxH(x4)N
  • (a) Upper case letters represent invariant amino acids; lower case letters represent residues conserved in at least 90% of the sequences of a given family.
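As an illustration of how the Table 5 signatures might be used for a rapid approximate classification, the following Python sketch converts each signature into a regular expression and reports which families match a protein sequence. It is a sketch under stated assumptions and not the procedure used in this review: the function names are hypothetical, "x" is read as any single residue, the unspecified gap "(xn)" is arbitrarily bounded at 1–60 residues, and lower-case (at least 90% conserved) residues are required by default. The test sequence is invented.

```python
import re

# Table 5 signatures, with the range dash written as an ASCII hyphen.
SIGNATURES = {
    "MOBQ": "H(xn)HxH(x5)r",
    "MOBP": "Hx(D/E)(x4-6)HxH(x3)n",
    "MOBHEN": "HxDK(x1-8)RLELNFxxP",
    "MOBV": "HxDE(xn)phxh",
    "MOBF": "H(x3)R(x3)PxxHxH(x4)N",
}

# Token types: (xA-B) ranged gap, (xN) fixed gap, (xn) open gap,
# (D/E) alternatives, single letter.
TOKEN = re.compile(
    r"\(x(\d+)-(\d+)\)|\(x(\d+)\)|(\(xn\))|\(([A-Z](?:/[A-Z])+)\)|([A-Za-z])"
)


def signature_to_regex(signature, max_gap=60, strict=True):
    """Translate a Table 5 style signature into a regex string.

    max_gap bounds the unspecified "(xn)" gap (an assumption of this sketch).
    With strict=False, lower-case residues are treated as wildcards.
    """
    sig = signature.replace("\u2013", "-")  # accept the en dash used in the table
    parts, pos = [], 0
    for m in TOKEN.finditer(sig):
        if m.start() != pos:
            raise ValueError("cannot parse signature: " + signature)
        pos = m.end()
        lo, hi, fixed, gap, alt, letter = m.groups()
        if lo is not None:
            parts.append(".{%s,%s}" % (lo, hi))
        elif fixed is not None:
            parts.append(".{%s}" % fixed)
        elif gap is not None:
            parts.append(".{1,%d}" % max_gap)
        elif alt is not None:
            parts.append("[" + alt.replace("/", "") + "]")
        elif letter == "x":
            parts.append(".")
        elif letter.isupper() or strict:
            parts.append(letter.upper())
        else:
            parts.append(".")
    if pos != len(sig):
        raise ValueError("cannot parse signature: " + signature)
    return "".join(parts)


PATTERNS = {fam: re.compile(signature_to_regex(s)) for fam, s in SIGNATURES.items()}


def classify(protein):
    """Return the families whose Motif III signature occurs in the sequence."""
    seq = protein.upper()
    return [fam for fam, pattern in PATTERNS.items() if pattern.search(seq)]


# Invented toy sequence carrying a MOBP-like motif.
print(classify("MKHADAAAAHAHAAANQ"))
```

A sequence can match more than one signature, particularly with strict=False or a generous max_gap, so a result from this kind of scan is at most a first hint to be confirmed by alignment and phylogenetic analysis.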

In conclusion, this review proposes that, in spite of great sequence plasticity, most small plasmids are mobilized through the action of evolutionarily related relaxases, underscoring the existence of a probable common, almost universal conjugative processing mechanism. Thus, the generally accepted idea that different conjugative strategies are used by gram-positive and gram-negative systems, for instance, or even that conjugative and mobilizable systems use different mechanisms, does not apply to the mechanisms of DNA processing. Perhaps paradoxically, relaxases belonging to the same protein family are on many occasions found in phylogenetically widely separated bacterial genera. It seems that each of the relaxase superfamilies reported here has spread over considerable evolutionary time to colonize many bacterial genera, and that they persist because they contribute important idiosyncratic properties. We hope our classification scheme will help bacterial geneticists to see the underlying order in the seemingly chaotic diversity of bacterial mobile genetic elements.

Acknowledgements

This work was supported by Grant BMC2002-00379 from the Ministry of Science and Technology (Spain) to F.C. and by Grant F.I.S.02/3029 from the Spanish “Fondo de Investigación Sanitaria” to M.V.F.

References

  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].
  13. [13].
  14. [14].
  15. [15].
  16. [16].
  17. [17].
  18. [18].
  19. [19].
  20. [20].
  21. [21].
  22. [22].
  23. [23].
  24. [24].
  25. [25].
  26. [26].
  27. [27].
  28. [28].
  29. [29].
  30. [30].
  31. [31].
  32. [32].
  33. [33].
  34. [34].
  35. [35].
  36. [36].
  37. [37].
  38. [38].
  39. [39].
  40. [40].
  41. [41].
  42. [42].
  43. [43].
  44. [44].
  45. [45].
  46. [46].
  47. [47].
  48. [48].
  49. [49].
  50. [50].
  51. [51].
  52. [52].
  53. [53].
  54. [54].
  55. [55].
  56. [56].
  57. [57].
  58. [58].
  59. [59].
  60. [60].
  61. [61].
  62. [62].
  63. [63].
  64. [64].
  65. [65].
  66. [66].
  67. [67].
  68. [68].
  69. [69].
  70. [70].
  71. [71].
  72. [72].
  73. [73].
  74. [74].
  75. [75].
  76. [76].
  77. [77].
  78. [78].
  79. [79].
  80. [80].
  81. [81].
  82. [82].
  83. [83].
  84. [84].
  85. [85].
  86. [86].
  87. [87].
  88. [88].
  89. [89].
  90. [90].
  91. [91].
  92. [92].
  93. [93].
  94. [94].
  95. [95].
  96. [96].
  97. [97].
  98. [98].
  99. [99].
  100. [100].
  101. [101].
  102. [102].
  103. [103].
  104. [104].
  105. [105].
  106. [106].