
The diversity of conjugative relaxases and its application in plasmid classification

María Pilar Garcillán-Barcia , María Victoria Francia , Fernando de La Cruz
DOI: http://dx.doi.org/10.1111/j.1574-6976.2009.00168.x 657-687 First published online: 1 May 2009

Abstract

Bacterial conjugation is an efficient and sophisticated mechanism of DNA transfer among bacteria. While mobilizable plasmids only encode a minimal MOB machinery that allows them to be transported by other plasmids, conjugative plasmids encode a complete set of transfer genes (MOB+T4SS). The only essential ingredient of the MOB machinery is the relaxase, the protein that initiates and terminates conjugative DNA processing. In this review we compared the sequences and properties of the relaxase proteins contained in gene sequence databases. Proteins were arranged in families and phylogenetic trees constructed from the family alignments. This allowed the classification of conjugative transfer systems in six MOB families: MOBF, MOBH, MOBQ, MOBC, MOBP and MOBV. The main characteristics of each family were reviewed. The phylogenetic relationships of the coupling proteins were also analysed and resulted in phylogenies congruent to those of the cognate relaxases. We propose that the sequences of plasmid relaxases can be used for plasmid classification. We hope our effort will provide researchers with a useful tool for further mining and analysing the plasmid universe both experimentally and in silico.

Keywords
  • plasmid classification
  • relaxase
  • bacterial conjugation
  • type IV secretion system
  • coupling protein

Introduction

Plasmid conjugation is a leading mechanism for genetic exchange in bacteria and thus an important component of bacterial evolution. It involves the cleavage of the transferring DNA at a site called oriT by a protein termed relaxase. As a result of the reaction, the relaxase becomes covalently bound to the oriT DNA. The resulting nucleoprotein complex is transported to the recipient cell by a protein export mechanism known as the type IV secretion system (T4SS). The DNA is actively pumped into the recipient cell by the type IV coupling protein (T4CP) (Llosa et al., 2002; Christie, 2004). If a plasmid encodes the complete protein machinery for conjugal transfer, it is called self-transmissible or conjugative. Some plasmids contain a minimal gene set that allows them to be conjugally transmitted only in the presence of a helper conjugative plasmid. They are called mobilizable plasmids. They usually contain just an oriT, a relaxase gene and one or more genes for nicking-accessory proteins. Conjugative plasmids thus tend to be large (>30 kb) with low copy number, while mobilizable plasmids are small (<15 kb) and have high copy number. We can generalize and say that all transmissible plasmids contain a MOB region, required for mobilization, while self-transmissible plasmids contain, in addition, a T4SS that allows the assembly and functionality of the mating channel.

Relaxase proteins are large and usually contain two or more protein domains. The relaxase domain proper is always located at the N-terminus of the protein. At the C-terminus, a DNA helicase, DNA primase or other domain of unknown function is almost always found. In many cases (the exceptions will be noted in this work), relaxases contain a conspicuous signature (called the 3H motif) composed of a histidine triad that the protein uses to bind divalent cations. We already showed, using a sample of mobilizable plasmids, that relaxases are a convenient phylogenetic tool for plasmid classification (Francia et al., 2004). This tool has to be compared with classification via plasmid replication regions, carried out first by Southern hybridization (Couturier et al., 1988) and later by PCR amplification (Carattoli et al., 2005). Plasmid classification by replicon typing encounters problems because plasmids frequently carry multiple replicons, because replicons can be mosaic (Boyd et al., 1996; Osborn et al., 2000), etc., as will be discussed in the fourth section. In this work, we set out to complete the analysis of Francia et al. (2004) by extending our survey to conjugative plasmids. This allowed the classification of conjugation systems in MOB families. The resulting families were MOBF, MOBH, MOBQ, MOBC, MOBP and MOBV. The most relevant features of each relaxase family were reviewed from the literature.

Method used for relaxase analysis

Relaxases are usually multidomain proteins, in which the relaxase domain always occupies the N-terminal position. Thus, psi-blast (Altschul et al., 1997) searches (P=1e–4 unless otherwise stated) were carried out using the N-terminal 300 amino acids of prototype relaxases from each MOB family as was done previously by Francia et al. (2004). Additional iterative blast searches using the phylogenetically most distant family members were carried out in order to cover a maximum sequence space without corruption. Data from relaxases sequenced up to December 2007 were included. blast searches organized the relaxase universe in specific protein families. This resulted in well-defined family boundaries in all cases except in the MOBP family. In this case, when some deep branching proteins were used in iterative blast searches, they could retrieve additional members. However, it was considered too speculative to include data from such remotely related proteins for which no additional functional information was available. Thus, the MOBP family should be presently considered as unfinished. Parallel blast searches with T4CP sequences (generally adjacent to relaxases in genetic maps) were used as confirmation for the presence of a functional conjugative system. Only relaxases contained in plasmids or in demonstrated integrative and conjugative/mobilizable elements (ICEs/IMEs) with a T4CP gene in the genetic vicinity were included in the present phylogenetic and molecular analysis. The resulting tables of constituent plasmids and ICEs/IMEs are presented as supplementary information. Multiple alignments were carried out using clustalw (Thompson et al., 1994). The phylogenies were constructed with mega version 3.1 (Kumar et al., 2004). For each MOB family, we calculated a phylogenetic tree using neighbour-joining (NJ) analysis, tested with bootstrap values (1000 replicates). 
Topologies were confirmed by maximum likelihood (ML) analysis – not shown in the manuscript – using phyml (http://atgc.lirmm.fr/phyml/) (Guindon & Gascuel, 2003; Guindon et al., 2005).
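The first step described above (restricting each query to the N-terminal 300 amino acids, where the relaxase domain is expected) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in this work: the function names and the toy sequences are hypothetical, and the database search itself is not shown.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the identifier.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def n_terminal(records, length=300):
    """Keep the first `length` residues of every sequence (shorter ones are unchanged)."""
    return {name: seq[:length] for name, seq in records.items()}


# Toy input: placeholder sequences, NOT real relaxases.
example = ">toy_relaxase hypothetical\n" + "M" + "A" * 449 + "\n>toy_short\nMKT\n"
queries = n_terminal(read_fasta(example))

for name, seq in queries.items():
    print(name, len(seq))
```

The trimmed sequences could then be written back to a FASTA file and used as queries; the 300-residue cut-off is the one stated in the text, exposed here as a parameter so that other lengths can be tried.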

Classification of large conjugative plasmids

Francia et al. (2004) classified relaxases of small mobilizable plasmids in four main families or superfamilies. To attempt the classification of large conjugative plasmids, we conducted a database and literature analysis using the amino acid sequence similarity of their relaxases as a first criterion. In general terms, the most important difference with respect to the survey of mobilizable plasmids of Francia et al. (2004) is the addition of two new relaxase families that are present predominantly in large conjugative plasmids: the MOBF and the MOBH families. They will be analysed first (see The MOBF family and The MOBH family, respectively) because they are well-resolved families in phylogenetic terms. The MOBC family is quantitatively smaller but has grown significantly since the previous review (it was then called the CloDF13 family and contained only three plasmids). Because MOBC contains a clade of conjugative plasmids, it merits a specific discussion (see The MOBC family). Besides these three families, the remaining three families showed less well-defined contours and perhaps all three form part of a much larger protein class (hence, we will call it the MOBP cluster). First, the MOBP superfamily (see The MOBP cluster) is greatly enlarged; it now includes the MOBHEN family within it. It also contains several branches that are conspicuous but contain too few sequences to be properly analysed. The MOBQ family is also enlarged and partially overlaps MOBP (see The MOBQ family). Finally, the MOBV superfamily (see The MOBV cluster) has also grown, is very diverse and overlaps MOBP; in our opinion, it remains generally underanalysed because of a relative paucity of sequences. In summary, we will describe six relaxase families or superfamilies. Because some of the relaxase families have already been discussed at length in previous publications (Francia et al., 2004; Fernandez-Lopez et al., 2006), they will just be updated here, specifically regarding the contribution of large conjugative plasmids.

To our knowledge, all conjugative systems represented in the DNA sequence databases (>1000 in December 2007) contain relaxases that belong to one or another of these six families, with the exception of Tn916 (see Other relaxases: Tn916 relaxase). A schematic representation of relaxase diversity is shown in Fig. 1. A total of 616 plasmid relaxase sequences were analysed. The figure represents the different relaxase MOB families by circles whose sizes reflect their quantitative representation in databases. Circles overlap if there are members that belong to more than one group (because they appear in blast searches started from two different prototypes).

Fig. 1.

A scheme of the relationships between the main relaxase protein families. A first relaxase cluster (shown on a dark-grey background) contains relaxase groups that contain just one active Tyr in the catalytic centre (see text). A second relaxase cluster (light-grey) contains relaxases with two Tyr in the catalytic centre. For the remaining groups (white background) not enough is known about the biochemistry of the respective relaxases. According to the analysis of their amino acid sequences and of experimental work (see text) they seem to be nonhomologous, and thus may use different DNA-processing mechanisms. Some relaxase protein families also overlap other protein families, such as plasmid RC-replication proteins (Rep), IS91-like transposases (IS91) or HD hydrolases. Areas of circles are proportional to relaxase number. The MOBP area includes (MOBP+MOBHEN+MOBQ1).

It should be noted that in many examples shown in this work, terminal branches are compressed in the phylogenies. They represent several or even a large number of independent isolates, frequently containing plasmids with the same backbone but different cargo (i.e. antibiotic resistance genes). See, for instance, the F/R1 type of IncF plasmids (Boyd et al., 1996), the IncPβ plasmids (Heuer et al., 2004) or the IncW plasmids (Revilla et al., 2008). Thus, each terminal branch in our phylogenetic trees names a prototype that may represent several relaxases with even <95% amino acid sequence identity.

Main families of conjugative plasmids

The MOBF family

This is a well-analysed family, which contains 114 members (Supporting Information, Table S1). For two of its relaxases, there is detailed structural and biochemical knowledge: TraI_F (Larkin et al., 2005; Hekman et al., 2008) and TrwC_R388 (Guasch et al., 2003; Gonzalez-Perez et al., 2007). The relaxases of the MOBF family are large proteins that consist of two domains: an N-terminal relaxase domain in which there are two catalytically active tyrosines and a C-terminal helicase domain. The helicase domain is characteristic of MOBF relaxases and probably constitutes an important adaptation cue of these conjugative systems. The phylogenetic origin of the helicase domain in MOBF relaxases has already been discussed (Fernandez-Lopez et al., 2006). Besides these two domains, TraI_F was reported to contain a third domain in its C-terminus, implicated in conjugative transfer (Matson & Ragonese, 2005). Similarly, TrwC contains a third domain between the relaxase and helicase domains (Cesar et al., 2006). MOBF relaxases contain two catalytic tyrosines in their catalytic centre (Grandoso et al., 2000). This is another important peculiarity of this relaxase family and can have consequences for the mechanism of transfer (Gonzalez-Perez et al., 2007). Also, MOBF oriTs are comparatively large and complex, presumably reflecting highly regulated DNA sequences. The diversity and long-term evolution trends of the MOBF family were already discussed by Fernandez-Lopez et al. (2006) and the reader is referred to that paper for an exhaustive analysis of the genetic assembly and evolution of the MOBF genetic determinant.

Figure 2 updates the phylogenetic tree of the family. The psi-blast search was carried out using the threshold value P=1e–8 and converged in the fourth iteration. A less stringent threshold, such as the standard P=1e–4, also retrieved transposases of the IS91 family (Garcillan-Barcia & de la Cruz, 2002). This fact is interesting in itself and suggests that MOBF relaxases and rolling-circle transposases are phylogenetically related, albeit remotely. As observed before, the crystal structure of the MOBF relaxase TrwC_R388 clearly shows that it is a linear permutation of the structure of rolling-circle (RC) replication initiator proteins, to which IS91 belongs (Garcillan-Barcia et al., 2002; Guasch et al., 2003).

Fig. 2.

Phylogenetic tree of MOBF relaxases. The dendrogram was constructed using a NJ algorithm. Bootstrap values for 1000 replicates are indicated. The tree was rooted with the homologues most distantly related to TrwC_R388 (pA387, pChr15 and pNAC3) that still belong to MOBF. The distribution of plasmid origins according to the bacterial phyla is shown by vertical bars. Some specific clades that are discussed in the text are emphasized in colour with different tones of red and labelled F1, F2, etc. Plasmids containing a T4CP gene in the vicinity of the relaxase gene (and thus presumed to be conjugative) are underlined. In the clades where the relaxase-T4CP synteny was conserved, this is shown in the figure by thick horizontal arrows where the T4CP is coloured dark grey, the relaxase gene is coloured red and any intervening gene(s) are shown in light grey or as a thin black line. Plasmids whose relaxases have been analysed biochemically are boxed. Plasmids for which conjugation has been experimentally demonstrated are labelled with an asterisk. Rhosp, Rhodobacter sphaeroides; Burce, Burkholderia cenocepacia.

As can be observed in Fig. 2, there are several relatively well-resolved clades in the MOBF family. Clade MOBF1 is, for the time being, the most numerous, and includes plasmids from Proteobacteria and Cyanobacteria. Other clades include relaxases of the phyla Proteobacteria and Actinobacteria. They are not so well resolved at present, probably because an insufficient number of members have been analysed. The MOBF1 clade is of the utmost importance in Gammaproteobacteria and includes plasmids of the classical IncF, IncN, IncP9 and IncW incompatibility groups. Three differentiated subclades within MOBF1 (also supported by ML analysis) are MOBF11, MOBF12 and MOBF13, which are coloured in Fig. 2. The MOBF11 subclade contains two defined branches, represented by the prototype plasmids R388 (IncW) and R46 (IncN) for the enterobacterial branch, and by pWWO (IncP9) for the Pseudomonas branch. IncN plasmids are involved in the dissemination of multidrug resistance in Enterobacteria (Shen et al., 2008). IncP9 plasmids are involved in the dissemination of catabolic pathways for xenobiotics (Greated et al., 2002). In all MOBF11 plasmids, relaxase and T4CP genes are adjacent in the DNA sequence. Subclade MOBF12 comprises the phylogenetically broad IncF complex. It is as broad, in phylogenetic terms, as the MOBF11 clade. It should be noted, however, that the IncF complex was defined not by incompatibility, but by sensitivity to F-specific phages. In fact, the IncF complex includes at least seven incompatibility groups (de la Cruz et al., 1979). IncF plasmids are prevalent in Enterobacteria, and are frequently responsible for the dissemination of antibiotic resistance determinants of medical importance (see e.g. Womble & Rownd, 1988; Coque et al., 2008) and virulence factors (Rotger & Casadesus, 1999; Herrero et al., 2008). Along with the MOBF11 and MOBF12 clades, a possible subclade MOBF13 includes a series of plasmids from Cyanobacteria. When more sequences become available, this subclade will probably further subdivide; thus, its status is tentative. In any case, it seems clear that plasmid relaxases group in phylum-specific subclades, a result that will appear consistently in this review.

Besides clade MOBF1, the only other clade that is emphasized in the figure is MOBF2, which is well supported by NJ and ML analyses. It groups a set of plasmids of the phylum Actinobacteria. This is the only group within MOBF, together with the Streptomyces linear plasmids represented by pSV2, which do not contain an associated T4CP (and are thus presumed to be mobilizable, not conjugative). Besides the most conspicuous subclades that are numbered in Fig. 2, there are other scattered members in poorly resolved branches. Because of the low overall amino acid sequence conservation, resolution of the deep branches is not reliable. In general, this will be the case in all trees presented in this work. Thus it is not possible to present a complete or final plasmid classification, which will need many more sequences to be analysed.

The conserved signatures in the best-analysed clades of the MOBF family are shown in Fig. 3. Sequence conservation is extensive in the three relaxase motifs previously shown to configure the protein catalytic centre (Guasch et al., 2003; Francia et al., 2004). In TrwC, Y18 and Y26 are the catalytic nucleophiles, which attack the scissile DNA phosphodiester bonds. D85 abstracts a proton from Y18 (and presumably also from Y26) and thus helps to configure Y18 as a potent nucleophile. The histidine triad (H150, H161 and H163) is involved in coordination of a Mg2+ ion, essential in the cleavage mechanism. Considering the 114 relaxase proteins in the complete family, motif III is the most conserved. H161 and H163 are conserved in all proteins (the only two invariant amino acids). H150 is present in all proteins except in TraI_pLPL, where it is substituted by a Q. Within motif II, D85 is also highly conserved, with only three exceptions: in two cases it is changed to an E, and in the third (plasmid pACRY06) the alignment is uncertain. Motif I is more difficult to scrutinize, because there is lower overall conservation. A nonexhaustive survey suggests that both tyrosines are invariant.
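Statements such as "H161 and H163 are conserved in all proteins" or "H150 is substituted by a Q in one protein" amount to inspecting individual columns of a multiple alignment. A minimal Python sketch of that check is given below; it is an illustration only, the three aligned sequences are invented toy data (not relaxase sequences), and column indices are 0-based positions in the alignment rather than residue numbers of any real protein.

```python
from collections import Counter

# Toy alignment (hypothetical sequences, NOT real relaxase data).
# Rows are aligned proteins of equal length; '-' would mark a gap.
alignment = {
    "toy_A": "MHAYHG",
    "toy_B": "MHSYHG",
    "toy_C": "MQAYHG",
}


def column(alignment, index):
    """Return the residues found at one alignment column (0-based index)."""
    return [seq[index] for seq in alignment.values()]


def invariant_columns(alignment):
    """List (index, residue) for columns where every sequence has the same residue."""
    length = len(next(iter(alignment.values())))
    result = []
    for i in range(length):
        residues = set(column(alignment, i))
        if len(residues) == 1 and "-" not in residues:
            result.append((i, residues.pop()))
    return result


def residue_counts(alignment, index):
    """Count residues at one column, to spot substitutions such as H -> Q."""
    return Counter(column(alignment, index))


print(invariant_columns(alignment))
print(residue_counts(alignment, 1))
```

In the toy data, column 1 carries H in two sequences and Q in the third, which mirrors the kind of single exception reported for H150.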

Fig. 4.

Comparison of MOBF relaxase and T4CP phylogenies. Branches of the main clades F11, F12 and F13 are coloured in different tones of red as in Fig. 2. Dotted lines indicate a swapped clade, whose branches are grey coloured. A grey circle is placed in the nodes of the T4CP tree that are not congruent with the relaxase phylogeny.

The phylogeny of the MOBF T4CPs is compared with that of the MOBF relaxases in Fig. 4. As can be seen, both phylogenies are highly congruent. The branches in the F11, F12 or F13 clades are almost completely conserved. Only the positions of the deepest clades vary with respect to one another. The same congruence is observed in ML topologies. This result indicates that relaxase and T4CP evolve together, a fact that was evident also in the analysis of other relaxase families (data not shown).
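One simple way to put a number on the congruence of two rooted trees is to compare the sets of clades (groups of leaves below each internal node) that they define. The Python sketch below illustrates the idea; it is not the comparison carried out in this work. The nested-tuple tree encoding, the function names and the toy topologies are assumptions made for the example, and the plasmid names are used only as leaf labels.

```python
def clades(tree):
    """Return the set of leaf sets (frozensets) defined by the internal nodes of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def shared_clade_fraction(tree_a, tree_b):
    """Fraction of clades common to both trees (1.0 means identical clade sets)."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b) / len(a | b)


# Toy topologies (hypothetical, for illustration only).
relaxase_tree = ((("R388", "R46"), "pWWO"), ("F", "R1"))
t4cp_tree_same = ((("R388", "R46"), "pWWO"), ("F", "R1"))
t4cp_tree_swapped = (("R388", ("R46", "pWWO")), ("F", "R1"))

print(shared_clade_fraction(relaxase_tree, t4cp_tree_same))
print(shared_clade_fraction(relaxase_tree, t4cp_tree_swapped))
```

A single swapped branch lowers the score because one clade is lost from each tree while the remaining clades are still shared. Established tree-distance measures would be preferable for real data; this sketch only conveys what "congruent" means operationally.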

Fig. 3.

Conserved sequence motifs in representative relaxase protein clades within the MOBF family. The clades to which the sequences belong are shown at the left margin of the figure. The clustalw alignment considered only the 300 N-terminal residues of each protein, containing the relaxase domain. Vertical arrowheads point to residues known to be important for function in TrwC_R388 (Guasch et al., 2003). Colour code: red on yellow, invariant amino acids; blue on blue, strongly conserved; black on green, similar; green on white, weakly similar; black on white, not conserved.

The MOBH family

MOBH represents an entirely new family, not represented at all among mobilizable plasmids and thus not analysed in our previous review (Francia et al., 2004).

The MOBH phylogeny tree

The prototype MOBH relaxase is TraI from the IncHI1 group plasmid R27 (TraI_R27). It was used as a query in psi-blast, resulting in 25 plasmid hits (Table S2) as well as a large number of hits on ICEs and genomic islands (GIs). The search converged in the third iteration. Thus, the MOBH family is a new and well-resolved relaxase family. It bears no relationship with any other relaxase family. Even today, it is only composed of relaxases from large conjugative plasmids of the classical incompatibility groups (IncH, IncJ, IncT, IncP7 and IncA/C) and genomic elements derived from them. Each of these Inc groups represents a well-resolved clade within the MOBH tree (Fig. 5). Thus, the phylogenetic relationship among these groups is made clear. Another important hallmark of the MOBH family is that some clades are composed entirely of ICEs and GIs. Entering the field of chromosomally integrated GIs would add insurmountable complexity to this review; hence, the analysis of conjugative integrated elements has been confined to a minimum. Because it is likely that a number of GIs contain nonfunctional relaxases, their inclusion may corrupt the phylogenies of plasmid relaxases, which are logically assumed to be functional. Nevertheless, a related phylogenetic tree emphasizing the ICEs and GIs in MOBH has been reported (Salgado-Pabon et al., 2007).

Fig. 5.

Phylogenetic tree of MOBH relaxases. The tree is rooted with the MOBH3 relaxases. Codes for underlines, asterisks, boxes, gene synteny and vertical bars are the same as in Fig. 2. Clade MOBH3 is in parentheses because its status is uncertain. Each clade is coloured in a different tone of yellow. Shesp, Shewanella sp.; Neigo, Neisseria gonorrhoeae; Ralme, Ralstonia metallidurans; Polsp, Polaromonas sp.; Rhofe, Rhodoferax ferrireducens.

Considering mainly plasmid relaxases and active ICEs, the MOBH group is resolved into two well-supported clades (MOBH1 and MOBH2) bringing together plasmids of Gammaproteobacteria, and a less resolved and more distant clade MOBH3 composed of plasmids of Betaproteobacteria. Representatives of clade MOBH1 show considerable genetic divergence. In the NJ tree, we can distinguish at least two MOBH1 subclades that are not well split in the ML tree (data not shown).

Subclade MOBH11 brings together relaxases belonging to IncHI conjugative plasmids (R27/pHCM1 and R478), peculiar in that conjugation occurs at 30 °C, but not at 37 °C. A second subclade (MOBH12) is much broader in phylogenetic terms and is divided into several sub-subclades. One sub-subclade joins together IncA/C plasmids with a group of ICEs [R391 from Proteus rettgeri (Coetzee et al., 1972) and SXT from Vibrio cholerae (Waldor et al., 1996)]. These two elements combine a set of conjugative transfer genes with a phage-based integration system. Comparison of the R391 and SXT DNA sequences showed conservation of a backbone region, with >95% identity over 65 kb (Beaber et al., 2002). Both R391 and SXT are capable of excision and subsequent conjugative transfer to recipient cells. R391 was initially classified as an IncJ plasmid, but the absence of any plasmid replicon (Boltner et al., 2002) confirmed that it is not a plasmid. ICEs differ from archetypal conjugative transposons, such as Tn916, in the absence of random chromosomal integration. In fact, R391 and SXT each integrate into a single site in the Escherichia coli chromosome. In this respect, they resemble GIs, which also integrate into just one or a few chromosomal sites (typically tRNA genes). Although ICEs are chromosomally located, they have been included in our analysis because they excise to form circular (nonreplicating) plasmids as part of their normal life cycle. The relaxase genes of R391 and SXT were identified, but not analysed experimentally.

Another MOBH12 sub-subclade includes plasmids Rts1 and pCAR1. Rts1 (Murata et al., 2002), a 217-kb conjugative plasmid originally isolated from Proteus vulgaris, is the prototype IncT plasmid. Rts1 conjugation is also thermosensitive, being most efficient at 25 °C. Conjugation-related T4SS gene products show similarities to those of the F and R27 conjugation systems. Homologues of all F genes for pilus assembly except trbI were identified in Rts1. Besides TraI (Orf201), which belongs to MOBH12, there is another protein called Orf220 (accession no. NP_640180), which was annotated as a ‘nickase’, related to relaxase TaxC_R6K. However, detailed inspection of Orf220 showed that it shares only 7% identity with the C-terminus of TaxC. Plasmid pCAR1 (Maeda et al., 2003) is a 199-kb plasmid isolated from Pseudomonas resinovorans. It is the prototype IncP7 plasmid. pCAR1 contains transposon Tn4676 (Shintani et al., 2003), which encodes a gene product annotated as ‘putative DNA nickase’ (Orf145, accession no. NP_758688). Orf145 is homologous to TraO_pIPO2T, which is actually a DNA primase. These are two examples (among many) of poorly curated annotations that complicate database searches to a great extent.

Besides two relaxases, TraI_pKLC102 and TraI_pMAQU02, belonging to plasmids of Pseudomonas aeruginosa and Marinobacter aquaeolei, respectively, clade MOBH2 is constituted of chromosomally located relaxases. They form part of ICEs, some closely related to plasmid pKLC102, as shown in the phylogeny of Fig. 5. It is assumed that these ICEs directly originated from ancestral elements closely related to pKLC102 (Klockgether et al., 2004). Interestingly, in P. aeruginosa C strains, plasmid pKLC102 was found both as a plasmid and as a chromosomally integrated element. This situation could depict an ongoing evolution from a plasmid to an irreversibly fixed GI, because certain C isolates contained only the chromosomally integrated version of pKLC102 while another subgroup exhibited a diversity of transposition events affecting the integrity of pKLC102 and contributing to its fixation (Klockgether et al., 2004). T4CP genes of MOBH2 plasmids and ICEs are located far from the relaxase gene (Fig. 5).

Also included in clade MOBH2 is the prototype relaxase TraI of a GI in the Neisseria gonorrhoeae chromosome, which will be discussed below. Similar GIs have been detected in several organisms harbouring HD-hydrolases homologous to TraI_R27, such as Xylella fastidiosa, Xanthomonas campestris, Burkholderia fungorum and Ralstonia metallidurans (van der Meer & Sentchilo, 2003). These and other GIs and their relaxases were arranged phylogenetically by Salgado-Pabon et al. (2007) and will not be described here.

There might be a third clade, MOBH3, that includes the relaxases of conjugative plasmids pHG1 (Ralstonia eutropha) and pMOL28 (R. metallidurans). These plasmids, as well as plasmid 2 (Polaromonas sp.), plasmid 2 (R. metallidurans), plasmid 1 (Rhodoferax ferrireducens), plasmid 1 (Polaromonas sp.) [not included in the same group by ML analysis], pBVIE02 (Burkholderia vietnamiensis) and pPNAP01 (Polaromonas naphthalenivorans) (all from Betaproteobacteria), exhibit neither the histidine triad nor the HD-hydrolase motif. Nevertheless, they all contain a T4CP gene adjacent to the relaxase gene and always appeared in blast searches starting from plasmids of other MOBH clades. When a blast search was started from the relaxase of pHG1, for instance, it also converged upon retrieving all MOBH plasmids. Thus, these plasmids are bona fide MOBH plasmids and we need to consider what this result implies for the structure and mechanism of MOBH relaxases.

The phylogeny of the MOBH T4CPs is shown in Fig. S1. As can be seen when comparing Fig. 5 and Fig. S1, MOBH relaxase and the T4CP phylogenies are congruent, with the exception of MOBH3, which was not well resolved in the relaxase tree either. The branches in the MOBH11, MOBH12 or MOBH2 clades are completely conserved. This result indicates, once again, that relaxase and T4CP evolve together.

MOBH relaxase signatures

As explained above, MOBH relaxases are clearly different from the other relaxase families. A clustalw alignment of MOBH relaxases that includes clade MOBH3 shows no invariant amino acid. Nevertheless, MOBH1 and MOBH2 clades share two sequence motifs, as shown in Fig. 6: a variant of the 3H motif (that we call alternative 3H motif) and a new, family-specific, HD hydrolase motif. The alternative 3H motif is a strongly conserved sequence of signature (HQ)-x2-PASE-x-HHH-x3-GG-x3-H-x-L, although the sequence environment is very different from that of the 3H motif (compare with MOBF relaxases in Fig. 3). The HD hydrolase motif contains the signature (LV)-x-HD-(AVLI)-GK. The common activity of proteins showing the HD motif is that of a divalent cation-dependent phosphohydrolase (Aravind & Koonin, 1998). Thus, it is possible that the 3H and the HD motifs are alternative forms of a metal-binding cluster. However, until experimental evidence becomes available, the functional status of these motifs remains uncertain. Finally, it was not possible for us to find a motif equivalent to motif I of MOBF or MOBP plasmids. Y63 is an invariant Tyr in the MOBH1 clade, but this residue is not conserved in clade MOBH2. Curiously, a different Tyr (Y47) was potentially assigned as the catalytic Tyr by Sherburne et al. (2000). Nor does Y63 coincide with the proposed catalytic Tyr of the chromosomal relaxase of N. gonorrhoeae (Y93). Although mutants in Y93 are nonfunctional, this residue is not conserved in MOBH1 or MOBH2 clades (several relaxases contain Phe in this position; see Fig. 6). The MOBH3 clade is intractable for signature search, because there is not a single invariant residue. At the present time, the structure and mode of action of MOBH relaxases are a mystery that deserves biochemical analysis.
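The two signatures quoted above can be turned into regular expressions to scan protein sequences for candidate motifs. The Python sketch below does so under explicit assumptions: a parenthesized group such as (HQ) is read as "any one of these residues", x as any residue and xN as N arbitrary residues (a PROSITE-like reading that the text does not spell out). The function names and the test sequences are hypothetical; the sequences are synthetic strings built to match, not real relaxases.

```python
import re


def signature_to_regex(signature):
    """Convert a dash-separated signature into a compiled regular expression.

    Assumed conventions: '(HQ)' = one residue from the set, 'x' = any residue,
    'x3' = three arbitrary residues, anything else = literal residues.
    """
    parts = []
    for token in signature.split("-"):
        if token.startswith("(") and token.endswith(")"):
            parts.append("[" + token[1:-1] + "]")
        elif token == "x":
            parts.append("[A-Z]")
        elif token[0] == "x" and token[1:].isdigit():
            parts.append("[A-Z]{" + token[1:] + "}")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


ALT_3H = signature_to_regex("(HQ)-x2-PASE-x-HHH-x3-GG-x3-H-x-L")
HD_HYDROLASE = signature_to_regex("(LV)-x-HD-(AVLI)-GK")


def find_motif(sequence, pattern):
    """Return the 0-based start of the first match, or None if the motif is absent."""
    match = pattern.search(sequence)
    return match.start() if match else None


# Synthetic sequences (NOT real proteins), built only to exercise the patterns.
toy_3h = "MK" + "HAAPASEAHHHAAAGGAAAHAL" + "KK"
toy_hd = "MM" + "LAHDVGK" + "MM"

print(find_motif(toy_3h, ALT_3H))
print(find_motif(toy_hd, HD_HYDROLASE))
```

A hit from such a scan only flags a candidate; as noted above, the functional status of these motifs remains uncertain until experimental evidence becomes available.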

Fig. 6.

Conserved sequence motifs in the MOBH family. Codes are the same as in Fig. 3. The conserved motifs in MOBH relaxases are different from those in the classical relaxases (like MOBF). The first motif is histidine-rich; hence, it is named alternative 3H motif to point out that it has no homology to the classical 3H motif. The second motif is shared by HD-hydrolases and it is named accordingly. See MOBH relaxase signatures for more details. The white vertical arrowhead points to Y93, an important residue of TraI_Neigo, which is the only MOBH protein that has been analysed genetically.

MOBH prototypes

The available experimental information on MOBH relaxases is scarce. In fact, not a single plasmid conjugative relaxase has been characterized at the biochemical level. The IncHI1 plasmid R27 has been analysed in most detail (Sherburne et al., 2000). This 180-kb-long plasmid is temperature sensitive for conjugation, with optimal mating temperature at 22–28 °C due to transcriptional regulation of its tra genes (Alonso et al., 2005). Most genes coding for its T4SS are encoded in one plasmid region called Tra2 (Sherburne et al., 2000), whereas the oriT, all relaxosomal components and some additional T4SS genes are encoded in Tra1 (Lawley et al., 2002, 2003). While Tra1 exhibits organizational similarities to both IncP and IncW transfer regions, Tra2 resembles the transfer region of plasmid F. The R27 MOB region was examined by mutagenesis in Lawley et al. (2002). The genetic organization is traR oriT traH (nicking accessory protein)≫traI (relaxase)≫traG (T4CP)≫traJ (nicking accessory protein). The functional oriT is no more than 285 bp long, but no nic site was identified. Unfortunately, there are no reports of the biochemical analysis of the predicted relaxase (1011 amino acids). The existence of two potential nicking-accessory proteins also relates the MOBH system more closely to F. Another similarity is the fact that plasmids containing MOBH and MOBF12 relaxases transfer with equal frequencies in liquid and on solid surfaces (Bradley et al., 1980).

The most striking system in the family is the DNA secretion system of N. gonorrhoeae, included in a 57-kb GI called gonococcal genetic island (GGI) (Hamilton et al., 2005). The GGI island contains a T4SS related to IncF and IncH plasmids that is still mobile. It can be excised and inserted experimentally. The GGI element is responsible for chromosomal DNA secretion to the exterior of the cell. The secreted DNA is then a substrate for the Neisseria DNA uptake system. Thus, Neisseria seems to have developed a mechanism for DNA release/uptake that allows gonococcus strains to share DNA seamlessly resulting in an effective panmictic clone (Hamilton & Dillard, 2006). Genetic integrity and expression of the T4SS components is required for DNA secretion. The GGI relaxase (TraI_Neigo) and the DNA processing reactions have been studied in detail (Salgado-Pabon et al., 2007). DNA is secreted by the T4SS as ssDNA, as judged by its sensitivity to ss- but not ds-specific DNAses. In addition, the DNA is protected in its 5′-end, possibly by the secreted relaxase. The relaxase itself has some interesting and unique properties. Mutation of TraI identified Tyr93 as required for DNA secretion, making it the likely tyrosine that performs the nucleolytic attack on the DNA, by analogy to other relaxases. Residue Tyr201 was proposed as a possible second nucleophile, although results seemed inconclusive. More puzzling was the fact that two residues in the His-triad, also pervasive in conjugative relaxases, could be mutated without affecting the secretion activity (even the double mutation H106A/H108A showed wild-type levels of DNA secretion). Although DNA secretion is a clearly different mechanism from conjugation per se, the lack of implication of the 3H constellation in the release mechanism is difficult to conciliate with the current knowledge of the conjugative DNA-processing reactions. 
The authors pointed out that the conserved HD domain in relaxases of the TraI_Neigo family is known to bind divalent cations (Aravind & Koonin, 1998) so that it could substitute for the 3H domain. Unfortunately, mutations H161S or D162N did not affect DNA release either, casting further doubts on this interpretation. Nevertheless, the D120N mutation clearly affected DNA secretion. This residue was proposed by the authors to be analogous to D85 in protein TrwC (see section The MOBF family). Finally, TraI_Neigo has yet another unique characteristic. Its N-terminal sequence resembles a signal sequence. Although it was demonstrated that TraI_Neigo is not secreted via the general protein secretion system (Sec-dependent), it is probably bound to the cell membrane by means of its N-terminal peptide. Thus, TraI_Neigo is a membrane-associated protein, contrary to most characterized conjugative relaxases. Lastly, the GGI element oriT is tentatively ascribed to an IR close to the relaxase gene (between genes yaf and ltgX), because an insertion at this point resulted in a non-DNA-secreting phenotype. In summary, the Neisseria DNA secretion system constitutes the best analysed of the MOBH systems. However, because it is not a truly conjugative system, caution has to be taken before extrapolating results obtained in this system to MOBH conjugative systems. Conversely, the betaproteobacterial plasmids in MOBH3, which are so poorly characterized in their transfer properties, might be using deviant conjugation mechanisms, perhaps similar to that suggested by the properties of the Neisseria DNA release system.

The MOBC family

The MOBC family is also a well-separated relaxase family. Two MOBC relaxases have been analysed experimentally: MobC_CloDF13 (Nunez & de la Cruz, 2001) and TraX_pAD1 (Francia & Clewell, 2002b). A psi-blast search using MobC_CloDF13 as a query and a threshold of P<10−4 converged at iteration 11, retrieving 54 proteins. Twenty-one were from unfinished bacterial genomes and, therefore, are not included in Table S3. The same hit list of 32 proteins (24 corresponding to plasmid relaxases) was obtained when TraX_pAD1 was used as a query. Besides ‘canonical’ MOBC relaxases, the hit list included four proteins encoded by small Gram-positive plasmids, which were annotated as RCR initiator proteins (probably because no ORF resembling a Rep protein was found in those plasmids). Nevertheless, the appearance of a downstream T4CP (mobB-like) gene in the same plasmids may indicate that they are in fact MOBC relaxases, as will be discussed later.

The phylogeny tree of the MOBC relaxases is shown in Fig. 7. Both conjugative and mobilizable plasmids are included, as well as demonstrated ICEs or related elements showing high homology and conserved gene synteny. The family contains elements isolated from Gammaproteobacteria, Firmicutes and Tenericutes. There are three well-resolved clades in the MOBC phylogeny, according to NJ analysis. Clade MOBC1, of which MobC_CloDF13 is the prototype, is composed of a series of plasmids and other genetic elements from Gammaproteobacteria. The case of MOBC1 is exceptional in that some of these plasmids contain T4CPs but are mobilizable, not conjugative. This is the only conspicuous exception to the rule in all plasmids we analysed. Clade MOBC2 has conjugative plasmid pAD1 as a prototype and includes relaxases from Gram-positive bacteria (phyla Firmicutes and Tenericutes). Subclade MOBC21 is composed of enterococcal relaxases while subclade MOBC22 includes a number of as yet nonanalysed putative relaxases from Spiroplasma. Clade MOBC3 – not well resolved by ML analysis – comprises the above-indicated putative RCR replication proteins, which are the most divergent in sequence. T4CP genes are usually found adjacent to and upstream from relaxase genes (Fig. 7). Exceptions included the MOBC1 plasmid pVS54, the MOBC2 plasmids pSci5, pSci6 and pSci1, in which no T4CP was found, and clade MOBC3 plasmids, which contained T4CP genes downstream from the relaxase gene. The oriTs, when known, were located upstream of the corresponding mob genes, except in the enterococcal members (MOBC21), where the oriT was located between oppositely transcribing mob genes.

Fig. 7

Phylogeny tree of MOBC relaxases. Important clades (C1–C3) and subclades (C21 and C22) are highlighted in brown colours. Codes for underlines, asterisks, boxes, gene synteny and vertical bars are the same as in Fig. 2. Fratu, Francisella tularensis; Vibsh, Vibrio shilonii; Citko, Citrobacter koseri; Erwca, Erwinia carotovora.

The phylogeny tree of MOBC T4CPs is shown in Fig. S2. Comparison of this figure with Fig. 7 shows that both trees are highly congruent, showing evident conservation of the basic clades/subclades MOBC1, MOBC21, MOBC22 and MOBC3. The only exception is plasmid p4028 whose T4CP clusters closer to the Spiroplasma homologues.

Genetic elements CloDF13 (Nunez & de la Cruz, 2001), pAD1 (Francia & Clewell, 2002b), pAM373 (De Boever et al., 2000), pTEF1 (Coburn et al., 2007), ICE EcoR31 (Schubert et al., 2004), ICE Kp1 (Lin et al., 2008) and p29930 (Strauch et al., 2003) code for demonstrated relaxases. Conjugative plasmid pAD1 and the pathogenicity island (PAI) of Enterococcus faecalis V583 code for transfer systems with about 95% DNA identity, raising the possibility of the enterococcal PAI being self-transmissible (not demonstrated experimentally). The Mob regions in plasmids pTEF1 and pAM373 are closely related and both were shown to be conjugative (De Boever et al., 2000; Coburn et al., 2007). T4CP and relaxase gene homologues were found in the last four enterococcal genetic elements on opposite sides of oriT following the same synteny. Yersinia cryptic plasmids pCRY and pYtb32953 encoded relaxases and T4CPs homologous to the corresponding genes in plasmids CloDF13 and p29930. Moreover, they encode a T4SS related to that of p29930, suggesting that they might be conjugative. p29930 was shown to be conjugative between Yersinia strains and between E. coli and several Yersinia strains (Strauch et al., 2003). Several MobC homologues are present in GIs of E. coli, Erwinia carotovora, Klebsiella pneumoniae, Citrobacter koseri and Vibrio shilonii. Self-transmission of the E. coli and Klebsiella GIs was confirmed experimentally following excision from the chromosome (Schubert et al., 2004; Lin et al., 2008) warranting their classification as ICEs. Erwinia GI HAI7 carries all components of relaxosome and T4SS, all highly related to those present in ICE EcoR31 (Evans et al., 2009). Gene synteny is maintained in the mobilization region of most proteobacterial genetic elements, with oriT located upstream the mob genes. Interestingly, these elements are widespread among Enterobacteriaceae members, including a number of Yersinia enterocolitica and Yersinia pseudotuberculosis clinical isolates. 
Thus, their contribution to the transmission of virulence factors has been suggested (Schubert et al., 2004; Lin et al., 2008). A growing number of E. faecalis and Enterococcus faecium clinical isolates contain antibiotic resistance and virulence plasmids with relaxases similar to TraX_pAD1, underscoring the importance of MOBC relaxases in the dissemination of antibiotic resistance and virulence determinants (Clewell et al., 2007; M.V. Francia et al., unpublished data).

Inspection of the alignment of MOBC1 and MOBC2 relaxases reveals a conserved signature D-x6–17-E-x-E-(RL)-x2-K-x3-R-(YF) in a region with strong conservation in adjacent residues (Fig. 8). This is the only invariant signature in the family, further indicating that MOBC relaxases do not contain equivalents to motifs I and III of the other relaxase families. The signature does not degrade significantly when MOBC3 plasmids are included; thus, it could be considered a diagnostic signature for the whole MOBC family. As a consequence, MOBC relaxases appear to be unrelated to the relaxases of the other five families.
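
The signature above can be read as a pattern over one-letter amino acid codes. As a minimal sketch (not the procedure used to build Fig. 8), it can be written as a Python regular expression, assuming that 'x' stands for any residue, that letters in parentheses are alternatives and that x6–17 means a spacer of 6 to 17 residues. The function name and the test sequence are hypothetical; the sequence is invented for illustration and is not a real relaxase.

```python
import re

# Assumed translation of D-x6–17-E-x-E-(RL)-x2-K-x3-R-(YF):
# 'x' = any residue, (RL) = R or L, x6–17 = 6 to 17 residues.
MOBC_SIGNATURE = re.compile(r"D.{6,17}E.E[RL].{2}K.{3}R[YF]")


def find_mobc_signature(protein):
    """Return (1-based start, matched segment) for the first hit, or None."""
    match = MOBC_SIGNATURE.search(protein.upper())
    if match is None:
        return None
    return match.start() + 1, match.group()


# Hypothetical sequence, made up for illustration only.
toy_sequence = "MKTAD" + "GSGSGSGS" + "EAERTTKAAARY" + "LLV"
print(find_mobc_signature(toy_sequence))
```

A pattern of this kind only flags candidates; whether a hit is a genuine MOBC relaxase would still depend on the alignment and on experimental evidence.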

Fig. 8

Conserved sequence motifs in the MOBC family. Codes are the same as in Fig. 3. pAD1 relaxase essential amino acids (see section The MOBC family) are represented by black arrowheads.

To provide further insight into the mechanism of action of MOBC relaxases, MobC_CloDF13 and TraX_pAD1 were analysed in our laboratories. Site-directed mutagenesis of conserved TraX_pAD1 amino acid residues D152, E170, E172, K176, R180 and Y181 revealed that they were essential for transfer activity (M.V. Francia et al., unpublished data). As a result, Y181_pAD1 (see Fig. 8) is proposed to provide the nucleophile for the attack of the scissile phosphodiester bond. It should be noted that Spiroplasma relaxases, containing Phe instead of Tyr at the equivalent position, have not been assayed experimentally. The triad composed of residues D152, E170 and E172 in TraX_pAD1 might fulfil a Mg2+-coordinating function similar to the His triad (or the HD motif) in other relaxase families.

Another unique characteristic of MOBC relaxases, demonstrated with MobC_CloDF13, is that the relaxase does not remain covalently bound to the nicked CloDF13 DNA upon cleavage, providing the only reported case of a free (non-protein-bound) nicked DNA molecule as an intermediate in a conjugation or mobilization system (Nunez & de la Cruz, 2001). Moreover, MobC could not relax CloDF13 DNA in vivo in the absence of MobB, suggesting a dual participation of MobB in CloDF13 mobilization. Besides its function as a T4CP, MobB could also be regarded as a nicking-accessory protein for MOBC activity. It is tempting to speculate on a similar helper function for mobB-like products in the case of the four putative RCR proteins. In this respect, the mobB homologue (orf2) of the streptococcal plasmid pUA140 was shown to be essential for stable plasmid maintenance along with orf1 (mobC-like gene) (Zou et al., 2001).

The oriT sequences of MOBC plasmids and ICEs show a large inverted repeat adjacent to a series of short direct repeats. Although some plasmids show conservation of the stem-loop structure, where the nic site was located for the prototype plasmids CloDF13 and pAD1, specificity in the nicking reaction was proposed to be determined by the interaction of the respective relaxases with the direct repeats (Francia & Clewell, 2002b).

The MOBQ family

A psi-blast search (threshold e<10−4), which queried the microbial sequence databases with the sequence of MobA_RSF1010 relaxase, the known MOBQ prototype (Francia et al., 2004), converged in the fifth iteration. One hundred and fifteen hits were retrieved, originating from mobilizable as well as from conjugative plasmids (Table S4). From the first iteration, the family included a number of relaxases that also appeared in the blast search querying TraI_RP4 (the MOBP prototype), indicating that MOBQ and MOBP overlap. A phylogenetic tree was constructed from the alignment of all MOBQ sequences, and is shown in Fig. 9. Prototype relaxases of the MOBQ family are TraA_pTi (the conjugative relaxase of pTi plasmids of Agrobacterium, which has to be distinguished from VirD2 relaxase, responsible for T-DNA transfer to plant cells) and TraA_p42a (same from Rhizobium plasmids) besides MobA_RSF1010 and TraA_pIP501.

Fig. 9

Phylogeny tree of MOBQ relaxases. Codes for underlines, asterisks, boxes, gene synteny and vertical bars are the same as in Fig. 2. Nitmu, Nitrosospira multiformis; Nitha, Nitrobacter hamburguensis; Messp1, Mesorhizobium sp. BCN1 – accession number YP_665951; Messp2, Mesorhizobium sp. BCN1 – accession number YP_665867; Sphal, Sphingopyxis alaskensis; Jansp, Jannaschia sp.; Lacca, Lactobacillus casei; Lacla, Lactococcus lactis [pDOJH10L (1), accession number NP_694600; pDOJH10L (2), accession number NP_694596].

Several clades are well resolved in Fig. 9, using both NJ and ML approaches. The first clade, MOBQ1, brings together mobilizable plasmids related to the IncQ plasmid RSF1010. The prototype is MobA_RSF1010. The members of this clade are related to MOBP, because they appeared in a blast search using TraI_RP4 as a query. The intersection with MOBP only applied to the MOBQ1 clade. The rest of the MOBQ relaxases have evolved further, so that similarity with MOBP was no longer detected by psi-blast when using the standard threshold. MOBQ1 relaxases are highly divergent; hence, some internal nodes of this subclade change in the ML topology.

MOBQ2 is composed of conjugative plasmids of Rhizobium and Agrobacterium, among others in the order Rhizobiales. The prototype for this clade is relaxase TraA_p42a (Perez-Mendoza et al., 2006). MOBQ2 relaxases contain a DNA-helicase domain in their C-terminus, as do MOBF relaxases. It has to be pointed out that pTi plasmids contain two relaxases. One, called VirD2, is responsible for T-DNA transfer to plant cells and belongs to the MOBP type. The second, TraA, is responsible for plasmid transfer between Agrobacterium cells, and is the one represented in this group. Another perhaps surprising characteristic of these relaxases is that they seem to act preferentially in cis (Perez-Mendoza et al., 2006; Cho & Winans, 2007).

Other clades are also apparent. We will only mention clade MOBQ3, which contains plasmids of phylum Firmicutes, the prototype of which is TraA_pIP501 (Inc18). It is the relaxase itself, when binding to the oriT, that represses expression of the tra operon (Kurenbach et al., 2006). Although pIP501 is found mainly in Gram-positive bacteria, its T4SS is very similar to those of the Proteobacteria and apparently contains the same components that assemble a channel traversing both Gram-negative bacterial membranes (Abajy et al., 2007).

The phylogeny tree of MOBQ T4CPs is shown in Fig. S3. Comparison of this figure with Fig. 9 shows that both trees are highly congruent, showing complete conservation of all branches in the basic clades MOBQ2 and MOBQ3 (MOBQ1 is composed exclusively of mobilizable plasmids, which do not contain T4CPs).

The MOBQ family was already analysed in detail in our previous review (Francia et al., 2004). Since then, the resolution of the crystal structure of an RSF1010 relaxase fragment (minMobA), in fact the identical protein from the closely related plasmid R1162 (Monzingo et al., 2007), has represented an important advance in our understanding of plasmid relaxases. This is the first relaxase structure that does not belong to MOBF. The sequences of minMobA_R1162 and TrwC_R388 are <15% identical. Nevertheless, their three-dimensional structures are very similar. This is an important finding because it links the MOBF family to the MOBP cluster. In fact, when looking at the structures, it becomes clear that they contain a homologous catalytic centre, with the His-triad coordinating a divalent metal ion and the catalytic Tyr (Y25 in minMobA_R1162) in its proximity. Besides, MOBF and MOBQ sequences share the three classical relaxase motifs, as shown in Fig. 10. There is no dispute about motif I (containing the catalytic Tyr) and motif III (the His-triad), but there is an issue concerning motif II. In the TrwC_R388 structure, residue D85 (which is essential for relaxase activity) is located near the catalytic Y18, possibly abstracting a proton from the hydroxyl group that attacks the phosphodiester bond. In the MobA_R1162 structure, E74 occupies the same position as D85 in TrwC_R388. However, Monzingo et al. (2007) have shown that mutation of E74 does not affect mobilization of R1162; hence, they believe that this residue is not important. In apparent accordance with these data, E74 is not an invariant residue in Fig. 10. It can be substituted by aspartic acid (a conservative change), but also by serine (in the case of plasmid pAV2). A much more conserved acidic residue in motif II is E82. Although it lies far from the catalytic centre, E82 is in a flexible loop in the resolved protein fragment and we do not know what its position will be in the complete protein.
Unfortunately, its involvement in MobA activity has not yet been tested by mutagenesis. Monzingo et al. (2007) propose that the function of TrwC residue D85 is possibly not required in MOBQ relaxases, because nic-cleavage is not the limiting step in plasmid R1162 conjugation. If they are right, this can be an important mechanistic difference between MOBQ and MOBF relaxases.

Fig. 10

Conserved sequence motifs in the MOBQ family. Codes are the same as in Fig. 3. Vertical black arrowheads point to residues configuring the catalytic centre. The vertical white arrowhead points to E82, another potential key residue (see The MOBQ family).

The MOBP cluster

A psi-blast search using TraI_RP4 as a query and a score threshold P<10−4 converged after 14 iterations and retrieved 259 relaxase sequences (Tables S4–S6). MOBP represents therefore the largest relaxase family (see Fig. 1). Among the hits were relaxases belonging to clade MOBQ1, which were discussed in the previous section, plus all the previously described MOBHEN relaxases (Francia et al., 2004), of which MbeA_ColE1 is the prototype. Thus MOBP and MOBQ are neighbouring protein families, while MOBHEN is totally embedded in MOBP. This notion is represented in Fig. 1 by including MOBHEN within MOBP and intersecting it with MOBQ and MOBV. Thus, MOBP is a cluster of actively evolving relaxases, with several distinguishable clades. Figure 11 shows a phylogenetic tree of the MOBP cluster (collapsed at the branches) that emphasizes its large diversity and pinpoints the prototypical clades: MOBP1 (that groups the IncP complex, the IncI complex and the IncQ2/IncG/IncP6 group), MOBP2 (the VirD2 proteins from Agrobacterium plasmids), MOBP3 (the IncX complex), MOBP4 (the IncU plasmids), MOBP5 (that totally includes the MOBHEN family), MOBP6 (the IncI2 plasmids) and MOBQ1, among several other clades whose phylogeny cannot be well resolved yet.

Fig. 11

Phylogeny tree of MOBP relaxases. In this tree, which contains no names, the terminal branches are collapsed to emphasize the diversity of this relaxase family. The named individual clades are represented in detail in subsequent figures and are represented in different tones of blue.

The phylogeny of MOBP1 relaxases is shown in more detail in Fig. 12. Its four well-resolved clades are shown by different tones of blue colour in the figure. These subclades are also distinguishable by ML analysis. Clade MOBP1 hosts are distributed in all divisions within the Proteobacteria. Subclades of relaxases extend also all over the Proteobacteria; thus, in this case assigning hosts in the phylogeny of Fig. 12 will not be informative. This promiscuity is probably an important property of MOBP1 plasmids and is certainly a characteristic of its most famous representative, plasmid RP4.

Fig. 12

Phylogeny tree of MOBP1 relaxases. Codes for underlines, asterisks, boxes, gene synteny and vertical bars are the same as in Fig. 2. Azosp, Azoarcus sp. EbN1; Niteu, Nitrosomonas eutropha; Desps, Desulfotalea psychrophila; Nitmu, Nitrosospira multiformis.

Clade MOBP11 is composed exclusively of conjugative plasmids and has RP4 (IncP1α) and R751 (IncP1β) as prototypes. In fact, the IncP cluster (Heuer et al., 2004; Haines et al., 2006b; Bahl et al., 2007) is broad, and includes probably all plasmids listed in Fig. 12 from RP4 down to ‘plasmid 59 kb’. Thus, the IncP group is a deep, well-populated clade that contains a large number of conjugative plasmids. We propose to call it MOBP111. Because this is probably the most thoroughly analysed plasmid incompatibility group, it serves to underline the spread we can expect of an Inc group when it is analysed exhaustively. Although many other Inc groups are only represented by a single member in this work, each Inc group might be expected to encompass a clade that can reach the extension of IncP or IncF groups.

Clade MOBP12 is smaller in depth and breadth than MOBP11 and is interesting because it represents the conjugative plasmids of the IncI plasmid complex, the prototype being the IncI1 plasmid R64. The IncI complex is formed by IncI1 (=IncIα), IncB, IncK and IncZ (see, for instance, Praszkier et al., 1991; Chilley & Wilkins, 1995). Most relaxases of plasmids belonging to the IncI complex are very similar to NikB_R64 and are grouped with it in the tree. The exceptions are plasmids R387 and pSERB1 from Shigella flexneri and enteroaggregative E. coli, respectively. Although plasmid pSERB1 was ascribed to the IncI complex (Dudley et al., 2006), its replication protein (RepA) shows 100% identity to that of the IncK plasmid R387 (our data), the only representative of incompatibility group IncK. Thus we can assume pSERB1 is IncK instead of IncI1. Accordingly, NikB_pSERB1 is only 75% identical to NikB_R64, while it is 93% identical to NikB_R387. Thus, in the MOBP12 phylogeny, the IncI1 and IncK exemplars (R64 and R387) seem to represent two valid phylogenetic lineages and may represent alternative evolutionary strategies. IncI and IncK plasmids are important contributors to the dissemination of extended-spectrum β-lactamases (Cloeckaert et al., 2007; Navarro et al., 2007), among other antibiotic resistance determinants.
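
Identity values such as those quoted above are derived from pairwise alignments. The following is a minimal sketch of one common way to compute them, assuming two sequences that have already been aligned to the same length and counting only columns where neither sequence has a gap. The review does not state which denominator was used for its figures, so this convention, the function name and the toy sequences are assumptions for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns (one of several possible conventions)
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, made up for illustration only.
print(percent_identity("MKT-LV", "MRTALV"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values when gaps are frequent.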

Clade MOBP13 contains, among other groups, the IncL/M plasmids. The prototype is the IncL/M plasmid pCTX-M3, which is involved, like many other IncL/M plasmids, in the dissemination of extended-spectrum β-lactamase genes (Villa et al., 2000; Golebiewski et al., 2007; Novais et al., 2007). The overall homology of MOBP13 relaxases is low and some bootstrap values are not high. Thus, it is not surprising that some branches change their position in the ML tree.

Finally, clade MOBP14 is composed of a series of mobilizable plasmids; among them, those belonging to the so-called IncQ2 and IncG/IncP6 incompatibility groups. The IncQ2 plasmids pTF-FC2 and pTC-F14 (Rawlings et al., 2005) are in fact compatible, although closely related in molecular terms. On the other hand, pTC-F14 and the IncQ1 prototype plasmid RSF1010 are incompatible although less evolutionarily related. This is just one among many examples of the traps that incompatibility analysis can lead to. Rms149, the prototype of the IncG/IncP6 group, replicates both in E. coli and P. aeruginosa, and thus has a double Inc attribution (Haines et al., 2005). Later, the same research group suggested that IncG should be merged with IncU due to homology among replicons and weak incompatibility between them (Haines et al., 2006a). However, IncU relaxases (like VirD2_pFBAOT6) belong to a different clade (MOBP4). In fact, MobA_Rms149 is about 70% identical to those of IncQ2 plasmids while it is <25% identical to VirD2_pFBAOT6. Besides, the Rms149 replication protein is only 71% identical to that of pFBAOT6. Thus, we suggest that IncG and IncU be preserved as separate groups, because they denote two differentiated molecular species.

The conserved motifs in MOBP1 are shown in Fig. 13. As can be seen, MOBP1 relaxases share the classical motifs of MOBF and MOBQ relaxases. The most conspicuous is the 3H motif, which shows the signature H-x-(DE)-T-(DE)-x2-H-x-H-x3-N-x3-P. The argument about the acidic residue in motif II, which was invoked in section The MOBQ family, continues here because MOBP1 relaxases also contain a conserved D/E (E80 in TraI_RP4, signalled by a white arrowhead in Fig. 13) that could play the same role as D85 in TrwC_R388. Site-directed mutagenesis of TraI_RP4 (Pansegrau et al., 1994) defined the three sequence motifs in MOBP relaxases and identified one essential residue in each motif (Y22, S74 and H116). E80 was not tested and, surprisingly, H109 and H118 were shown to be nonessential (Fig. 13).
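
Signatures written in this notation can be converted mechanically into search patterns. The sketch below is one possible converter, assuming the conventions used in this review: tokens separated by hyphens, 'x' for any residue, 'xN' for N residues, 'xN–M' (with an en dash) for a range and letters in parentheses for alternatives. The function name and the toy sequence are hypothetical, and the sequence is invented rather than taken from any relaxase.

```python
import re


def signature_to_regex(signature):
    """Convert a hyphen-separated signature into a regular expression string."""
    pieces = []
    for token in signature.split("-"):
        spacer = re.fullmatch(r"x(\d+)(?:–(\d+))?", token)
        alternatives = re.fullmatch(r"\(([A-Z]+)\)", token)
        if token == "x":
            pieces.append(".")  # exactly one residue of any kind
        elif spacer:
            low, high = spacer.group(1), spacer.group(2)
            pieces.append(f".{{{low},{high}}}" if high else f".{{{low}}}")
        elif alternatives:
            pieces.append(f"[{alternatives.group(1)}]")  # any one of the listed residues
        elif re.fullmatch(r"[A-Z]", token):
            pieces.append(token)  # a single invariant residue
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return "".join(pieces)


THREE_H = "H-x-(DE)-T-(DE)-x2-H-x-H-x3-N-x3-P"
pattern = signature_to_regex(THREE_H)
print(pattern)

# Hypothetical sequence, made up for illustration only.
toy_sequence = "AAHADTEGGHLHAAANAAAPKK"
hit = re.search(pattern, toy_sequence)
print(hit.start() + 1 if hit else None)
```

Because two of the three histidines were reported as nonessential in TraI_RP4, a strict pattern like this may miss functional variants; relaxing individual positions is a judgment call left to the user.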

Fig. 13

Conserved sequence motifs in the MOBP1 subclades. Vertical black arrowheads and black circles point to residues configuring the catalytic centre. Mutations in TraI_RP4 affecting those residues either affected conjugation (black arrowheads) or did not (black circles). The vertical white arrowhead points to E79 in TraI_RP4, another potential key residue (see The MOBP cluster). *The N-terminal sequences of these relaxases were wrongly annotated (by comparison with homologous relaxases). Visual inspection allowed us to realize that if the annotated CDSs were extended upstream (pRAS3: −57 codons; pTF-FC2: −41 codons; pPNAP06: −13 codons), motif I was found. Azosp, Azoarcus sp. EbN1; Niteu, Nitrosomonas eutropha C91; Desps, Desulfotalea psychrophila LSv54.

The phylogeny tree of MOBP1 T4CPs is shown in Fig. S4. Comparison of this figure with Fig. 12 shows that both trees are basically congruent, showing conservation of the basic clades and subclades. Thanks to the wide knowledge that has been accumulated about this clade, and more specifically the MOBP111-IncP subclade, it will be possible to find the causes for the small discrepancies in some of the terminal branches between both trees.

The phylogeny of the MOBP2, MOBP3 and MOBP4 relaxases is shown in Fig. 14. NJ and ML topologies are the same with the exception of some deep branches of subclade MOBP4. Among MOBP2 relaxases, all of them lying in plasmids from Alphaproteobacteria, the most extensively analysed to date are the VirD2 proteins of Agrobacterium plasmids. The proteins in Agrobacterium play similar roles as relaxases but for transfer of the T-DNA to plant cells in a phenomenon similar to conjugation (Zambryski et al., 1989). The molecular analysis of VirD2_pTiC58 demonstrated that VirD2 proteins are similar to conjugative relaxases in the molecular details of their activity (Pansegrau et al., 1993). Besides this, no MOBP2 relaxase has been studied in detail.

Fig. 14

Phylogeny tree of MOBP2+P3+P4 relaxases. Codes for underlines, asterisks, boxes, gene synteny and vertical bars are the same as in Fig. 2. Silsp, Silicibacter sp. TM1040; pSWIT02 (1), accession number YP_001260320; pSWIT02 (2), accession number YP_001260361; Dessp, Desulfotalea psychrophila.

The MOBP3 clade is composed of a series of relaxases from IncX plasmids of Gammaproteobacteria, as shown in Fig. 14. TaxC, the relaxase of the IncX2 plasmid R6K (Nunez et al., 1997) can be considered the prototype of the group. Interestingly, plasmid R6K is peculiar in that it contains two functional oriTs (Avila et al., 1996). IncX plasmids were separated in two groups because of lack of hybridization in their replication regions (Jones et al., 1993). However, analysis of their relaxases, as shown in Fig. 14, demonstrates that the IncX group can be considered a real clade, with two branches represented by IncX1 and IncX2 plasmids. The IncX1 plasmids are widespread among opportunistic bacterial pathogens, and they code for multiple-drug-resistant efflux pumps as well as the formation of fimbriae (Norman et al., 2008).

The MOBP4 clade is extensive, with many deep branches encompassing the various classes of Proteobacteria. It includes the IncU plasmids RA3 and pFBAOT6, which are closely related and have a broad host range, being maintained in Alpha-, Beta- and Gammaproteobacteria (Rhodes et al., 2004; Kulinska et al., 2008). Recently, Qnr quinolone resistance genes were found located in IncU plasmids in Aeromonas isolates, underscoring the importance of these plasmids in Qnr diffusion outside Enterobacteriaceae (Cattoir et al., 2008). Another subclade is composed of plasmids isolated from unknown bacteria in soil, the representatives of which are pIPO2 (Tauch et al., 2002) and pSB102 (Schneiker et al., 2001). Other clades contain conjugative plasmids from Campylobacter (pTET and pCC31) or Actinobacillus actinomycetemcomitans (pVT745), a periodontal pathogen, among others.

The phylogeny of the MOBP5, MOBP6 and MOBP7 relaxases is shown in Fig. 15. As can be observed in the phylogenetic tree, the MOBP5 clade groups all MOBHEN plasmids together with plasmids pAsa1 and pAsa3, with a bootstrap value of 98%. The MOBHEN subclade contains only mobilizable plasmids and was discussed extensively in our previous review (Francia et al., 2004). Interestingly, the relaxases of plasmids pAsa1 and pAsa3 (together with their close relatives pAsal1, pAsal2 and pPNAP08, not shown in the figure) still contain a classical 3H motif. Thus, we assume the HEN motif arose by mutation of the 3H motif in ancestral plasmids similar to pAsa1 or pAsa3. Once the MOBHEN subclade arose, it was evolutionarily very successful and gave rise to the wide radiation that can be observed in the tree. It will be interesting to determine the functional consequences of this change in the relaxase active site.

Fig. 15

Phylogeny tree of MOBP5+P6+P7 relaxases. Codes for underlines, asterisks, boxes, gene syntenies and vertical bars are the same as in Fig. 2. Psesy, Pseudomonas syringae; Niteu, Nitrosomonas eutropha; Lacla, Lactococcus lactis; Desps, Desulfotalea psychrophila; Lacbr, Lactobacillus brevis.

Clade MOBP6 contains plasmid R721, the prototype of the IncI2 incompatibility group. Not much is known about this plasmid. Like IncI1 plasmids, it produces two types of pili: a thin type, not involved in conjugation but in the stabilization of aggregates, and a thick type, which allows conjugation on solid surfaces (Bradley et al., 1984). Besides, R721 contains a shufflon structure, similar to that of IncI1 plasmids, which allows them to alternate some surface antigenic determinants. The shufflon is a DNA region that undergoes complex rearrangement mediated by a plasmid-encoded site-specific recombinase called Rci (Kim & Komano, 1992).

Clade MOBP7 is a large clade with profound branches (their resolution is different in NJ and ML topologies). It is the only MOBP clade, out of all those described above, which extends over several bacterial phyla, including Proteobacteria, Firmicutes and Fusobacteria. Little is known about most of these plasmids. The exceptions are the E. faecalis pheromone-responsive plasmid pCF10 (Chen et al., 2007) and two small plasmids from Staphylococcus aureus called pC221 and pC223, which can be mobilized by the conjugative plasmid pGO1 (Caryl et al., 2004; Caryl & Thomas, 2006).

Although not included in any of the above-mentioned MOBP clades, pEF1 and pHTβ-like plasmids deserve a specific comment due to their recognized role in antibiotic resistance dissemination. pHTβ-like plasmids are highly conjugative plasmids frequently found in glycopeptide and aminoglycoside-resistant Enterococcus clinical isolates in Japan and United States (Tomita et al., 2002; Tomita et al., 2003), while vancomycin resistance was frequently associated with conjugative pEF1-related plasmids in E. faecium isolates from different continents (Freitas et al., 2008; Romo et al., 2008).

Several relaxases of the MOBP cluster have been analysed biochemically. In general, they are large proteins and contain the relaxase activity in their N-terminal domain. The prototype is the MOBP1 relaxase TraI_RP4 (Pansegrau & Lanka, 1996). Some details of its sequence and properties have been discussed above, when dealing with MOBP1 relaxases. Other relaxases that have been analysed are the MOBP1 relaxase NikB_R64 (Furuya & Komano, 2003), MOBP2 VirD2_pTiC58 (Pansegrau et al., 1993), MOBP3 TaxC_R6K (Nunez et al., 1997), MOBP7 PcfG_pCF10 (Chen et al., 2007) and MOBP7 MobA_pC221 (Caryl & Thomas, 2006). In general terms, they all behave similarly to TraI_RP4. In contrast to the analysed MOBF relaxases, the MOBP relaxases have only one active tyrosine in their catalytic centre, and this seems to be an important mechanistic difference between MOBF relaxases and the MOBP cluster. MOBP relaxases will probably be structurally similar to MobA_RSF1010, which has already been discussed (see section The MOBQ family).

The MOBV cluster

psi-blast searches were carried out using MobM_pMV158 as a query. When the standard threshold (e<10−4) was used, the set of retrieved proteins included MOBP3 and MOBP4 relaxases. This result suggests that MOBV and MOBP proteins are ancestrally related and that their similarity can still be detected by psi-blast. Iterative blast under these settings converged at the 18th iteration, but ended up retrieving many unrelated sequences. When a more stringent P<10−5 threshold was used, the search converged in the eighth iteration, retrieving 105 proteins (Table S7). Ninety-eight hits were plasmid associated while six relaxases were encoded by demonstrated ICEs or IMEs. The MOBV phylogenetic tree, rooted with TraI_RP4 and a number of MOBP3 and MOBP4 relaxases, is shown in Fig. 16. MOBV plasmids have spread across several bacterial phyla, mainly Firmicutes and Bacteroidetes (where they are known to be associated with various antibiotic resistance genes), but there are also small clades from Proteobacteria, Cyanobacteria and Spirochaetes. Irrespective of their huge diversity, MOBV elements, including the Bacteroidetes transposons, share the same genetic organization of the Mob region and similarities in their oriTs (Francia et al., 2004).

Fig. 16. Phylogeny tree of MOBV relaxases. Branches belonging to different clades (V1–V5) and subclades (V41 and V42) are represented in distinct colours from garnet to light pink. MOBP1 (dark blue), MOBP3 (light blue) and MOBP4 (green-blue) relaxase plasmids are included for rooting. Codes for underlines, asterisks, boxes and vertical bars are the same as in Fig. 2.

MOBV genetic elements with an experimentally demonstrated relaxase are the Streptococcus mobilizable plasmid pMV158 (Priebe & Lacks, 1989; Guzman & Espinosa, 1997; de Antonio et al., 2004), the Bordetella plasmid pBBR1 (Szpirer et al., 2001) and the Bacteroides transposons Tn4555 (Smith & Parker, 1998) and Tn5520 (Vedantam et al., 2006). In addition, there is experimental evidence of mobilization for an additional 21 plasmids in Firmicutes, Bacteroidetes and Proteobacteria and for three mobilizable transposons (Bacteroidetes and Firmicutes), as indicated in Table S7. Transposon CTnBST was also demonstrated to be conjugative (Gupta et al., 2003). Plasmids pVEIS01, pA and pCC7120α are assumed to be conjugative due to the presence of T4CPs and VirB4-like genes, although these are located far from the corresponding relaxase. Plasmid pCC7120α is the first and only plasmid in the phylum Cyanobacteria for which there is experimental evidence of conjugative transfer (Muro-Pastor et al., 1994).

The main well-resolved MOBV clades are outlined in different colours in Fig. 16. Clade MOBV1 is the most populated. It comprises relaxases encoded by Firmicutes plasmids showing different degrees of sequence conservation. The prototype is the streptococcal plasmid pMV158. Clade MOBV2 contains plasmids from Proteobacteria and Cyanobacteria. The prototype is the θ-replicating plasmid pBBR1. Clade MOBV3 is a small clade including genetic elements from Bacteroidetes, of which the relaxase of Tn5520 is the prototype. Clade MOBV4 contains plasmids and mobilizable transposons from Bacteroidetes and Firmicutes. Tn4555 is considered the prototype. Other clades indicated in the figure are small and composed of unstudied plasmids.

Although MOBV relaxases were first believed to be associated exclusively with RCR plasmids, θ-replicating plasmids as well as mobilizable and conjugative transposons also contain MOBV relaxases. Although the family has grown considerably in recent years, Motif I (H-x2-R) and Motif III (H-x-DE-x2-PH-x-H), as defined in Francia et al. (2004), are still generally well conserved (Fig. 17). Major exceptions belong to Bacteroidetes relaxases in clades MOBV3 and MOBV4. Motif I in these relaxases shows the signature (HI)-x2-R-x2-E while Bacteroidetes relaxases from the genetic elements in Clade MOBV4 contain an extended Motif III (H-x-DE-x8–24-P-x2-H-x-H). Clade MOBV5 is the most deviant and Motif I is not found at all in its members. Nevertheless, psi-blast searches started from pW240, pRm1132, pPMA4326D or pC converged with MOBV relaxases in all cases. Thus, we believe they are bona fide members of the MOBV family.
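As an illustration of how the motif signatures quoted above could be searched for in a protein sequence, the following minimal Python sketch translates them into regular expressions. It rests on assumptions that are not stated in the text: "x" is read as any residue, "DE" and "PH" are read as consecutive residues, and the signature (HI)-x2-R-x2-E is left out because its bracket notation is ambiguous here. The names MOTIFS and find_motifs and the toy sequence are hypothetical; the toy sequence is made up and is not a real relaxase.

```python
import re

# Motif signatures from the text, written as regular expressions.
# Assumption: "x" = any residue; "DE" and "PH" = consecutive residues.
MOTIFS = {
    "motif_I": re.compile(r"H.{2}R"),                       # H-x2-R
    "motif_III": re.compile(r"H.DE.{2}PH.H"),               # H-x-DE-x2-PH-x-H
    "motif_III_extended": re.compile(r"H.DE.{8,24}P.{2}H.H"),  # H-x-DE-x8-24-P-x2-H-x-H
}

def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for one protein sequence.

    Only non-overlapping matches are reported for each motif.
    """
    seq = sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }

# Made-up toy sequence (not a real relaxase) built to contain Motif I and Motif III.
toy_sequence = "MSAHLLRGGGGHADEAAPHGHKK"
hits = find_motifs(toy_sequence)
for name, positions in hits.items():
    print(name, positions)
```

Such short patterns will also match by chance in unrelated proteins, so a hit is only suggestive and would need to be checked against an alignment of the family.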

Fig. 17. Conserved sequence motifs in the MOBV family. Codes are the same as in Fig. 3.

Unfortunately, we have very little data on the characterization of MOBV relaxases. The MOBV1 relaxase MobM_pMV158 was purified by de Antonio et al. (2004) and shown to behave grosso modo as a classical relaxase. Specific recognition and nicking of its cognate oriT has also been demonstrated for the MOBV2 relaxase Mob_pBBR1 (Szpirer et al., 2001). The interaction of the MOBV3 relaxase BmpH_Tn5520 with its oriT has also been analysed (Vedantam et al., 2006). Nevertheless, when looking at the conserved motifs in MOBV relaxases, it seems obvious that they are quite different from MOBP or MOBF relaxases, although it seems that the 3H motif III is conserved (Fig. 17). Besides, mutagenesis experiments on the MOBV2 relaxase Mob_pBBR1 showed that residues D120 and E121 (invariant residues in motif III) were essential for function (Szpirer et al., 2001). Strikingly, the same authors mutated all seven Tyr residues in Mob_pBBR1 but none of them seemed required for function. This fact invokes a radically different mode of action for MOBV relaxases. Thus, until more work is conducted, we are still doubtful about the DNA-processing mechanism of this family of relaxases.

Other relaxases: Tn916 relaxase

Orf20_Tn916, the relaxase encoded by the enterococcal 18-kb conjugative transposon Tn916, is a 329-amino acid protein that shows similarity to none of the MOB families described in this work. Nevertheless, it cleaves Tn916 oriT DNA and remains covalently associated to it (Rocco & Churchward, 2006). Tn916 integrase acts as an accessory protein in the nicking reaction. It acts as a specificity determinant by binding to oriT just upstream of the cleavage site and giving cleavage specificity to Orf20 nuclease activity. A psi-blast search using Orf20_Tn916 as a query retrieved proteins that belong to the Rep_trans superfamily of replication initiation factors, which includes RCR initiators from plasmids and phages (pfam02486, COG2946 and PHA00202 in NCBI database) but resulted in no hits against known relaxases. Orf20 homologues are present in a number of well-known ICEs from Gram-positive bacteria, such as Tn5397 (Clostridium difficile) (Roberts et al., 2001), CW459tet(M) (Clostridium perfringens) (Roberts et al., 2001), Tn5251 (Streptococcus pneumoniae) (Provvedi et al., 1996), Tn5386 (E. faecium) (Rice et al., 2007), Tn5801 (S. aureus) (Kuroda et al., 2001), ICESt1 (Streptococcus thermophilus) (Burrus et al., 2002), ICEBs1 (Bacillus subtilis) (Burrus et al., 2002) and ICELm1 (Listeria monocytogenes) (Burrus et al., 2002). The high prevalence of these Tn916-like elements among many Gram-positive species and their association with a number of antibiotic resistance genes suggest that they play an essential role in the spread of antibiotic resistance. Analogues of RCR motifs II and III (that contain the histidine residues responsible for Mg2+ coordination and the tyrosine catalytic residue, respectively) were identified in Orf20_Tn916 (Rocco & Churchward, 2006). However, the authors propose a different origin for Orf20 and its closely related proteins from that of the RCR and relaxase proteins due to the observed motif differences. Thus, the status of Orf20_Tn916 remains unsettled. This could be an example of new relaxase families that remain to be characterized.

Related to Tn916 is the conjugation system of C. perfringens plasmid pCW3 (Bannam et al., 2006). Although pCW3 contains an 11-gene conjugation locus, which includes a T4CP (Parsons et al., 2007), no relaxase-like gene was identified. Besides the Clostridium plasmids, some Streptomyces plasmids (Grohmann et al., 2003) and archaeal plasmids (Greve et al., 2004) have been reported in which no relaxase was identified. Because these potentially conjugative systems, as well as their bacterial hosts, are still in the early stages of analysis, we should not draw conclusions from them until some molecular details of the biochemical mechanisms involved become available.

Relaxase sequences as tools for the classification of conjugative systems

Complete plasmid sequencing and subsequent whole-genome comparison is the best criterion for plasmid classification. But this is obviously not attainable or even desirable when many samples have to be processed in the short term, when samples may include many repetitions of the same plasmid, etc. It is for this reason that specific sequences are usually selected as representatives of complete genomes and used as express classification tools. The great success in the use of 16S rRNA gene sequences for the classification of living organisms (Woese et al., 1990) attests to the validity of this point of view. In this vein, plasmids are now routinely classified by the molecular characteristics of their replicons. This technique, called replicon typing (Couturier et al., 1988; Carattoli et al., 2005), has also been quite successful although it has shown several drawbacks. The first and most important caveat was that the set of sequences used for replicon typing was limited essentially to the classical enterobacterial incompatibility groups. A second crippling problem was that many plasmids, especially large plasmids, contain several replicons, making classification difficult. For example, IncFI plasmids harbour up to three replicons, some of them exhibiting recombination (Bergquist et al., 1986; Couturier et al., 1988); IncX2 plasmids contain a single replicase protein, but three different origins of replication (Crosa et al., 1976; Shon et al., 1982; Miron et al., 1992). A third problematic issue was that several different replication systems exist, some requiring protein initiators of either RC-replication or θ-replication, others using not protein but RNA primers, etc. Thus, on many occasions, replication regions are difficult to identify even in completely sequenced plasmids. Plasmid classification using relaxases overcomes these three problems.

First, and most importantly, MOB phylogeny trees cover the whole bacterial diversity, at least for those bacteria in which plasmids have been described and sequenced. The classification of relaxases is most complete for the phylum Proteobacteria, which contains roughly 60% of the sequenced plasmids. Second in rank come the Firmicutes, which contain an additional 25% of the sequenced plasmids. For the remaining phyla, the number of sequences analysed is still clearly insufficient. However, with the newly available high-throughput sequencing techniques, the number of plasmid sequences is bound to increase exponentially in the next five years. We hope our classification scheme serves to put some order in the avalanche of this new information and as a tool for the researchers concerned with the classification of mobile elements and their constituent gene products [e.g. the ACLAME database (http://aclame.ulb.ac.be/)] (Leplae et al., 2004).

Second, plasmids rarely code for more than one relaxase. The reasons for this fact are not completely evident, but may have to do with the mechanism of conjugative DNA-processing itself. The survival of a plasmid DNA molecule may be compromised if more than one conjugative DNA-processing event starts at the same time. The fact is that, in the list of over 600 plasmids we used, if we exclude the Agrobacterium plasmids (that contain a conjugative MOBQ system together with a MOBP2 system for T-DNA transport to plant cells) only 15 plasmids contain two relaxases (conjugation was tested in none of them). In several of these plasmids, the situation might be similar to that of Agrobacterium. For instance, plasmid p42a from Rhizobium etli contains a MOBQ-type TraA relaxase as part of a complete Tra genetic system, together with a MOBP2-like VirD2 relaxase as part of a complete Vir genetic system although there is no obvious T-DNA associated to it (Gonzalez et al., 2006). Although Rhizobium is not known to transport DNA to plant cells, one wonders what the function of the Vir system in these bacteria is. As a second example of the origin of bi-relaxase plasmids, we have the formation of cointegrates, a phenomenon that sometimes occurs under selective pressure. For instance, the enterococcal plasmid pAMα1 contains genes almost identical to those in the tetracycline-resistance Bacillus RCR plasmid pBC16 plus those of pS86, a cryptic plasmid from E. faecalis. In the absence of tetracycline, the cointegrate resolves in E. faecalis cells and plasmid pS86 is mostly maintained. However, in the presence of the antibiotic, pAMα1 generates amplification of the tetracycline resistance determinant. These processes occur via site-specific recombination involving pAMα1 relaxases, MobB and MobE, and their oriT sites (Francia & Clewell, 2002a). Both relaxases belong to the MOBV superfamily. We suspect other bi-relaxase plasmids may have analogous causal explanations.

Third, relaxase proteins are now easily identified in all plasmids capable of conjugation. We do not know of any conjugative plasmid that is not contained in one or another MOB family described in this review (but see Other relaxases: Tn916 relaxase). Although mechanisms of conjugation may differ in substantial ways among the six relaxase families, as discussed in this review, the six families together embrace all the diversity available in the databases. We considered providing simple PROSITE signatures to identify relaxases but decided against it, because blast alignments are easy enough to run and will unambiguously assign any relaxase to one of the MOB families.
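The assignment step described above can be sketched as follows. This is a minimal illustration, not the procedure used in this review: it assumes that a query relaxase has already been compared by blast against a small set of prototype relaxases (one per family, taken from Table 1) and that the hits are available in the standard 12-column tab-separated output. The function name assign_mob_family, the e-value cutoff and the example hit lines are hypothetical; the hit values are made up for illustration and are not real search results.

```python
# One prototype relaxase per MOB family, as listed in Table 1 of this review.
PROTOTYPE_FAMILY = {
    "TrwC_R388": "MOBF",
    "TraI_R27": "MOBH",
    "MobC_CloDF13": "MOBC",
    "MobA_RSF1010": "MOBQ",
    "TraI_RP4": "MOBP",
    "MobM_pMV158": "MOBV",
}

def assign_mob_family(tabular_lines, max_evalue=1e-4):
    """Assign each query to the family of its best-scoring prototype hit.

    Assumes 12 tab-separated columns per line:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The max_evalue cutoff is an assumption, not a value prescribed by the review.
    """
    best = {}  # query -> (family, bitscore)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue or subject not in PROTOTYPE_FAMILY:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (PROTOTYPE_FAMILY[subject], bitscore)
    return {query: family for query, (family, _) in best.items()}

# Made-up hit lines, for illustration only.
example_hits = [
    "query1\tTrwC_R388\t45.0\t300\t150\t5\t1\t300\t1\t298\t1e-60\t210",
    "query1\tTraI_RP4\t25.0\t120\t80\t4\t10\t130\t5\t125\t2e-3\t40",
    "query2\tMobM_pMV158\t38.5\t280\t160\t6\t1\t280\t1\t279\t3e-45\t170",
]
assignments = assign_mob_family(example_hits)
print(assignments)
```

Taking the single best hit is the simplest possible rule. Because some families are distantly related to each other (for example MOBV and MOBP, as noted for the psi-blast searches), a borderline hit would be better confirmed by placing the sequence in the corresponding family tree.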

An additional problematic issue in relation to plasmid classification is a consequence of the presumed modular evolution of plasmids (Novick et al., 1989; Berg et al., 1998; Osborn et al., 2000). We showed in this review (Figs 4 and S1–S4) the congruence between the relaxase and the T4CP phylogeny trees for the MOBF, MOBH, MOBC, MOBQ and MOBP1 families. It is clear from these phylogenies that relaxases and T4CPs evolve congruently for long periods of time. In other words, MOB regions are stable evolutionary units. The coevolution of MOB regions and T4SSs was already observed and analysed for the MOBF family of plasmids (Fernandez-Lopez et al., 2006). Although MOB and T4SS modules recombine more often, congruence between MOB and T4SS trees is still high unless we are dealing with very deep branching clades. Occasionally, however, MOB and T4SS regions swap between plasmids. This suggests that different MOB and T4SS can rearrange and adapt to each other so that they can reformulate a new transfer system. In fact, experimentally, it is easier to swap complete T4SSs than their individual components (Bolland et al., 1990 and our unpublished data). In general, we infer from the data analysed here and elsewhere that relaxases rightly represent the whole transfer machinery for considerably broad clades.

Phylogenetic trees of a slowly changing protein such as a relaxase, which can be considered a molecular clock, are a better representation of diversity than the Inc groups, which depend on the interaction of replication and stability proteins and can change with a single mutation. The trees give approximate evolutionary distances between branches. Thus, we see that the IncF, IncI or IncP complexes group relaxases with phylogenetic distances up to 0.4, while other groups, such as IncW or IncN, are probably more compact phylogenetically, with members presently not separated by more than 0.05 distance units.

Although the Inc notation is archaic, it has not yet been replaced. We do not propose to do it here, because many of the clades and subclades we proposed are still clearly underdetermined, as few sequences are available. But with the new generation sequencing strategies and the expected sequencing of many plasmid-rich metagenomic samples, the number of relaxase sequences is expected to grow exponentially. The MOB notation that we introduce in the phylogenetic trees shown in this review can be expanded when new information becomes available without invalidating the previous assignments.

The main relaxase families, clades and subclades, and their prototype plasmids are shown in Table 1. As can be inferred from the table, most classical Inc groups of plasmids (specifically those of Enterobacteriaceae) are now located in the relaxase phylogenies. They could be used as evolutionary markers to define the place of a plasmid in the universe of plasmid diversity. When two relaxases are >90% identical, it is almost certain that the corresponding plasmids will share the complete backbone, as discussed before for IncW plasmids (Revilla et al., 2008). Identification of the plasmid backbone should be the main use of classification by relaxases. A finer classification, sometimes necessary for epidemiological analysis, might need additional detail, such as multilocus typing (Garcia-Fernandez et al., 2008) or even whole plasmid sequencing. Finally, it should be stressed that classification by relaxases is not meant to substitute replicon typing, but to complement it and provide additional information. As shown in Table 1, the classical Inc groups can also be named by one of the branches of the relaxase classification. This allows researchers to properly locate their test plasmid within the overall spread of plasmid diversity.
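The >90% identity criterion mentioned above can be made concrete with a short Python sketch. It assumes the two relaxase sequences have already been aligned (gaps written as "-") and computes identity over the columns in which neither sequence has a gap. This is only one of several common definitions of percent identity, and the text does not say which one was used. The function names and the short toy sequences are hypothetical and made up for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

def likely_same_backbone(aligned_a, aligned_b, threshold=90.0):
    """True when identity exceeds the threshold (>90% in the text)."""
    return percent_identity(aligned_a, aligned_b) > threshold

# Made-up toy alignment: 20 columns with one mismatch, i.e. 95% identity.
seq_a = "MKTAYIAKQRMKTAYIAKQR"
seq_b = "MKTAYIAKQRMKTAYIAKQK"
print(percent_identity(seq_a, seq_b), likely_same_backbone(seq_a, seq_b))
```

The threshold is applied strictly, so a pair at exactly 90% identity is not flagged. Real relaxases are several hundred residues long, and the choice of alignment method and of the region compared will affect the value obtained.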

Table 1. Representative relaxase protein clades

| Plasmid family | Clade | Prototype relaxase_plasmid (incompatibility group) | Representative references |
|---|---|---|---|
| MOBF | F11 | TrwC_R388 (IncW) | Gonzalez-Perez et al. (2007) |
| | | TraI_R46 (IncN) | Paterson et al. (1999) |
| | | pWWO (IncP9) | Greated et al. (2002) |
| | F12 | TraI_F (IncFI) | Hekman et al. (2008) |
| | | TraI_R1 (IncFII) | Csitkovits et al. (2004) |
| | | TraI_pED208 (IncFV) | Lu et al. (2002) |
| MOBH | H11 | TraI_R27 (IncHI1) | Lawley et al. (2002) |
| | | TraI_R478 (IncHI2) | Gilmour et al. (2004) |
| | H12 | TraI_pIP1202 (IncA/C) | Welch et al. (2007) |
| | | TraI_R391 (IncJ) | Boltner & Osborn (2004) |
| | | TraI_Rts1 (IncT) | Murata et al. (2002) |
| | | TraI_pCAR1 (IncP7) | Maeda et al. (2003) |
| | H2 | TraI_Neigo (chromosomal GI) | Salgado-Pabon et al. (2007) |
| MOBC | C1 | MobC_CloDF13 (mobilizable) | Nunez & de la Cruz (2001) |
| | C2 | TraX_pAD1 | Francia & Clewell (2002b) |
| MOBQ | Q1 | MobA_RSF1010 (IncQ1/IncP4) | Parker & Meyer (2007) |
| | Q2 | TraA_pTiC58 | Cho & Winans (2007) |
| | | TraA_p42d | Perez-Mendoza et al. (2006) |
| | Q3 | TraA_pIP501 (Inc18) | Abajy et al. (2007) |
| MOBP | P11 | TraI_RP4 (IncP1α) | Pansegrau & Lanka (1996) |
| | | TraI_R751 (IncP1β) | Thorsted et al. (1998) |
| | P12 | NikB_R64 (IncI1) | Furuya et al. (1991) |
| | | NikB_R387 (IncK) | Tschape & Tietze (1980) |
| | P13 | pCTX-M3 (IncL/M) | Golebiewski et al. (2007) |
| | P14 | MobA_pTF-FC2 (IncQ2α) (mobilizable); MobA_pTC-F14 (IncQ2β) (mobilizable) | van Zyl et al. (2003) |
| | | MobA_Rms149 (IncG/IncP6) | Haines et al. (2005) |
| | P2 | VirD2_pTiC58 (T-DNA transport) | Scheiffele et al. (1995) |
| | P3 | TaxC_pOLA52 (IncX1) | Norman et al. (2008) |
| | | TaxC_R6K (IncX2) | Nunez et al. (1997) |
| | P4 | VirD2_pFBAOT6 (IncU) | Rhodes et al. (2004) |
| | | TraS_pSB102 | Schneiker et al. (2001) |
| | P5 (MOBHEN) | MbeA_ColE1 (mobilizable) | Varsaki et al. (2003) |
| | P6 | NikB_R721 (IncI2) | Komano et al. (1990) |
| | P7 | PcfG_pCF10 | Chen et al. (2007) |
| | | MobA_pC221 (Inc4) | Caryl & Thomas (2006) |
| | | MobA_pC223 (Inc10) | Caryl et al. (2004) |
| MOBV | V1 | MobM_pMV158 | de Antonio et al. (2004) |
| | | pE194 (Inc11) | Horinouchi & Weisblum (1982) |
| | | pUB110 (Inc13) | Lotareva et al. (2001) |
| | V2 | Mob_pBBR1 | Szpirer et al. (2001) |
| | V3 | BmpH_Tn5520 (chromosomal Tn) | Vedantam et al. (2006) |
| | V4 | MobA_Tn4555 (chromosomal Tn) | Smith & Parker (1998) |
| Unclassified | | Orf20_Tn916 (chromosomal Tn) | Rocco & Churchward (2006) |
* Incompatibility groups were described for plasmids in Enterobacteria, Pseudomonas and S. aureus. They were abandoned as a classification method as explained in the introduction. IncD exists (Coetzee et al., 1985), but no member has been sequenced. IncY is composed of plasmids related to phage P1 (Capage et al., 1982). There are no references for IncE, IncR, IncS or IncV.

Concluding remarks

The subject of this review is the classification of relaxase proteins. Relaxases are essential proteins in plasmid conjugation. Because they start and end the DNA-processing reactions, they are decisive in the conjugation pathway. High relaxase sequence similarity is then synonymous with an identical conjugative DNA-processing mechanism. Because evolution of relaxases and T4CPs is highly congruent, this review is also about classification of MOB regions, that is, DNA-processing mechanisms during conjugation. Thus, MOB genetic systems can be classified in six families, as shown schematically in Fig. 1. Each family is somehow specific in the details of its DNA-processing mechanism. The MOBF family forms a coherent, well-resolved group of plasmids. MOBF relaxases contain two Tyr in their catalytic centre and mostly occur in conjugative plasmids. The MOBP/MOBQ/MOBHEN/MOBV cluster unites a large number of MOB regions whose relaxases contain just one tyrosine in the active centre (with a caveat in the case of MOBV, according to the mutagenesis experiments on pBBR1). In spite of these differences, the mechanism of DNA processing in the MOBP cluster seems to be similar to MOBF, because the 3D structures of MOBF relaxases (TrwC_R388 and TraI_F) and that of the MOBQ1 member MobA_RSF1010 are almost identical (see The MOBQ family). In fact, MOBF, MOBQ and MOBP show conservation of three motifs that configure the protein catalytic centre and are related to the relaxase mechanism of action, as shown in Figs 3, 10 and 13, the most conspicuous being the 3H motif. We propose to call this major group of relaxases the 3H class (defined by the existence of the histidine triad in the catalytic centre). Judging from the point of view of relaxase classification, the present knowledge examined in this review indicates that MOB systems containing 3H relaxases constitute the predominant mechanism of plasmid conjugation. The remaining MOB groups are clearly deviant from this mechanism. The MOBH family is composed of another coherent and well-resolved group of proteins. Although two tyrosines have been invoked as participating in DNA processing, these relaxases are clearly different from the previous groups in significant aspects (see The MOBH family). It might be that the mechanism of DNA processing of MOBH systems is different from that of 3H relaxases, although it is not yet certain. Finally, the MOBC family is the most remotely related of all. We are not sure whether there is a Tyr in their catalytic centres or even what the mechanism of DNA processing is, as the relaxase-cleaved DNA does not remain covalently bound to the protein, but shows free 3′-ends (see The MOBC family). Further biochemical analysis of MOBC systems is required to solve these important questions.

Conjugative relaxases have remote homologues in other phosphodiesterase protein families with different functions. For instance, MOBC relaxases are related to RC-replication (Rep) proteins (see The MOBC family), MOBF relaxases to RC-transposition proteins (see The MOBF family) and MOBH relaxases to HD hydrolases (see The MOBH family). This implies that the phenomena of conjugation, replication and transposition are evolutionarily connected, at least in some of their variants. This represents to us striking examples of the plasticity of the various relaxase protein folds, which can be adapted to widely different functions. Alternatively, conjugation systems evolved by recruitment of phosphodiesterases that were performing other unrelated functions.

In conclusion, relaxases might be useful to classify plasmids (besides conjugative systems themselves). We can learn much about the properties of plasmids just by sequencing a 1-kb sequence containing the relaxase. The prevalence of a given MOB system provides information about the ecological preferences of some plasmid types. We do not know enough about mechanisms to explain these differences. Each MOB type contains details of plasmid idiosyncrasy that need to be explored further. We hope our work can be useful in this context, because it provides phylogenetic as well as biochemical data that group plasmids according to a functionally important property, and thus probably reflect outstanding functional decisions in the evolution of plasmid sequences. It is also expected that our effort in classifying the actual diversity of relaxases (and hence of plasmid diversity) will help us to reach a better understanding of the physiology of plasmids and of their conjugation mechanisms.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Table S1. List of MOBF plasmids.

Table S2. List of MOBH plasmids.

Table S3. List of MOBC plasmids.

Table S4. List of MOBQ plasmids.

Table S5. List of MOBP plasmids.

Table S6. List of MOBHEN plasmids.

Fig. S1. Phylogeny tree of MOBH T4CP. Codes for colour and name of each clade are the same as in Figure 4.

Table S7. List of MOBV plasmids.

Fig. S2. Phylogeny tree of MOBC T4CP. Codes for colour and name of each clade are the same as in Figure 6.

Fig. S3. Phylogeny tree of MOBQ T4CP. Codes for colour and name of each clade are the same as in Figure 8.

Fig. S4. Phylogeny tree of MOBP1 T4CP. Codes for colour and name of each clade are the same as in Figure 11.

Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Acknowledgements

Work in the FdlC laboratory was supported by grants BFU2005-03477/BMC (Spanish Ministry of Education), RD06/0008/1012 (RETICS research network, Instituto de Salud Carlos III, Spanish Ministry of Health) and LSHM-CT-2005_019023 (European VI Framework Program). Research in the MVF laboratory was supported by grants FIS 02/3029 and FIS PI07/0664 (Spanish Fondo de Investigacion Sanitaria, Instituto de Salud Carlos III), REIPI RD06/0008 (Ministerio de Sanidad y Consumo, Instituto de Salud Carlos III-FEDER, Spanish Network for Research in Infectious Diseases) and LSHE-CT-2007-037410 (European VI Framework Program).

Footnotes

  • Editor: Eduardo Rocha

References
