
Common domains in the initiators of DNA replication in Bacteria, Archaea and Eukarya: combined structural, functional and phylogenetic perspectives

Rafael Giraldo
DOI: http://dx.doi.org/10.1111/j.1574-6976.2003.tb00629.x, pp. 533–554. First published online: 1 January 2003


Although DNA replication is the universal process for the transmission of genetic information in all living organisms, until very recently evidence was lacking for a related structure and function in the proteins (initiators) that trigger replication in the three ‘Life Domains’ (Bacteria, Archaea and Eukarya). In this article new data concerning the presence of common features in the initiators of chromosomal replication in bacteria, archaea and eukaryotes are reviewed. Initiators are discussed in the light of: (i) The structure and function of their conserved ATPases Associated with various cellular Activities (AAA+) and winged–helix domains. (ii) The nature of the macromolecular assemblies that they constitute at the replication origins. (iii) Their possible phylogenetic relationship, attempting to sketch the essentials of a hypothetical DNA replication initiator in the micro-organism proposed to be the ancestor of all living cells.

  • DNA replication initiators
  • Origin binding proteins
  • AAA+ domain
  • Winged–helix domain
  • Phylogeny

1 Introduction

Since the early times of Molecular Biology, the universality of the processes involved in the transmission of genetic information has been interpreted to reflect the common evolutionary origin of all known organisms. Thus, it is currently believed that life emerged from the confluence between self-replicating RNA molecules (the prebiotic ‘RNA world’) Last Universal Common Ancestor (LUCA) of cellular life on Earth [1–15].

In the classic replicon model [16], regulated DNA replication requires a trans-acting factor (the initiator) able to bind specifically to a cis-acting DNA sequence (the replicator, or replication origin), thus resembling the proposal made shortly before for the regulatory circuits in gene expression. Over the last 40 years, the replicon model has consistently received experimental support (reviewed in [17]). In evolutionary terms, for the primordial forms of cellular life the appearance of both initiator proteins and origins of replication might be the answer to the requirement for controlled copying of the genetic information. Replication is thus integrated with the rest of cellular functions, genomes being replicated once, and only once, per cell cycle. This need, raised by the appearance of the cellular level of organization, did not previously occur in the prebiotic world of self-replicating molecules.

DNA replication initiators are either single proteins or multisubunit complexes that bind sequence-specifically to the origins of replication (thus they are also known as Origin Binding Proteins, OBPs), where they usually assemble into oligomers [18–20]. Initiators play two roles: (i) they melt the two strands of DNA and (ii) they bring to the resulting replication bubble other protein factors essential for replication. These are responsible for the extension of the replication fork (helicases), synthesis of an RNA primer (primases) and copying the template with high fidelity and processivity (DNA polymerases) [17, 18]. It is a common trait that all OBPs, in spite of their different nature and precise mechanisms of action (Table 1) [17–29], need to be activated in order to exert their triggering role in DNA replication [30, 31]. Thus ATP binding and hydrolysis can define active or silent conformational states [19]. In addition, posttranslational modification (e.g., by phosphorylation) [31–33] and/or down-regulation by proteolysis [34–36] are often key events in controlling initiator function. Apart from the specific sequences where OBPs bind, replication origins include conserved AT-rich repeats with an enhanced tendency to be unwound under superhelical stress, or by the action of non-sequence-specific DNA binding proteins [17].

Table 1

A summary of the distinct macromolecular assemblies, also termed SNUPS (Specialized NUcleoProtein complexeS, displayed as cartoons), and initiation mechanisms for the diverse OBPs (shadowed ovals) found in viruses, plasmids and chromosomes across the three ‘Life Domains’

  • Dashed horizontal lines mark the boundaries between Bacteria, Archaea and Eukarya, the former being thicker to indicate that the OBPs in Archaea/Eukarya are thought to be more similar to each other than to those in Bacteria [8]. The initiator proteins that are the subject of this review are shown in outlined type. Data to elaborate this table were taken from references [17–29].

The mechanisms to achieve DNA replication initiation, astonishingly diverse, are summarized in Table 1. The structure and function of the OBPs in a number of viruses with miscellaneous types of initiation [23, 25–29] and in plasmids replicating through a rolling circle-type mechanism (that implies origin nicking and extension of the resulting 3′-end, displacing the non-template strand) [20–22] have been recently reviewed elsewhere. The same applies to details concerning the macromolecular machines directly involved in DNA synthesis at the replication fork [18, 37]. Therefore, this review focuses on those OBPs that initiate DNA replication by a functionally analogous mechanism, implying origin binding and DNA melting, namely the OBPs in plasmids of Gram-negative bacteria and in all bacterial, archaeal and eukaryal chromosomes. In particular, the presence of common structural domains in this kind of OBPs, beyond their marginal sequence similarities, leaves the way open to discuss their possible phylogenetic relationships [8, 9]. With the knowledge already available on present OBPs, it may be time to start outlining the structure and properties expected for the primordial initiator of DNA replication in the hypothetical ancestral micro-organism known as LUCA.

2 The structure and function of OBPs in Gram-negative bacteria

All known bacteria, either Gram-negative or Gram-positive, replicate the DNA of their single chromosome by means of an essentially identical mechanism that, in terms of OBPs, is triggered by binding of the DnaA initiator protein to a conserved, unique origin of replication (oriC) [38–41]. However, replication initiation in plasmids (extrachromosomal DNA elements) follows a variety of mechanisms, classified according to the shape of the resulting replication intermediates: rolling circle (σ), strand displacement (D-loop) and ‘theta’ (θ) [20, 21]. The last type of mechanism is the most common in Gram-negative bacteria. Apart from their intrinsic interest as vectors of threatening antibiotic resistances and in biotechnological and environmental applications, a number of such plasmid replicons have been intensively used over the last 25 years as valid model systems to get hints on chromosomal replication. This is due to the fact that, although many plasmids encode their own OBP (termed Rep), they still depend on the cellular factors required for DnaA/oriC replication [20, 21].

2.1 DnaA, the universal initiator of chromosomal replication in Bacteria, is an origin-specific unwinding AAA+

DnaA is the OBP functional in all bacteria characterized so far. DnaA has been extensively studied in Escherichia coli, both from the biochemical and genetic points of view, illustrating that it is not only central to initiation, but also a key regulator of the bacterial cell cycle [41, 42]. Since there are a number of excellent recent reviews on its function [38–41], this article deals with the structure of DnaA domains.

Based on sequence similarities across different bacterial species, DnaA has been divided into four regions. (I) An N-terminal region that includes a Leucine Zipper (LZ)-like oligomerization motif (1–23) [43], required to support both chromosomal and plasmid replication [44], and a tight binding site for the DnaB helicase (24–86) [45, 46]. This overlaps with (II), the less conserved region (residues 57–129), carrying various insertions in different bacterial species [38, 39]. (III) An ATP binding domain (residues 130–350), essential for chromosomal initiation, that includes canonical Walker-A/B motifs [47] (see below), the residues (135–148) initially contacted by DnaB [45] and two potential α-helices (327–344 and 357–374) that bind acidic phospholipids, altering the conformation of the domain, to release ADP [48–51]. (IV) A DNA binding region (374–467) [52–54] that, based on secondary structure predictions, was proposed to consist of an α-helical bundle [55].

The crystal structure of a fragment, including domains III and IV, from Aquifex aeolicus (Ae) DnaA has been recently solved [56] (Fig. 1A). As had been previously modelled [55], the fold of the ATP binding domain III belongs to the AAA+ superfamily [57–59]. The AAA+ group includes chaperones (often coupled to proteolytic subunits) that unfold protein targets, inducers of membrane fusion, motor proteins and remodellers of DNA. The structure of AAA+ consists of two subdomains: an N-terminal α/β RecA-like fold (a five-stranded parallel β-sheet flanked, respectively, by two and at least three α-helices) plus a C-terminal α-helical bundle [56, 60]. The N-terminal subdomain includes two Walker motifs, for nucleotide binding and hydrolysis, respectively: A (or P-loop), GX4GKT (residues 172–179 in E. coli DnaA) and B, LLIDD (in DnaA, 232–236). In Walker-A the conserved Lys contacts the β- and γ-phosphates of ATP, whereas the Thr contributes to coordinate the essential Mg2+ [56, 60]. In Walker-B the two Asp carboxylates act (i) as partners for the metal and (ii) as a base activating a water molecule for nucleophilic attack on the γ-phosphate [56, 60]. Residues responsible for ATP binding are not only located in the Walker-A motif at the N-terminal subdomain, but also in two additional motifs, termed sensor-1 and sensor-2. More specifically, a conserved Arg residue in sensor-2, found at the N-terminus of the third α-helix in the C-terminal subdomain, senses the presence or absence of the γ-phosphate [56, 60]. In addition, the purine and ribose rings of ATP stay in a pocket constituted by residues from both subdomains. Therefore, nucleotide binding and hydrolysis result in a change in the position of each subdomain with respect to the other [56, 61]. Furthermore, in AeDnaA, a long α-helix (α12) connects domain III with the C-terminal DNA binding domain IV [56]. It probably acts as an effector, transmitting to the latter the conformational changes occurring in the former upon ATP hydrolysis [61]. These movements, coordinated across the distinct subunits of the assemblies that AAA+ proteins usually establish [58], appear to be crucial for their function. This includes, among others, pulling apart the two strands of DNA (unwinding, in replication or recombination), unravelling the secondary structure elements in a protein (unfolding, often for proteolytic degradation) or prying subunits apart (in disassembling oligomers) [59].
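
As an illustration of the Walker motifs described above, the following minimal Python sketch locates them in a protein sequence by pattern matching. It assumes a literal reading of the motifs as quoted (G, any four residues, then GKT for Walker-A; the pentapeptide LLIDD for Walker-B), whereas Walker-B in particular varies among AAA+ proteins. The function name and the toy sequence are invented for illustration and are not taken from DnaA.

```python
import re

# Walker-A (P-loop) as quoted in the text: G, any four residues, then G-K-T.
WALKER_A = re.compile(r"G.{4}GKT")
# Walker-B as quoted for DnaA: the literal pentapeptide LLIDD (an assumption;
# other AAA+ proteins carry variants of this motif).
WALKER_B = re.compile(r"LLIDD")


def find_motifs(seq):
    """Return 1-based (start, end, matched_text) tuples for each motif."""
    hits = {}
    for name, pattern in (("walker_a", WALKER_A), ("walker_b", WALKER_B)):
        hits[name] = [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(seq)]
    return hits


# Hypothetical toy sequence (NOT a real DnaA sequence).
toy = "MSAAGPSGLGKTHLAAVEKALLIDDIQ"
print(find_motifs(toy))
```

Positions are reported 1-based and inclusive, in the same way residue ranges are given in the text.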

Figure 1

The structures of the OBPs of: (A) a bacterial chromosome, AeDnaA from A. aeolicus (Protein Data Bank, PDB, entry 1L8Q) [56], (B) a plasmid from Gram-negative bacteria, RepE54 from E. coli mini-F plasmid (PDB entry 1REP) [94], and (C) an archaeal chromosome, PaCdc6 from P. aerophilum (PDB entry 1FNN) [60]. In the AeDnaA model (A), the AAA+ domain III (in purple) has been displayed, as a ribbon model of the peptide backbone, with the same orientation shown for the equivalent domain in PaCdc6 (C). The Mg2+-ADP cofactor (in red) is sandwiched between the two characteristic α/β and all-α-helical subdomains [58]. The DBD IV (HTH), structurally similar to the Trp repressor [62], is in grey. In the RepE54 model (B), corresponding to the monomeric species active in origin binding and initiation [84], DNA (present in the original crystal structure) has been removed for clarity. Two different orientations of RepE54 have been displayed to highlight: left-hand side, the pseudo-two-fold axis (dashed line) relating the two WHs (a bundle of three α-helices plus a three-stranded antiparallel β-sheet), in orange and pink, respectively, for the N-terminal and C-terminal domains (additional secondary structure elements are in grey); right-hand side, the same WH1 orientation shown below for the WH in PaCdc6. In the PaCdc6 model (C), the N-terminal AAA+ domain has been colored in green, with its Mg2+-ADP cofactor in red. The C-terminal WH is shown in blue. Models were rendered with Swiss-Pdb Viewer (http://us.expasy.org/spdbv).

In the AeDnaA crystal structure, the DNA binding domain IV includes a Helix–Turn–Helix (HTH) motif related to that found, among many other DNA Binding Domains (DBDs), in the Trp repressor [62] (Fig. 1A). In addition to the HTH motif, an extra basic loop (including a conserved Arg residue) could contact the DNA minor groove or the phosphate backbone [56]. ATP is not strictly required for the sequence-specific binding of DnaA monomers to five, double-stranded, 9-mer repeats (DnaA boxes R1-R4 and M: 5′-TTA/TTNCACA) found at oriC [63–65], where up to 20–30 DnaA molecules finally form an oligomeric nucleoprotein complex [66–68] (Fig. 2A). However, it is essential for the specific binding of DnaA to six secondary sites (ATP-DnaA boxes: 5′-AGatct) found in the AT-rich repeats adjacent to the high-affinity R1 box [69, 70]. Cooperative binding of DnaA to the R1 and ATP-DnaA boxes, involving its oligomerization domain I [43], results in DNA unwinding, stabilization of the single strands by ATP-DnaA [70] and then in loading the DnaB helicase by displacing its loader DnaC (both are hexameric ATPases) [71]. Subsequent ATP hydrolysis, stimulated by the processivity factor of DNApol III (β-clamp) and Hda [72, 73], exerts a conformational change in DnaA [74], which becomes unable to initiate further replication rounds, a process termed RIDA, standing for ‘Regulatory Inhibition of DnaA’ [41, 75, 76]. DnaA can then be re-activated either by acidic phospholipids [48–51] or by the DnaK chaperone [77, 78] (Fig. 2A), which exchange ADP for ATP. It is noteworthy that DnaB, although it stabilizes its oligomeric state in an ATP-Mg2+-dependent way similar to that of the hexameric/heptameric assemblies characteristic of some AAA+ proteins [57–59], is not a member of this family, unlike its adaptor DnaC [19].
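
The 9-mer DnaA box consensus quoted above (5′-TTA/TTNCACA) can be turned into a simple search over both DNA strands. The Python sketch below is only an illustration under stated assumptions: the regular expression is a literal reading of the consensus, whereas real DnaA boxes tolerate mismatches, and the input sequence and function names are invented rather than taken from oriC.

```python
import re

# Literal reading of the consensus 5'-TT(A/T)TNCACA; the lookahead lets
# overlapping sites be reported as well.
DNAA_BOX = re.compile(r"(?=(TT[AT]T[ACGT]CACA))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_dnaa_boxes(seq):
    """Return sorted (start, strand, site) hits; start is 1-based on the given strand."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+", m.group(1)) for m in DNAA_BOX.finditer(seq)]
    for m in DNAA_BOX.finditer(reverse_complement(seq)):
        site = m.group(1)
        # Convert the position on the reverse strand to the leftmost
        # coordinate of the site on the given (top) strand.
        start = len(seq) - (m.start() + len(site)) + 1
        hits.append((start, "-", site))
    return sorted(hits)


# Hypothetical test sequence (NOT oriC): one box on each strand.
toy = "GGATCCTTATCCACAGGGTGTGGATAACC"
print(find_dnaa_boxes(toy))
```

Scanning the reverse complement matters here because the boxes at an origin can lie in either orientation; a mismatch-tolerant scorer would be the natural next step for real sequences.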

Figure 2

Functional similarities between the OBPs in bacterial chromosomes (A, DnaA initiator in E. coli oriC) [38–40], Gram-negative bacteria plasmids (B, RepA initiator in Pseudomonas pPS10 oriV) [86], and eukaryotic chromosomes (C, ORC initiator in S. cerevisiae ARS) [130]. For the three OBPs, initiator function requires two steps: (I) dissociation and conformational activation; (II) specific binding to origin DNA repeats (blue boxes on the double helix). For DnaA-dependent initiation at oriC, the subsequent unwinding step (III) of the A+T-rich origin repeats (red boxes) has been also depicted, but analogous steps have been shown to occur in plasmid and eukaryotic initiation. In the absence of origin sequences, OBPs tend to form inactive aggregates (colored in salmon). These are either dimers, as described for RepA repressor species (established through its N-terminal WH) [90] or for the isolated Orc4 subunit (through its C-terminal WH) [187], or oligomers as characterized for ADP-bound (magenta) DnaA [75, 76] and the N-terminal AAA+ domain of Orc4 [187]. Protein chaperones of the Hsp70/DnaK family (yellow) [77, 78, 110, 111, 113–116, 187], or the allosteric effect of specific origin DNA sequences (oriV iterons, for RepA) [99], untangle those aggregates (grey arrows) to yield the OBPs species active in initiation. This implies a conformational change (RepA N-terminal WH) and/or the assembly of other OBP subunits (Orc1-3,5-6, in green). Specific origin recognition requires ATP (blue) in the nucleotide binding sites (white pockets) of the AAA+ domains in DnaA (purple) [56, 69, 70] and Orc1/5 [134]. For DnaA [48–51] and ORC [148], acidic phospholipids (PLs) contribute to ADP→ATP interchange. The modules responsible for origin DNA binding are the C-terminal α-helical bundle in DnaA (grey) and the activated WHs in RepA and Orc4 (orange), together with the C-terminal WH in RepA (pink) or the DBDs in other ORC subunits.

2.2 Plasmid Rep OBPs as minimal initiators consisting of origin binding WH modules

A number of plasmid replicons have been studied in their functional details by means of genetic and biochemical approaches [20, 21]. Thus the P1 [79], R6K [80], RK2 [81] and pSC101 [82] replicons, among others, have contributed greatly to our current view of plasmid replication in Gram-negative bacteria. This section focuses on RepA (26.6 kDa), the initiator protein of pPS10, a plasmid isolated from the phytopathogen Pseudomonas savastanoi [83], and RepE, the initiator of E. coli mini-F plasmid [84], since both proteins are the best characterized in structural terms. There are clear sequence similarities between pPS10 RepA, mini-F RepE and other plasmid Rep proteins [20, 21]. Thus both OBPs are valid general model systems to study the structure and function of plasmid initiators and their relation with chromosomal DNA replication.

Rep proteins usually bind to directly repeated sequences (iterons) found at their respective replication origins (oriV) to establish the initiation complex [84–86] (Fig. 2B). In addition, protein–protein interactions between Rep molecules bound to two iteron tracks, located far apart in either the same or different DNA molecules, have been proposed to be a means of negative control of initiation (termed ‘handcuffing’) by pairing origins together [87–89]. Some Rep proteins also bind to an inversely repeated sequence (operator) that overlaps with the promoter of the rep genes, thus acting as self-repressors [84, 90]. An HTH motif at the protein C-terminus is the main determinant of Rep binding to both operator and iteron DNA sequences [91, 92]. The most abundant Rep protein species in solution are dimers, which bind to the operator, whereas Rep monomers bind to the iterons [84–86, 90]. In pPS10 RepA, mutations in an LZ sequence motif found at its N-terminus (residues 12–33) [93] enhance dimer dissociation [85]. It was predicted [86] that RepA consists of two WH domains (residues 1–132 and 133–230, respectively), a proposal then confirmed by the crystal structure of RepE54 (Fig. 1B), a mutant monomer of the related mini-F OBP, bound to iteron DNA [94].

WH domains are a large family of DBDs, found in proteins of both prokaryotic and eukaryotic sources, whose first members to be identified, on the basis of their crystal structures, were the ‘Fork-Head’ transcription factor HNF-3γ, the globular core of the linker histone H5, CAP and LexA (reviewed in [95]). WH domains are composed of a bundle of three α-helices, plus an extra three-stranded antiparallel β-sheet [95]. Members of the family link their secondary structure elements into a WH fold through rather diverse topologies, but the canonical order is: α1, β1, α2, α3, β2, β3, plus two loops (‘the wings’) interconnecting the last three elements [95]. α-Helices 2 and 3 constitute an HTH DNA recognition module (with the interhelical angle ranging 100–150°), with α3 binding to the major groove in DNA whereas the loop linking β2 and β3 binds to the minor groove [95]. However, in a recently described WH member, RFX1, it is the loop between α3 and β2 that contacts the DNA major groove, instead of α3, which follows the minor groove [96]. In addition, charged and hydrophobic residues exposed in both the helical bundle and the β-sheet have been implicated in dimerization [97] and in interactions with other proteins [98].

It has been shown that dissociation of pPS10 RepA dimers into monomers results in a structural change from a compact arrangement of the two WH domains into a more elongated form GGACAGGG) through the major groove, whereas the N-terminal domains (WH1) form the dimerization interface GGACAGATTCA), recognizing the same sequence found in the operator (both underlined), while WH1 changes its structure and becomes able to contact the 5′-iteron end, through both the phosphodiester backbone and the minor groove [86]. The idea that Rep proteins should have two domains was independently inferred by means of sequence logo analysis of the informational content of the iteron repeats in a number of plasmids, concluding that iterons are composed of two conserved halves [79]. In the RepE54 monomer structure (Fig. 1B), the conserved N-terminal leucines are clustered along two α-helices, buried in the hydrophobic core of WH1, that resemble a folded jack-knife. The shorter helix (α1) includes the first Leu residue (Leu12 in RepA) and the longer (α2) the third and fourth (Leu26 and Leu33), whereas the second (Leu19) is found in the intervening turn [94]. Both WH domains in RepE54 are related by a pseudo-two-fold symmetry (Fig. 1B) [94]. Apart from these facts, the predictions made for pPS10 RepA [86] proved to be correct: the polarity of both WH domains when bound to iteron DNA, the recognition by WH1 of the phosphodiester backbone (rather than contacting bases in the major groove) and the requirement of a conformational change in WH1 upon dimerization [86]. The latter was confirmed by docking studies attempting to model RepE54 dimers based on the crystal structure of the monomers, which concluded that severe steric clashes would occur if WH1 did not change its structure in the dimers [94].

Recently, the detailed biophysical characterization of RepA-2L2A, a mostly monomeric species of RepA in which the first two conserved Leu residues were changed to Ala [99], suggested that this mutant resembles a transient folding intermediate on the way from dimers to active monomers. The mutated α1 is unable to fold back into the core of WH1, where the key hydrophobic interactions between the original leucines and Trp94 are disrupted, thus allowing the α-helix to move freely, resulting in an extended conformation of the proposed jack-knife [99]. Whether the N-terminal leucine residues contribute directly to the Rep dimerization interface or favour protein association indirectly (e.g., through the stabilization of the dimeric conformation) remains to be determined. In vitro, micromolar amounts of a single iteron DNA sequence actively induce in RepA both the dissociation of dimers into monomers and the predicted (see above) conformational change in the WH1 domain, consisting of a significant increase of the overall β-sheet component at the expense of the α-helical one [99]. In contrast, binding of RepA dimers to the operator sequence neither dissociates them nor changes their conformation [99]. The ligand-induced monomerization of RepA dimers, with a coupled conformational change, would thus be a case for the allosteric effect of a DNA substrate on the structure of its protein DBD [100–102].

By specific binding to the iteron sequence repeats at oriV, Rep OBPs establish homo-oligomeric nucleoprotein complexes (Fig. 2B) similar to those formed by DnaA at oriC [63–68] (Fig. 2A). However, in most plasmid replicons, these Rep–oriV complexes are insufficient to trigger DNA replication by themselves. Thus DNA supercoiling and other protein factors borrowed from the host (DnaA and the pseudo-histones HU/IHF) are required to melt the AT-rich repeats adjacent to iterons [103–106]. A feature of Rep initiators is that, unlike DnaA (see above), they do not bind ATP. Although Rep proteins can promote some structural transitions in DNA [107], most plasmids still require DnaA to aid in origin unwinding [103–106] and DnaB helicase loading [44, 108]. This would explain the ubiquitous presence of one or more 9-bp DnaA boxes in most plasmid replicons [20, 21, 81]. The precise role of DnaA in plasmid origin unwinding seems not to be the same as that discussed for oriC melting (see above) since the ADP-bound DnaA species [109], and even a deletion mutant lacking its AAA+ domain [44], are functional in plasmid replication. As noted before, Rep OBPs undergo conformational activation in a different way than DnaA. Besides the allosteric effect of iterons on RepA (see above) [99], molecular chaperones, either the triad DnaK–DnaJ–GrpE or ClpA, have been implicated in dimer dissociation, and coupled conformational activation, of the Rep OBPs (Fig. 2B) in plasmids P1 [110–115] and F [116]. ClpX [117] and DnaK plus ClpB [118] chaperones have a role in the activation of the initiator of RK2 plasmid. In addition, the ClpA hexamer unfolds P1 Rep OBP, threading it into the proteolytic chamber formed by ClpP heptamers [119–121]. Interestingly, ClpA, ClpX and ClpB belong to the AAA+ superfamily [58, 59].

In summary, plasmid Rep OBPs are examples of the viability of a DNA binding module (a duplicated WH domain) as a minimalist DNA replication initiator, provided it is able to undergo origin-specific conformational activation and then to recruit other proteins (HU, DnaA) that help with DNA unwinding and helicase loading.

3 The structure and function of OBPs in Archaea and Eukarya

The study of DNA replication initiation in eukaryotic replicons initially followed a path slower than in bacteria Autonomous Replication Sequences (ARS) (see Origin Recognition Complex (ORC) [128, 129]. ORC was first characterized in S. cerevisiae and then in every other eukaryote examined (reviewed in [130]), soon followed by other components of the machinery responsible for DNA replication initiation. More recently, studies on DNA replication in Archaea, ‘the third Domain of Life’ [131], have been growing at an impressive pace, showing that these prokaryotic organisms provide simplified, yet more robust, model systems to get insights into the structure and function of the eukaryotic replication machinery. Several excellent reviews can be found in the recent literature, discussing the mechanisms and regulatory circuits of DNA replication, both in Eukarya [31, 126, 127] and Archaea [132, 133]. Therefore this section focuses on the structure and function of their OBPs.

3.1 Eukaryal ORC and Cdc6 as combined AAA+/WH machines in origin binding and remodelling

ORC initiator is composed of six protein subunits, labelled Orc1-6 (in S. cerevisiae: 120, 72, 62, 56, 53, 50 kDa, according to SDS–PAGE), that specifically bind to replicators in an ATP-dependent way (Fig. 2C). In S. cerevisiae (Sc) all six ORC subunits are essential for viability and remain associated with ARS throughout the cell cycle [125, 135, 136]. Yeast ORC seems to include a single copy of each subunit [130], but subcomplexes defective in Orc1 [137] and Orc6 [138] have been isolated in higher eukaryotes, together with others including extra copies of some subunits [139], and have been proposed to control DNA replication and cell cycle progression. Thus those two subunits would be associated in a looser way with an Orc2-5 core [140, 141]. ARS are composed of two kinds of conserved elements: A (also termed ACS, ARS Consensus Sequence, 5′-A/TTTTAT/CA/GTTTA/T) and B1–B3 [123, 126]. ORC binds to the A and B1 elements [142, 143] where, according to crosslinking experiments, all its subunits except Orc3 and Orc6 contact the DNA double helix in two patches separated by about 50 bp [144]. The role of the B2 sequence is under discussion, being either the place where DNA is unwound or the binding site for other components of the pre-replicative complex, such as Cdc6 and Mcm2-7 [145]. B3 is the binding site for Abf1, a transcription factor enhancing DNA replication [146]. ORC interacts preferentially with one of the strands in double-stranded ARS [144]. ATP binds to the Orc1 and Orc5 subunits (in higher eukaryotes, also to Orc4), but only the site in Orc1 presents some hydrolytic activity on the nucleotide and it is essential for ORC function [134]. Single-stranded DNA, generated after ARS melting, stimulates ATP hydrolysis and induces in ORC a change from an elongated to a curved shape [147], somehow resembling the conformational changes described in the bacterial OBPs DnaA [70] and Rep [99] (see above). Moreover, it has been recently found that, as in bacterial DnaA [49–51], acidic phospholipids also bind to ORC, interfering with ARS binding, most probably through the release of ATP [148]. ORC positions nucleosomes around ARS in a way suitable for initiation [149] and contributes to repress the yeast mating type loci HML/HMR-E [150]. In the replication of the Epstein-Barr virus, ORC cooperates with the viral initiator (EBNA1) in assembling the components of the replication fork at the viral origin (oriP) [151, 152].
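
Because the ACS quoted above is a degenerate consensus, candidate sites are more naturally scored than matched exactly. The Python sketch below counts, for every 11-bp window, how many positions agree with the consensus, here transcribed into IUPAC codes as WTTTAYRTTTW (W = A/T, Y = C/T, R = A/G). The threshold of 10 matches out of 11, the function names and the toy sequence are assumptions made for illustration only; they are not values from the text, and only the given strand is scanned.

```python
# IUPAC codes needed for the ACS as transcribed from the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "Y": "CT", "R": "AG"}
ACS = "WTTTAYRTTTW"  # 5'-(A/T)TTTA(T/C)(A/G)TTT(A/T)


def score_window(window, consensus=ACS):
    """Count positions of the window that are allowed by the consensus."""
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def scan_acs(seq, min_score=10):
    """Return (start, site, score) for windows scoring at least min_score.

    start is 1-based; min_score is an illustrative threshold (assumption).
    """
    seq = seq.upper()
    k = len(ACS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        score = score_window(window)
        if score >= min_score:
            hits.append((i + 1, window, score))
    return hits


# Hypothetical sequence (NOT a real ARS) with one perfect match embedded.
toy = "GCGCATTTATATTTAGCGC"
print(scan_acs(toy))
```

Lowering `min_score` reports near-matches as well, which is the reason for scoring windows instead of using an exact pattern.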

The regulation of the assembly and activation of replicative complexes across the S. cerevisiae cell cycle is extremely elaborate, but rests on the initial ORC–ARS complex, a landing pad for a number of essential replication factors Cyclin-Dependent Kinases (CDKs) activity Anaphase-Promoting Complex/Cyclosome (APC/C) [155] and to the presence of the Cdc28 inhibitor Sic1 [156], pre-replicative complexes assemble at ARS, containing ORC and the newly synthesized co-initiator Cdc6 [157]. The ATP-bound form of Cdc6 is required for its association with ORC [158], enhancing the specificity and affinity of the latter for ARS [159]. The complex formed by Cdt1 [160] and Mcm2-7 [161] then enters into the nucleus (‘licensing’) and binds to ARS [162], interacting with ORC and Cdc6 [163, 164]. During late G1, Cdc28-Cln1/2 CDKs phosphorylate Sic1, which is then poly-ubiquitinated by SCFCdc4, to promote its degradation in the 26S proteasome, and the APC/C becomes inhibited [165–167]. Thus, at the G1-S transition, origins are fired when Cdc28-Clb5/6 phosphorylate Cdc6 at its N-terminus, now becoming tagged for quick proteolysis [168], and promote Cdt1 export out of the nucleus [162]. These two processes, together with phosphorylation by CDKs of the Orc2 and Orc6 subunits and MCMs [154], co-operate to prevent re-initiation until the next cell cycle. The crucial step for the transition towards DNA synthesis is phosphorylation of Mcm2 by the Cdc7-Dbf4 kinase (DDK), resulting in a conformational change in Mcm2-7 [169]. Cdc45 then becomes associated with Mcm10 [170] and Mcm2-7 [171]. The latter, detached from the pre-replicative complex by Cdc45 [136], becomes an active hetero-hexameric helicase ring, functionally similar to bacterial DnaB (see above). Mcm4, 6 and 7 would work as catalytic subunits, whereas Mcm2, 3 and 5 would have a regulatory function [172–174], resembling the cycle of ATP synthesis in F1-ATPase [175]. Cdc45 also contributes to load the single-stranded DNA binding protein RPA, the PCNA clamp and the DNApol-α that, with the help of its primase activity, starts DNA synthesis (S-phase) [176]. It has been recently found that DNA replication and cell proliferation (ribosome assembly and synthesis of proteins) may be linked through the interaction between ORC and a nucleolar protein, Yph1 [177].

Regarding the structure of the ORC subunits, it has been found that Orc1 and Orc5 from different eukaryal organisms share sequence similarity with Orc4 and altogether with eukaryotic and archaeal Cdc6 [178] (see below). Although the three-dimensional structure of an entire ORC subunit is still lacking, two independent studies have recently addressed this issue. On one hand, the crystal structure of Cdc6, an Orc1,4,5 orthologue, from the crenarchaeon Pyrobaculum aerophilum (PaCdc6, 45 kDa) [60] shows that these OBPs are composed of an N-terminal AAA+ domain, as previously proposed [158], linked to an unexpected C-terminal WH domain (Fig. 1C). This study also included the mutational analysis of both domains in the homologous Cdc6 from Schizosaccharomyces pombe (SpCdc18), followed soon by a similar report on ScCdc6 [179]. However, whether the WH domain has a role in DNA binding and/or in protein–protein interactions [95, 97, 98] remains to be determined, although it has been shown that it modulates autophosphorylation in archaeal and eukaryal Cdc6 [180]. The AAA+ domains in bacterial DnaA and eukaryotic/archaeal Cdc6 have nearly identical folds (Fig. 3A). Some ORC subunits include characteristic extra domains. Thus, SpOrc4 bears at its N-terminus nine repeats of an AT-hook motif for DNA binding through the DNA minor groove [181, 182], shown to be sufficient for specific origin recognition by ORC [183, 184]. A portion of the N-terminus of Orc1 shares similarities with Sir3, and its three-dimensional structure has recently been shown to be composed of Bromo-Adjacent Homology plus α-helical subdomains [185]. The latter is sufficient for binding to Sir1, being thus responsible for the role of ORC in chromatin silencing [150, 185]. It is noteworthy that the Mcm proteins, as well as the subunits of the DNA polymerase clamp loading δ′:γ3:δ complex [186], are also members of the AAA+ superfamily [58, 59], which thus becomes the most versatile domain among those involved in remodelling nucleoprotein complexes for DNA replication.

Figure 3

Stereo views of the peptide backbones of similar OBP domains superposed by least squares. A: The AAA+ domains in AeDnaA [56] (Fig. 1A) and PaCdc6 [60] (Fig. 1C). Rmsd among 108 Cα atoms: 1.58 Å. B: The WHs in RepE54 [94] (Fig. 1B) and PaCdc6 [60] (Fig. 1C). Rmsd (31 Cα atoms): 1.66 Å. The figure stresses the nearly identical three-dimensional folds and topologies of the compared OBP domains, in spite of the extreme phylogenetic distances separating Bacteria and Archaea/Eukarya. Models were built using Swiss-Pdb Viewer.

An independent biophysical and biochemical study on the ScOrc4 subunit arrived at the same proposal made for the structure of its orthologue PaCdc6 [60] (see above), correctly assigning a WH fold to its C-terminal domain [187]. In addition, the same study underlined functional similarities between the WHs of ScOrc4 and the Rep OBPs of Gram-negative bacterial plasmids (see above), such as a role in protein–protein interactions [187]. In more detail, ScOrc4 was found to consist of two domains (residues 1–365, including the AAA+, and 366–529). The C-terminal one, although sharing low sequence similarity (19% identity) with the N-terminal domain of pPS10 RepA OBP, was proposed to fold into a similar WH, based on the conservation in ScOrc4 of key residues in the hydrophobic core of RepA/RepE54 WH1 [94] (Fig. 3B). These structural similarities between eukaryotic and plasmid initiators were not pointed out in the crystallographic study on PaCdc6 [60].

The AAA+ N-terminal domain of ScOrc4 shows a functional feature of the RepA-type prokaryotic initiators [110, 111, 113, 114, 116, 118] (Fig. 2B,C): binding to chaperones of the Hsp70 family [187]. This kind of interaction has also been reported for the initiators E1 (human papilloma virus) [188, 189] and UL9 (herpes simplex virus) [190] (Table 1). The yeast Hsp70 chaperone modulates the association state of the ScOrc4 subunit by dissociating oligomers into dimers [187]. Thus it could be a step towards the assembly of the complete multisubunit ORC (Fig. 2C). The conformation of some ORC subunits has been proposed to be different when the complex is assembled free or associated with ARS DNA, a process in which ATP could be the allosteric effector [147]. By analogy with the action of its prokaryotic homologue DnaK on Rep proteins [115, 86], Hsp70 could also control the compactness between the N- and C-terminal domains of ScOrc4, thus switching between different functional states of this subunit in the replication complex. It is noteworthy that in AeDnaA, RepE54 and PaCdc6 OBPs the first long α-helix in their DBDs (α14, α2 and α16, respectively) is preceded by a shorter one (namely, α13, α1 and α15) (Fig. 1A) [56]. In pPS10 RepA (see above, on the jack-knife mechanism) [99], as well as in the loosely related Ets-1 WH [191], it has been proposed that changing the relative orientation of those α-helices is a key step in the conformational activation of their DBDs.

3.2 Archaeal Orc1/Cdc6 OBPs are simplified versions of eukaryal AAA+/WH

Well before the crystal structure of PaCdc6 (see above) made explicit the similarities between archaeal and eukaryal OBPs [60], they had been proposed based on sequence comparison analyses [158]. A number of searches for ORC/Cdc6 homologues in the genomes annotated so far have shown that most Archaea have a single ORC1/CDC6 gene, although some, such as Sulfolobus solfataricus, have up to three and Methanococcus jannaschii seems to have none [132, 133] (Fig. 4). Thus their encoded proteins might exert all the functions described for eukaryal ORC and Cdc6 OBPs (see above). A unique replication origin, oriC-like, was identified in the single circular chromosome of the euryarchaeon Pyrococcus abysii (Pya) after cumulative analysis of the G+C skew in each strand, a signature for prokaryotic replication origins, and verified to be the earliest replicated region [192]. oriC is bound by the Orc1/Cdc6 initiator in vivo [193]. Nevertheless, the possibility that some other Archaea (M. jannaschii among them) could have more than one replication origin cannot be excluded yet [194]. As for their bacterial chromosomal and plasmid counterparts, the genes for ORC1/CDC6 OBPs are usually clustered around the putative replication origin, together with those for the two subunits of the euryarchaeal DNA polymerase D [132, 133, 192]. However, this is not the case in Archaeoglobus fulgidus [194]. It is noteworthy that all the archaeal genomes analyzed so far, again with the exception of M. jannaschii, also contain a single MCM gene [10]. It has been shown that the Mcm protein of Methanobacterium thermoautotrophicum (Mt) assembles into a double hexameric ring with helicase activity [195, 196], each ring resembling the functional form of the hetero-hexameric eukaryal Mcm2-7 complex [174] (see above).
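The cumulative G+C skew analysis mentioned above can be illustrated with a minimal sketch. It assumes the usual definition of skew, (G − C)/(G + C), evaluated in consecutive non-overlapping windows and summed along the sequence; the function name, the window size and the toy sequence are hypothetical, and the code does not reproduce the actual procedure of [192]. In such plots, an extreme of the cumulative curve is commonly taken as a candidate position for the replication origin.

```python
def cumulative_gc_skew(sequence, window=4):
    """Return the cumulative (G - C) / (G + C) skew, one value per window."""
    sequence = sequence.upper()
    cumulative = []
    total = 0.0
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows containing neither G nor C contribute no skew.
        if g + c:
            total += (g - c) / (g + c)
        cumulative.append(total)
    return cumulative


# Toy sequence (hypothetical): a C-rich half followed by a G-rich half.
toy = "CCCA" * 3 + "GGGA" * 3
curve = cumulative_gc_skew(toy, window=4)
minimum_window = curve.index(min(curve))
print(curve, minimum_window)
```

In this toy case the curve falls along the C-rich half and rises along the G-rich half, so its minimum marks the switch in strand composition. Real genomes would require much larger windows and a circular treatment of the chromosome.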

Figure 4

Phylogenetic relationships between OBPs, based on their conserved AAA+ domains. A: 82 sequences of OBPs from Bacteria (DnaA), Eukarya (Orc1,4,5 and Cdc6) and Archaea (Orc1/Cdc6) were retrieved from the NCBI web site (http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Protein; updated in May 2002). Multiple alignments were performed separately on the DnaA, ORCs, Cdc6 and archaeal sequence datasets with CLUSTAL-X (BLOSUM series protein weight matrix; gap penalties: 10.0 for opening and 1.0 for extension) [202]. Then profile alignments were performed between those pre-aligned clusters (penalties: 20.0 opening, 1.0 extension) prior to a final round of multiple alignment (penalties: 10.0 opening, 1.0 extension) with all sequences, followed by minor manual adjustments. Outlined sequence names correspond to those OBPs extensively discussed in this review. Boxes are color-coded according to the chemical nature of conserved (in at least 20% of sequences) amino acid residues: yellow (hydrophobic and aromatic: A, V, L, I, M, C, F, Y, W); green (polar: S, T, N, Q, H, A, C); red (acidic: D, E, plus similarly shaped N, Q); blue (basic: K, R, plus N, Q, H); pink (P); orange (G). For simplified display, three regions showing large insertions in a small subset of sequences have been removed (their position and extension are indicated between brackets on the EcDnaA sequence). The secondary structure elements (α-helices and β-strands) found in PaCdc6 crystal structure [60] are shown over the sequence alignment. The two subdomains in the AAA+ fold (Fig. 1) are contoured with a dashed black line, whereas relevant sequence motifs (Walker-A/B and sensor-1/2) are in grey boxes. Histogram below the alignment reflects the degree of residue conservation for each position. OBPs sequences and accession numbers: DnaA (Ec, Escherichia coli P03004; Tht, Thermus thermophilus Q9X9D5; Cht, Chlamydia trachomatis O84252; Al, Acholeplasma laidlawii Q9KHU8; Ae, A. 
aeolicus O66659; Bs, Bacillus subtilis P05648; Bob, Borrelia burgdorferi P33768; Bua, Buchnera aphidicola P29434; Cj, Campylobacter jejuni Q9PJB0; Cac, Caulobacter crescentus P35887; Der, Deinococcus radiodurans Q9RYE7; Hi, Haemophilus influenzae P43742; Hp, Helicobacter pylori Q9ZJ96; Ll, Lactococcus lactis Q9CJJ2; Ml, Micrococcus luteus P21173; Myt, Mycobacterium tuberculosis P49993; Myg, Mycoplasma genitalium P35888; Nm, Neisseria meningitidis Q9JW45; Pm, Pasteurella multocida Q9CLQ4; Prm, Prochlorococcus marinus Q51896; Pmi, Proteus mirabilis P22837; Psa, Pseudomonas aeruginosa Q9I7C5; Rm, Rhizobium meliloti P35890; Rp, Rickettsia prowazekii Q59758; Spc, Spiroplasma citri P34028; Sta, Staphylococcus aureus P49994; Spn, Streptococcus pneumoniae O08397; Sco, Streptomyces coelicolor P27902; Sy, Synechocystis sp. P49995; Tm, Thermotoga maritima P46798; Trp, Treponema pallidum O83047; Up, Ureaplasma parvum Q9PRE2; Vc, Vibrio cholerae Q9KVX6; Xf, Xylella fastidiosa Q9PHE3; Zym, Zymomonas mobilis Q9S493. Orc1 (Sc, Saccharomyces cerevisiae P54784; Ca, Candida albicans O74270; Sp, Schizosaccharomyces pombe P54789; Ce, Caenorhabditis elegans Y39A1A.12; Dm, Drosophila melanogaster O16810; Mm, Mus musculus Q9Z1N2; Hs, Homo sapiens Q13415). Orc4 (Sc, P54791; Sp, Q9Y794; Dm, NP_477320; Xl, Xenopus laevis O93479; Mm, O88708; Hs, O43929). Orc5 (Sc, P50874; Sp, O43114; Dm, Q24169; Mm, Q9WUV0; Hs, O43913; Zm, Zea mays AAL91670). Cdc6 (Sc, P09119; Ca, T46606; Sp, P41411; Stp, Strongylocentrotus purpuratus AAL37208; Xl, T46977; Mm, XP_122302; Hs, AAH25232; At, NP_565686; Os, Oryza sativa BAC03316). 
Archaeal Orc1/Cdc6 (Pa, Pyrobaculum aerophilum AAL62992; Ap-1, Aeropyrum pernix APE0152; Ap-2, APE0475; Ss-1, Sulfolobus solfataricus NP_341806; Ss-2, NP_342278; Ss-3, NP_343568; Af-1, Archaeoglobus fulgidus AF0244; Af-2, AF0695; Mt-1, Methanobacterium thermoautotrophicum AAB85889; Mt-2, AAB86072; pFZ1, Mt plasmid X68367; Mta-1, Methanosarcina acetivorans AAM03539; Mta-2, AAM03455; Ta, Thermoplasma acidophilum CAC11593; Tv, Thermoplasma volcanium BAB60645; Pyh, Pyrococcus horikoshii PH0124; Pya, Pyrococcus abysii PAB2265; Hh, Halobacterium sp. plasmid NRC-1 NC002607; Fea, Ferroplasma acidarmanus NC_002709). B: An unrooted neighbor-joining phylogenetic tree calculated by means of CLUSTAL-X [202] on the sequence alignment shown in (A). Numbers in nodes are the reliability values obtained after 100 bootstrapping trials. Tree was plotted with TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html). Bacterial sequences are displayed in purple, whereas archaeal ones are in green and eukaryal in red.

Overall, it seems that the archaeal replication machinery is a simplified version of that functional in eukaryotes, since homo-oligomers of a single initiator protein in Archaea manage to perform the job of the more complex hetero-oligomeric assemblies found in Eukarya, but probably working on a prokaryotic type replicator. The advantage of having multiple different subunits in eukaryotic OBPs might be that these would thus be more suitable for fine, cell cycle-coupled, post-translational regulation. As pioneered by the studies on PaCdc6 [60], PyaoriC [192, 193] and MtMcm [195, 196], archaeal replicons are amenable model systems for biochemical and structural analyses, leaving aside the standing difficulties for genetic manipulation of Archaea.

4 Phylogeny of OBPs domains: Outlining the ancestral initiator

Sequence comparisons of the proteins involved in the processes relevant for the transmission of genetic information in the three ‘Life Domains’ [131] have arrived at a paradox. Namely, similarities can be universally recognized between the components of the transcriptional and translational machines, but the proteins involved in DNA replication cluster in two clearly different groups: those from Archaea and Eukarya share clear similarities, whereas their functional counterparts in Bacteria seem unrelated [8–10]. This applies especially to OBPs, but also to helicases, single-stranded DNA binding proteins, primases, DNA polymerases and their accessory factors [132, 133]. This observation has laid the basis for the recent proposal that DNA replication was ‘invented’ twice independently, a scenario compatible with LUCA having a mixed RNA–DNA genome whose replication would be carried out by an RNA polymerase and a reverse transcriptase [9]. However, this view runs against the common wisdom derived from decades of studies in Molecular Biology, which concludes that the essential processes share common molecular bases in all organisms (as reflected in an old biochemical adage: ‘What is true for E. coli is true for elephants’) [197]. Among all the replication proteins just mentioned, OBP phylogenies are particularly significant since initiators are likely the answer to the early need, inherent to the development of a cellular organization, for a controlled replication of DNA. This is required to synchronize duplication of the genetic material with the rest of cellular processes, by means of regulating initiator expression and its biochemical activities. Control was likely not a problem for self-replicative nucleic acids in the pre-existent ‘RNA world’ [1].

With the advent of ‘the Genomic Era’, with hundreds of genomes already completely sequenced and millions of new entries in protein databases (just check it ‘on line’ at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome), we are beginning to exceed our ability to give most of them functionally significant names [198]. It is time to wonder if our current tools to annotate genomes and to identify functions are sufficient by themselves to handle such a vast amount of data in a meaningful way and to drive Biology successfully through the 21st century. Fortunately, besides spectacular recent improvements in bioinformatic tools, experimental Science is, as always, coming to the rescue by means of three main approaches. (i) Proteomics, aiming at the complete description of the patterns of expression and interaction for all proteins in any cell type of an organism [199]. (ii) Structural Genomics, which provides the hope that solving the three-dimensional structures of ‘unknown protein entries’ in a few model genomes will give a hint about their function, due to the intimate correlation between the structure and function of biological macromolecules [200]. (iii) The classic ‘curiosity driven’ research in academic groups, which will continue supplying precious knowledge and human expertise on concrete biological problems, now in danger of being left aside by the other two, more fashionable, ‘discovery driven’ enterprises. In this sense experimental research on micro-organisms, in addition to allowing the possibility of sampling biodiversity through their genomes [201], has recently provided clues towards unravelling the phylogenetic origin of DNA replication [8–10], and thus about one of the essentials for defining a living cell.

4.1 The three-dimensional structures of a few replication proteins unveil missed sequence similarities

The most common way to establish phylogenetic relationships between proteins is to compare their sequences by means of multiple alignments, in which identical residues and conservative substitutions, as well as insertions and gaps, are taken into account. The scores thus obtained are transformed into pairwise distances among the compared sequences, which rank them in a way that can be represented as a schematic tree [202]. Although such an approach is invaluable in most cases, it has limited value if the sequences of two truly homologous proteins show a low degree of identity (say <20%), sometimes even beyond recognition (≤9%). This can be the consequence of (at least) one of the proteins having evolved at a fast rate after their divergence [203]. In these cases, the exclusive use of sequence alignments as phylogenetic criteria has the potential pitfall of concluding that two given proteins are unrelated, in spite of experimental evidence that they have identical function, and even the same three-dimensional structure. Structures have the advantage of being more conserved through evolution than sequences. With the current exponential growth in the number of entries in the database of protein atomic coordinates (PDB, http://www.rcsb.org/pdb) [200], the latter possibility is no longer exceptional. It is still open to debate if we are approaching the complete coverage of protein architectures or if such a catalogue will expand further [204]. Therefore, any attempt to describe the phylogeny of a protein family should take into account the (common) three-dimensional structure of at least one of its members (ideally also those for the most divergent ones). This puts sequence features in a structural framework, bringing the ‘homology’ concept to its physico-chemical ground.
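The step from an alignment to pairwise distances can be sketched as follows. This is a deliberately simplified stand-in, not the scoring scheme of CLUSTAL-X [202]: it assumes an uncorrected ‘p-distance’ (the fraction of differing residues over the columns in which neither sequence has a gap) and ignores conservative substitutions. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped in this simplified measure
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Map each unordered pair of sequence names to its p-distance."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment (not real OBP sequences).
toy_alignment = {
    "seqA": "GKTTLA-V",
    "seqB": "GKSTLAEV",
    "seqC": "GRSSL-EV",
}
distances = distance_matrix(toy_alignment)
for pair, value in distances.items():
    print(pair, round(value, 3))
```

A tree-building method such as neighbor-joining would take a matrix of this kind as input. When identity drops to the levels quoted above, most of the entries approach saturation and the resulting branching order becomes unreliable.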

To outline just a few examples among many recently reported, the processivity subunits of the replicative DNA polymerases (β-clamp in Bacteria and PCNA in Eukarya/Archaea), as well as their loading factors (δ′:γ3:δ complex and Replication Factor C (RFC), respectively), have nearly identical three-dimensional structures, in spite of showing only marginal sequence similarities. Structural similarity is commonly expressed as the root mean square deviation (rmsd, in Å) between a discrete number of overlapped Cα atoms (in the example above, 0.9 Å for 55 residues in both clamps) [186]. Outside the group of proteins involved in DNA replication, the single-stranded DNA binding proteins that pack telomere ends in yeast (Cdc13) and ciliated protozoa (α-subunit) share the same OB-fold (rmsd=2.2 Å for 113 Cα atoms), although sequence similarities were beyond recognition [205]. Also based on structural similarity, it has been found that prokaryotes have two proteins (FtsZ, MreB) related to eukaryotic cytoskeletal components: tubulin (rmsd=2.4 Å for 178 Cα) [206] and actin (rmsd=3.7 Å for 310 Cα) [207], respectively. The same kind of structural homology criterion, applied in this review to the AAA+ domains of DnaA and Orc1,4,5/Cdc6 on one hand and to the WHs in Rep and Orc1,4,5/Cdc6 on the other (see above, Fig. 3), can be used to discriminate cases of functional analogy in non-phylogenetically related proteins. Thus, DNA primases in Bacteria and Archaea/Eukarya (DnaG and Pri, respectively) are unrelated (that is, are a case of convergent evolution) since, leaving aside a cluster of acidic residues similarly arranged to position two catalytic Zn2+/Mg2+ ions, they have completely different three-dimensional folds [208]. The power of comparing protein architectures is evident in more subtle examples, such as the WH domains of plasmid Rep [94] and P4 phage gpα [98] OBPs. In this case the same overall fold is found, albeit with two distinct topologies (connectivities) of their α-helical and β-strand elements, pointing to another case of convergent evolution.
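The rmsd values quoted in this section can be illustrated with a minimal sketch. It assumes that the two sets of Cα atoms have already been paired and superposed by least squares (the superposition itself, as performed by programs such as Swiss-Pdb Viewer, is not implemented here); the function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the units of the input) between paired atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical Cα coordinates (in Å), assumed to be already superposed.
domain_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
domain_2 = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
value = rmsd(domain_1, domain_2)
print(f"rmsd = {value:.2f} Å over {len(domain_1)} Cα atoms")
```

Because the value depends on how many atoms are paired, an rmsd is only meaningful when quoted together with the number of Cα atoms compared, as done throughout the text.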

With the concerns just outlined in mind, the analyses of phylogenetic trees for bacterial, archaeal and eukaryal OBPs, based on sequence alignments of either their AAA+ (Fig. 4) or WH (Fig. 5) domains, must be approached with caution. Except for closely related species, long branches are a common feature in both cases (especially for WHs), possibly pointing to quick rates of evolution after divergence from the ancestor of each domain [203]. However, OBPs from the same ‘Life Domain’ cluster together, indicating that divergence has not proceeded so far as to prevent the recognition of a few conserved characteristic residues. These are particularly evident in AAA+ domains, where more key functional residues have been identified (see above) and appear consistently conserved. It is noteworthy that the topology (branching order) of the Orc1 and Cdc6 groups is inverted between the two trees, being contiguous, besides the prokaryotic sequences (DnaA's AAA+ or Rep's WH), either to the archaeal group (AAA+) or to Orc4 (WHs). As further discussed below (Fig. 6), this could reflect Horizontal Gene Transfer (HGT) [209] events affecting both OBP domains throughout the divergent evolution of initiators from their common ancestor, but this hypothesis requires further rigorous testing since deep-rooting branches are not supported by high bootstrap values (Figs. 4B, 5B) [210].
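The bootstrap values referred to above derive from resampling the columns of the alignment. A minimal sketch of that resampling step is given here, under the assumption of the standard non-parametric bootstrap (columns drawn with replacement until the original alignment length is reached); the function name, the seed and the toy alignment are hypothetical, and the tree building that follows each replicate is not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping all rows in register."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must have the same length")
    # The same column indices are applied to every sequence.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][i] for i in columns)
        for name in names
    }


# Hypothetical toy alignment (not real OBP sequences).
toy_alignment = {"seqA": "GKTTLAEV", "seqB": "GKSTLAEV", "seqC": "GRSSLAEV"}
rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(replicates[0])
```

A tree would be computed from each of the 100 replicates, and the value at a node is the number of replicate trees in which the same group of sequences is recovered. Low values at deep nodes therefore mean that the grouping depends on a small subset of alignment columns.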

Figure 5

Phylogenetic relationships between OBPs, based on their conserved WH domains. A: The sequence alignment was generated, as specified in Fig. 4A, with a number of OBP sequences from Gram-negative bacterial plasmids (those sharing clear similarities with pPS10 RepA [20, 21]), archaea and eukaryotes, retrieved from the NCBI web site (see Fig. 4 legend, database updated in August 2002). OBP sequences and accession numbers (not already quoted in Fig. 4 legend): Rep (pPS10, P. savastanoi S20615; pECB2, P. alcaligenes Y10829; pRO1614, P. aeruginosa L30112; pCM1, Chromohalobacter marismortui X86092; pFA3, Neisseria gonorrhoeae M31727; pSC101, E. coli K00828; pminiF, E. coli X00959; pR6K, E. coli M65025; pGSH500, Klebsiella pneumoniae Z11775; pCU1, E. coli M18262; pXF5823, Xylella fastidiosa AF322908; pL6.5, P. fluorescens AJ250853; pTAV3, Paracoccus versutus AF390867). Orc1 (Kl, Kluyveromyces lactis P54788; Nc, Neurospora crassa T50982; Enc, Encephalitozoon cuniculi CAD26247; Cg, Cricetulus griseus AAF66067). Orc4 (Zm, Zea mays AAL10455). For color and graphic display, refer to Fig. 4. Asterisks mark the positions of the conserved Leu and Trp residues discussed as key parts of the hydrophobic core of the WHs. B: An unrooted, neighbor-joining phylogenetic tree calculated by means of CLUSTAL-X [202] on the sequence alignment shown in (A) (see Fig. 4B). Bacterial plasmid Rep sequences are displayed in orange, whereas archaeal Orc1/Cdc6 are in green and eukaryal Orc1,4,5 or Cdc6 are in red.

Figure 6

Some of the different possible scenarios for the evolution of OBPs in Bacteria, Archaea and Eukarya. This is a speculative cartoon that is open to further combinations. According to Woese [11], the LUCA genome is drawn as composed of multiple, single-gene size, DNA molecules. Membranes and cell envelopes are differentially colored to reflect their distinct composition [13]. A: Whereas the AAA+ domain in present day chromosomal initiators (purple in Bacteria and green in Archaea/Eukarya) was already found in the ancestral microorganism (LUCA, in red), these being thus orthologues, the WH domains in archaeal/eukaryal (blue) and plasmid (orange) branches arose independently by convergent evolution. B: The ancestral OBP would be as proposed above, but a precursor of present day Gram-negative bacteria would have transferred horizontally, through a plasmid, a WH domain (orange) to an ancestor of the archaeal/eukaryal lineage, replacing its primordial DNA binding domain to yield a chimeric OBP. Again, the ubiquitous AAA+ domain would diverge in all branches from its precursor found in LUCA. C: OBP functions in LUCA might have been performed by independent polypeptides, including the AAA+ domain and a WH, respectively. They could then have been fused into a single OBP shortly before the divergence of the Archaea/Eukarya domains, with the WH remaining isolated in plasmids until the present. As proposed by Forterre [219], non-orthologous gene displacement would be responsible for the substitution, just before divergence of all bacterial lineages, of the ancestral chromosomal initiator by an analogous gene of viral origin, including the required AAA+ and DNA binding modules (in purple). Panels (B) and (C) are compatible with the phylogenetic trees in Figs. 4B and 5B.

WH and AAA+ domains are ubiquitous as DNA binding and remodelling protein modules, raising the question of their appearance in evolution. In each of the annotated whole genomes available it is common to find tens of proteins having a putative domain with the AAA+ signature. A recent study on the occurrence of individual domains in bacterial, archaeal and eukaryal genomes showed that WHs are the third most common protein modules (only after P-loop NTP-hydrolases and Rossmann folds) [211]. Moreover, besides DNA replication, WH is used as a basic module in proteins related to the other processes universally involved in the transmission of genetic information. Thus, in translation the S4 protein, including a WH of the ETS type, holds together five RNA helices in the small ribosomal subunit (30S) [212]. S4 is a key protein in the early steps of the assembly of the 30S particle, strongly interacting with the DnaK chaperone [213], as described for the WH domains in plasmid OBPs. In transcription WHs are present in a number of general factors of RNA polymerase II, such as the C-terminal domains of Rap54 and Rap30 (TFIIF) and the central domain of TFIIE (reviewed in [214]). The RuvB motor protein, involved in prokaryotic homologous recombination, has the same combination of domains (AAA+ followed by WH) discussed above for Orc1,4,5/Cdc6 [215]. These facts point to the ancestral origin of both domains, most likely traceable back to LUCA [216].

4.2 Possible alternative scenarios for the phylogenetic origin of OBPs

The possibility that the AAA+ domains in DnaA/Orc1,4,5/Cdc6, and the WHs in Rep and Orc1,4,5/Cdc6, OBPs emerged by convergent evolution (Fig. 6A) is very unlikely, given their astonishingly high degree of structural similarity (Fig. 3). Another possible scenario (Fig. 6B) takes advantage of the predominant role of plasmids in HGT among micro-organisms [209]. The gene coding for an ancestral AAA+ initiator in Archaea/Eukarya could have been fused with a WH gene of plasmid origin, transferred from a primordial member of Bacteria, to constitute distinct domains in a chimeric OBP that would then duplicate and diverge into the multisubunit ORC and Cdc6. This hypothesis provides an example of how bacteria could have contributed to building the eukaryal nuclear genome [217] and also implies that DnaA would be closer to the ancestral initiator, since it would conserve the original (non-WH) DBD. In addition, it would constitute a case of HGT affecting ‘informational’ genes, rather than the more common examples involving ‘operational’ (metabolic) ones [218]. Alternatively (Fig. 6C), as Woese proposes, LUCA might have been a community of proto-cells (‘progenotes’), rather than a proper cellular entity with defined components and lineage [11, 15]. Progenotes would interchange their genomes that, on their side, ‘more resembled mobile genetic elements than typical modern chromosomes’ [11]. Ancestral genomes could thus be conceived as consisting of a number of plasmid-like replicons, some of which would contribute a gene coding for a distinct domain (AAA+ or WH) of the ancestral initiator. This would be more a macromolecular assembly than a multidomain OBP. The appearance of the latter by gene fusion would be roughly coincident with the transition of a particular community of proto-cells to the proper cellular status, giving way to the successive establishment of each one of the three ‘Life Domains’ [15]. In such a scenario, non-orthologous gene displacement, as proposed by Forterre [219], might be responsible for the replacement, just before the emergence of the Bacteria domain, of the ancestral chromosomal initiator by an analogous gene of (say) viral origin, including the AAA+ domain and a non-WH DNA binding module. In such a case, archaeal/eukaryal OBPs would resemble the replication initiators found in LUCA more closely than bacterial DnaA. Modern plasmid Rep OBPs, composed solely of WH domains, would thus be a relic of the ancestral modular initiators found in proto-cells.

5 Concluding remarks

The classic integrated view on the processes of transmission of the genetic information has been challenged by modern genomics, since protein sequence comparisons conclude that the set of genes for DNA replication in Bacteria clearly differs from that found in Archaea and Eukarya [8–10]. This divergence is especially noteworthy for the proteins that initiate chromosomal replication in Bacteria (DnaA) [38–40] and Eukarya (the six subunits of the origin recognition complex, ORC) [130] which, in spite of their common function in binding to DNA replicators, lack significant sequence similarity. Nevertheless, DnaA [56] and some ORC subunits share an AAA+ module [58, 59] for ATP binding, which triggers conformational changes when the initiator complexes are assembled with origin DNA [70, 147]. It has been recently found that the proteins that initiate DNA replication of plasmids in Gram-negative bacteria (Rep) and a C-terminal domain in a subunit of yeast ORC (ScOrc4) are structurally related, in terms of protein sequence motifs, overall secondary structure, three-dimensional fold (a WH domain) and association state [187]. Furthermore, Rep OBPs also undergo conformational changes upon binding to plasmid origin sequences [99] but, unlike those in DnaA and ORC, these just affect the N-terminal of the two WHs in Rep and are independent of ATP. Similarities between OBPs, besides other ORC subunits such as Orc1 and Orc5, also extend to Cdc6, which regulates initiation both in Archaea and Eukarya [60]. In functional terms, Hsp70 chaperones might modulate the association state of either Rep or Orc4 in homo- or hetero-oligomeric initiation complexes, respectively [187]. It is noteworthy that interactions with chaperones are becoming increasingly relevant for the proper assembly of a growing number of cell factors into large functional complexes [210, 220]. However, such interactions, in spite of being detected with significant frequency in recent whole-cell proteomic approaches, are still dismissed as ‘contaminants’ [221], a view that should be revised in light of the evidence discussed above. Plasmid RepA-type initiators form homo-oligomers when bound at their replication origins [20]. At the prokaryotic chromosomal origin oriC, the initiator protein DnaA also establishes homo-oligomeric assemblies [65–68]. The eukaryotic ORC is a hetero-oligomer of six different, although structurally related, subunits [130]. The archaeal initiation complex is likely to be a homo-oligomer of a single Orc1/Cdc6 protein [193]. Thus the prokaryotic and eukaryotic OBPs would turn out to be variations of a unique macromolecular assembly, evolved to unwind origin DNA and then to load the factors that constitute the replication fork.

As proposed in this review, finding common molecular traits among living organisms, with the aim either of gene annotation (e.g., identifying function from sequence) [198] or of phylogenetic analyses (such as the attempts to define the nature of LUCA) [11–15], will probably require combining new approaches: among others, improved in silico tools and a further expansion of the current set of experimental model systems, especially towards those micro-organisms that branch deeply in the ‘Tree of Life’ [131]. This would result in a broader coverage of biodiversity, with additional benefits for both basic and applied research. Since the current bioinformatic tools, extensively used in comparative whole genome analyses [201], often fail to recognize consistent relationships between proteins if they are hidden behind low sequence similarity scores [203], future phylogenetics should rely as much on Proteomics (broadly speaking) as it nowadays rests on Genomics.


The author is indebted to Dr. Ramón Díaz-Orejas for over 15 years of friendship, continuous support and shared interest in the molecular mechanisms of DNA replication. This work has been financed by Spanish CICYT (Grant PM99-0096).

