The serine, threonine, and/or tyrosine-specific protein kinases and protein phosphatases of prokaryotic organisms: a family portrait

Liang Shi, Malcolm Potts, Peter J. Kennelly
DOI: http://dx.doi.org/10.1111/j.1574-6976.1998.tb00369.x 229-253 First published online: 1 October 1998


Inspection of the genomes for the bacteria Bacillus subtilis 168, Borrelia burgdorferi B31, Escherichia coli K-12, Haemophilus influenzae KW20, Helicobacter pylori 26695, Mycoplasma genitalium G-37, and Synechocystis sp PCC 6803 and for the archaeons Archaeoglobus fulgidus VC-16 DSM4304, Methanobacterium thermoautotrophicum delta H, and Methanococcus jannaschii DSM2661 revealed that each contains at least one ORF whose predicted product displays sequence features characteristic of eukaryote-like protein-serine/threonine/tyrosine kinases and protein-serine/threonine/tyrosine phosphatases. Orthologs for all four major protein phosphatase families (PPP, PPM, conventional PTP, and low molecular weight PTP) were present in the bacteria surveyed, but not all strains contained all types. The three archaeons surveyed lacked recognizable homologs of the PPM family of eukaryotic protein-serine/threonine phosphatases; and only two prokaryotes were found to contain ORFs for potential protein phosphatases from all four major families. Intriguingly, our searches revealed a potential ancestral link between the catalytic subunits of microbial arsenate reductases and the protein-tyrosine phosphatases; they share similar ligands (arsenate versus phosphate) and features of their catalytic mechanism (formation of arseno- versus phospho-cysteinyl intermediates). It appears that all prokaryotic organisms, at one time, contained the genetic information necessary to construct protein phosphorylation–dephosphorylation networks that target serine, threonine, and/or tyrosine residues on proteins. However, the potential for functional redundancy among the four protein phosphatase families has led many prokaryotic organisms to discard one, two, or three of the four.

  • Protein kinase
  • Protein phosphatase
  • Protein-tyrosine phosphatase
  • Dual-specific phosphatase
  • Bacteria
  • Archaea

1 Introduction

Views on the origin and evolution of protein phosphorylation–dephosphorylation as a key element in cellular mechanisms for the detection, transmission, and integration of intra- and extra-cellular signals have been steadily evolving since the discovery of regulatory protein phosphorylation by Krebs, Fischer, and colleagues in the mid-1950s [1]. Because of its intimate early association with hormone action, protein phosphorylation was viewed initially as the molecular extension of the neuroendocrine system into the interior of cells. The implication of this functional linkage was that regulatory protein phosphorylation arose as a late evolutionary response to the needs of ‘higher’ organisms, i.e. those comprised of multiple, differentiated cells. Not until the early 1980s did it become apparent that both ‘lower’ eukaryotes and bacteria also possessed protein phosphorylation systems (reviewed in [2,3]). However, while the protein phosphorylation events studied initially in eukaryotic microorganisms, such as yeast, bore an obvious resemblance in both form and function to those that took place in mammals, the protein phosphorylation events initially characterized in prokaryotic microorganisms generally did not. The basic unit of the emerging bacterial signal transduction paradigm, the two-component system [2], targeted aspartyl residues for modification rather than the hydroxyl amino acids associated with ‘normal’, i.e. eukaryotic, protein phosphorylation events. The histidine protein kinases of the bacterial two-component system differed radically in both primary sequence and catalytic mechanism (ping-pong versus direct transfer) from the prototype of the single eukaryotic superfamily of protein kinases: the cAMP-dependent protein kinase [4].

In the past few years, a number of laboratories reported the presence of bacteria-like two-component modules in eukaryotic organisms, such as yeast [5,6] and Arabidopsis thaliana [7]. A family of mitochondrial protein-serine/threonine kinases bearing faint, but recognizable homology to histidine protein kinases also was discovered [8]. Similarly, eukaryote-like protein-serine/threonine kinases, protein-serine/threonine phosphatases, protein-tyrosine phosphatases, and dual-specific protein phosphatases were isolated and/or cloned from assorted members of the Bacteria and Archaea (reviewed in [3,9–11]). Many of the earliest eukaryote-like protein kinases and protein phosphatases to be identified, such as the bacteriophage lambda protein-serine/threonine phosphatase [12] and the secreted YopH protein-tyrosine phosphatase [13] and protein kinase [14] of Yersinia pseudotuberculosis, are encoded by mobile extrachromosomal elements, such as viruses or plasmids. This suggested that they were acquired from the eukaryotic hosts of bacterial pathogens rather than inherited from their forbears [13]. Although other, genomically encoded eukaryote-like protein kinases and protein phosphatases have since been identified, the record remains a scattered and fragmentary one. Hence, a comprehensive picture of the signal transduction networks that target the hydroxyl amino acids serine, threonine, and tyrosine in organisms from the archaeal and bacterial phylogenetic domains has yet to emerge.

The recent release of a ‘critical mass’ of complete genome sequence information from members of the Bacteria and Archaea offers the opportunity to address several key questions concerning the evolution of the protein phosphorylation networks in prokaryotic organisms. Do ‘all’ prokaryotes contain ORFs encoding potential eukaryote-like protein kinases and protein phosphatases? While a single dominant superfamily of eukaryotic protein kinases has been identified, four families of eukaryotic protein phosphatases exist. Are ORFs from all four families present in all prokaryotes? Do all organisms that contain a potential protein kinase(s) also possess a potential countervailing protein phosphatase(s), and vice-versa? In this article, we describe the application of genome sequence analysis to address these questions.

2 Approach

The sequences of selected protein kinases, protein phosphatases, and related proteins (Table 1) were obtained electronically from GenBank, EMBL or SWISS-PROT. These include both representative eukaryotic enzymes and recently identified bacterial and archaeal homologs, such as the pkn family of Anabaena PCC7120 [10]. Regions of these sequences [15–20] encompassing conserved signature motifs were used as templates for the initial identification of open reading frames (ORFs) whose predicted polypeptide products displayed similarity to eukaryotic and eukaryote-like protein kinases and protein phosphatases [20,21]. The following databases of bacterial or archaeal genome sequence information were searched: the Archaeoglobus fulgidus Genome Database [22], SubtiList – the Bacillus subtilis Genome Database [23], the Borrelia burgdorferi Genome Database [24], the Escherichia coli Genome Database [25], the Haemophilus influenzae Rd Genome Database [26], the Helicobacter pylori Genome Database [27], the Methanobacterium thermoautotrophicum Genome Database [28], the Methanococcus jannaschii Genome Database [29], the Mycoplasma genitalium Genome Database [30], and CyanoBase – the Genome Database for Synechocystis sp PCC 6803 [31]. SubtiList and CyanoBase are provided by the Institut Pasteur and the Kazusa DNA Research Institute, respectively. The Methanobacterium thermoautotrophicum Genome Database is hosted by the Department of Microbiology, The Ohio State University. The remainder are maintained by The Institute for Genomic Research. In general, each database was searched with multiple templates derived from each family of protein kinases and protein phosphatases (Table 1).

Table 1

The protein kinases, protein phosphatases, and related proteins whose sequences were used for similarity searches in this study

PK | PPP | PPM | PTP
AFSK_STRO (P54741) | ApaH E. coli (P05637) | YBX5 yeast (P38089) | DUS2 (Q05923)
cAPKα (P17612) | ApaH K. aerogenes (P27510) | YBF6 yeast (P34221) | PTME (Q15678)
cSrc (P12931) | ApaH S. typhimurium (U24176) | PP2C yeast (P35182) | BEM-2 (D45413)
AKT (P31748) | PPP F. islandicum (U97022) | KAPP A. thaliana (P46014) | PTM4 (P29074)
c-Abl (P00519) | PPP P. abyssi (Y12396) | PDP bovine (P35816) | MC0821 (U60315)
CDC15 (P27636) | PP1 sds 21 yeast (P32945) | YHN6 yeast (P38797) | MYXTYRP (L31960)
Chk+ (P34208) | PP1-arch1 (U35278) | YCW9 yeast (P25646) | YopH (P08538)
Pk1 (Y10168) | PP1-arch2 (U96772) | PP2C M. pneumoniae (P75525) | IphP (Q05918)
PKN1_MYXXA (P33973) | PP1-cyano2 (U80887) | CYAA C. elegans (Q09564) | YAV4 (P19519)
PKN2_MYXXA (P54736) | PP2A rabbit (P13353) | CYAA N. crassa (Q01631) | SptP (U63293)
PKN5_MYXXA (P54737) | PP2B beta 1 yeast (P16299) | |
PKN6_MYXXA (P54738) | PP Lambda (P03772) | |
PKNA_MYCLE (P54743) | PPT yeast (P53043) | |
PKNB_MYCLE (P54744) | PrpA E. coli (P55798) | |
PKND (U63893) | PrpB E. coli (P55799) | |
  • Abbreviations used include: PK, protein kinase; PPP, PPP-family protein phosphatase; PPM, PPM-family protein phosphatase; PTP, protein tyrosine or dual-specific phosphatase. The accession numbers by which the sequence for each can be accessed are listed in parentheses.

After tentative identification of candidate ORFs using the search algorithms provided by each database (with the exception of The E. coli Genome Database, which was searched through a web-based BLAST server from the National Center for Biotechnology Information), DNA-derived amino acid sequences were aligned with conserved regions of protein kinases or protein phosphatases using LaserGene computer software from DNASTAR (Madison, WI, USA). Where necessary, alignments were adjusted manually to maximize apparent resemblance in the areas of signature motifs conserved among known protein kinases or protein phosphatases. Visual inspection then was performed to eliminate those ORFs that lacked the minimum complement of conserved sequence features considered necessary to constitute a plausibly functional enzyme. The specific criteria employed are discussed in the sections below that are devoted to individual enzyme families. ORFs encoding proteins whose catalytic function previously had been determined to be something other than protein phosphorylation or protein dephosphorylation, e.g. diadenosine tetraphosphatase [19] or arsenate reductase (see Section 7, below), also were eliminated from the list of potential ORFs for protein kinases and protein phosphatases.

3 Protein kinases

The protein kinase (PK) superfamily in eukaryotes includes protein-serine/threonine kinases, such as casein kinase II and the cAMP-dependent protein kinase, protein-tyrosine kinases, such as pp60src and the insulin receptor kinase, and dual-specific protein kinases, such as MAP kinase kinase and p34cdc2 that target both the aliphatic and aromatic members of the hydroxyl amino acids [18]. It has been estimated that a few percent of the ORFs in a typical eukaryote encode potential PKs [32], implying that a mammalian cell may contain as many as 2000 in total [33]. A substantial fraction of these ORFs are thought to express functional protein products in vivo, providing key elements of multicomponent regulatory networks capable of sophisticated feats of signal integration and the comprehensive management of cellular functions [34,35]. While other types of PKs recently were discovered in eukaryotic cells, including ‘bacterial’ histidine kinases and their derivatives, our current state of knowledge indicates that the latter play quantitatively minor and highly specialized roles within the protein phosphorylation networks of the Eucarya.

Comparison of the primary sequences of over 400 known and potential PKs from eukaryotes indicates that the core catalytic domain is approximately 280 amino acids in length [33]. This domain folds into a bilobal unit, the N-terminal portion of which binds the nucleotide triphosphate that serves as phosphoryl donor and the C-terminal portion of which binds the peptide or protein substrate [36,37]. The catalytic domain contains twelve conserved subdomains, numbered I–V, VIa, VIb, and VII–XI, that range in size from a single amino acid to stretches of 10 or, in one instance, 20 residues (Fig. 1, section C). The degree of absolute conservation is relatively small, however, with only 12 residues known to be conserved with near 100% frequency among eukaryotic PKs [4,33] (Fig. 2).

Figure 1

Alignment of DNA-derived amino acid sequences from ORFs of known or potential eukaryote-like protein kinases from prokaryotic organisms. Alignments are restricted to those areas encompassing the 12 conserved subdomains characteristic of eukaryotic PKs [18], which are designated by roman numerals above the sequences. The number of amino acid residues encompassing the regions beyond and between subdomains is also given in parentheses. Dashes indicate gaps introduced during the alignment process. Highly conserved amino acid residues are shown in bold. The aligned sequences are grouped into the following three categories: (A) ORFs identified through searches of completed archaeal and bacterial genomes; (B) representative PKs from other prokaryotes; and (C) representative PKs from eukaryotes. ORFs in section A are listed alphabetically by designators used in Table 2. An asterisk (*) designates those bacterial and archaeal enzymes for which phosphotransferase activity toward itself (autophosphorylation) or an exogenous protein has been demonstrated. Abbreviations used along with their accession numbers (in parentheses), annotations, and, where applicable, relevant literature references include: (A) AF0665 (AE001059), putative O-sialoglycoprotein endopeptidase from A. fulgidus; AF1804 (AE000978) and AF2426 (AE001107), conserved hypothetical proteins from A. fulgidus; BB0648 (AE001166), putative protein serine/threonine kinase from B. burgdorferi; HI0113 (P44523), heme utilization protein from H. influenzae; HI1537 (B64128), lic-1 operon protein (licA) homolog from H. influenzae; HP0432 (AE000559), protein kinase C-like protein from H. pylori; MG109 (P47355), putative protein serine/threonine kinase from M. genitalium; MJ0444 (Q57886) and MJ1073 (H64433), conserved hypothetical proteins from M. jannaschii; MJ1130 (A64441), O-sialoglycoprotein endopeptidase homolog from M. jannaschii; mt1005 (AE000873), conserved hypothetical protein from M. thermoautotrophicum; mt1425 (AE000904), O-sialoglycoprotein endopeptidase from M. thermoautotrophicum; mt1645 (AE000923), ABC transport from M. thermoautotrophicum; o274 (AE000211), o286 (AE000267) and o546 (AE000459), hypothetical proteins from Escherichia coli; sll0005 (D64000), sll1770 (D90908), slr0889 (D90907) and slr1919 (D90903), ABC1-like proteins from Synechocystis sp. PCC 6803; sll0776 (P54735), slr0152 (D90915), slr0599 (D90917), slr1225 (D90906) and slr1697 (D90914), eukaryotic-type protein kinases from Synechocystis sp. PCC 6803; YbdM (Z99105) and YloP (Y13937), putative eukaryotic-type protein kinases from B. subtilis; (B) AFSK-STRCO, serine/threonine protein kinase from Streptomyces coelicolor [98]; pknD, eukaryotic-type protein kinase from Anabaena PCC 7120 [10]; K04_orf389 (P75524), putative serine/threonine-protein kinase from M. pneumoniae [88]; and (C) cAPKa, α-catalytic subunit of cAMP-dependent protein kinase from human; and cSrc, proto-oncogene tyrosine-protein kinase from human.

Figure 2

Schematic representation of essential sequence features of eukaryotic protein kinases found in DNA-derived amino acid sequences of potential prokaryotic homologs from completed archaeal and bacterial genomes. Group 1 outlines a linear representation of the conserved subdomains common to the 280 amino acid catalytic core of eukaryotic PKs. (Note: the sizes of these subdomains are not drawn to scale.) Amino acids considered to be invariant or nearly so in eukaryotic PKs are listed using the numbering system of Taylor et al. [4]. Groups 2–9 summarize the predicted sequence features of ORFs lacking one or more of these key subdomains or residues. The number of ORFs from Fig. 1, section A that fall into each group is given in parentheses. Group 1 includes the following ORFs: HP0432, o274, sll0776, sll1770, slr0152, slr0599, slr1225, slr1697, YbdM, and YloP. Group 2 includes HI1537, MG109, mt1645, o546, sll0005, and slr1919. Group 3 consists of HI0113. Group 4 consists of BB0648. Group 5 consists of MJ1130, mt1005, mt1425, and slr0889. Group 6 consists of AF1804. Group 7 includes AF0665 and o286. Group 8 consists of AF2426 and MJ1073. Group 9 consists of MJ0444. The asterisk indicates that, in five cases, a glutamate is found in place of the aspartate normally found at position 220 (Fig. 1, section A). Given the highly conservative nature of this substitution, this alteration was not represented in order to simplify the figure.

Table 2 summarizes the results from searches of three archaeal and seven bacterial genomes for homologs of the dominant superfamily of protein kinases from eukaryotes. All told, 28 ORFs were identified whose predicted products contained or approached a minimum complement of the sequence features considered essential for phosphotransferase activity (Fig. 1, section A). Ten of the 28 contained plausible candidates for every conserved subdomain and ‘essential’ residue, while the remainder demonstrated deviations from the eukaryotic prototype (Fig. 2).
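The counts quoted above can be cross-checked against the group memberships given in the legend to Fig. 2. The following is only a bookkeeping sketch: the dictionary is transcribed by hand from that legend (ORF names as spelled in the legend to Fig. 1), and the names `FIG2_GROUPS` and `tally` are hypothetical, introduced here for illustration.

```python
# Group memberships transcribed from the Fig. 2 legend
# (ORF names follow the spelling used in the Fig. 1 legend).
FIG2_GROUPS = {
    1: ["HP0432", "o274", "sll0776", "sll1770", "slr0152",
        "slr0599", "slr1225", "slr1697", "YbdM", "YloP"],
    2: ["HI1537", "MG109", "mt1645", "o546", "sll0005", "slr1919"],
    3: ["HI0113"],
    4: ["BB0648"],
    5: ["MJ1130", "mt1005", "mt1425", "slr0889"],
    6: ["AF1804"],
    7: ["AF0665", "o286"],
    8: ["AF2426", "MJ1073"],
    9: ["MJ0444"],
}


def tally(groups):
    """Count ORFs overall, in group 1 (full complement), and in groups 2-9."""
    all_orfs = [orf for members in groups.values() for orf in members]
    if len(all_orfs) != len(set(all_orfs)):
        raise ValueError("an ORF is assigned to more than one group")
    full = len(groups[1])
    return {
        "total": len(all_orfs),
        "full_complement": full,
        "deviating": len(all_orfs) - full,
    }


print(tally(FIG2_GROUPS))
```

Under this transcription the nine groups account for all 28 ORFs, ten of them in group 1, which agrees with the figures given in the text.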

Table 2

List of ORFs from ten completed prokaryotic genomes that possess the signature sequence motifs found in eukaryotic protein kinases and protein phosphatases

Species | Domain (a) | Size (Mbp) | ORFs, listed in the column order PK, PPP, PPM, Conv. PTP, LMW PTP
Archaeoglobus fulgidus VC-16, DSM4304 | A | 2.18 | AF0665, AF1822, AF1361
Bacillus subtilis 168 | B | 4.2 | YbdM, YjbP, RsbU, YtrC, YfkJ
Borrelia burgdorferi B31 | B | 1.44 | BB0648, BB0836
Escherichia coli K-12 | B | 4.6 | o274, f219, f729, o430, f147
Haemophilus influenzae Rd KW20 | B | 1.83 | HI0113, HI0551
Helicobacter pylori 26695 | B | 1.66 | HP0432, HP0431
Methanobacterium thermoautotrophicum delta H | A | 1.75 | mt1005, mt1586, mt1355
Methanococcus jannaschii DSM 2661 | A | 1.66 | MJ0444, MJ0215
Mycoplasma genitalium G-37 | B | 0.58 | MG109, MG108
Synechocystis sp. PCC6803 | B | 3.57 | sll0005, sll1387, sll0602, slr0328
  • (a) Abbreviations used include: A, Archaea; B, Bacteria; PK, protein kinase; PPP, PPP-family protein phosphatase; PPM, PPM-family protein phosphatase; Conv. PTP, conventional protein-tyrosine or dual-specific protein phosphatase; and LMW PTP, low molecular weight protein-tyrosine phosphatase.

What are these features? The first is a nucleotide binding domain, which spans subdomains I–IV. The key features of this region are the middle glycine and the highly conserved aliphatic hydrophobic or, occasionally, hydroxyl or thiol residue at the end of the GXGXXGXV sequence of subdomain I, the absolutely conserved lysine of subdomain II, and the absolutely conserved glutamate of subdomain III (Fig. 1, section C; Fig. 2). Eighteen of the 28 ORFs predict products containing all four of these features. The remaining 10 contain both the subdomain II lysine and subdomain III glutamate, but lack a glycine residue corresponding to the central one of subdomain I. With the exception of o546 from E. coli and slr1919 from Synechocystis PCC6803, all of the remaining putative subdomain I sequences possess the conserved hydrophobic or neutral residue and one or more of the other subdomain I glycine residues.
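The subdomain I criteria described above can be expressed as a simple position-by-position check. This is a minimal sketch, not the procedure used in this study, which relied on alignment and visual inspection. It assumes that an eight-residue window has already been aligned to the GXGXXGXV sequence; the function name `check_subdomain_i` and the residue sets chosen to stand for 'aliphatic hydrophobic', 'hydroxyl', and 'thiol' are illustrative assumptions.

```python
# Residue classes are assumptions made for illustration only.
ALIPHATIC = set("AVLI")          # aliphatic hydrophobic side chains
HYDROXYL_OR_THIOL = set("STC")   # Ser/Thr (hydroxyl), Cys (thiol)


def check_subdomain_i(window):
    """Report which subdomain I features an aligned 8-residue window carries.

    The window is assumed to be aligned to G-X-G-X-X-G-X-V, so the three
    glycines sit at indices 0, 2 and 5 and the final residue at index 7.
    """
    if len(window) != 8:
        raise ValueError("expected an 8-residue window aligned to GXGXXGXV")
    w = window.upper()
    return {
        "first_glycine": w[0] == "G",
        "central_glycine": w[2] == "G",
        "third_glycine": w[5] == "G",
        "terminal_residue_ok": w[7] in (ALIPHATIC | HYDROXYL_OR_THIOL),
    }


# Toy windows: one matching the consensus, one lacking the central glycine.
print(check_subdomain_i("GTGSFGRV"))
print(check_subdomain_i("GTASFGRV"))
```

A window of the second kind corresponds to the situation described for the ten ORFs that lack the central glycine but retain other subdomain I features.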

Studies on other phosphotransferases indicate that the central glycine may not be as indispensable for nucleotide binding as its near absolute conservation among eukaryotic PKs suggests. The aminoglycoside 3′-phosphotransferases transfer the gamma phosphoryl group of ATP to antibiotics, such as streptomycin, in order to neutralize their bacteriotoxic properties. Based on both primary sequence [38–40] and X-ray crystallographic data [41], it is apparent that these enzymes are the evolutionary descendants of eukaryote-like protein kinases, with which they share catalytically essential subdomains II–VII and IX. However, even though these antibiotic kinases bind and utilize ATP in a manner similar to their protein-specific cousins, they do so without using the subdomain I motif of eukaryotic PKs. The antibiotic kinases do, however, contain the conserved lysine and glutamate residues of subdomains II and III, respectively, suggesting that the latter comprise essential features for nucleotide binding. Alternatively, it is possible that the potential products of some of the ORFs lacking a eukaryote-like subdomain I may employ high energy phosphate compounds other than ATP as substrate. An unidentified PK activity from the archaeon Sulfolobus acidocaldarius, for example, has been reported to use polyphosphate as phosphoryl donor [42].

The second set of essential sequence features includes the residues thought to participate directly in catalysis: the D-X4-N sequence of subdomain VIb and the conserved aspartate of subdomain VII. Twenty-six of the 28 ORFs in Fig. 1 possess all three of these highly conserved active-site residues. The two exceptions contain conservative substitutions for the aspartate of subdomain VII: asparagine in BB0648 from B. burgdorferi (Fig. 2, group 4) and glutamine in HI0113 from H. influenzae (Fig. 2, group 3). Unlike substitutions in the nucleotide binding domain, where functional exceptions are known, the prospects of catalytic competency for the predicted products of the latter two ORFs must be viewed with a much higher degree of caution, especially in the case of BB0648, which exhibits other deviations from the eukaryotic PK prototype.
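The catalytic-residue criteria can be illustrated in the same spirit. The sketch below is an assumption-laden illustration rather than the method used here: it locates every D-X4-N candidate in a sequence with a regular expression and labels the residue found at the subdomain VII position. The function names and the toy fragment are hypothetical, and assigning a D-X4-N hit to subdomain VIb would still require an alignment.

```python
import re

# D-X4-N: aspartate, any four residues, asparagine.
# The lookahead lets overlapping candidates be reported as well.
D_X4_N = re.compile(r"(?=D.{4}N)")


def find_d_x4_n(seq):
    """Return 0-based start positions of every D-X4-N candidate."""
    return [m.start() for m in D_X4_N.finditer(seq.upper())]


def classify_subdomain_vii(residue):
    """Label the residue aligned with the subdomain VII aspartate."""
    r = residue.upper()
    if r == "D":
        return "conserved aspartate"
    if r in ("N", "Q"):
        # The substitutions reported in the text for BB0648 (N) and HI0113 (Q).
        return "amide substitution, view with caution"
    return "not conserved"


# Toy fragment containing a single D-X4-N candidate.
print(find_d_x4_n("YRDLKPENLL"))
print(classify_subdomain_vii("N"))
```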

The other very highly conserved amino acid residues in eukaryotic PKs are the glutamate in subdomain VIII, the aspartate of subdomain IX, and the arginine of subdomain XI. The aspartate in subdomain IX forms hydrogen bonds with backbone amides to stabilize the conformation of the catalytic loop containing subdomain VIb [4]. All candidate ORFs possessed either an aspartate at this position (22/28) or a conservative substitute, such as glutamate (5/28) or asparagine (BB0648), that also possesses hydrogen bonding potential. (For the sake of simplicity and because of their highly conservative nature, Fig. 2 does not represent the glutamate for aspartate substitutions, which can be found in Fig. 1, section A). In eukaryotic PKs, the glutamate of subdomain VIII forms a salt bridge with the arginine of subdomain XI that, while distant from the site of catalysis, helps buttress the structural integrity of the protein/peptide binding lobe [36,37]. Curiously, while all 28 ORFs contain plausible candidates for the glutamate of subdomain VIII, five lack its arginine partner. In two cases, BB0648 from B. burgdorferi (Fig. 2, group 4) and AF1804 from A. fulgidus (Fig. 2, group 6), a lysine is present in this position, which has the potential to carry a positive charge and hence function in an analogous manner. However, in the other three the predicted sequence of the protein actually terminates short of subdomain XI (Fig. 2, groups 8 and 9). In fact, these three ORFs (AF2426 from A. fulgidus, and MJ0444 and MJ1073 from M. jannaschii) terminate short of subdomain X as well. Several other ORFs (i.e. AF0665 and AF1804 from A. fulgidus, MJ1130 from M. jannaschii, mt1005 and mt1425 from M. thermoautotrophicum, o286 from E. coli, and slr0889 from Synechocystis PCC6803) can be aligned to provide a potential subdomain XI arginine only by resorting to the extreme measure of leaving the subdomain X region void (Fig. 2, groups 5–7).

Could ORFs lacking subdomains X and/or XI produce functional PKs? Examination of the X-ray structures of eukaryotic PKs suggests that deletion of subdomain X may not necessarily be fatal to function, as this subdomain lies fairly distant from the active site region and does not contain residues directly involved in either substrate binding or phosphotransfer [4]. The lack of subdomain XI is more problematic, given the important role that its arginine plays in stabilizing the peptide/protein substrate binding lobe. One potential explanation is that those ORFs possessing a truncated carboxyl terminus, relative to eukaryotic PKs, encode phosphotransferases, such as the aminoglycoside kinases, that act on non-protein substrates. Interestingly, the deviations in subdomains X and XI are most common in the genomes of the archaeal representatives, all of whose potential PK ORFs predict products lacking subdomain X and, in some cases, subdomain XI as well. The only bacterial examples of this phenomenon were o286 from E. coli and slr0889 from Synechocystis.

To summarize, 28 potential ORFs for protein kinases were identified from 10 prokaryotic genomes. Every organism contained at least one such ORF. Nearly all (26/28) of the predicted products contained the residues deemed essential for phosphotransferase activity in eukaryotic PKs. However, in roughly half the ORFs examined, deviations from eucaryal prototypes were observed in the extreme N- and C-terminal ends of the putative catalytic domains, involving subdomains I, X, and XI. The residues in these regions have structural roles, rather than catalytic ones. Deviations in the subdomain I sequence of the nucleotide binding lobe are tolerated in other phosphotransferases, and it appears possible that the potential products from each of the 28 ORFs bind nucleotide substrates. The functional consequences resulting from the deletion of subdomain X or the complete truncation of both subdomains X and XI are much more difficult to predict. Unfortunately, data on the catalytic capabilities of prokaryotic PKs are available for only a small handful of these enzymes (reviewed in [3,9,10]), and this group does not include any of the ORFs identified in our genome searches. It is therefore possible that the truncated proteins, if expressed, would be completely incapable of phosphotransfer or, given that the affected domains are located in the binding lobe for the phosphoacceptor substrate, that they target non-protein biomolecules. Even so, their high degree of similarity to eukaryotic PKs and their universal distribution strongly indicate that all prokaryotes inherited the genetic information necessary to produce protein kinases, regardless of whether they elaborate such enzymes today.

4 The PPP family of protein phosphatases

In contrast to the situation with the PKs, eukaryotes make extensive use of a variety of distinct molecular paradigms for the construction of the catalytic elements in the protein phosphatases (PPs) with which they carry out the dephosphorylation of phosphoproteins [43]. The PPP family of serine/threonine phosphatases is the most quantitatively significant source of protein phosphatase activity in eukaryotes. It includes the catalytic subunits of PP1 and PP2A, as well as the Ca2+ and calmodulin regulated protein phosphatase calcineurin (PP2B) (for reviews see [44,45]). The latter is abundant in neuronal tissues and testes, and PP2B is a target of a complex formed between the protein FKBP12 and the immunosuppressant drug FK506 [46]. While mammalian cells elaborate several hundred PKs to provide the means for achieving specificity and integration in the phosphorylation of proteins on serine and threonine, the PPP family is not so prolific. Instead, the substrate specificity of a more limited set of catalytic core units appears to be controlled through association with a large family of regulatory and targeting subunits that control the temporal and spatial loci at which the activity of the catalytic units becomes manifested [47].

At the primary sequence level, the catalytic subunits or domains of the PPP family span a region approximately 220 amino acids in length characterized by the presence and spacing of three conserved amino acid motifs: GDXHG (motif I), GDXXDRG (motif II), and GNHE (motif III) [48] (Fig. 3, section C). Genome searches revealed the presence of six ORFs encoding proteins containing versions of all three of these motifs in their proper order and spacing (Fig. 3, section A). While not all of the genomes surveyed contained potential PPPs, both archaeal and bacterial organisms were represented (Table 2). Three of the ORFs identified (f219, o218, and sll1387) contain perfect matches for the invariant residues found in these motifs and two of these three, namely f219 and o218 from E. coli, were shown to be synthesized and function as protein phosphatases in vivo [49]. Molecular genetic analysis indicates that these E. coli PPPs participate in signaling events triggered by protein misfolding in extracytoplasmic compartments.
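A strict version of the three-motif criterion can be sketched as a single pattern search. This is an illustration only and not the search procedure used in this study: the function name `find_ppp_motifs`, the `max_gap` limit on the spacing between motifs, and the toy sequences are all hypothetical. Because the pattern demands exact matches at the invariant positions, it would reject imperfect matches that manual inspection might still retain.

```python
import re


def find_ppp_motifs(seq, max_gap=60):
    """Locate motifs I (GDXHG), II (GDXXDRG) and III (GNHE) in that order.

    max_gap is an assumed upper bound on the residues separating
    consecutive motifs. Returns 0-based start positions, or None when
    the three motifs are not found in order within that spacing.
    """
    pattern = re.compile(
        r"(?P<I>GD.HG).{0,%d}?(?P<II>GD..DRG).{0,%d}?(?P<III>GNHE)"
        % (max_gap, max_gap)
    )
    m = pattern.search(seq.upper())
    if m is None:
        return None
    return {name: m.start(name) for name in ("I", "II", "III")}


# Toy sequences, invented for illustration.
perfect = "MKT" + "GDIHG" + "A" * 20 + "GDYVDRG" + "S" * 25 + "GNHE" + "LL"
variant = "MKT" + "GDIQG" + "A" * 20 + "GDYVDRG" + "S" * 25 + "GNHE" + "LL"

print(find_ppp_motifs(perfect))
print(find_ppp_motifs(variant))
```

The second toy sequence carries a glutamine in place of the motif I histidine and is therefore missed by the strict pattern, which shows why a scan of this kind can only be a first pass before alignment and inspection.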

Figure 3

Alignment of DNA-derived amino acid sequences from ORFs of known or potential eukaryote-like PPP family protein-serine/threonine phosphatases from prokaryotic organisms. Alignments are restricted to those areas encompassing the three conserved sequence motifs characteristic of well-characterized PPPs [48], which are designated by roman numerals above the sequences. The number of amino acid residues encompassing the regions beyond and between motifs is also given in parentheses. Dashes indicate gaps introduced during the alignment process. Highly conserved amino acid residues are shown in bold. The aligned sequences are grouped into the following three categories: (A) ORFs identified through searches of completed archaeal and bacterial genomes; (B) known or potential PPPs from other prokaryotes; and (C) representative PPPs from eukaryotes. ORFs in section A are listed alphabetically using the designations listed in Table 2. An asterisk (*) designates those bacterial and archaeal enzymes for which protein phosphatase activity has been demonstrated. Abbreviations used along with their accession numbers (in parentheses), annotations, and, where applicable, relevant literature references include: (A) AF1822 (AE000977), putative serine/threonine phosphatase from A. fulgidus; f219 (AE000278), PrpA from E. coli [49]; HI0551 (P44751), putative diadenosine-tetraphosphatase from H. influenzae; o218 (AE000357), PrpB from E. coli [49]; sll1387 (D90912), hypothetical protein from Synechocystis sp. PCC 6803; YjbP (Z99110), putative diadenosine-tetraphosphatase from B. subtilis; (B) PP1-arch1 (U35278), protein serine/threonine phosphatase from Sulfolobus solfataricus [52]; PP1-arch2 (U96772), protein serine/threonine phosphatase from Methanosarcina thermophila [53]; PP1-cyano1, protein serine/threonine phosphatase 1 from Microcystis aeruginosa PCC 7820 [51]; PP1-cyano2 (U80887), protein serine/threonine phosphatase 1 from M. aeruginosa UTEX 2063 [51]; PP lambda, protein serine/threonine phosphatase from bacteriophage lambda [54]; PPP F. islandicum, putative serine/threonine protein phosphatase from Fervidobacterium islandicum; PPP P. abyssi, serine/threonine specific protein phosphatase from Pyrodictium abyssi; and (C) PP1 sds 21 yeast, protein serine/threonine phosphatase 1 from yeast; PP2A rabbit, protein serine/threonine phosphatase 2A from rabbit; PP2B beta 1 human, protein serine/threonine phosphatase 2B beta 1 from human; and PPT yeast, protein serine/threonine phosphatase T from yeast.

Of the three less than perfect matches, AF1822 from A. fulgidus has an alanine in place of the second glycine in motif I, a relatively conservative substitution that is precedented among paralogous phosphohydrolases and thus may be consistent with catalytic function [50]. HI0551 from H. influenzae substitutes a glutamine in place of the histidine in motif I, an alanine for the second aspartate in motif II, and an aspartate for the glutamate in motif III. The first and last of these can be viewed as relatively conservative changes. However, since the second aspartate in motif II participates in binding one of the metal ions involved in catalysis [43], the presence of alanine at this position may seriously impair metal binding and the prospects for enzymatic function. YjbP from B. subtilis has a serine in place of the first glycine in motif I and a cysteine in place of the glutamate in motif III. While neither residue appears to be directly involved in metal binding or catalysis, it is noteworthy that the presence of an aspartate or glutamate in motif III is conserved not only across the eukaryotic PPPs, but in numerous other phosphoesterases with similar active site motifs as well [50]. The prospects that the predicted product of this ORF would exhibit phosphohydrolase activity appear to be quite low.

Functional PPP family protein phosphatases have been found in several other prokaryotes (Fig. 3, section B), namely the cyanobacteria Microcystis aeruginosa PCC 7820 and M. aeruginosa UTEX 2063 [51], and the archaeons Sulfolobus solfataricus[52] and Methanosarcina thermophila TM-1 [53], as well as bacteriophage lambda [12,54]. Studies on these enzymes indicate that both of the cyanobacterial enzymes, called PP1-cyano1 and PP1-cyano2 (Shi, Carmichael, and Kennelly, unpublished observations), as well as the bacteriophage lambda protein phosphatase [55], dephosphorylate phosphotyrosyl proteins in vitro in addition to predicted phosphoseryl and phosphothreonyl protein substrates. Likewise, the two E. coli PPPs displayed dual-specific activity in vitro, and genetic analysis places them in a pathway containing a two-component protein-histidine kinase [49].

As was the case with the potential PKs, the PPP catalytic archetype has been utilized for the hydrolysis of biomolecules other than protein-bound phosphoesters. Specifically, the diadenosine tetraphosphatases have diverged from the PPP family, with which they share the same three key conserved ‘phosphoesterase motifs’ [19]. Thus, while the as yet uncharacterized products of ORFs AF1822 from A. fulgidus, HI0551 from H. influenzae, sll1387 from Synechocystis sp. PCC 6803, and YjbP from B. subtilis each may be a phosphohydrolase, it is uncertain whether their substrates will be phosphoproteins.

5 The PPM family of protein phosphatases

The members of the PPM family in eukaryotes, which include PP2C and the pyruvate dehydrogenase phosphatase, are protein-serine/threonine phosphatases characterized by a requirement for the presence of a divalent metal ion, usually Mg2+, for catalytic function [56]. Because the PPMs are the smallest contributor, quantitatively speaking, to the pool of protein-serine/threonine phosphatase activity in eukaryotes, our understanding of their physiological role(s) remains incomplete. Evidence to date indicates a tendency for representatives of the PPM family to participate in modulating responses to environmental stresses such as anoxia and heat or osmotic shock. In many microorganisms, the catalytic domains of PPM-like protein phosphatases are found to be part of polypeptide chimeras that include other domains with signal transduction potential [15]. These include adenylate cyclase, with which the PPMs share significant homology within their respective catalytic domains [56], as well as potential membrane spanning and ‘extracellular’ domains reminiscent of transmembrane receptors, and strings of leucine-rich repeats.

The catalytic domain of the PPMs spans a region approximately 290 amino acid residues in length [56] marked by 11 conserved motifs [15] containing eight ‘absolutely’ conserved residues (Fig. 4, section C). These residues are the aspartates of motifs 1 and 2, the threonine in motif 4, the glycines of motifs 5 and 6, the DG sequence of motif 8, and the aspartate of motif 11 (Fig. 5). All of the conserved aspartates appear to function in the coordination of the active site metal ions [56].

Figure 4

Alignment of DNA-derived amino acid sequences from ORFs of known or potential eukaryote-like PPM family protein-serine/threonine phosphatases from prokaryotic organisms. Alignments are restricted to those areas encompassing the eleven conserved sequence motifs characteristic of well-characterized PPMs [15], which are designated by numerals above the sequences. The numbers of amino acid residues in the regions beyond and between subdomains are also given in parentheses. Dashes indicate gaps introduced during the alignment process. Highly conserved amino acid residues are shown in bold. The aligned sequences are grouped into the following three categories: (A) ORFs identified through searches of completed archaeal and bacterial genomes; (B) known or potential PPMs from other prokaryotes; and (C) representative PPMs from eukaryotes. ORFs in section A are listed using the designators shown in Table 2. An asterisk (*) designates those bacterial enzymes for which protein phosphatase activity has been demonstrated. Abbreviations used along with their accession numbers (in parentheses), annotations, and, where applicable, relevant literature references include: (A) BB0836 (AE001182), putative subunit B of excinuclease ABC from B. burgdorferi; HP0431 (AE000559), protein phosphatase 2C homolog from H. pylori; MG108 (P47354), protein phosphatase 2C homolog from M. genitalium; f729 (AE000210), outer-membrane receptor for Fe(III)-coprogen, Fe(III)-ferrioxamine B and Fe(III)-rhodotrulic acid precursor from E. coli; sll0602 (D64002), slr1860 (P37979) and slr2031 (D90905), protein phosphatase 2C homologs from Synechocystis sp. PCC 6803 [93]; sll1033 (D90900) and sll1771 (D90908), hypothetical proteins from Synechocystis sp. PCC 6803; slr1983 (D90912), regulatory component of sensory transduction system from Synechocystis sp. PCC 6803; RsbU (L35574), RsbX (P17906) and SpoIIE (P37475), protein serine phosphatases from B. subtilis [57,58]; YloO (Y13937) and YvfP (Z94043), hypothetical proteins from B. subtilis; (B) K04_orf259 (P75525), putative protein phosphatase from M. pneumoniae [88]; and (C) YBX5 and KAPP, protein phosphatase 2C from yeast and Arabidopsis thaliana, respectively; and PDP, bovine pyruvate dehydrogenase phosphatase.

Figure 5

Schematic representation of essential sequence features of eukaryotic PPM-family protein phosphatases found in the DNA-derived amino acid sequences of potential prokaryotic homologs from completed archaeal and bacterial genomes. Group 1 outlines a linear representation of the conserved motifs common to the 300 amino acid catalytic core of eukaryotic PPMs. (Note: the sizes of these motifs are not drawn to scale.) Amino acids considered to be invariant or nearly so in eukaryotic PPMs are listed using the numbering system of Das et al. [56]. Groups 2–7 summarize the predicted sequence features of ORFs lacking one or more of these key motifs or residues. Group 1 includes sll0602, sll1033, sll1771, and YloO. Group 2 includes slr1860, slr1983, RsbU, RsbX, SpoIIE, and YvfP. Group 3 consists of f729. Group 4 consists of slr2031. Group 5 consists of BB0836. Group 6 consists of MG108. Group 7 consists of HP0431. The number of ORFs from Fig. 4, section A that fall into each group is given in parentheses.

Genome searches revealed the presence of 15 ORFs, at least 11 of which predict products containing plausible candidates for all eight ‘absolutely conserved’ amino acid residues listed above and appear likely to encode functional protein products (Fig. 5, groups 1–3). Within this set of 11, the major variation in sequence features from established PPM family members was the lack of submotifs 5a and 5b (Fig. 5, groups 2 and 3). However, three of these variant ORFs have been demonstrated to encode catalytically functional protein products, suggesting that these submotifs, while conserved, are not critical. Among the remaining four ORFs, three contain substitutions for one of the ‘invariant’ amino acids (Fig. 5, groups 4–6), only one of which appears to be functionally conservative, an asparagine for the aspartate of motif 11 in BB0836 from B. burgdorferi (Fig. 5, group 5). The last ORF, HP0431 from H. pylori (Fig. 5, group 7), lacks part of motif 6, including the invariant glycine residue, and all of motif 7.
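The tallies quoted above can be checked against the group memberships listed in the legend to Fig. 5. The short sketch below is illustrative only: the dictionary name and layout are assumptions of this example, while the ORF names and group assignments are taken from the legend.

```python
# Group membership as listed in the legend to Fig. 5 (ORFs from Fig. 4, section A).
# The dictionary layout is an assumption made for this illustration.
FIG5_GROUPS = {
    1: ["sll0602", "sll1033", "sll1771", "YloO"],
    2: ["slr1860", "slr1983", "RsbU", "RsbX", "SpoIIE", "YvfP"],
    3: ["f729"],
    4: ["slr2031"],
    5: ["BB0836"],
    6: ["MG108"],
    7: ["HP0431"],
}


def count_orfs(groups, members):
    """Return the number of ORFs belonging to the selected group numbers."""
    return sum(len(groups[g]) for g in members)


total = count_orfs(FIG5_GROUPS, range(1, 8))      # every ORF found in the genome searches
all_eight = count_orfs(FIG5_GROUPS, (1, 2, 3))    # candidates for all eight conserved residues
substituted = count_orfs(FIG5_GROUPS, (4, 5, 6))  # one 'invariant' residue replaced

print(total, all_eight, substituted)  # prints: 15 11 3
```

The counts agree with the text: 15 ORFs in all, 11 in groups 1–3, three in groups 4–6, and a single ORF (HP0431) in group 7.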

Curiously, the potential PPM homologs identified during our searches were found exclusively among the Bacteria. No archaeal representatives of the PPM family have been detected either through genome searches or by other molecular genetic or biochemical means. Although highly provocative, given the small number of archaeal genomes currently available it would be premature to conclude that the distribution of the PPM family is restricted to the Bacteria and Eucarya.

The members of the PPM family from Bacillus subtilis have been studied in some detail. All three – RsbU, RsbX, and SpoIIE – exhibit divalent metal ion-stimulated protein-serine phosphatase activity in vitro [57–59]. In vivo, all three function in the regulation of the activity of gene transcription factors in response to environmental stresses. SpoIIE acts as a trigger for sporulation by promoting the release of the transcription factor σF/Spo0A [57,60]. RsbU and RsbX comprise elements of tandem ‘switch modules’ that regulate the transcription factor σB in response to environmental and energy stress signals, respectively [58,61].

6 The protein-tyrosine phosphatases

The most numerically prolific family of protein phosphatases is known collectively as the protein-tyrosine phosphatases (PTPs). However, this widely employed designation is a misnomer on two counts. First, not all PTPs are specific for tyrosine. Several PTPs will remove phosphoryl groups esterified to all three hydroxyl amino acids: serine, threonine, and tyrosine. These latter enzymes are referred to as dual-specific protein phosphatases, or DSPs. More importantly, the PTPs are in fact an amalgamation of two distinct enzyme families whose evolution has converged [62] to produce proteins that share a common catalytic mechanism [63,64] and active site geometry [65].

The two PTP families are the conventional PTP/DSPs and the low molecular weight PTPs (low MW PTPs). Both share the characteristic active site signature motif, CX5R, which was used as the template for our searches. The active site cysteine functions as a nucleophile [63,64], attacking the protein-bound phosphoryl group with the resultant displacement of the dephosphorylated protein and formation of a cysteinyl-phosphate enzyme intermediate. The conserved arginine forms salt bridges with the phosphoryl group to promote substrate binding [66] and to stabilize the phosphoenzyme intermediate [67]. The catalytic cycle is completed when a molecule of water enters the region vacated by the dephosphorylated protein, hydrolyzing the cysteinyl-phosphate moiety to generate free enzyme and inorganic phosphate. Most conventional PTP/DSPs possess a histidine residue immediately N-terminal to the active site cysteine, where it forms a hydrogen bond with the thiol group that decreases its pKa and thus enhances its nucleophilicity [68]. The only other amino acid deemed absolutely essential for catalysis is a conserved aspartic acid that serves as a general acid-base [69]. During the initial nucleophilic attack on the phosphorylated substrate, the aspartic acid protonates the leaving group alcohol on the protein as it is displaced. Later, the resulting conjugate base abstracts a proton from the entering water molecule to enhance its nucleophilicity and accelerate the hydrolysis of the phosphoenzyme intermediate. It is the relative locations of the active site (H)CX5R motif and the catalytic aspartic acid that differentiates the two PTP families (reviewed in [17]).

In the conventional PTP/DSPs, the (H)CX5R motif is located within the central portion of the catalytic domain sequence, which is estimated to span approximately 250 amino acids, with the conserved aspartate located anywhere from 25 to 50 residues to the N-terminal side of the active site cysteine. In the low MW PTPs, the CX5R motif is located very near the extreme N-terminus of the catalytic domain, which is significantly smaller than that in the conventional PTP/DSPs: 140 amino acids or less. The essential aspartate residue is located from 80–110 residues distant on the C-terminal side of the catalytic cysteine. While these configurations are radically different in the relative positioning of key catalytic residues within the primary sequence, the spatial relationship of these residues within the active site pocket is remarkably similar [65]. Both within and between the conventional and low MW PTPs no consistent pattern of global sequence conservation is apparent, although subgroups exhibiting characteristic conserved features, e.g. the Cdc25 and VH-1 like groups of conventional PTP/DSPs, are evident [16,17].
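The positional rules described above lend themselves to a simple screen. The following minimal sketch is not the search procedure used in this survey: it looks for CX5R matches in a protein sequence and reports whether an aspartate lies 25 to 50 residues on the N-terminal side (conventional PTP/DSP-like) or 80 to 110 residues on the C-terminal side (low MW PTP-like) of the cysteine. The function names and the two toy sequences are hypothetical, and a real screen would also need to weigh sequence context and domain length.

```python
import re

# CX5R signature: Cys, any five residues, Arg. A lookahead is used so that
# overlapping matches are all reported.
SIGNATURE = re.compile(r"(?=C.{5}R)")


def has_asp(seq, start, stop):
    """True if an Asp (D) occurs at any index in [start, stop], clipped to the sequence."""
    start, stop = max(start, 0), min(stop, len(seq) - 1)
    return any(seq[i] == "D" for i in range(start, stop + 1))


def classify_ptp_candidates(seq):
    """Return (cysteine index, labels) for each CX5R match; indices are 0-based."""
    results = []
    for match in SIGNATURE.finditer(seq):
        cys = match.start()
        labels = []
        # Conventional PTP/DSPs: Asp 25 to 50 residues N-terminal to the Cys.
        if has_asp(seq, cys - 50, cys - 25):
            labels.append("conventional PTP/DSP-like")
        # Low MW PTPs: Asp 80 to 110 residues C-terminal to the Cys.
        if has_asp(seq, cys + 80, cys + 110):
            labels.append("low MW PTP-like")
        results.append((cys, labels))
    return results


# Toy sequences (hypothetical, for illustration only).
toy_conventional = "A" * 10 + "D" + "A" * 29 + "HCAAAAAR" + "A" * 50
toy_low_mw = "AAC" + "AAAAA" + "R" + "A" * 85 + "D" + "A" * 20

print(classify_ptp_candidates(toy_conventional))
print(classify_ptp_candidates(toy_low_mw))
```

A match that earns neither label still contains the signature motif but lacks a suitably placed aspartate, which is the situation the text treats as unlikely to represent a functional phosphohydrolase.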

Genome searches revealed the presence of 16 ORFs whose predicted products contained the (H)CX5R motif and a potential active site aspartate, and thus appear likely to encode functional phosphohydrolases. These included seven conventional PTP/DSPs (Fig. 6, section IA) and nine low MW PTPs (Fig. 6, section IIA). Representatives of each PTP family were found in both the Bacteria and Archaea (Table 2). Two organisms, the archaeon M. thermoautotrophicum and the bacterium B. subtilis, contained both types of PTPs. Three others, B. burgdorferi, H. pylori, and M. genitalium, lacked recognizable PTPs.

Figure 6

Alignment of DNA-derived amino acid sequences from ORFs of known or potential eukaryote-like protein-tyrosine or dual-specific protein phosphatases from prokaryotic organisms. Alignments are restricted to those areas encompassing the two conserved active site sequence motifs characteristic of (I) conventional PTP/DSPs or (II) low MW PTPs. The numbers of amino acid residues in the regions beyond and between subdomains are given in parentheses. Dashes indicate gaps introduced during the alignment process. Highly conserved amino acid residues are shown in bold. Each set of aligned sequences is grouped into the following three categories: (A) ORFs identified through searches of completed archaeal and bacterial genomes; (B) known or potential PTPs from other prokaryotes; and (C) the consensus sequence motifs themselves [17]. ORFs in section A are listed alphabetically using the designators listed in Table 2. An asterisk (*) designates those bacterial and archaeal enzymes for which protein phosphatase activity has been demonstrated. Abbreviations used along with their accession numbers (in parentheses), annotations, and, where applicable, relevant literature references include: (IA) MJ0215 (Q57668) and MJECL20 (Q60280), hypothetical proteins from M. jannaschii; MJ1098 (A64437), putative RNA maturase isolog from M. jannaschii; mt1586 (AE000918), pyruvate formate-lyase activating enzyme from M. thermoautotrophicum; o430 (AE000238), hypothetical protein from E. coli; YtrC (AF008220), putative cytochrome c oxidase subunit from B. subtilis; YvcJ (Z94043), hypothetical protein from B. subtilis; (IB) IphP (L11392), protein-tyrosine/serine phosphatase from Nostoc commune [70]; SptP (U63293), protein tyrosine phosphatase from Salmonella typhimurium [72]; YopH, protein tyrosine phosphatase from Yersinia pseudotuberculosis [13]; (IC) consensus sequence motifs for conventional PTP/DSPs [17]; (IIA) AF1361 (AE001010), putative arsenate reductase from A. fulgidus; f147 (AE000296) and f152 (AE000200), hypothetical proteins from E. coli; mt1335 (AE000898), putative arsenate reductase from M. thermoautotrophicum; slr0328 (Q55535), putative low molecular weight tyrosine phosphatase from Synechocystis sp. PCC 6803; slr0946 (D90914), putative arsenate reductase from Synechocystis sp. PCC 6803; slr1617 (D90901), hypothetical protein from Synechocystis sp. PCC 6803; YfkJ (D83967) and YwlE (P39155), putative low molecular weight tyrosine phosphatases from B. subtilis; (IIB) Potential low MW PTPs from E. amylovora, Erwinia amylovora; K. pneumoniae, Klebsiella pneumoniae; M. tuberculosis, Mycobacterium tuberculosis; R. solanacearum, Ralstonia solanacearum; S. coelicolor, Streptomyces coelicolor [75]; and (IIC) consensus sequence for low MW PTPs [17].

Representatives of both families of PTPs have been characterized from bacterial organisms (Fig. 6, sections IB and IIB). These include the conventional DSP IphP from Nostoc commune [70,71] and the conventional PTPs YopH from Y. pseudotuberculosis [13] and SptP from Salmonella typhimurium [72]. Both YopH and SptP are exported by the pathogenic microorganisms in which they are expressed. Once ‘outside’, they act to override host signal transduction networks. The protein phosphatase activity of these enzymes is essential for the virulence of each pathogen [73,74]. Whereas YopH is encoded extrachromosomally, both SptP and IphP are chromosomally encoded. In contrast to Y. pseudotuberculosis and S. typhimurium, N. commune is a free-living microorganism, hence it seems likely that IphP serves a more direct role in metabolic or sensory functions in this cyanobacterium. The only bacterial members of the low MW PTP family to be characterized at the protein level are PtpA from Streptomyces coelicolor A3(2) [75] and Ptp from Acinetobacter johnsonii [76], each of which hydrolyzed free phosphotyrosine and phosphotyrosine residues within synthetic polypeptides, but not free phosphoserine or phosphothreonine, in vitro.

Caution must be exercised in assigning functions to PTP-like open reading frames based on sequence resemblance alone. As was the case for the PKs and PPPs above, paralogs exist that employ variations on the PTP catalytic domain paradigms to perform other enzymatic functions. These include RNA 5′-tetraphosphatase [77] and many arsenate reductases (see Section 7, below).

7 Potential ancestral relationship between arsenate reductases and PTPs

One of the ORFs encountered in the search for potential PTPs was ArsC, arsenate reductase, from B. subtilis [78]. Remarkably, detailed inspection revealed that ArsC from B. subtilis contained all of the essential catalytic residues characteristic of the low MW PTPs (Fig. 7, section I). Moreover, the sequence context in which the signature residues were found, i.e. the CX5R motif and essential aspartate, displayed multiple additional features that tend to be conserved among the low MW PTPs. Indeed, ArsC is indistinguishable from the low MW PTPs at the level of primary sequence. Since phosphorus and arsenic reside in the same column of the periodic table, the substrate for ArsC, arsenate or HAsO₄²⁻, is a close chemical analog of the reaction product of the PTPs, phosphate or HPO₄²⁻. These remarkable similarities in primary sequence and catalytic reactants caused us to look further into the structure and function of the arsenate reductases to determine: (a) if these residues were conserved in other members of the ArsC family; and (b) if they participated in catalysis. The answer to both questions proved to be yes.

Figure 7

Sequence alignment of the catalytic subunits of microbial arsenate reductases with consensus sequences of PTPs from both (I) the low MW PTP family, and (II) the Cdc25-like DSPs of the conventional PTP family. Highly conserved amino acid residues are shown in bold, while numbers indicate where amino acid residues are not shown. Dashes indicate where gaps were introduced in the course of alignment. The asterisk (*) marks the conserved cysteine that functions as the active site nucleophile in PTPs, the plus sign (+) marks the conserved arginine residue that stabilizes the negative charge on the phosphoryl group, and the number sign (#) indicates the aspartic acid residue that serves as a general acid during catalysis. Codes and abbreviations: ArsC, S. aureus pI258 (A53641), arsenate reductase from S. aureus plasmid pI258 [82]; ArsC, S. xylosus pSX267 (O01257), arsenate reductase from S. xylosus plasmid pSX267 [83]; ArsC, B. subtilis (P45947), arsenate reductase from B. subtilis [78]; ArsC, E. coli (X80057), arsenate reductase from E. coli [81]; ACR2, S. cerevisiae (O06597), arsenate reductase from Saccharomyces cerevisiae [86].

Arsenate reductase is a key component of the microbial detoxification systems responsible for the neutralization and export of toxic oxyanions of antimony and arsenic. The ArsC protein catalyzes the reduction of arsenate utilizing reducing equivalents obtained from sources such as glutathione in E. coli [79] or thioredoxin in S. aureus [80]. The reduced product, arsenite, is then exported by another component of the detoxification system, ArsB. ArsB, ArsC, and the regulatory protein ArsR are encoded in the ars operon. This operon is encoded by plasmids in E. coli [81], S. aureus [82], and S. xylosus [83], while it is chromosomally encoded in B. subtilis [78]. Studies using site-directed mutagenesis to alter ArsC from E. coli revealed that catalysis proceeds via formation of an enzyme-bound arseno-cysteine intermediate involving cysteine-12 [84]. Intriguingly, an N-terminal histidine residue, histidine-8, was observed to decrease the pKa, and hence increase the nucleophilicity, of cysteine-12 in a manner analogous to that by which the histidine adjacent to the active site cysteine in most conventional PTP/DSPs affects the latter’s acidity/nucleophilicity [85]. While the E. coli arsenate reductase exhibits only faint similarity to the active site sequence of either PTPs or the other bacterial ArsCs, HX3CX3R versus (H)CX5R, it is generally believed that the conserved cysteine residues of other ArsCs function in a similar fashion as cysteine-12. These similarities in active site sequence, mechanism, and reactants strongly suggest that many bacterial ArsC proteins evolved from members of the low MW PTP family in a process of divergent evolution.

Further evidence that many arsenate reductases are descended from PTPases is provided by ACR2, the arsenate reductase from the yeast S. cerevisiae [86]. ACR2 exhibits homology with the conventional PTP/DSP family of PTPs (Fig. 7, section II). Moreover, this resemblance is not restricted to just a few key catalytic residues. When the sequence of ACR2 was compared with that of a conventional DSP, Cdc25, from the same microorganism [87], 29 amino acid identities were observed from a total of 138 amino acid residues plus gaps. If highly conservative substitutions are considered (i.e. Asn for Gln, Asp for Glu, Phe for Tyr, Lys for Arg, and Ile for Leu or Val), the overall degree of sequence similarity increases from 21 to 38% (53 residues total out of 138 amino acids plus gaps). The areas of greatest sequence similarity were clustered within the three regions of sequence conservation used to identify members of the Cdc25 subfamily of conventional PTP/DSPs [17]. In order of their appearance in the primary sequence they are the CH2-A region, which contains the catalytic aspartate; the active site signature sequence, which contains the nucleophilic cysteine; and the CH2-B region. Thus, it would appear that ACR2 evolved from a DSP, and that in this case it was drawn from the conventional family of PTPs. The fact that evidence implicating such adaptations has been discovered involving two different types of putative PTPase precursors in two different phylogenetic domains strongly suggests that nature did in fact evolve many arsenate reductases from protein phosphatases.
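The percentages quoted for the ACR2/Cdc25 comparison follow directly from the counts given: 29/138 ≈ 21% identity and 53/138 ≈ 38% similarity. The sketch below shows one way such figures could be computed from a gapped pairwise alignment, counting as ‘conservative’ only the substitution pairs named in the text. The function names and the toy alignment are hypothetical, and the actual ACR2 and Cdc25 sequences are not reproduced here.

```python
# Conservative substitution pairs named in the text:
# Asn/Gln, Asp/Glu, Phe/Tyr, Lys/Arg, Ile/Leu and Ile/Val.
CONSERVATIVE_PAIRS = {frozenset(p) for p in ("NQ", "DE", "FY", "KR", "IL", "IV")}


def score_alignment(a, b):
    """Return (identities, identities + conservative substitutions, alignment length).

    `a` and `b` are equal-length aligned sequences in which '-' marks a gap.
    The length includes gap columns, i.e. 'residues plus gaps'.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identities = conservative = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap columns count toward the length only
        if x == y:
            identities += 1
        elif frozenset((x, y)) in CONSERVATIVE_PAIRS:
            conservative += 1
    return identities, identities + conservative, len(a)


def percent(count, total):
    """Express a count as a percentage of the alignment length."""
    return 100.0 * count / total


# Figures quoted in the text for ACR2 versus Cdc25.
print(round(percent(29, 138)), round(percent(53, 138)))  # prints: 21 38

# Toy alignment (hypothetical): 1 identity, 5 conservative pairs, 8 columns.
print(score_alignment("ANDFKI-A", "AQEYRLAC"))  # prints: (1, 6, 8)
```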

One final question arises from this analysis. If PTPs and many arsenate reductases both contain a common set of essential sequence features, can they be distinguished through genome searches? One possible criterion for distinguishing ORFs encoding PTPs from those encoding ArsCs resides in the genetic context surrounding them. PTPs function as independent units, or in coordination with protein kinases, while ArsC is part of a multicomponent operon encoding other detoxification functions, such as the ArsB transporter. Inspection of the flanking ORFs for the presence or absence of other elements of the ars operon should allow one to discriminate between ORFs encoding arsenate reductases and those encoding PTPs, respectively. Thus, while mt1355 from M. thermoautotrophicum was annotated as a ‘potential arsenate reductase’ within its genome database, here we have identified it as a low MW PTP.
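The genetic-context criterion described above can be expressed as a simple neighbourhood check. In the sketch below, which is illustrative only, each genome region is represented as an ordered list of gene names and a candidate ORF is flagged as arsenate reductase-like if arsB or arsR lies within a few ORFs on either side. The function names, the window of two flanking ORFs, and both example gene orders are assumptions made for this illustration and do not describe any real genome.

```python
# Other elements of the ars operon named in the text.
ARS_PARTNERS = {"arsB", "arsR"}


def has_ars_neighbors(genes, index, flank=2):
    """True if an ars operon partner lies within `flank` ORFs of genes[index]."""
    upstream = genes[max(0, index - flank):index]
    downstream = genes[index + 1:index + 1 + flank]
    return any(gene in ARS_PARTNERS for gene in upstream + downstream)


def provisional_call(genes, index, flank=2):
    """Suggest an annotation for a CX5R-containing ORF from its genetic context."""
    if has_ars_neighbors(genes, index, flank):
        return "arsenate reductase-like"
    return "PTP-like"


# Hypothetical gene orders; 'candidate' marks the ORF under inspection.
region_with_ars = ["orfA", "arsR", "arsB", "candidate", "orfB"]
region_without_ars = ["orfA", "orfB", "candidate", "orfC", "orfD"]

print(provisional_call(region_with_ars, 3))     # prints: arsenate reductase-like
print(provisional_call(region_without_ars, 2))  # prints: PTP-like
```

Such a call is only provisional, since it depends on how reliably the flanking ORFs themselves have been annotated.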

8 Summary and conclusions

We began our investigation searching for clues to the answers to three questions. Do ‘all’ prokaryotes contain ORFs encoding potential eukaryote-like protein kinases and protein phosphatases? Are ORFs from all four families of eukaryotic PPs present in all prokaryotes? Do all organisms that contain a potential PK(s) also possess a potential countervailing PP(s), and vice-versa?

The answer to the first and third of the questions above is a qualified yes. Each of the 10 prokaryotic genomes we inspected contained ORFs whose potential products exhibited sequence features characteristic of eukaryotic PKs and PPs. (Independent analysis of another prokaryotic genome, that of the bacterium M. pneumoniae G-37, indicated that it contained ORFs for a potential PK and a PPM family PP as well [88], and ORFs for seven potential PKs have been identified in Mycobacterium tuberculosis[89].) In the case of the ORFs for potential PPs, it appears that a clear majority predict products that contain the minimal essential set of primary sequence features necessary to support phosphohydrolase activity. In the case of the ORFs whose potential products resembled eukaryotic PKs, roughly half of the 28 clearly met this criterion, with the second half showing deviations of varying degrees. Most deviations from eukaryotic prototypes localized to subdomain I of the nucleotide binding domain and appeared to be compatible with catalytic function. Those ORFs that displayed the greatest deviation from the eukaryotic PK paradigm were found predominantly among the Archaea. Here, deviations in the most C-terminal regions of the catalytic core, i.e. subdomains IX and X, were observed. Since these regions encompass portions of eukaryotic PKs removed from the immediate sites of substrate binding and catalysis, the potential impact on function is difficult to predict. Determination of the catalytic properties of these more diverse potential PKs thus has the potential to provide new insights into structure–function relationships in this important class of enzymes. In considering the functional potential of the potential PK ORFs described herein, it should be borne in mind that the eukaryotic PK paradigm does not represent the sole source of protein-serine/threonine kinase activity in prokaryotes. 
Three other PK paradigms have been identified in prokaryotes, including the isocitrate dehydrogenase kinase/phosphatase [90], the HPr kinases [91–93], and variants of the two-component histidine protein kinases, such as SpoIIAB, that phosphorylate hydroxyl amino acid residues [94].

Intriguingly, recent genome searches focussing on the other major protein phosphorylation system identified in prokaryotes, the two-component system, have revealed that while it is widely distributed, it is not universal. Three prokaryotes, the archaeon M. jannaschii and the bacteria M. genitalium and M. pneumoniae, all lack recognizable vestiges of the histidine protein kinases or their target response regulators [95]. This is somewhat ironic, as the two-component system is widely regarded as the more ancient and ‘prokaryotic’ of the two major protein phosphorylation systems.

Our surveys revealed that the number of ORFs for potential PKs and PPs ranged from one of each in B. burgdorferi, H. pylori, and M. genitalium to a high of nine potential PKs and 10 potential PPs in Synechocystis. Not surprisingly, a rough correlation was observed between genome size and the number of putative PKs and PPs. However, factors beyond mere number of genes would appear to influence the amount of genetic information devoted to these classes of signal transduction/regulatory molecules. Among the two bacteria and two archaeons whose genomes ranged in size from 1.66 to 1.83 Mbp, the number of PKs and PPs ranged from one of each in H. pylori to a high of three of each in M. jannaschii (Table 2). Among the three bacteria with the largest genomes – Synechocystis (3.6 Mbp), B. subtilis (4.2 Mbp), and E. coli (4.6 Mbp) – the last named has only half as many potential PKs and PPs as the first named despite the need to manage over 25% more genetic material and its resulting gene products. These trends, as well as the total lack of two-component cascades in some prokaryotes, presumably reflect the level of ‘environmental adaptability’ inherent in each organism. Synechocystis sp. PCC 6803, with its ability to survive on the barest of environmental resources, light and air, presumably has developed and maintained a more extensive cellular sensing, command, and control apparatus to support its lifestyle than organisms that have specialized to efficiently exploit more monotonous environmental niches.

If all organisms contain both eukaryote-like PKs and PPs, what is the balance between the two? Does each PK possess a partner PP in prokaryotes? In most instances the number of potential PKs equalled, plus or minus one, the number of potential PPs (Table 2). However, in two organisms this was not the case. E. coli possesses three ORFs for potential PKs, but six for potential PPs, while B. subtilis possesses two for PKs and 10 for PPs. The fact that, in each instance, the number of PPs was in great excess over the PKs may be explained, at least in part, by the presence of other types of protein-serine/threonine kinase in prokaryotes, such as the isocitrate dehydrogenase kinase/phosphatase [90], HPr kinase [91–93], and SpoIIAB-like ‘histidine’ protein kinases [94]. Since the PPs recognize only phosphoprotein substrates, and not the enzyme that phosphorylates them, it is quite possible that the eukaryote-like PPs also dephosphorylate the targets of other PKs. The presence of excess PPs also would be explained if some act toward non-phosphoester substrates. The PPP family members PP1 and PP2A and the PPM family member PP2C, all from mammals, have been observed to exhibit protein-histidine phosphatase activity toward phosphohistidyl-histone H4 in vitro [96], while the phage lambda protein phosphatase hydrolyzed phosphohistidine from the bacterial protein NRII [55]. More recently, we have observed that the cyanobacterial PPPs PP1-cyano1 and PP1-cyano2 from M. aeruginosa PCC 7820 and M. aeruginosa UTEX 2063, respectively, exhibit both histidine and lysine phosphatase activity in the laboratory (Shi, Carmichael, and Kennelly, in preparation). Thus, some of the potential PPs identified in these studies may act toward the phosphohistidyl (and/or phosphoaspartyl?) proteins of the two-component cascade or, possibly, the PTS system in bacteria.
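The genome sizes and ORF counts quoted in the two preceding paragraphs for the three largest bacterial genomes can be combined into a rough density comparison. The sketch below is a worked illustration only: the dictionary layout and function name are assumptions of this example, and the figures are those given in the text rather than values taken from Table 2.

```python
# (genome size in Mbp, potential PKs, potential PPs) as quoted in the text.
# The tuple layout is an assumption made for this illustration.
GENOMES = {
    "Synechocystis": (3.6, 9, 10),
    "B. subtilis": (4.2, 2, 10),
    "E. coli": (4.6, 3, 6),
}


def orfs_per_mbp(size_mbp, pks, pps):
    """Potential PK plus PP ORFs per megabase pair of genome."""
    return (pks + pps) / size_mbp


densities = {name: orfs_per_mbp(*values) for name, values in GENOMES.items()}

# Extra genetic material in E. coli relative to Synechocystis (fraction).
extra_dna = GENOMES["E. coli"][0] / GENOMES["Synechocystis"][0] - 1

# E. coli PK + PP ORFs as a fraction of the Synechocystis total.
orf_ratio = sum(GENOMES["E. coli"][1:]) / sum(GENOMES["Synechocystis"][1:])

for name, density in densities.items():
    print(f"{name}: {density:.1f} ORFs per Mbp")
print(f"E. coli has {extra_dna:.0%} more DNA but {orf_ratio:.0%} as many PK/PP ORFs")
```

On these figures E. coli carries roughly 28% more DNA than Synechocystis yet a little under half as many potential PK and PP ORFs, consistent with the statement that factors other than genome size influence the number of these genes.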

Their catalytic versatility also may explain the lack of a clear pattern in the distribution of the PPPs, PPMs, and PTPs among the prokaryotes. Studies in this laboratory have established that IphP from Nostoc commune UTEX 584, a conventional PTP/DSP, will dephosphorylate serine, threonine, and tyrosine residues on proteins [71]. The bacteriophage lambda PPP will dephosphorylate tyrosine and histidine in addition to serine and threonine as well [55], and the same is true for PP1-cyano1 and PP1-cyano2 from M. aeruginosa[51]. This suggests that, in contrast to their eukaryotic counterparts, the bacterial PPs have not undergone global specialization into serine/threonine- and tyrosine-specific classes. It follows that since any of the three major PP families could be used as the vehicles for dephosphorylating any protein-bound phosphomonoester as required, many prokaryotes may have saved precious genome space by utilizing only one or two of the major families of PPs to perform all protein dephosphorylation functions.

The question still remains as to how this heterogeneous distribution came to be. Perhaps it is most readily explained by presuming that the ancient ancestors of today’s prokaryotes contained the genetic information for all the major classes of protein phosphatases, but discarded those types rendered redundant by their potential for functional overlap. If so, we would anticipate that, while the three representatives from the Archaea are devoid of homologs of PPM, ORFs for archaeal PPMs will eventually be uncovered, since it is difficult to imagine how a gene family that pervades both the Eucarya and Bacteria could not have been passed on to the Archaea as well. Alternatively, the widespread distribution of PP prototypes may reflect the results of an acquisitive process involving gene transfer events. The virally encoded PPs of the PPP [12] and conventional PTP [97] family might reflect the vestiges of this ancient swapping process. Since there is no reason to believe that the process should be monotonic for either the PKs, PPs, or both together, the most viable explanation may be that a combination of hereditary and acquisitive events, tuned by selection and elimination, shaped the array of potential phosphotransferases and phosphohydrolases present in contemporary organisms.

To summarize, the genetic information required to express eukaryote-like PKs and PPs for the phosphorylation and dephosphorylation of proteins on the hydroxyl amino acids is present in the genome of every organism surveyed to date. It is not yet known whether this information is currently utilized for this purpose in all instances. Alternative schemes for phosphorylating proteins on these and other amino acid residues exist, and alternative uses of these PK and PP paradigms, e.g. the aminoglycoside kinases (PKs) [38–41], diadenosine tetraphosphatases (PPPs) [19], and arsenate reductases (PTPs), may explain their presence in some organisms. However, the results of this survey strongly indicate that the ancestors of what were long considered eukaryote-specific signal transduction proteins are ancient in origin and widespread through phylogeny.


This work was supported by Grant R01 GM55067 (to P.J.K.). We wish to thank Dr. Kristin K. Koretke and colleagues from SmithKline Beecham Pharmaceuticals for permitting us to cite the preliminary results of their searches for ORFs encoding potential components of two-component regulatory systems in prokaryotes.


  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].
  13. [13].
  14. [14].
  15. [15].
  16. [16].
  17. [17].
  18. [18].
  19. [19].
  20. [20].
  21. [21].
  22. [22].
  23. [23].
  24. [24].
  25. [25].
  26. [26].
  27. [27].
  28. [28].
  29. [29].
  30. [30].
  31. [31].
  32. [32].
  33. [33].
  34. [34].
  35. [35].
  36. [36].
  37. [37].
  38. [38].
  39. [39].
  40. [40].
  41. [41].
  42. [42].
  43. [43].
  44. [44].
  45. [45].
  46. [46].
  47. [47].
  48. [48].
  49. [49].
  50. [50].
  51. [51].
  52. [52].
  53. [53].
  54. [54].
  55. [55].
  56. [56].
  57. [57].
  58. [58].
  59. [59].
  60. [60].
  61. [61].
  62. [62].
  63. [63].
  64. [64].
  65. [65].
  66. [66].
  67. [67].
  68. [68].
  69. [69].
  70. [70].
  71. [71].
  72. [72].
  73. [73].
  74. [74].
  75. [75].
  76. [76].
  77. [77].
  78. [78].
  79. [79].
  80. [80].
  81. [81].
  82. [82].
  83. [83].
  84. [84].
  85. [85].
  86. [86].
  87. [87].
  88. [88].
  89. [89].
  90. [90].
  91. [91].
  92. [92].
  93. [93].
  94. [94].
  95. [95].
  96. [96].
  97. [97].
  98. [98].