Lipoproteins of Mycobacterium tuberculosis: an abundant and functionally diverse class of cell envelope components

Iain C. Sutcliffe , Dean J. Harrington
DOI: http://dx.doi.org/10.1016/j.femsre.2004.06.002 645-659 First published online: 1 November 2004

Abstract

Mycobacterium tuberculosis remains the predominant bacterial scourge of mankind. Understanding of its biology and pathogenicity has been greatly advanced by the determination of whole genome sequences for this organism. Bacterial lipoproteins are a functionally diverse class of membrane-anchored proteins. The signal peptides of these proteins direct their export and post-translational lipid modification. These signal peptides are amenable to bioinformatic analysis, allowing the lipoproteins encoded in whole genomes to be catalogued. This review applies bioinformatic methods to the identification and functional characterisation of the lipoproteins encoded in the M. tuberculosis genomes. Ninety-nine putative lipoproteins were identified and so this family of proteins represents ca. 2.5% of the M. tuberculosis predicted proteome. Thus, lipoproteins represent an important class of cell envelope proteins that may contribute to the virulence of this major pathogen.

Keywords
  • Bioinformatics
  • Genome
  • Lipoprotein
  • Mycobacterium tuberculosis
  • Periplasm
  • Virulence factor

1 Introduction

At the start of the twenty-first century tuberculosis (TB) remains one of the major infectious diseases of man with ca. 8 million active cases and ca. 2 million deaths annually [1, 2]. Furthermore, an estimated one third of the world's population is infected with latent TB and faces the possibility of reactivation disease. Disease in this population, especially when combined with the devastating impact of HIV co-infection, remains a significant issue in global disease control [1, 2]. Thus obtaining a greater understanding of the molecular basis of the virulence of Mycobacterium tuberculosis and its disease pathogenesis presents an urgent challenge to the scientific community. In this regard, the publication of the genome sequence of M. tuberculosis strain H37Rv [3] represents a contemporary landmark in TB research. Another major achievement has been in the definition of the composition and organisation of the mycobacterial cell envelope, which has provided important insights into the physiology and pathogenicity of M. tuberculosis [4–7]. The mycobacterial cell envelope is a complex structure dominated by the peptidoglycan-arabinogalactan-mycolic acid wall skeleton in which the mycolic acids are orientated so as to provide the foundation for an external permeability barrier. As such it seems likely that this lipid permeability barrier represents the outer boundary of a ‘pseudoperiplasmic’ compartment, which is defined at its internal aspect by the plasma membrane. However, the nature of the protein components and metabolic activities that may be present within this subcellular compartment remains largely unknown.

All bacteria apparently localise specific proteins to their cell envelopes by post-translational lipid modification to produce membrane-anchored lipoproteins (Lpp [8, 9]). In mycobacteria lipid modification is likely to represent an important mechanism by which proteins are localised within the cell envelopes, such that these proteins are located within the above described pseudoperiplasmic compartment. Indeed, Lpp of Gram-positive bacteria have been previously suggested to be functional equivalents of periplasmic proteins in Gram-negative bacteria [9, 10]. Given the likely significance of Lpp to bacterial physiology and the expectation that some Lpp may be virulence factors we have used a bioinformatic strategy to compile an inventory of putative Lpp of M. tuberculosis. The likely functions of these Lpp are discussed with particular reference to the contribution some of them may make during pathogenesis.

2 The lipoprotein biosynthetic pathway in M. tuberculosis

Lpp biogenesis is dependent on the presence of specific type II signal peptide sequences [8, 11–14]. The signal peptide directs preprolipoprotein export through the plasma membrane whereupon a diacylglyceride unit is added by thioether linkage to a crucial cysteine residue. The enzyme that carries out this lipidation reaction, prolipoprotein diacylglyceryl transferase (Lgt), is apparently an essential enzyme in Gram-negative bacteria but is dispensable in the Gram-positive bacteria studied to date [15, 16]. However, an lgt mutant of Streptococcus pneumoniae was attenuated in an animal model of disease [16]. Subsequent to the lipidation by Lgt, the signal peptide is cleaved by a specific prolipoprotein signal peptidase II enzyme (Lsp) at a cleavage site immediately preceding the lipidated cysteine, which consequently becomes the N-terminus of the mature Lpp. Lsp is also apparently dispensable for growth of Gram-positive bacteria in vitro [17–19] although lsp mutants of Listeria monocytogenes and Staphylococcus aureus have been shown to be attenuated in animal models [19–21].
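The consequence of the two processing steps (lipidation of the cysteine by Lgt, then cleavage by Lsp immediately before it) can be illustrated with a minimal sketch. It assumes the position of the lipobox cysteine is already known. The function name and the example precursor are hypothetical and do not correspond to any M. tuberculosis ORF.

```python
def process_prolipoprotein(precursor: str, cys_pos: int) -> tuple[str, str]:
    """Split a precursor into (signal peptide, mature Lpp).

    cys_pos is the 1-based position of the lipobox cysteine. Lsp cleaves
    immediately before this residue, so it becomes the N-terminus (+1)
    of the mature Lpp.
    """
    if not 1 < cys_pos <= len(precursor):
        raise ValueError("cysteine position lies outside the precursor")
    if precursor[cys_pos - 1] != "C":
        raise ValueError("residue at cys_pos is not a cysteine")
    signal_peptide = precursor[: cys_pos - 1]
    mature = precursor[cys_pos - 1 :]
    return signal_peptide, mature


# Hypothetical precursor, used only for illustration.
signal_peptide, mature = process_prolipoprotein("MKRLLAVLAALLLAGCSSDKP", 16)
```

In this toy case the signal peptide is the first 15 residues and the mature sequence begins with the (lipidated) cysteine.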

Several mycobacterial Lpp with type II signal peptides have been experimentally characterised as Lpp [14, 22] although in no case has there been direct chemical characterisation of the diglyceride-modified cysteine residue. However, consistent with the above biosynthetic pathway, putative lgt (Rv1614) and lsp (Rv1539) genes have been identified in the M. tuberculosis genome [3]. It is likely that the actions of these two enzymes are both necessary and sufficient for ensuring Lpp anchoring. In Gram-negative bacteria a third enzyme, lipoprotein N-acyl transferase (Lnt), adds an additional amide-linked fatty acid to the amino terminus of the mature Lpp [23]. However, the presence of this enzyme in M. tuberculosis needs to be clarified: although Rv2051c was originally annotated as a two-domain enzyme containing a putative Lnt domain, this protein has been characterised as a polyprenol monophosphomannose synthase (Ppm1, [24]). Although the putative Lnt domain is not needed for Ppm1 activity, on overexpression in Mycobacterium smegmatis it appeared to enhance the mannosyltransferase activity. Interestingly, the two domains of Ppm1 are encoded by separate, adjacent open reading frames in the genomes of other mycobacteria and corynebacteria. Thus the role of Rv2051c and its homologues in mycobacterial Lpp biogenesis remains unclear.

3 Identification of putative lipoproteins of M. tuberculosis

In contrast to the cleavage sites of type I signal peptides, there is considerable sequence conservation in the amino acids that immediately precede the lipidated cysteine in type II signal peptides [8, 12–14]. This sequence is commonly referred to as the ‘lipobox’ and its sequence characteristics are described in the Prosite pattern PS00013 ([25], http://www.expasy.ch/prosite/). We have since derived a revised sequence pattern (G+LPP) for the confident identification of putative Lpp sequences in the genomes of Gram-positive bacteria from the signal sequence features of thirty-three experimentally verified Gram-positive bacterial Lpp [14]. The recognition of a lipobox cysteine appropriately placed in relation to other typical signal peptide features has allowed the bioinformatic identification of genes encoding putative Lpp, revealing that Lpp are an abundant class of proteins, typically representing ca. 1.5% or more of the total predicted proteins in the sequenced bacterial genomes [13, 14, 19, 26–28].
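Pattern searches of this kind can be sketched by translating a Prosite-style pattern into a regular expression. The example below is a minimal illustration only: the converter handles the common Prosite elements (x, [..], {..}, repeat counts and the < and > anchors), and TOY_PATTERN is a hypothetical lipobox-like pattern written for this sketch. It is neither PS00013 nor the G+LPP pattern, and the test sequences are invented.

```python
import re
from typing import Optional


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError("empty pattern element")
        core, low, high = match.groups()
        if core in ("x", "X"):
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchored_start else "") + core + ("$" if anchored_end else ""))
    return "".join(parts)


def lipobox_cysteine_position(sequence: str, pattern: str) -> Optional[int]:
    """Return the 1-based position of the cysteine ending the match, or None."""
    match = re.search(prosite_to_regex(pattern), sequence)
    return match.end() if match else None


# Hypothetical pattern for illustration; NOT the published PS00013 or G+LPP pattern.
TOY_PATTERN = "<M-x(0,13)-[RK]-{DERK}(6,20)-[LVI]-[ASTVI]-[GAS]-C"
```

Because the toy pattern ends at the cysteine, the end of the match gives the cysteine position directly. A real analysis would use the published patterns through ScanProsite and would still need the signal peptide checks described in the text.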

In the present analysis the G+LPP pattern was used in a taxon-restricted pattern search to retrieve M. tuberculosis sequences in the Swiss-Prot/TrEMBL database ([29], http://ca.expasy.org/sprot/), using the ScanProsite tool ([30], http://ca.expasy.org/tools/scanprosite/). Except where discrepancies are referred to in the text below, all sequences were common to both the M. tuberculosis strain H37Rv [3] and strain CDC1551 [31] genomes and are referred to herein by their original gene designations (Rv No. [3]). Sequences unique to strain CDC1551 are referred to as annotated (MT No. [31]). The N-terminal features of the 59 sequences identified as matching the G+LPP pattern were re-examined using SignalP ([32, 33], http://www.cbs.dtu.dk/services/SignalP-2.0/) and prediction methods for identifying membrane-spanning domains (MSD), notably the TMHMM server ([34], http://www.cbs.dtu.dk/services/TMHMM-2.0/) and similar tools as described previously [14]. As well as the three previously described experimentally verified Lpp (Rv0432, Rv0934 and Rv3763), the G+LPP pattern identified 51 other probable Lpp sequences (Table S1 in the online supplementary material). Two sequences that had ambiguous signal peptide features (Rv0847 and Rv2945c) were retained as possible Lpp (Table S2 in the online supplementary material) whilst three sequences (MT2138.1, MT3476 and MT3814.1) wherein the predicted lipobox cysteine was inappropriately placed in relation to typical signal peptide features were considered false-positives, as described previously [14]. These sequences are notable in that all are subject to annotation discrepancies between the M. tuberculosis strain H37Rv and strain CDC1551 genomes [3, 31].

Following the above analysis, a pattern search with the Prosite pattern PS00013 identified an additional 32 M. tuberculosis sequences in the Swiss-Prot/TrEMBL database that had a putative lipobox that did not match the more restrictive cleavage site in the G+LPP pattern. Further examination of the N-terminal features of these sequences as above suggested that a substantial proportion (25%) should be excluded as false-positives whilst 24 should be retained and considered possible Lpp (Table S2 in the online supplementary material). These analyses confirmed the previous observation [14] that the stringency of the G+LPP pattern compared to PS00013 gives greater confidence (i.e. fewer false positives) when predicting Lpp signal peptides. Moreover, the pattern search with the G+LPP pattern identified six putative Lpp that were not recognised by the PS00013 pattern.

Finally, a variety of strategies, notably reference to the H37Rv genome annotation and to DOLOP [27], were used to identify putative Lpp sequences that were not retrieved by either of the above pattern searches. This approach identified 19 further sequences (Tables S3 and S4 in the online supplementary material). These included a large subset of 10 sequences with hydrophobic h-regions and lipobox cleavage sites that match the G+LPP pattern but with anomalously long signal peptide n-regions. These sequences were considered ‘anomalous probable’ Lpp (Table S3 in the online supplementary material). Rv2080, which has a typical lipobox cleavage site but a minor variation in its h-region sequence, was also included in this category (Table S3 in the online supplementary material). The remaining 8 sequences that had features consistent with anomalous type II signal peptides were considered ‘anomalous possible’ Lpp (Table S4 in the online supplementary material).

Analysis of the signal peptide features indicated that the proven and probable Lpp (Supplementary Table S1) have typical type II signal peptide characteristics with respect to their length (mean cysteine position 24.0 ± 4.1; range 16-33 amino acids; n= 54) and that most of the variation derives from variation in the length of the n-region (mean length 6.7 ± 3.6; range 2-15 amino acids; n= 54). Thus these signal peptides are typically shorter than those of exported proteins of M. tuberculosis, for which a median length of 32 amino acids (range 21–49; n= 28) has been reported [35]. The comparative shortness of Lpp signal peptides has been noted previously [13, 14]. Analysis of the frequency of amino acid usage in the lipobox cleavage site suggested some minor selectivity in comparison to the G+LPP pattern: a slightly increased preference for glycine at the –4 position (relative to the cysteine) was noted, whilst at the –3 position a slightly decreased preference for leucine was linked to an increase in the frequency of occurrence of valine and alanine (data not shown). As for Gram-positive bacterial Lpp generally, there was a notable preference for small amino acids at the +2 position following the lipobox cysteine, with alanine, glycine, serine or threonine present in 45/54 (83%) of the sequences. Thus, the putative Lpp differ from secreted proteins where a marked preference for proline (15 out of 28 [54%] sequences examined) at the +2 position has been reported [35].
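Summary statistics of this kind (mean and standard deviation of the cysteine position, and the proportion of sequences with a small residue at +2) can be computed as in the sketch below. The three records are hypothetical toy sequences, not entries from Supplementary Table S1, and the function name is an assumption made for illustration.

```python
from statistics import mean, stdev

SMALL_RESIDUES = set("AGST")  # alanine, glycine, serine, threonine


def summarise_signal_peptides(records):
    """records: iterable of (precursor sequence, 1-based lipobox cysteine position)."""
    records = list(records)
    positions = [cys_pos for _, cys_pos in records]
    # The cysteine is +1 of the mature Lpp, so +2 is the residue that follows it.
    plus_two = [seq[cys_pos] for seq, cys_pos in records if cys_pos < len(seq)]
    small = sum(residue in SMALL_RESIDUES for residue in plus_two)
    return {
        "n": len(records),
        "mean_cys_pos": mean(positions),
        "sd_cys_pos": stdev(positions),  # sample standard deviation
        "small_at_plus_two": small / len(plus_two),
    }


# Hypothetical precursors used only to exercise the function.
TOY_RECORDS = [
    ("MKRLLAVLAALLLAGCSSDKP", 16),   # +2 residue is S
    ("MRKTLLALAVAGLLTACGGPT", 17),   # +2 residue is G
    ("MNRSIALLTAVVGLSGCPQAE", 17),   # +2 residue is P
]
summary = summarise_signal_peptides(TOY_RECORDS)
```

Whether the ± values quoted in the text are sample or population standard deviations is not stated, so the choice of `stdev` here is an assumption.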

The recognition of the subset of ten anomalous probable Lpp signal sequences (Supplementary Table S3) containing long n-regions was of interest. The Rv0179c and Rv2518c sequences identified by searching with PS00013 (Supplementary Table S2) also belong to this group. The length of the n-region (mean 20.8 ± 5.6; range 16–33 amino acids; n= 12) of these sequences notably affected the position of the lipobox cysteine (mean cysteine position 38.6 ± 6.0; range 33–52; n= 12). As the h-region features of these putative Lpp are comparable to those of the proven/probable Lpp (data not shown) the length of this region is probably decisive in orientating the lipobox cysteine such that this critical amino acid can interact with the membrane-bound Lgt enzyme.
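The distinction between typical and anomalously long n-regions can be sketched as a simple length threshold. The sketch assumes that the n-region can be approximated as everything up to the last charged residue preceding the lipobox cysteine, which is a simplification and not the procedure used in this analysis. The threshold of 15 is taken from the 2-15 range reported above for the proven and probable Lpp, and both example sequences are hypothetical.

```python
CHARGED = set("DEKR")
TYPICAL_MAX_N_REGION = 15  # upper end of the range reported for proven/probable Lpp


def n_region_length(precursor: str, cys_pos: int) -> int:
    """Approximate n-region length (assumption: it ends at the last charged
    residue upstream of the lipobox cysteine at 1-based position cys_pos)."""
    upstream = precursor[: cys_pos - 1]
    charged_positions = [i + 1 for i, aa in enumerate(upstream) if aa in CHARGED]
    return max(charged_positions, default=0)


def classify_n_region(precursor: str, cys_pos: int) -> str:
    length = n_region_length(precursor, cys_pos)
    return "anomalous" if length > TYPICAL_MAX_N_REGION else "typical"


# Hypothetical examples: a short and a long n-region before the same h-region.
TYPICAL_EXAMPLE = ("MKRLLAVLAALLLAGCSSDKP", 16)
LONG_EXAMPLE = ("MSDTPRSGERAATDKPSR" + "LLAVLAALLLAG" + "CSSDKP", 31)
```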

Cumulatively, the above searches identified 99 sequences including three proven, 62 probable (Supplementary Tables S1 and S3) and 34 possible Lpp (Supplementary Tables S2 and S4). As such, putative Lpp represent ca. 2.5% of the M. tuberculosis predicted proteome. Forty-five of the 99 putative Lpp identified in the M. tuberculosis genome had a homologue (Supplementary Tables S1–S4) in the highly degenerated genome of M. leprae [36], including Rv0344c which has extensive amino acid identity (67% over 179 amino acids) to an unannotated M. leprae sequence. Twenty-three out of the 99 putative Lpp identified were homologous to pseudogenes in the M. leprae genome (Supplementary Tables S1–S4). The putative Lpp sequences were subjected to functional categorisation following BLAST sequence analysis [37] and analysis of sequence motifs and patterns including reference to the annotation at Swiss-Prot/TrEMBL, the curated Tuberculist server (http://genolist.pasteur.fr/TubercuList/index.html) and the Pfam database ([38], http://www.sanger.ac.uk/Software/Pfam/). GenTHREADER ([39], http://bioinf.cs.ucl.ac.uk/psipred/) was also used to support some predictions. Sequence alignments were made using ClustalW [40] accessed at Pôle Bio-Informatique Lyonnais (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_clustalw.html).
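The totals can be reconstructed from the counts reported for each search stage, as in the short tally below. All counts are taken from the text of this section. The variable names are illustrative, and the proteome size is only a back-calculation from the quoted ca. 2.5%, not a figure stated in this review.

```python
# Counts reported in the text for each search stage.
G_LPP_SEARCH = {"proven": 3, "probable": 51, "possible": 2, "false_positive": 3}
PS00013_ONLY = {"possible": 24, "false_positive": 8}
# Ten long n-region sequences plus Rv2080 (probable); eight anomalous possible.
OTHER_STRATEGIES = {"probable": 11, "possible": 8}

STAGES = (G_LPP_SEARCH, PS00013_ONLY, OTHER_STRATEGIES)


def total(category: str) -> int:
    """Sum one confidence category across all search stages."""
    return sum(stage.get(category, 0) for stage in STAGES)


proven, probable, possible = total("proven"), total("probable"), total("possible")
putative = proven + probable + possible

# Proportion of hits rejected as false positives for each pattern search.
g_lpp_fp_rate = G_LPP_SEARCH["false_positive"] / sum(G_LPP_SEARCH.values())
ps00013_fp_rate = PS00013_ONLY["false_positive"] / sum(PS00013_ONLY.values())

# Proteome size implied by "ca. 2.5%" (a back-calculation, not a reported value).
implied_proteome = putative / 0.025
```

The tally gives 3 proven, 62 probable and 34 possible Lpp (99 in total), with 3 of the 59 G+LPP hits rejected compared with 8 of the 32 additional PS00013 hits.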

4 Functional categorisation of putative Lpp in M. tuberculosis

4.1 Solute binding proteins (SBP) of ABC transport systems

Lpp SBP are abundant in the genomes of Gram-positive bacteria as components of ABC transport systems [14, 28, 41] and those of M. tuberculosis have been catalogued previously [42]. SBP are unique to prokaryotic ABC importer systems and deliver substrates to membrane-located permeases prior to transport into the cell [41–43]. The present analysis identified 17 likely Lpp SBPs (Table 1) including three (Rv1166, Rv1244 and Rv2585c) that had not been identified by Braibant et al. [42]. Eleven of these SBP genes were adjacent to genes encoding permease and ATP-binding components of typical putative ABC transport systems [42]. Several of these SBP have been investigated in relation to their potential role in virulence. PstS-1 and PstS-3 have received attention as vaccine candidates, following delivery as either protein or as DNA vaccines [44–47] whilst the ModA SBP is of interest as a signature-tagged transposon mutagenesis study found that modA disruption attenuated survival in the mouse lung [48]. Inactivation of the SubI SBP in Mycobacterium bovis BCG did not significantly attenuate the growth of mutants during mouse infection studies [49, 50]. However, analysis of the growth of transposon mutants has indicated that subI is essential for optimal growth on defined media [51]. This study also suggested that putative SBPs for peptides (Rv1166; Rv3666c) and sugars (Rv2041; Rv2833) are needed for optimal growth in vitro.

Table 1. Functional categorisation of M. tuberculosis putative Lpp

Each functional category is followed by its sub-categories and ORFs.

SBPs in ABC transport systems (categorised by their predicted substrates)
  • Iron: Rv3044 (FecB); Rv0265c (FecB2)
  • Molybdenum and Phosphate: Rv0928 (PstS3); Rv0932c (PstS2); Rv0934 (PstS1); Rv1857 (ModA)
  • Peptides: Rv0411c; Rv1166; Rv1244; Rv1280c; Rv2585c; Rv3666c; Rv3759c
  • Sugars: Rv1235; Rv2041c; Rv2833c
  • Sulphates: Rv2400c

Enzymes predicted to be involved in cell wall metabolism
  • Rv0399c; Rv0838; Rv1922; Rv2068c; Rv2864c; Rv2905; Rv3593

Enzymes predicted to be involved in degradative processes
  • Esterases: Rv0671; Rv3298c
  • Glycosyl hydrolase: Rv0237
  • Phosphorylase: Rv2293c
  • Proteinase/peptidases: Rv2224c; Rv2672; Rv0418; Rv0419

Other enzymes and metabolic activities
  • Copper oxidase: Rv0846c
  • FAD-linked oxidase: Rv2251
  • γ-Glutamyl transferase: Rv2394
  • Oxidoreductases and thioredoxins: Rv0132c; Rv0526; Rv1677; Rv3006
  • Phosphoglycerate mutase: Rv3390
  • Superoxide dismutase: Rv0432

Putative Lpp with roles in adhesion and cell invasion
  • Adhesin: Rv2873 (MPT83)
  • mce operon proteins: Rv0173 (Mce1E); Rv0593 (Mce2E); Rv1970 (Mce3E); Rv3495c (Mce4E)

Putative roles in signalling and related functions
  • Rv1009 (RpfB); Rv1270c; Rv1368 (LprF); Rv1411c; Rv1690 (LprJ); Rv1911c; Rv2403c; Rv2945; Rv3576

Unknown function
  • Inter-related Lpp of unknown function: Rv0483 and Rv2518c; Rv0583c and Rv1016c; Rv0604 and Rv2999; Rv1228 and Rv2341; the LppA paralogue family (Rv2543; Rv2544; Rv2796c; MT2619)
  • Other: Rv0179c; Rv0344c; Rv0381c; Rv0460; Rv0679c; Rv0847; Rv0962c; Rv1064c; Rv1252c; Rv1274; Rv1275; Rv1418; Rv1541c; Rv1799; Rv1881c; Rv1921c; Rv2046; Rv2080; Rv2116; Rv2138; Rv2171; Rv2270; Rv2290; Rv2330c; Rv2784c; Rv2843; Rv3016; Rv3244c; Rv3584; Rv3623; Rv3763 (19 kDa antigen); MT2627.1
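
As a consistency check, the ORFs listed in Table 1 can be counted per functional category and compared with the 99 putative Lpp reported in Section 3. The counts below were obtained by counting the entries in the table; the dictionary keys are shortened labels chosen for this sketch.

```python
# Number of ORFs listed under each functional category of Table 1.
CATEGORY_COUNTS = {
    "SBPs in ABC transport systems": 2 + 4 + 7 + 3 + 1,
    "Cell wall metabolism": 7,
    "Degradative processes": 2 + 1 + 1 + 4,
    "Other enzymes and metabolic activities": 1 + 1 + 1 + 4 + 1 + 1,
    "Adhesion and cell invasion": 1 + 4,
    "Signalling and related functions": 9,
    "Unknown function (inter-related)": 4 * 2 + 4,
    "Unknown function (other)": 32,
}

table_total = sum(CATEGORY_COUNTS.values())

# Share of each category among all putative Lpp, as a percentage.
category_share = {
    name: round(100 * count / table_total, 1)
    for name, count in CATEGORY_COUNTS.items()
}
```

The table entries sum to 99, matching the total given in the text, with the 17 SBPs forming the largest category of known function.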

Of the six apparently ‘orphan’ putative SBP that are not clearly associated with typical permease systems, GlnH (Rv0411c) has been shown to be necessary for growth in vitro [51] and may interact with ABC transporter components (Rv2563-GlnQ or Rv0072-Rv0073) located elsewhere on the chromosome [42]. The permease systems with which the three newly identified SBPs may interact cannot be identified, although it is intriguing that all are putative SBPs for amino acid or peptide substrates: as three complete ABC transport systems and the GlnH system(s) are present for such substrates [42] these SBPs could possibly interact with components encoded by other chromosomal loci.

The remaining two orphan SBPs (FecB2/Rv0265c and FecB/Rv3044) are both putative SBPs for iron(III)-siderophore substrates and, in the absence of other ABC transporter components for this substrate family, their functions remain unclear. However, it is possible that these Lpp could interact with either the membrane-associated mycobactin or the secreted carboxymycobactin forms of the M. tuberculosis iron-chelating siderophores and thereby participate in either iron transport or other aspects of iron homeostasis [52].

4.2 Lipoprotein enzymes

4.2.1 Enzymes involved in cell wall metabolism

Given the predicted localisation of Lpp at the interface of the cell membrane and the peptidoglycan layer, it was unsurprising that seven putative Lpp were identified with functions that may relate to cell wall metabolism (Table 1). These include the putative transpeptidase (penicillin-binding protein) Rv2864c and the putative peptidases Rv0399c and Rv1922. Rv0838 (LpqR) is a putative d-Ala d-Ala dipeptidase that is homologous to proteins in the VanX family that participate in glycopeptide resistance, including VanX of the vancomycin producer Amycolatopsis orientalis [53]. However, Rv0838 is not encoded as part of a typical glycopeptide resistance gene cassette and so its role remains unclear. Cumulatively, it can be speculated that these four putative Lpp may play roles in peptidoglycan crosslinking and remodelling and so it is notable that Rv0399c has been shown to be necessary for optimal growth in defined media [51].

The three other putative Lpp included in this category may contribute to cell wall metabolism or resistance to β-lactam antibiotics. Rv2068c encodes a previously characterised β-lactamase [54]. β-lactamase Lpp are present in other Gram-positive bacteria [9, 10]. Rv3593 is homologous to ORF12 from the clavulanic acid biosynthetic cluster of Streptomyces clavuligerus [55] and contains an SxxK motif typical of the acyltransferase superfamily [56]. As for Rv0399c, a likely housekeeping role for this protein is suggested by the recent observation that Rv3593 is necessary for optimal growth in vitro [51]. Likewise, Rv2905 is noted to exhibit similarities to β-lactamases and contains an SxxK motif [56].

4.2.2 Degradative enzymes

Several putative Lpp are predicted to be degradative enzymes including esterases (Rv0671 and Rv3298c, which are 36% identical to each other); proteases/peptidases (Rv0418, Rv0419, Rv2224c and Rv2672); a putative phosphorylase (Rv2293c) and a family 3 glycosyl hydrolase (Rv0237). Although the specific substrates of these putative enzymes remain to be determined, their predicted localisation to the M. tuberculosis pseudoperiplasm suggests they could be involved in nutrient metabolism. The Rv2224c putative protease is 38% identical to SlpD, an apparently essential putative Lpp protease of Streptomyces lividans [57] and it is notable that transcription of Rv2224c was induced ca. fourfold within infected macrophages [58].
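Identity figures such as those quoted above are derived from pairwise alignments. The sketch below shows one common way of computing percent identity from two already-aligned sequences. It assumes that gapped columns are excluded from the denominator, which may differ from the convention of the BLAST or ClustalW output used in this review. The aligned strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (percent identity, number of compared columns) for two
    aligned sequences of equal length. Gapped columns are skipped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # excluded from the denominator in this sketch
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, compared


# Hypothetical aligned fragments, used only for illustration.
identity, columns = percent_identity("MKT-LLAGC", "MRTALL-GC")
```

Reporting the number of compared columns alongside the percentage mirrors the "x% identity over n amino acids" form used in the text.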

4.2.3 Other enzymes and metabolic activities

One of the few well characterised Lpp of M. tuberculosis is the Cu,Zn superoxide dismutase (SodC, Rv0432; [59, 60]). This protein was demonstrated to be lipidated by radiolabelling of the protein with palmitic acid when expressed in E. coli and by comparison of the recombinant protein and the native protein on non-denaturing activity gel electrophoresis [60]. Consistent with this, SodC has been shown by immunogold electron microscopy to be localised to the cell envelope in M. tuberculosis [59] although it is difficult to determine if this represents localisation to the plasma membrane or the surface layers of the organism. SodC was also shown to be induced following phagocytotic uptake by macrophages. Investigation of a sodC deletion mutant has shown that this enzyme contributes to the ability of M. tuberculosis to resist oxidative stress either in culture or following phagocytosis into induced murine peritoneal macrophages in vitro [61], although a sodC mutant was unaffected for growth in activated murine bone marrow macrophages and in a guinea pig model of infection [62]. Cumulatively, these data suggest that cell envelope-localised SodC could represent an important front-line defence against oxidative stress during intramacrophage growth. However, it is also notable that sodC transcription may be switched off as M. tuberculosis enters the persistent phase associated with time points longer than 20 days in the mouse model of lung infection [63].

Rv3390 is a putative phosphoglycerate/bisphosphoglycerate mutase family member (Pfam PF00300). The M. tuberculosis H37Rv genome encodes nine putative members of this family and, aside from the participation of phosphoglycerate mutase in glycolysis, the functions of these proteins are unclear. The majority are cytoplasmic but the Rv3390 putative Lpp and Rv0754, a member of the PE-PGRS family [3], are predicted to be exported.

Rv2394 is a putative Lpp homologue of the periplasmic γ-glutamyltransferase of E. coli, which catalyses the cleavage of glutathione [64]. Glutathione is a highly abundant thiol tripeptide that acts as an antioxidant in both bacterial and mammalian cells. Cleavage of excreted glutathione by periplasmic γ-glutamyltransferases may act as a mechanism for recycling the amino acid constituents as nutrients [64]. However, in actinomycetes such as M. tuberculosis, the most abundant low molecular weight thiol antioxidant is mycothiol and glutathione is absent [65]. Thus Rv2394, with its predicted pseudoperiplasmic location, could be involved in metabolising glutathione derived from the host. This could represent a nutrient acquisition or signalling pathway as the adjacent ORF Rv2395 encodes a putative integral membrane protein of the OPT superfamily, members of which are possibly transporters for peptide signalling molecules [66].

Several putative Lpp have likely roles in redox reactions (Table 1). Of these, the Rv0132c possible Lpp has been annotated as a putative F420-dependent glucose-6-phosphate dehydrogenase due to its homology with an M. smegmatis enzyme [67]. However, the most significant homology resides in the N-terminal domain which is likely to interact with the F420 coenzyme [67] whereas their C-terminal regions show much lower homology. Thus it would seem prudent to consider this enzyme a putative F420-dependent oxidoreductase until its substrate specificity has been directly demonstrated.

The M. tuberculosis genome encodes two putative thioredoxin Lpp of unknown function. First, the possible Lpp Rv0526 is a thioredoxin-like protein and the adjacent ORF, Rv0525, encodes a putative cytochrome c biogenesis protein. Thus, these two proteins may interact to allow transfer of electrons across the cytoplasmic membrane and onto, as yet unknown, acceptors in the pseudoperiplasmic compartment, in a manner analogous to the action of the β and γ domains within the integral membrane protein DsbD in E. coli [68]. The Rv0524-Rv0529 locus has been shown to be necessary for growth in vitro [51]. The other putative Lpp thioredoxin (Rv1677, DsbF; thiol:disulphide interchange protein) exhibits high homology (54% identity over 137 amino acids) with the secreted antigen MPT53 (Rv2878c), which also contains a CXXC active site and which has recently been demonstrated to act as an oxidant [69].

M. tuberculosis possesses a putative Lpp multi-copper binding oxidase, Rv0846c. Recently, evidence has been presented that periplasmic multi-copper oxidases of Gram-negative bacteria may act as ferroxidases, oxidising Fe(II) to Fe(III) prior to Fe(III) uptake [70]. Combined ferroxidase/Fe(III) transport systems could thus represent an important route for iron acquisition by pathogens. Thus Rv0846c may be localised to the M. tuberculosis pseudoperiplasm in order to participate in iron metabolism (see also Section 4.1).

4.3 Lpp with putative roles in adhesion and cell invasion

4.3.1 MPT83 (Rv2873), a putative adhesin

One of the most extensively studied mycobacterial Lpp is the MPB83 protein of M. bovis BCG which has been characterised by Harboe, Wiker and co-workers. MPB83 exhibits extensive sequence homology to the secreted antigen MPB70 [71]. However, the mpb83 gene sequence was noted to encode a putative Lpp [72] and convincing evidence for lipidation of MPB83 has been presented [73, 74]. The mpb83 gene was also recognised to be located close to the mpb70 gene in the M. bovis BCG genome and their levels of expression are linked [72, 73, 75]. MPB83 has been immunolocalised to the surface of M. bovis BCG by both electron microscopy [73] and flow cytometry [74]. Thus, the MPB83 and MPB70 proteins represent highly homologous but differently localised proteins [74]. However, it has also been noted that MPB83 is released from the cells as both a mature, lipidated 25-26 kDa form and as a hydrophilic 22-23 kDa form [71, 74–76]. N-terminal sequencing of secreted forms of MPB83 from M. bovis culture supernatants indicates proteolytic cleavage of the MPB83 following residues Ser3 and Val23 of the mature Lpp sequence [77].

M. tuberculosis produces identical homologues of both MPB83 and MPB70, designated MPT83 (Rv2873, Table 1) and MPT70 (Rv2875), respectively [75]. Thus the evidence that MPB83 is a Lpp provides strong evidence that MPT83 is also likely to be a Lpp and it is notable that recombinant MPT83 remained cell-associated after cloning into M. smegmatis [75]. The previously noted correlation in the level of expression of MPB83 and MPB70 has been confirmed by genetic analysis of transcription on the Rv2871-Rv2875 locus [78].

Several structural features of MPT83/MPB83 are notable in comparison to their MPT70/MPB70 homologues. First, the mature MPT83/MPB83 proteins are not only anchored by lipid modification but also possess an N-terminal sequence extension of 32 amino acids. This sequence, which is absent from their secreted homologues, contains a threonine motif (T48T49) that permits O-linked trimannosylation in MPB83 [77]. Second, both the MPT83 and MPT70 sequences contain cysteine residues that could form an internal disulphide bond [79] and their highly homologous regions belong to the β-Ig-H3/fasciclin family (Pfam PF02469). This domain occurs in a variety of proteins from diverse taxa (e.g. man, plants, fungi, bacteria) and in many cases is present in multiple repeats [80]. MPT83 and MPT70 each contain a single copy of the domain. Several eukaryotic members of the fasciclin family are adhesins, notably osteoblast-specific factor 2, OSF-2. MPT83 exhibits ca. 30% amino acid identity with domains in OSF-2 and consequently it has been suggested that MPT83/MPT70 may be an adhesin involved in bone tropism [75, 81]. However, no direct role for MPT83 in bone adhesion has been demonstrated and the proposed link between MPB70/MPB83 and post-BCG vaccination osteitis [75, 81] has not yet been substantiated. Thus MPT83 should be considered a putative adhesin for an as yet unknown ligand. Whether MPT83/MPT70 function involves other cotranscribed genes such as the gene product of the intervening Rv2874, which is a DipZ family protein [78], also remains to be determined.

4.3.2 Putative lipoprotein members of the Mce family

In 1995, Arruda et al. [82] identified a M. tuberculosis locus, mce, that conferred on E. coli the ability to invade mammalian cells. Subsequently the M. tuberculosis genome sequence revealed that this locus was in fact part of an operon (Rv0167-Rv0174) comprising eight genes of which mce was the third. Moreover, three comparable operons were also evident elsewhere in the genome (Rv0587-Rv0594; Rv1964-Rv1972 and Rv3501c-Rv3494c respectively; [3]). In each case, the mce homologue is preceded by genes encoding putative membrane proteins and is followed by five genes that may have signal peptides or N-terminal hydrophobic sequences including a putative Lpp as the penultimate gene of each operon (Table 1). Moreover, the corresponding gene in the single M. leprae Mce operon (Mce1E; ML2593) encodes a putative Lpp [83].

All the MceA-F proteins from each operon contain an N-terminal region of sequence homology that is documented in the Pfam database entry PF02470. However, the functional inter-relationships of these proteins and their contribution to virulence remain unclear. The archetypal Mce1A is clearly linked with a mammalian cell invasion phenotype [82, 84, 85] and deletion of part of the N-terminal region common to the Mce family abolished the ability of recombinant Mce1A to direct uptake of latex beads into HeLa cells [85]. However, latex beads coated in Mce2A were not taken up into HeLa cells [85]. Moreover, Mce operons are present in both non-pathogenic mycobacteria [86] and other bacteria, notably S. coelicolor (Sco2414-Sco2421; [87]). Although the link between the function(s) of the Mce operons and virulence is not clear, it is apparent that the putative Mce Lpp are expressed and antigenic in vivo as sera from 4/10 tuberculosis patients (but none of 10 sera from BCG-vaccinated healthy controls) cross-reacted with recombinant Mce1E [88]. Although the Mce3 operon is deleted in M. tuberculosis complex organisms other than M. tuberculosis and Mycobacterium canettii [89], the Mycobacterium avium homologue of Mce3E (Rv1970) has also been shown to be expressed early during growth in macrophages [90].

4.4 Lipoproteins with putative roles in signalling and sensory functions

4.4.1 Rv1368 (LprF) and Rv1690 (LprJ), accessory proteins to the KdpD potassium sensor kinase

Using post-genomic technologies, the putative lipoproteins LprF (Rv1368) and LprJ (Rv1690) were recently identified as proteins that interact with the KdpD histidine kinase in the potassium-dependent sensing of osmotic stress [91]. Mutagenesis of the LprJ lipobox was reported to affect LprJ localisation, providing evidence that this protein is indeed a Lpp. However, Steyn et al. [91] predicted topologies for LprF and LprJ that are not supported upon re-analysis with a wider range of prediction tools: each can instead be predicted to have a typical Lpp topology, i.e. a membrane anchor and an extracytoplasmic domain (our data not shown). Such a topology is as consistent with the fusion protein localisation data of Steyn et al. as those proposed previously [91]. Moreover, it allows for the predicted formation of an intramolecular disulphide in LprJ and its homologues (see below). Cumulatively, it may be hypothesised that LprF and LprJ are membrane localised and that these proteins possibly act as sensors/receptors for signals in the extracytoplasmic region that interact with the integral membrane protein KdpD. This proposed interaction between extracytoplasmic components and the sensor kinase is analogous to that proposed for the interaction of the KapB lipoprotein with the KinB sensor kinase in Bacillus subtilis [92]. However, it remains difficult to resolve why LprF and LprJ were found to interact specifically with the putative cytoplasmic N-terminal domain of KdpD, as shown by yeast hybrid screens and SELDI-TOF mass spectrometry [91]. Clearly, the topologies of KdpD, LprF and LprJ require clarification.

The LprJ and LprF lipoproteins are themselves each representative of a family of M. tuberculosis proteins. LprF belongs to a family [93, 94] including three other putative Lpp (Rv1270c, Rv1411c and Rv2945c) which exhibit relatively low overall amino acid sequence homology (data not shown). The M. bovis homologue of Rv2945c has been cell surface localised by flow cytometry [95] and Rv1411c (LprG) is processed as a lipoprotein when cloned in E. coli [96]. Given the proposed sensor role of LprF described above, it is interesting that Rv1411c (LprG) has previously been proposed to act as a sensor for the Rv1410c P55 antibiotic efflux pump as the genes for these two proteins are located in an operon [94, 97]. A defined mutant in this operon has been demonstrated to be attenuated in a BALB/c mouse model of infection [98]. Rv1411c induced strong delayed-type hypersensitivity and Th1-type immune responses in immunised mice [96]. However, mice vaccinated with Rv1411c gave an unfavourable response in subsequent infectious challenge experiments [96]. Likewise, DNA vaccination of mice with the M. bovis homologue of Rv2945c was unsuccessful in conferring protective immunity [95].

LprJ is a member of a family of eight proteins in M. tuberculosis that exhibit very low amino acid sequence homology but which are characterised by the presence of two conserved cysteines. In contrast to LprJ, the other seven members of this family are predicted to be exported proteins. In each case, one of the conserved cysteines (Cys73 in LprJ) is central to the mature protein sequence whilst the other (Cys113 in LprJ) is part of a conserved YCP motif near the C-terminus. It is clearly a possibility that these two conserved cysteines may form a disulphide bond that would have a major influence on protein folding and, in the case of LprJ, topology. In this regard, it is significant that fusion proteins containing only amino acids 1-99 of LprJ (i.e. fusions that would disrupt the proposed Cys73-Cys113 disulphide bond) gave ambiguous localisation results [91].

4.4.2 The Rv1009 growth-promoting factor

Study of the Rpf resuscitation-promoting factor of Micrococcus luteus has led to the understanding that actinomycete bacteria secrete proteinaceous growth stimulating factors [99, 100]. The M. tuberculosis genome contains five genes which encode proteins containing highly conserved Rpf-like domains [101]. Unsurprisingly, four of these are predicted to be secreted proteins, whereas RpfB (Rv1009) is a putative Lpp. The mature domains of all five proteins were produced as recombinant proteins and shown to stimulate in vitro growth of M. bovis BCG when added exogenously in picomolar quantities and when late stationary phase cells were used as an inoculum [101, 102]. The mature protein sequence of Rv1009 contains three tandem repeats of a domain of unknown function (DUF348; Pfam PF03990) in the N-terminal half of the protein and the ca. 70 amino acid Rpf-domain represents the extreme C-terminus. It is plausible that the growth stimulatory properties of this domain are dependent in vivo on its release by proteolytic cleavage. The inter-relationship between these proteins and their growth promoting activities is clearly an area that demands further study, especially in the context of the ability of M. tuberculosis to enter and subsequently emerge from a persistent (latent) state in vivo. However, whilst rv1009 belongs to the sub-set of M. tuberculosis genes whose mutation by transposon-insertion mutagenesis led to slow growth on defined laboratory media [51], a subsequent study of a defined rv1009 deletion mutant revealed a small colony phenotype but no observable defect in growth and persistence in a mouse model of infection [103].

4.4.3 Other putative Lpp that may be involved in cell signalling or sensory systems

Rv1911c belongs to a family of microbial proteins that include the E. coli periplasmic protein YcbL and which are similar to eukaryotic phosphatidyl ethanolamine-binding proteins that are important in cellular signalling [104]. Rv1911c is highly homologous to the product of the adjacent Rv1910c gene, which encodes a predicted exported protein. The substrates bound by the different members of this bacterial protein family are likely to vary but are probably phosphorylated substrates, including phospholipid head-groups [104, 105]. Thus the Rv1911c and Rv1910c proteins may be associated with the outer face of the plasma membrane.

M. tuberculosis was one of the first bacteria noted to utilise a family of eukaryotic-like serine/threonine protein kinases that are proposed to respond to environmental signals via extracytoplasmic C-terminal sensory domains [3, 106]. The putative sensory domain of one of these kinases, PknH, exhibits significant sequence homology to the Rv2403c and Rv3576 (PknM) putative Lpp. Thus these two proteins may act as receptor proteins that interact with signalling systems.

4.5 Lipoproteins of unknown function

4.5.1 Rv3763, the 19 kDa antigen

Probably the most extensively studied Lpp of M. tuberculosis is the 19 kDa antigen (Rv3763; Table; [107]), for which convincing experimental evidence of lipidation has been presented [22]. This antigen has homologues in other mycobacteria including M. avium, M. intracellulare and M. leprae [108, 109]. However, the 19 kDa antigen was found to be absent from 2 out of 9 strains of M. tuberculosis [110]. In addition to its lipidation, evidence has been presented that the 19 kDa antigen is glycosylated and that this is dependent on five threonine residues located within an 8 amino acid sequence (T13-T20) at the N-terminus of the mature protein [111, 112]. Glycosylation may be a protective mechanism for retaining the 19 kDa antigen in its membrane-associated location ([112]; see Section 6 below). Despite extensive study of the immunobiology of the 19 kDa antigen, including its interaction with Toll-like receptor 2 and intramacrophage trafficking of released antigen (for examples see [113–118]), the function of this protein remains unknown.

4.5.2 Lipoprotein homologues of known mycobacterial antigens

The Rv0583c (LpqN) putative Lpp of M. tuberculosis is highly homologous to the MK35 antigen of Mycobacterium kansasii [119]. This latter protein was suggested to be a Lpp by Triton X-114 detergent partitioning studies and was shown to be strongly immunogenic in guinea pig delayed-type hypersensitivity tests. Despite the sequence homology between Rv0583c and MK35, it was noted that no reaction to recombinant MK35 was observed when the guinea pigs had been sensitised with M. tuberculosis H37Rv. A second M. tuberculosis putative Lpp, Rv1016c, exhibits distant sequence homology to Rv0583c and significant homology with the exported proline-rich antigen MTC28 [120]. Similarly, Rv2116 (LppK) exhibits significant sequence homology to the MTB12 (Rv2376c) secreted antigen [121].

4.5.3 The LppA paralogue family and other inter-related lipoproteins of unknown function

Although M. tuberculosis has been considered to exhibit relatively little genetic diversity it is clear from whole genome comparisons that this diversity may have been underestimated [3, 31]. One interesting polymorphism identified involves a family of closely related Lpp paralogues. The M. tuberculosis H37Rv genome encodes two adjacent putative Lpp, Rv2543 and Rv2544 (LppA and LppB respectively), that are 87% identical to each other. The corresponding region of the M. tuberculosis CDC1551 genome contains both of these paralogues (MT2618 and MT2620) and a third, MT2619. It appears that these three paralogues have arisen by gene duplication followed by loss of the MT2619 sequence from the H37Rv genome [31]. These three sequences are also distantly related to Rv2796c, another putative Lpp of unknown function (Table 1).

Two putative Lpp, Rv0483 and Rv2518c, belong to the recently described ErfK/YbiS/YcfS/YnhG family (Pfam PF03734). The function of these proteins remains unknown but may relate to the presence of a conserved region containing histidine and cysteine residues.

The Rv0604 (LpqO) and Rv2999 (LppY) probable Lpp are closely related proteins that also have significant homologies with other conserved hypothetical proteins. These proteins appear to consist of a fusion of duplicated domains but their function remains unknown. Aligning these M. tuberculosis proteins and their homologues identified a motif F-X(10,14)-G-[DE]-X(6)-E-X(18)-H-X-H-X(5)-P-X(5)-H which is conserved in both the N and C-terminal domains and in three clostridial proteins that each contain only a single copy of the domain.
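The motif above is written in a PROSITE-style notation, in which X denotes any residue, bracketed letters denote alternatives and numbers in parentheses give repeat counts. As a minimal sketch of how such a pattern might be used to scan a protein sequence, the following Python translates it into a regular expression. The function names are hypothetical and the sequence in the usage note is synthetic; this is not the search procedure used in the review, only an illustration of the notation.

```python
import re

# Motif exactly as quoted in the text (PROSITE-style notation).
MOTIF = "F-X(10,14)-G-[DE]-X(6)-E-X(18)-H-X-H-X(5)-P-X(5)-H"

# One pattern element: X, a single residue, or a bracketed set,
# optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(X|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Only the subset of the notation used by MOTIF is supported.
    """
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        token = "." if residue == "X" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(sequence: str, pattern: str = MOTIF) -> list[tuple[int, int]]:
    """Return 1-based (start, end) positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence.upper())]
```

Applied to a protein made of two duplicated domains, a scan of this kind would be expected to report two hits, one per domain. Because `finditer` returns non-overlapping matches only, overlapping occurrences would need a lookahead-based search instead.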

4.5.4 Other lipoproteins of unknown function

The Rv0679c putative Lpp belongs to a family containing other non-Lpp proteins of M. tuberculosis. Thus the protein exhibits significant homology to both the Rv0680c putative exported protein encoded by the adjacent gene and also to the Rv0314c protein. This latter protein is predicted to be a membrane protein with a cytoplasmic N-terminal domain and a C-terminal domain which is homologous to the Rv0679c/Rv0680c proteins. Thus, the predicted localisation of the conserved domains of these three proteins is consistent with an inter-related pseudoperiplasmic function.

In addition to the above, the M. tuberculosis genome contains 28 additional putative Lpp of unknown function (Supplementary Tables S1–S4). Of these the majority are conserved hypotheticals although in many cases it is noted that they share significant homology only with sequences within other mycobacterial genomes. Two sequences (Rv0962c and Rv1799) are apparently unique to M. tuberculosis. Moreover, four of these proteins of unknown function (Rv1274; Rv2138; Rv2999 and Rv3244c) were shown to be necessary for optimal growth in vitro [51]. Similarly, Rv1252c has been identified as an iron-induced gene of unknown function that is independent of IdeR regulation [122].

5 Lipoprotein localisation

It is clear from the above analyses that putative Lpp represent an abundant and functionally diverse sub-set of the M. tuberculosis proteome. It is predicted that the lipid modification of these proteins serves to anchor them to the outer face of the plasma membrane, as in other Gram-positive bacteria. However, in the context of contemporary models of the more complex architecture of the mycobacterial cell envelope [47], these proteins can be viewed as having a pseudoperiplasmic location, which draws parallels with the cell envelope organisation of Gram-negative bacteria. It remains to be determined if significant numbers of Lpp are associated with the outer mycolate-based lipid layers of the mycobacterial cell envelope. It is, however, noted that there is evidence for secretion or a peripheral localisation of all the well characterised Lpp in M. tuberculosis. Thus, the Rv0432 SOD has been immunolocalised to the peripheral layers of M. tuberculosis H37Rv [59] and the MPB83 homologue of MPT83 has been localised to the surface of M. bovis BCG (see Section 4.3.1; [73, 74]). Moreover, the PstS SBPs have been detected at the cell surface of M. bovis BCG by flow cytometry [123] and of M. tuberculosis by immunogold electron microscopy [124]. Since these SBPs are believed to deliver phosphate to plasma membrane-associated ABC permease components [42], the apparent surface association of these proteins is unexpected. Likewise the M. bovis BCG homologue of Rv2945c (LppX; Section 4.4.1) has been surface-localised by flow cytometry and this antigen is also released into culture supernatants [95]. Cumulatively these results may reflect the release of acylated Lpp (‘shedding’) or proteolytic cleavage downstream of the lipidated cysteine (‘shaving’), as proposed previously for B. subtilis Lpp [125].
Evidence for proteolytic release of the Pst-S1 Lpp has been obtained by N-terminal sequencing of antigen recovered from culture supernatants, which suggested a cleavage site preceding Ser3 of the mature Lpp sequence [126]. In addition, proteolytic release of the MPB83 antigen has been observed in M. bovis [77] and proteolytic release of the 19 kDa antigen was observed after cloning into M. smegmatis [112]. However, it is also clear that intact 19 kDa antigen can be released from cells since sub-cellular trafficking of this antigen within macrophages was directed by an acylation-dependent pathway distinct from that followed by live mycobacteria [117]. Thus, lipid modification may serve the primary purpose of retaining proteins at the mycobacterial plasma membrane but there may also be alternative pathways of Lpp processing that lead to their localisation within other subcellular compartments of either the bacterium or host cells.

It is also apparent from the above analyses that a substantial number of putative Lpp have significant paralogues in the M. tuberculosis secretome (Table 2). Whether this represents a form of functional compartmentalisation is an interesting question for future study.

Table 2

Putative Lpp with significant homologies to exported proteins of M. tuberculosis

Putative Lpp      Exported homologue       % identity (sequence alignment length)
Rv2873 (MPT83)    Rv2875 (MPT70)           73% (164)
Rv1911c           Rv1910c                  63% (206)
Rv1009 RpfB       Rv0867c RpfA^a           62% (74)
Rv1677            Rv2878c (MPT53)          54% (137)
Rv1690            Rv3354 (YCP family)^a    46% (115)
Rv1016c           Rv0040c (MTC28)          35% (167)
Rv2116            Rv2376c (MTB12)          29% (162)
Rv3495c (Mce4E)   Rv3496c (Mce4F)^a        25% (247)
Rv0679c           Rv0680c                  38% (96)

^a Representative of several exported proteins.
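Table 2 reports each relationship as a percentage identity over a stated alignment length. The source does not say how these values were calculated, so the following Python is only a sketch of one common convention: identical residues divided by the number of aligned columns, with gapped columns counted in the denominator. The function name is hypothetical and the sequences in the usage note are invented, not real M. tuberculosis proteins.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[float, int]:
    """Return (% identity, alignment length) for two pre-aligned sequences.

    Assumes both strings come from the same pairwise alignment and so
    have equal length. Columns where both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no residue columns")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns), len(columns)
```

For example, `percent_identity("MKT-LV", "MRTALV")` counts four identical columns out of six. Other conventions, such as dividing by the length of the shorter sequence or excluding gapped columns, give different values for the same alignment, which is why the alignment length is reported alongside each percentage.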

6 Glycosylation of lipoproteins

It is now clear that, like various other bacteria, M. tuberculosis is able to glycosylate proteins [77, 112, 127] and these glycoproteins include the 19 kDa antigen (Section 4.5.1) and the MPB83 Lpp (Section 4.3.1). In each of the chemically characterised mycobacterial glycoproteins O-mannosylated threonine residues are found in the proximity of proline residues but a precise sequence motif that directs glycosylation cannot yet be defined [77, 127]. Post-translational glycosylation may be linked to protein export as all three well-characterised glycoproteins of M. tuberculosis are exported. Moreover, an M. smegmatis expression system and Concanavalin A lectin binding assays provided experimental support for bioinformatic predictions that many other putative Lpp may also be glycosylated [128]. These data suggested that the Rv0432 superoxide dismutase, the Pst-S1 SBP and four other putative Lpp (Rv0411c; Rv1541c; Rv2270; Rv2341) contain sequence motifs that can direct mannosylation in a heterologous mycobacterial host.

The function of glycosylation remains unclear at present. It has been proposed that glycosylation of the 19 kDa antigen protects a proteolytically sensitive cleavage site so that the intact protein is retained by its lipid anchor [112]. However, glycosylation at the T48T49 motif does not prevent the proteolytic release of the 23-kDa form of the MPB83 antigen [77].

7 Concluding comments

The availability of genome sequences for strains of M. tuberculosis [3, 31] provides a major resource to underpin research on the mechanisms of virulence in this devastating bacterial pathogen. The present study suggests that putative Lpp could represent as much as 2.5% of the M. tuberculosis proteome, consolidating and extending the original analysis of Cole et al. [3]. Thus, Lpp are likely to represent a significant class of cell envelope proteins involved in interactions between the organism and the host. This study has provided a functional categorisation of these putative Lpp which identifies many inter-related protein sequences, including both putative Lpp families and relationships between putative Lpp and exported proteins. It is hoped that this analysis will provide the basis for novel lines of investigation into the biology of M. tuberculosis.

Addendum

The significance of Lpp to the virulence of M. tuberculosis has been confirmed recently by the important findings of Sander et al. [129] that inactivation of the Rv1539 Lsp gene (encoding the lipoprotein signal peptidase) impaired the ability of the mutant to replicate in cultured mouse macrophages and led to attenuation in a BALB/c mouse model of infection. Processing of the signal peptides of Rv1411c (LprG) and MPT83 was also disrupted in the lsp mutant, confirming the predictions that these proteins are indeed Lpp.

Appendix A Supplementary material

Supplementary data associated with this article can be found, in the online version at doi:10.1016/j.femsre.2004.06.002.

Table S1. Proven/probable Lpp sequences identified by pattern searching with the G+LPP pattern.

Acknowledgements

The authors are grateful to the Wellcome Trust and the Horserace Betting Levy Board for their support of work in our laboratories on bacterial lipoproteins. We thank Noel Carter (University of Sunderland) for extensive discussions on the topologies of LprF and LprJ.

References

  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].
  13. [13].
  14. [14].
  15. [15].
  16. [16].
  17. [17].
  18. [18].
  19. [19].
  20. [20].
  21. [21].
  22. [22].
  23. [23].
  24. [24].
  25. [25].
  26. [26].
  27. [27].
  28. [28].
  29. [29].
  30. [30].
  31. [31].
  32. [32].
  33. [33].
  34. [34].
  35. [35].
  36. [36].
  37. [37].
  38. [38].
  39. [39].
  40. [40].
  41. [41].
  42. [42].
  43. [43].
  44. [44].
  45. [45].
  46. [46].
  47. [47].
  48. [48].
  49. [49].
  50. [50].
  51. [51].
  52. [52].
  53. [53].
  54. [54].
  55. [55].
  56. [56].
  57. [57].
  58. [58].
  59. [59].
  60. [60].
  61. [61].
  62. [62].
  63. [63].
  64. [64].
  65. [65].
  66. [66].
  67. [67].
  68. [68].
  69. [69].
  70. [70].
  71. [71].
  72. [72].
  73. [73].
  74. [74].
  75. [75].
  76. [76].
  77. [77].
  78. [78].
  79. [79].
  80. [80].
  81. [81].
  82. [82].
  83. [83].
  84. [84].
  85. [85].
  86. [86].
  87. [87].
  88. [88].
  89. [89].
  90. [90].
  91. [91].
  92. [92].
  93. [93].
  94. [94].
  95. [95].
  96. [96].
  97. [97].
  98. [98].
  99. [99].
  100. [100].
  101. [101].
  102. [102].
  103. [103].
  104. [104].
  105. [105].
  106. [106].
  107. [107].
  108. [108].
  109. [109].
  110. [110].
  111. [111].
  112. [112].
  113. [113].
  114. [114].
  115. [115].
  116. [116].
  117. [117].
  118. [118].
  119. [119].
  120. [120].
  121. [121].
  122. [122].
  123. [123].
  124. [124].
  125. [125].
  126. [126].
  127. [127].
  128. [128].
  129. [129].