We have undertaken the inventory and assembly of the typical subunits of the ABC transporters encoded by the complete genome of Mycobacterium tuberculosis. These subunits, i.e. the nucleotide binding domains (NBDs), the membrane-spanning domains (MSDs) and the substrate binding proteins (SBPs), were identified on the basis of their characteristic stretches of amino acids and/or conserved structure. A total of 45 NBDs present in 38 proteins, of 47 MSDs present in 44 proteins and of 15 SBPs were found to be encoded by M. tuberculosis. Analysis of transcriptional clusters and searches of homology between the identified subunits of the transporters and proteins characterized in other organisms allowed the reconstitution of at least 26 complete (including at least one NBD and one MSD) and 11 incomplete ABC transporters. Sixteen of them were unambiguously classified as importers whereas 21 were presumed to be exporters. By searches of homology with already known transporters from other organisms, potential substrates (peptides, macrolides, carbohydrates, multidrugs, antibiotics, iron, anions) could be attributed to 30 of the ABC transporters identified in M. tuberculosis. The ABC transporters have been further classified in nine different sub-families according to a tree obtained from the clustering of their NBDs. Contrary to Escherichia coli and similarly to Bacillus subtilis, there is an equal representation of extruders and importers. Many exporters were found to be potentially implicated in the transport of drugs, probably contributing to the resistance of M. tuberculosis to many antibiotics. Interestingly, a transporter (absent in E. coli and in B. subtilis) potentially implicated in the export of a factor required for the bacterial attachment to the eukaryotic host cells was also identified. In comparison to E. coli and B. subtilis, there is an under-representation of the importers (with the exception of the phosphate importers) in M. tuberculosis. 
This may reflect the capacity of this bacterium to synthesize many essential compounds and to grow in the presence of few external nutrients. The genes encoding the ABC transporters occupy about 2.5% of the genome of M. tuberculosis.
ATP binding cassette transporter
ATP binding cassette (ABC) transporters, found in eukaryotes and prokaryotes, constitute a large superfamily of multi-subunit permeases that transport various molecules (ions, amino acids, peptides, antibiotics, polysaccharides, proteins, etc.) across biological membranes, with a relative specificity for a given substrate (for reviews, see [1, 2]). They consist of two hydrophobic membrane spanning domains (MSDs) associated with two cytoplasmic nucleotide binding domains (NBDs) (Fig. 1). In many bacterial ABC transporters, these four different domains are expressed as independent polypeptides encoded by genes organized in operon(s) or at least clustered in one DNA region, while eukaryotic ABC transporters are usually composed of a large multi-domain molecule encoded by a single gene. However, some bacterial ABC transporters containing two NBDs, two MSDs or one MSD and one NBD on one molecule have also been described [3–6].
Fig. 1. Topological organization of the prototypical ABC transporters. ABC transporters are classified as importers and exporters, depending on the direction of translocation of their substrate (indicated by an arrow). The prokaryotic prototype is composed of two membrane spanning domains (MSDs) and two nucleotide binding domains (NBDs) expressed as independent polypeptides. SBP indicates the presence of a substrate binding protein, usually present in importers. The eukaryotic prototype is formed by a large multi-domain molecule.
ABC transporters are classified as importers and exporters depending on the direction of translocation of their substrate. Importers are found exclusively in prokaryotes and are involved in the uptake of extracellular molecules (e.g. the maltose permease of Escherichia coli or the oligopeptide transport system of Streptococcus pneumoniae). They are usually associated with a high-affinity extra-cytoplasmic substrate binding protein (SBP), either located in the periplasm of Gram-negative bacteria or maintained by an NH2-terminal lipo-amino acid anchor in the vicinity of the cytoplasmic membrane of Gram-positive bacteria (Fig. 1) [10, 11]. Exporters are found in both prokaryotes and eukaryotes, where they export molecules from the cytoplasm (Fig. 1). Prominent eukaryotic exporters include, among others, the human P-glycoprotein, which is associated with the phenomenon of multiple drug resistance in tumor cells, and the cystic fibrosis gene product (CFTR), which is mutated in patients with cystic fibrosis (for reviews, see [12, 13]). Prokaryotic exporters include protein, peptide, drug, antibiotic and polysaccharide transport systems (e.g. the capsular polysaccharide exporter KpsMT of E. coli or the erythromycin resistance system of Staphylococcus epidermidis) (for reviews, see [15, 16]). Some of those prokaryotic exporters (for example the E. coli α-hemolysin and the Erwinia chrysanthemi protease exporters [17, 18]) also require additional proteins, such as outer membrane proteins and/or proteins thought to connect their inner and outer membrane components.
The NBDs of ABC transporters bind ATP and couple ATP hydrolysis to the transport process. They are the best conserved domains of the transporters, all showing about 30–40% sequence conservation regardless of their substrate specificity or origin. Nearly all NBD-containing proteins are associated with the transport of substrate across biological membranes, although the DNA excision repair enzyme UvrA, the yeast EF-3 translation elongation factor and the yeast regulatory protein GCN20 provide notable exceptions [19–21]. NBDs possess some highly conserved stretches of amino acids: (1) the WalkerA and the WalkerB motifs, which form an ATP binding pocket typical of the ABC transporters but also of other ATP hydrolyzing proteins; (2) the ABC transporter family signature, a characteristic pattern located between the two Walker A and B motifs [22, 23]; (3) a six-amino acid motif, recently described by Linton and Higgins, downstream of the WalkerB motif. This short motif contains a highly conserved histidine residue previously shown to be essential in the transport process of several ABC transporters [24–27] (Table 1).
Prokaryotic membrane lipoprotein lipid attachment site
Prosite: PDOC00013 [11, 33]
aSingle amino acid letter code. Each element in a pattern is separated from its neighbor by a hyphen. ‘x’ is used for any amino acid. ‘h’ is used for any hydrophobic amino acid. ‘+’ and ‘−’ are used for positively and negatively charged residues, respectively. Square brackets list the amino acids acceptable at a given position. Curly brackets list the amino acids that are not accepted at a given position. A number in parentheses indicates how often the preceding element is repeated in the pattern. Two numbers in the same parentheses indicate the minimum and maximum number of times the preceding element may be repeated, respectively.
The MSDs consist of four to eight transmembrane α-helices (usually six) forming a channel allowing the translocation of the substrate through the membrane. MSDs are less conserved than NBDs, but some of them possess characteristic patterns: (1) the EAA loop, a pattern found between the penultimate and ante-penultimate transmembrane α-helices of some MSDs, which probably constitutes a site of interaction with the NBDs [28–31]; (2) the ABC-2 type transport system integral membrane protein signature, a motif found in the C-terminal section of some MSDs belonging to a sub-family of bacterial ABC transporters catalyzing export of drugs and carbohydrates (Table 1).
The SBPs of bacterial ABC transporters present little amino acid sequence similarity, but in Gram-positive bacteria they possess a membrane lipoprotein lipid attachment site next to a hydrophobic signal sequence (Table 1). This latter characteristic was also shown experimentally to be present in PstS-1, a mycobacterial SBP implicated in the import of inorganic phosphate.
During the last 4 years, entire genomes of both prokaryotic and eukaryotic living organisms have been sequenced. Systematic genomic analyses of E. coli, B. subtilis and Saccharomyces cerevisiae identified 57, 78 and 29 ABC transporters in each of these species, respectively [4, 5, 34]. This indicates that at least in prokaryotes, ABC transporters constitute one of the largest superfamilies of paralogous proteins.
The complete sequence of the genome of Mycobacterium tuberculosis (strain H37Rv), the causative agent of tuberculosis, has now been determined, but a more systematic analysis of the available data is necessary to improve our understanding of the biology of this bacterium, which is still responsible for more deaths than any other infectious agent [35–37]. Due to the importance of bacterial ABC transporters for the uptake of a large variety of essential nutrients, but also for the export of drugs or secretion of virulence factors, proteases and toxins, we decided to make a systematic inventory and analysis of the M. tuberculosis ABC transport systems. The data reported here could be useful for planning the experimental work necessary for the physiological characterization of these transporters and for the design of new therapeutic agents.
Our analysis is based on structural similarities between ABC transporters in all living organisms. Knowing that NBDs are the most conserved subunits of ABC transporters, we began the analysis by searching for M. tuberculosis proteins containing their characteristic stretches of amino acids (Table 1). As genes encoding subunits of ABC transporters are generally clustered in the chromosome [1, 2], we then searched for MSD and SBP encoding genes in the neighborhood of those encoding the previously identified NBDs. By searching for homology between each subunit of a potential ABC transporter and proteins characterized in other organisms, we then predicted the function of, and the substrate transported by, the identified permeases.
2 Identification of the nucleotide binding domains
To identify the NBDs of the M. tuberculosis ABC transporters, we scanned all the proteins predicted to be encoded by the genome of M. tuberculosis for their content of the WalkerA, the WalkerB and the ABC transporter family signature motifs (Table 1). As the ABC transporter family signature motif in NBDs is always located between the two Walker A and B motifs (about 100 residues downstream of the WalkerA motif and 10 residues upstream of the WalkerB motif), we then checked whether the identified proteins contain each of these three motifs at the correct relative positions. The Tuberculist server from the Pasteur Institute (http://bioweb.pasteur.fr/GenoList/TubercuList/) was used as a source of M. tuberculosis proteins and the three above patterns were searched with the protein pattern search program provided by this server. Allowing one mismatch in the WalkerA, in the WalkerB or in the ABC transporter family signature motifs, 43 potential NBDs in 36 different proteins were identified. Seven of them, Rv0194, OppD, Rv1473, UvrA, Rv2326c, Rv2477c and DppD, contain duplicate NBDs (Table 2). Comparison of the amino acid sequences of the 36 identified proteins with characterized NBDs of ABC transporters from other organisms (BlastP) confirmed that they belong to the ABC superfamily. However, one of them (UvrA) is unrelated to any transporter of the ABC superfamily. It is highly similar to the subunit UvrA of the ABC excision nuclease from many bacteria, a member of the ABC family which plays a role in the repair of UV-damaged DNA. In Table 2, we classified these proteins on the basis of the number of their NBDs and following the numerical nomenclature of Cole et al.
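The scanning strategy described above (pattern search with a tolerated mismatch, followed by a check of the relative order of the motifs) can be illustrated with a minimal Python sketch. This is not the TubercuList pattern search program, whose implementation is not described here. The Walker A string is the commonly cited consensus and is an assumption, since the patterns of Table 1 are not reproduced in this text; the function names are hypothetical. The parser handles only single letters, ‘x’, square and curly brackets and fixed repeat counts, not the ‘h’, ‘+’ and ‘−’ symbols or repeat ranges.

```python
import re

# Commonly cited Walker A consensus (assumption: not copied from Table 1).
WALKER_A = "[AG]-x(4)-G-K-[ST]"

def parse_prosite(pattern):
    """Expand a PROSITE-style pattern into one (allowed_set, negate) pair per position."""
    positions = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, repeat = m.group(1), int(m.group(2) or 1)
        if token == "x":
            allowed, negate = None, False            # any residue is accepted
        elif token.startswith("["):
            allowed, negate = set(token[1:-1]), False  # listed residues accepted
        elif token.startswith("{"):
            allowed, negate = set(token[1:-1]), True   # listed residues rejected
        else:
            allowed, negate = {token}, False
        positions.extend([(allowed, negate)] * repeat)
    return positions

def find_motif(sequence, pattern, max_mismatches=1):
    """Return (start, mismatches) for every window within the mismatch budget."""
    positions = parse_prosite(pattern)
    hits = []
    for start in range(len(sequence) - len(positions) + 1):
        mismatches = 0
        for (allowed, negate), residue in zip(positions, sequence[start:]):
            accepted = allowed is None or ((residue in allowed) != negate)
            if not accepted:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # loop finished without exceeding the budget
            hits.append((start, mismatches))
    return hits

def has_nbd_layout(sequence, walker_a, signature, walker_b, max_mismatches=1):
    """True if some Walker A hit precedes a signature hit that precedes a Walker B hit."""
    a_hits = find_motif(sequence, walker_a, max_mismatches)
    s_hits = find_motif(sequence, signature, max_mismatches)
    b_hits = find_motif(sequence, walker_b, max_mismatches)
    return any(a < s < b
               for a, _ in a_hits for s, _ in s_hits for b, _ in b_hits)

# Toy usage on an invented peptide (not a real M. tuberculosis sequence).
exact_hits = find_motif("MMGPSGSGKSTLL", WALKER_A, max_mismatches=0)
relaxed_hits = find_motif("MMGPSGSGRSTLL", WALKER_A, max_mismatches=1)
```

A stricter version would also test the approximate spacing quoted above (about 100 residues between the WalkerA motif and the signature, about 10 between the signature and the WalkerB motif). With short patterns, a one-mismatch budget produces many spurious hits, which is one reason the candidates were subsequently checked by BlastP.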
Table 2. List of M. tuberculosis proteins containing typical motifs of NBDs
aProtein names are those attributed by Cole et al.
bThe amino acid sequences of the identified conserved motifs are surrounded by numbers indicating their position in the proteins. Bold letters indicate the presence of conserved amino acids in nearly all M. tuberculosis NBDs. Amino acids boxed in grey are those not matching the consensus patterns.
To identify possible additional NBDs not selected by the above mentioned strategy, a search of homology (BlastP) between all proteins of M. tuberculosis H37Rv and three representative NBDs of M. tuberculosis (PstB, SugC and CydC, see Table 2) was done. This search, performed at the M. tuberculosis blast server of the Sanger Centre (http://www.sanger.ac.uk/), allowed the identification of the two additional proteins Rv1463 and Rv1667c, each containing one NBD (Table 2). Rv1463 is composed of one NBD with two mismatches in the ABC transporter family signature while Rv1667c possesses a truncated NBD with no WalkerA motif and one mismatch in the ABC transporter family signature. In the genome of M. tuberculosis, the gene encoding Rv1667c is preceded by the one encoding Rv1668c. Analysis of Rv1668c showed that this protein contains one complete NBD followed by a second WalkerA motif (Table 2). Close inspection of the Rv1668c and Rv1667c encoding genes revealed that the start codon of Rv1667c precedes the stop codon of Rv1668c by only one nucleotide. The addition of one nucleotide just before the beginning of the start codon of Rv1667c allows the complete fusion of Rv1668c and Rv1667c in one protein. In this protein, the second WalkerA motif previously found in Rv1668c is now correctly positioned with the ABC transporter family signature, the WalkerB and the Linton and Higgins motifs of Rv1667c, forming a complete NBD. This suggests that Rv1668c and Rv1667c originated from one single gene encoding a protein composed of two NBDs. It is not known if the missing nucleotide is a simple sequencing error or a mutation that appeared during evolution.
By our analysis, a total of 45 NBDs in 38 different proteins were found to be encoded by the M. tuberculosis genome (Table 2). The alignment of the conserved motifs reveals that all mycobacterial NBDs contain a glycine at the first position of the WalkerA motif and a serine at the second position of the ABC signature, whereas NBDs from other species could also accept an alanine at each of these two positions (Table 2). The absence of an alanine at these two positions in M. tuberculosis NBDs is not due to its codon usage. Nearly all of these NBDs also contain at a correct location the motif recently described by Linton and Higgins, containing at the fifth position the highly conserved histidine implicated in the function of the ABC transporters (Table 2). The conservation of this histidine also suggests an essential role of this residue in the transport process of the M. tuberculosis ABC transporters.
3 Identification of the membrane-spanning domains
As MSDs are less conserved than NBDs and because their signature motifs are only found in some sub-families of MSDs, we based our strategy of identification on the fact that all MSDs are integral transmembrane proteins composed of four to eight α-helices, and on the fact that their encoding genes are usually organized in an operon with those encoding NBDs. Therefore, we first examined whether proteins encoded by genes adjacent or close to the previously identified NBD encoding genes possess putative transmembrane domains. To that end we used the TMHMM (v. 0.1) Server of the Center for Biological Sequence Analysis at the Technical University of Denmark (http://www.cbs.dtu.dk/services/TMHMM-1.0/), which also predicts the location and orientation of the α-helices in the membrane spanning proteins. The amino acid sequences of the identified proteins were then compared with those of characterized MSDs from ABC transporters of other organisms (BlastP) contained in the non-redundant protein database from the NCBI. This allowed the selection of a first set of 32 proteins which, with the exception of Rv0987 and Rv1217c, are composed of four to eight predicted transmembrane α-helices. Rv0987 and Rv1217c contain 10 and 12 α-helices, respectively. They thus probably consist of two fused MSDs. In Table 3, we classified these proteins on the basis of the number of their MSDs and following the numerical nomenclature of Cole et al.
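To give a feel for the helix-counting step, the following Python sketch counts hydrophobic stretches with a Kyte-Doolittle sliding-window profile. It is deliberately a much simpler technique than the hidden Markov model used by TMHMM and is not a substitute for it. The window length (19) and threshold (1.6) are the values conventionally used with this scale and are assumptions here, as are the function names.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window; raises KeyError on non-standard residues."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def count_hydrophobic_segments(sequence, window=19, threshold=1.6):
    """Count maximal runs of consecutive windows whose mean reaches the threshold."""
    segments = 0
    inside = False
    for value in hydropathy_profile(sequence, window):
        if value >= threshold and not inside:
            segments += 1       # a new hydrophobic stretch begins
            inside = True
        elif value < threshold:
            inside = False      # the current stretch (if any) has ended
    return segments

def in_single_msd_range(n_helices):
    """Four to eight helices is the range quoted in the text for one MSD."""
    return 4 <= n_helices <= 8

# Toy usage on an artificial sequence with two hydrophobic stretches.
toy = "L" * 20 + "D" * 30 + "I" * 20
n_segments = count_hydrophobic_segments(toy)
```

Such a profile tends to merge closely spaced helices and gives no information on orientation, so its counts should be read as rough estimates only.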
Table 3. List of M. tuberculosis proteins containing typical characteristics of MSDs
aProtein names are those attributed by Cole et al.
bMultiple domains expressed in a single protein are separated by a hyphen.
cNumbers of α-helices were predicted using the TMHMM (v. 0.1) server. Positions of the beginning of the first α-helix and of the end of the last α-helix are in parentheses.
dThe amino acid sequences of the identified motifs are surrounded by numbers indicating their position in the proteins. Amino acids boxed in gray indicate the presence of conserved amino acids found in nearly all M. tuberculosis MSDs. Amino acids in italics are those not matching the consensus patterns.
As some ABC transporters are known to contain MSDs fused to NBDs [4, 5, 34], we completed our search of MSDs by screening the M. tuberculosis proteins containing the typical motifs of NBDs (Table 2) for the presence of transmembrane α-helices. Ten of those proteins were also shown to possess one or more MSDs. Among these proteins, eight (Rv1273c, Rv1348, Rv1349, CydC, CydD, Rv1747, Rv1819c and Rv2326c) were found to contain four to six α-helices; one (Rv1272c) was predicted to contain three α-helices using the TMHMM server but four α-helices were predicted using the TMpred program (http://www.ch.embnet.org/software/TMPRED_form.html); and one protein (Rv0194) was found to contain nine α-helices (Table 3). In most of these proteins, the region containing the α-helices constitutes only one MSD which is located before the NBD, with the exception of Rv1747, which contains an MSD located after its NBD, and Rv0194, which contains two putative MSDs. This Rv0194 protein, constituted of two NBDs and two MSDs, possesses the four typical domains of an ABC transporter, fused and expressed in one single protein, a kind of arrangement mostly encountered in ABC transporters from higher eukaryotes.
We then searched for the occurrence of the EAA loop and of the ABC-2 type transport system integral membrane protein signature among the 42 M. tuberculosis proteins described in Table 3. Although the EAA loop is not present in all MSDs and the ABC-2 type transport system integral membrane protein signature is only found in some MSDs belonging to a sub-family of bacterial ABC transporters catalyzing export of drugs and carbohydrates, we were able to find these motifs in some of the previously identified proteins. Allowing two mismatches, 22 proteins were found to contain the EAA loop consensus sequence: 16 proteins contain the motif correctly located in a cytoplasmic loop between the penultimate and ante-penultimate α-helices; one protein (DppC) contains three transmembrane α-helices after the EAA cytoplasmic loop instead of the usual two; three proteins (PstC, Rv2040c and SugA) contain the motif in a predicted extracellular loop; one protein (FtsX) possesses this motif in a transmembrane domain; and one protein (Rv1686c) contains the EAA loop after the first α-helix. As the motif found in Rv1686c is located in the first part of the protein, it should not have any role in the interaction with an NBD. The EAA motif is never found in proteins consisting of an MSD fused to an NBD, supporting the hypothesis that this site has a role in the interaction between the MSD and the NBD. The ABC-2 signature was only found in the DrrB and DrrC proteins, suggesting their role in the export of drugs or carbohydrates (Table 3).
The list of MSD-containing proteins was finally completed by screening all M. tuberculosis proteins for the presence of an EAA loop motif or an ABC-2 signature. The Tuberculist server from the Pasteur Institute (http://bioweb.pasteur.fr/GenoList/TubercuList/) was used as a source of M. tuberculosis proteins and the above patterns were searched with the protein pattern search program provided by this server. Two proteins, UspA and UspE, not genetically associated with any NBD were thus detected (Table 3). These two proteins contain an EAA loop consensus sequence in a correct location. Our analysis predicts a total of 47 MSDs found in 44 proteins encoded by the M. tuberculosis genome (Table 3).
4 Identification of the substrate binding proteins
Our strategy for finding the SBPs of the importers was based on the facts that: (1) in Gram-positive bacteria and probably also in Mycobacteria, SBPs are lipoproteins containing a prokaryotic membrane lipoprotein lipid attachment site (Table 1); (2) the genes encoding the SBPs are usually organized in an operon with those encoding NBDs and MSDs. Therefore, we first screened all the proteins encoded by genes located in the vicinity of those encoding the other identified subunits of the ABC transporters (Tables 2 and 3), for the content of a prokaryotic membrane lipoprotein lipid attachment site. To perform this search, the PATTINPROT search program from the Network Protein Sequence Analysis server of the Pôle Bio-Informatique Lyonnais was used (http://pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_pattinprot.html). Allowing one mismatch, 12 proteins were found to contain this motif at their N-terminal extremity (Table 4). All these proteins showed some similarity with known SBPs from other organisms (BlastP in the non-redundant database of the NCBI). To complete our search and to detect other putative SBPs encoded by genes not associated with genes encoding MSDs or NBDs, we then screened all the M. tuberculosis proteins, this time for a perfect match with this prokaryotic membrane lipoprotein lipid attachment site (protein pattern search program from the Tuberculist server). This site was detected at the N-terminal extremity of three additional proteins, GlnH, FecB and FecB2, presenting similarities with known SBPs from other organisms as well (Table 4).
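The perfect-match screen for the lipid attachment site can be sketched by translating a PROSITE-style pattern into a regular expression. The code below is an illustration only, not the PATTINPROT or TubercuList program. The pattern string is given as we recall it from the PROSITE entry for the prokaryotic membrane lipoprotein lipid attachment site and should be checked against Table 1 or PROSITE before use; the limit on the position of the cysteine (35) is likewise an assumption standing in for "at the N-terminal extremity", and all function names are hypothetical.

```python
import re

# Assumed pattern for the lipid attachment site (verify against PROSITE/Table 1).
LIPOBOX_PATTERN = "{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C"

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into Python regular-expression syntax."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(
            r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, low, high = m.groups()
        if token == "x":
            core = "."                          # any residue
        elif token.startswith("{"):
            core = "[^" + token[1:-1] + "]"     # excluded residues
        else:
            core = token                        # single letter or [..] class
        if low and high:
            core += "{%s,%s}" % (low, high)     # repeat range
        elif low:
            core += "{%s}" % low                # fixed repeat count
        parts.append(core)
    return "".join(parts)

# A lookahead lets overlapping occurrences be reported as well.
LIPOBOX_REGEX = re.compile("(?=(" + prosite_to_regex(LIPOBOX_PATTERN) + "))")

def find_n_terminal_lipobox(sequence, max_cys_position=35):
    """Return the 1-based position of the lipobox cysteine, or None if absent."""
    for match in LIPOBOX_REGEX.finditer(sequence):
        # The cysteine is the last residue of the matched stretch.
        cys_position = match.start() + len(match.group(1))
        if cys_position <= max_cys_position:
            return cys_position
    return None

# Toy usage on an invented signal peptide (not a real M. tuberculosis protein).
toy_position = find_n_terminal_lipobox("MKRSLLAALLLVAGCSSDK")
```

The one-mismatch search used for the genes located near NBD or MSD encoding genes cannot be expressed as a plain regular expression; a position-by-position comparison with a mismatch counter would be needed for that step.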
aProtein names are those attributed by Cole et al.
bBlack boxes indicate the presence of conserved amino acids found in nearly all M. tuberculosis potential SBPs. Amino acids boxed in grey are those not matching the consensus sequence of the lipid attachment site.
5 Reconstitution and classification of the Mycobacterium tuberculosis ABC transporters
The genome of M. tuberculosis encodes at least 45 NBDs (in 38 different proteins), 47 MSDs (in 44 different proteins) and 15 SBPs that we tried to assemble as multi-subunit ABC transporters. Potential substrates were then allocated to the mycobacterial transporters following the homology of each of their subunits with characterized permeases from other organisms (BlastP).
As genes encoding subunits of an ABC transporter are generally organized in operon(s) or clustered in the same DNA region, we first organized the genes encoding the proteins identified in Tables 2–4 on the basis of their proximity in the genome of M. tuberculosis (Table 5).
aAnnotated black arrows indicate the direction of transcription of the genes encoding the different subunits of the transporters. Thin lines represent intergenic regions.
bDomains expressed in different proteins are ordered according to the organization of their encoding genes and are separated by breaks. Multiple domains expressed in a single protein are linked by a hyphen. Domains expressed in proteins predicted to function as dimers in the transporter are indicated between parentheses followed by the number 2.
*The complex organization of these transporters is described in the text.
Prokaryotic ABC transporters consisting of two NBDs and two MSDs, encoded by four genes clustered in one DNA region, could not be found in M. tuberculosis. As many bacterial transporters are also known to be composed of homodimeric NBDs and/or homodimeric MSDs each encoded by a single gene [39, 40], or of proteins containing fused domains (fusions between two NBDs, two MSDs or between an MSD and an NBD) [4, 5], we then searched for clusters containing at least one NBD encoding gene and one MSD encoding gene. Twenty-two clusters of genes were identified (Table 5).
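The grouping logic used for this reconstitution can be summarized in a short Python sketch: genes are sorted by position, neighbors separated by a small gap are merged into a cluster, and a cluster is retained when it encodes at least one NBD and one MSD; the presence of an SBP then marks it as an importer, the others being exporters by default. The gene names, coordinates and the `max_gap` threshold below are hypothetical, since no distance criterion is stated in the text, and strand orientation is ignored for simplicity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    start: int       # genomic coordinate of the first base
    end: int         # genomic coordinate of the last base
    domains: tuple   # e.g. ("NBD",), ("MSD", "NBD") or ("SBP",)

def cluster_by_proximity(genes, max_gap=200):
    """Group genes whose distance to the previous gene is at most max_gap bases."""
    clusters = []
    for gene in sorted(genes, key=lambda g: g.start):
        if clusters and gene.start - clusters[-1][-1].end <= max_gap:
            clusters[-1].append(gene)   # close enough: extend the current cluster
        else:
            clusters.append([gene])     # too far away: open a new cluster
    return clusters

def candidate_transporters(genes, max_gap=200):
    """Keep clusters encoding at least one NBD and one MSD and label their direction."""
    kept = []
    for cluster in cluster_by_proximity(genes, max_gap):
        domains = {d for gene in cluster for d in gene.domains}
        if {"NBD", "MSD"} <= domains:
            direction = "importer" if "SBP" in domains else "exporter (by default)"
            kept.append(([gene.name for gene in cluster], direction))
    return kept

# Invented example genes (coordinates are not taken from the H37Rv genome).
toy_genes = [
    Gene("sbp1", 100, 1000, ("SBP",)),
    Gene("msd1", 1010, 1900, ("MSD",)),
    Gene("nbd1", 1950, 2900, ("NBD",)),
    Gene("nbd2", 50000, 51000, ("NBD",)),
    Gene("fused", 80000, 82000, ("MSD", "NBD")),
]
toy_result = candidate_transporters(toy_genes)
```

In this toy run the isolated NBD gene is left unassigned, which corresponds to the components that are later organized into potential transporters with the help of the NBD tree.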
The first 15 clusters of Table 5 contain only genes encoding proteins with unfused domains. In seven of these clusters, at least one gene encoding an SBP was also found. The transporters encoded by these seven clusters of genes can thus unambiguously be classified as importers, the eight others being only predicted by default to be extruders.
The last seven identified clusters contain at least one gene encoding a multi-domain protein (Table 5). Two of them are composed of four genes encoding two fused NBDs, two separate MSDs and one SBP, constituting importing systems with fused NBDs; two are composed of two genes encoding one NBD and two fused MSDs, constituting potential exporting systems with homodimeric NBDs and fused MSDs; and three of them are composed of two genes, each encoding one protein composed of one MSD fused to one NBD, probably constituting exporters with heterodimeric MSD–NBDs (Table 5).
Finally, we found three unclustered genes, each encoding proteins composed of fused membrane spanning and nucleotide binding domains (Table 5). The first (rv0194) encodes a putative exporting protein containing two NBDs and two MSDs fused together; the two other genes (rv1819c and rv1747c) encode proteins containing one MSD fused to one NBD. They are probably exporters of the homodimeric MSD–NBD type.
This genetic analysis led to the reconstitution of at least 25 complete ABC transporters. Nine of them are predicted to function as importers, while the others are predicted to be extruders. All the other identified NBDs, MSDs or SBPs are encoded either by isolated genes or by genes unclustered with those encoding all the other subunits of the transporter (Table 5).
To organize those latter components also into potential transporters and to classify all the transporters into sub-families, we constructed a phylogenetic tree based on their most conserved domain, i.e. the NBDs. All NBDs identified in Table 2 (with the exception of UvrA for the reason explained in Section 2) were aligned using the GCG PileUp program. This multiple alignment was performed by the method of Saurin et al., using the most conserved part of the NBDs starting from 30 residues before the beginning of the WalkerA motif and ending at the end of the WalkerB motif. Sequences of amino acids outside this interval are less conserved and can only be aligned with a great number of gaps. A tree was then generated from the aligned sequences, using the neighbor-joining method. The tree is constituted of two major branches which split into multiple branch lines (Fig. 2). Searches of homology between these NBDs and corresponding domains of ABC transporters from other organisms (BlastP) showed that each branch line contains NBDs belonging to transporters with similar putative functions. These results were confirmed by similar searches of homology between the other domains of the transporter to which a particular NBD belongs and equivalent domains of ABC transporters from other organisms. This allows a classification of the transporters into multiple sub-families which were numbered following the nomenclature that Linton and Higgins, and Quentin et al. used for E. coli and B. subtilis ABC transporters, respectively [4, 5].
Fig. 2. Unrooted tree of the M. tuberculosis nucleotide binding domains. The protein names refer to those attributed by Cole et al. Each NBD fused in one protein is shown separately and is suffixed by -N or -C according to its amino- or carboxy-terminal position in the protein. The NBDs were aligned using the GCG PileUp program. Pair-wise distances between the different sequences were calculated using the GCG Distances program and the tree was constructed by the GCG Growtree program, using the neighbor-joining method. Sub-families and potential substrates are indicated. The scale represents the numbers of substitutions per 100 residues.
In E. coli, the majority of transporters have been classified into 10 sub-families. Sub-family 10 is absent in B. subtilis, which contains the new sub-families 11 and 12. By our analysis, we found M. tuberculosis transporters belonging to nine of the 12 previously described sub-families. Transporters from sub-families 1 (implicated in monosaccharide uptake in E. coli and B. subtilis), 9 (unknown function) and 10 (involved in uptake of amino acids in E. coli) were not found in M. tuberculosis. Transporters from all the other sub-families are described and discussed hereafter.
5.1 Sub-family 2: peptide transporters
ABC transporters belonging to sub-family 2 have been described as involved in peptide uptake in both E. coli and B. subtilis. Small peptides play important roles in bacterial nutrition and signaling. Two M. tuberculosis transporters are putatively implicated in the import of peptides and their NBDs form a well defined sub-family on the tree (Fig. 2). Both are homologous to importers of oligopeptides and dipeptides from various bacteria [42–48]. The various subunits of each of these two transporters are encoded by genes organized in one potential operon (Table 5). Each transporter is composed of a substrate binding lipoprotein (OppA or DppA), and of the four typical domains of the ABC transporters, i.e. two NBDs fused in one protein (OppD or DppD) and two MSDs (OppB and OppC or DppB and DppC) (Table 5). Recently, a mutant strain of M. bovis BCG was constructed in which the oppD gene was interrupted with a selectable marker. This mutant strain was found to be resistant to the toxic tripeptides glutathione and S-nitrosoglutathione, indicating that the nucleotide binding domain OppD is implicated in peptide transport in this species. Nevertheless, whereas di- and oligopeptide permeases were identified among a number of mutant loci affecting growth and survival in Staphylococcus aureus, the survival of the OppD mutant of M. bovis BCG in cultured unactivated murine macrophages was unimpaired compared with its wild-type parent [49, 50].
5.2 Sub-family 3: macrolide transporters
Three potential ABC transporters showing high similarities with macrolide ABC transporters from streptomycetes [51–53] and staphylococci are grouped in this sub-family. Only genes encoding NBDs of the transporters were found in M. tuberculosis. Two of these transporters are composed of a protein (Rv1473 or Rv2477c) containing two NBDs, while the other is composed of two proteins (Rv1667c and Rv1668c) each containing one NBD (with the restriction concerning the missing nucleotide in the gene encoding Rv1668c as discussed in Section 2). As in other species, genes encoding these three transporters are not genetically associated with genes encoding MSDs (Table 5) [4, 5, 55, 56]. It is not known whether transporters of this sub-family function in the absence of an MSD or are associated with an MSD encoded by a gene not linked with the one encoding their NBD. These three transporters probably confer resistance to macrolide antibiotics on M. tuberculosis, presumably by exporting them from the cytoplasm. Interestingly, a Mycobacterium smegmatis homolog of Rv1473 was recently identified and shown to be induced during the anaerobic stationary phase of growth. During this growth phase, M. smegmatis and M. tuberculosis were also found to be resistant to standard anti-mycobacterial drugs [57–59].
5.3 Sub-family 4: amino acid transporters
Putative amino acid transporters homologous to importers of several osmoprotectants, including glycine betaine and L-proline, and homologous to importers of glutamine were identified.
The subunits of the transporter presenting high similarities with glycine betaine transporters from various bacteria (including E. coli [60–62], B. subtilis or Salmonella typhimurium) are encoded by a cluster of four genes proX, proV, proW and proZ (Table 5). This transporter, which is composed of one SBP (ProX), two different MSDs (ProW and ProZ), and two copies of one NBD (ProV), must be implicated in the survival of the mycobacteria under osmotic stress.
The putative glutamine importing system(s) consists of five proteins (Rv2563, GlnQ, Rv0072, Rv0073, and GlnH) encoded by genes located in three different regions of the genome (glnH in one region, rv0072 and rv0073 in a second region and rv2563 and glnQ in a third region) (Table 5). The NBDs GlnQ and Rv0073 are two nearly identical proteins of 330 amino acids showing 83.6% sequence identity. They are highly similar to the NBD of the glutamine importer of E. coli. The MSDs Rv2563 and Rv0072 are also almost identical. They are composed of 349 amino acids presenting 76.2% sequence identity and they each contain four transmembrane α-helices. Although Rv2563 and Rv0072 show no similarity with the MSDs of the E. coli glutamine permease (nor with any other known proteins), we postulate that these proteins constitute the MSDs of the glutamine transporter because of the proximity of their encoding genes to glnQ and rv0073, and because we were unable to find any other M. tuberculosis proteins similar to the MSD of the E. coli glutamine transporter (BlastP in the M. tuberculosis protein database). In a third region of the genome, a gene encoding a putative glutamine binding protein, GlnH, was identified. GlnH is highly similar to the glutamine receptors of E. coli and Bacillus stearothermophilus. GlnH probably constitutes the substrate binding protein of the above M. tuberculosis glutamine transporter. This putative transporter would be an importer composed of heterodimeric NBDs (GlnQ and Rv0073) and heterodimeric MSDs (Rv2563 and Rv0072). However, we cannot exclude the possibility of the existence of two different glutamine importers, each composed of homodimeric NBDs (two copies of GlnQ or of Rv0073) and homodimeric MSDs (two copies of Rv2563 or of Rv0072), using GlnH as a common SBP.
5.4 Sub-family 5: carbohydrate transporters
Some transporters highly similar to importers of carbohydrates (glycerol-3-phosphate and undefined sugar importing systems) were identified. The subunits of the glycerol-3-phosphate transporter are encoded by genes (ugpA-ugpE-ugpB-ugpC) organized in a putative operon (Table 5). All proteins encoded by this cluster of genes are highly similar to the different components of the glycerol-3-phosphate importer of E. coli. This transporter is composed of one SBP (UgpB), two MSDs (UgpA and UgpE) and two copies of one NBD (UgpC). Two sugar importers presenting a similar organization were also found. They are encoded by genes clustered in two potential operons: the rv2041c-rv2040c-rv2039c-rv2038c operon and the sugC-sugB-sugA-lpqY operon (Table 5). Each of these two transporters is composed of one SBP (LpqY or Rv2041c), two MSDs (SugA and SugB or Rv2039c and Rv2040c), and two copies of one NBD (SugC or Rv2038c). It is not known if these two transporters have a specificity for a particular carbohydrate because both of them are highly similar to many bacterial sugar transporters, including maltose/maltodextrins [66–69], lactose, cellobiose/cellotriose and multiple sugar transporters. A final potential sugar transporter could exist in M. tuberculosis. We found a cluster of genes (uspA-uspE-uspC) encoding two MSDs (UspA and UspE) and one SBP (UspC) each also similar to many components of bacterial sugar transporters. However, these three genes are not genetically associated with any NBD. It is therefore possible that this final transporter is non-functional or that it uses the NBD of another sugar transporter, i.e. SugC or Rv2038c.
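The completeness criterion used in this inventory (a transporter is counted as complete when it includes at least one NBD and one MSD) can be illustrated on the four sub-family 5 systems described above. The following Python sketch is illustrative only: the `Transporter` class, its `is_complete` method and the short system names are hypothetical labels introduced here, not part of the original analysis, and the subunit lists simply restate the compositions given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Transporter:
    """Hypothetical record of the subunits assigned to one ABC transporter."""
    name: str
    sbps: list = field(default_factory=list)  # substrate binding proteins
    msds: list = field(default_factory=list)  # membrane-spanning domains
    nbds: list = field(default_factory=list)  # nucleotide binding domains

    def is_complete(self) -> bool:
        # Criterion from the inventory: at least one NBD and one MSD.
        return bool(self.nbds) and bool(self.msds)


# Sub-family 5 systems as described in the text (names are shorthand labels).
SUBFAMILY_5 = [
    Transporter("Ugp", sbps=["UgpB"], msds=["UgpA", "UgpE"], nbds=["UgpC"]),
    Transporter("Rv2038c-Rv2041c", sbps=["Rv2041c"],
                msds=["Rv2039c", "Rv2040c"], nbds=["Rv2038c"]),
    Transporter("Sug", sbps=["LpqY"], msds=["SugA", "SugB"], nbds=["SugC"]),
    # The usp cluster encodes two MSDs and one SBP but no linked NBD.
    Transporter("Usp", sbps=["UspC"], msds=["UspA", "UspE"], nbds=[]),
]

incomplete = [t.name for t in SUBFAMILY_5 if not t.is_complete()]
print("Incomplete systems:", incomplete)
```

Under these assumptions only the Usp system is flagged as incomplete, which matches its description as a cluster lacking a genetically associated NBD.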
5.5 Sub-family 6: multidrug transporters
Four transporters similar to multidrug resistance (MDR) proteins of eukaryotes and prokaryotes were identified. Two of them are each encoded by two genes arranged in tandem (rv1273c-rv1272c and rv1348-rv1349) (Table 5). All of the proteins encoded by rv1273c, rv1272c, rv1348 and rv1349 are MSD–NBD fusion proteins and the two corresponding exporters should be of the heterodimeric MSD–NBD fusion type. One of the multidrug exporters is encoded by only one gene (rv1819) whose product is an MSD–NBD fusion protein. This exporter should be of the homodimeric MSD–NBD fusion protein type. The last one, encoded by one gene (rv0194), is constituted by only one protein (Rv0194) containing, as in eukaryotes, two MSDs and two NBDs fused together. The NBDs of all the above multidrug transporters are grouped, on the tree depicted in Fig. 2, with the NBDs contained in three other proteins, CydC, CydD and Rv3041c. CydC and CydD are encoded by genes arranged in tandem. They are MSD–NBD fusion proteins and they probably constitute one transporter of the heterodimeric MSD–NBD fusion type. These proteins are highly similar to CydC and CydD of E. coli [75, 76] and B. subtilis, two proteins implicated in the export of a component involved in the assembly of cytochromes that are found in the periplasm or exposed to the external side of the cytoplasmic membrane. The third protein, Rv3041c, is an NBD encoded by a single gene, which is not associated with MSD or SBP encoding genes. Its function remains to be determined.
5.6 Sub-family 7: antibiotic transporters
Six transporters of this sub-family were identified in M. tuberculosis. These transporters can be split into two groups (7a and 7b) following the clustering of their NBDs on the tree (Fig. 2) and on the basis of the transported substrate. Group 7a is composed of four exporters similar to many already described antibiotic resistance systems. The first of these transporters is encoded by three genes clustered in the genome (drrA-drrB-drrC) (Table 5). It should be composed of two copies of the NBD (DrrA), and of one copy of each MSD (DrrB and DrrC). These three components are highly similar to the different components of the daunorubicin exporters from various Streptomyces species, and the DrrB and DrrC proteins (MSDs) contain the ABC-2 signature, a characteristic of some already described exporters of drugs and carbohydrates (Table 3). By signature-tagged transposon mutagenesis, this transporter was recently identified as a potential virulence factor of M. tuberculosis. The second exporter of group 7a is encoded by two genes arranged in tandem (rv1218c-rv1217c) (Table 5). It is probably composed of two copies of the NBD (Rv1218c), and of one copy of Rv1217c, a protein composed of two fused MSDs. This transporter is highly similar to the tetronasin ATP-dependent efflux system of Streptomyces longisporoflavus. The last two putative transporters of group 7a are each encoded by three genes organized in an operon (rv1458c-rv1457c-rv1456c and rv2688c-rv2687c-rv2686c). They could be composed of two copies of the NBD (Rv1458c or Rv2688c), and of one copy of each MSD (Rv1457c and Rv1456c or Rv2687c and Rv2686c), although it is worth noting that the putatively identified MSDs display no significant similarity to any MSD of identified ABC transporters. The substrates transported by these last two putative exporters remain obscure although their NBDs are somewhat similar to the NBDs of the DrrA-DrrB-DrrC and Rv1217c-Rv1218c systems.
Group 7b is composed of transporters with unknown function. The first transporter of this group is encoded by only one gene (rv1747). This gene encodes a protein containing one MSD fused to one NBD and the transporter must thus be constituted of two copies of Rv1747 (Table 5). Rv1747 is highly similar to the White protein from Drosophila melanogaster, a permease necessary for the transport of pigment precursors into pigment cells responsible for eye color, and to NodI from Rhizobium and Bradyrhizobium strains, a protein implicated in the nodulation process by the export of a polysaccharide. The second transporter of this group is encoded by two genes arranged in tandem (rv1687c-rv1686c) (Table 5). These two genes encode an NBD (Rv1687c) and an MSD (Rv1686c), respectively. The transporter must be composed of two copies of Rv1687c and two copies of Rv1686c. Like Rv1747, this transporter is highly similar to NodI of Rhizobium and Bradyrhizobium strains, but also to NosF of Pseudomonas stutzeri, a member of a system involved in copper processing and transport.
5.7 Sub-family 8: iron transporters
In E. coli and B. subtilis, ABC transporters belonging to sub-family 8 were described as involved in iron(III) uptake [4, 5]. In M. tuberculosis, two different putative iron(III) dicitrate binding proteins, FecB and FecB2, were found (Table 4). These proteins possess respectively 25% and 30% similarity with the iron(III) dicitrate binding protein FecB of E. coli. However, contrary to the situation observed in E. coli, genes encoding their associated MSDs and NBDs could not be found in the M. tuberculosis genome. Sub-family 8 is therefore not represented on the NBD-based tree (Fig. 2) and the identification of mycobacterial transporters belonging to this sub-family is only based on the homology of the two above proteins with FecB of E. coli. No other iron importing ABC transporter genes have been detected in the genome of M. tuberculosis, whereas three and eight different iron importing ABC transporters are encoded in the genome of E. coli and B. subtilis, respectively [4, 5]. This observation is surprising since many siderophores, i.e. salicylic acid, citric acid, mycobactins and exochelins, have been described as involved in iron acquisition of slow- and fast-growing mycobacteria (for review, see ). One ferric exochelin uptake ABC transporter similar to the E. coli enterochelin importer has also been discovered in M. smegmatis, but BlastP searches with components of this system revealed no homologies with any M. tuberculosis proteins.
5.8 Sub-family 11: anion transporters
Transporters of sub-family 11 are specific for the import of anions. One sulfate, one molybdate and at least three inorganic phosphate permeases were found to be encoded by the M. tuberculosis genome.
The putative sulfate importer is encoded by four genes clustered in one potential operon (subI-cysT-cysW-cysA). It is composed of one SBP (SubI), two MSDs (CysT and CysW), and two copies of one NBD (CysA) (Table 5). All of these subunits are highly similar to the various components of the sulfate transporters of E. coli and of the cyanobacteria Synechocystis PCC6803 and Synechococcus PCC7942. The putative molybdate transporter is highly similar to the molybdate importing system of E. coli. It is encoded by three genes organized in a putative operon (modA-modB-modC). The transporter is composed of an SBP (ModA), two copies of the MSD (ModB) and two copies of the NBD (ModC) (Table 5). A mutant of M. tuberculosis with a transposon integrated in modA was recently shown to be affected in its multiplication within the lungs of mice. Three putative inorganic phosphate transporters have been previously described in the literature [91–94]. They are highly similar to the phosphate-specific importer (Pst system) of E. coli. Genes encoding these transporters are contained in one cluster of about 10 000 bp constituted of three operons (Table 5). The first operon is composed of phoS1 (also called pstS-1 in the literature) encoding an SBP, pstC and pstA2 (also called pstC-1 and pstA-2 in the literature) each encoding an MSD, and pstB encoding an NBD. The transporter encoded by this operon is thus composed of one SBP (PhoS1), one copy of each MSD (PstC and PstA2) and two copies of one NBD (PstB) [92, 96]. A second operon is composed of phoS2 (also called pstS-3 in the literature) encoding a second phosphate binding protein, and pstC2 and pstA1, each encoding an MSD. A last operon is composed of pstS (also named pstS-2 in the literature) encoding an SBP, and pknD (also named mbK in the literature) encoding a serine/threonine kinase.
As PhoS2 and PstS (like PhoS1) were experimentally shown to be expressed at the surface of the bacteria, we postulate the existence of at least two other inorganic phosphate transporters. One should be composed of an SBP (PhoS2), one copy of each MSD (PstC2 and PstA1) and two copies of one NBD (PstB) and the second should be composed of an SBP (PstS), two copies of one NBD (PstB), and two MSDs, formed either by two copies of PstC2, PstA1, PstC or PstA2, or by a combination of one copy of two of these four proteins. Independently of this cluster of three operons, the genome of M. tuberculosis also contains a gene called phoT. This gene encodes an NBD possessing 69% similar amino acids with the ATPase PstB from the Pst system of E. coli (Table 2). The role and the function of this protein remain obscure since its encoding gene is not associated with any genes encoding MSDs or SBPs. It cannot, however, be entirely excluded that PhoT uses the above cited SBPs and MSDs to form an additional inorganic phosphate transporter. Recently, Banerjee and colleagues [98–100] reported the identification of the mpt1 gene of M. smegmatis. This 777-bp gene, which is involved in the process of efflux-mediated drug resistance, was found to be overexpressed in ciprofloxacin-resistant M. smegmatis. The authors found that the mpt1 gene encodes a protein containing 47% similar amino acids with the NBD PstB of M. tuberculosis and therefore postulated that mpt1 is part of an operon encoding a complete multi-domain phosphate ABC transporter. We also performed a search of homology between the protein encoded by mpt1 and all the M. tuberculosis proteins. Surprisingly, we found that the Mpt1 protein is much more similar to the NBD PhoT of M. tuberculosis (87% similar amino acids) than to PstB. Whether mpt1 belongs to an operon also encoding MSDs and an SBP of a phosphate importer is thus for the moment speculative.
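The number of MSD arrangements that remain open for the postulated PstS-dependent transporter can be counted explicitly. The short Python sketch below is purely combinatorial and makes no statement about which pairings are biologically plausible; the variable names are hypothetical, and the only input is the list of the four candidate MSDs named above.

```python
from itertools import combinations_with_replacement

# The four MSDs encoded in the phosphate gene cluster, as named in the text.
CANDIDATE_MSDS = ["PstC", "PstA2", "PstC2", "PstA1"]

# Unordered pairs, allowing the same protein twice (homodimeric MSDs).
msd_pairs = list(combinations_with_replacement(CANDIDATE_MSDS, 2))
homodimers = [pair for pair in msd_pairs if pair[0] == pair[1]]
heterodimers = [pair for pair in msd_pairs if pair[0] != pair[1]]

for first, second in msd_pairs:
    # Each candidate assembly shares the SBP PstS and two copies of the NBD PstB.
    print(f"PstS / (PstB)2 / {first}-{second}")

print(len(homodimers), "homodimeric and", len(heterodimers),
      "heterodimeric MSD pairs;", len(msd_pairs), "in total")
```

The enumeration yields four homodimeric and six heterodimeric pairs, i.e. ten formally possible MSD compositions for this transporter, which illustrates why its subunit organization cannot be settled from the sequence data alone.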
5.9 Sub-family 12
The proteins Rv2326c and Rv1463 are grouped on the tree based on the NBDs (Fig. 2). They are classified in sub-family 12 as one of them (Rv2326c) presents some similarities with transporters of unknown function of the B. subtilis sub-family 12. Rv2326c is also slightly similar to several hypothetical proteins from Methanobacterium thermoautotrophicum annotated as putative cobalt transporters. Rv2326c is composed of a fusion of one MSD and two NBDs, while Rv1463 is composed of a single NBD. As these two proteins are encoded by genes not associated with genes encoding other components, their function remains to be determined. Recently, a homolog of Rv1463 was shown to be induced during the anaerobic stationary phase of growth of M. smegmatis.
Four potential ABC transporters could not be classified in the previously described sub-families of transporters. The NBDs of two of them (the FtsEX and Rv0986-Rv0987 systems) are grouped in the same branch of the tree, while the NBDs of the two others (Rv3781 and Rv0655) each form a separate unclassified branch (Fig. 2).
The ftsX-ftsE genes have already been described in the literature. They encode one MSD and one NBD, respectively. The FtsEX transporter, which must be composed of two copies of each of these proteins, bears significant homology with the FtsEX system implicated in the cell division of E. coli [103, 104]. The Rv0986-Rv0987 transporter is also encoded by two genes arranged in tandem. One of those genes encodes an NBD (Rv0986) while the other encodes a protein containing two fused MSDs (Rv0987). This transporter is thus probably composed of two copies of Rv0986 and one copy of Rv0987. It is absent in B. subtilis and E. coli. The protein Rv0987 of M. tuberculosis is predicted to contain 10 transmembrane α-helices constituting two putative MSDs. The N-terminal MSD possesses 40% similar amino acids with the protein AttF of Agrobacterium tumefaciens, while the C-terminal MSD contains 41% similar amino acids with the protein AttG of the same bacterium. Rv0986, the NBD, possesses 50% similar amino acids with the protein AttE of A. tumefaciens and is also homologous to the NBDs of many other permeases. A search in the non-redundant protein database of the NCBI (BlastP) for proteins homologous to AttE, AttF and AttG of A. tumefaciens revealed homologs exclusively in M. tuberculosis. The att gene products of A. tumefaciens are required for virulence and bacterial attachment to host cells. It was postulated that they encode a transporter implicated in the export of a substance required for bacterial attachment to host cells. Given the capacity of M. tuberculosis to invade different eukaryotic cell types, including professional phagocytic cells such as macrophages, but also normally non-phagocytic cells, such as epithelial cells [106, 107], it is possible that this transporter plays a role in M. tuberculosis adherence to and/or entry into host cells, by exporting an as yet unknown factor.
It should nevertheless be noted that the FtsEX and the Rv0986-Rv0987 transporters are the only two examples of completely reconstituted transporters classified as importers on the basis of the clustering of their NBDs (Fig. 2) for which we were not able to find an associated SBP. Their classification as importers or exporters thus remains uncertain for the moment.
The transporter containing the NBD Rv3781 is encoded by two genes (rv3781-rv3783) separated by an ORF encoding a putative sugar transferase. This third unclassified transporter is composed of two copies of Rv3781, the NBD, and of two copies of Rv3783, the MSD. Rv3781 possesses 54%, 55%, 54% and 40% similar amino acids with the proteins RfbE of Yersinia enterocolitica, Wzt of Brucella melitensis (accession number AF047478), RfbI of Vibrio cholerae and KpsT of E. coli, respectively. All these proteins constitute the NBD of capsular lipopolysaccharide (LPS) and O-antigen exporting systems. Rv3781 also possesses 48% similar amino acids with the N-terminal part of TagH of B. subtilis, the NBD of an ABC transporter involved in the translocation of cell wall teichoic acids. Rv3783, the MSD, is similar to the proteins constituting the MSDs of the above transporters. Rv3783 possesses 53%, 36%, 45%, 35% and 35% similar amino acids with RfbD of Y. enterocolitica, Wzm of B. melitensis, RfbH of V. cholerae, KpsM of E. coli and TagG of B. subtilis, respectively. Based on these homologies, it is likely that the transporter (Rv3781)2/(Rv3783)2 exports polysaccharides to the surface of M. tuberculosis. Lipoarabinomannan could be considered a candidate substrate for this transporter. Indeed, this mycobacterial LPS is generally considered to be equivalent to the LPS of the Gram-negative bacteria and is well known for its potent immunological effects and for its role in the pathogenicity of the bacteria. Inactivation of the genes encoding the (Rv3781)2/(Rv3783)2 transporter, in combination with biochemical analysis of the cell surface components, would more precisely define its role.
Finally, we found the rv0655 gene, which encodes an NBD possessing 90% similar amino acids with the protein Mkl of Mycobacterium leprae. In M. leprae, the mkl gene is linked to RNA polymerase genes, and its product has been predicted to be a member of the ATP-hydrolyzing ABC proteins. It is not known if the Rv0655 protein is a part of a functional transporter because the rv0655 gene was not found to be genetically associated with any MSD or SBP encoding genes.
6 Duplication and gene fusion events
Large multi-domain proteins resulting from duplications and gene fusion events are a characteristic of the ABC transporter superfamily [4, 5, 34]. We analyzed these events in M. tuberculosis in relation to the classification of transporters into sub-families.
Each of the two transporters of sub-family 2 possesses a protein containing two fused NBDs (DppD and OppD) (Table 2). The N- and C-terminal NBDs of these two proteins are separately sub-clustered on the tree (Fig. 2). As DppD and OppD are similarly organized, it is probable that the gene duplication and fusion events occurred before the duplication and differentiation of the transporters of this family. In contrast, in sub-family 3, two transporters possess proteins containing two fused NBDs (Rv1473 and Rv2477) (Table 2), while one transporter seems to possess two different proteins each containing one NBD (Rv1667c and Rv1668c). This suggests that the multi-domain proteins Rv1473 and Rv2477 result from a gene fusion event that occurred in evolution after the differentiation of the transporters of this sub-family. However, if we consider, as discussed in Section 2, that Rv1667c and Rv1668c originated from a single gene encoding a protein composed of two fused NBDs, sub-family 3 would contain three transporters similarly organized (possessing a protein containing two fused NBDs). In this more probable case the fusion event between the two NBD encoding genes would have occurred before the multiplication and differentiation of the transporters of this sub-family. The analysis of the organization of the domains of the transporters classified in sub-family 6 reveals a more complex situation. If we exclude Rv3041c, an orphan NBD of unknown function, the transporters of this sub-family are of two types. The transporters of the first type are composed of proteins containing an NBD fused to an MSD. These proteins are likely to form a homodimeric transporter [(Rv1819)2] when the corresponding gene is isolated in the genome, or to form a heterodimeric transporter [(Rv1272c/Rv1273c) and (Rv1348/Rv1349)] when the corresponding genes are arranged in tandem in the genome (Table 5).
The second type of transporter is represented by Rv0194, a protein in which the four domains characterizing an ABC transporter, i.e. two MSDs and two NBDs, are fused. The observed pattern of domain fusion suggests that the ancestral progenitor of this sub-family is the homodimeric transporter (Rv1819)2, which had previously been subjected to a first MSD–NBD fusion event. rv1819 then underwent multiplications leading to the heterodimeric differentiated transporters (Rv1272c/Rv1273c) and (Rv1348/Rv1349). After these multiplications, a second gene fusion event probably occurred leading to the MSD–NBD–MSD–NBD fusion protein Rv0194 (Table 3).
Sub-family 7 is composed of members with homodimeric NBDs and homodimeric MSDs [(Rv1687c)2/(Rv1686c)2], members with homodimeric NBDs and heterodimeric MSDs [(DrrA)2/DrrB/DrrC, (Rv1458c)2/Rv1457c/Rv1456c, (Rv2688c)2/Rv2687c/Rv2686c], members with homodimeric NBDs and monomeric fused MSDs [(Rv1218c)2/Rv1217c], and a transporter formed by a dimer of a protein containing one MSD fused to one NBD [(Rv1747)2]. The ancestral progenitor of this sub-family is probably (Rv1687c)2/(Rv1686c)2, a transporter with homodimeric NBDs and homodimeric MSDs. The rv1687c-rv1686c tandem genes then probably underwent triplication before a first event of gene fusion in one of the triplicated tandems, leading to (Rv1747)2, a transporter formed by two proteins each composed of one MSD fused to one NBD. This gene fusion event was followed by duplication of the MSD encoding gene in another previously triplicated tandem. This new cluster of three genes then underwent quadruplication leading to (DrrA)2/DrrB/DrrC, (Rv1458c)2/Rv1457c/Rv1456c and (Rv2688c)2/Rv2687c/Rv2686c, three transporters composed of homodimeric NBDs and heterodimeric MSDs. The two MSD encoding genes of the last quadruplicated cluster then fused, leading to the synthesis of (Rv1218c)2/Rv1217c, a transporter with homodimeric NBDs and monomeric fused MSDs.
Apart from these six or seven fusion events (one in sub-family 2, one or two in sub-family 3 and two in each of sub-families 6 and 7), three other fusion events probably occurred in two other proteins of the ABC superfamily, two in Rv2326c and one in Rv0987. A total of nine or 10 fusion events thus probably occurred during the evolution of the ABC transporters of M. tuberculosis.
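The tally of fusion events can be checked with a few lines of arithmetic. The Python sketch below assumes that two fusion events are counted in each of sub-families 6 and 7, which is the reading consistent with the stated totals; the dictionary names are hypothetical and the numbers are taken directly from the text.

```python
# (minimum, maximum) number of fusion events per sub-family, from the text.
# Sub-family 3 is counted once or twice depending on how Rv1667c/Rv1668c
# are interpreted (see Section 2).
FUSIONS_BY_SUBFAMILY = {
    "sub-family 2": (1, 1),
    "sub-family 3": (1, 2),
    "sub-family 6": (2, 2),  # assumption: two events in each of 6 and 7
    "sub-family 7": (2, 2),
}

# Additional fusion events in proteins outside these sub-families.
OTHER_FUSIONS = {"Rv2326c": 2, "Rv0987": 1}

sub_min = sum(low for low, _ in FUSIONS_BY_SUBFAMILY.values())
sub_max = sum(high for _, high in FUSIONS_BY_SUBFAMILY.values())
other = sum(OTHER_FUSIONS.values())
total_range = (sub_min + other, sub_max + other)

print(f"Within sub-families: {sub_min} or {sub_max}")
print(f"Other proteins: {other}")
print(f"Total: {total_range[0]} or {total_range[1]}")
```

With these inputs the sub-family total is six or seven and the overall total nine or 10, in agreement with the figures quoted above.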
The identified M. tuberculosis NBDs, MSDs and SBPs are distributed in at least 37 complete and incomplete ABC transporters. We were able to reconstitute completely 26 of these transporters, and to attribute potential substrates to 24 of them (Table 5). Eleven transporters were not completely reconstituted, but potential substrates could also be attributed to six of them. The M. tuberculosis transporters belonging to sub-families 2, 4, 5, 8 and 11 are probably implicated in the import of peptides, amino acids, sugars, iron and anions, respectively, while transporters of sub-families 3, 6, 7, and 12 could be involved in the export of macrolides, multidrugs, antibiotics and of unknown substrates, respectively. After analysis, the tree generated on the basis of the homology between the NBDs revealed that its upper main branch is only composed of NBDs from exporters while its lower branch contains a majority of NBDs from importers (Fig. 2). These results are in good agreement with those observed by Saurin et al., who pointed out a segregation of the ABC transporters according to their functional characteristic: import versus export.
The recent complete analysis of ABC transport systems from E. coli and B. subtilis allows the comparison of the distribution of these transporters in three different bacterial genomes. The E. coli genome encodes 57 ABC transporters, of which 44 are importers and 13 are presumed exporters. The B. subtilis genome encodes 78 ABC transporters that have been split into 38 importers and 40 extruders. The number of importers is about the same between these two bacteria, but extruders are more represented in B. subtilis than in E. coli.
Contrary to E. coli but similarly to B. subtilis, there is a relatively equal representation of exporting systems versus importing systems in M. tuberculosis (16 importers versus 21 exporters). Many of those exporting systems are potentially implicated in the export of drugs and they probably contribute to the intrinsic resistance of M. tuberculosis to many antibiotics, a characteristic usually mainly attributed to the relative impermeability of its cell wall. One interesting exporting system is (Rv0986)2/Rv0987, which seems to be specific to M. tuberculosis and A. tumefaciens and which could be required for virulence and bacterial attachment to host cells. Another one is (Rv3781)2/(Rv3783)2, which probably exports polysaccharides (perhaps lipoarabinomannan) to the mycobacterial cell surface.
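The balance between importers and exporters in the three genomes can be expressed as a simple proportion. The Python sketch below uses only the counts quoted in this section; the `COUNTS` table and the `exporter_fraction` helper are hypothetical names introduced for illustration.

```python
# Importer/exporter counts quoted in the text for each genome.
COUNTS = {
    "E. coli": {"importers": 44, "exporters": 13, "total": 57},
    "B. subtilis": {"importers": 38, "exporters": 40, "total": 78},
    "M. tuberculosis": {"importers": 16, "exporters": 21, "total": 37},
}


def exporter_fraction(entry: dict) -> float:
    """Fraction of a genome's ABC transporters classified as exporters."""
    return entry["exporters"] / (entry["importers"] + entry["exporters"])


fractions = {}
for organism, entry in COUNTS.items():
    # Sanity check: importers and exporters should add up to the quoted total.
    assert entry["importers"] + entry["exporters"] == entry["total"]
    fractions[organism] = exporter_fraction(entry)
    print(f"{organism}: {fractions[organism]:.1%} exporters")
```

On these counts, exporters represent roughly 23% of the ABC transporters of E. coli, 51% of those of B. subtilis and 57% of those of M. tuberculosis, which makes the contrast with E. coli explicit.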
When compared with the number of E. coli and B. subtilis importers, only a few importers (16) are encoded by the M. tuberculosis genome. This is particularly apparent for the transporters involved in carbohydrate uptake (sub-family 1 and sub-group of sub-family 5). These two sub-families group a total of nine and 10 importers in B. subtilis and E. coli, respectively. In M. tuberculosis, transporters of sub-family 1 seem to be absent and only three (or at most four) transporters of sub-family 5 are present. The same situation is observed with the importers of amino acids of sub-family 4. In B. subtilis and E. coli, a total of five and six members of sub-family 4 are present, respectively, but only two transporters of this sub-family are present in M. tuberculosis. This under-representation may reflect the capacity of M. tuberculosis to synthesize many essential compounds, including essential amino acids, vitamins and enzyme co-factors, and to grow in the presence of few external nutrients. Nevertheless, despite a reduced number of importers in M. tuberculosis, at least three of them are probably implicated in the import of inorganic phosphate, whereas in E. coli and B. subtilis only one such system is present [4, 5, 94]. As phosphate is an essential but often limiting nutrient, it is possible that several inorganic phosphate importers are necessary to allow the survival of M. tuberculosis in the different environments to which it is exposed during the infectious cycle (phagosomes, granulomas, caseum, etc.).
Genes encoding the proteins potentially implicated in the formation of the ABC permeases account for about 2.5% of the genome of M. tuberculosis. These proteins probably constitute one of the largest families of paralogous proteins present in M. tuberculosis. Experimental work is now necessary to confirm our sequence-based analysis and to improve the understanding of the physiological role of all these transporters.
This work was supported by Grants PL962167, 962134 and QLK2-1999-01093 from the European Economic Community (BIOMED2).
(1993) Bacterial periplasmic permeases as model systems for the superfamily of traffic ATPases, including the multidrug resistance protein and the cystic fibrosis transmembrane conductance regulator. Int. Rev. Cytol. 137, 1–35.
(1991) Characterization, localization and transmembrane organization of the three proteins PrtD, PrtE and PrtF necessary for protease secretion by the gram-negative bacterium Erwinia chrysanthemi. Mol. Microbiol. 5, 2427–2434.
(1997) Subunit interactions in ABC transporters: a conserved sequence in hydrophobic membrane proteins of periplasmic permeases defines an important site of interaction with the ATPase subunits. EMBO J. 16, 3066–3077.
Sonnhammer, E.L.L., von Heijne, G. and Krogh, A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. in: Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (Glasgow, J., Littlejohn, T., Major, F., Lathrop, R., Sankoff, D. and Sensen, C., Eds.), pp. 175–182, AAAI Press, Menlo Park, CA.
(1991) Peptide transport and chemotaxis in Escherichia coli and Salmonella typhimurium: characterization of the dipeptide permease (Dpp) and the dipeptide-binding protein. Mol. Microbiol. 5, 1035–1047.
(1984) The nucleotide sequence of the gene for malF protein, an inner membrane component of the maltose transport system of Escherichia coli. Repeated DNA sequences are found in the malE-malF intercistronic region. J. Biol. Chem. 259, 10896–10903.
(1994) The cydD gene product, component of a heterodimeric ABC transporter, is required for assembly of periplasmic cytochrome c and of cytochrome bd in Escherichia coli. FEMS Microbiol. Lett. 117, 217–223.
(1990) Nitrous oxide reductase from denitrifying Pseudomonas stutzeri. Genes for copper-processing and properties of the deduced products, including a new member of the family of ATP/GTP-binding proteins. Eur. J. Biochem. 192, 591–599.
(1989) Nucleotide sequences of the fecBCDE genes and locations of the proteins suggest a periplasmic-binding-protein-dependent transport mechanism for iron(III) dicitrate in Escherichia coli. J. Bacteriol. 171, 2626–2633.
et al. (1997) Three different putative phosphate transport receptors are encoded by the Mycobacterium tuberculosis genome and are present at the surface of Mycobacterium bovis BCG. J. Bacteriol. 179, 2900–2906.
(1993) Genetic organization and sequence of the rfb gene cluster of Yersinia enterocolitica serotype O:3: similarities to the dTDP-L-rhamnose biosynthesis pathway of Salmonella and to the bacterial polysaccharide transport systems. Mol. Microbiol. 9, 309–321.