Comparative genomics of metabolic pathways in Mycobacterium species: gene duplication, gene decay and lateral gene transfer

Pradeep Reddy Marri, John P. Bannantine, Geoffrey B. Golding
DOI: http://dx.doi.org/10.1111/j.1574-6976.2006.00041.x 906-925 First published online: 1 November 2006

Abstract

The genus Mycobacterium comprises significant pathogenic species that infect both humans and animals. One species within this genus, Mycobacterium tuberculosis, is the leading cause of human death from bacterial infection. Five mycobacterial genomes belonging to four different species (M. tuberculosis, Mycobacterium bovis, Mycobacterium leprae and Mycobacterium avium ssp. paratuberculosis) have been sequenced to date and another 14 mycobacterial genomes are at various stages of completion. A comparative analysis of the gene products of key metabolic pathways revealed that the major differences among these species are in the gene products constituting the cell wall and the gene families encoding the acidic glycine-rich (PE/PPE/PGRS) proteins. Mycobacterium leprae has evolved by retaining a minimal gene set for most of the gene families, whereas M. avium ssp. paratuberculosis has acquired some of the virulence factors by lateral gene transfer.

Keywords
  • Mycobacterium
  • comparative genomics
  • metabolic pathways
  • pathogenicity

Introduction

The study of pathogenic bacteria is undergoing a paradigm shift. The enormous amount of data coming from sequencing projects and the availability of bioinformatics tools for faster analysis of the generated data are revolutionizing the science of bacterial pathogenesis. The availability of complete sequences of different species belonging to a single genus enables comparative studies to understand the differences and commonalities among a group of species. Studies involving comparisons of complete microbial genomes can reveal significant differences in gene content and genome organization between closely related bacteria, provide insights into physiology and pathogenesis, and can identify polymorphic sequences with potential relevance to pathogenesis, immunity and evolution (Schoolnik, 2002; Alsmark, 2004; Bai, 2004; Eppinger, 2004; Ferretti, 2004; Moreira, 2004; Nascimento, 2004; Prentice, 2004).

Mycobacterium is one of the most studied pathogenic genera owing to the severity of its impact on human populations. To date, five genomes of mycobacterial species have been sequenced [Mycobacterium tuberculosis H37Rv (Cole, 1998; Camus, 2002), M. tuberculosis CDC1551 (Fleischmann, 2002), Mycobacterium bovis (Garnier, 2003), Mycobacterium leprae (Cole, 2001) and Mycobacterium avium ssp. paratuberculosis (Li, 2005)] and another 14 are at various stages of completion (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). Mycobacterium tuberculosis, the causative agent of tuberculosis (TB) in humans, infects one-third of the world population, with a newly infected individual added every second (Snider, 1994). Each year about 2 million people die from TB infection (http://www.who.int/tb/en). Moreover, TB is the leading cause of death among people who are HIV-positive, accounting for about 13% of AIDS deaths worldwide. Leprosy, caused by M. leprae, is one of the oldest recorded diseases and still remains a major public health problem (Cole, 2001). Mycobacterium bovis and M. avium ssp. paratuberculosis are cattle pathogens causing TB and Johne's disease, respectively. Owing to a variety of diseases caused by these organisms and variations in their genome size, a comparative study of these species would generate a wealth of information providing insights into the regions responsible for pathogenesis and host-specificity.

Earlier comparative studies in mycobacteria have mostly involved different strains of the same species of M. tuberculosis to understand the variability of pathogenesis among different strains (Fleischmann, 2002). Other similar studies either involved the comparison of M. tuberculosis with M. bovis or of M. tuberculosis with M. leprae (Brosch, 2000; Cole, 2001; Garnier, 2003). A comprehensive examination covering all the currently sequenced genomes has not been done. A comparative genome analysis of the five available genomes from the genus Mycobacterium will enable a better understanding of the genome structure of these bacteria and the horizontal gene transfer pattern, and help to identify the species-specific genes.

In the current study, we discuss the genes present in various metabolic pathways and compare these genes across the five sequenced genomes of Mycobacterium. The discussion is divided into the following sections: (1) genome features; (2) energy metabolism; (3) amino acid biosynthesis; (4) biosynthesis of cofactors, prosthetic groups and carriers; (5) degradation of carbon compounds, amino acids and amines; (6) central intermediary metabolism; (7) lipid metabolism; (8) PE and PPE gene families; (9) macromolecule metabolism and degradation; (10) regulatory genes; and (11) insights into pathogenesis.

The analysis is based on the annotations of these genomes and any errors in the annotation will be reflected in the results and discussion presented here.

Genome features

Mycobacteria typically have GC-rich sequences: the GC content of M. tuberculosis and M. bovis is around 65%; it is higher (69.3%) in M. avium ssp. paratuberculosis and lower (57.7%) in M. leprae. Mycobacterium avium ssp. paratuberculosis has a slightly higher percentage (91.5%) of the genome encoding proteins compared with M. tuberculosis (90.9%) and M. bovis (90.5%), whereas only half (49.6%) of the genome encodes functional proteins in M. leprae. The genomic features of these species are summarized in Table 1. There are also variations in the gene complement. Mycobacterium tuberculosis and M. bovis have ∼3900 genes encoding proteins, M. avium ssp. paratuberculosis has 4350 genes and M. leprae has only ∼1650 genes encoding functional proteins. The variations in the gene complement are reflected in the genes of lipid metabolism, PE/PPE gene family, insertion sequence (IS) elements and hypothetical proteins (Fig. 1). Mycobacterium avium ssp. paratuberculosis has an increased redundancy in the genes involved in lipid metabolism probably resulting in a more robust cell wall compared with the other species owing to its colonization of the ruminant intestine, whereas M. leprae has evolved by having a minimal gene set for most of the pathways (Cole, 2001; Vissa & Brennan, 2001; Li, 2005). There are variations in the number of insertion sequences among these species with M. tuberculosis having 148 genes belonging to insertion sequences and phages compared with 107 of M. bovis, 87 of M. avium ssp. paratuberculosis and only two functional and 26 truncated copies of transposases of M. leprae (Fig. 1). The higher number of insertion element and phage-related genes might indicate greater intraspecies variability in M. tuberculosis compared with the other species.

Table 1

Genome features of mycobacterial species

| Feature | Mycobacterium tuberculosis H37Rv | Mycobacterium tuberculosis CDC1551 | Mycobacterium bovis | Mycobacterium leprae | Mycobacterium avium ssp. paratuberculosis |
| --- | --- | --- | --- | --- | --- |
| Genome size (bp) | 4 411 529 | 4 403 836 | 4 345 492 | 3 268 203 | 4 829 781 |
| Protein coding genes | 3927 | 4186 | 3920 | 1604 | 4350 |
| GC (mol%) | 65.6 | 65.6 | 65.6 | 57.7 | 69.3 |
| Average protein length | 339 | 317 | 334 | 336 | 338 |
| Protein coding (%) | 90.9 | 90.7 | 90.5 | 49.6 | 91.5 |

Figure 1

(a) Variations in lipid metabolism, phages/IS elements, regulatory and PE/PPE/PGRS gene families in Mycobacterium species. (b) Variations in the number of hypothetical proteins in Mycobacterium species.
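
The coding percentages in Table 1 can be roughly cross-checked from the other rows of the same table: the number of protein-coding genes multiplied by the average protein length (three bases per residue) and divided by the genome size approximates the coding fraction. The sketch below is an illustration only and is not the authors' calculation. It assumes that stop codons and overlapping genes can be ignored, and the function name is hypothetical.

```python
# Rough cross-check of Table 1: estimate the protein-coding fraction of each
# genome from gene count, average protein length and genome size.
# Assumption (not from the paper): coding bases ~= genes * avg_length * 3.

TABLE_1 = {
    # name: (genome size in bp, protein-coding genes, average protein length)
    "M. tuberculosis H37Rv": (4_411_529, 3927, 339),
    "M. tuberculosis CDC1551": (4_403_836, 4186, 317),
    "M. bovis": (4_345_492, 3920, 334),
    "M. leprae": (3_268_203, 1604, 336),
    "M. avium ssp. paratuberculosis": (4_829_781, 4350, 338),
}


def approx_coding_percent(genome_bp, n_genes, avg_protein_len):
    """Return the approximate percentage of the genome that encodes protein."""
    coding_bp = n_genes * avg_protein_len * 3
    return 100.0 * coding_bp / genome_bp


for name, (size, genes, length) in TABLE_1.items():
    print(f"{name}: {approx_coding_percent(size, genes, length):.1f}%")
```

Under these assumptions the estimates fall within about half a percentage point of the tabulated values, including the much lower figure for M. leprae.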

Comparison of the proteins across the five genomes has revealed a common backbone of 1326 proteins (Appendix S1). The genetic closeness of these species is evident from the higher number of shared genes compared with the mycobacterial core, estimated at 219 genes (Charlebois & Doolittle, 2004; Marmiesse, 2004). Mycobacterium tuberculosis and M. bovis share about 3700 genes between them, whereas they share a relatively lower number of genes (about 2600) with M. avium ssp. paratuberculosis (Table 2a). The genetic closeness of M. tuberculosis and M. bovis is also reflected in gene order, which is highly conserved in these two species; by contrast, there are many genomic rearrangements in M. avium ssp. paratuberculosis and M. leprae compared with M. tuberculosis (Fig. 2). Comparison of the protein sequences of the five organisms based on blastp analysis (Altschul, 1997) identified 26, 47, 414, 155 and 966 unique proteins in M. tuberculosis H37Rv, M. bovis, M. tuberculosis CDC 1551, M. leprae and M. avium ssp. paratuberculosis, respectively (Table S1). The presence of a higher number of unique proteins, especially in M. avium ssp. paratuberculosis, might indicate that in spite of a pattern of reductive evolution (Cole, 1998, 2001), these genomes have also gained some additional genes during the course of their evolution (Koonin, 2001; Kinsella, 2003; Krzywinska, 2004; Nakamura, 2004). It should be noted that these genes are only ‘unique’ with respect to the other sequenced members of the genus and may be present in other bacteria or in other members of the genus that have not yet been sequenced. The number of unique genes includes some duplicate genes and portions of disrupted genes. When a similar analysis was performed using tblastn, the number of unique proteins was reduced to six, eight, 122, 149 and 872 in M. tuberculosis H37Rv, M. bovis, M. tuberculosis CDC 1551, M. leprae and M. avium ssp. paratuberculosis, respectively (Table 2b; Table S1). In M. avium ssp. paratuberculosis and M. leprae, the number of unique genes is similar to that obtained from the genome annotation, whereas the number of unique proteins is reduced in M. tuberculosis H37Rv, M. bovis and M. tuberculosis CDC 1551. This difference is largely a reflection of the use of different genome annotation programs for M. tuberculosis H37Rv (Krogh, 1994) and M. tuberculosis CDC 1551 (Salzberg, 1998). Note that some homologous genes might be present but fail the criteria, either because the expectation value is not below 10^-20 (the genes are evolutionarily distant) or because their lengths differ by more than 65% (the genes are truncated or fused). The unique genes are listed in Table S1. The unique genes mostly encode hypothetical proteins and do not enter further into the following discussions.
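
The two homology criteria described above (an expectation value cutoff and a limit on length difference) can be sketched as a simple filter over BLAST-style hits. This is a minimal illustration, not the authors' pipeline: the `Hit` fields, the example hits and the protein names are hypothetical, and reading the length criterion as "the shorter sequence is at least 35% of the longer" is an assumption that reconciles the "65%" in the text with the "35% length" in the table footnotes.

```python
# Sketch of the two homology criteria described in the text, applied to
# BLAST-style hits. Field names and the example hits are hypothetical.
from dataclasses import dataclass


@dataclass
class Hit:
    query: str        # protein in the genome being examined
    subject: str      # matching protein (or translated region) elsewhere
    evalue: float     # BLAST expectation value
    query_len: int    # query length in residues
    subject_len: int  # subject length in residues


MAX_EVALUE = 1e-20       # cutoff quoted in the text
MIN_LENGTH_RATIO = 0.35  # assumed reading of the length criterion


def passes_criteria(hit):
    """True if the hit satisfies both the E-value and the length criterion."""
    shorter = min(hit.query_len, hit.subject_len)
    longer = max(hit.query_len, hit.subject_len)
    return hit.evalue < MAX_EVALUE and shorter / longer >= MIN_LENGTH_RATIO


def unique_proteins(all_queries, hits):
    """Return the queries that have no hit passing the criteria."""
    matched = {h.query for h in hits if passes_criteria(h)}
    return sorted(set(all_queries) - matched)


hits = [
    Hit("protA", "otherA", 1e-50, 300, 310),  # passes both criteria
    Hit("protB", "otherB", 1e-10, 300, 300),  # fails: E-value too weak
    Hit("protC", "otherC", 1e-40, 100, 400),  # fails: length ratio 0.25
]
print(unique_proteins(["protA", "protB", "protC", "protD"], hits))
```

In this toy input, protB and protC are reported as "unique" even though a weak or truncated match exists, which is the caveat the text raises about distant, truncated or fused homologs.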

Table 2a

Shared and unique genes among mycobacterial species*

| | Mycobacterium tuberculosis H37Rv | Mycobacterium tuberculosis CDC1551 | Mycobacterium bovis | Mycobacterium leprae | Mycobacterium avium ssp. paratuberculosis |
| --- | --- | --- | --- | --- | --- |
| Mycobacterium tuberculosis H37Rv | 26 | 3644 | 3732 | 1396 | 2624 |
| Mycobacterium tuberculosis CDC 1551 | 3644 | 417 | 3633 | 1384 | 2609 |
| Mycobacterium bovis | 3732 | 3633 | 47 | 1394 | 2622 |
| Mycobacterium leprae | 1396 | 1384 | 1394 | 155 | 1366 |
| Mycobacterium avium ssp. paratuberculosis | 2624 | 2609 | 2622 | 1366 | 966 |

  • Numbers in bold indicate the genes uniquely present in the corresponding genome.

  • * e-20, 35% length, using whole genome sequences.

Figure 2

Gene order comparison of Mycobacterium tuberculosis with Mycobacterium bovis, Mycobacterium avium ssp. paratuberculosis and Mycobacterium leprae. Gene homologs were determined by reciprocal blast hits and each match is represented by a dot.
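
The idea behind a dot plot of this kind can be sketched as follows. The caption states only that homologs were determined by reciprocal blast hits. The sketch assumes reciprocal best hits, and all gene identifiers, best-hit tables and gene-order indices are hypothetical.

```python
# Reciprocal best hits between two genomes, and the (x, y) points that a
# gene-order dot plot would draw. All identifiers and positions below are
# hypothetical.


def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    return sorted(
        (a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a
    )


def dot_plot_points(pairs, order_a, order_b):
    """Map each homologous pair to its gene-order index in both genomes."""
    return [(order_a[a], order_b[b]) for a, b in pairs]


best_a_to_b = {"a1": "b1", "a2": "b3", "a3": "b2", "a4": "b2"}
best_b_to_a = {"b1": "a1", "b2": "a3", "b3": "a2"}

pairs = reciprocal_best_hits(best_a_to_b, best_b_to_a)
order_a = {"a1": 0, "a2": 1, "a3": 2, "a4": 3}
order_b = {"b1": 0, "b2": 1, "b3": 2}
print(pairs)
print(dot_plot_points(pairs, order_a, order_b))
```

When gene order is conserved the points fall along the diagonal; rearrangements, such as the swapped a2/a3 pair in this toy input, appear as points off the diagonal.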

Table 2b

Shared and unique genes among mycobacterial species*

| | Mycobacterium tuberculosis H37Rv | Mycobacterium tuberculosis CDC1551 | Mycobacterium bovis | Mycobacterium leprae | Mycobacterium avium ssp. paratuberculosis |
| --- | --- | --- | --- | --- | --- |
| Mycobacterium tuberculosis H37Rv | 6 | 3827 | 3802 | 1396 | 2878 |
| Mycobacterium tuberculosis CDC 1551 | 3827 | 122 | 3810 | 1396 | 2840 |
| Mycobacterium bovis | 3802 | 3810 | 8 | 1397 | 2851 |
| Mycobacterium leprae | 1396 | 1396 | 1397 | 149 | 1393 |
| Mycobacterium avium ssp. paratuberculosis | 2878 | 2840 | 2851 | 1393 | 872 |

  • Numbers in bold indicate the genes uniquely present in the corresponding genome.

  • * e-20, 35% length, using whole genome sequences.
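
The effect of switching from the blastp to the tblastn comparison can be summarized by the relative drop in each genome's unique-gene count. The sketch below uses the genome-specific (diagonal) counts of Tables 2a and 2b as printed. Note that the running text quotes 414 rather than 417 for M. tuberculosis CDC 1551; the table value is used here. The function name is hypothetical.

```python
# Percentage reduction in "unique" proteins per genome when moving from the
# blastp comparison (Table 2a) to the tblastn comparison (Table 2b).
# Counts are the genome-specific (diagonal) entries of the two tables.

UNIQUE_2A = {
    "M. tuberculosis H37Rv": 26,
    "M. tuberculosis CDC 1551": 417,  # the running text quotes 414
    "M. bovis": 47,
    "M. leprae": 155,
    "M. avium ssp. paratuberculosis": 966,
}
UNIQUE_2B = {
    "M. tuberculosis H37Rv": 6,
    "M. tuberculosis CDC 1551": 122,
    "M. bovis": 8,
    "M. leprae": 149,
    "M. avium ssp. paratuberculosis": 872,
}


def percent_reduction(before, after):
    """Return the relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


for name, before in UNIQUE_2A.items():
    after = UNIQUE_2B[name]
    print(f"{name}: {before} -> {after} ({percent_reduction(before, after):.1f}% fewer)")
```

The three M. tuberculosis complex genomes lose more than two-thirds of their unique proteins, whereas M. leprae and M. avium ssp. paratuberculosis lose less than 10%, in line with the explanation that the difference largely reflects annotation rather than gene content.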

Energy metabolism

All the genomes have a functional glycolytic pathway and tricarboxylic acid (TCA) cycle. Mycobacterium tuberculosis and M. bovis have an additional carbohydrate kinase gene (pfkB) that is absent in M. leprae and M. avium ssp. paratuberculosis. The substrate specificity of pfkB in Mycobacterium is unknown; however, the presence of this gene as part of an operon with the genes for universal stress protein A, a histidine kinase and an uncharacterized phosphoribosyltransferase suggests that pfkB is a stress-induced gene. The absence of pfkB might indicate a differential response to stress in M. leprae and M. avium ssp. paratuberculosis. Mycobacterium avium ssp. paratuberculosis has a metal-independent form (class I) of fructose-bisphosphate aldolase (fba) whereas the other species have a Zn-dependent form (class II) of fba (Marsh & Lebherz, 1992). The presence of a class II fba in other actinobacterial genomes and the similarity of the M. avium ssp. paratuberculosis fba with a proteobacterial fba gene (69% identity with fba gene from Roseovarius nubinhibens) might indicate that M. avium ssp. paratuberculosis has acquired a copy of this gene by nonorthologous gene displacement. As the functions of both the classes of genes are interchangeable (Koonin & Galperin, 2003) this may not lead to any physiological differences between the species.

Mycobacterium tuberculosis CDC 1551, M. bovis and M. avium ssp. paratuberculosis have two isocitrate lyase homologues (icl, aceA), whereas M. tuberculosis H37Rv and M. leprae have only one functional copy of the isocitrate lyase gene. The icl gene is totally absent in M. leprae whereas aceA is split and nonfunctional in M. tuberculosis H37Rv (Cole, 1998, 2001; Honer Zu Bentrup, 1999). Isocitrate lyase is an essential anaplerotic enzyme of the glyoxylate cycle responsible for the growth of mycobacteria on acetate and palmitate and survival in the microaerophilic conditions inside the host (Rosenkrands, 2000; Li, 2002). The presence of a single copy of this gene in M. tuberculosis and M. leprae might result in reduced virulence and survival inside macrophages. Moreover, the presence of a single copy of this gene in these species would make it an attractive antimycobacterial drug target, as knocking out both icl homologues leads to the rapid elimination of mycobacteria from lungs (Munoz-Elias & McKinney, 2005).

All the genes involved in aerobic respiration are conserved in M. tuberculosis, M. bovis and M. avium ssp. paratuberculosis whereas most are either lost or reduced to pseudogenes in M. leprae (Table S2). The pyruvate carboxylase in M. tuberculosis and M. bovis is replaced by phosphoenolpyruvate (PEP) carboxylase in M. leprae and M. avium ssp. paratuberculosis. In the absence of the NADH oxidase operon in M. leprae, PEP carboxylase might help in oxidation of NADH by converting PEP to fumarate or malate, whereas it might provide M. avium ssp. paratuberculosis with an additional option to produce ATP. In the case of anaerobic respiration, the essential genes nirA and cysH encoding nitrate and phosphate reductases, respectively, are duplicated in M. avium ssp. paratuberculosis whereas the narX gene encoding a nitrate reductase is absent. A closer look at the duplicated genes (nirA, cysH) indicated that these genes are flanked by insertion elements, probably indicating a recent duplication mediated by these elements. Whereas the genes frdB and frdC encoding proteins responsible for interconversion of fumarate and succinate are fused to form a single gene (frdBC) in M. bovis, frdBCD are absent in M. avium ssp. paratuberculosis (Fig. 3). The fumarate reductase complex, frdABCD, functions as an anaerobic phosphorylative electron transport chain in bacteria and plays a major role in metabolism of M. tuberculosis under starvation (Betts, 2002). The absence of frdBCD genes along with a nitrate reductase gene (narX), which is up-regulated during anaerobiosis in M. tuberculosis (Hutter & Dick, 1999), might indicate a different mechanism of survival for M. avium ssp. paratuberculosis under anaerobic conditions. The fusion of frdB and frdC genes appears to have no effect on M. bovis as it is still able to use fumarate reductase as an electron acceptor. The fdhF gene encoding formate dehydrogenase and the fdxB gene encoding a protein involved in electron transport in M. avium ssp. paratuberculosis have low sequence similarity with the genes of other mycobacterial species. fdhF is longer than its corresponding gene in other species and fdxB is shorter. There is an additional flavin adenine dinucleotide (FAD) binding domain in the fdhF gene of M. avium ssp. paratuberculosis, probably suggesting a dual function for this protein, whereas the fdxB gene in M. avium ssp. paratuberculosis has lost the NADH and FAD binding domains that are present in other mycobacteria but has an intact iron–sulfur domain. This might indicate an alternate mechanism of action for these proteins resulting in an alternate survival strategy for M. avium ssp. paratuberculosis under anaerobic conditions, as both genes are essential for the decomposition of formic acid under anaerobic conditions.

Figure 3

Comparison of frdABCD genes in mycobacterial species (top to bottom: Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium avium ssp. paratuberculosis, Mycobacterium leprae) as seen in an ACT (Carver, 2005) genome browser. The ACT browser gives a comparative view of the genomes based on homology. Homologous regions between genomes are shown by red vertical lines. The white and blue rectangular boxes labeled a represent the genes on the positive strand, whereas the white and blue rectangular boxes labeled b represent the genes on the negative strand. The numbers labeled c represent the scale bar for genome length and the intergenic region is shown as gray regions labeled d. The light blue boxes labeled E represent repeat regions. The frdABCD operon (shown by a black oval) is present in M. tuberculosis and M. bovis, whereas it is absent in M. avium ssp. paratuberculosis and M. leprae. There is also variation in the membrane proteins in the four genomes. While M. bovis has two genes, mmpS6 and mmpL6 (shown by pink oval), M. tuberculosis has only mmpL6 and it is shorter than its corresponding gene in M. bovis. Mycobacterium avium ssp. paratuberculosis and M. leprae do not have the frdABCD, mmpS6 and mmpL6 genes.

Amino acid biosynthesis

Most of the genes involved in amino acid biosynthesis are highly conserved across all the species, emphasizing their role as essential genes. The genes in the glutamate, histidine and aromatic amino acid family biosynthesis pathway are conserved in all the genomes with the exception of M. leprae, which has lost some of these genes. The three essential genes of the aspartate family, asnB, dapA and lysA, are duplicated in M. avium ssp. paratuberculosis (Table S3). The gene dapA encodes dihydrodipicolinate synthase, which converts L-aspartate semialdehyde (ASA) to dihydrodipicolinate (DHDP) in one of the precursor steps in the formation of diaminopimelate (DAP) (Cirillo, 1994; Pavelka & Jacobs, 1996), which in turn is a precursor for lysine biosynthesis, whereas lysA encodes DAP decarboxylase involved in the final step of lysine biosynthesis converting DAP to lysine (Gokulan, 2003). The presence of duplicate copies of lysA and dapA might indicate an increased need for lysine in M. avium ssp. paratuberculosis owing to its role in cell-wall biosynthesis (Strominger, 1962). Increased lysine coupled with a higher number of lipid metabolism genes in M. avium ssp. paratuberculosis might result in a more robust cell wall that would lead to an enhanced protection against the acidic conditions prevailing in the ruminant gut. Additionally, it is also possible that M. avium ssp. paratuberculosis, in the absence of mycobactin, might use lysine as a precursor for the synthesis of siderophores in vivo as is seen in some Streptomyces species (Schupp, 1988). The duplicate copy of asnB, a natural antibiotic resistance gene, might lead to increased levels of antibiotic resistance in M. avium ssp. paratuberculosis (Ren & Liu, 2006). The gene cysA2 encoding a thiosulfate sulfurtransferase is duplicated in M. tuberculosis H37Rv whereas both M. tuberculosis and M. bovis have three copies (cysM, cysM2, cysM3) of the gene encoding cystathionine β-synthase. These genes might help M. tuberculosis and M. bovis in the low-oxygen environments within the macrophages as thiosulfate sulfurtransferases have been found to have a role in the assembly of the iron–sulfur clusters that act as biosensors for oxygen and iron concentrations (Unden, 1995; Florczyk, 2001). Mycobacterium avium ssp. paratuberculosis and M. leprae have a single copy of the gene (cysM2) encoding cystathionine β-synthase and also of the glyA gene encoding serine hydroxymethyltransferase, which is present as two copies (glyA, glyA2) in M. tuberculosis and M. bovis. Mycobacterium tuberculosis might have acquired a duplicate copy of the gene given the major role of this gene in cell physiology (Chaturvedi & Bhakuni, 2003). The gene ilvG, encoding acetolactate synthase involved in isoleucine and valine biosynthesis, is disrupted and nonfunctional in M. avium ssp. paratuberculosis owing to a frameshift. However, it has a functional copy of the gene ilvX, which performs a similar function.

Biosynthesis of cofactors, prosthetic groups and carriers

The genes involved in folic acid, pantothenate, pyridoxine and thiamine biosynthesis are highly conserved in all the species. Mycobacterium tuberculosis and M. bovis have some genes that are duplicated whereas M. avium ssp. paratuberculosis and M. leprae have evolved by having minimal genes required in all the pathways, indicating a metabolic streamlining. The genes bioF2, involved in the biosynthesis of biotin, ribA, involved in riboflavin biosynthesis, ggtB, involved in glutathione degradation, and idsB and grcC2, involved in terpenoid biosynthesis, are duplicated in M. tuberculosis and M. bovis. The bisC gene encoding biotin sulfoxide reductase (Pierson & Campbell, 1990) and trxA encoding a thioredoxin are absent in M. avium ssp. paratuberculosis and M. leprae. As a result of the loss of the bisC gene, M. avium ssp. paratuberculosis and M. leprae may not be able to utilize biotin sulfoxide as a source for biotin biosynthesis whereas the absence of the thioredoxin gene might lead to variations in pathogenesis (Wieles, 1995). The gene cobL encoding a methyltransferase involved in cobalamin biosynthesis is disrupted in M. bovis whereas it is intact in M. tuberculosis and M. avium ssp. paratuberculosis. There are significant differences in the genes involved in the biosynthesis of molybdopterin. Two gene clusters involved in the biosynthesis of molybdopterin are conserved in the two strains of M. tuberculosis. One of the gene clusters consisting of genes moaA, moaB, moaC and moaD is absent in M. avium ssp. paratuberculosis, only two genes (moaB and moeA) are functional in M. leprae, whereas M. bovis has a nonfunctional copy of moaC2 and has lost the gene moaE, but has two additional genes, moaA3 and moaB3 (Table S4). The variations in the genes encoding molybdopterin, a cofactor required for nitrate reductase activity, might lead to differences in the nitrate reductase activities in these species (Bertero, 2003). Additionally, the loss of most of these genes along with the nitrate reductase gene cluster narGHJI in M. leprae indicates its inability to use nitrate as the final electron acceptor under anaerobic conditions.

Degradation of carbon compounds, amino acids and amines

There are certain variations in the preference for carbon compounds among these species. The fundamental difference between M. tuberculosis and M. bovis is the inability of M. bovis to make pyruvate when glycerol is used as the sole carbon source (Garnier, 2003; Hewinson, 2006). The genome sequence indicates that all the genes of M. bovis encoding proteins required for the formation of pyruvate are nonfunctional. The pyruvate kinase gene (pykA) useful for the conversion of PEP to pyruvate is a pseudogene (Keating, 2005). The gene ald, encoding L-alanine dehydrogenase, is also a pseudogene in the case of M. bovis, blocking the conversion of alanine to pyruvate. Additionally, M. bovis cannot utilize glycerol as a carbon source to form pyruvate owing to the disruption of the glycerol kinase (glpK) and also the ugpA involved in the import of glyceraldehyde-3-phosphate (Garnier, 2003). The disruption in M. tuberculosis of galT encoding an enzyme involved in the conversion of alpha-D-galactose-1-phosphate to UDP-galactose indicates its inability to use galactose as a precursor for lactose biosynthesis. The gene galE2 encoding an epimerase that converts UDP-glucose to UDP-galactose is nonfunctional as a result of being disrupted in M. avium ssp. paratuberculosis, whereas the gene gabD1 encoding a dehydrogenase involved in the 4-aminobutyrate degradation pathway is totally absent. This may not affect functionality in this species as it has the genes galE1 and gabD2 encoding proteins that perform functions similar to galE2 and gabD1, respectively (Table 3). The important feature of the amino acid degradation pathway in M. avium ssp. paratuberculosis and M. leprae is the absence of the urease operon consisting of ureA, ureB, ureC, ureD, ureF and ureG. This indicates a low preference by M. avium ssp. paratuberculosis for ammonia as a nitrogen source, probably as a result of the lower levels of ammonia in the intestine than in the lungs, and might lead to differences in colonization and host–pathogen interactions (Clemens, 1995; Burne & Chen, 2000). Two separate genes (rocD1, rocD2) encode ornithine aminotransferase in M. bovis, M. leprae and M. tuberculosis H37Rv whereas they are fused to form a single gene (rocD1) in M. avium ssp. paratuberculosis and M. tuberculosis CDC 1551.

Table 3

Variation in genes involved in the degradation of carbon compounds, amino acids and amines*

| Mycobacterium tuberculosis H37Rv | Mycobacterium tuberculosis CDC1551 | Mycobacterium leprae | Mycobacterium bovis | Mycobacterium avium ssp. paratuberculosis |
| --- | --- | --- | --- | --- |
| galT′, galT | Absent | Absent | galT | galT |
| galE2 | MT0560 | Absent | galE3 | MAP4031,4032 |
| gabD1 | MT1772 | Absent | gabD2 | Absent |
| glpK | MT3798 | glpK | glpKa,glpKb | glpK |
| pykA | MT1653 | pykA | Pseudogene | pykA |
| ugpA | MT2901 | Absent | ugpAa,ugpAb | Absent |
| ureABCFGD | MT1896-1901 | Absent | ureABCFGD | Absent |
| ald | MT2850 | ald | alda,aldb | ald |
| rocD2, rocD1 | MT2384.1* | ML1773,1774 | rocD2,rocD1 | rocD1* |

  • Genes in bold type indicate multiple copies; genes in italics indicate split nonfunctional genes.

  • * For a complete list of 46 genes see Table S5.
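
A presence/absence table such as Table 3 lends itself to simple set-style queries. The sketch below encodes the table rows as printed and asks which genes are marked "Absent" in a chosen set of genomes. It is only an illustration: the column labels (including "MAP" as shorthand for M. avium ssp. paratuberculosis) and the function name are hypothetical, and the split or multi-copy status that the original table conveyed by typeface is not represented.

```python
# Table 3 encoded as: H37Rv gene label -> entries in the other four genomes.
# Only the literal string "Absent" is interpreted; any other entry is
# treated as "some annotated gene is listed".

GENOMES = ["CDC1551", "M. leprae", "M. bovis", "MAP"]

TABLE_3 = {
    "galT": ["Absent", "Absent", "galT", "galT"],
    "galE2": ["MT0560", "Absent", "galE3", "MAP4031,4032"],
    "gabD1": ["MT1772", "Absent", "gabD2", "Absent"],
    "glpK": ["MT3798", "glpK", "glpKa,glpKb", "glpK"],
    "pykA": ["MT1653", "pykA", "Pseudogene", "pykA"],
    "ugpA": ["MT2901", "Absent", "ugpAa,ugpAb", "Absent"],
    "ureABCFGD": ["MT1896-1901", "Absent", "ureABCFGD", "Absent"],
    "ald": ["MT2850", "ald", "alda,aldb", "ald"],
    "rocD2, rocD1": ["MT2384.1", "ML1773,1774", "rocD2,rocD1", "rocD1"],
}


def absent_in(table, genomes):
    """Return genes whose entry is 'Absent' in every one of the given genomes."""
    cols = [GENOMES.index(g) for g in genomes]
    return [
        gene for gene, row in table.items()
        if all(row[c] == "Absent" for c in cols)
    ]


print(absent_in(TABLE_3, ["M. leprae", "MAP"]))
```

For the rows shown, the query returns gabD1, ugpA and the urease operon, which matches the losses shared by M. leprae and M. avium ssp. paratuberculosis discussed in the text.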

Central intermediary metabolism

There are minor differences in the genes encoding the proteins involved in central intermediary metabolism. Mycobacterium avium ssp. paratuberculosis lacks the gene glpQ2, encoding an esterase involved in the synthesis of glycerol-3-phosphate, and sdaA, encoding a serine dehydratase involved in the conversion of serine to pyruvate. However, this might not have any effect as both losses seem to be compensated for by the presence of glpQ1 and ilvA, which perform functions similar to glpQ2 and sdaA, respectively. All the taxa under study have the genes rmlABCD encoding proteins that are essential for the synthesis of rhamnose and the gene wbbL encoding a transferase that mobilizes rhamnose to the cell wall. In addition, all the organisms seem to have additional copies for some of the genes (rmlA2, rmlB2) in this pathway, perhaps because rhamnose is an essential ingredient of the mycobacterial cell wall and there is no salvage pathway for rhamnose biosynthesis (Ma, 2001). The gene epiA encoding a nucleotide-sugar epimerase is absent in M. bovis, and epiB is nonfunctional in M. avium ssp. paratuberculosis. Mycobacterium tuberculosis has two genes (gca, gmdA) encoding mannose dehydratase, M. avium ssp. paratuberculosis has only gmdA whereas M. bovis has only gca (Table 4). The genes atsB, atsD, atsF and atsH encoding arylsulfatases are absent in M. avium ssp. paratuberculosis and M. leprae whereas atsA is nonfunctional in M. bovis. The absence of the arylsulfatase genes in M. avium ssp. paratuberculosis and M. leprae coupled with the presence of minimal genes encoding cystathionine β-synthase and thiosulfate sulfurtransferases might indicate a paucity of sulfated glycolipids in these species that will possibly result in differential host–pathogen interactions and reduced tolerance to stress (Mougous, 2002). The genes involved in purine and pyrimidine nucleotide biosynthesis are highly conserved in all the species with the exception of two genes, purT (absent in M. leprae) and purU (absent in M. avium ssp. paratuberculosis and M. leprae). The absence of purU might suggest that M. avium ssp. paratuberculosis and M. leprae will not be able to use N-formyl derivatives as precursors for the formation of formate. Most of the genes in the 2′-deoxyribonucleotide metabolism and nucleotide and nucleoside salvage pathways are either reduced to pseudogenes or lost in M. leprae whereas they are conserved in all the other species. The treX gene encoding a protein involved in trehalose metabolism is duplicated in M. avium ssp. paratuberculosis whereas the maltooligosyltrehalose synthase gene (treY) is disrupted in M. bovis. As a result, M. bovis may not be able to use glycogen as a precursor for the biosynthesis of trehalose, but this will not be critical as it has the genes encoding the enzymes of the other two pathways (De Smet, 2000).

Table 4

Variation in genes involved in central intermediary metabolism*

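| Mycobacterium tuberculosis H37Rv | Mycobacterium tuberculosis CDC1551 | Mycobacterium leprae | Mycobacterium bovis | Mycobacterium avium ssp. paratuberculosis |
| --- | --- | --- | --- | --- |
| glpQ2 | MT0332 | Absent | glpQ2 | Absent |
| sdaA | MT0075 | sdaA | sdaA | Absent |
| rmlB3 | MT3574 | Absent | rmlB2 | Absent |
| wbbL2 | MT1576 | Absent | wbbL2 | Absent |
| epiA | MT1562 | Absent | Absent | epiA |
| epiB | MT3893 | Absent | rfbB | Absent |
| gca | MT0121 | Absent | gca | Absent |
| gmdA | MT1561 | Absent | Absent | gmdA |
| atsA | MT0738 | Pseudogene | atsAb,atsAa | atsA |
| atsB | MT3398 | Absent | atsB | Absent |
| atsD | MT0692 | Pseudogene | atsD | Absent |
| atsF | MT3162 | Absent | Mb3104 | Absent |
| atsH | MT3903 | Absent | Mb3825 | Absent |
| cysQ | MT2189 | ML1301 | cysQ | cysQ1, cysQ2 |
| sseC, sseC2 | MT3200 | ML2199 | sseC2 | sseC |
| purU | MT3041 | Absent | purU | Absent |
| nrdF | MT2033 | Absent | nrdF1 | Absent |
| nrdZ | MT0596 | Absent | nrdZ | Absent |
| add | Absent | add | add | add |
| glgY | MT1614 | Pseudogene | treYa, treYb | glgY |
| glgX | MT1615 | Pseudogene | treX | glgX1, glgX2 |
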
  • Genes in bold type indicate multiple copies; genes in italics indicate split nonfunctional genes.

  • * For a complete list of 46 genes see Table S5.

Lipid metabolism

Mycobacteria have a diverse array of molecules responsible for lipid metabolism. There are about 250 enzymes involved in this pathway, including homologs of those found in plants and animals. Similar to the pathways discussed above, most of the genes of M. leprae are either lost or reduced to pseudogenes, whereas M. avium ssp. paratuberculosis has a higher redundancy of genes in this pathway compared with M. tuberculosis (Li, 2005). Because the cell wall is an interface between the pathogen and the host, the differences in the lipids constituting the cell wall might reflect the variations in pathogenesis among these species.

Fatty acid biosynthesis

There are about 65 genes involved in the biosynthesis and modification of fatty acids, with the genes of mycolic acid biosynthesis conserved in all the species. Five genes consisting of three essential genes, fabG2, accD4 and kasB, and two nonessential genes, fabG3 and fabG5 (Sassetti, 2003), are present as duplicates in the fatty acid biosynthesis pathway of M. avium ssp. paratuberculosis whereas the other taxa have a single copy (Table 5). The presence of multiple copies of these genes could be a possible mechanism to increase virulence as it has been demonstrated that mycobacteria can produce unique complex lipids by the combined action of fatty acid synthases and polyketide synthases (Kolattukudy, 1997). Mycobacteria have multiple copies of fabG (fabG1–fabG5), a gene encoding a β-ketoacyl carrier protein reductase that catalyses the first of the two reductive steps of the fatty acid synthesis cycle (Marrakchi, 2002). The presence of multiple copies of this gene might contribute to the virulence of mycobacteria (Banerjee, 1998). However, it will be interesting to see if all the copies of these genes are functional as it was recently demonstrated in Lactococcus lactis that only one of the two existing copies of fabG is actually functional (Wang & Cronan, 2004). The fatty acid biosynthesis genes fabG1 and inhA, responsible for isoniazid resistance, are part of an operon in the case of M. tuberculosis and M. bovis, but they are expressed separately in the case of M. avium ssp. paratuberculosis, much as in M. avium and Mycobacterium smegmatis (Banerjee, 1998). The duplication of accD4, an essential gene of M. tuberculosis (Sassetti, 2003) that is involved in mycolic acid biosynthesis (Gande, 2004), might indicate some additional and specific carboxylations in M. avium ssp. paratuberculosis that are probably involved in the synthesis of unusual lipids, whereas the duplication of kasB, a gene responsible for isoniazid resistance (Slayden & Barry, 2002), and desA3, a target for the antimicrobial drug isoxyl (Phetsuksiri, 2003), might lead to an increased resistance of M. avium ssp. paratuberculosis against isoniazid and isoxyl. The gene Rv3472 encoding a hypothetical protein in M. tuberculosis H37Rv, the gene acpA encoding an acyl carrier protein, the gene Rv0914 encoding a lipid carrier protein and the gene cdh encoding a protein involved in phospholipid biosynthesis are absent in M. avium ssp. paratuberculosis. The genes Rv2261 and Rv2262 encoding hypothetical proteins in M. tuberculosis are fused to form a single protein in M. bovis and M. tuberculosis CDC 1551, whereas the corresponding genes are absent in M. avium ssp. paratuberculosis and M. leprae. These genes might have some role in pathogenesis, as they are specifically present in TB-causing species.

Table 5

Variation in genes of fatty acid metabolism and degradation*

Mycobacterium tuberculosis H37Rv | Mycobacterium tuberculosis CDC1551 | Mycobacterium leprae | Mycobacterium bovis | Mycobacterium avium ssp. paratuberculosis
fabG2 | MT1398 | Pseudogene | fabG2 | fabG2_1, fabG2_2
fabG3 | MT2058 | Absent | fabG3 | fabG3_1, fabG3_2
fabG5 | MT2836 | Pseudogene | Mb2788c | fabG5_1, fabG5_2
accD4 | MT3906 | accD4 | accD4 | accD4_1, accD4_2
fabG1 | MT1530 | fabG1 | fabG1 | fabG1
desA3 | MT3326 | Pseudogene | Mb3258c | desA3_1, desA3_2
Rv3472 | MT3578 | Absent | Mb3501 | Absent
Rv0033 | MT0038 | Absent | acpA | Absent
Rv0914c | MT0939 | Pseudogene | Mb0938c | Absent
Cdh | MT2346 | ML1417 | Cdh | Absent
Rv2261c, 2262c | MT2322 | Pseudogene | Mb2285c | Absent
choD | MT3517 | choD | choD | choD
fadH | MT1212 | Pseudogene | fadH | Absent
echA18, echA18′ | Absent | Absent | echA18 | Absent
fadD11′, fadD11 | MT1600 | Absent | fadD11 | fadD11_1, fadD11_2
fadE22 | MT3147 | Pseudogene | fadE22a, fadE22b | fadE22
fadB3 | MT1754 | Absent | Fad3a, fad3b | Absent
echA3 | MT0660 | Pseudogene | echA3 | echA3
Rv1136, 1137 | MT1169.1 | Absent | Mb1168, 1169 | echA1_1
echA1 | MT0232 | Absent | Absent | echA1_2
  • Genes in bold type indicate multiple copies; genes in italics indicate split nonfunctional genes.

  • * For a complete list of 153 genes see Table S5.
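
The presence/absence pattern in Table 5 lends itself to a small lookup structure. The following Python sketch is illustrative only: it transcribes a handful of rows from Table 5, and the status labels and helper names (classify, genes_with_status) are hypothetical conveniences, not part of the authors' gene comparison program. Because the bold/italic typography that distinguishes duplicated from split genes is not encoded here, a cell with more than one entry is simply labelled "multiple".

```python
# Illustrative sketch: a few rows transcribed from Table 5.
# Column order follows the table header.
SPECIES = ["H37Rv", "CDC1551", "M. leprae", "M. bovis", "MAP"]

TABLE5_ROWS = {
    "fabG2": ["fabG2", "MT1398", "Pseudogene", "fabG2", "fabG2_1,fabG2_2"],
    "accD4": ["accD4", "MT3906", "accD4", "accD4", "accD4_1,accD4_2"],
    "fabG1": ["fabG1", "MT1530", "fabG1", "fabG1", "fabG1"],
    "Rv3472": ["Rv3472", "MT3578", "Absent", "Mb3501", "Absent"],
    "fadH": ["fadH", "MT1212", "Pseudogene", "fadH", "Absent"],
}


def classify(cell: str) -> str:
    """Map one table cell to a coarse status label (labels are assumptions)."""
    if cell == "Absent":
        return "absent"
    if cell == "Pseudogene":
        return "pseudogene"
    if "," in cell:
        # More than one entry: duplicated or split. The table's bold/italic
        # marking, which tells these apart, is not modelled here.
        return "multiple"
    return "single"


def status_by_species(gene: str) -> dict:
    """Return {species: status} for one gene row."""
    return dict(zip(SPECIES, (classify(c) for c in TABLE5_ROWS[gene])))


def genes_with_status(species: str, status: str) -> list:
    """List the transcribed genes that have the given status in one species."""
    idx = SPECIES.index(species)
    return sorted(g for g, row in TABLE5_ROWS.items() if classify(row[idx]) == status)


if __name__ == "__main__":
    print(genes_with_status("MAP", "multiple"))
    print(status_by_species("fadH"))
```

For the rows transcribed here, querying MAP for "multiple" returns the genes the text describes as duplicated in M. avium ssp. paratuberculosis; extending the dictionary to the full table would require the typographic information to separate duplicates from split genes.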

Fatty acid degradation

Genes involved in fatty acid degradation (fad genes) are fairly conserved in both strains of M. tuberculosis but show many differences with respect to M. bovis or M. avium ssp. paratuberculosis. The cholesterol oxidase gene, choD, is truncated in M. avium ssp. paratuberculosis whereas fadH is absent. The two genes echA18 and fadD11 are disrupted in M. tuberculosis. The gene fadD11 is present in duplicate in M. avium ssp. paratuberculosis whereas echA18 is absent. The genes fadD27, fadE22 and fadB3 are disrupted in M. bovis, whereas the gene fadE27 is disrupted in M. avium ssp. paratuberculosis. Mycobacterium avium ssp. paratuberculosis has lost around 19 genes that are present in M. tuberculosis but has gained about 35 additional genes in the form of duplicates of existing genes. In a recent report on M. tuberculosis, it was shown that some of the fatty acid degradation (fadD) genes belong to a new class of fatty acyl-AMP ligases (FAALs) and that these fadD gene products can combine with pks gene products in various ways to form complex hybrid metabolites (Trivedi, 2004). The presence of multiple copies of various fadD genes in M. avium ssp. paratuberculosis might be seen as an adaptation of this organism to increase virulence and resistance by producing diverse metabolites compared with M. tuberculosis or M. bovis. The enoyl-CoA hydratase gene, echA3, in M. avium ssp. paratuberculosis is longer than in the other species owing to an insertion before the gene in the same reading frame. Mycobacterium avium ssp. paratuberculosis has two copies of echA1, designated echA1_1 and echA1_2. The copy echA1_2 is homologous to echA1 of M. tuberculosis and has no counterpart in M. bovis, whereas the second copy, echA1_1, is split into two halves in M. bovis and M. tuberculosis. echA1 is absent in M. leprae.

Cell envelope

There are variations in the gene products constituting the cell envelope among the five taxa studied. These variations might be due to the fact that the membrane proteins constitute important components of the mycobacterial cell wall and have a role in virulence and host specificity (Daffe & Etienne, 1999; Barry, 2001). All the genomes have the genes (fbpA, fbpB, fbpC1, fbpC2) belonging to the antigen 85 complex (Ag85). The Ag85 complex genes encode secreted proteins that have a role in pathogenesis and also catalyse the transfer of mycolates, leading to the formation of mycolated cell wall products such as α,α′-trehalose monomycolate (TMM) and α,α′-trehalose dimycolate (TDM) (Belisle, 1997; Kremer, 2002). The genes mmpL1 and mmpL9 encoding transmembrane proteins are split and appear to be nonfunctional in M. bovis, whereas they are totally absent in M. avium ssp. paratuberculosis. The gene mmpL13 is disrupted in M. tuberculosis (Table 6). The deletion of a genomic region leads to the loss of the gene mmpS6 and shortening of mmpL6 in M. tuberculosis as compared with M. bovis, whereas both these genes are absent in M. avium ssp. paratuberculosis and M. leprae. Differences in the mmpL genes might have some effect on the transport of lipids to the membrane, as some of these genes lie in close proximity to polyketide synthesis (pks) genes and are involved in the transport of the lipids produced by the proximal pks gene (Tekaia, 1999). Interestingly, in the case of the M. bovis mmpL1 gene, even the downstream pks gene, pks6, is nonfunctional, indicating that the corresponding lipid produced by pks6 may not be required by M. bovis. However, the entire gene cluster is absent in M. avium ssp. paratuberculosis and M. leprae. Mycobacterium avium ssp. paratuberculosis also lacks another gene cluster, containing mmpL7, implicated in virulence in M. tuberculosis (Camacho, 1999, 2001; Jain & Cox, 2005). The mmpL8 and pks2 genes are also absent in M. avium ssp. paratuberculosis and M. leprae. These two genes are involved in sulpholipid biosynthesis, thereby contributing to virulence in M. tuberculosis (Converse, 2003). All the regions appear to be precisely deleted in M. avium ssp. paratuberculosis and M. leprae, suggesting that these species rely on alternative virulence factors. A similar case occurs with the mmpS6 and mmpL6 genes, where the corresponding region along with the upstream region containing the frdBCD genes is lost in M. avium ssp. paratuberculosis and M. leprae. However, M. avium ssp. paratuberculosis has seven copies of the gene mmpL4. The presence of mmpL4_2 and mmpL4_3 downstream of the drrABC genes, which encode proteins responsible for daunorubicin resistance, suggests that these might encode membrane proteins that help in the efflux of this antibiotic. There is also a high degree of variation in the genes coding for lipoproteins that make up the cell envelope. Similar to the conserved membrane proteins, many of these are missing in M. avium ssp. paratuberculosis and M. leprae as compared with M. tuberculosis and M. bovis. While the gene lprM is absent in M. bovis, lprP, lprQ and lprR are present only in M. bovis. Whereas the M. bovis genes lpqG and lpqL are smaller than the corresponding genes in other mycobacteria, the genes lppS and lpqL are duplicated in M. avium ssp. paratuberculosis. Mycobacterium leprae has only a few functional lipoprotein genes. The presence of extensive variation in the cell envelope genes might lead to major differences in the virulence of these organisms.

Table 6

Variation in genes related to cell envelope*

Mycobacterium tuberculosis H37Rv | Mycobacterium tuberculosis CDC1551 | Mycobacterium leprae | Mycobacterium bovis | Mycobacterium avium ssp. paratuberculosis
mmpL1 | MT0412 | Absent | mmpL1a, mmpL1b | Absent
mmpL2 | MT0528 | Absent | mmpL2 | mmpL2
mmpL4 | MT0466 | ML2378 | mmpL4 | mmpL4_1 to mmpL4_7
mmpL6 | MT1608 | ML2378 | mmpL6 | Absent
mmpL7 | MT3012 | mmpL7 | mmpL7 | Absent
mmpL8 | MT3931 | Absent | mmpL8 | Absent
mmpL9 | MT2402 | Absent | mmpL9a, mmpL9b | Absent
mmpL12 | MT1573 | Absent | mmpL12 | Absent
mmpL13a, mmpL13b | MT1179.1 | ML0971, ML0972 | mmpL13 | Absent
Absent | MT1802 | Absent | mmpL14 | Absent
mmpS1 | MT0415 | Absent | mmpS1, mmpS6 | mmpS1
mmpS2 | MT0527 | Absent | mmpS2 | Absent
lprM | MT2022 | Absent | Absent | lprM
lprL | MT0623 | Absent | Absent | lprL
lpqL | MT0432 | Absent | lpqL | lpqL_1, lpqL_2
lppS | MT2594 | lppS | lppS | lppS_1, lppS_2
  • Genes in bold type indicate multiple copies; genes in italics indicate split nonfunctional genes.

  • * For a complete list of 153 genes see Table S5.
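
The comparison drawn from Table 6, namely which mmpL genes are listed for the M. tuberculosis complex but for neither M. leprae nor M. avium ssp. paratuberculosis, is essentially a set operation. The Python sketch below is a minimal illustration under stated assumptions: the sets are transcribed from a subset of Table 6 rows, "listed" means only that the table gives an entry (split, apparently nonfunctional copies in M. bovis are still counted as listed), and the names LISTED and tb_complex_specific are hypothetical.

```python
# Illustrative sketch: mmpL genes for which Table 6 gives an entry in each genome.
# Split genes (e.g. mmpL1a/mmpL1b in M. bovis) are counted as "listed" here,
# although the text notes they appear to be nonfunctional.
GENES = ["mmpL1", "mmpL2", "mmpL4", "mmpL7", "mmpL8", "mmpL9", "mmpL12"]

LISTED = {
    "H37Rv": set(GENES),
    "CDC1551": set(GENES),
    "M. bovis": set(GENES),
    "M. leprae": {"mmpL4", "mmpL7"},
    "MAP": {"mmpL2", "mmpL4"},
}

TB_COMPLEX = ["H37Rv", "CDC1551", "M. bovis"]


def tb_complex_specific(listed: dict) -> list:
    """Genes listed in every TB-complex genome but in neither M. leprae nor MAP."""
    shared = set.intersection(*(listed[s] for s in TB_COMPLEX))
    elsewhere = listed["M. leprae"] | listed["MAP"]
    return sorted(shared - elsewhere)


if __name__ == "__main__":
    print(tb_complex_specific(LISTED))
```

With these rows the result includes mmpL8, in line with the statement that mmpL8 is absent in both M. avium ssp. paratuberculosis and M. leprae; mmpL7 is excluded because Table 6 lists an M. leprae entry for it.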

Polyketide synthesis

There are many differences in the genes encoding polyketide synthases (pks) among the five organisms studied. Great variation occurs between M. tuberculosis and M. bovis, whereas many of the genes are lost in M. leprae and M. avium ssp. paratuberculosis. The gene pks6 is split in M. bovis, and the genes pks3 and pks1 in M. bovis are fused genes with respect to M. tuberculosis. The gene pks3 is a result of the fusion of pks3 and pks4 of M. tuberculosis, whereas pks1 is a fusion of pks1 and pks15. The genes pks1 and pks15 are absent in M. avium ssp. paratuberculosis, whereas the genes pks3 (MAP2604) and pks4 (MAP2603) are of a different length than in M. tuberculosis H37Rv (Fig. 4). The splitting or fusion of these genes might have an effect on virulence under in vivo conditions. Additionally, this might explain the differences in the virulence of mycobacterial species, as these gene products can combine with the fatty acid degradation (fadD) gene products in different combinations to produce a variety of hybrid metabolites (Trivedi, 2004). The gene cluster (mbtA–mbtJ) encoding the biosynthetic enzymes responsible for assembly of the virulence-conferring siderophore mycobactin and essential for growth of M. tuberculosis in macrophages (Quadri, 1998; De Voss, 2000) is truncated in M. avium ssp. paratuberculosis (Li, 2005). As a result, M. avium ssp. paratuberculosis needs to be supplemented with mycobactin in any growth medium (Quadri, 1998; Li, 2005). In spite of the split in the genes mbtA and mbtB, the order of the first eight genes (mbtA–mbtH) is maintained, but an insertion in the genomic region between trpE2 (mbtJ) and lipK (mbtI) and another between lipK (mbtI) and mbtH have led to the disruption of the operon. It will be interesting to see if this disruption in the operon structure of M. avium ssp. paratuberculosis has any bearing on its survival in macrophages. The gene cluster comprising the genes ppsA–ppsE encoding the polyketide phenolphthiocerol and the gene mas encoding mycocerosic acid, which together form phthiocerol dimycocerosate (DIM), is absent in M. avium ssp. paratuberculosis. This suggests a variation in the mode of virulence in M. avium ssp. paratuberculosis compared with other mycobacteria, as the pps operon has been implicated in virulence in pathogenic mycobacteria (Daffe & Laneelle, 1988; Azad, 1997). The mbt operon is absent in M. leprae, whereas the virulence gene cluster comprising the pps operon is intact.

Figure 4

Diagrammatic representation of the pks3 gene in Mycobacterium species. Mycobacterial species have many variations in the genes involved in polyketide synthesis (pks), providing them with variations in virulence factors. Mycobacterium bovis and Mycobacterium leprae have a single pks3 gene (encircled by black oval) whereas it is split into two genes (pks3, pks4) in Mycobacterium tuberculosis. In Mycobacterium avium ssp. paratuberculosis, the genes pks3 and pks4 are of different sizes than in M. tuberculosis and are inverted (indicated by yellow criss-cross lines).

PE and PPE gene families

The genome of M. tuberculosis contains two large families of acidic, glycine-rich proteins, the PE and the PPE gene families. These gene families, constituting about 10% of the genome in M. tuberculosis H37Rv, are the major source of divergence between the genomes of M. tuberculosis and M. bovis, which are otherwise >99% similar (Garnier, 2003). Mycobacterium tuberculosis has around 100 genes belonging to the PE gene family, consisting of PE and PE-PGRS genes, and about 60 genes belonging to the PPE gene family. Some of the characterized genes indicate that they are expressed based on the changing microenvironments encountered by the pathogen and play an important role in survival and multiplication of mycobacteria in their chosen environment (Brennan & Delogu, 2002; Voskuil, 2004). Additionally, they represent a source of antigenic variation (Cole, 1998; Choudhary, 2003; Chakhaiyar, 2004) and might interfere with the immune responses by inhibiting antigen processing (Talarico, 2005). Mycobacterium leprae has almost no genes belonging to this family, whereas M. avium ssp. paratuberculosis has seven PE and 37 PPE genes. The absence of PE-PGRS genes in M. avium ssp. paratuberculosis might suggest a limited variation in the cell envelope proteins and altered colony morphology, as these proteins have been reported to be components of the cell envelope leading to extensive variation among mycobacteria (Espitia, 1999; Brennan, 2001; Banu, 2002; Delogu, 2004). The lack of PE-PGRS genes in M. avium ssp. paratuberculosis that are responsible for the survival of M. tuberculosis in macrophages (Ramakrishnan, 2000) might even suggest variations in the survival mechanism of M. avium ssp. paratuberculosis inside macrophages. These variations in the PE/PPE gene families might lead to predominant differences in the pathogenesis between the mycobacterial species.

Macromolecule metabolism and degradation

The genes encoding aminoacyl tRNA synthetases, ribosomal subunit proteins and proteins involved in translation are highly conserved across all the species. Mycobacterium avium ssp. paratuberculosis and M. leprae have lost a single gene, rpmB. The genes hsdS′, hsdM and mrr encoding proteins involved in DNA restriction, dinF encoding a protein involved in DNA repair and helZ coding for DNA helicase are absent in M. avium ssp. paratuberculosis (Table 7). The absence of genes for DNA restriction might possibly indicate a higher rate of gene transfer in M. avium ssp. paratuberculosis (Marri, 2006), whereas the absence of dinF and helZ might have an effect on the DNA repair mechanisms in this organism. The genes alkA and recB encoding proteins involved in DNA replication and repair are intact in the two M. tuberculosis strains. The gene alkA is split in M. bovis whereas recB is split in both M. avium ssp. paratuberculosis and M. bovis. The truncation of the M. bovis alkA gene might not be critical for the survival of M. bovis but could affect the induction of an effective DNA repair response under nitrosative stress (Durbach, 2003). However, the presence of a functional recD might suppress the defects in recombination caused as a result of the inactivation of recB (Amundsen, 2000). Mycobacterium leprae lacks most of the genes of DNA replication and repair along with the recBCD operon. Most of the genes encoding proteins involved in DNA transcription are lost in M. leprae whereas they are conserved across the other three species. Mycobacterium tuberculosis has as many as 13 genes coding for sigma factors (Manganelli, 1999). All the genes except sigM are conserved in M. bovis whereas M. leprae has only four functional genes (sigA, sigB, sigC and sigE). The M. avium ssp. paratuberculosis sigC gene is longer than the corresponding gene in other mycobacterial species due to an insertion before the gene, whereas the sigK gene is absent (Table 7). 
As sigC regulates the expression of virulence-related genes in M. tuberculosis (Sun, 2004), it will be interesting to see if the M. avium ssp. paratuberculosis sigC regulates the same set of genes, as this might lead to variation in pathogenesis. The duplication of sigF encoding a transcription factor that controls the expression of genes responsible for mycobacterial persistence during chemotherapy might suggest an increased persistence of M. avium ssp. paratuberculosis in the host during chemotherapy (Michele, 1999).

Table 7

Variation in genes of macromolecule metabolism and degradation*

Mycobacterium tuberculosis H37Rv | Mycobacterium tuberculosis CDC1551 | Mycobacterium leprae | Mycobacterium bovis | Mycobacterium avium ssp. paratuberculosis
rpmB2 | MT2118 | Absent | rpmB2 | Absent
rpmB | MT0114 | Absent | rpmB1 | Absent
mrr | MT2603 | Absent | mrr | Absent
hsdS′ | MT2825 | Absent | hsdS′ | Absent
hsdM | MT2826 | Pseudogene | hsdM | Absent
dinF | MT2902 | Absent | dinF | Absent
helZ | MT2160.1, MT2161 | Absent | helZ | Absent
alkA | MT1358 | Pseudogene | alkAa, alkAb | alkA
recB | MT0658 | Absent | recBa, recBb | MAP4092c, 4093c
sigC | MT2129 | ML1448 | sigC | MAP1814
sigF | MT3385 | Pseudogene | sigF | sigF_1, sigF_2
sigK | MT0461 | Pseudogene | sigK | Absent
sigM | MT4030 | Pseudogene | sigMa, sigMb | sigM
pcp | MT0334 | Absent | pcp | Absent
Rv0198c | MT0208 | ML2613 | Mb0204c | Absent
Rv1977 | MT2029 | Absent | Absent | Absent
Rv3883c | MT3998 | ML0041 | Mb3913c | Absent
ptrBa, ptrBb | MT0805 | ptrB | ptrB | ptrBa
plcC | MT2414 | Absent | Absent | Absent
plcB | MT2415 | Absent | Absent | Absent
plcA | MT2416 | Pseudogene | Absent | Absent
lipS | MT3265 | Absent | mesTa, mesTb | Absent
lipD | MT1974 | Absent | lipD | Absent
lipF | MT3591 | Absent | lipF | Absent
lipJ | MT1951 | Pseudogene | lipJ | Absent
lipR | MT3169 | Absent | lipR | Absent
Rv1105 | Absent | Absent | Mb1135 | Absent
mhpE | MT3575 | Absent | mhpE | Absent
  • Genes in bold type indicate multiple copies; genes in italics indicate split nonfunctional genes; a single gene in bold type indicates one that has a different length in other organisms.

  • * For a complete list of 190 genes see Table S5.

The genes encoding proteins involved in the degradation of DNA and RNA are highly conserved across all the species, whereas the genes encoding proteins involved in the degradation of glycopeptides and polysaccharides, and those encoding esterases and lipases, show significant differences. The genes encoding pyrrolidone carboxyl peptidase (pcp) and mycosin-1 (Rv3883) are absent in M. avium ssp. paratuberculosis. The absence of the pcp gene might have some effect on protein folding and degradation (Kim, 2001), while the absence of mycosin, a cell-wall-associated serine protease expressed after the infection of macrophages, might lead to variations in pathogenesis (Dave, 2002). The gene ptrB encoding a protease involved in the degradation of oligopeptides is split in M. tuberculosis H37Rv whereas it is intact in other taxa. The genes plcC, plcB and plcA encoding phospholipase C are present only in M. tuberculosis strains. The presence of these genes specifically in M. tuberculosis and their role in virulence might make them potential targets for anti-TB drugs (Johansen, 1996; Raynaud, 2002). The lipS gene is split into two in M. bovis, whereas it is completely absent in M. avium ssp. paratuberculosis. The other genes of the lipid degradation pathway absent in M. avium ssp. paratuberculosis include lipD, lipF, lipJ, lipR and Rv1105, probably indicating a lesser role of lipid catabolism.

Regulatory genes

Mycobacteria are expected to have a wide array of regulatory proteins owing to the complexity of the environmental and metabolic choices for these organisms. Mycobacterium tuberculosis and M. bovis have 190 regulatory proteins, most of which are conserved between both species, whereas M. avium ssp. paratuberculosis has a higher number (235). Mycobacterium avium ssp. paratuberculosis, unlike M. tuberculosis and M. bovis, can survive in the environment outside the host and expansion of the regulatory gene repertoire could possibly help in its survival under a wide range of environmental conditions (Whittington, 2004, 2005). The gene sirR encoding an iron-dependent repressor (Gupta, 1999) is nonfunctional in M. avium ssp. paratuberculosis and M. leprae, whereas the genes virS and nadR are absent (Table 8). The virS gene encoding a bacterial virulence-regulating protein is also absent in M. smegmatis and M. avium. The absence of virS further confirms its role as a regulator of genes that differentiate the M. tuberculosis complex from other mycobacterial species (Raffaelli, 1999), whereas the possible effects of the absence of nadR, a gene encoding a bifunctional protein that has a role in the transport of nicotinamide mononucleotide, remain to be tested. Moreover, the nadR homolog in M. tuberculosis and M. bovis is shorter (323 amino acids) than the corresponding gene in Escherichia coli (417 amino acids), losing a portion of the N-terminal region, indicating a lack of repressor function (Hill, 1998). Mycobacterium avium ssp. paratuberculosis has laterally acquired a second copy of the narL gene encoding a nitrate/nitrite regulatory protein (Marri, 2006). The presence of the second copy of narL along with duplicate copies for some of the genes encoding nitrate/nitrite reductase and its inability to use ammonia might possibly indicate a preference of M. avium ssp. paratuberculosis for nitrate/nitrite as a nitrogen source. 
The presence of a duplicate copy of the gene oxyS might make M. avium ssp. paratuberculosis more susceptible to organic hydroperoxide stress, as overproduction of oxyS reduces the expression levels of the gene ahpC encoding alkyl hydroperoxide reductase (Domenech, 2001). The presence of lysR, a regulatory gene of lysA (absent in M. tuberculosis and M. bovis), suggests an E. coli-like expression of lysA, regulated by lysR, in the case of M. avium ssp. paratuberculosis (in M. tuberculosis and M. bovis the expression of lysA is constitutive) (Stragier, 1983). Mycobacterium bovis and M. avium ssp. paratuberculosis have an additional copy of the gene embR encoding a transcriptional activator of the embAB genes. The embAB genes encode proteins that are involved in cell-wall arabinan biosynthesis and are the target of the antimycobacterial drug ethambutol (Belanger, 1996). The duplicate copy of embR might lead to an increased level of resistance in M. bovis and M. avium ssp. paratuberculosis against ethambutol. The genes Rv0600 and Rv0601 of M. tuberculosis H37Rv encoding proteins belonging to a two-component regulatory system are fused in M. tuberculosis CDC1551, whereas they are absent in M. avium ssp. paratuberculosis along with another gene, Rv2027.

Table 8

Variation in genes involved in regulation*

Mycobacterium tuberculosis H37Rv | Mycobacterium tuberculosis CDC1551 | Mycobacterium leprae | Mycobacterium bovis | Mycobacterium avium ssp. paratuberculosis
sirR | MT2858 | Pseudogene | sirR | MAP2894, 2895
nadR | MT0222 | Absent | nadR | Absent
narL | MT0866 | Absent | narL | narL_1, narL_2
oxyS | MT0125.1 | oxyS | oxyS | oxyS_1, oxyS_2
embR | MT1305 | Absent | embR, embR2 | embR_1, embR_2
Rv0600c, 0601c | MT0630 | Absent | Mb0616c, 0617c | Absent
pknA | MT0018 | pknA | pknA | pknA
pknD | MT0958 | Pseudogene | pknDb, pknDa | pknD
pknE | MT1785 | Absent | pknE | Absent
pknI | MT2982 | Pseudogene | pknI | Absent
pknJ | MT2149 | Absent | pknJ | pknJ
pknK | MT3165 | Absent | pknK | Absent
  • Genes in bold type indicate multiple copies; genes in italics indicate split nonfunctional genes; a single gene in bold type indicates one that has a different length in other organisms.

  • * For a complete list of 190 genes see Table S5.

The characteristic feature of the regulatory system of mycobacteria is the presence of eukaryotic-like Ser/Thr protein kinases (STPKs). Mycobacterium tuberculosis has 11 members of this protein kinase family which act as regulators of metabolic processes such as cell development, interaction with host cells and transcription (Av-Gay & Everett, 2000). The gene pknD is split into two in M. bovis (Peirs, 2000), whereas the gene pknH has a deletion compared with that in M. tuberculosis. Although the putative active sites in pknH are conserved in M. bovis in spite of the deletion, this might have a bearing on its specificity for substrate (Garnier, 2003). The nonfunctional pknD might affect phosphate transfer across the membrane in M. bovis, as pknD encodes a protein responsible for phosphate transfer (Av-Gay & Everett, 2000). While M. leprae has only four of the 11 STPKs, most of these are either disrupted or completely absent in M. avium ssp. paratuberculosis. The truncation of the gene pknA and loss of pknI might have an effect on the in vivo growth of M. avium ssp. paratuberculosis, as the products of these genes are involved in cell division (Chaba, 2002). This might possibly be one of the reasons for a slower growth rate of M. avium ssp. paratuberculosis. The disruption of pknH in M. avium ssp. paratuberculosis might have implications for the survival of this organism under heat or acid stress, as pknH phosphorylates a regulatory protein (product of the embR gene) involved in arabinan biosynthesis (Molle, 2003; Sharma, 2004). The genes pknE and pknK involved in membrane transport are totally absent. However, the losses in some of the STPKs appear to have been compensated for by the presence of a higher number (41 vs. 30 in M. tuberculosis H37Rv) of two-component regulator genes in M. avium ssp. paratuberculosis.
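
To put the regulatory gene counts quoted in this section on a common scale, the relative expansion in M. avium ssp. paratuberculosis can be worked out directly. The Python sketch below is simple arithmetic on the figures given in the text (190 vs. 235 regulatory proteins; 30 vs. 41 two-component regulator genes); the function name relative_increase is a hypothetical helper, and the percentages are derived here rather than reported by the authors.

```python
def relative_increase(baseline: int, other: int) -> float:
    """Percentage by which `other` exceeds `baseline`."""
    return 100.0 * (other - baseline) / baseline


# Counts as quoted in the text: M. tuberculosis / M. bovis vs.
# M. avium ssp. paratuberculosis (MAP).
regulatory = relative_increase(190, 235)     # regulatory proteins
two_component = relative_increase(30, 41)    # two-component regulator genes

if __name__ == "__main__":
    print(f"Regulatory proteins: +{regulatory:.1f}% in MAP")
    print(f"Two-component regulators: +{two_component:.1f}% in MAP")
```

On these figures the two-component regulators expand proportionally more than the regulatory repertoire as a whole, which fits the suggestion that they may compensate for the lost STPKs.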

Insights into pathogenesis

The cell wall serves as the interface between pathogen and host where all initial events of infection occur. With this in mind, the primary differences in pathogenesis between these sequenced mycobacterial species are likely attributable to variations in the proteins and lipids constituting the cell wall. Compared with M. tuberculosis and M. bovis, M. leprae has a very limited variation in the lipids of the cell wall owing to fewer genes encoding lipid biosynthesis, whereas M. avium ssp. paratuberculosis has a higher number and redundancy of these genes, indicating a greater genetic ability to produce variation in the cell-wall lipid composition (Cole, 1998, 2001; Garnier, 2003; Li, 2005). Moreover, the ability of mycobacteria to generate distinct lipids from various combinations of fatty acid degradation (fadD) genes and polyketide synthase (pks) genes (Trivedi, 2004) would be of further use in generating diversity. This is especially true for M. tuberculosis and M. bovis, which have a similar number of genes encoding lipid biosynthesis, yet each species produces distinct lipids, especially sulfated lipids (Brodin, 2004). However, the lipoarabinomannan (LAM) structure is the same among the species discussed herein (Rivera-Marrero, 2002).

Cell-surface proteins can mediate adherence to host cells or invasion of host tissues and are key players during these initial infection events. For example, it has been demonstrated previously that MAP2121c (also known as MMP for major membrane protein) is a surface protein that plays a role in invasion of bovine epithelial cells (Nigou, 2003). This surface protein is coded for in the M. avium ssp. paratuberculosis and M. leprae genomes, but is absent in M. bovis and the two M. tuberculosis strains. One possible explanation for this distribution of MMP may lie in a shared host receptor that may be present in the Schwann's cells of the peripheral nerves, which M. leprae infects, and the intestinal epithelial cells that line the cattle gut, which M. avium ssp. paratuberculosis infects. This shared receptor, which is currently unknown, would not be present in the trachea or alveolar cells that M. bovis and M. tuberculosis are known to infect. A similar case can be made for the mmpL5 gene encoding a protein involved in lipid transport. This gene was specifically present in the cattle strains of M. avium ssp. paratuberculosis and absent in sheep strains, possibly indicating that some of these mmpL gene products could also help in host specificity (Marsh & Whittington, 2005).

Secondly, the PE/PPE/PGRS genes have been found to be a major source of antigenic difference between M. tuberculosis CDC 1551 and M. tuberculosis H37Rv (Fleischmann, 2002) and also between M. tuberculosis and M. bovis (Garnier, 2003). Some of the variations in virulence could also be attributed to the differential presence of genes in M. tuberculosis (Bannantine, 2003) or M. bovis (Constant, 2002) or differential expression of the virulence-related genes (Charlet, 2005). The variation is again limited in M. leprae, which has lost most/all of the PE/PPE/PGRS genes. Mycobacterium avium ssp. paratuberculosis has only seven PE and 32 PPE genes and it has lost the genes mmpL7, mmpL8, pks2, mas and Rv3883 along with the genes in the pps operon that were shown to have a role in virulence in M. tuberculosis (Smith, 2003). However, the paucity of PE/PPE genes and the absence of some of these virulence genes is compensated for by the acquisition of virulence factors as a result of lateral gene transfer (Paustian, 2005; Marri, 2006).

Finally, each of the genomes has about one-third of the genes encoding proteins with unknown function. Some of these proteins might also contribute to pathogenesis and host specificity.

Concluding remarks

The availability of five genome sequences of Mycobacterium species has provided a better understanding of the evolution of these species. The streamlining of most of the metabolic pathways and the presence of numerous pseudogenes in M. leprae suggest an evolutionary process towards its specialized growth in Schwann's cells, whereas an increased gene repertoire correlates with the ability of M. avium ssp. paratuberculosis to survive under diverse environmental conditions outside the host.

All the genomes have similar functional pathways for energy metabolism, amino acid biosynthesis, cofactor biosynthesis, nucleotide metabolism and macromolecule metabolism, with a majority of the genes conserved among the five genomes. Some of the key differences include the loss of the nitrate reductase, nitrite reductase, fumarate reductase, urease and NADH oxidase operons by M. leprae, resulting in curtailed growth under anaerobic and microaerophilic conditions. The loss of fumarate reductase and the urease operons in M. avium ssp. paratuberculosis indicates that nitrate is a major source of energy under anaerobic conditions. Furthermore, M. avium ssp. paratuberculosis has lost many of the genes (mmpL7, mmpL8, pks2, mas, ppsABCD) implicated to have a role in virulence in M. tuberculosis, but has acquired some additional novel virulence factors by lateral gene transfer, suggesting different pathogenic pathways. The higher redundancy in M. avium ssp. paratuberculosis of the gene products that contribute to the cell-wall structure might enhance its ability to survive in the environment of the ruminant gut. Additionally, the presence of duplicate copies of sigF (a gene responsible for mycobacterial persistence during chemotherapy), embR (a regulator of genes involved in arabinan biosynthesis), kasB (a target for isoniazid) and desA3 (a target for isoxyl) might indicate an increased resistance of M. avium ssp. paratuberculosis to some antimicrobial drugs. The similar number of genes in M. tuberculosis and M. bovis indicates that the differences in the pathogenesis between these species could result from variations in the PE/PPE/PGRS genes and the ability of these mycobacteria to produce a diversity of lipids from the combination of fad and pks gene products in vivo.

There remain about one-third of the genes in each of these genomes that are not functionally characterized to date. Characterization of these genes will not only help increase our understanding of the physiology and molecular biology of these pathogens, but will also enable us to gain insights into pathogenesis and host specificity, and help in creating improved drugs.

Acknowledgements

We thank Weilong Hao for the gene comparison program and Drs Christian Baron, Marie Elliot and Justin Nodwell for their review of the manuscript. We thank the Editor and three anonymous reviewers for their insightful comments. This work was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) grant to G.B.G. We acknowledge funding by the USDA-CSREES and USDA-Agricultural Research Service to J.P.B.

Supplementary material

Appendix S1. Methods.

Table S1. List of unique genes in the Mycobacterium species.

Table S2. Variations in genes involved in energy metabolism.

Table S3. Variations in genes involved in amino acid biosynthesis.

Table S4. Variations in genes involved in biosynthesis of cofactors, prosthetic groups and carriers.

Table S5. Comprehensive list of genes involved in all the pathways in Mycobacterium species.

This material is available as part of the online article from http://www.blackwell-synergy.com

References
