
Evolution and pathogenesis of Staphylococcus aureus: lessons learned from genotyping and comparative genomics

Ye Feng, Chih-Jung Chen, Lin-Hui Su, Songnian Hu, Jun Yu, Cheng-Hsun Chiu
DOI: http://dx.doi.org/10.1111/j.1574-6976.2007.00086.x 23-37 First published online: 1 January 2008


Staphylococcus aureus is an opportunistic pathogen and the major causative agent of numerous hospital- and community-acquired infections. Multilocus sequence typing reveals a highly clonal structure for S. aureus. Although infrequently occurring across clonal complexes, homologous recombination still contributed to the evolution of this species over the long term. agr-mediated bacterial interference has divided S. aureus into four groups, which are independent of clonality and provide another view on S. aureus evolution. Genome sequencing of nine S. aureus strains has helped identify a number of virulence factors, but the key determinants for infection are still unknown. Comparison of commensal and pathogenic strains shows no difference in diversity or clonal assignments. Thus, phage dynamics and global transcriptome shifts are considered to be responsible for the pathogenicity. Community-acquired methicillin-resistant S. aureus (C-MRSA) is characterized by a short SCCmec and the presence of a Panton–Valentine leukocidin locus, but no studies have proven their exact biologic roles in C-MRSA infection, indicating the existence of other mechanisms for the genesis of C-MRSA.

  • methicillin-resistant Staphylococcus aureus
  • comparative genomics
  • clonal complex
  • genotype


Staphylococcus aureus is an extraordinarily versatile pathogen that can survive in hostile environmental conditions, colonize mucous membranes and skin, and can cause severe, nonpurulent, toxin-mediated disease or invasive pyogenic infections in humans. In the 1940s, penicillin G was the treatment of choice for infections caused by S. aureus. However, since the 1960s, S. aureus strains resistant to the penicillinase-resistant penicillins, as represented by the original member of the class, methicillin, have gradually emerged worldwide (Ayliffe, 1997; Chambers, 2001). These strains have been historically referred to as methicillin-resistant S. aureus (MRSA) and are resistant to all β-lactam agents. Recently, these strains have become multi-resistant, exhibiting resistance to macrolides and lincosamides, and often to tetracyclines and gentamicin as well. Resistance to trimethoprim and sulfonamides is also prevalent in some countries. This type of MRSA is now a common cause of nosocomial infections in both developing and developed countries.

Different types of MRSA have been described with origins in the communities of different countries worldwide (Chambers, 2001; Vandenesch, 2003; Zetola, 2005). These types of MRSA are characterized by resistance to penicillin and methicillin, but not to most or all other drug classes. For the most part, they appear to occur in the community setting (Riley, 1995; Chambers, 2001), but hospital outbreaks have also been described (O'Brien, 1999).

Comparative genomics, including comparison at the sequence, transcriptome, and proteome levels, has become an increasingly important approach for improving knowledge of the pathogenesis and drug resistance of S. aureus. For example, vancomycin, the last resort against multi-resistant MRSA, has gradually lost its potency due to the appearance of vancomycin-resistant strains. Whereas high-level vancomycin resistance in S. aureus has been shown to rely on horizontal transfer of vanA from Enterococcus faecalis (Chang, 2003; Weigel, 2003), the mechanisms underlying vancomycin-intermediate resistance remain poorly understood. The first two sequenced S. aureus strains, Mu50 and N315, are a pair of sister strains whose genome sequences are nearly identical, making it difficult to pinpoint vancomycin-related genes by sequence comparison alone. Using microarray expression analysis, Cui (2005) identified c. 100 genes that show differential transcription. These genes are thought to increase vancomycin resistance by acting on the cell wall metabolic pathway.

The more the details of the evolution and pathogenesis of S. aureus are elucidated, the more questions arise that await further laboratory, epidemiologic, and clinical studies. Herein, the progress made on S. aureus in recent years, as well as the major challenges confronting researchers in this field, are reviewed.

Evolution of the core genome

Clonal structure

The population of S. aureus presents a highly clonal structure. The clonality of S. aureus was initially discovered by multilocus enzyme electrophoresis and pulsed-field gel electrophoresis, and later gained support from multilocus sequence typing (MLST). MLST, currently the most popular typing method, is based on sequencing seven housekeeping genes (arcC, aroE, glpF, gmk, pta, tpi, and yqiL). For each gene, the different sequences are assigned as alleles, and the alleles at the seven loci provide an allelic profile, which unambiguously defines the sequence type (ST) of each isolate. Furthermore, isolates with at least six of seven matching alleles are thought to belong to the same clonal complex (CC). It has been shown that most MRSA strains can be grouped into five lineages: CC8, CC5, CC30, CC45, and CC22 (Enright, 2002), and 87% of S. aureus isolates, including both carriage and clinical isolates, are grouped into the 11 most frequent clonal complexes (Feil, 2004).
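The typing rule described above can be illustrated with a minimal sketch. It assumes that clonal complexes are formed by single linkage (any two isolates sharing at least six of the seven alleles are linked, and linked isolates fall into one group); the text does not specify the grouping algorithm. The function names, isolate names, and allele numbers are hypothetical.

```python
from itertools import combinations

# The seven MLST housekeeping loci named in the text.
LOCI = ("arcC", "aroE", "glpF", "gmk", "pta", "tpi", "yqiL")


def shared_alleles(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry the same allele."""
    return sum(profile_a[locus] == profile_b[locus] for locus in LOCI)


def clonal_complexes(profiles, min_shared=6):
    """Group isolates into clonal complexes by single linkage (an assumption).

    Two isolates are linked when they share at least `min_shared` of the
    seven alleles; linked isolates end up in the same group.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent pointers to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if shared_alleles(profiles[a], profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical allele numbers, for illustration only.
profiles = {
    "isolate_A": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
    "isolate_B": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 2))),  # differs from A at yqiL only
    "isolate_C": dict(zip(LOCI, (3, 3, 3, 3, 3, 3, 3))),
}
print(clonal_complexes(profiles))
```

Here isolate_A and isolate_B differ at a single locus and are grouped together, whereas isolate_C shares no allele with either and forms its own group. Raising `min_shared` to 7 would reduce the grouping to identical sequence types.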

The clear clonal structure implies few genetic exchanges between lineages; in contrast, in a sexual species, frequent recombination disrupts linkage associations between alleles, and the relationships between clonal complexes are more accurately represented as a network than as the usual bifurcating phylogenetic tree. Examination of the sequence changes at MLST loci has shown that point mutations give rise to new alleles at least 15-fold more frequently than recombination (Feil, 2003). Most prokaryotes exhibit a clonal structure to some extent. The clonality may result from geographic subdivision that can block genetic exchanges, a rapid propagation of certain clones that can overwhelm other sporadic clones, or some cryptic mechanism that can produce true clonality (i.e. long-term clonal evolution).

It seems that S. aureus belongs to the true clonality type, as the free movement of mobile genetic elements (MGEs) is restricted in S. aureus. In the laboratory, S. aureus is notoriously difficult to manipulate genetically, as evidenced by the rejection of exogenous plasmids. In addition, each lineage of S. aureus has its own phage range. Phage typing, one of the earliest typing methods used for S. aureus, is based on the selective phage sensitivity of this species. Differences in the phage pattern between lineages are caused by the restriction–modification (RM) system, which has been observed in many taxonomically unrelated bacteria. Waldron & Lindsay (2006) showed that in S. aureus, the RM systems not only serve to protect the bacterial cell from phage lysis, but also stringently control all types of foreign DNA acquisition, namely, transduction, conjugation, and transformation. Here, the RM systems specifically refer to two type I RM systems located in the genomic islands νSaα and νSaβ, respectively, which are the only RM systems in the S. aureus chromosome. The two islands have been found in all S. aureus strains, and the gene hsdS in the RM systems, which is responsible for sequence specificity, varies substantially between lineages. Therefore, it is tempting to speculate that the RM system plays a major role in forming the clonal structure in S. aureus.

Compared with S. aureus, Staphylococcus epidermidis does not have νSaα and νSaβ in its chromosome. The ratio of the recombination-to-point mutation in S. epidermidis is approximately twofold, far higher than that in S. aureus (Miragaia, 2007). Therefore, S. epidermidis has a putative population with an epidemic structure, in which its nine clones have emerged upon a recombining background and evolved quickly through lateral genetic exchanges. In staphylococci, it is thought that recombination often occurs in a phage-mediated fashion. Therefore, it is very likely that the absence of the type I RM systems results in the free transfer of phages between lineages, which can be regarded as additional evidence that the RM system has an effect on limiting recombination and the evolution of the population structure.


Although recombination occurs in S. aureus at a low frequency, its significance in the pathogenesis should not be overlooked. By calculating nucleotide substitution rates among orthologous genes of different strains, 45 genes have been identified to demonstrate anomalously high divergence at synonymous sites (Hughes & Friedman, 2005). Apart from those with hypothetical functions, most of the genes involved in recombination are related to pathogenesis, such as genes encoding staphylocoagulase, exotoxins, enterotoxins, and fibrinogen-binding proteins. Some of these genes have been verified by independent studies. For example, staphylocoagulase is an extracellular protein that causes coagulation of plasma and is regarded as the hallmark protein for the classification of S. aureus infections. Phylogenetic relations among coa do not seem to correlate with those among the flanking regions or the housekeeping genes used for MLST, indicating that coa can be laterally transferred among different lineages (Watanabe, 2005).

Sometimes, the recombination can even change the clonal structure. The relationships between STs are not always consistent, even between the seven housekeeping loci (Feil, 2003). More than half of these incongruent comparisons involve the arcC locus; this is often accounted for as a ‘hitchhiking effect.’ arcC is in close proximity to three putative virulence genes, namely clfB, aur, and isaB. Because these genes encode proteins that are exposed to the host immune response, these loci are more likely to become recombination hot spots in order to introduce genetic diversity for adaptation to selection pressure. Such recombination will frequently extend into the arcC locus and may influence its sequence evolution.

Large chromosomal replacements have been identified in S. aureus, although rarely occurring naturally. The ST239 mosaic chromosome has ∼557 kb spanning oriC from its ST30 parent and ∼2220 kb spanning terC from its ST8 parent (Robinson & Enright, 2004). ST239 has thrived to become a pandemic lineage of MRSA represented by numerous clones, including epidemic EMRSA-1, -4, -7, -9, -11, Brazilian, Portuguese, and Vienna clones (Aires de Sousa, 1998; Witte, 1999), suggesting that a successful recombination event can breed a new pandemic clone.

Difference of gene content between lineages and between species

To date, nine S. aureus strains have been sequenced, including one laboratory strain (NCTC8325), one bovine strain (RF122), and seven human strains (COL, USA300, MW2, MSSA476, MRSA252, Mu50, and N315). The overall structures of all sequenced S. aureus chromosomes exhibit good synteny between each other. Approximately 78% of the genes are conserved among strains and constitute the ‘core genome.’ The remaining 22% of the genes comprise an ‘accessory genome,’ including genomic islands, pathogenicity islands (SaPIs), prophages, integrated plasmids, and transposons.
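The core/accessory distinction can be sketched as a set operation over per-strain gene lists. This is only an illustration: it assumes that orthologous genes have already been assigned shared identifiers, and it computes the core fraction relative to one strain's gene count, which may differ from how the 78% figure was derived. The strain names and gene identifiers are placeholders.

```python
def partition_genome(gene_sets):
    """Split pooled genes into core (in every strain) and accessory (the rest)."""
    per_strain = [set(genes) for genes in gene_sets.values()]
    pan_genome = set().union(*per_strain)
    core = set.intersection(*per_strain)
    return core, pan_genome - core


def core_fraction(gene_sets, strain):
    """Fraction of one strain's genes that belong to the core genome."""
    core, _ = partition_genome(gene_sets)
    return len(core) / len(set(gene_sets[strain]))


# Placeholder gene identifiers and strain names, for illustration only.
gene_sets = {
    "strain_1": {"g1", "g2", "g3", "g4"},
    "strain_2": {"g1", "g2", "g3", "g5"},
    "strain_3": {"g1", "g2", "g3"},
}
core, accessory = partition_genome(gene_sets)
print(sorted(core), sorted(accessory))
print(core_fraction(gene_sets, "strain_1"))
```

In this toy input, g1 to g3 are shared by every strain and form the core, while g4 and g5 are each confined to a single strain and fall into the accessory set.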

The entire ‘core genome’ is not as stable as the term suggests. Some regions in the core genome are exceptionally variable between lineages; therefore, the core genome can be further divided into stable core and core variable genomes, which can be easily discriminated by microarray analysis (Lindsay, 2006). Specifically, many of the ‘core variable’ genes encode virulence factors involved in pathogenesis, e.g., toxins, superantigens, exoenzymes, and regulatory elements. Apart from a higher nucleotide substitution rate, core variable genes often contain variable number tandem repeats (VNTRs). The best-studied VNTR loci in S. aureus are genes encoding microbial surface components recognizing adhesive matrix molecules (MSCRAMMs). Attachment to tissue, a key step during the infection process, is primarily mediated by the binding of MSCRAMMs to fibrinogen, fibronectin, collagen, and other components of the host extracellular matrix (Foster & Hook, 1998). A number of MSCRAMMs (e.g. ClfA and B; SdrC, D, and E; and FnbA and B) are characterized by peptide repeats, which are prone to slippage errors in replication or to inducing recombination at these loci. Hypervariation of virulence genes is generally attributed to competition with the host immune system and/or to the fact that these genes are not critical for basic metabolism.

Homologue analysis has shown that MRSA252 and RF122 are more divergent than the other seven S. aureus strains (Fig. 1). Other details support this notion. For example, SarT and SarU, two regulators that are believed to have evolved from SarA, are missing only from MRSA252 and RF122 and are present in the other seven strains. When S. aureus and the four sequenced coagulase-negative staphylococci (CoNS) strains (two S. epidermidis strains, one Staphylococcus haemolyticus strain, and one Staphylococcus saprophyticus strain) are compared together, a large proportion of genes are conserved in their sequence and order on the chromosome, comprising the backbone of the genome of the genus Staphylococcus (Fig. 2). A 0.4-Mbp region downstream of the staphylococcal cassette chromosome (SCC) has little homology among species; many important S. aureus-specific genes are located in it, such as spa (encoding protein A) and coa (encoding coagulase). Takeuchi (2005) designated it as an ‘oriC environ’ and hypothesized that this region is related to chromosomal inversion events within the genus and has made an important contribution to the evolution and differentiation of the staphylococcal species.

Figure 1

Protein homology between nine sequenced Staphylococcus aureus genomes. Each box shows the number of orthologues shared by the corresponding pair of strains and the median nucleotide divergence between the two strains. Orthologue groups were constructed with the OrthoMCL program (Li, 2003). Nucleotide divergence is defined as the number of mismatched bases divided by the number of comparable bases. The color intensity in each box is in inverse proportion to the nucleotide divergence. The accession numbers of the S. aureus genomes are: NC_007622 (RF122).
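The divergence measure defined in the caption can be written out as a short sketch. It assumes that the two sequences are already aligned to equal length and that alignment columns containing a gap in either sequence are not "comparable"; the caption does not state how gaps were treated, so that choice is an assumption, as are the function names.

```python
from statistics import median


def nucleotide_divergence(seq_a, seq_b, gap="-"):
    """Mismatched bases divided by comparable bases for two aligned sequences.

    Columns with a gap in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    comparable = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        comparable += 1
        if a != b:
            mismatches += 1
    if comparable == 0:
        raise ValueError("no comparable bases")
    return mismatches / comparable


def median_divergence(aligned_pairs):
    """Median divergence over the aligned orthologue pairs of two strains."""
    return median(nucleotide_divergence(a, b) for a, b in aligned_pairs)


# Toy aligned sequences, for illustration only.
print(nucleotide_divergence("ACGT", "ACGA"))
print(median_divergence([("ACGT", "ACGA"), ("ACGT", "ACGT"), ("AC", "GT")]))
```

Taking the median over orthologue pairs, as the caption does, keeps a few highly divergent genes from dominating the strain-to-strain comparison.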

Figure 2

Circular representation of the MW2 chromosome compared with other Staphylococcus aureus and CoNS strains. The outmost magenta arcs represent mobile genetic elements and a large surface-anchored protein-encoding gene (ebh); the black curve line represents the ‘oriC environ’ that starts from SCCmec and ends at about 0.4 Mbp on the chromosome. The four blue circles from the outside inward represent orthologues of MW2's coding sequences on Staphylococcus haemolyticus JCSC1435 (accession no. NC_002952), respectively. The green circle represents MW2's coding sequences. The innermost circle represents GC skew (orange, positive value; purple, negative value).

Table 1 lists the known virulence factors and regulators of S. aureus and indicates whether each is present in the four sequenced CoNS strains. Nearly all prophages, genomic islands, and pathogenicity islands that harbor toxin genes are absent from the CoNS strains, which is thought to be the most important reason for the greater virulence of S. aureus. Adhesins and exoenzymes also differ between species. In contrast, agr and sarA, the two regulators responsible for the global regulation of virulence factors in S. aureus, are conserved in all staphylococcal species. Theoretically, most of the toxins and other S. aureus-specific virulence factors emerged in S. aureus after speciation. It would be interesting, then, to determine how agr and sarA have developed their new function in regulating virulence factors. It is possible that a functional coevolution occurred between agr/sarA and virulence factors, whereas those not regulated by agr/sarA, such as enterotoxins A and K (Tremaine, 1993), have possibly not completed coevolution.

Table 1

Major virulence factors and regulators in Staphylococcus aureus that are present or absent from the sequenced CoNS strains

Product | Gene name | Location | S. epidermidis / S. saprophyticus / S. haemolyticus
1-Phosphatidylinositol phosphodiesterase | plc | |
Triacylglycerol lipase | lip | | +++
Serine protease | htrA | | +++
Cysteine protease | sspB,C | | +
Serine V8 protease | sspA | | ++
Serine proteases | spl(s) | νSaβ |
Zinc metalloproteinase aureolysin | aur | | ++
Cell wall hydrolase | lytN | |
proteases ClpX | clpX | | +++
Exotoxins/superantigen-like proteins | set(s) | νSaα |
Panton-Valentine leukocidin | lukS,F-PV | SaPI |
Toxic shock syndrome toxin 1 | tst | SaPI |
γ-Hemolysin components | hlgA,B,C | |
exfoliative toxins | eta,etb | |
Extracellular matrix binding proteins | ebhA,B | | +
Elastin-binding protein | ebpS | | +++
Fibronectin-binding proteins | fnbA,B | |
Intercellular adhesion proteins | icaA,B,C,D | | +
Collagen adhesin precursor | cna | |
Clumping factors | clfA,B | |
Ser-Asp rich proteins | sdr | | +++
Immunoglobulin G (IgG)-binding protein A | spa | |
Capsular polysaccharide synthesis proteins | capA-G | | ++
Ferrichrome ABC transporter | fhuD | | ++
IgG-binding protein SBI | sbi | |
Iron uptake Isd | isdA-G,srtB | |
Two-component regulatory systems
Accessory gene regulator | agrA,B,C,D | | +++
S. aureus exoprotein expression regulator | saeS,R | | +
Staphylococcal respiratory response protein | srrA,B | | +++
Autolysis-related locus | arlS,R | | +++
SarA protein family
Staphylococcal accessory regulator A | sarA | | +++
Staphylococcal accessory regulator R | sarR | | +++
Staphylococcal accessory regulator S | sarS | |
Staphylococcal accessory regulator T,U | sarT,U | |
Repressor of toxins | rot | | +++
  • * Some products, such as sarT and U, toxins located in mobile genetic elements, are not present in all S. aureus strains. The two sequenced S. epidermidis strains are RP62a (accession no. NC_007168).

  • The set cluster is now redesignated as the ssl (staphylococcal superantigen-like proteins) cluster (Lina, 2004).

  • ‡ (s) indicates it is a gene cluster rather than a single gene.

  • § Most enterotoxin genes are located in νSaβ, but some enterotoxin genes, such as sea, seg2, sek2, sel, sec3, are located in prophages and SaPIs.

  • νSaα and νSaβ are two genomic islands. Except those located in genomic islands, prophages, and pathogenicity islands (SaPIs), other virulence factors and regulators are located in the core genome.

agr Groups

To gain insight into the relatedness of strains within the S. aureus species, including those not sequenced, the concatenated sequence of MLST alleles is often used for reconstructing a phylogenetic tree. Sometimes, SAS genes that encode putative surface proteins are also included to provide more informative sites. Based on MLST, SAS sequence, and agr typing, Robinson (2005a) proposed a ‘two-subspecies’ hypothesis stating that both subspecies contain agr I, II, and III groups. The topology derived from the hypothesis is in agreement with the conditional tree constructed by the use of microarray analysis of core variable genes from 161 isolates (Fig. 3; Lindsay, 2006).

Figure 3

Comparison of relatedness derived from two different methods. (a) Conditional tree constructed by the use of microarray analysis of core variable genes from 161 isolates (Lindsay, 2006). (b) Phylogenetic tree based on MLST, SAS sequences, and agr typing (Robinson, 2005a). The dotted line separates lineages into two putative subspecies.

The essential part of the hypothesis is the agr locus. Bacterial interference is a commonly observed phenomenon in which strains of different species or lineages exclude each other at the sites of infection or colonization. In S. aureus, agr is responsible for this phenomenon. It encodes a two-component signaling pathway whose activating ligand is an auto-inducing peptide (AIP). Polymorphism in the sequence of AIP and its corresponding receptor divides S. aureus strains into four major groups. Within a given group, each strain produces a peptide that can activate the agr response in the other member strains, whereas AIPs belonging to different groups are mutually inhibitory (Ji, 1997; Jarraud, 2000).
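The interference rule just described can be expressed as a small lookup. This is a deliberately simplified sketch that encodes only the rule as stated in the text (activation within a group, inhibition between groups); it does not model signal strength or any exceptions, and the function names are hypothetical.

```python
AGR_GROUPS = ("I", "II", "III", "IV")


def agr_response(producer_group, receiver_group):
    """Effect of an AIP from `producer_group` on a strain of `receiver_group`.

    Encodes the simplified rule from the text: same group activates,
    different groups inhibit.
    """
    for group in (producer_group, receiver_group):
        if group not in AGR_GROUPS:
            raise ValueError(f"unknown agr group: {group}")
    return "activate" if producer_group == receiver_group else "inhibit"


def interference_matrix():
    """All producer/receiver combinations under the simplified rule."""
    return {
        (producer, receiver): agr_response(producer, receiver)
        for producer in AGR_GROUPS
        for receiver in AGR_GROUPS
    }


matrix = interference_matrix()
print(matrix[("I", "I")], matrix[("I", "III")])
```

Of the 16 producer/receiver combinations, only the four on the diagonal are activating, which is why strains of different agr groups tend to exclude one another.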

However, the species is not subdivided into four monophyletic agr groups. Strains of the same agr group are not necessarily closely related to each other. For example, MRSA252 and two CC1 strains, MW2 and MSSA476, belong to agr III, but MRSA252 is the most divergent among the seven human strains compared according to the proportion of strain-specific genes and pairwise synonymous substitution rates (Holden, 2004; Hughes & Friedman, 2005). Thus, Robinson (2005a) proposed that the evolution of S. aureus includes four phases. The initial phase is the speciation event that led to the origin of S. aureus; the second phase is the divergence of S. aureus into two subspecies groups, each having agr I, II, and III; the third phase is the divergence of agr I and IV within subspecies group 1; and the final phase is the recombination event between agr I and IV, resulting in agr I/IV.

It is still unclear whether the divergence of the two subspecies groups precedes the divergence of the agr groups. However, based on the hypothesis, the timing of some other important events in S. aureus evolution can be inferred (Fig. 4). CCs should have arisen after the divergence of the three agr groups at the earliest, because it seems impossible that the ancient agr could have evolved convergently into the same agr variants in different lineages. Meanwhile, the genomic islands νSaα and νSaβ, which exist in all S. aureus strains, probably entered the genome shortly after the speciation of S. aureus. Given the important role of νSaα and νSaβ in lineage formation, it is reasonable to hypothesize that the divergence of RM systems within the islands did not occur until after the divergence of the agr groups.

Figure 4

Illustration of the hypothetical Staphylococcus aureus evolutionary history. The whole S. aureus species can be divided into two putative subspecies (Robinson, 2005a). The circles with different colors represent different agr groups, and the circles with numbers inside represent the corresponding clonal complexes. The arrows on the right side indicate the important phases during the S. aureus evolution.

Some diseases are known to be related to certain agr groups, such as the association of agr III with menstrual toxic shock syndrome (Ji, 1997) and Panton–Valentine leukocidin (PVL)-induced necrotizing pneumonitis (Gillet, 2002), the association of agr IV with exfoliatin production (Jarraud, 2000), and the association of agr I and II with reduced vancomycin susceptibility (Sakoulas, 2002). It is probable that the genome of a certain agr group has specific gene combinations that give rise to a specific phenotype.

Evolution of the accessory genome


Methicillin resistance in MRSA results from the presence of a modified penicillin-binding protein (PBP-2a alias PBP2' and MecA), which has a reduced affinity for methicillin and other β-lactams, and hence retains critical functions necessary for cell homeostasis (Chambers, 1985; Lowy, 1998; Mallorqui-Fernandez, 2004). PBP-2a is encoded by the mecA gene located in the staphylococcal chromosome within a discrete region called the SCC (SCCmec; Hiramatsu, 2001). Apart from the mec divergon that further encodes a transmembrane signal-transduction system to trigger the resistance response, SCCmec possesses another essential genetic component, the ccr complex, which is responsible for the mobility of SCCmec. The rest of SCCmec is designated as the junkyard (J) region, whose presence does not appear to be essential for bacterial cells (Ito, 2003). Five types of SCCmec have been described, according to the combination of different variants of mec and ccr complex and subtypes of J regions.

The first MRSA clinical isolate was reported in 1961, only 1 year after the introduction of methicillin into the clinic (Jevons, 1961; Hiramatsu, 2001). Although the origin of SCCmec is unknown, evidence of an interspecies exchange of DNA has been found between CoNS and S. aureus (Wielders, 2001; Wisplinghoff, 2003; Hanssen, 2004). Frequent conversion of methicillin-sensitive S. aureus (MSSA) to MRSA by the lateral transfer of SCCmec has also been described (Enright, 2000; Fitzgerald, 2001; Robinson & Enright, 2003), suggesting that MSSA is the origin of MRSA and that MRSA strains may have evolved multiple times independently, rather than from a single ancestral strain. It is noteworthy that MRSA is restricted to five CCs, whereas S. aureus as a whole is distributed among 11 CCs. It may be the case that the five MRSA lineages have a greater capacity to accept SCCmec by some unknown mechanism, even though SCCmec is routinely inserted into a region adjacent to orfX that seems to exist in all S. aureus variants. It may also be the case that the five lineages are more virulent and prevalent; selection pressure from antibiotics in the hospital setting has perhaps forced the five lineages to retain SCCmec.

A variety of insertion sequences (ISs), transposons, and plasmids have been found in SCCmec, including Tn554, IS1272, IS431, pUB110, pT181, and pI258. Perhaps the mec complex could even be regarded as a mobile element, as its integration into SCC probably causes the conversion of SCC into an antibiotic resistance determinant. Apart from broadening the range of resistance to antibiotics such as methicillin, macrolides, aminoglycosides, tetracycline, and bleomycin, the insertion of these mobile elements provides potential hot spots for recombination, thereby helping to remodel the structure of SCCmec and giving rise to a greater number of structural variants. SCCmec III appears to be composed of two SCC elements because it contains two copies of the ccr complex and two copies of Tn554, which may be explained by the sequential integration of two copies of SCC, followed by deletion of internal parts (Ito, 2001).

Another mobile element often integrated into SCC is the RM system, which exists in SCC476, SCCmec V, and SCCpbp4. The origins of these RM systems are unknown, but differences in the nucleotide sequences show that they originated from different places. It is interesting to ponder why RM systems prefer insertion into SCC; however, their roles in S. aureus evolution should not be overemphasized because only a small proportion of S. aureus possess SCC that contain RM systems.

Genomic islands

The two islands, νSaα and νSaβ, have been found in nearly all S. aureus isolates of divergent clonal, geographic, and disease origins (Fitzgerald, 2003). Both islands are nurseries of tandem paralogous gene clusters. νSaα encodes a cluster of staphylococcal superantigen-like proteins, the so-called set cluster (now redesignated as the ssl cluster; Lina, 2004), and a cluster of lipoproteins (lpl cluster), while νSaβ encodes a serine protease cluster (spl cluster) and an enterotoxin cluster. All these clusters encode virulence factors, the enterotoxin gene cluster being especially notable: staphylococcal food-borne disease is often the result of the intake of enterotoxin-contaminated food (Bunning, 1997).

Although the two genomic islands are ancient features of the S. aureus genome, the evolution of these clusters is still active. Frequent recombination and deletion events lead to a variation of the copy number of toxin genes between the isolates. Interestingly, Thomas (2006) found that within the enterotoxin gene cluster, most isolates have a prevalent archetype that carries two pseudo-enterotoxins, ϕent1 and 2, while in a few isolates, recombination between the two pseudogenes has led to the emergence of new toxins. It is therefore tempting to speculate that the accumulation of virulence genes may not always confer an optimal selective advantage on isolates. Likewise, Fitzgerald (2003) proposed an ‘independent loss’ model for the set cluster, such that the ancestral state of the set cluster may be represented by a complete complement of set genes and then the loss of the set genes has occurred several times independently within separate lineages. These phenomena contradict the traditional view that more toxin variants offer the pathogen more choices against the host immune system, and that amplification may be selected if the paralogues have a weak, but slightly selected product.

SaPIs and prophages

SaPIs and prophages are both important vectors carrying virulence factors. Identified virulence factors include staphylokinase, enterotoxins, toxic shock syndrome toxin, and PVLs. Horizontal transfer of SaPIs relies on a ‘helper’ phage. It is now known that SaPI-1 can be excised and circularized by staphylococcal phages Φ13 and 80α; it is then efficiently encapsidated into special small phage heads and replicates during growth of the helper phage, which transduces it at a very high frequency (Lindsay, 1998; Ruzin, 2001). SaPI-2 and SaPI-3 can also be excised from chromosomes and form extrachromosomal closed circular DNA (Baba, 2002).

Many of the genes contained in SaPIs are homologous to the described phage genes, suggesting they are of bacteriophage origin. Yarwood (2002b) proposed a recombination model for SaPI genesis, following which a mis-recombination event could have led to the replacement of a segment of phage DNA necessary for complete phage function with a chromosomal segment. In this way, SaPI would have become dependent on a wild-type helper phage for excision, packaging, and/or mobilization.

A remarkable feature of SaPIs, prophages, and phages is their mosaic structure. Phage genes can be classified into six functional categories: DNA replication, integration, packaging, head, tail, and lysis (Kwan, 2005). Accordingly, these phage genes are organized into discrete functional modules. One functional module found in one phage can be replaced in another phage by a sequence-unrelated module that frequently fulfills the same or a related function. According to this theory, it is the module, rather than the entire phage genome, that has a relatively independent evolutionary history (Brussow, 2004).

The mosaic structure confuses the nomenclature of the prophage and SaPI to some extent. Lindsay & Holden (2004) suggested classifying MGEs on the basis of integrase gene homology, as this enzyme usually determines the MGE insertion site within the genome. However, due to a module exchange, an SaPI/prophage with the same integrase may have an entirely different gene content. For example, even though ΦPVL shares an integrase and the PVL locus with ΦSLT, most genes of ΦPVL are more like prophage ΦSa3, while genes of ΦSLT are more similar to ΦSa2. SaPI-3 in Mu50 and MW2 are clustered together according to the integrase sequence, but with respect to gene content, SaPI-3 in MW2 seems to be more similar to SaPI-5 in USA300 (Fig. 5).

Figure 5

Illustration of the mosaic structure of phages and SaPI in Staphylococcus aureus. Segments having sequence identities of more than 90% are linked by green shading. Known functions of ORFs are colored as follows: lysogeny, blue; replication and recombination, red; packaging and head protein, yellow; tail protein, green; lysis, cyan; toxin, black. (a) Alignment of phage/prophage sequences. Structures of the four sequences are indicated based on the following nucleotide sequences: ΦSa3 in NCTC8325 (accession no. NC_007793).


Phage dynamics

Staphylococcus aureus is often considered to be an opportunistic pathogen. On the one hand, it can cause life-threatening diseases; on the other, healthy people also carry S. aureus in their anterior nares. From longitudinal studies, it has become clear that 10–35% of individuals carry S. aureus persistently, 20–75% carry S. aureus intermittently, and 5–50% never carry S. aureus in their noses (Armstrong-Esther, 1976).

Another staphylococcal species, S. epidermidis, is also an opportunistic pathogen. The essential pathogenesis of foreign-body-associated S. epidermidis infection is biofilm formation, which is a two-step process. The first step, bacterial attachment to a surface, is related to a cell surface protein (an autolysin) encoded by the chromosomal atlE gene. The second step, including cell aggregation and biofilm accumulation, is mediated by the products of the chromosomal intercellular adhesion (ica) operon. Phase variation of virulence in S. epidermidis can occur by insertion/excision of IS256 or by a rearrangement-mediated genetic defect, which results in inactivation of the ica operon (Ziebuhr, 1999, 2000).

There has been no report of chromosomal rearrangement in S. aureus to date, and S. aureus evidently carries fewer IS elements than CoNS. IS256 has only been found to influence teicoplanin resistance in S. aureus by insertional inactivation of the tca gene (Maki, 2004). It has also been shown that a mutation in the ica genes of a clinical S. aureus isolate has little effect on biofilm formation (Beenken, 2004). The pathogenic mechanism of S. epidermidis thus differs from that of S. aureus. Because a considerable proportion of the toxin genes are located in SaPIs and prophages, phage dynamics are of apparent importance for the pathogenesis of S. aureus. Assays of consecutive isolates have revealed that commensal strains have a very low transformation rate and evolve slowly over time; in contrast, phages are remarkably active within pathogenic strains, and the genome plasticity of pathogenic strains is evidently elevated (Goerke, 2004). In clinical sampling and animal model experiments, the role of certain phages in pathogenesis has been demonstrated by the fact that isogenic isolates, with and without the phage, can differ strikingly in their ability to cause disease (Moore & Lindsay, 2001; Bae, 2006).

Toxin genes do not accumulate within the chromosome without limit because MGEs can often exclude each other. For example, no clinical isolates have been found to simultaneously contain TSST-1 (in SaPI-1) and SEB (in SaPI-3), perhaps because the two SaPIs share identical att sequences and therefore compete for the same insertion site in the chromosome (Yarwood, 2002b). Negative correlations between tst and lukE-splB and between lukE-splB and seg-sei have also been reported (Moore & Lindsay, 2001). Because lukE-splB are located in νSaβ, which exists in all S. aureus strains, it is likely that phages that carry tst or seg-sei are inhibited by the type I RM system of certain lineages. Although MGEs do not transfer freely across lineages, they can do so within the same sequence type. The TW strain has accumulated all detectable MGEs that were variably present in other epidemic ST239 strains in the United Kingdom, and developed an enhanced ability to cause bloodstream infection (Edgeworth, 2007). Epidemiologists should always be vigilant for such ‘superbugs.’

The actual role of phages in S. aureus pathogenesis is much more complex, as exemplified by phage ΦSa3, which has been studied in detail by Goerke (2006). This phage encodes the immune evasion molecules (SAK, SCIN, and CHIPS) that are widely distributed in clinical isolates. Usually, it inserts specifically within the β-hemolysin (hlb) gene, so that the recipient is negatively affected by the inactivation of a virulence factor, but atypical integration (not in hlb) has also been found. Analysis of the integrase gene sequence demonstrated no difference between typical and atypical sak-encoding prophages. Thus, the integrase allows illegitimate integration, which contradicts the classical view that the integrase specifically recognizes the chromosomal attB site. In addition, ΦSa3 was also found to be able to persist extra-chromosomally during its life cycle, although it is not known whether the phage is able to express toxin genes in this state.

Knowledge of the type I RM system in S. aureus is still incomplete, although it is thought to be, at least partly, responsible for lineage segregation. The extent to which the type II RM systems restrict gene flow is also unclear. Phage K is a large, virulent bacteriophage that infects a broad range of staphylococci, including multiple-drug-resistant strains of S. aureus. A remarkable paucity of the Sau3AI restriction site (GATC) is thought to be an efficient mechanism that phage K developed to avoid host restriction-modification systems (O'Flaherty, 2004). In contrast, vanA-encoding Tn1546 and the PVL locus have an abundance of Sau3AI restriction sites, which may be the reason why the two elements reside in only a small proportion of strains. While these sequence properties support a role for the type II RM system in restricting gene flow, the contrary situation in Helicobacter pylori argues against this notion. More than 20 RM systems, comprising more than 4% of the total genome, have been identified in sequenced H. pylori strains (Lin, 2001), yet H. pylori is naturally competent to take up DNA.
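A paucity or abundance of a restriction site can be quantified by comparing the observed number of sites with the number expected from base composition alone. The following is a minimal sketch under stated assumptions: the function names are illustrative, the null model (independent bases drawn at the sequence's own frequencies) is one simple choice among several, and it is not necessarily the method used in the cited studies. Because GATC is palindromic, scanning one strand is sufficient.

```python
from collections import Counter


def count_site(seq, site="GATC"):
    """Count occurrences of `site` in `seq` (overlapping matches included)."""
    seq, site = seq.upper(), site.upper()
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)


def expected_site_count(seq, site="GATC"):
    """Expected count if bases were independent with the observed frequencies."""
    seq, site = seq.upper(), site.upper()
    n, k = len(seq), len(site)
    if n < k:
        return 0.0
    freq = Counter(seq)
    p = 1.0
    for base in site:
        p *= freq[base] / n          # probability of this base at one position
    return p * (n - k + 1)           # number of windows of length k


def observed_over_expected(seq, site="GATC"):
    """Ratio < 1 suggests under-representation, > 1 over-representation."""
    expected = expected_site_count(seq, site)
    return count_site(seq, site) / expected if expected > 0 else float("nan")


toy = "GATCGATCAAAA"                  # toy sequence, not real phage DNA
print(count_site(toy), expected_site_count(toy), observed_over_expected(toy))
```

On real genomes, a ratio well below 1 would be consistent with selection against the site, although short sequences and skewed base composition make the ratio noisy, so it is best read as a rough indicator.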

Expression and regulation of virulence factors

A number of virulence genes have been identified in the S. aureus genome, conferring upon S. aureus the ability to cause various types of disease; however, which genes are necessary for which infection is still unclear. Many previous epidemiologic studies have focused on the presence or absence of a given genetic determinant. Regrettably, this type of research cannot explain why toxic shock syndrome cases are rare even though c. 20% of S. aureus strains carry the toxin gene tst (Moore & Lindsay, 2001; Peacock, 2002).

A direct comparison of S. aureus isolates collected from both patients with disease and asymptomatic carriers revealed no difference in diversity or clonal assignments (Feil, 2003). Thus, commensal and pathogenic strains are not two distinct types of organisms, but the same organism in different states. When S. aureus switches from commensal to pathogen, it has to face a completely different environment and withstand a much harsher host defense. Thus, a global change of expression pattern is expected. Voyich (2005) found that under in vitro growth conditions, genes involved in transcription and protein biosynthesis, maturation, and folding typically dominate the set of highly expressed genes; when the bacteria are phagocytized by neutrophils, the overall functional profile of highly expressed genes shifts to pathogenicity-related genes, such as those involved in virulence, metabolism, capsule synthesis, and gene regulation.

Phage dynamics also cause conversion between commensalism and pathogenicity, as discussed above. It should be noted that phage-encoded virulence factors do not exert their pathogenic roles independently. A recent study has shown that the expression of PVL leukotoxin induced global changes in transcriptional levels of genes encoding secreted and cell wall-anchored staphylococcal proteins that are located in the core chromosome (Labandeira-Rey, 2007).

The expression of staphylococcal virulence factors and cell surface adhesion proteins is largely regulated by two-component regulatory systems (agr, saeRS, srrAB, arlSR, and lytRS) and the SarA protein family (SarA, SarR, Rot, SarS, SarT, and SarU). agr is thought to be the most important locus governing growth-phase-dependent regulation of virulence factors (Novick, 1993; Novick, 2003). Through its effector, the P3 promoter transcript RNAIII, agr controls the up-regulation of genes encoding secreted proteins (α-toxin, β-hemolysin, TSST-1, and leukotoxins) and the down-regulation of genes encoding surface proteins (protein A, coagulase, and fibronectin binding protein). SarA activates agr via an SarA-agr promoter interaction (Chien & Cheung, 1998; Rechtin, 1999) during the postexponential phase and therefore alters the synthesis of virulence factors. SarA also regulates several cell wall-associated proteins and exoproteins directly in an agr-independent way (Chien, 1999).

Most of these studies on agr regulation were performed under in vitro conditions. Under in vivo conditions (i.e. in animal models of infection), expression of agr does not significantly affect the expression of virulence factors (Goerke, 2000, 2001; Heyer, 2002; Yarwood, 2002a), perhaps because the environment of an actual infection generates many signals that are not present in laboratory media. Furthermore, cell density may be an important source of different stimuli for S. aureus. In the laboratory, S. aureus is usually cultured under planktonic conditions, whereas during an infection it more often grows as a biofilm, because a biofilm can help it withstand stronger host defense responses and antibiotic stress. Microarray analysis has shown that processes involved in cell wall synthesis and other distinct physiologic activities of the cell play a crucial role in biofilm persistence (Resch, 2005).

The exact role of agr in the actual pathogenic process is still unknown, and there are many other phenomena that cannot be explained by current knowledge. For example, epidemic MRSA (EMRSA) is thought to retain a high level of toxin secretion, given its enhanced ability to colonize and infect. However, all EMRSA isolates tested are poor α-toxin producers, while the sporadic strain maintains a relatively high level of protein A (spa) transcripts during the exponential and postexponential growth phases (Sabersheikh & Saunders, 2004). Given the enormous difficulty of deciphering the complicated network of virulence genes, a compromise approach would be to look for associations between lineage and disease. Theoretically, strains from the same lineage have the same core variable genes and share a common pool of SaPIs and phages for genetic exchange. Therefore, they would potentially infect the same population of people and cause the same type of disease. However, it would be wrong to conclude that all strains are equally virulent. Even though each lineage may have a specific set of core variable genes, especially adhesin genes, which could determine the ability to adhere to epithelial cells of certain populations and thus the potential host population, many other prerequisites, such as a fast growth rate and strong survival ability, work together to determine whether a clone develops into a successful pathogen. Indeed, on the basis of epidemiologic investigations, some lineages deserve careful surveillance. For example, CC1 has emerged as the leading lineage with a strong association with community-acquired diseases.

Community-acquired MRSA (C-MRSA)

During the last two decades, staphylococci have shown a trend of increasing virulence. CoNS have generally been regarded as saprophytes or organisms with no or very low virulence. However, human infections due to CoNS, especially S. epidermidis, have been documented with increasing frequency. With respect to S. aureus, C-MRSA outbreaks have been reported worldwide. The extreme heterogeneity of the genetic background in C-MRSA indicates that any strain of S. aureus has the potential to become a C-MRSA. The reemergence of 80/81 isolates is the best example (Robinson, 2005b). This early MSSA clone waned in the 1960s when methicillin was introduced into clinical use. However, it did not really disappear, but probably took refuge in healthy people as a commensal strain. Upon acquisition of SCCmec IV, it reemerged as a pandemic clone. Given the large pool of MSSA strains, including those carried by healthy people, it is inevitable that a new C-MRSA outbreak will occur sooner or later.

To investigate the pathogenesis of C-MRSA, it is most important to distinguish C-MRSA from hospital-acquired MRSA (H-MRSA), which seems easy but is actually difficult. On the one hand, nosocomial colonization with MRSA usually goes undetected and may lead to infection many months after hospital discharge (when the patient is in the community). On the other hand, increased drug resistance in recent years has helped C-MRSA strains replace other MRSA strains in some hospitals (Seybold, 2006). A general definition now is that C-MRSA strains should be isolated in an outpatient setting or from patients within 48 h of hospital admission; such patients must have no history of MRSA infection and no history in the previous year of admission to a nursing home, hospitalization, dialysis, or surgery.
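The general definition above can be written as a simple rule. The sketch below is only an illustration of that rule: the record type and field names are hypothetical, and "within 48 h" is interpreted here as 48 h or less after admission, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IsolateRecord:
    """Hypothetical clinical metadata for one MRSA isolate."""
    outpatient: bool
    hours_since_admission: Optional[float]  # None if the patient was not admitted
    prior_mrsa_infection: bool
    nursing_home_past_year: bool
    hospitalization_past_year: bool
    dialysis_past_year: bool
    surgery_past_year: bool


def meets_cmrsa_definition(r: IsolateRecord) -> bool:
    """Apply the epidemiologic C-MRSA definition to one isolate record."""
    # Isolated in an outpatient setting, or within 48 h of admission.
    community_onset = r.outpatient or (
        r.hours_since_admission is not None and r.hours_since_admission <= 48
    )
    # Any health care exposure in the previous year excludes the isolate.
    healthcare_exposure = (
        r.nursing_home_past_year
        or r.hospitalization_past_year
        or r.dialysis_past_year
        or r.surgery_past_year
    )
    return community_onset and not r.prior_mrsa_infection and not healthcare_exposure


example = IsolateRecord(False, 24.0, False, False, False, False, False)
print(meets_cmrsa_definition(example))
```

Such a rule classifies isolates by patient history only; as noted above, it cannot detect undetected nosocomial colonization or C-MRSA strains circulating in hospitals.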

SCCmec IV and the PVL locus are now considered to be two characteristic features of C-MRSA strains. SCCmec IV is rarely seen in H-MRSA strains, but it is the dominant allotype in C-MRSA. C-MRSA is now thought to arise from horizontal transfer of SCCmec IV into MSSA. SCCmec IV was highly prevalent among S. epidermidis from the 1970s but was not found among MRSA isolates recovered during that time. The first S. aureus isolates carrying SCCmec IV were recovered in the early 1980s and then spread worldwide (Wisplinghoff, 2003). Robinson & Enright (2003) found that nearly one-half of conversions from MSSA clones to MRSA clones involved SCCmec IV. Within the past 2 years, SCCmec V has emerged with an intimate association with C-MRSA, especially in Australia and Taiwan (Ito, 2004; Boyle-Vavra, 2005; O'Brien, 2005). In contrast to SCCmec II and III, which confer multi-resistance, both SCCmec IV and V are small, and their small size is thought to impose a smaller metabolic burden of protein synthesis during replication. Previous studies have reported that C-MRSA strains grow significantly faster than H-MRSA strains (Baba, 2002; Okuma, 2002). This high growth rate may be a prerequisite for C-MRSA to achieve successful colonization in humans by outcompeting the numerous bacterial species in the environment.

PVL is a bi-component, synergohymenotropic toxin that exerts cytolytic pore-forming activity directed at the cell membranes of neutrophils, monocytes, and macrophages (Kaneko & Kamio, 2004). Clinically, PVL is associated with skin abscesses and necrotizing pneumonitis (Lina, 1999; Gillet, 2002). Although PVL genes are usually found in only 2% of S. aureus clinical isolates (Holmes, 2005), they have been found in most C-MRSA strains (Vandenesch, 2003). Within the two sequenced C-MRSA strains, MW2 and USA300, the PVL locus resides in prophage ΦSa2. However, Lindsay (2006) found that in a collection of lukS, F-PV-positive isolates, only one carried ΦSa2 genes, suggesting that the PVL locus can be horizontally transferred among a wide range of phages.

ST36:USA200 and ST30:USA1100 provide an ideal model for investigating the role of SCCmec IV and the PVL locus in the pathogenesis of C-MRSA infections. They both belong to CC30 and share very similar genetic backgrounds. The SCCmec II, PVL-negative ST36:USA200 strain was endemic in health care facilities during 1996–2000, while the SCCmec IV, PVL-positive ST30:USA1100 strain was epidemic in community populations during 1998–2001 (Binswanger, 2000).

Although the short SCCmec and PVL locus are important for C-MRSA based on results of epidemiologic analyses, no studies have proven their exact biologic roles in the pathogenesis of community-acquired infections. In fact, there are also C-MRSA strains that possess neither the short SCCmec nor the PVL locus. Moreover, there has been research suggesting that the PVL locus is not the major determinant of C-MRSA because the PVL-negative and -positive strains performed similarly during neutrophil lysis (Said-Salim, 2005) and were equally lethal in a sepsis model (Voyich, 2006).

Because the mechanism of C-MRSA infection is poorly understood, genomic approaches have been applied in some studies. Oligoarrays were used to detect chromosomal regions specific to C-MRSA (Koessler, 2006). Proteomic analyses of MW2 and LAC were also performed, indicating that exoproteins accounted in part for the success of C-MRSA (Burlak, 2007). Transcriptomic comparison between H-MRSA (COL and MRSA252) and C-MRSA (MW2 and MnCop) revealed that several putative membrane or exported proteins of unknown function were up-regulated only in the community strains (Voyich, 2005). However, the differences found between these representative strains may reflect differences between lineages, rather than real differences between C-MRSA and H-MRSA. A direct comparison of C-MRSA and H-MRSA strains of the same sequence type would generate more persuasive data.

Concluding remarks

Genetic exchange within S. aureus and between S. aureus and other staphylococcal species is one of the most important topics in research on the evolution and pathogenesis of S. aureus. A scarcity of recombination contributes to the highly clonal structure of S. aureus; lateral transfer of MGEs is controlled by some cryptic rules, resulting in mutual exclusion of toxin genes and preference for specific toxin genes within certain lineages. A better understanding of the underlying mechanism of genetic exchange would help one to reconstruct the history of lineage formation and to make some interesting predictions, for example, whether dangerous determinants such as vanA and the PVL locus will disseminate more widely, or whether the barriers to gene flow will make clonal complexes evolve into new biological species.

The short SCCmec and the presence of the PVL locus characterize C-MRSA, but no studies have demonstrated their exact biologic roles in C-MRSA infection. It is often thought that C-MRSA and H-MRSA belong to different lineages within a geographic area, but this is probably not the case. In Taiwan, ST59 accounts for nearly all C-MRSA infections and c. 20% of H-MRSA infections, and this proportion is still increasing. Thus, it is hypothesized that all lineages have the potential to develop into both C-MRSA and H-MRSA clones in the absence of competition from other lineages. Given that C-MRSA and H-MRSA can be isolated within the same lineage, it is likely that differences in virulence gene expression differentiate the two types of organisms. Likewise, commensal and pathogenic strains are the same organism in two different states rather than two different types of organisms. Conversion from commensal to pathogen must also be achieved by a shift of the global expression profile. agr and sarA are two global transcription regulators according to in vitro experiments, but their regulatory effects are dramatically weakened under in vivo conditions. The suspicion therefore arises that their roles in pathogenesis have been overestimated. Thus, a future challenge for researchers is to investigate the interaction between regulators and virulence genes in the pathogenesis of S. aureus.


The study of methicillin-resistant Staphylococcus aureus in the Department of Pediatrics, Chang Gung Memorial Hospital, Chang Gung University College of Medicine, was supported in part by grants 94-2321-B-182A-002 from National Science Council, Taiwan, and CMRPG33029 from Chang Gung Memorial Hospital, Taiwan.


  • Editor: Ramon Diaz Orejas

