Evolutionary Dynamics of Abundant 7-bp Satellites in the Genome of Drosophila virilis
The factors that drive the rapid changes in abundance of tandem arrays of highly repetitive sequences, known as satellite DNA, are not well understood. Drosophila virilis has one of the highest relative amounts of simple satellites of any organism studied, with an estimated >40% of its genome composed of a few related 7-bp satellites. Here, we use D. virilis as a model to understand technical biases affecting satellite sequencing and the evolutionary processes that drive satellite composition. By analyzing sequencing data from Illumina, PacBio, and Nanopore platforms, we identify platform-specific biases and suggest best practices for accurate characterization of satellites by sequencing. We use comparative genomics and cytogenetics to demonstrate that the highly abundant AAACTAC satellite family arose from a related satellite in the branch leading to the virilis phylad 4.5–11 Ma, before exploding in abundance in some species of the clade. The most abundant satellite is conserved in sequence and location in the pericentromeric region but has diverged widely in abundance among species, whereas the satellites nearest the centromere are rapidly turning over in sequence composition. By analyzing multiple strains of D. virilis, we find that the abundances of two centromere-proximal satellites are anticorrelated along a geographical gradient, which we suggest could be caused by ongoing conflicts at the centromere. In conclusion, we illuminate several key attributes of satellite evolutionary dynamics that we hypothesize to be driven by processes including selection, meiotic drive, and constraints on satellite sequence and abundance.