The pan-genome concept was first introduced by Tettelin et al. in 2005, in a study of bacteria. It refers to the complete repertoire of genomic information within a biological clade (e.g., a species) and comprises the core genome, which is shared by all individuals, and the dispensable genome, which is present in only some individuals or unique to a single one. The core genome is generally associated with fundamental biological functions and major phenotypic traits and reflects the stability of the species, whereas the dispensable genome is linked to environmental adaptation and distinctive biological characteristics and reflects its diversity.
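The core/dispensable split described above can be illustrated with plain set operations. The following is a minimal sketch, assuming each individual has already been reduced to a set of gene-family identifiers; the function name `partition_pan_genome` and the toy strain data are hypothetical and exist only for illustration.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    gene_sets: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, dispensable) gene-family sets.

    gene_sets maps an individual's name to the gene families found in it.
    """
    if not gene_sets:
        return set(), set(), set()
    # Pan-genome: every gene family seen in at least one individual.
    pan = set().union(*gene_sets.values())
    # Core genome: gene families present in all individuals.
    core = set.intersection(*gene_sets.values())
    # Dispensable genome: everything in the pan-genome that is not core.
    dispensable = pan - core
    return pan, core, dispensable


# Toy data (invented for this example, not real annotations).
toy = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g3", "g5"},
}
pan, core, dispensable = partition_pan_genome(toy)
print(sorted(core))         # ['g1']
print(sorted(dispensable))  # ['g2', 'g3', 'g4', 'g5']
```

In practice, gene families would come from clustering annotated genes across assemblies, and many studies relax the "all individuals" criterion with a frequency threshold; this sketch uses the strict definition given in the text.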
A pan-genome map is constructed with high-throughput sequencing and bioinformatics approaches: libraries are prepared and sequenced on third-generation long-read platforms for different species, subspecies, or strains, and each individual is then assembled separately. Based on this map, a graph-based variation database encompassing structural variations (SVs) and presence-absence variations (PAVs) can be built. This enriches the genetic information available for the species and supports the investigation of key biological questions.
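As a rough illustration of what a PAV table derived from such a map might look like, the sketch below builds a presence-absence matrix from per-assembly gene-family sets and lists the families that vary between assemblies. It assumes gene families have already been called for each assembly; real PAV detection typically relies on alignment or coverage evidence, which is not modeled here. The function names `build_pav_matrix` and `variable_families` and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_pav_matrix(
    gene_sets: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (samples, matrix), where matrix[family] is a 0/1 row per sample."""
    samples = sorted(gene_sets)
    families = sorted(set().union(*gene_sets.values())) if gene_sets else []
    matrix = {
        fam: [1 if fam in gene_sets[s] else 0 for s in samples]
        for fam in families
    }
    return samples, matrix


def variable_families(matrix: Dict[str, List[int]]) -> List[str]:
    """Families present in at least one sample but absent from at least one."""
    return [fam for fam, row in matrix.items() if 0 < sum(row) < len(row)]


# Toy input (invented for this example).
assemblies = {
    "asm_1": {"famA", "famB"},
    "asm_2": {"famA", "famC"},
    "asm_3": {"famA", "famB", "famC"},
}
samples, pav = build_pav_matrix(assemblies)
print(samples)                 # ['asm_1', 'asm_2', 'asm_3']
print(pav["famB"])             # [1, 0, 1]
print(variable_families(pav))  # ['famB', 'famC']
```

Rows that are all ones correspond to core families, while the variable rows are the PAVs that a downstream variation database would record.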
Technical Workflow
Applications
1. Species Origin and Evolution
2. Gene Resources for Key Traits and Scientific Breeding
3. Adaptive Evolution
4. Invasiveness of Alien Species