1. What is the difference between metagenomics and microbial diversity analysis? How should I choose?
1) Experimental Differences: Microbial diversity analysis amplifies a marker gene by PCR, typically the small-subunit ribosomal RNA gene (such as 16S or 18S rDNA), and sequences the resulting amplicons. Metagenomic sequencing needs no targeted amplification: the total extracted DNA is fragmented and used directly for library preparation.
2) Analytical Differences: Microbial diversity sequencing relies on OTU (Operational Taxonomic Unit) clustering and is limited to species classification and relative abundance. Metagenomic sequencing involves sequence assembly and evaluation, which allows in-depth analysis at the species, gene, and functional levels. In short, microbial diversity analysis tells us who is in the environment, while metagenomics tells us what those microbes are capable of doing. Choose diversity analysis when community composition is the only question, and metagenomics when gene content or functional potential also matters.
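As an illustration of the relative abundance that diversity analysis reports, the following minimal sketch assumes each read has already been assigned an OTU label by an upstream clustering step. The function name `relative_abundance` and the labels are hypothetical and the data are illustrative, not real results.

```python
from collections import Counter


def relative_abundance(otu_assignments):
    """Return {otu_label: fraction_of_reads} from a list of per-read OTU labels."""
    counts = Counter(otu_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}  # no reads, so no profile
    return {otu: n / total for otu, n in counts.items()}


# Hypothetical per-read OTU labels (illustrative only).
reads = ["OTU_1", "OTU_1", "OTU_2", "OTU_1", "OTU_3", "OTU_2", "OTU_1", "OTU_1"]
profile = relative_abundance(reads)
print(profile["OTU_1"])  # 5 of 8 reads -> 0.625
```

A profile like this describes which taxa are present and in what proportion, but says nothing about their genes or functions, which is the gap metagenomics fills.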
2. What is assembly, and why is it necessary?
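After metagenomic sequences are obtained, quality control (QC) is performed first to remove adapters and low-quality reads. The cleaned reads then undergo assembly. Assembly is the process of stitching overlapping short sequencing reads into longer contiguous sequences (contigs), which can be further linked into scaffolds. It is necessary because a single short read is usually too short to cover a complete gene, so longer sequences are needed for reliable gene prediction and functional annotation.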
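The idea of stitching reads together can be sketched with a toy greedy overlap merger. This is only an illustration under simplifying assumptions (error-free reads, a single strand, tiny input). Production metagenome assemblers typically use de Bruijn graphs rather than this approach, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest suffix-prefix overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return seqs  # nothing left to merge: these are the contigs
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)


# Three illustrative 7-base reads that overlap by 4 bases each.
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)  # ['ATGGCGTACCTGA']
```

Three 7-base reads become one 13-base contig. At real scale, the same principle turns reads of a few hundred bases into contigs long enough to contain whole genes.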
3. How can host contamination be excluded in metagenomic projects?
1) During sampling: Avoid collecting samples from areas close to host tissues.
2) During extraction: Use an extraction kit designed to minimize host DNA.
3) During analysis: If a reference genome of the host is available, align the reads against it and discard those that map to the host.
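The analysis-stage filtering can be sketched as follows. Note that this toy version substitutes exact k-mer matching for true read alignment, which real pipelines perform with a dedicated aligner. It also ignores the reverse strand and sequencing errors. The function names, the `k` and `max_host_fraction` parameters, and the sequences are all hypothetical.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_host_reads(reads, host_reference, k=8, max_host_fraction=0.5):
    """Keep reads whose share of k-mers found in the host reference is below the cutoff."""
    host_index = kmers(host_reference, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            kept.append(read)  # too short to judge; keep rather than guess
            continue
        shared = len(read_kmers & host_index) / len(read_kmers)
        if shared < max_host_fraction:
            kept.append(read)
    return kept


# Illustrative sequences only, not real genomes.
host_reference = "ACGTACGTTAGCCGATAGGCTTAACCGGTT"
host_read = host_reference[5:20]            # a read drawn from the host
microbial_read = "TTTTGGGGCCCCAAAATTTGGG"   # shares no 8-mers with the host
filtered = remove_host_reads([host_read, microbial_read], host_reference)
print(filtered)  # ['TTTTGGGGCCCCAAAATTTGGG']
```

The sketch also shows why a reference is required: without `host_reference` there is nothing to compare the reads against.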
4. If there is significant host contamination and no host reference genome is available, can metagenomic sequencing still be performed?
It is generally not recommended. If host DNA makes up a large share of the sample, most of the sequencing reads will come from the host rather than from the microbial community. Without a known host genome to align against, these reads cannot be filtered out, so the accuracy of the downstream analysis will be compromised and the amount of usable data will be very limited. However, if host DNA is minimal after extraction, its effect on the subsequent data analysis can be considered negligible.
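How quickly host DNA erodes the usable data can be seen with simple arithmetic. The sketch below assumes the host fraction of reads is known or can be estimated. The function names and the numbers are hypothetical, chosen only to illustrate the calculation.

```python
def usable_gigabases(total_gb, host_fraction):
    """Non-host data remaining when host_fraction of the output is host-derived."""
    if not 0.0 <= host_fraction <= 1.0:
        raise ValueError("host_fraction must be between 0 and 1")
    return total_gb * (1.0 - host_fraction)


def required_total_gigabases(target_microbial_gb, host_fraction):
    """Total output needed to end up with the target amount of non-host data."""
    if not 0.0 <= host_fraction < 1.0:
        raise ValueError("host_fraction must be at least 0 and below 1")
    return target_microbial_gb / (1.0 - host_fraction)


# Hypothetical run: 10 Gb of output with 75% host reads.
print(usable_gigabases(10, 0.75))         # 2.5
# Hypothetical goal: 6 Gb of microbial data at the same host fraction.
print(required_total_gigabases(6, 0.75))  # 24.0
```

Without a host reference, the host-derived portion cannot even be separated from the rest, so it stays mixed into the analysis instead of merely being discarded.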