\(^1\)Senior AI solutions developer,
Parexel
\(^2\)Associate Professor,
The University of Verona
The fundamental principle of amplicon sequencing for microbial community analysis is to target a specific, short, and conserved genetic region that is present in most (or all) members of the microbial group of interest (e.g., bacteria, fungi, archaea). This region also needs to contain enough variation to distinguish different species or even strains.
For bacteria and archaea, the most common target is the 16S ribosomal RNA (rRNA) gene. For fungi, it’s often the Internal Transcribed Spacer (ITS) region. These genes are chosen because:
The Illumina MiSeq system has been, and largely remains, the most common and widely used platform for amplicon sequencing in microbial community analysis, particularly for 16S rRNA gene sequencing:
Ease of Use: Method is relatively easy to perform with well-established protocols.
Cost-Effectiveness: Highly cost-effective for multiplexed targeted sequencing applications (e.g., 16S rRNA gene sequencing) due to high throughput.
High Throughput for Targeted Sequencing: Capable of generating a large number of reads suitable for analyzing many samples simultaneously in amplicon-based studies.
Straightforward Analysis for Standard Applications: Well-developed bioinformatics pipelines and tools are available for common applications like 16S amplicon analysis.
High Accuracy (Low Raw Error Rate): Illumina chemistry generally provides a low raw sequencing error rate.
Robust Ecosystem and Community Support: Large user base, extensive documentation, and readily available bioinformatics tools and resources.
Relatively Fast Turnaround: Offers quicker run times compared to higher-throughput sequencers, suitable for moderate-sized projects.
Limited Resolution for Taxa Identification: In amplicon sequencing, taxonomic assignment is based on a short, single sequence region, limiting resolution, especially at the species or strain level. This is true for amplicon sequencing in general.
Inherent Biases in PCR-Based Workflows:
Impact of Sequencing Errors: While low, sequencing errors are not perfectly constant and can still lead to overestimation of diversity if not properly mitigated by denoising algorithms.
Assumptions Regarding Quantitative Accuracy:
Limited Read Length: Max read lengths (2x300 bp)
Consumable Costs (when analyzing few samples): The cost of a MiSeq reagent kit can be substantial for very low-throughput projects, making it less economical in such scenarios.
Requires Bioinformatics Expertise: Despite straightforward pipelines, interpreting complex biological data and troubleshooting still demands significant bioinformatics knowledge.
Sample Collection
DNA Extraction: This step is critical as it influences the diversity and quantity of DNA obtained and can be source-dependent (e.g., specific kits for soil vs. stool).
Quality Control: Assess DNA quantity (e.g., Qubit, NanoDrop) and integrity (e.g., gel electrophoresis) to ensure sufficient and high-quality input for downstream steps.
Primer Design/Selection: Choose appropriate 16S rRNA gene region-specific primers (e.g., V3-V4, V4, V1-V2) that are suitable for the target microbial community and sequencing platform.
PCR Amplification (First Round): Perform PCR to amplify the target 16S rRNA gene region from the extracted DNA. This step uses primers that bind to conserved regions flanking the variable region of interest.
PCR Product Cleanup: Purify the PCR products to remove unligated primers, nucleotides, and enzymes. This is often done using magnetic beads (e.g., AMPure XP) or gel extraction.
Quality Control: Verify amplicon size and presence using gel electrophoresis or a bioanalyzer/TapeStation. Quantify purified PCR products.
Index PCR (Second Round PCR): Second PCR step is performed using primers that incorporate:
Library Cleanup: Purify the indexed PCR products again to remove excess primers, primer-dimers, and short fragments.
Library Quantification and Quality Control:
Library Pooling: Based on the quantification results, pool equimolar amounts of the barcoded libraries. This ensures an even representation of samples on the sequencing run.
Denaturation and Dilution: Denature the pooled libraries into single strands and dilute them to the appropriate loading concentration for the MiSeq flow cell.
Sequencing Run: Initiate the sequencing run on the MiSeq instrument. The instrument performs bridge amplification on the flow cell, followed by sequencing-by-synthesis, generating millions of reads.
Raw Data Generation: The MiSeq instrument generates raw sequencing data in BCL (Base Call) files.
BCL to FASTQ Conversion (Demultiplexing): Illumina’s instrument control software (or a separate utility like bcl2fastq/dragen) processes the BCL files. This step performs base calling which converts the fluorescent signals into base calls (A, T, C, G).
Quality Scoring: Assigns a quality score (Phred score) to each base call.
Demultiplexing: Reads the unique index sequences for each cluster and separates the reads originating from different samples into individual files.
FASTQ File Output: The final output of the sequencing run for each sample consists of FASTQ files. Each FASTQ file contains the raw sequence reads and their corresponding quality scores. Typically, you’ll get two FASTQ files per sample (forward and reverse reads for paired-end sequencing).
@M03461:328:000000000-JLTJF:1:1101:8238:1893 1:N:0:38
TACGGGAGGCAGCAGTGGGGAATATTTGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGTGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTTAGGAAGATAAT
+
AACCCE7@CFGGEDGG<8,+--C,CF,;,@<-C9--@+++++;;E,,CCE9,6@,C8<C@B+B,,,<,,,,,,,,69EEF9,,BB+B,,6,6:CEFFF9,C,,,4,,,,C,5B
library(ggplot2)
# 1. Define a range of probabilities of incorrect base calls (P)
# We'll use a sequence from a very small number (close to 0) to 1.
# Using a log-scale for P to get a good distribution for Phred scores.
p_values <- c(10^seq(-4, 0, length.out = 100)) # From 0.0001 to 1
# 2. Calculate the corresponding Phred scores (Q)
# For P=0, Q would theoretically be infinity, but practically Phred scores are finite.
q_values <- -10 * log10(p_values)
# 3. Calculate the corresponding probabilities of correct base calls (1-P)
p_correct <- 1 - p_values
# 4. Create a data frame with these values
phred_data <- data.frame(
p_error = p_values,
p_correct = p_correct,
phred_score = q_values
)
# 5. Use ggplot2 to plot Phred score against the probability of incorrect base call
ggplot(phred_data, aes(x = p_error, y = phred_score)) +
# Add a line plot
geom_line(color = "steelblue", size = 1.2) +
labs(
title = "Dependency of Phred Score on Probability of Error",
x = "Probability of Incorrect Base Cal",
y = "Phred Score"
) +
scale_x_continuous(
limits = c(0, 1),
breaks = c(0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0),
labels = scales::percent_format(accuracy = 0.01) # Format x-axis as percentages
) +
scale_y_continuous(
limits = c(0, 40), # Common range for Phred scores
breaks = seq(0, 40, by = 5)
) +
theme_minimal() + # Use a minimal theme for a clean look
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
axis.title.x = element_text(size = 12, margin = margin(t = 10)),
axis.title.y = element_text(size = 12, margin = margin(r = 10)),
axis.text = element_text(size = 10),
panel.grid.major = element_line(color = "grey90", linetype = "dashed"),
panel.grid.minor = element_blank() # Remove minor grid lines
) +
# Add annotations for key Phred scores, now based on P (probability of error)
annotate("text", x = 0.1, y = 10, label = "Q=10 (P=0.1)", hjust = 0, vjust = -0.5, size = 3, color = "darkgreen") +
annotate("text", x = 0.01, y = 20, label = "Q=20 (P=0.01)", hjust = 0, vjust = -0.5, size = 3, color = "darkgreen") +
annotate("text", x = 0.001, y = 30, label = "Q=30 (P=0.001)", hjust = 0, vjust = -0.5, size = 3, color = "darkgreen") +
annotate("text", x = 0.0001, y = 40, label = "Q=40 (P=0.0001)", hjust = 0, vjust = -0.5, size = 3, color = "darkgreen")
ascii_data <- data.frame(
Decimal = 0:127,
Character = sapply(0:127, function(i) {
char_val <- if (i >= 0 && i <= 31) {
# Common control characters
# Using switch for readability for control characters
switch(i + 1, # R indexing starts at 1
"NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
"BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
"DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
"CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US"
)
} else if (i == 127) {
"DEL" # Delete character
} else {
# Printable ASCII characters
intToUtf8(i)
}
# Pad all characters to a consistent width (e.g., 5 characters) for alignment
# This ensures columns line up nicely when printed side-by-side
sprintf("%-5s", char_val)
}),
stringsAsFactors = FALSE # Prevent automatic conversion to factors
)
section_size <- 32
# split the data into 4 sections so we can show side by side
df_list <- lapply(1:4, function(k) {
start_row <- (k - 1) * section_size + 1
end_row <- k * section_size
section_df <- ascii_data[start_row:end_row, ]
return(section_df)
})
# Combine the sections side-by-side into a single wide data frame
combined_df <- do.call(cbind, df_list)
knitr::kable(combined_df, row.names = FALSE)
Decimal | Character | Decimal | Character | Decimal | Character | Decimal | Character |
---|---|---|---|---|---|---|---|
0 | NUL | 32 | 64 | @ | 96 | ` | |
1 | SOH | 33 | ! | 65 | A | 97 | a |
2 | STX | 34 | ” | 66 | B | 98 | b |
3 | ETX | 35 | # | 67 | C | 99 | c |
4 | EOT | 36 | $ | 68 | D | 100 | d |
5 | ENQ | 37 | % | 69 | E | 101 | e |
6 | ACK | 38 | & | 70 | F | 102 | f |
7 | BEL | 39 | ’ | 71 | G | 103 | g |
8 | BS | 40 | ( | 72 | H | 104 | h |
9 | HT | 41 | ) | 73 | I | 105 | i |
10 | LF | 42 | * | 74 | J | 106 | j |
11 | VT | 43 | + | 75 | K | 107 | k |
12 | FF | 44 | , | 76 | L | 108 | l |
13 | CR | 45 | - | 77 | M | 109 | m |
14 | SO | 46 | . | 78 | N | 110 | n |
15 | SI | 47 | / | 79 | O | 111 | o |
16 | DLE | 48 | 0 | 80 | P | 112 | p |
17 | DC1 | 49 | 1 | 81 | Q | 113 | q |
18 | DC2 | 50 | 2 | 82 | R | 114 | r |
19 | DC3 | 51 | 3 | 83 | S | 115 | s |
20 | DC4 | 52 | 4 | 84 | T | 116 | t |
21 | NAK | 53 | 5 | 85 | U | 117 | u |
22 | SYN | 54 | 6 | 86 | V | 118 | v |
23 | ETB | 55 | 7 | 87 | W | 119 | w |
24 | CAN | 56 | 8 | 88 | X | 120 | x |
25 | EM | 57 | 9 | 89 | Y | 121 | y |
26 | SUB | 58 | : | 90 | Z | 122 | z |
27 | ESC | 59 | ; | 91 | [ | 123 | { |
28 | FS | 60 | < | 92 | \ | 124 | | |
29 | GS | 61 | = | 93 | ] | 125 | } |
30 | RS | 62 | > | 94 | ^ | 126 | ~ |
31 | US | 63 | ? | 95 | _ | 127 | DEL |
In microbial community analysis using amplicon sequencing (like 16S rRNA gene sequencing), there are two primary ways to categorize and count the different “types” of microbes present in a sample:
In this workshop we will be using ASV estimation as implemented in
the dada2
(Callahan et al. 2016)
R package.
Why dada2?
We will be working with a subset of the PRJNA616141 BioProject Containing sequences of two Drosophila species: D. melanogaster and D. subobscura subject to lead poisoning.
On Windows:
On linux:
IDE:
library(tidyverse)
library(dada2)
library(phyloseq)
library(phangorn)
library(ggrepel)
library(DECIPHER)
How to install?
cran_packages <- c("tidyverse", "vegan", "cowplot", "ggrepel", "BiocManager")
install.packages(cran_packages )
bioconductor_packages <- c("dada2", "phyloseq", "DESeq2", "DECIPHER")
BiocManager::install(bioconductor_packages)
required non R software:
cutadapt
. Installation instructions: https://cutadapt.readthedocs.io/en/stable/installation.htmlreads.in <- list.files("fastq/raw", full.names = TRUE)
fwd.reads.in <- reads.in[grepl("_1.fq.gz", reads.in)]
rev.reads.in <- reads.in[grepl("_2.fq.gz", reads.in)]
fwd_qual <- plotQualityProfile(fwd.reads.in)
rev_qual <- plotQualityProfile(rev.reads.in)
plot_grid(fwd_qual + facet_wrap(~file, ncol = 4),
rev_qual + facet_wrap(~file, ncol = 4),
ncol = 1)
cutadapt
tool)Why remove primers?
How to remove primers?
cutadapt
fwd_primer <- "TACGGGAGGCAGCAG"
rev_primer <- "CCAGGGTATCTAATCC"
fwd_fastq_file <- fwd.reads.in[1]
rev_fastq_file <- rev.reads.in[1]
reads_out_folder = "fastq/cutadapt"
cutadapt_call <- cutadapt_primers <- paste("-a ^",
fwd_primer,
"...",
as.character(Biostrings::reverseComplement(Biostrings::DNAString(rev_primer))),
" -A ^",
rev_primer,
"...",
as.character(Biostrings::reverseComplement(Biostrings::DNAString(fwd_primer))), sep = "")
fwd_cutadapt_out <- file.path(reads_out_folder, basename(fwd.read))
rev_cutadapt_out <- file.path(reads_out_folder, basename(rev.read))
cutadapt_call <- paste("cutadapt ",
cutadapt_primers,
" --discard-untrimmed -o ",
fwd_cutadapt_out,
" -p ",
rev_cutadapt_out,
" ",
fwd.read,
" ",
rev.read,
sep = "")
cutadapt_call
#^TACGGGAGGCAGCAG...GGATTAGATACCCTGG: linked adapter
#-a: sequence of a 5’ adapter
#^: anchored - expected to occur in full length at the beginning of the read
#...: any number of bases
#left adapter is anchored and required, right adapter is not anchored ($) so its is optional, will be removed if found
#--discard-untrimmed: if adapter not found remove read
Utility function to call cutadapt
from R
(cutadapt
should be visible by R - on system path):
cutadapt <- function(fwd.reads.in,
rev.reads.in,
reads.out,
fwd_primer,
rev_primer){
if(missing(fwd.reads.in)){
stop("fwd.reads.in must be specified")
}
if(missing(rev.reads.in)){
stop("rev.reads.in must be specified")
}
if(missing(reads.out)){
stop("reads.out must be specified")
}
if(missing(fwd_primer)){
stop("fwd_primer must be specified")
}
if(missing(rev_primer)){
stop("rev_primer must be specified")
}
cutadapt_primers <- paste("-a ^",
fwd_primer,
"...",
as.character(Biostrings::reverseComplement(Biostrings::DNAString(rev_primer))),
" -A ^",
rev_primer,
"...",
as.character(Biostrings::reverseComplement(Biostrings::DNAString(fwd_primer))), sep = "")
dir.create(reads.out)
for(i in seq_along(fwd.reads.in)){
fwd.read <- fwd.reads.in[i]
rev.read <- rev.reads.in[i]
fwd.cutadapt <- file.path(reads.out, basename(fwd.read))
rev.cutadapt <- file.path(reads.out, basename(rev.read))
cutadapt_call <- paste("cutadapt ",
cutadapt_primers,
" --discard-untrimmed -o ",
fwd.cutadapt,
" -p ",
rev.cutadapt,
" ",
fwd.read,
" ",
rev.read,
sep = ""
)
message(cutadapt_call)
cut_primers <- system(cutadapt_call, intern = TRUE)
fileConn <- file(paste(fwd.cutadapt, "_stats.txt", sep = ""))
writeLines(cut_primers, fileConn)
close(fileConn)
}
}
# usage:
cutadapt(fwd.reads.in = fwd.reads.in,
rev.reads.in = rev.reads.in ,
reads.out = "fastq/cutadapt",
fwd_primer = "TACGGGAGGCAGCAG",
rev_primer = "CCAGGGTATCTAATCC")
filterAndTrim()
function)reads.in <- list.files("fastq/cutadapt", full.names = TRUE)
fnFs_trim <- reads.in[grepl("_1.fq.gz$", reads.in)]
fnRs_trim <- reads.in[grepl("_2.fq.gz$", reads.in)]
filtFs <- file.path("fastq", "filtered", basename(fnFs_trim))
filtRs <- file.path("fastq", "filtered", basename(fnRs_trim))
filter_and_trim <- filterAndTrim(fnFs_trim,
filtFs,
fnRs_trim,
filtRs,
truncLen = c(270, 200),
maxN = 0,
maxEE = c(2, 2), #After truncation, reads with higher than maxEE "expected errors" will be discarded. Expected errors are calculated from the nominal definition of the quality score: EE = sum(10^(-Q/10))
minLen = 150,
rm.phix = TRUE, #discard reads that match against the phiX genome (internal control for sequencing)
compress = TRUE,
multithread = FALSE,
matchIDs = TRUE)
If you do not want to use cutadapt but rather to left trim the sequences to remove primers:
reads.in <- list.files("fastq/raw", full.names = TRUE)
fnFs_trim <- reads.in[grepl("_1.fq.gz$", reads.in)]
fnRs_trim <- reads.in[grepl("_2.fq.gz$", reads.in)]
filtFs <- file.path("fastq", "filtered", basename(fnFs_trim))
filtRs <- file.path("fastq", "filtered", basename(fnRs_trim))
out <- filterAndTrim(fnFs_trim,
filtFs,
fnRs_trim,
filtRs,
trimLeft = 20,
truncLen = c(270, 200),
maxN = 0,
maxEE = c(2, 2), #
minLen = 150,
rm.phix = TRUE,
compress = TRUE,
multithread = FALSE,
matchIDs = TRUE)
Inspect reads after filtering:
fwd_qual <- plotQualityProfile(filtFs)
rev_qual <- plotQualityProfile(filtRs)
plot_grid(fwd_qual + facet_wrap(~file, ncol = 4),
rev_qual + facet_wrap(~file, ncol = 4),
ncol = 1)
learnErrors()
function) and denoising (The
dada()
function) with selfConsist = TRUE
Components:
DADA2 learns the specific error rates (e.g., A→T, G→C errors) from your own sequencing data using a parametric error model, which is more accurate than using a universal error model.
The error model is a set of nucleotide-transition probability tables. Specifically, for each possible base transition (e.g., A to C, G to T, C to A, etc.), and for each Phred quality score, DADA2 estimates the probability of that error occurring.
The error model assumes that errors at different nucleotide positions are independent, and that the probability of a specific base transition at a given quality score is generally independent of the surrounding sequence context.
The Phred quality score (\(Q\)) is a crucial component, as it represents the estimated probability of an incorrect base call (\(P\)):
\[ Q = -10 \log_{10}(P) \]
Conversely, the probability of an error (\(P\)) can be derived from the quality score (\(Q\)):
\[ P = 10^{-Q/10} \]
DADA2 utilizes these quality scores to estimate the probability of specific nucleotide transitions. We define this as \(p(B_{actual} \to B_{observed} | Q)\), the probability that a true base \(B_{actual}\) is sequenced as \(B_{observed}\) given a Phred quality score \(Q\).
DADA2 constructs an error rate matrix, or set of matrices \(M\). For specific bases and quality scores, \(M_{B_{actual}, B_{observed}, Q}\) represents the estimated probability (or rate) of a true base \(B_{actual}\) being miscalled as \(B_{observed}\) when the sequencer reports quality score \(Q\).
The observed error rates are often noisy. To obtain a generalized model, DADA2 uses LOESS (Locally Estimated Scatterplot Smoothing) regression. LOESS is applied to fit a smooth curve for each possible base transition (e.g., A \(\to\) C, A \(\to\) G, etc.) as a function of the quality score. This smooth curve represents the refined estimated probability \(p(B_{actual} \to B_{observed} | Q)\).
The probability that a specific observed read \(i\) was generated from a true biological sequence \(j\) is modeled as the product of the position-specific transition probabilities. This probability, denoted \(\lambda_{j \to i}\) assumes that errors at different nucleotide positions are independent.
\[ \lambda_{j \to i} = \prod_{l=1}^{L} p(j(l) \to i(l) | q_i(l)) \]
Where:
\(\lambda_{j \to i}\) is the overall per-read probability that true sequence \(j\) would produce observed sequence \(i\).
\(L\) is the length of the aligned sequences.
\(j(l)\) is the nucleotide at position \(l\) in the true sequence \(j\)
\(i(l)\) is the nucleotide at position \(l\) in the observed read \(i\).
\(q_i(l)\) is the Phred quality score of the base at position \(l\) in read \(i\).
\(p(j(l) \to i(l) | q_i(l)))\) is the specific probability of a true nucleotide \(j(l)\) being read as \(i(l)\), given the quality score \(q_i(l)\) at that position.
The formula multiplies (\(\prod\)) the probabilities for each nucleotide position \(l\) up to the aligned sequence length \(L\).
Expected Number of Error Reads
This calculates the expected number of times you’d see read \(i\) if it were purely the result of errors from the more abundant sequence \(j\).
\[ \mathbf{\Lambda_{ji}} = n_j \cdot \lambda_{j \to i} \] Where:
\(\mathbf{\Lambda_{ji}}\) is the expected count of error reads of sequence \(i\) generated from sequence \(j\).
\(n_j\) is the total number of reads (abundance) for the true sequence \(j\).
\(\lambda_{j \to i}\) is the per-read error probability calculated in the first formula.
To decide if a unique sequence is a true variant or just an error, DADA2 calculates an abundance p-value. It quantifies the likelihood that the observed abundance of unique sequence \(i\) (\(a_i\)) is too high to be explained purely by sequencing errors from a higher-abundance sequence \(j\). It is defined as:
\[ p_A(j \to i) = \frac{P(X \ge a_i \mid X \sim \text{Poisson}(\mathbf{\Lambda_{ji}}))}{1 - P(X=0 \mid X \sim \text{Poisson}(\mathbf{\Lambda_{ji}}))} \] Where:
\(a_i\): The observed abundance of unique sequence \(i\).
\(X\): A random variable representing the number of times sequence \(i\) would be observed if it were purely an error from sequence \(j\).
\(\text{Poisson}(\mathbf{\Lambda_{ji}})\): A Poisson distribution with an expected value equal to \(\mathbf{\Lambda_{ji}}\).
The numerator \(P(X \ge a_i \mid X \sim \text{Poisson}(\mathbf{\Lambda_{ji}}))\) is the probability of observing \(a_i\) or more reads of \(X\).
The denominator \(1 - P(X=0 \mid X \sim \text{Poisson}(\mathbf{\Lambda_{ji}}))\) is the probability of observing at least one read of \(X\), conditioning the p-value on its observation.
The p-value (represented by the sum in the numerator) calculates the probability of observing \(k\) reads, from the actual observed abundance \(a_i\) up to infinity. A low p-value suggests the sequence is a true variant (ASV).
If this p-value is smaller than a defined threshold (\(\Omega_A\)), the null hypothesis is rejected, and the sequence \(i\) is considered a true Amplicon Sequence Variant (ASV).
To sum up:
We have some probability of getting an observed sequence from a true sequence just due to sequencing errors If the amount of the observed sequence is higher than we would expect just based on the errors we declare it as a true sequence.
But how to know which are the true sequences from the pool of sequenced data in order to calculate the above?
The self consistent partitioning model
The selfConsist
argument enables the simultaneous
inference of the error model and the true sample sequences. When
selfConsist
= TRUE, DADA2 performs an iterative
process:
Initialization: The algorithm starts with an initial guess for the error rates. This is typically a “maximum possible” error rate matrix derived from the data, assuming all reads are errors apart from a single, most abundant sequence.
Sample Inference (Partitioning): Using the current error model, the algorithm partitions the dereplicated sequences into groups that are likely to have originated from the same true sequence (later we go into more details).
Error Rate Re-estimation: A new error model is estimated from the partitioned sequences. This is done by comparing the observed sequences to the inferred “true” sequences within each partition.
Convergence Check: The algorithm checks if the error rates have converged. If not, it repeats steps 2 and 3 with the updated error model. This iterative process continues until the error rates stabilize, resulting in a jointly consistent solution for the error model and the inferred sample composition.
The Partitioning Process
Dereplicate sequences to generate unique sequence + abundances.
Start with one partition and declare the most abundant sequence a true ASV.
Bud: Searches through all sequences in all current partitions to find the single sequence that is the best candidate to become a new ASV. This “best candidate” is the one with the lowest abundance p-value (after passing other checks like minimum abundance and fold-change). If this lowest p-value is significant (i.e., less than the threshold \(\Omega_A\)after correction), that sequence “buds” off to form a new partition/cluster.
Shuffle: After a new ASV is found (or even after an iteration where no new ASV is found), re-evaluate the partition assignment for every non-center sequence. Each sequence is moved to the partition/cluster for which its is most likely to originate from ensuring all sequences are in their most likely partition based on the current set of identified ASVs.
For any given unique sequence (\(seq_i\)), the algorithm compares it against every established ASV partition (\(ASV_j\), \(ASV_k\), etc.). For each comparison, it calculates the expected number of times \(seq_i\) would appear as an error from that ASV.
\[ E_{ji} = n_j \cdot \lambda_{ji} \]
Where:
Continue this “Bud & Shuffle” loop until no more sequences can be significantly separated into new ASVs.
learnErrors()
function)errF <- learnErrors(filtFs,
multithread = TRUE,
nbases = 1e+10,
MAX_CONSIST = 20,
verbose = TRUE)
dada2::plotErrors(errF)
errR <- learnErrors(filtRs,
multithread = TRUE,
nbases = 1e+10,
MAX_CONSIST = 20,
verbose = TRUE)
dada2::plotErrors(errR)
derepFastq()
function)DADA2 groups identical reads together. Instead of processing millions of individual reads, it works with a much smaller set of unique sequences and a count of how many times each sequence was observed. This significantly reduces computation time.
derepFs <- derepFastq(filtFs, verbose = TRUE)
derepRs <- derepFastq(filtRs, verbose = TRUE)
dada()
function)dadaFs <- dada(derepFs,
err = errF,
multithread = TRUE,
selfConsist = TRUE)
dadaRs <- dada(derepRs,
err = errR,
multithread = TRUE,
selfConsist = TRUE)
The selfConsist = TRUE
iterative process, performed
independently for each sample (or on pooled sequences), allows DADA2 to
robustly infer the true ASVs present and their abundances, effectively
stripping away sequencing noise. The reliance on the learned error model
and statistical inference makes it highly sensitive to single-nucleotide
differences while being robust against typical sequencing errors.
mergePairs()
)Merge the dereplicated sequences rejecting with a sufficient overlap
(minOverlap
). The overlap needs to be perfect by default
(maxMismatch
). trimOverhang
is recommended if
not used cutadapt
in paired end mode.
mergers <- mergePairs(dadaFs,
derepFs,
dadaRs,
derepRs,
verbose = TRUE,
minOverlap = 20)
makeSequenceTable()
)Result is a matrix of sequences as rows and samples in columns, each
cell is a count. collapseNoMismatch
will collapse sequences
that are identical up to shifts or length variation. The representative
will the most abundant sequence.
seqtab <- makeSequenceTable(mergers)
seqtab_collapse <- collapseNoMismatch(seqtab)
removeBimeraDenovo()
)A bimera is a two-parent chimera, in which the left side is made up of one parent sequence, and the right-side made up of a second parent sequence.
For a given sequence being tested (the “query”) in a specific sample:
minFoldParentOverAbundance
and
minParentAbundance
parameters (isBimeraDenovo
function - can be passed from removeBimeraDenovo
).max_left
).max_right
).(max_left + max_right >= query_sequence_length)
.seqtab_nochim <- removeBimeraDenovo(seqtab_collapse,
multithread = FALSE,
verbose = TRUE)
t(seqtab_nochim) %>%
as.data.frame() %>%
rownames_to_column("seq") %>%
mutate(length = nchar(seq),
reads = rowSums(.[2:ncol(.)])) %>%
group_by(length) %>%
summarise(reads = sum(reads)) %>%
ggplot()+
geom_line(aes(x = length, y = reads)) +
theme_classic() +
geom_vline(xintercept = 400, color = "grey50", lty = 2)+
geom_vline(xintercept = 428, color = "grey50", lty = 2)
V3V4 has a bimodal length distribution..
Removal of unexpectedly short sequences:
seqtab_length <- seqtab_nochim[,nchar(colnames(seqtab_nochim)) %in% seq(400,428)]
Track of sequences:
input_count <- ShortRead::countFastq("fastq/raw")
input_count <- input_count %>%
rownames_to_column("sample") %>%
filter(grepl("_1\\.fq\\.gz$", sample))
cutadapt_count <- ShortRead::countFastq("fastq/cutadapt")
cutadapt_count <- cutadapt_count %>%
rownames_to_column("sample") %>%
filter(grepl("_1\\.fq\\.gz$", sample))
filtered_count <- ShortRead::countFastq("fastq/filtered")
filtered_count <- filtered_count %>%
rownames_to_column("sample") %>%
filter(grepl("_1\\.fq\\.gz$", sample))
##helper function
getN <- function(x) sum(getUniques(x))
track <- data.frame(sample = input_count$sample,
input = input_count$records,
primers = cutadapt_count$records,
filtered = filtered_count$records,
denoisedF = sapply(dadaFs, getN),
denoisedR = sapply(dadaRs, getN),
merged = sapply(mergers, getN),
chimera_remove = rowSums(seqtab_nochim),
length_trim = rowSums(seqtab_length))
rownames(track) = NULL
track <- track %>%
mutate(percent_final = length_trim/input * 100)
knitr::kable(track, digits = 2)
sample | input | primers | filtered | denoisedF | denoisedR | merged | chimera_remove | length_trim | percent_final |
---|---|---|---|---|---|---|---|---|---|
Dmel_K_St_M_1.fq.gz | 127453 | 124153 | 115864 | 115824 | 115832 | 115634 | 115634 | 115629 | 90.72 |
Dmel_Sl_St_F_1.fq.gz | 102775 | 99978 | 92205 | 92172 | 92175 | 92050 | 92050 | 92050 | 89.56 |
Dmel_Sl_St_M_1.fq.gz | 123931 | 120552 | 112549 | 112528 | 112535 | 112354 | 112354 | 112354 | 90.66 |
Dmel_St_K_1.fq.gz | 92464 | 90110 | 83373 | 83356 | 83359 | 83339 | 83283 | 83283 | 90.07 |
Dmel_St_Sl_1.fq.gz | 148858 | 145068 | 134031 | 133996 | 134016 | 133952 | 133862 | 133862 | 89.93 |
Dsub_K_St_F_1.fq.gz | 50278 | 48864 | 44627 | 44622 | 44605 | 44580 | 44478 | 44478 | 88.46 |
Dsub_K_St_M_1.fq.gz | 69370 | 67390 | 61467 | 61459 | 61462 | 61445 | 61421 | 61421 | 88.54 |
Dsub_Sl_St_F_1.fq.gz | 144672 | 140609 | 129372 | 129363 | 129370 | 129145 | 124535 | 124535 | 86.08 |
Dsub_Sl_St_M_1.fq.gz | 98992 | 96208 | 88100 | 88098 | 88091 | 87832 | 80888 | 80888 | 81.71 |
Dsub_St_K_1.fq.gz | 119238 | 116060 | 106907 | 106897 | 106897 | 106855 | 106849 | 106849 | 89.61 |
Dsub_St_Sl_1.fq.gz | 91187 | 88741 | 81798 | 81772 | 81774 | 81730 | 81712 | 81712 | 89.61 |
Taxonomic assignment is the process of putting a biological name on the inferred ASVs. After denoising, an ASV is just a DNA sequence; assigning it a taxonomy (e.g., Lactobacillus) gives it biological meaning, which is essential for downstream analysis and interpretation. Two common approaches are used to accomplish this:
IdTaxa
algorithmLinks to reference data.
Download the training sequences:
# download and save training sequences
gtdb_genus <- "https://zenodo.org/records/13984843/files/GTDB_bac120_arc53_ssu_r220_genus.fa.gz"
gtdb_species <- "https://zenodo.org/records/13984843/files/GTDB_bac120_arc53_ssu_r220_species.fa.gz"
gtdb_genus_train_set <- Biostrings::readDNAStringSet(gtdb_genus)
gtdb_species_train_set <- Biostrings::readDNAStringSet(gtdb_species)
if (!dir.exists("gtdb")) dir.create("gtdb")
Biostrings::writeXStringSet(gtdb_genus_train_set, "gtdb/gtdb_genus_train_set.fasta", format="fasta")
Biostrings::writeXStringSet(gtdb_species_train_set, "gtdb/gtdb_species_train_set.fasta", format="fasta")
Optional! Trim the sequences based on used primers:
forward_primer <- "TACGGGAGGCAGCAG"
reverse_primer <- "CCAGGGTATCTAATCC"
reverse_primer_rc <- as.character(Biostrings::reverseComplement(Biostrings::DNAString(reverse_primer)))
input_fasta_file <- "gtdb/gtdb_genus_train_set.fasta"
output_fasta_file <- "gtdb/gtdb_genus_trimmed_train_set.fasta"
cutadapt_command <- paste(
"cutadapt",
"-a", paste(forward_primer, "...", reverse_primer_rc, sep = ""),
"--discard-untrimmed",
" --action trim",
"-o", output_fasta_file,
input_fasta_file
)
cutadapt_output <- system(cutadapt_command, intern = TRUE)
fileConn <- file(paste(output_fasta_file, "_stats.txt", sep = ""))
writeLines(cutadapt_output , fileConn)
close(fileConn)
train_set <- Biostrings::readDNAStringSet( "gtdb/gtdb_genus_trimmed_train_set.fasta")
hist(Biostrings::width(train_set))
Optional! Remove too short or too long sequences
final_train_set <- train_set[Biostrings::width(train_set) > 350 & Biostrings::width(train_set) < 450]
length(final_train_set)/length(train_set)
## [1] 0.7652882
Train:
tax_labels <- paste("Root", names(final_train_set), sep = ";")
set.seed(123)
gtdb_idtaxa <- DECIPHER::LearnTaxa(final_train_set,
tax_labels)
## ================================================================================
##
## Time difference of 884.75 secs
Optional! Save model for reuse in later sessions:
saveRDS(gtdb_idtaxa, "gtdb/gtdb_idtaxa.rds")
Optional! Load model:
gtdb_idtaxa <- readRDS("gtdb/gtdb_idtaxa.rds")
Infer taxonomy:
set.seed(123)
gtdb_tax_seqtab_length <- DECIPHER::IdTaxa(Biostrings::DNAStringSet(colnames(seqtab_length)),
gtdb_idtaxa)
## ================================================================================
##
## Time difference of 21.68 secs
Cleanup:
lapply(gtdb_tax_seqtab_length, function(x) c(x$taxon, rep("", 8 - length(x$taxon)))) %>%
do.call(rbind, .) %>%
as.data.frame() %>%
select(-V1, -V8) %>%
dplyr::rename(Kingdom = V2,
Phylum = V3,
Class = V4,
Order = V5,
Family = V6,
Genus = V7) %>%
mutate_all(~ifelse(. == "", NA, .)) %>%
mutate_all(~ifelse(grepl("^unclassified_", .), NA, .)) -> gtdb_tax_seqtab_length
gtdb_tax_seqtab_length %>%
knitr::kable(format = "html") %>%
kableExtra::kable_styling() %>%
kableExtra::scroll_box(height = "400px")
Kingdom | Phylum | Class | Order | Family | Genus |
---|---|---|---|---|---|
Bacteria | Pseudomonadota | Alphaproteobacteria | Acetobacterales | Acetobacteraceae | Acetobacter |
Bacteria | Pseudomonadota | Alphaproteobacteria | Rickettsiales | Anaplasmataceae | Wolbachia |
Bacteria | Pseudomonadota | Alphaproteobacteria | Acetobacterales | Acetobacteraceae | NA |
Bacteria | Pseudomonadota | Alphaproteobacteria | Acetobacterales | Acetobacteraceae | Acetobacter |
Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Latilactobacillus |
Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Lactiplantibacillus |
Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Latilactobacillus |
Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Lactiplantibacillus |
Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Levilactobacillus |
Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Fructilactobacillus |
Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Furfurilactobacillus |
Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Levilactobacillus |
NA | NA | NA | NA | NA | NA |
Bacteria | Pseudomonadota | Alphaproteobacteria | Acetobacterales | Acetobacteraceae | Acetobacter |
Bacteria | Bacillota | Bacilli | Staphylococcales | Staphylococcaceae | Staphylococcus |
Bacteria | Bacteroidota | Bacteroidia | Chitinophagales | Chitinophagaceae | Sediminibacterium |
Bacteria | Pseudomonadota | Alphaproteobacteria | Rhizobiales | Xanthobacteraceae | NA |
Bacteria | Pseudomonadota | Alphaproteobacteria | Rickettsiales | Anaplasmataceae | Wolbachia |
Bacteria | Bacteroidota | Bacteroidia | Bacteroidales | Bacteroidaceae | Bacteroides |
Bacteria | Actinomycetota | Actinomycetes | Mycobacteriales | Nakamurellaceae | Nakamurella |
Bacteria | Bacillota_A | Clostridia | Lachnospirales | Lachnospiraceae | Lachnospira |
Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | NA |
Bacteria | Bacteroidota | Bacteroidia | Bacteroidales | Bacteroidaceae | Prevotella |
Bacteria | NA | NA | NA | NA | NA |
Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Acinetobacter |
Bacteria | Pseudomonadota | Gammaproteobacteria | Xanthomonadales | Xanthomonadaceae | Thermomonas |
Bacteria | Pseudomonadota | Alphaproteobacteria | Rhizobiales | Beijerinckiaceae | Methylobacterium |
Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | NA |
Bacteria | Bacillota_A | Clostridia | Tissierellales | Peptoniphilaceae | Finegoldia |
Bacteria | NA | NA | NA | NA | NA |
Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Moraxella_A |
Bacteria | Actinomycetota | Actinomycetes | Mycobacteriales | Mycobacteriaceae | Lawsonella |
Bacteria | Pseudomonadota | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | NA |
Bacteria | Pseudomonadota | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | Sphingomonas |
Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Pseudomonadaceae | NA |
Bacteria | Pseudomonadota | Gammaproteobacteria | Xanthomonadales | Xanthomonadaceae | Stenotrophomonas |
Bacteria | Actinomycetota | Actinomycetes | Mycobacteriales | Mycobacteriaceae | Corynebacterium |
Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Pseudomonadaceae | NA |
Bacteria | NA | NA | NA | NA | NA |
Bacteria | NA | NA | NA | NA | NA |
Bacteria | NA | NA | NA | NA | NA |
Bacteria | Actinomycetota | Thermoleophilia | Solirubrobacterales | Solirubrobacteraceae | JAGIBJ01 |
Bacteria | Actinomycetota | Acidimicrobiia | Acidimicrobiales | NA | NA |
Bacteria | Actinomycetota | Actinomycetes | Actinomycetales | Micrococcaceae | Rothia |
Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | Comamonas |
Bacteria | Bacillota_A | Clostridia | Peptostreptococcales | Peptostreptococcaceae | NA |
Bacteria | Bacillota | Bacilli | Bacillales | Anoxybacillaceae | Anoxybacillus_A |
Bacteria | Actinomycetota | Actinomycetes | Actinomycetales | Actinomycetaceae | Varibaculum |
Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | Pelomonas |
Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Psychrobacter |
Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | NA |
Bacteria | Deinococcota | Deinococci | Deinococcales | Deinococcaceae | Deinococcus |
Bacteria | NA | NA | NA | NA | NA |
Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | NA |
Bacteria | Actinomycetota | Actinomycetes | Mycobacteriales | Mycobacteriaceae | Corynebacterium |
Bacteria | Pseudomonadota | NA | NA | NA | NA |
Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | Caldimonas |
Bacteria | Bacteroidota | Bacteroidia | Cytophagales | Hymenobacteraceae | Hymenobacter |
Species assignment:
taxa <- as.matrix(gtdb_tax_seqtab_length)
rownames(taxa) <- colnames(seqtab_length)
taxa <- dada2::addSpecies(taxa,
"gtdb/gtdb_species_train_set.fasta",
allowMultiple = TRUE)
taxa %>%
knitr::kable(format = "html") %>%
kableExtra::kable_styling() %>%
kableExtra::scroll_box(height = "400px")
Kingdom | Phylum | Class | Order | Family | Genus | Species | |
---|---|---|---|---|---|---|---|
TGGGGAATATTGGACAATGGGGGCAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTCGACGGGGACGATGATGACGGTACCCGTAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGCTCGGAATGACTGGGCGTAAAGGGCGTGTAGGCGGTTTGTACAGTCAGATGTGAAATCCCCGGGCTTAACCTGGGAGCTGCATTTGATACGTGCAGACTAGAGTGTGAGAGAGGGTTGTGGAATTCCCAGTGTAGAGGTGAAATTCGTAGATATTGGGAAGAACACCGGTGGCGAAGGCGGCAACCTGGCTCATTACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Alphaproteobacteria | Acetobacterales | Acetobacteraceae | Acetobacter | ascendens/ghanensis/oryzifermentans/oryzoeni/pasteurianus |
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCGGATTAGTAAGTTAAAAGTGAAATCCCAAGGCTCAACCTTGGAATTGCTTTTAAAACTGCTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTAGTGTAGAGGTGAAATTCGTAAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAGGCGCGAAGGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Alphaproteobacteria | Rickettsiales | Anaplasmataceae | Wolbachia | sp936270435 |
TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGTTTTCGGATTGTAAAGCACTTTCAGCGGGGACGATGATGACGGTACCCGCAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCAAGCGTTGCTCGGAATGACTGGGCGTAAAGGGCGCGTAGGCGGTTGTTACAGTCAGATGTGAAATTCCCGGGCTTAACCTGGGGGCTGCATTTGATACGTGACGACTAGAGTGTGAGAGAGGGTTGTGGAATTCCCAGTGTAGAGGTGAAATTCGTAGATATTGGGAAGAACACCGGTGGCGAAGGCGGCAACCTGGCTCATGACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Alphaproteobacteria | Acetobacterales | Acetobacteraceae | NA | NA |
TGGGGAATATTGGACAATGGGGGAAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGTTTTCGGATTGTAAAGCACTTTCGGCGGGGACGATGATGACGGTACCCGCAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGCTCGGAATGACTGGGCGTAAAGGGCGTGTAGGCGGTTTGTACAGTCAGATGTGAAATCCCCGGGCTTAACCTGGGAGCTGCATTTGATACGTGCAGACTAGAGTATGAGAGAGGGTTGTGGAATTCTCAGTGTAGAGGTGAAATTCGTAGATATTGGGAAGAACACCGGTGGCGAAGGCGGCAACCTGGCTCATTACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Alphaproteobacteria | Acetobacterales | Acetobacteraceae | Acetobacter | NA |
TAGGGAATCTTCCACAATGGACGAAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTTGGAGAAGAACGTATTTGATAGTAACTGATCAGGTAGTGACGGTATCCAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCTTCGGCTCAACCGAAGAAGTGCATCGGAAACTGGGAAACTTGAGTGCAGAAGAGGACAGTGGAACTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTGTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCATGGGTAGCAAACA | Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Latilactobacillus | curvatus |
TAGGGAATCTTCCACAATGGACGAAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGGTTTCGGCTCGTAAAACTCTGTTGTTAAAGAAGAACATATCTGAGAGTAACTGTTCAGGTATTGACGGTATTTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTTTTAAGTCTGATGTGAAAGCCTTCGGCTCAACCGAAGAAGTGCATCGGAAACTGGGAAACTTGAGTGCAGAAGAGGACAGTGGAACTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTGTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGTATGGGTAGCAAACA | Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Lactiplantibacillus | fabifermentans/paraplantarum/pentosus/pingfangensis/plantarum/plantarum_A |
TAGGGAATCTTCCACAATGGACGAAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTTGGAGAAGAACGTATTTGATAGTAACTGATCAGGTAGTGACGGTATCCAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCTTCGGCTCAACCGAAGAAGTGCATCGGAAACTGGGGAACTTGAGTGCAGAAGAGGACAGTGGAACTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTGTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCATGGGTAGCAAACA | Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Latilactobacillus | NA |
TAGGGAATCTTCCACAATGGACGAAAGTCTGATGGAACAACGCCGCGTGAGTGAAGAAGGGTTTCGGCTCGTAAAACTCTGTTGTTAAAGAAGAACATATCTGAGAGTAACTGTTCAGGTATTGACGGTATTTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTTTTAAGTCTGATGTGAAAGCCTTCGGCTCAACCGAAGAAGTGCATCGGAAACTGGGAAACTTGAGTGCAGAAGAGGACAGTGGAACTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTGTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGTATGGGTAGCAAACA | Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Lactiplantibacillus | NA |
TAGGGAATCTTCCACAATGGACGAAAGTCTGATGGAGCAATGCCGCGTGAGTGAAGAAGGGTTTCGGCTCGTAAAACTCTGTTGTTAAAGAAGAACACCTTTGAGAGTAACTGTTCAAGGGTTGACGGTATTTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTTTTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGAAGTGCATCGGAAACTGGGAGACTTGAGTGCAGAAGAGGACAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTGTCTAGTCTGTAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAACA | Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Levilactobacillus | brevis |
TAGGGAATCTTCCACAATGGACGAAAGTCTGATGGAGCAATGCCGCGTGAGTGAAGAAGGGTTTCGGCTCGTAAAACTCTGTTGTTAGAGAAGAACGGGCGTGAGAGTAACTGCTCACGTCGCGACGGTATCTAACCAGAAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTCTTCTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGAAGTGCATCGGAAACTGGGAAACTTGAGTGCAGAAGAGGACAGTGGAACTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGCTGTCTAGTCTGTAACTGACGCTGAGGCTCGAAAGCATGGGTAGCAAACA | Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Fructilactobacillus | fructivorans |
TAGGGAATCTTCCACAATGGGCGCAAGCCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGGTTTCGGCTCGTAAAACTCTGTTGTTAAAGAAGAACGAATCTGAGAGTAACTGTTCAGATTGTGACGGTATTTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGGGAGTGCAGGCGGTTACTTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGAAGGGCATCGGAAACTGGGCAACTTGAGTGCAGAAGAGGGTAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTACCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCATGGGTAGCAAACA | Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Furfurilactobacillus | NA |
TAGGGAATCTTCCACAATGGACGAAAGTCTGATGGAGCAATGCCGCGTGAGTGAAGAAGGGTTTCGGCTCGTAAAACTCTGTTGTTAAAGAAGAACACCTTTGAGAGTAACTGTTCAAGGGTTGACGGTATTTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTTTTAAGTCTGATGTGAAAGCCTTCGGCTTAACCGGAGAAGTGCATCGGAAACTGGGAGACTTGAGTGCAGAAGAGGACAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTGTCTAGTCTGTAACTGACGCTGAGGCTCGAAAACATGGGTAGCGAACA | Bacteria | Bacillota | Bacilli | Lactobacillales | Lactobacillaceae | Levilactobacillus | NA |
TGGGGAATCTTGGTCAATGGGCGCAAGCCTGAACCAGCGATGCCGCGTGAGTGATGAAGGCCTTCGGGTTGTAAAGCTCTGTGGGGAGGGACGAAAGAATCGGTTCTAACAGGACCGAAGTTGACGGTACCTCCTTAGCAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAAGACAGAGGGTGCAAACGTTGCTCGGAATCACTGGGCGTAAAGGGCGTGTAGGCGGGAAGGTAAGTCGGGCGTGAAATCCCTGAGCTCAACTCAGGAACTGCACTCGAAACTACTTTTCTTGAGTGCCGGAGAGGAAAGCGGAATTCCTGGTGTAGAGGTGAAATTCGTAGATATCAGGAGGAACACCAGTGGCGAAGGCGGCTTTCTGGACGGTAACTGACGCTGAGACGCGAAAGCGTGGGGAGCAAACA | NA | NA | NA | NA | NA | NA | NA |
TGGGGAATATTGGACAATGGGGGAAACCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGGTTTTCGGATTGTAAAGCACTTTCGGCGGGGACGATGATGACGGTACCCGCAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGCTCGGAATGACTGGGCGTAAAGGGCGTGTAGGCGGTTTGTACAGTCAGATGTGAAATCCCCGGGCTTAACCTGGGAGCTGCATTTGATACGTGCAGACTAGAGTGTGAGAGAGGGTTGTGGAATTCCCAGTGTAGAGGTGAAATTCGTAGATATTGGGAAGAACACCGGTGGCGAAGGCGGCAACCTGGCTCATTACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Alphaproteobacteria | Acetobacterales | Acetobacteraceae | Acetobacter | NA |
TAGGGAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTCTTCGGATCGTAAAACTCTGTTATTAGGGAAGAACAAATGTGTAAGTAACTATGCACGTCTTGACGGTACCTAATCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGTAGGCGGTTTTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGTAGCGGTGAAATGCGCAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGATGTGCGAAAGCGTGGGGATCAAACA | Bacteria | Bacillota | Bacilli | Staphylococcales | Staphylococcaceae | Staphylococcus | caprae/epidermidis/saccharolyticus |
TAAGGAATATTGGTCAATGGACGCAAGTCTGAACCAGCCATGCCGCGTGAAGGATGAAGGTCCTCTGGATTGTAAACTTCTTTTATAGGGGGCGAAAAAAGGTCTTTCTAGATCACTTGACAGTACCCTATGAATAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTATCCGGATTCACTGGGTTTAAAGGGTGCGTAGGCGGGTAGGTAAGTCAGAGGTGAAATCCTGGAGCTTAACTCCAGAACTGCCTTTGATACTATCTATCTTGAATATGGTGGAGGTAAGCGGAATATGTCATGTAGCGGTGAAATGCATAGATATGACATAGAACACCTATTGCGAAGGCAGCTTACTACGCCTATATTGACGCTGAGGCACGAAAGCGTGGGGATCAAACA | Bacteria | Bacteroidota | Bacteroidia | Chitinophagales | Chitinophagaceae | Sediminibacterium | magnilacihabitans |
TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGAGTGATGAAGGCCCTAGGGTTGTAAAGCTCTTTTGTGCGGGAAGATAATGACGGTACCGCAAGAATAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGCTCGGAATCACTGGGCGTAAAGGGTGCGTAGGCGGGTCTTTAAGTCAGGGGTGAAATCCTGGAGCTCAACTCCAGAACTGCCTTTGATACTGAAGATCTTGAGTTCGGGAGAGGTGAGTGGAACTGCGAGTGTAGAGGTGAAATTCGTAGATATTCGCAAGAACACCAGTGGCGAAGGCGGCTCACTGGCCCGATACTGACGCTGAGGCACGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Alphaproteobacteria | Rhizobiales | Xanthobacteraceae | NA | NA |
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCATGAGTGAAGAAGGCCTTTGGGTTGTAAAGCTCTTTTAGTGAGGAAGATAATGACGGTACTCACAGAAGAAGTCCTGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGAGGGCTAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCGGATTAGTAAGTTAAAAGTGAAATCCCAAGGCTCAACCTTGGAATTGCTTTTAAAACTGCTAATCTAGAGATTGAAAGAGGATAGAGGAATTCCTAGTGTAGAGGTGAAATTCGTAAATATTAGGAGGAACACCAGTGGCGAAGGCGTCTATCTGGTTCAAATCTGACGCTGAGGCGCGAAGGCGTGGGGAGCAACCA | Bacteria | Pseudomonadota | Alphaproteobacteria | Rickettsiales | Anaplasmataceae | Wolbachia | NA |
TGAGGAATATTGGTCAATGGACGAGAGTCTGAACCAGCCAAGTAGCGTGAAGGATGACTGCCCTATGGGTTGTAAACTTCTTTTATACGGGAATAAAGTGAGGCACGTGTGCCTTTTTGTATGTACCGTATGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCGGACGCTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGGTGTCTTGAGTACAGTAGAGGCAGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTTGCTGGACTGTAACTGACGCTGATGCTCGAAAGTGTGGGTATCAAACA | Bacteria | Bacteroidota | Bacteroidia | Bacteroidales | Bacteroidaceae | Bacteroides | NA |
TGGGGAATATTGCGCAATGGGCGGAAGCCTGACGCAGCGACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCTCTGACGAAGCGAGAGTGACGGTAGGAGCAGAAGAAGCACCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCGAATGTGAAAACCCGGGGCTCAACCCCGGGCCTGCATTCGATACGGGCAGACTAGAGTTCGGTAGGGGAGTCTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGGTGGCGAAGGCGGGACTCTGGGCCGATACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACA | Bacteria | Actinomycetota | Actinomycetes | Mycobacteriales | Nakamurellaceae | Nakamurella | NA |
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTTTGGTTGTAAAGCACTTTAAGCGAGGAGGAGGCTACTTGGATTAATACTCTAGGATAGTGGACGTTACTCGCAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCGAGCGTTAATCGGATTTACTGGGCGTAAAGCGTGCGTAGGCGGCTTTTTAAGTCGGATGTGAAATCCCTGAGCTTAACTTAGGAATTGCATTCGATACTGGGAAGCTAGAGTATGGGAGAGGATGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGATGGCGAAGGCAGCCATCTGGCCTAATACTGACGCTGAGGTACGAAAGCATGGGGAGCAAACA | Bacteria | Bacillota_A | Clostridia | Lachnospirales | Lachnospiraceae | Lachnospira | NA |
TGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCAATGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTTGTCAGGGAAGAAACGGCTGTGGCTAATATCCACGGCTAATGACGGTACCTGAAGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTTGTAAGTCTGTCGTGAAATCCCCGGGCTCAACCTGGGAATTGCGATGGAGACTGCAAGGCTAGAATCTGGCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAACACCGATGGCGAAGGCAGCCCCCTGGGTCAAGATTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | NA | NA |
TGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGCAGGAAGACGGCCCTATGGGTTGTAAACTGCTTTTATAAGGGAATAAAGTGGGAGTCGTGACTCTTTTTGCATGTACCTTATGAATAAGGACCGGCTAATTCCGTGCCAGCAGCCGCGGTAATACGGAAGGTCCGGGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCCGGAGATTAAGCGTGTTGTGAAATGTAGACGCTCAACGTCTGCACTGCAGCGCGAACTGGTTTCCTTGAGTACGCACAAAGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCTCACTGGAGCGCAACTGACGCTGAAGCTCGAAAGTGCGGGTATCGAACA | Bacteria | Bacteroidota | Bacteroidia | Bacteroidales | Bacteroidaceae | Prevotella | NA |
TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTTTGGTTGTAAAGCACTTTAAGCGAGGAGGAGGCTACTTGGATTAATACTCTGAGATAGTGGACGTTACTCGCAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCGAGCGTTAATCGGATTTACTGGGCGTAAAGCGTGCGTAGGCGGCTTTTTAAGTCGGATGTGAAATCCCTGAGCTTAACTTAGGAATTGCATTCGATACTGGGAAGCTAGAGTATGGGAGAGGATGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGATGGCGAAGGCAGCCATCTGGCCTAATACTGACGCTGAGGTACGAAAGCATGGGGAGCAAACA | Bacteria | NA | NA | NA | NA | NA | NA |
TGGGGAATATTGGACAATGGGGGGAACCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTTTGGTTGTAAAGCACTTTAAGCGAGGAGGAGGCTACTCTAGTTAATACCTAGAGATAGTGGACGTTACTCGCAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCGAGCGTTAATCGGATTTACTGGGCGTAAAGCGTGCGTAGGTGGCCAATTAAGTCAAATGTGAAATCCCCGAGCTTAACTTGGGAATTGCATTCGATACTGGTTGGCTAGAGTATGGGAGAGGATGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGATGGCGAAGGCAGCCATCTGGCCTAATACTGACACTGAGGTACGAAAGCATGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Acinetobacter | NA |
TAGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTTATCCGGAAAGAAAAGCGCTGGGTTAATACCCTGGTGTCCTGACGGTACCGGAAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTACTCGGAATTACTGGGCGTAAAGCGTGCGTAGGTGGTTTGTTAAGTCTGATGTGAAAGCCCTGGGCTCAACCTGGGAATTGCATTGGATACTGGCAGGCTAGAGTGCGGTAGAGGGTAGTGGAATTCCCGGTGTAGCAGTGAAATGCGTAGAGATCGGGAGGAACATCTGTGGCGAAGGCGACTACCTGGACCAGCACTGACACTGAGGCACGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Xanthomonadales | Xanthomonadaceae | Thermomonas | carbonis |
TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGAGTGATGAAGGCCTTAGGGTTGTAAAGCTCTTTTGTCCGGGACGATAATGACGGTACCGGAAGAATAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGCTCGGAATCACTGGGCGTAAAGGGCGCGTAGGCGGCCATTCAAGTCGGGGGTGAAAGCCTGTGGCTCAACCACAGAATTGCCTTCGATACTGTTTGGCTTGAGTTTGGTAGAGGTTGGTGGAACTGCGAGTGTAGAGGTGAAATTCGTAGATATTCGCAAGAACACCAGTGGCGAAGGCGGCCAACTGGACCAACACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Alphaproteobacteria | Rhizobiales | Beijerinckiaceae | Methylobacterium | sp001423085 |
TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTCAGCGAGGAGGAAGGGTAGTGTGTTAATAGCACATTGCATTGACGTTACTCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGCGCTTAACGTGGGAACTGCATTTGAAACTGGCAAGCTAGAGTCTTGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACAAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | NA | NA |
TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGAACGAAGAAGGTATTCGTATCGTAAAGTTCTGTCCTATGGGAAGATAATGACAGTACCATAGAAGAAAGCTCCGGCTAAATACGTGCCAGCAGCCGCGGTAATACGTATGGAGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGGGTACGCAGGCGGTTTAATAAGTCGAATGTTAAAGATCGGGGCTCAACCCCGTAAAGCATTGGAAACTGATAAACTTGAGTAGTGGAGAGGAAAGTGGAATTCCTAGTGTAGTGGTGAAATACGTAGATATTAGGAGGAATACCAGTAGCGAAGGCGACTTTCTGGACACAAACTGACGCTGAGGTACGAAAGCGTGGGGAGCAAACA | Bacteria | Bacillota_A | Clostridia | Tissierellales | Peptoniphilaceae | Finegoldia | magna/magna_F/magna_H/magna_J/sp910589545 |
TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTTTGGTTGTAAAGCACTTTAAGCGAGGAGGAGGCTACCTGGATTAATACTCTAGGATAGTGGACGTTACTCGCAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCGAGCGTTAATCGGATTTACTGGGCGTAAAGCGTGCGTAGGCGGCTTTTTAAGTCGGATGTGAAATCCCTGAGCTTAACTTAGGAATTGCATTCGATACTGGGAAGCTAGAGTATGGGAGAGGATGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGATGGCGAAGGCAGCCATCTGGCCTAATACTGACGCTGAGGTACGAAAGCATGGGGAGCAAACA | Bacteria | NA | NA | NA | NA | NA | NA |
TGGGGAATATTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTTTGGTTGTAAAGCACTTTAAGCAGGGAGGAGAGGCTAATGGTTAATACCCATTAGATTAGACGTTACCTGCAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGAGTGTAGGTGGCTCATTAAGTCACATGTGAAATCCCCGGGCTTAACCTGGGAACTGCATGTGATACTGGTGGTGCTAGAATATGTGAGAGGGAAGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGATGGCGAAGGCAGCTTCCTGGCATAATATTGACACTGAGATTCGAAAGCGTGGGTAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Moraxella_A | osloensis/sp002478835/sp902506215 |
TGGGGAATATTGCACAATGGGCGGAAGCCTGATGCAGCGACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCAGGGACGAAGCGTAAGTGACGGTACCTGCAGAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTGTCCGGAATTACTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCACGTCGTCTGTGAAATCCTAGGGCTTAACCCTGGACGTGCAGGCGATACGGGCTGACTTGAGTACTACAGGGGAGACTGGAATTTCTGGTGTAGCGGTGGAATGCACAGATATCAGGAAGAACACCGATGGCGAAGGCAGGTCTCTGGGTAGTAACTGACGCTGAGGAGCGAAAGCATGGGTAGCGAACA | Bacteria | Actinomycetota | Actinomycetes | Mycobacteriales | Mycobacteriaceae | Lawsonella | clevelandensis_A |
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCAATGCCGCGTGAGTGATGAAGGCCTTAGGGTTGTAAAGCTCTTTTACCCGGGATGATAATGACAGTACCGGGAGAATAAGCTCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGAGCTAGCGTTATTCGGAATTACTGGGCGTAAAGCGCACGTAGGCGGCTTTGTAAGTTAGAGGTGAAAGCCTGGAGCTCAACTCCAGAATTGCCTTTAAGACTGCATCGCTTGAATCCAGGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAAGAACACCAGTGGCGAAGGCGGCTCACTGGACTGGTATTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | NA | NA |
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCAATGCCGCGTGAGTGATGAAGGCCTTAGGGTTGTAAAGCTCTTTTACCCGGGATGATAATGACAGTACCGGGAGAATAAGCTCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGAGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCACGTAGGCGGCTTTGTAAGTTAGAGGTGAAAGCCTGGAGCTCAACTCCAGAATTGCCTTTAAGACTGCATCGCTCGAATCCGGGAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAAGAACACCAGTGGCGAAGGCGGCTCACTGGACCGGTATTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | Sphingomonas | sp900109205 |
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTGAATTAATACTTTACTGTTTTGACGTTACCGACAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTTGTTAAGTTGAATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTGGCAAGCTAGAGTAGGGCAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGGCTCATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Pseudomonadaceae | NA | NA |
TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATACCGCGTGGGTGAAGAAGGCCTTCGGGTTGTAAAGCCCTTTTGTTGGGAAAGAAAAGCAGTCGGCTAATACCCGGTTGTTCTGACGGTACCCAAAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTACTCGGAATTACTGGGCGTAAAGCGTGCGTAGGTGGTTGTTTAAGTCTGTTGTGAAAGCCCTGGGCTCAACCTGGGAATTGCAGTGGATACTGGGCGACTAGAGTGTGGTAGAGGGTAGTGGAATTCCCGGTGTAGCAGTGAAATGCGTAGAGATCGGGAGGAACATCCATGGCGAAGGCAGCTACCTGGACCAACACTGACACTGAGGCACGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Xanthomonadales | Xanthomonadaceae | Stenotrophomonas | bentonitica_A/rhizophila_A/sp018612915 |
TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACCCCTTTCGGCAGGGACGAAGCTTTTGTGACGGTACCTGCATAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTGTCCGGAATTACTGGGCGTAAAGAGCTCGTAGGTGGTTTGTCGCGTCGTCTGTGAAATCCCGGGGCTTAACTCCGGGCGTGCAGGCGATACGGGCATAACTTGAGTGCTGTAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGGTCTCTGGGCAGTAACTGACGCTGAGGAGCGAAAGCATGGGTAGCGAACA | Bacteria | Actinomycetota | Actinomycetes | Mycobacteriales | Mycobacteriaceae | Corynebacterium | bovis |
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCTGCTGGTTAATACCCTGCAGTTTTGACGTTACCAACAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTGGGTAAGTTGAATGTGAAAGCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTGCCCGACTAGAGTACGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Pseudomonadaceae | NA | NA |
TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACA | Bacteria | NA | NA | NA | NA | NA | NA |
TAGGGAATCTTCGGCAATGGGGGCAACCCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTAAGTCAAGAACGGGTGTGAGAGTGGAAAGTTCACACTGTGACGGTAGCTTACCAGAAAGGGACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTCCCGAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTGATAAGTCTGAAGTTAAAGGCTGTGGCTCAACCATAGTTCGCTTTGGAAACTGTCAAACTTGAGTGCAGAAGGGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCGGTGGCGAAAGCGGCTCTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCGAACA | Bacteria | NA | NA | NA | NA | NA | NA |
TGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTTTGGTTGTAAAGCACTTTAAGCGAGGAGGAGGCTACTTGGATTAATACTCTAGGATAGTGGACGTTACTCGCAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCGAGCGTTAATCGGATTTACTGGGCGTAAAGCGTGCGTAGGCGGCTTTTTAAGTCGGATGTGAAATCCCTGAGCTTAACTTAGGAATTGCATTCGATACTGGGAAGCTAGAGTATGGGAGAGGATGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGATGGCGAAGGCAGCCATCTGGCCTAATACTGACGCTGAGGTACGAAAGCATGGGGAGCAAACA | Bacteria | NA | NA | NA | NA | NA | NA |
TGGGGAATCTTGCGCAATGCACGAAAGTGTGACGCAGCAACGCCGCGTGCGGGAAGAAGGTCTTCGGATCGTAAACCGCTTTCAGTTGGGACGAGGGTTGCACGGTGAATAGCCGGCGTAACTGACGGTACCTTCAGAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTATCCGGAATCATTGGGCGTAAAGCGCGTGTAGGCGGTTCAGTAAGTCCGCTCTGAAAGCCCAGGGCTCAACCCTGGGAGGCGGGTGGATACTACTGGACTAGAGTACGGAAGAGGAGATTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCAACAGCGAAGGCAGATCTCTGGGACGTAACTGACGCTGAGACGCGAAAGCGTGGGGAGCAAACA | Bacteria | Actinomycetota | Thermoleophilia | Solirubrobacterales | Solirubrobacteraceae | JAGIBJ01 | NA |
TGGGGAATCTTGCGCAATGGGCGAAAGCCTGACGCAGCAACGCCGCGTGGGGGATGAAGGCTCTCGGGTTGTAAACCCCTTTCAGCAGGGACGAATCAAGACGGTACCTGCAGAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAACACGTAGGGGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGAGCTCGTAGGTGGCTGTGTAAGTCGGGTGTTAAATCCCCAGGCTCAACCTGGGACGCCACCCGATACTGCACTAGCTAGAGTCCGGTAGGGGAGCGTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCAGCGGCGAAGGCGGCGCTCTGGGCCGGTACTGACACTGAGGAGCGAAAGCGTGGGTAGCAAACA | Bacteria | Actinomycetota | Acidimicrobiia | Acidimicrobiales | NA | NA | NA |
TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTGTTAGCATCGAAGAAGCGAAAGTGACGGTAGGTGCAGAGAAAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTTGTAGGCGGTTGGTCGCGTCTGCTGTGAAAGGCTGGGGCTTAACCCTGGTTTTGCAGTGGGTACGGGCTAACTAGAGTGCAGTAGGGGAGACTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGGTCTCTGGGCTGTAACTGACGCTGAGAAGCGAAAGCATGGGGAGCGAACA | Bacteria | Actinomycetota | Actinomycetes | Actinomycetales | Micrococcaceae | Rothia | dentocariosa |
TGGGGAATTTTGGACAATGGGCGAAAGCCTGATCCAGCAATGCCGCGTGCAGGATGAAGGCCTTCGGGTTGTAAACTGCTTTTGTACGGAACGAAAAGTCTCGGGATAACACCCCGGGACCATGACGGTACCGTAAGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGTAAGACAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGTGACTGCATAGCTGGAGTACGGCAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGGCCTGTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | Comamonas | antarctica |
TGGGGAATATTGCACAATGGGCGAAAGTCTGATGCAGCAACGCCGCGTGAGCGATGAAGGCCTTCGGGTCGTAAAGCTCTGTCCTCAAGGAAGATAATGACGGTACTTGAGGAGGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCTAGCGTTATCCGGAATTACTGGGCGTAAAGGGTGCGTAGGTGGTTTCTTAAGTCAGAGGTGAAAGGCTACGGCTCAACCGTAGTAAGCCTTTGAAACTGGGAAACTTGAGTGCAGGAGAGGAGAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTTGCGAAGGCGGCTCTCTGGACTGTAACTGACACTGAGGCACGAAAGCGTGGGGAGCAAACA | Bacteria | Bacillota_A | Clostridia | Peptostreptococcales | Peptostreptococcaceae | NA | NA |
TAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGCGAAGAAGGTCTTCGGATTGTAAAGCTCTGTTGTTAGGGAAGAACAAGTACCGTTCGAAGAGGGCGGTACCGTGACGGTACCTAACGAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTACTGGGCGTAAAGCGCGCGCAGGCGGTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAGAGCGGAATTCCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACA | Bacteria | Bacillota | Bacilli | Bacillales | Anoxybacillaceae | Anoxybacillus_A | NA |
TGGGGGATATTGCACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGGGGGATGAAGGCCTTCGGGTTGTAAACTTCTTTTGCTCTGAACAAGGCATGCCAATGGGTGTGTTGAGTGTAGGGGTTGATTAGCGCCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCTTGTAGGTGGCTGGTTGCGTCTGTCGTGAAAGCTCATGGCTTAACTGTGGGTTTGCGGTGGGTACGGGCTGGCTTGAGTGCAGTAGGGGAGGCTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAATACCGGTGGCGAAGGCGGGTCTCTGGGCTGTTACTGACACTGAGGAGCGAAAGCATGGGGAGCGAACA | Bacteria | Actinomycetota | Actinomycetes | Actinomycetales | Actinomycetaceae | Varibaculum | NA |
TGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGCGGGAAGAAGGCCTTCGGGTTGTAAACCGCTTTTGTCAGGGAAGAAAAGGTTCTGGTTAATACCTGGGACTCATGACGGTACCTGAAGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGCAAGACAGAGGTGAAATCCCCGGGCTCAACCTGGGAACTGCCTTTGTGACTGCATGGCTAGAGTACGGTAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGACCTGTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | Pelomonas | aquatica |
TGGGGAATATTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTTTGGTTGTAAAGCACTTTAAGCAGTGAAGAAGACTCTATGGTTAATACCCATAGACGATGACATTAGCTGCAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGAGCGTAGGTGGCTTGATAAGTCAGATGTGAAAGCCCCGGGCTTAACCTGGGAACGGCATCTGATACTGTTAGGCTAGAGTAGGTGAGAGGAAGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGATGGCGAAGGCAGCCTTCTGGCATCATACTGACACTGAGGTTCGAAAGCGTGGGTAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Psychrobacter | faecalis |
TGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCAATGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTTGTCAGGGAAGAAACGGTGAAGGCTAATATCCTTTGCTAATGACGGTACCTGAAGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTTGTAAGTCTGATGTGAAATCCCCGGGCTCAACCTGGGAATTGCATTGGAGACTGCAAGGCTAGAATCTGGCAGAGGGGGGTAGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAACACCGATGGCGAAGGCAGCCCCCTGGGTCAAGATTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | NA | NA |
TTAGGAATCTTCCCCAATGGGCGCAAGCCTGAGGGAGCGACGCCGCGTGAGGGATGAAGGTCCTCGGATTGTAAACCTCTGAACGAGTGACGAAAGGCCCCGGATGGGGAGATGACGGTAGCTCGGTAATAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTACCCGGAATCACTGGGCGTAAAGGGCGTGTAGGCGGAAGGTTAAGTCCGACTTTAAAGACCGGGGCTCAACCCCGGGAGTGGGTTGGAGACTGGCTTTCTGGACCTCTGGAGAGGCAACTGGAATTCCTGGTGTAGCGGTGGAATGCGTAGATACCAGGAGGAACACCAATGGCGAAGGCAGGTTGCTGGACAGAAGGTGACGCTGAGGCGCGAAAGTGTGGGGAGCGAACC | Bacteria | Deinococcota | Deinococci | Deinococcales | Deinococcaceae | Deinococcus | NA |
TAGGGAATCTTCGGCAATGGACGGAAGTCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTTGTAAGAGAAGAACGAGTATGAGAGTGGAAAGTTCGTACTGTGACGGTATCTTACCAGAAAGGGACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTCCCGAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTAGATAAGTCTGAAGTTAAAGGCTGTGGCTTAACCATAGTACGCTTTGGAAACTGTTTAACTTGAGTGCAAGAGGGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCGGTGGCGAAAGCGGCTCTCTGGCTTGTAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAACA | Bacteria | NA | NA | NA | NA | NA | NA |
TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGTGTTGAGGTTAATAACCTCAGCAATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTCTGTCAAGTCGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTCGAAACTGGCAGGCTAGAGTCTTGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACAAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | NA | NA |
TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACTCCTTTCAGCCATGACGAAGCCCTTGTGGTGACGGTAGTGGTAGAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTGTCCGGAATTACTGGGCGTAAAGAGCTCGTAGGTGGTCTGTCGCGTCATTTGTGAAAGCCCGGGGCTTAACTCCGGGTTGGCAGGTGATACGGGCATGACTGGAGTACTGTAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGGTGGCGAAGGCGGGTCTCTGGGCAGTAACTGACGCTGAGGAGCGAAAGCATGGGTAGCGAACA | Bacteria | Actinomycetota | Actinomycetes | Mycobacteriales | Mycobacteriaceae | Corynebacterium | kroppenstedtii |
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTTACCTAATACGTGATTGTTTTGACGTTACCGACAGAATAAGCACCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTAGTTAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTCAAAACTGACTGACTAGAGTATGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | NA | NA | NA | NA | NA |
TGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGCGGGAAGAAGGCCTTCGGGTTGTAAACCGCTTTTGTTCGGGAAGAAATCCTCTGGGCTAATACCCCGGGGGGATGACGGTACCGGAAGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTGTGCAAGACAGATGTGAAATCCCCGGGCTTAACCTGGGAACTGCATTTGTGACTGCACGGCTAGAGTGCGGCAGAGGGGAGTGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAGCTCCCTGGGCCTGCACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACA | Bacteria | Pseudomonadota | Gammaproteobacteria | Burkholderiales | Burkholderiaceae | Caldimonas | aquatica |
TAGGGAATATTGGGCAATGGGCGAGAGCCTGACCCAGCCATGCCGCGTGCCGGATGAAGGCCTTCTGGGTTGTAAACGGCTTTTCTCAGGGAAGAAAAAGACCTTGCGAGGTACACTGACGGTACCTGAGGAATAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTGTCCGGATTTATTGGGTTTAAAGGGTGCGTAGGTGGCTTGTTAAGTCCGGGGTGAAAGCCCACAGCTCAACTGTGGAACTGCCCTGGATACTGGCGAGCTTGAGTCCAGACGAGGTTGGCGGAATGGAAGCTGTAGCGGTGAAATGCATAGATAGCTTCCAGAACCCCGATTGCGTAGGCAGCTGACTAGGCTGGTACTGACACTGAGGCACGAAAGCGTGGGGAGCGAACA | Bacteria | Bacteroidota | Bacteroidia | Cytophagales | Hymenobacteraceae | Hymenobacter | NA |
Phyloseq is an R package designed to import, store, analyze, and visualize complex microbiome data. Providing a powerful, unified data structure that seamlessly combines abundance tables (OTUs or ASVs), taxonomy, sample data, and phylogenetic trees into a single object. This integration simplifies data management and enables a wide range of downstream analyses and ggplot2-based visualizations within a consistent workflow.
Model matrix:
samples <- rownames(seqtab_length)
mm <- data.frame(sample = gsub("_\\d+.fq.gz$", "", samples),
species = ifelse(grepl("Dmel", samples), "D. melanogaster", "D. subobscura"))
rownames(mm) <- samples
mm
## sample species
## Dmel_K_St_M_1.fq.gz Dmel_K_St_M D. melanogaster
## Dmel_Sl_St_F_1.fq.gz Dmel_Sl_St_F D. melanogaster
## Dmel_Sl_St_M_1.fq.gz Dmel_Sl_St_M D. melanogaster
## Dmel_St_K_1.fq.gz Dmel_St_K D. melanogaster
## Dmel_St_Sl_1.fq.gz Dmel_St_Sl D. melanogaster
## Dsub_K_St_F_1.fq.gz Dsub_K_St_F D. subobscura
## Dsub_K_St_M_1.fq.gz Dsub_K_St_M D. subobscura
## Dsub_Sl_St_F_1.fq.gz Dsub_Sl_St_F D. subobscura
## Dsub_Sl_St_M_1.fq.gz Dsub_Sl_St_M D. subobscura
## Dsub_St_K_1.fq.gz Dsub_St_K D. subobscura
## Dsub_St_Sl_1.fq.gz Dsub_St_Sl D. subobscura
Create phyloseq object:
ps <- phyloseq::phyloseq(phyloseq::otu_table(seqtab_length, taxa_are_rows = FALSE),
phyloseq::sample_data(mm),
phyloseq::tax_table(taxa))
ps
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 58 taxa and 11 samples ]
## sample_data() Sample Data: [ 11 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 58 taxa by 7 taxonomic ranks ]
Filter methods (take specific taxa for further analysis):
ps_acetobacteraceae <- phyloseq::subset_taxa(ps, Family == "Acetobacteraceae")
ps_acetobacteraceae
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 4 taxa and 11 samples ]
## sample_data() Sample Data: [ 11 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 4 taxa by 7 taxonomic ranks ]
Filter methods (take only abundant ASVs):
phyloseq::filter_taxa(ps, function(x) any(x > 100), prune = TRUE)
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 13 taxa and 11 samples ]
## sample_data() Sample Data: [ 11 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 13 taxa by 7 taxonomic ranks ]
Aggregation methods based on taxonomy:
phyloseq::tax_glom(ps, "Genus")
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 32 taxa and 11 samples ]
## sample_data() Sample Data: [ 11 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 32 taxa by 7 taxonomic ranks ]
phyloseq::tax_glom(ps, "Order")
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 22 taxa and 11 samples ]
## sample_data() Sample Data: [ 11 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 22 taxa by 7 taxonomic ranks ]
Various plots:
phyloseq::plot_bar(ps, fill = "Family")
Transformation of counts + full integration of plotting functions
with ggplot2
:
ps_percentage <- phyloseq::transform_sample_counts(ps, function(x) x / sum(x) * 100)
ps_percentage <- phyloseq::transform_sample_counts(ps, function(x) x / sum(x))
phyloseq::plot_bar(ps_percentage, fill = "Family") +
scale_y_continuous(labels = scales::percent) +
theme_bw() +
ggtitle("Relative Abundance by Family") +
theme(
axis.text.x = element_text(
angle = 45,
hjust = 1,
vjust = 1,
size = 10
)
)
Alpha diversity:
phyloseq::estimate_richness(ps) %>%
knitr::kable()
Observed | Chao1 | se.chao1 | ACE | se.ACE | Shannon | Simpson | InvSimpson | Fisher | |
---|---|---|---|---|---|---|---|---|---|
Dmel_K_St_M_1.fq.gz | 39 | 39 | 0 | 39 | 2.8823068 | 0.5763420 | 0.2156292 | 1.274907 | 3.7755474 |
Dmel_Sl_St_F_1.fq.gz | 17 | 17 | 0 | 17 | 1.9703687 | 1.1658685 | 0.5604619 | 2.275116 | 1.5462610 |
Dmel_Sl_St_M_1.fq.gz | 20 | 20 | 0 | 20 | 2.1330729 | 0.3380089 | 0.1184892 | 1.134416 | 1.8124576 |
Dmel_St_K_1.fq.gz | 6 | 6 | 0 | 6 | 0.9128709 | 0.5546775 | 0.3437347 | 1.523774 | 0.4989470 |
Dmel_St_Sl_1.fq.gz | 7 | 7 | 0 | 7 | 1.1952286 | 0.5514376 | 0.3296458 | 1.491749 | 0.5656896 |
Dsub_K_St_F_1.fq.gz | 4 | 4 | 0 | NaN | NaN | 0.1009014 | 0.0336597 | 1.034832 | 0.3394968 |
Dsub_K_St_M_1.fq.gz | 4 | 4 | 0 | NaN | NaN | 0.0498331 | 0.0145750 | 1.014791 | 0.3296388 |
Dsub_Sl_St_F_1.fq.gz | 4 | 4 | 0 | 4 | 0.8660254 | 0.2809978 | 0.1443709 | 1.168731 | 0.3100068 |
Dsub_Sl_St_M_1.fq.gz | 6 | 6 | 0 | 6 | 0.9128709 | 0.6032352 | 0.4060142 | 1.683542 | 0.5002721 |
Dsub_St_K_1.fq.gz | 8 | 8 | 0 | 8 | 0.9354143 | 0.1860356 | 0.0792211 | 1.086037 | 0.6675988 |
Dsub_St_Sl_1.fq.gz | 9 | 9 | 0 | 9 | 1.2472191 | 0.4211907 | 0.1971506 | 1.245564 | 0.7784515 |
Alpha diversity plot:
phyloseq::plot_richness(ps,
x = "species", # this is the "soecies" from our model matrix
color = "species")
Rarefaction + alpha diversity:
ps_rarefied <- phyloseq::rarefy_even_depth(ps)
phyloseq::estimate_richness(ps_rarefied) %>%
knitr::kable()
Observed | Chao1 | se.chao1 | ACE | se.ACE | Shannon | Simpson | InvSimpson | Fisher | |
---|---|---|---|---|---|---|---|---|---|
Dmel_K_St_M_1.fq.gz | 38 | 38 | 0.0000000 | 38.000000 | 3.0435436 | 0.5726938 | 0.2134447 | 1.271366 | 4.0883524 |
Dmel_Sl_St_F_1.fq.gz | 17 | 17 | 0.1616904 | 17.365497 | 1.9658972 | 1.1711663 | 0.5609636 | 2.277716 | 1.6681246 |
Dmel_Sl_St_M_1.fq.gz | 20 | 20 | 0.1218349 | 20.564043 | 2.2420828 | 0.3310231 | 0.1161814 | 1.131454 | 1.9978590 |
Dmel_St_K_1.fq.gz | 6 | 6 | 0.0000000 | 6.000000 | 0.9128709 | 0.5532551 | 0.3426930 | 1.521359 | 0.5291344 |
Dmel_St_Sl_1.fq.gz | 6 | 6 | 0.0000000 | 6.000000 | 1.1547005 | 0.5506788 | 0.3288589 | 1.489999 | 0.5291344 |
Dsub_K_St_F_1.fq.gz | 4 | 4 | 0.0000000 | NaN | NaN | 0.0971727 | 0.0322545 | 1.033330 | 0.3394968 |
Dsub_K_St_M_1.fq.gz | 4 | 4 | 0.0000000 | NaN | NaN | 0.0533940 | 0.0157810 | 1.016034 | 0.3394968 |
Dsub_Sl_St_F_1.fq.gz | 4 | 4 | 0.4330127 | NaN | NaN | 0.2758875 | 0.1410752 | 1.164246 | 0.3394968 |
Dsub_Sl_St_M_1.fq.gz | 6 | 6 | 0.2282177 | 7.032922 | 1.2696131 | 0.5989098 | 0.4033108 | 1.675914 | 0.5291344 |
Dsub_St_K_1.fq.gz | 8 | 8 | 0.4677072 | 8.565972 | 1.3056819 | 0.1869213 | 0.0789741 | 1.085746 | 0.7257330 |
Dsub_St_Sl_1.fq.gz | 8 | 8 | 0.0000000 | 8.000000 | 1.2247449 | 0.4284049 | 0.2012855 | 1.252012 | 0.7257330 |
Estimate a GTR + gamma + inv model ML phyogetic tree and add to phylosec object:
gtr_tree <- function(ps){
# 1. Extract sequences from the phyloseq object
seqs <- colnames(phyloseq::otu_table(ps))
# 2. Assign names to the sequences
names(seqs) <- seqs
# 3. Align the sequences using DECIPHER
alignment <- DECIPHER::AlignSeqs(Biostrings::DNAStringSet(seqs),
anchor = NA,
verbose = FALSE,
iterations = 10,
refinements = 10)
# 4. Mask Poorly aligned regions of a multiple sequence alignment.
# May lead to incorrect results in downstream analyses.
# One method to mitigate their effects is to mask columns of the alignment that may be poorly aligned,
# such as highly-variable regions or regions with many insertions and deletions (gaps).
alignment_masked <- DECIPHER::MaskAlignment(alignment,
correction = TRUE,
showPlot = FALSE)
alignment <- as(alignment_masked, "DNAStringSet")
# 5. Convert the alignment to a phangorn object
phangAlign <- phangorn::phyDat(as(alignment, "matrix"),
type = "DNA")
# https://cran.r-project.org/web/packages/phangorn/vignettes/MLbyHand.html
# fit a maximum likelihood model using General Time Reversible. substitution model, "G(4)" discrete gamma model, "I" invariant sites
fitGTR <- phangorn::pml_bb(phangAlign,
model = "GTR+G(4)+I",
rearrangement = "stochastic")
# 7. Add the final phylogenetic tree to the phyloseq object
phyloseq::phy_tree(ps) <- phangorn::midpoint(fitGTR$tree)
return(ps)
}
ps <- gtr_tree(ps)
Plot the phylogenetic tree and decorate with abundances and sample information:
taxa <- phyloseq::tax_table(ps)
seqs <- rownames(taxa)
taxa_df <- as.data.frame(taxa)
taxa_df[,"Family_Genus"] <- ifelse(!is.na(taxa_df[,"Family"]) & !is.na(taxa_df[,"Genus"]),
paste(taxa_df[,"Family"], taxa_df[,"Genus"]),
taxa_df[,"Family"])
rownames(taxa_df) <- seqs
phyloseq::tax_table(ps) <- as.matrix(taxa_df)
phyloseq::plot_tree(ps,
size = "Abundance",
color = "Sample",
label.tips = "Family_Genus") +
scale_color_manual(values = c("Dmel_K_St_M_1.fq.gz" = "red",
"Dmel_Sl_St_F_1.fq.gz" = "red",
"Dmel_Sl_St_M_1.fq.gz" = "red",
"Dmel_St_K_1.fq.gz" = "red",
"Dmel_St_K_1.fq.gz" = "red",
"Dmel_St_Sl_1.fq.gz" = "red",
"Dsub_K_St_F_1.fq.gz" = "dodgerblue",
"Dsub_K_St_M_1.fq.gz" = "dodgerblue",
"Dsub_Sl_St_F_1.fq.gz" = "dodgerblue",
"Dsub_Sl_St_M_1.fq.gz" = "dodgerblue",
"Dsub_St_K_1.fq.gz" = "dodgerblue",
"Dsub_St_Sl_1.fq.gz" = "dodgerblue"))
Various distance metrics. For example weighted unifrac (Lozupone and Knight, 2005):
wunifrac_dist <- phyloseq::distance(ps,
method = "wunifrac",
type = "samples")
wunifrac_dist
## Dmel_K_St_M_1.fq.gz Dmel_Sl_St_F_1.fq.gz
## Dmel_Sl_St_F_1.fq.gz 0.2784217416
## Dmel_Sl_St_M_1.fq.gz 0.0670253022 0.3408667239
## Dmel_St_K_1.fq.gz 0.4813945160 0.5913327770
## Dmel_St_Sl_1.fq.gz 0.4820056614 0.5914439461
## Dsub_K_St_F_1.fq.gz 0.5979706690 0.6939696020
## Dsub_K_St_M_1.fq.gz 0.5982594431 0.6940976185
## Dsub_Sl_St_F_1.fq.gz 0.5733874538 0.6772882795
## Dsub_Sl_St_M_1.fq.gz 0.5786895989 0.6807684427
## Dsub_St_K_1.fq.gz 0.5919359895 0.6898234760
## Dsub_St_Sl_1.fq.gz 0.5868037690 0.6852822098
## Dmel_Sl_St_M_1.fq.gz Dmel_St_K_1.fq.gz Dmel_St_Sl_1.fq.gz
## Dmel_Sl_St_F_1.fq.gz
## Dmel_Sl_St_M_1.fq.gz
## Dmel_St_K_1.fq.gz 0.4543574356
## Dmel_St_Sl_1.fq.gz 0.4552081958 0.0143020262
## Dsub_K_St_F_1.fq.gz 0.5743697243 0.1431786202 0.1408633421
## Dsub_K_St_M_1.fq.gz 0.5746846360 0.1436303422 0.1413144741
## Dsub_Sl_St_F_1.fq.gz 0.5476504321 0.1640710758 0.1629227350
## Dsub_Sl_St_M_1.fq.gz 0.5534644232 0.1540421434 0.1529483530
## Dsub_St_K_1.fq.gz 0.5679253567 0.1151311847 0.1126606621
## Dsub_St_Sl_1.fq.gz 0.5626387675 0.1116805238 0.1072713518
## Dsub_K_St_F_1.fq.gz Dsub_K_St_M_1.fq.gz
## Dmel_Sl_St_F_1.fq.gz
## Dmel_Sl_St_M_1.fq.gz
## Dmel_St_K_1.fq.gz
## Dmel_St_Sl_1.fq.gz
## Dsub_K_St_F_1.fq.gz
## Dsub_K_St_M_1.fq.gz 0.0006153382
## Dsub_Sl_St_F_1.fq.gz 0.0753384083 0.0759018075
## Dsub_Sl_St_M_1.fq.gz 0.0580447995 0.0586089296
## Dsub_St_K_1.fq.gz 0.0335539026 0.0341618584
## Dsub_St_Sl_1.fq.gz 0.0384633132 0.0390692739
## Dsub_Sl_St_F_1.fq.gz Dsub_Sl_St_M_1.fq.gz
## Dmel_Sl_St_F_1.fq.gz
## Dmel_Sl_St_M_1.fq.gz
## Dmel_St_K_1.fq.gz
## Dmel_St_Sl_1.fq.gz
## Dsub_K_St_F_1.fq.gz
## Dsub_K_St_M_1.fq.gz
## Dsub_Sl_St_F_1.fq.gz
## Dsub_Sl_St_M_1.fq.gz 0.0181067009
## Dsub_St_K_1.fq.gz 0.0656932893 0.0571506156
## Dsub_St_Sl_1.fq.gz 0.0657348415 0.0571206377
## Dsub_St_K_1.fq.gz
## Dmel_Sl_St_F_1.fq.gz
## Dmel_Sl_St_M_1.fq.gz
## Dmel_St_K_1.fq.gz
## Dmel_St_Sl_1.fq.gz
## Dsub_K_St_F_1.fq.gz
## Dsub_K_St_M_1.fq.gz
## Dsub_Sl_St_F_1.fq.gz
## Dsub_Sl_St_M_1.fq.gz
## Dsub_St_K_1.fq.gz
## Dsub_St_Sl_1.fq.gz 0.0073500933
Beta Diversity using double principal coordinate analysis (DPCoA) (Pavoine et al., 2004):
ps_dpcoa <- phyloseq::ordinate(ps,
"DPCoA",
"wunifrac")
plot_ordination(ps,
ps_dpcoa,
"samples",
color = "species") +
ggrepel::geom_text_repel(aes(label = sample)) +
theme_bw()
Session Info:
sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: Europe/Budapest
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DECIPHER_3.0.0 Biostrings_2.72.0 GenomeInfoDb_1.40.1
## [4] XVector_0.44.0 IRanges_2.38.0 S4Vectors_0.42.0
## [7] BiocGenerics_0.50.0 ggrepel_0.9.6 phangorn_2.12.1
## [10] ape_5.8-1 phyloseq_1.48.0 dada2_1.32.0
## [13] Rcpp_1.0.12 lubridate_1.9.3 forcats_1.0.0
## [16] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
## [19] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
## [22] tidyverse_2.0.0 ggplot2_3.5.1
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.2 bitops_1.0-7
## [3] deldir_2.0-4 permute_0.9-8
## [5] rlang_1.1.4 magrittr_2.0.3
## [7] ade4_1.7-22 matrixStats_1.5.0
## [9] compiler_4.4.1 mgcv_1.9-1
## [11] systemfonts_1.2.3 png_0.1-8
## [13] vctrs_0.6.5 reshape2_1.4.4
## [15] quadprog_1.5-8 pwalign_1.0.0
## [17] pkgconfig_2.0.3 crayon_1.5.2
## [19] fastmap_1.2.0 labeling_0.4.3
## [21] utf8_1.2.4 Rsamtools_2.20.0
## [23] rmarkdown_2.27 tzdb_0.4.0
## [25] UCSC.utils_1.0.0 xfun_0.44
## [27] zlibbioc_1.50.0 cachem_1.1.0
## [29] jsonlite_1.8.8 biomformat_1.32.0
## [31] highr_0.11 rhdf5filters_1.16.0
## [33] DelayedArray_0.30.1 Rhdf5lib_1.26.0
## [35] BiocParallel_1.38.0 jpeg_0.1-11
## [37] cluster_2.1.6 parallel_4.4.1
## [39] R6_2.5.1 bslib_0.7.0
## [41] stringi_1.8.4 RColorBrewer_1.1-3
## [43] GenomicRanges_1.56.2 jquerylib_0.1.4
## [45] SummarizedExperiment_1.34.0 iterators_1.0.14
## [47] knitr_1.47 splines_4.4.1
## [49] igraph_2.0.3 Matrix_1.7-0
## [51] timechange_0.3.0 tidyselect_1.2.1
## [53] rstudioapi_0.16.0 abind_1.4-8
## [55] yaml_2.3.8 vegan_2.7-1
## [57] codetools_0.2-20 hwriter_1.3.2.1
## [59] lattice_0.22-6 plyr_1.8.9
## [61] Biobase_2.64.0 withr_3.0.0
## [63] ShortRead_1.62.0 evaluate_0.23
## [65] survival_3.6-4 RcppParallel_5.1.10
## [67] xml2_1.3.6 pillar_1.9.0
## [69] MatrixGenerics_1.16.0 foreach_1.5.2
## [71] generics_0.1.3 hms_1.1.3
## [73] munsell_0.5.1 scales_1.3.0
## [75] glue_1.7.0 tools_4.4.1
## [77] interp_1.1-6 data.table_1.15.4
## [79] GenomicAlignments_1.40.0 fastmatch_1.1-6
## [81] rhdf5_2.48.0 grid_4.4.1
## [83] latticeExtra_0.6-30 colorspace_2.1-0
## [85] nlme_3.1-164 GenomeInfoDbData_1.2.12
## [87] cli_3.6.2 textshaping_0.4.0
## [89] kableExtra_1.4.0 fansi_1.0.6
## [91] viridisLite_0.4.2 S4Arrays_1.4.1
## [93] svglite_2.2.1 gtable_0.3.5
## [95] sass_0.4.9 digest_0.6.35
## [97] SparseArray_1.4.8 farver_2.1.2
## [99] multtest_2.60.0 htmltools_0.5.8.1
## [101] lifecycle_1.0.4 httr_1.4.7
## [103] MASS_7.3-60.2