Title: | Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data |
---|---|
Description: | A set of tools for parsing, manipulating, and graphing data classified by a hierarchy (e.g. a taxonomy). |
Authors: | Zachary Foster [aut, cre], Niklaus Grunwald [ths], Rob Gilmore [ctb] |
Maintainer: | Zachary Foster <[email protected]> |
License: | GPL-2 \| GPL-3 |
Version: | 0.3.7 |
Built: | 2024-11-16 06:23:33 UTC |
Source: | https://github.com/grunwaldlab/metacoder |
Return the names of data that can be used with functions in the taxa package that use [non-standard evaluation](http://adv-r.had.co.nz/Computing-on-the-language.html) (NSE), like [filter_taxa()].
obj$all_names(tables = TRUE, funcs = TRUE, others = TRUE, warn = FALSE)
all_names(obj, tables = TRUE, funcs = TRUE, others = TRUE, warn = FALSE)
obj |
([taxonomy()] or [taxmap()]) The object containing taxon information to be queried. |
tables |
This option only applies to [taxmap()] objects. If 'TRUE', include the names of columns of tables in 'obj$data'. |
funcs |
This option only applies to [taxmap()] objects. If 'TRUE', include the names of user-definable functions in 'obj$funcs'. |
others |
This option only applies to [taxmap()] objects. If 'TRUE', include the names of data in 'obj$data' besides tables. |
builtin_funcs |
This option only applies to [taxmap()] objects. If 'TRUE', include functions like [n_supertaxa()] that provide information for each taxon. |
warn |
This option only applies to [taxmap()] objects. If 'TRUE', warn if there are duplicate names. Duplicate names make it unclear what data is being referred to. |
'character'
Other NSE helpers: data_used, get_data(), names_used
# Get the names of all data accessible by non-standard evaluation
all_names(ex_taxmap)

# Don't include the names of automatically included functions.
all_names(ex_taxmap, builtin_funcs = FALSE)
This function stores the regex patterns for ambiguous taxa.
ambiguous_synonyms( unknown = TRUE, uncultured = TRUE, regex = TRUE, case_variations = FALSE )
unknown |
If |
uncultured |
If |
regex |
If |
case_variations |
If |
Sort rows of tables or the elements of lists/vectors in the 'obj$data' list in [taxmap()] objects. Any variable name that appears in [all_names()] can be used as if it were a vector on its own. See [dplyr::arrange()] for the inspiration for this function and more information. Calling the function using the 'obj$arrange_obs(...)' style edits "obj" in place, unlike most R functions. However, calling the function using the 'arrange_obs(obj, ...)' style imitates R's traditional copy-on-modify semantics, so "obj" is not changed; instead, a changed version is returned, as with most R functions.
obj$arrange_obs(data, ...)
arrange_obs(obj, data, ...)
obj |
An object of type [taxmap()]. |
data |
Dataset names, indexes, or a logical vector that indicates which datasets in 'obj$data' to sort. If multiple datasets are sorted at once, then they must be the same length. |
... |
One or more expressions (e.g. column names) to sort on. |
target |
DEPRECATED. Use "data" instead. |
An object of type [taxmap()]
Other taxmap manipulation functions: arrange_taxa(), filter_obs(), filter_taxa(), mutate_obs(), sample_frac_obs(), sample_frac_taxa(), sample_n_obs(), sample_n_taxa(), select_obs(), transmute_obs()
# Sort in ascending order
arrange_obs(ex_taxmap, "info", n_legs)
arrange_obs(ex_taxmap, "foods", name)

# Sort in descending order
arrange_obs(ex_taxmap, "info", desc(n_legs))

# Sort multiple datasets at once
arrange_obs(ex_taxmap, c("info", "phylopic_ids", "foods"), n_legs)
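The difference between the two calling styles described for arrange_obs() can be sketched in base R. This is only an illustration, not metacoder's implementation: it assumes the object behaves like an environment (R6 objects, which have reference semantics, are built on environments), and the helpers `sort_in_place` and `sort_copy` are hypothetical names.

```r
# An environment is modified in place (reference semantics).
env_obj <- new.env()
env_obj$n_legs <- c(4, 2, 0)

# Hypothetical helper: sorts the data stored inside the environment itself.
sort_in_place <- function(e) {
  e$n_legs <- sort(e$n_legs)
  invisible(e)
}

# A plain list is copied when modified (copy-on-modify semantics).
list_obj <- list(n_legs = c(4, 2, 0))

# Hypothetical helper: returns a sorted copy and leaves its input alone.
sort_copy <- function(x) {
  x$n_legs <- sort(x$n_legs)
  x
}

sort_in_place(env_obj)              # env_obj itself is now sorted
sorted_list <- sort_copy(list_obj)  # list_obj is unchanged

print(env_obj$n_legs)
print(list_obj$n_legs)
print(sorted_list$n_legs)
```

The first style changes the caller's object without an assignment; the second requires assigning the returned value to keep the result.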
Sort the edge list and taxon list in [taxonomy()] or [taxmap()] objects. See [dplyr::arrange()] for the inspiration for this function and more information. Calling the function using the 'obj$arrange_taxa(...)' style edits "obj" in place, unlike most R functions. However, calling the function using the 'arrange_taxa(obj, ...)' style imitates R's traditional copy-on-modify semantics, so "obj" is not changed; instead, a changed version is returned, as with most R functions.
obj$arrange_taxa(...)
arrange_taxa(obj, ...)
obj |
[taxonomy()] or [taxmap()] |
... |
One or more expressions (e.g. column names) to sort on. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
An object of type [taxonomy()] or [taxmap()]
Other taxmap manipulation functions: arrange_obs(), filter_obs(), filter_taxa(), mutate_obs(), sample_frac_obs(), sample_frac_taxa(), sample_n_obs(), sample_n_taxa(), select_obs(), transmute_obs()
# Sort taxa in ascending order
arrange_taxa(ex_taxmap, taxon_names)

# Sort taxa in descending order
arrange_taxa(ex_taxmap, desc(taxon_names))

# Sort using an expression. List genera first.
arrange_taxa(ex_taxmap, taxon_ranks != "genus")
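The last arrange_taxa() example sorts on a logical expression. In R, FALSE sorts before TRUE, so the taxa for which `taxon_ranks != "genus"` is FALSE (the genera) come first. A base R sketch of that ordering, using made-up ranks and names rather than the package's own code:

```r
# Made-up taxon data for illustration only.
taxon_ranks <- c("family", "genus", "species", "genus")
taxon_names <- c("Felidae", "Panthera", "tigris", "Felis")

# FALSE for genera, TRUE for everything else.
sort_key <- taxon_ranks != "genus"

# order() places FALSE before TRUE and keeps ties in their original order.
ordering <- order(sort_key)
sorted_names <- taxon_names[ordering]

print(sorted_names)
```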
Convert a taxmap object to a phyloseq object.
as_phyloseq( obj, otu_table = NULL, otu_id_col = "otu_id", sample_data = NULL, sample_id_col = "sample_id", phy_tree = NULL )
obj |
The taxmap object. |
otu_table |
The table in 'obj$data' with OTU counts. Must be one of the following:
|
otu_id_col |
The name of the column storing OTU IDs in the otu table. |
sample_data |
A table containing sample data with sample IDs matching column names in the OTU table. Must be one of the following:
|
sample_id_col |
The name of the column storing sample IDs in the sample data table. |
phy_tree |
A phylogenetic tree of class
|
## Not run:
# Install phyloseq to get example data
# if (!require("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
#
# BiocManager::install("phyloseq")

# Parse example dataset
library(phyloseq)
data(GlobalPatterns)
x <- parse_phyloseq(GlobalPatterns)

# Convert back to a phyloseq object
as_phyloseq(x)

## End(Not run)
Return the "branch" taxa for a [taxonomy()] or [taxmap()] object. A branch is anything that is not a root, stem, or leaf. It is the interior of the tree after the first split starting from the roots. This function can also be used to get the branches of a subset of taxa.
obj$branches(subset = NULL, value = "taxon_indexes")
branches(obj, subset = NULL, value = "taxon_indexes")
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes used to subset the tree prior to determining branches. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. Note that branches are determined after the filtering, so a given taxon might be a branch on the unfiltered tree, but not a branch on the filtered tree. |
value |
What data to return. This is usually the name of a column in a table in 'obj$data'. Any result of [all_names()] can be used, but it usually only makes sense to use data that corresponds to taxa 1:1, such as [taxon_ranks()]. By default, taxon indexes are returned. |
'character'
Other taxonomy indexing functions: internodes(), leaves(), roots(), stems(), subtaxa(), supertaxa()
# Return indexes of branch taxa
branches(ex_taxmap)

# Return indexes for a subset of taxa
branches(ex_taxmap, subset = 2:17)
branches(ex_taxmap, subset = n_obs > 1)

# Return something besides taxon indexes
branches(ex_taxmap, value = "taxon_names")
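The definition of a branch given for branches() (not a root, stem, or leaf) can be illustrated on a small hand-made edge list in base R. This is a sketch, not the package's algorithm. In particular, it assumes that stems run from each root down to and including the first taxon with more than one subtaxon; the edge list and the helper `children_of` are hypothetical.

```r
# Hypothetical edge list: a -> b -> {c, d}; c -> {e, f}; d -> g
edges <- data.frame(
  from = c(NA, "a", "b", "b", "c", "c", "d"),
  to   = c("a", "b", "c", "d", "e", "f", "g"),
  stringsAsFactors = FALSE
)
taxa <- edges$to

# Direct subtaxa of one taxon.
children_of <- function(id) edges$to[!is.na(edges$from) & edges$from == id]

# Roots have no supertaxon; leaves have no subtaxa.
roots <- edges$to[is.na(edges$from)]
leaves <- taxa[vapply(taxa, function(id) length(children_of(id)) == 0, logical(1))]

# Assumed stem definition: walk down from each root until the first split.
stems <- character(0)
for (r in roots) {
  current <- r
  repeat {
    stems <- c(stems, current)
    kids <- children_of(current)
    if (length(kids) != 1) break
    current <- kids
  }
}

# Branches are whatever is left.
branches <- setdiff(taxa, c(roots, stems, leaves))
print(branches)
```

Because branches are computed from whatever tree is supplied, removing rows from `edges` first can change which taxa count as branches, which mirrors the note on the `subset` argument.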
EXPERIMENTAL: This function is still being tested and developed; use with caution. Uses the DESeq2 package to conduct differential abundance analysis of count data. Counts can be of OTUs/ASVs or taxa. The plotting function heat_tree_matrix is useful for visualizing these results. See the details below for considerations on preparing data for this analysis.
calc_diff_abund_deseq2( obj, data, cols, groups, other_cols = FALSE, lfc_shrinkage = c("none", "normal", "ashr"), ... )
obj |
A |
data |
The name of a table in |
cols |
The names/indexes of columns in
|
groups |
A vector defining how samples are grouped into "treatments". Must be the same order
and length as |
other_cols |
If |
lfc_shrinkage |
What technique to use to adjust the log fold change results for low counts. Useful for ranking and visualizing log fold changes. Must be one of the following:
|
... |
Passed to |
Data should be raw read counts, not rarefied, converted to proportions, or modified with any other technique designed to correct for sample size, since DESeq2 is designed to be used with count data and takes into account unequal sample size when determining differential abundance. Warnings will be given if the data are not integers or if all sample sizes are equal.
A tibble with at least the taxon ID of the thing tested, the groups compared, and the DESeq2 results. The log2FoldChange values will be positive if treatment_1 is more abundant than treatment_2.
Other calculations: calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), counts_to_presence(), rarefy_obs(), zero_low_counts()
## Not run:
# Parse data for plotting
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "tax_data", cols = hmp_samples$sample_id)

# Calculate difference between groups
x$data$diff_table <- calc_diff_abund_deseq2(x, data = "tax_table",
                                            cols = hmp_samples$sample_id,
                                            groups = hmp_samples$body_site)

# Plot results (might take a few minutes)
heat_tree_matrix(x,
                 data = "diff_table",
                 node_size = n_obs,
                 node_label = taxon_names,
                 node_color = ifelse(is.na(padj) | padj > 0.05, 0, log2FoldChange),
                 node_color_range = diverging_palette(),
                 node_color_trans = "linear",
                 node_color_interval = c(-3, 3),
                 edge_color_interval = c(-3, 3),
                 node_size_axis_label = "Number of OTUs",
                 node_color_axis_label = "Log2 fold change")

## End(Not run)
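The data requirements described for calc_diff_abund_deseq2() (integer counts, sample sizes that are not all equal) can be checked before running the analysis. The sketch below uses base R on a made-up count matrix. The function name `check_raw_counts` is hypothetical, and this is not how the package performs its own checks.

```r
# Hypothetical pre-check: returns a character vector of problems found.
check_raw_counts <- function(counts) {
  counts <- as.matrix(counts)
  problems <- character(0)
  # Raw read counts should be whole numbers.
  if (any(counts != round(counts), na.rm = TRUE)) {
    problems <- c(problems, "non-integer values found")
  }
  # Identical column totals suggest rarefied or proportional data.
  sizes <- colSums(counts)
  if (all(abs(sizes - sizes[1]) < 1e-8)) {
    problems <- c(problems, "all sample sizes are equal")
  }
  problems
}

# Made-up raw counts: rows are taxa, columns are samples.
raw <- matrix(c(10, 0, 5, 3, 7, 1), nrow = 3,
              dimnames = list(NULL, c("s1", "s2")))

# The same data converted to proportions, which should be flagged.
props <- sweep(raw, 2, colSums(raw), "/")

print(check_raw_counts(raw))
print(check_raw_counts(props))
```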
For a given table in a taxmap object, split columns by a grouping factor and return row means in a table.
calc_group_mean( obj, data, groups, cols = NULL, other_cols = FALSE, out_names = NULL, dataset = NULL )
obj |
A |
data |
The name of a table in |
groups |
Group multiple columns per treatment/group. This should be a
vector of group IDs (e.g. character, integer) the same length as
|
cols |
The columns in
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), counts_to_presence(), rarefy_obs(), zero_low_counts()
## Not run:
# Parse data for examples
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Calculate the means for each group
calc_group_mean(x, "tax_data", hmp_samples$sex)

# Use only some columns
calc_group_mean(x, "tax_data", hmp_samples$sex[4:20],
                cols = hmp_samples$sample_id[4:20])

# Including all other columns in output
calc_group_mean(x, "tax_data", groups = hmp_samples$sex,
                other_cols = TRUE)

# Including specific columns in output
calc_group_mean(x, "tax_data", groups = hmp_samples$sex,
                other_cols = 2)
calc_group_mean(x, "tax_data", groups = hmp_samples$sex,
                other_cols = "otu_id")

# Rename output columns
calc_group_mean(x, "tax_data", groups = hmp_samples$sex,
                out_names = c("Women", "Men"))

## End(Not run)
For a given table in a taxmap object, split columns by a grouping factor and return row medians in a table.
calc_group_median( obj, data, groups, cols = NULL, other_cols = FALSE, out_names = NULL, dataset = NULL )
obj |
A |
data |
The name of a table in |
groups |
Group multiple columns per treatment/group. This should be a
vector of group IDs (e.g. character, integer) the same length as
|
cols |
The columns in
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), counts_to_presence(), rarefy_obs(), zero_low_counts()
## Not run:
# Parse data for examples
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Calculate the medians for each group
calc_group_median(x, "tax_data", hmp_samples$sex)

# Use only some columns
calc_group_median(x, "tax_data", hmp_samples$sex[4:20],
                  cols = hmp_samples$sample_id[4:20])

# Including all other columns in output
calc_group_median(x, "tax_data", groups = hmp_samples$sex,
                  other_cols = TRUE)

# Including specific columns in output
calc_group_median(x, "tax_data", groups = hmp_samples$sex,
                  other_cols = 2)
calc_group_median(x, "tax_data", groups = hmp_samples$sex,
                  other_cols = "otu_id")

# Rename output columns
calc_group_median(x, "tax_data", groups = hmp_samples$sex,
                  out_names = c("Women", "Men"))

## End(Not run)
For a given table in a taxmap object, split columns by a grouping factor and return the relative standard deviation for each row in a table. The relative standard deviation is the standard deviation divided by the mean of a set of numbers. It is useful for comparing variation when the magnitudes of the sets of numbers are very different.
calc_group_rsd( obj, data, groups, cols = NULL, other_cols = FALSE, out_names = NULL, dataset = NULL )
obj |
A |
data |
The name of a table in |
groups |
Group multiple columns per treatment/group. This should be a
vector of group IDs (e.g. character, integer) the same length as
|
cols |
The columns in
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), counts_to_presence(), rarefy_obs(), zero_low_counts()
## Not run:
# Parse data for examples
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Calculate the RSD for each group
calc_group_rsd(x, "tax_data", hmp_samples$sex)

# Use only some columns
calc_group_rsd(x, "tax_data", hmp_samples$sex[4:20],
               cols = hmp_samples$sample_id[4:20])

# Including all other columns in output
calc_group_rsd(x, "tax_data", groups = hmp_samples$sex,
               other_cols = TRUE)

# Including specific columns in output
calc_group_rsd(x, "tax_data", groups = hmp_samples$sex,
               other_cols = 2)
calc_group_rsd(x, "tax_data", groups = hmp_samples$sex,
               other_cols = "otu_id")

# Rename output columns
calc_group_rsd(x, "tax_data", groups = hmp_samples$sex,
               out_names = c("Women", "Men"))

## End(Not run)
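The relative standard deviation used by calc_group_rsd() is the standard deviation divided by the mean. A small base R illustration with made-up numbers shows why it helps when magnitudes differ; the helper name `rsd` is hypothetical and this is not the package's code.

```r
# Relative standard deviation: standard deviation divided by the mean.
rsd <- function(v) sd(v) / mean(v)

# Two made-up sets with the same shape but very different magnitudes.
small <- c(1, 2, 3)
large <- c(100, 200, 300)

# The plain standard deviations differ by a factor of 100 ...
print(sd(small))
print(sd(large))

# ... but the relative standard deviations are the same.
print(rsd(small))
print(rsd(large))
```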
For a given table in a taxmap object, apply a function to rows in groups of columns. The result of the function is used to create new columns. This is equivalent to splitting columns of a table by a factor and using apply on each group.
calc_group_stat( obj, data, func, groups = NULL, cols = NULL, other_cols = FALSE, out_names = NULL, dataset = NULL )
obj |
A |
data |
The name of a table in |
func |
The function to apply. It should take a vector and return a
single value. For example, |
groups |
Group multiple columns per treatment/group. This should be a
vector of group IDs (e.g. character, integer) the same length as
|
cols |
The columns in
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), counts_to_presence(), rarefy_obs(), zero_low_counts()
## Not run:
# Parse data for examples
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Apply a function to every value without grouping
calc_group_stat(x, "tax_data", function(v) v > 3)

# Calculate the means for each group
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex)

# Calculate the variation for each group
calc_group_stat(x, "tax_data", sd, groups = hmp_samples$body_site)

# Different ways to use only some columns
calc_group_stat(x, "tax_data", function(v) v > 3,
                cols = c("700035949", "700097855", "700100489"))
calc_group_stat(x, "tax_data", function(v) v > 3,
                cols = 4:6)
calc_group_stat(x, "tax_data", function(v) v > 3,
                cols = startsWith(colnames(x$data$tax_data), "70001"))

# Including all other columns in output
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex,
                other_cols = TRUE)

# Including specific columns in output
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex,
                other_cols = 2)
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex,
                other_cols = "otu_id")

# Rename output columns
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex,
                out_names = c("Women", "Men"))

## End(Not run)
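The description of calc_group_stat() says it is equivalent to splitting the columns of a table by a factor and using apply on each group. A base R sketch of that idea with a made-up table follows. `group_stat` is a hypothetical helper, not the package function, and it ignores options such as other_cols and out_names.

```r
# Hypothetical helper: apply `func` to each row within each group of columns.
group_stat <- function(tbl, groups, func) {
  # Split the column names by the grouping vector.
  col_groups <- split(colnames(tbl), groups)
  # For each group, apply the function across rows (MARGIN = 1).
  result <- lapply(col_groups, function(cols) {
    unname(apply(tbl[, cols, drop = FALSE], MARGIN = 1, FUN = func))
  })
  as.data.frame(result)
}

# Made-up counts: two rows (taxa), four samples in two groups.
counts <- data.frame(s1 = c(1, 4), s2 = c(3, 6), s3 = c(10, 20), s4 = c(30, 40))
groups <- c("a", "a", "b", "b")

group_means <- group_stat(counts, groups, mean)
group_sds <- group_stat(counts, groups, sd)

print(group_means)
print(group_sds)
```

Passing `mean`, `median`, or a function for relative standard deviation as `func` gives the same kind of result as the more specific group functions.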
For a given table in a taxmap object, count the number of samples (i.e. columns) with greater than a minimum value.
calc_n_samples( obj, data, cols = NULL, groups = "n_samples", other_cols = FALSE, out_names = NULL, drop = FALSE, more_than = 0, dataset = NULL )
obj |
A |
data |
The name of a table in |
cols |
The columns in
|
groups |
Group multiple columns per treatment/group. This should be a
vector of group IDs (e.g. character, integer) the same length as
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
drop |
If |
more_than |
A sample must have greater than this value for it to be counted as present. |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), counts_to_presence(), rarefy_obs(), zero_low_counts()
## Not run:
# Parse data for example
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Count samples with at least one read
calc_n_samples(x, data = "tax_data")

# Count samples with more than 5 reads
calc_n_samples(x, data = "tax_data", more_than = 5)

# Return a vector instead of a table
calc_n_samples(x, data = "tax_data", drop = TRUE)

# Only use some columns
calc_n_samples(x, data = "tax_data", cols = hmp_samples$sample_id[1:5])

# Return a count for each treatment
calc_n_samples(x, data = "tax_data", groups = hmp_samples$body_site)

# Rename output columns
calc_n_samples(x, data = "tax_data", groups = hmp_samples$body_site,
               out_names = c("A", "B", "C", "D", "E"))

# Preserve other columns from input
calc_n_samples(x, data = "tax_data", other_cols = TRUE)
calc_n_samples(x, data = "tax_data", other_cols = 2)
calc_n_samples(x, data = "tax_data", other_cols = "otu_id")

## End(Not run)
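The counting rule behind calc_n_samples() (a sample counts only if its value is strictly greater than `more_than`) can be sketched in base R with a made-up table. This is an illustration of the rule as described, not the package's implementation; dividing the count by the number of columns gives the corresponding proportion of samples.

```r
# Made-up counts: three rows (taxa), three samples.
counts <- data.frame(s1 = c(0, 7, 2), s2 = c(3, 9, 0), s3 = c(0, 6, 5))

# Number of samples per row with a value strictly greater than the threshold.
n_above <- function(tbl, more_than = 0) unname(rowSums(tbl > more_than))

n_default <- n_above(counts)                # more than 0 reads
n_strict  <- n_above(counts, more_than = 5) # more than 5 reads (5 itself is excluded)

# The same counts expressed as a proportion of all samples.
prop_default <- n_default / ncol(counts)

print(n_default)
print(n_strict)
print(prop_default)
```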
For a given table in a taxmap object, convert one or more columns containing counts to proportions. This is meant to be used with counts associated with observations (e.g. OTUs), as opposed to counts that have already been summed per taxon.
calc_obs_props( obj, data, cols = NULL, groups = NULL, other_cols = FALSE, out_names = NULL, dataset = NULL )
obj |
A |
data |
The name of a table in |
cols |
The columns in
|
groups |
Group multiple columns per treatment/group. This should be a
vector of group IDs (e.g. character, integer) the same length as
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), counts_to_presence(), rarefy_obs(), zero_low_counts()
## Not run:
# Parse data for examples
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Calculate proportions for all numeric columns
calc_obs_props(x, "tax_data")

# Calculate proportions for a subset of columns
calc_obs_props(x, "tax_data", cols = c("700035949", "700097855", "700100489"))
calc_obs_props(x, "tax_data", cols = 4:6)
calc_obs_props(x, "tax_data", cols = startsWith(colnames(x$data$tax_data), "70001"))

# Including all other columns in output
calc_obs_props(x, "tax_data", other_cols = TRUE)

# Including specific columns in output
calc_obs_props(x, "tax_data", cols = c("700035949", "700097855", "700100489"),
               other_cols = 2:3)

# Rename output columns
calc_obs_props(x, "tax_data", cols = c("700035949", "700097855", "700100489"),
               out_names = c("a", "b", "c"))

# Get proportions for groups of samples
calc_obs_props(x, "tax_data", groups = hmp_samples$sex)
calc_obs_props(x, "tax_data", groups = hmp_samples$sex,
               out_names = c("Women", "Men"))

## End(Not run)
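Converting count columns to proportions, as calc_obs_props() is described as doing, can be sketched in base R. The sketch assumes that proportions are taken within each column (each count divided by its column total), which the text here does not spell out; the table is made up and this is not the package's code.

```r
# Made-up counts: three observations (e.g. OTUs), two samples.
counts <- data.frame(s1 = c(2, 6, 2), s2 = c(5, 0, 15))

# Assumed rule: divide every count by the total of its own column.
props <- as.data.frame(lapply(counts, function(col) col / sum(col)))

print(props)

# Each column of proportions sums to one.
print(colSums(props))
```

This is why the function is meant for per-observation counts: once counts have been summed per taxon, a supertaxon's count already includes its subtaxa, so column totals would count the same reads more than once.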
For a given table in a taxmap object, calculate the proportion of samples (i.e. columns) with greater than a minimum value.
calc_prop_samples( obj, data, cols = NULL, groups = "prop_samples", other_cols = FALSE, out_names = NULL, drop = FALSE, more_than = 0, dataset = NULL )
obj |
A |
data |
The name of a table in |
cols |
The columns in
|
groups |
Group multiple columns per treatment/group. This should be a
vector of group IDs (e.g. character, integer) the same length as
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
drop |
If |
more_than |
A sample must have greater than this value for it to be counted as present. |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_taxon_abund(), compare_groups(), counts_to_presence(), rarefy_obs(), zero_low_counts()
## Not run: 
# Parse data for example
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                    class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                    class_regex = "^(.+)__(.+)$")

# Proportion of samples with at least one read
calc_prop_samples(x, data = "tax_data")

# Proportion of samples with more than 5 reads
calc_prop_samples(x, data = "tax_data", more_than = 5)

# Return a vector instead of a table
calc_prop_samples(x, data = "tax_data", drop = TRUE)

# Only use some columns
calc_prop_samples(x, data = "tax_data", cols = hmp_samples$sample_id[1:5])

# Return a proportion for each treatment
calc_prop_samples(x, data = "tax_data", groups = hmp_samples$body_site)

# Rename output columns
calc_prop_samples(x, data = "tax_data", groups = hmp_samples$body_site,
                  out_names = c("A", "B", "C", "D", "E"))

# Preserve other columns from input
calc_prop_samples(x, data = "tax_data", other_cols = TRUE)
calc_prop_samples(x, data = "tax_data", other_cols = 2)
calc_prop_samples(x, data = "tax_data", other_cols = "otu_id")

## End(Not run)
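The calculation that calc_prop_samples describes can be sketched in base R on a plain matrix. This is only an illustration of the idea, not the package's implementation: the helper name prop_above, the toy count matrix, and the group labels are all made up for this example.

```r
# Hypothetical helper (not part of metacoder): for each row, the proportion
# of columns in each group whose value is greater than `more_than`.
prop_above <- function(counts, groups, more_than = 0) {
  stopifnot(is.matrix(counts), length(groups) == ncol(counts))
  present <- counts > more_than
  group_ids <- unique(groups)
  # One result vector per group, in order of first appearance
  per_group <- lapply(group_ids, function(g) {
    rowMeans(present[, groups == g, drop = FALSE])
  })
  out <- do.call(cbind, per_group)
  colnames(out) <- group_ids
  out
}

# Made-up counts: 3 OTUs (rows) in 4 samples (columns)
counts <- matrix(c(0, 3, 10, 0,
                   7, 0,  0, 0,
                   1, 1,  6, 9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("otu_1", "otu_2", "otu_3"),
                                 c("s1", "s2", "s3", "s4")))
groups <- c("nose", "nose", "skin", "skin")

props   <- prop_above(counts, groups)                 # more than 0 reads
props_5 <- prop_above(counts, groups, more_than = 5)  # more than 5 reads
print(props)
print(props_5)
```

Raising more_than can only lower or preserve each proportion, since fewer samples pass the stricter threshold.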
Get character vector classifications of taxa in an object of type [taxonomy()] or [taxmap()] composed of data associated with taxa. Each classification is constructed by concatenating the data of the given taxon and all of its supertaxa.
obj$classifications(value = "taxon_names", sep = ";")
classifications(obj, value = "taxon_names", sep = ";")
obj |
([taxonomy()] or [taxmap()]) |
value |
What data to return. Any result of 'all_names(obj)' can be used, but it usually only makes sense to use data that corresponds to taxa 1:1, such as [taxon_ranks()]. By default, taxon names are returned. |
sep |
('character' of length 1) The character(s) to place between the values for each taxon |
'character'
Other taxonomy data functions: id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Default settings return taxon names separated by ;
classifications(ex_taxmap)

# Other values can be returned besides taxon names
classifications(ex_taxmap, value = "taxon_ids")

# The separator can also be changed
classifications(ex_taxmap, value = "taxon_ranks", sep = "||")
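How a classification is assembled from a taxon and its supertaxa can be sketched in base R. This is a toy illustration under stated assumptions, not how the package stores or traverses its taxonomy: the named vectors supertaxon and taxon_name and the helper classification_of are hypothetical.

```r
# Hypothetical toy taxonomy: each taxon ID maps to its supertaxon (NA = root)
supertaxon <- c(a = NA, b = "a", c = "b", d = "b")
taxon_name <- c(a = "Mammalia", b = "Felidae", c = "Panthera", d = "Felis")

# Walk from a taxon up to the root, collecting one value per taxon
classification_of <- function(id, supertaxon, values, sep = ";") {
  path <- character(0)
  while (!is.na(id)) {
    path <- c(values[[id]], path)  # prepend so the root comes first
    id <- supertaxon[[id]]
  }
  paste(path, collapse = sep)
}

result <- vapply(names(supertaxon), classification_of, character(1),
                 supertaxon = supertaxon, values = taxon_name)
result_bar <- vapply(names(supertaxon), classification_of, character(1),
                     supertaxon = supertaxon, values = taxon_name, sep = "||")
print(result)
```

Passing a different values vector (for example, ranks or IDs keyed by the same taxon IDs) changes what is concatenated without changing the traversal.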
Apply a function to compare data, usually abundance, from pairs of treatments/groups. By default, every pairwise combination of treatments is compared. A custom function can be supplied to perform the comparison. The plotting function heat_tree_matrix is useful for visualizing these results.
compare_groups( obj, data, cols, groups, func = NULL, combinations = NULL, other_cols = FALSE, dataset = NULL )
obj |
A |
data |
The name of a table in |
cols |
The names/indexes of columns in
|
groups |
A vector defining how samples are grouped into "treatments". Must be the same
order and length as |
func |
The function to apply for each comparison. For each row in
function(abund_1, abund_2) {
  log_ratio <- log2(median(abund_1) / median(abund_2))
  if (is.nan(log_ratio)) {
    log_ratio <- 0
  }
  list(log2_median_ratio = log_ratio,
       median_diff = median(abund_1) - median(abund_2),
       mean_diff = mean(abund_1) - mean(abund_2),
       wilcox_p_value = wilcox.test(abund_1, abund_2)$p.value)
} |
combinations |
Which combinations of groups to use. Must be a list of vectors, each containing the names of 2 groups to compare. By default, all pairwise combinations of groups are compared. |
other_cols |
If |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), counts_to_presence(), rarefy_obs(), zero_low_counts()
## Not run: 
# Parse data for plotting
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                    class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                    class_regex = "^(.+)__(.+)$")

# Convert counts to proportions
x$data$otu_table <- calc_obs_props(x, data = "tax_data", cols = hmp_samples$sample_id)

# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "otu_table", cols = hmp_samples$sample_id)

# Calculate difference between groups
x$data$diff_table <- compare_groups(x, data = "tax_table",
                                    cols = hmp_samples$sample_id,
                                    groups = hmp_samples$body_site)

# Plot results (might take a few minutes)
heat_tree_matrix(x,
                 data = "diff_table",
                 node_size = n_obs,
                 node_label = taxon_names,
                 node_color = log2_median_ratio,
                 node_color_range = diverging_palette(),
                 node_color_trans = "linear",
                 node_color_interval = c(-3, 3),
                 edge_color_interval = c(-3, 3),
                 node_size_axis_label = "Number of OTUs",
                 node_color_axis_label = "Log2 ratio median proportions")

# How to get results for only some pairs of groups
compare_groups(x, data = "tax_table",
               cols = hmp_samples$sample_id,
               groups = hmp_samples$body_site,
               combinations = list(c('Nose', 'Saliva'),
                                   c('Skin', 'Throat')))

## End(Not run)
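The pairwise logic can be sketched in base R for a single row of abundances. This is a simplified illustration, not the package's code: compare_row and simple_func are hypothetical helpers, the output column names treatment_1 and treatment_2 are assumptions made for this sketch, and simple_func is a cut-down custom function (it omits the Wilcoxon test used by the default).

```r
# Hypothetical custom comparison function with the same two-argument shape
# as the `func` argument: it receives the values for each group of a pair.
simple_func <- function(abund_1, abund_2) {
  log_ratio <- log2(median(abund_1) / median(abund_2))
  if (is.nan(log_ratio)) {
    log_ratio <- 0
  }
  list(log2_median_ratio = log_ratio,
       mean_diff = mean(abund_1) - mean(abund_2))
}

# Hypothetical helper: apply `func` to every requested pair of groups
compare_row <- function(abund, groups, func, combinations = NULL) {
  if (is.null(combinations)) {
    # Default: every pairwise combination of groups
    combinations <- combn(unique(groups), 2, simplify = FALSE)
  }
  rows <- lapply(combinations, function(pair) {
    result <- func(abund[groups == pair[1]], abund[groups == pair[2]])
    data.frame(treatment_1 = pair[1], treatment_2 = pair[2], result,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up abundances for one taxon across six samples
abund  <- c(2, 4, 8, 8, 1, 1)
groups <- c("Nose", "Nose", "Skin", "Skin", "Throat", "Throat")

res_all <- compare_row(abund, groups, simple_func)
res_one <- compare_row(abund, groups, simple_func,
                       combinations = list(c("Skin", "Nose")))
print(res_all)
```

The order of the two names within each combination matters: swapping them flips the sign of the differences and of the log ratio.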
Find the complement of one or more sequences stored as a character vector. This is a wrapper for comp that works on character vectors (one element per sequence) instead of lists of character vectors with one value per letter. IUPAC ambiguity codes are handled and the upper/lower case is preserved.
complement(seqs)
seqs |
A character vector with one element per sequence. |
Other sequence transformations: rev_comp(), reverse()
complement(c("aagtgGGTGaa", "AAGTGGT"))
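The behavior described for complement (case preserved, IUPAC ambiguity codes handled) can be illustrated with a base R sketch built on chartr. This is not the package's implementation, which wraps comp; the function name complement_sketch is hypothetical.

```r
# Hypothetical base R sketch: complement DNA sequences, preserving case and
# mapping IUPAC ambiguity codes to their complements
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are their own complements).
complement_sketch <- function(seqs) {
  chartr("ACGTRYSWKMBDHVNacgtryswkmbdhvn",
         "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
         seqs)
}

comp_out <- complement_sketch(c("aagtgGGTGaa", "AAGTGGT"))
print(comp_out)
```

Because every code maps to its partner and back, applying the function twice returns the original sequences.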
For a given table in a taxmap object, convert counts to presence/absence by applying a function to rows in groups of columns. A value counts as present when it is greater than 'threshold'. The result of the function is used to create new columns. This is equivalent to splitting columns of a table by a factor and using apply on each group.
counts_to_presence( obj, data, threshold = 0, groups = NULL, cols = NULL, other_cols = FALSE, out_names = NULL, dataset = NULL )
obj |
A |
data |
The name of a table in |
threshold |
The value a number must be greater than to count as present. By default, anything above 0 is considered present. |
groups |
Group multiple columns per treatment/group. This should be a
vector of group IDs (e.g. character, integer) the same length as
|
cols |
The columns in
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), rarefy_obs(), zero_low_counts()
## Not run: 
# Parse data for examples
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                    class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                    class_regex = "^(.+)__(.+)$")

# Convert count to presence/absence
counts_to_presence(x, "tax_data")

# Check if there are any reads in each group of samples
counts_to_presence(x, "tax_data", groups = hmp_samples$body_site)

## End(Not run)
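A base R sketch of the presence/absence conversion on a plain matrix follows. It is an illustration only: presence_sketch is a hypothetical helper, the data are made up, and it assumes that a group counts as present when at least one of its samples exceeds the threshold, which may differ from how the package aggregates values within a group.

```r
# Hypothetical helper (not part of metacoder)
presence_sketch <- function(counts, groups = NULL, threshold = 0) {
  if (is.null(groups)) {
    # No grouping: one TRUE/FALSE per original column
    return(counts > threshold)
  }
  group_ids <- unique(groups)
  # Assumption: present in a group if any sample in it is above the threshold
  per_group <- lapply(group_ids, function(g) {
    rowSums(counts[, groups == g, drop = FALSE] > threshold) > 0
  })
  out <- do.call(cbind, per_group)
  colnames(out) <- group_ids
  out
}

# Made-up counts: 3 OTUs (rows) in 4 samples (columns)
counts <- matrix(c(0, 3, 10, 0,
                   0, 0,  0, 2,
                   0, 0,  0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("otu_1", "otu_2", "otu_3"),
                                 c("s1", "s2", "s3", "s4")))

per_sample <- presence_sketch(counts)
per_group  <- presence_sketch(counts, groups = c("nose", "nose", "skin", "skin"))
print(per_group)
```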
The list of known databases. Not currently used much, but will be when we add more checks for taxon IDs and taxon ranks from particular databases.
database_list
An object of class list of length 8.
List of databases with pre-filled details, where each has the format:
url: A base URL for the database source.
description: Description of the database source.
id regex: identifier regex.
[taxon_database]
database_list
database_list$ncbi
database_list$ncbi$name
database_list$ncbi$description
database_list$ncbi$url
Returns the default color palette for diverging data
diverging_palette()
A character vector of hex color codes
diverging_palette()
An example hierarchies object built from the ground up.
A [hierarchies()] object.
Created from the example code in the [hierarchies()] documentation.
Other taxa-datasets: ex_hierarchy1, ex_hierarchy2, ex_hierarchy3, ex_taxmap
An example Hierarchy object built from the ground up.
A [hierarchy()] object with
name: Poaceae / rank: family / id: 4479
name: Poa / rank: genus / id: 4544
name: Poa annua / rank: species / id: 93036
Based on NCBI taxonomic classification
Created from the example code in the [hierarchy()] documentation.
Other taxa-datasets: ex_hierarchies, ex_hierarchy2, ex_hierarchy3, ex_taxmap
An example Hierarchy object built from the ground up.
A [hierarchy()] object with
name: Felidae / rank: family / id: 9681
name: Puma / rank: genus / id: 146712
name: Puma concolor / rank: species / id: 9696
Based on NCBI taxonomic classification
Created from the example code in the [hierarchy()] documentation.
Other taxa-datasets: ex_hierarchies, ex_hierarchy1, ex_hierarchy3, ex_taxmap
An example Hierarchy object built from the ground up.
A [hierarchy()] object with
name: Chordata / rank: phylum / id: 158852
name: Vertebrata / rank: subphylum / id: 331030
name: Teleostei / rank: class / id: 161105
name: Salmonidae / rank: family / id: 161931
name: Salmo / rank: genus / id: 161994
name: Salmo salar / rank: species / id: 161996
Based on ITIS taxonomic classification
Created from the example code in the [hierarchy()] documentation.
Other taxa-datasets: ex_hierarchies, ex_hierarchy1, ex_hierarchy2, ex_taxmap
An example taxmap object built from the ground up. Typically, data stored in taxmap would be parsed from an input file, but this data set is just for demonstration purposes.
A [taxmap()] object.
Created from the example code in the [taxmap()] documentation.
Other taxa-datasets: ex_hierarchies, ex_hierarchy1, ex_hierarchy2, ex_hierarchy3
Convert taxonomic information in a character vector into a [taxmap()] object. The location and identity of important information in the input is specified using a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) with capture groups and a corresponding key. An object of type [taxmap()] is returned containing the specified information. See the 'key' option for accepted sources of taxonomic information.
extract_tax_data( tax_data, key, regex, class_key = "taxon_name", class_regex = "(.*)", class_sep = NULL, sep_is_regex = FALSE, class_rev = FALSE, database = "ncbi", include_match = FALSE, include_tax_data = TRUE )
tax_data |
A vector from which to extract taxonomy information. |
key |
('character') The identity of the capturing groups defined using 'regex'. The length of 'key' must be equal to the number of capturing groups specified in 'regex'. Any names added to the terms will be used as column names in the output. Only '"info"' can be used multiple times. Each term must be one of those described below:
* 'taxon_id': A unique numeric id for a taxon for a particular 'database' (e.g. ncbi accession number). Requires an internet connection.
* 'taxon_name': The name of a taxon (e.g. "Mammalia" or "Homo sapiens"). Not necessarily unique, but interpretable by a particular 'database'. Requires an internet connection.
* 'fuzzy_name': The name of a taxon, but check for misspellings first. Only use if you think there are misspellings. Using '"taxon_name"' is faster.
* 'class': A list of taxon information that constitutes the full taxonomic classification (e.g. "K_Mammalia;P_Carnivora;C_Felidae"). Individual taxa are separated by the 'class_sep' argument and the information is parsed by the 'class_regex' and 'class_key' arguments.
* 'seq_id': Sequence ID for a particular database that is associated with a taxonomic classification. Currently only works with the "ncbi" database.
* 'info': Arbitrary taxon info you want included in the output. Can be used more than once. |
regex |
('character' of length 1) A regular expression with capturing groups indicating the locations of relevant information. The identity of the information must be specified using the 'key' argument. |
class_key |
('character') The identity of the capturing groups defined using 'class_regex'. The length of 'class_key' must be equal to the number of capturing groups specified in 'class_regex'. Any names added to the terms will be used as column names in the output. Only '"info"' can be used multiple times. Each term must be one of those described below:
* 'taxon_name': The name of a taxon. Not necessarily unique.
* 'taxon_rank': The rank of the taxon. This will be used to add rank info into the output object that can be accessed by 'out$taxon_ranks()'.
* 'info': Arbitrary taxon info you want included in the output. Can be used more than once. |
class_regex |
('character' of length 1) A regular expression with capturing groups indicating the locations of data for each taxon in the 'class' term in the 'key' argument. The identity of the information must be specified using the 'class_key' argument. The 'class_sep' option can be used to split the classification into data for each taxon before matching. If 'class_sep' is 'NULL', each match of 'class_regex' defines a taxon in the classification. |
class_sep |
('character' of length 1) Used with the 'class' term in the 'key' argument. The character(s) used to separate individual taxa within a classification. After the string defined by the 'class' capture group in 'regex' is split by 'class_sep', its capture groups are extracted by 'class_regex' and defined by 'class_key'. If 'NULL', every match of 'class_regex' is used instead, without first splitting by 'class_sep'. |
sep_is_regex |
('TRUE'/'FALSE') Whether or not 'class_sep' should be used as a [regular expression](https://en.wikipedia.org/wiki/Regular_expression). |
class_rev |
('logical' of length 1) Used with the 'class' term in the 'key' argument. If 'TRUE', the order of taxon data in a classification is reversed to be specific to broad. |
database |
('character' of length 1) The name of the database that patterns given in 'parser' will apply to. Valid databases include "ncbi", "itis", "eol", "col", "tropicos", "nbn", and "none". '"none"' will cause no database to be queried; use this if you want to not use the internet. NOTE: Only '"ncbi"' has been tested extensively so far. |
include_match |
('logical' of length 1) If 'TRUE', include the part of the input matched by 'regex' in the output object. |
include_tax_data |
('TRUE'/'FALSE') Whether or not to include 'tax_data' as a dataset. |
Returns an object of type [taxmap()]
If you have invalid inputs or a download fails for another reason, then there will be an "unknown" taxon ID as a placeholder and failed inputs will be assigned to this ID. You can remove these using [filter_taxa()] like so: 'filter_taxa(result, taxon_ids != "unknown")'. Add 'drop_obs = FALSE' if you want to keep the input data, but remove the taxon.
Other parsers: lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
## Not run: 
# For demonstration purposes, the following example dataset has all the
# types of data that can be used, but any one of them alone would work.
raw_data <- c(
  ">id:AB548412-tid:9689-Panthera leo-tax:K_Mammalia;P_Carnivora;C_Felidae;G_Panthera;S_leo",
  ">id:FJ358423-tid:9694-Panthera tigris-tax:K_Mammalia;P_Carnivora;C_Felidae;G_Panthera;S_tigris",
  ">id:DQ334818-tid:9643-Ursus americanus-tax:K_Mammalia;P_Carnivora;C_Ursidae;G_Ursus;S_americanus"
)

# Build a taxmap object from classifications
extract_tax_data(raw_data,
                 key = c(my_seq = "info", my_tid = "info", org = "info", tax = "class"),
                 regex = "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$",
                 class_sep = ";", class_regex = "^(.+)_(.+)$",
                 class_key = c(my_rank = "info", tax_name = "taxon_name"))

# Build a taxmap object from taxon ids
# Note: this requires an internet connection
extract_tax_data(raw_data,
                 key = c(my_seq = "info", my_tid = "taxon_id", org = "info", tax = "info"),
                 regex = "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$")

# Build a taxmap object from ncbi sequence accession numbers
# Note: this requires an internet connection
extract_tax_data(raw_data,
                 key = c(my_seq = "seq_id", my_tid = "info", org = "info", tax = "info"),
                 regex = "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$")

# Build a taxmap object from taxon names
# Note: this requires an internet connection
extract_tax_data(raw_data,
                 key = c(my_seq = "info", my_tid = "info", org = "taxon_name", tax = "info"),
                 regex = "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$")

## End(Not run)
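How 'regex' capture groups line up with 'key', and how 'class_sep' and 'class_regex' then break up the 'class' capture, can be sketched offline with base R regular expression functions. This only illustrates the matching step under the assumptions of the example above; it does not build a taxmap object and is not the package's parsing code.

```r
headers <- c(
  ">id:AB548412-tid:9689-Panthera leo-tax:K_Mammalia;P_Carnivora;C_Felidae;G_Panthera;S_leo",
  ">id:FJ358423-tid:9694-Panthera tigris-tax:K_Mammalia;P_Carnivora;C_Felidae;G_Panthera;S_tigris"
)
regex <- "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$"
key   <- c(my_seq = "info", my_tid = "taxon_id", org = "info", tax = "class")

# Each match is c(full_match, group_1, ..., group_n); drop the full match
matches  <- regmatches(headers, regexec(regex, headers))
captured <- do.call(rbind, lapply(matches, function(m) m[-1]))

# One key term per capture group; the names of `key` become column names
stopifnot(ncol(captured) == length(key))
colnames(captured) <- names(key)

# Split the "class" capture by class_sep, then apply class_regex to each taxon
class_parts <- strsplit(captured[[1, "tax"]], ";", fixed = TRUE)[[1]]
class_match <- regmatches(class_parts, regexec("^(.+)_(.+)$", class_parts))
taxa <- do.call(rbind, lapply(class_match, function(m) m[-1]))
colnames(taxa) <- c("my_rank", "tax_name")

print(captured[, c("my_seq", "my_tid", "org")])
print(taxa)
```

The length check mirrors the requirement that 'key' has exactly one term per capture group in 'regex'.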
Filter out taxa with ambiguous names, such as "unknown" or "uncultured". NOTE: some parameters of this function are passed to filter_taxa with the "invert" option set to TRUE. Works the same way as filter_taxa for the most part.
filter_ambiguous_taxa( obj, unknown = TRUE, uncultured = TRUE, name_regex = ".", ignore_case = TRUE, subtaxa = FALSE, drop_obs = TRUE, reassign_obs = TRUE, reassign_taxa = TRUE )
obj |
A |
unknown |
If |
uncultured |
If |
name_regex |
The regex code to match a valid character in a taxon name. For example, "[a-z]" would mean taxon names can only be lower case letters. |
ignore_case |
If |
subtaxa |
('logical' or 'numeric' of length 1) If 'TRUE', include subtaxa of taxa passing the filter. Positive numbers indicate the number of ranks below the target taxa to return. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
drop_obs |
('logical') This option only applies to [taxmap()] objects.
If 'FALSE', include observations (i.e. user-defined data in 'obj$data')
even if the taxon they are assigned to is filtered out. Observations
assigned to removed taxa will be assigned to |
reassign_obs |
('logical' of length 1) This option only applies to [taxmap()] objects. If 'TRUE', observations (i.e. user-defined data in 'obj$data') assigned to removed taxa will be reassigned to the closest supertaxon that passed the filter. If there are no supertaxa of such an observation that passed the filter, they will be filtered out if 'drop_obs' is 'TRUE'. This option can be either simply 'TRUE'/'FALSE', meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding one or more data sets in 'obj$data'. For example, 'c(abundance = TRUE, stats = FALSE)' would reassign observations in 'obj$data$abundance', but not in 'obj$data$stats'. |
reassign_taxa |
('logical' of length 1) If 'TRUE', subtaxa of removed taxa will be reassigned to the closest supertaxon that passed the filter. This is useful for removing intermediate levels of a taxonomy. |
If you encounter a taxon name that represents an ambiguous taxon that is not filtered out by this function, let us know and we will add it.
A taxmap object
obj <- parse_tax_data(c("Plantae;Solanaceae;Solanum;lycopersicum",
                        "Plantae;Solanaceae;Solanum;tuberosum",
                        "Plantae;Solanaceae;Solanum;unknown",
                        "Plantae;Solanaceae;Solanum;uncultured",
                        "Plantae;UNIDENTIFIED"))
filter_ambiguous_taxa(obj)
Filter data in a [taxmap()] object (in 'obj$data') with a set of conditions. See [dplyr::filter()] for the inspiration for this function and more information. Calling the function using the 'obj$filter_obs(...)' style edits "obj" in place, unlike most R functions. However, calling the function using the 'filter_obs(obj, ...)' style imitates R's traditional copy-on-modify semantics, so "obj" is not changed; instead, a modified copy is returned, as with most R functions.
obj$filter_obs(data, ..., drop_taxa = FALSE, drop_obs = TRUE, subtaxa = FALSE, supertaxa = TRUE, reassign_obs = FALSE)
filter_obs(obj, data, ..., drop_taxa = FALSE, drop_obs = TRUE, subtaxa = FALSE, supertaxa = TRUE, reassign_obs = FALSE)
obj |
An object of type [taxmap()] |
data |
Dataset names, indexes, or a logical vector that indicates which datasets in 'obj$data' to filter. If multiple datasets are filtered at once, then they must be the same length. |
... |
One or more filtering conditions. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. Each filtering condition can be one of two things:
* 'integer': One or more dataset indexes.
* 'logical': A 'TRUE'/'FALSE' vector of length equal to the number of items in the dataset. |
drop_taxa |
('logical' of length 1) If 'FALSE', preserve taxa even if all of their observations are filtered out. If 'TRUE', remove taxa for which all observations were filtered out. Note that only taxa that are unobserved due to this filtering will be removed; there might be other taxa without observations to begin with that will not be removed. |
drop_obs |
('logical') This only has an effect when 'drop_taxa' is 'TRUE'. When 'TRUE', observations for other data sets (i.e. not 'data') assigned to taxa that are removed when filtering 'data' are also removed. Otherwise, only data for taxa that are not present in all other data sets will be removed. This option can be either simply 'TRUE'/'FALSE', meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding one or more data sets in 'obj$data'. For example, 'c(abundance = TRUE, stats = FALSE)' would remove observations in 'obj$data$abundance', but not in 'obj$data$stats'. |
subtaxa |
('logical' or 'numeric' of length 1) This only has an effect when 'drop_taxa' is 'TRUE'. If 'TRUE', include subtaxa of taxa passing the filter. Positive numbers indicate the number of ranks below the target taxa to return. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
supertaxa |
('logical' or 'numeric' of length 1) This only has an effect when 'drop_taxa' is 'TRUE'. If 'TRUE', include supertaxa of taxa passing the filter. Positive numbers indicate the number of ranks above the target taxa to return. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
reassign_obs |
('logical') This only has an effect when 'drop_taxa' is 'TRUE'. If 'TRUE', observations assigned to removed taxa will be reassigned to the closest supertaxon that passed the filter. If there are no supertaxa of such an observation that passed the filter, they will be filtered out if 'drop_obs' is 'TRUE'. This option can be either simply 'TRUE'/'FALSE', meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding one or more data sets in 'obj$data'. For example, 'c(abundance = TRUE, stats = FALSE)' would reassign observations in 'obj$data$abundance', but not in 'obj$data$stats'. |
target |
DEPRECATED. Use "data" instead. |
An object of type [taxmap()]
Other taxmap manipulation functions: arrange_obs(), arrange_taxa(), filter_taxa(), mutate_obs(), sample_frac_obs(), sample_frac_taxa(), sample_n_obs(), sample_n_taxa(), select_obs(), transmute_obs()
# Filter by row index
filter_obs(ex_taxmap, "info", 1:2)

# Filter by TRUE/FALSE
filter_obs(ex_taxmap, "info", dangerous == FALSE)
filter_obs(ex_taxmap, "info", dangerous == FALSE, n_legs > 0)
filter_obs(ex_taxmap, "info", n_legs == 2)

# Remove taxa whose observations were filtered out
filter_obs(ex_taxmap, "info", n_legs == 2, drop_taxa = TRUE)

# Preserve other data sets while removing taxa
filter_obs(ex_taxmap, "info", n_legs == 2, drop_taxa = TRUE,
           drop_obs = c(abund = FALSE))

# When filtering taxa, do not return supertaxa of taxa that are preserved
filter_obs(ex_taxmap, "info", n_legs == 2, drop_taxa = TRUE,
           supertaxa = FALSE)

# Filter multiple datasets at once
filter_obs(ex_taxmap, c("info", "phylopic_ids", "foods"), n_legs == 2)
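The two kinds of filtering condition (integer indexes and TRUE/FALSE vectors) can be sketched in base R on an ordinary data frame. This is an illustration, not the package's code: the info table, as_mask, and filter_rows are hypothetical, the sketch assumes that several conditions are combined with a logical AND, and it uses explicit 'info$' references where the package would let you write bare column names through non-standard evaluation.

```r
# Made-up table standing in for a dataset in obj$data
info <- data.frame(name = c("tiger", "cat", "mole", "human", "tomato", "potato"),
                   n_legs = c(4, 4, 4, 2, 0, 0),
                   dangerous = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
                   stringsAsFactors = FALSE)

# Hypothetical helper: turn either kind of condition into a logical mask
as_mask <- function(cond, n) {
  if (is.logical(cond)) {
    stopifnot(length(cond) == n)
    cond
  } else {
    seq_len(n) %in% cond  # integer indexes -> TRUE at those rows
  }
}

# Hypothetical helper: keep rows that satisfy every condition (assumed AND)
filter_rows <- function(df, ...) {
  masks <- lapply(list(...), as_mask, n = nrow(df))
  keep <- Reduce(`&`, masks, rep(TRUE, nrow(df)))
  df[keep, , drop = FALSE]
}

by_index <- filter_rows(info, 1:2)
by_logic <- filter_rows(info, !info$dangerous, info$n_legs > 0)
print(by_index$name)
print(by_logic$name)
```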
Filter taxa in a [taxonomy()] or [taxmap()] object with a series of conditions. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. See [dplyr::filter()] for the inspiration for this function and more information. Calling the function using the 'obj$filter_taxa(...)' style edits "obj" in place, unlike most R functions. However, calling the function using the 'filter_taxa(obj, ...)' style imitates R's traditional copy-on-modify semantics, so "obj" is not changed; instead, a modified copy is returned, as with most R functions.
filter_taxa(obj, ..., subtaxa = FALSE, supertaxa = FALSE, drop_obs = TRUE, reassign_obs = TRUE, reassign_taxa = TRUE, invert = FALSE, keep_order = TRUE)
obj$filter_taxa(..., subtaxa = FALSE, supertaxa = FALSE, drop_obs = TRUE, reassign_obs = TRUE, reassign_taxa = TRUE, invert = FALSE, keep_order = TRUE)
obj |
An object of class [taxonomy()] or [taxmap()] |
... |
One or more filtering conditions. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. Each filtering condition must resolve to one of four things:
* 'character': One or more taxon IDs contained in 'obj$edge_list$to'
* 'integer': One or more row indexes of 'obj$edge_list'
* 'logical': A 'TRUE'/'FALSE' vector of length equal to the number of rows in 'obj$edge_list'
* 'NULL': ignored |
subtaxa |
('logical' or 'numeric' of length 1) If 'TRUE', include subtaxa of taxa passing the filter. Positive numbers indicate the number of ranks below the target taxa to return. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
supertaxa |
('logical' or 'numeric' of length 1) If 'TRUE', include supertaxa of taxa passing the filter. Positive numbers indicate the number of ranks above the target taxa to return. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
drop_obs |
('logical') This option only applies to [taxmap()] objects.
If 'FALSE', include observations (i.e. user-defined data in 'obj$data')
even if the taxon they are assigned to is filtered out. Observations
assigned to removed taxa will be assigned to |
reassign_obs |
('logical' of length 1) This option only applies to [taxmap()] objects. If 'TRUE', observations (i.e. user-defined data in 'obj$data') assigned to removed taxa will be reassigned to the closest supertaxon that passed the filter. If there are no supertaxa of such an observation that passed the filter, they will be filtered out if 'drop_obs' is 'TRUE'. This option can be either simply 'TRUE'/'FALSE', meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding to one or more data sets in 'obj$data'. For example, 'c(abundance = TRUE, stats = FALSE)' would reassign observations in 'obj$data$abundance', but not in 'obj$data$stats'. |
reassign_taxa |
('logical' of length 1) If 'TRUE', subtaxa of removed taxa will be reassigned to the closest supertaxon that passed the filter. This is useful for removing intermediate levels of a taxonomy. |
invert |
('logical' of length 1) If 'TRUE', do NOT include the selection. This is different than just replacing a '==' with a '!=' because this option negates the selection after taking into account the 'subtaxa' and 'supertaxa' options. This is useful for removing a taxon and all its subtaxa for example. |
keep_order |
('logical' of length 1) If 'TRUE', keep relative order of taxa not filtered out. For example, the result of 'filter_taxa(ex_taxmap, 1:3)' and 'filter_taxa(ex_taxmap, 3:1)' would be the same. Does not affect dataset order, only taxon order. This is useful for maintaining order correspondence with a dataset that has one value per taxon. |
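Why 'invert' differs from swapping '==' for '!=' can be sketched with a toy taxonomy in base R. The edge list, the taxon names, and the helper `subtaxa_of()` below are hypothetical and only mimic the selection logic described for the 'subtaxa' and 'invert' options; they are not the package's implementation.

```r
# Toy edge list: each taxon ID ("to") and its supertaxon ("from"); NA marks a root.
edges <- data.frame(
  from = c(NA, "a", "a", "b", "b"),
  to   = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)
taxon_names <- c(a = "Animalia", b = "Mammalia", c = "Aves",
                 d = "Felidae", e = "Hominidae")

# All subtaxa of the given IDs, at any depth (hypothetical helper).
subtaxa_of <- function(ids) {
  found <- character(0)
  frontier <- ids
  while (length(frontier) > 0) {
    frontier <- setdiff(edges$to[edges$from %in% frontier], found)
    found <- c(found, frontier)
  }
  found
}

# Select Mammalia and add its subtaxa, then negate: the whole clade is removed.
target    <- names(taxon_names)[taxon_names == "Mammalia"]
selection <- union(target, subtaxa_of(target))
inverted  <- setdiff(edges$to, selection)

# Negate first with `!=`, then add subtaxa: Mammalia comes back as a subtaxon
# of Animalia, so nothing is removed.
not_target <- names(taxon_names)[taxon_names != "Mammalia"]
negated    <- union(not_target, subtaxa_of(not_target))
```

Here `inverted` keeps only Animalia and Aves, while `negated` still contains every taxon.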
An object of type [taxonomy()] or [taxmap()]
Other taxmap manipulation functions: arrange_obs(), arrange_taxa(), filter_obs(), mutate_obs(), sample_frac_obs(), sample_frac_taxa(), sample_n_obs(), sample_n_taxa(), select_obs(), transmute_obs()
# Filter by index
filter_taxa(ex_taxmap, 1:3)

# Filter by taxon ID
filter_taxa(ex_taxmap, c("b", "c", "d"))

# Filter by TRUE/FALSE
filter_taxa(ex_taxmap, taxon_names == "Plantae", subtaxa = TRUE)
filter_taxa(ex_taxmap, n_obs > 3)
filter_taxa(ex_taxmap, ! taxon_ranks %in% c("species", "genus"))
filter_taxa(ex_taxmap, taxon_ranks == "genus", n_obs > 1)

# Filter by an observation characteristic
dangerous_taxa <- sapply(ex_taxmap$obs("info"),
                         function(i) any(ex_taxmap$data$info$dangerous[i]))
filter_taxa(ex_taxmap, dangerous_taxa)

# Include supertaxa
filter_taxa(ex_taxmap, 12, supertaxa = TRUE)
filter_taxa(ex_taxmap, 12, supertaxa = 2)

# Include subtaxa
filter_taxa(ex_taxmap, 1, subtaxa = TRUE)
filter_taxa(ex_taxmap, 1, subtaxa = 2)

# Don't remove rows in user-defined data corresponding to removed taxa
filter_taxa(ex_taxmap, 2, drop_obs = FALSE)
filter_taxa(ex_taxmap, 2, drop_obs = c(info = FALSE))

# Remove a taxon and its subtaxa
filter_taxa(ex_taxmap, taxon_names == "Mammalia", subtaxa = TRUE, invert = TRUE)
Taxonomic filtering helpers
ranks(...)
nms(...)
ids(...)
... |
quoted rank names, taxonomic names, taxonomic ids, or any of those with supported operators (See Supported Relational Operators below) |
Each function assigns some metadata so we can more easily process your query downstream. In addition, we check whether you've used any relational operators and pull those out to make downstream processing easier.
The goal of these functions is to make it easy to combine queries based on rank names, taxonomic names, and taxonomic ids.
These are designed to be used inside of [pop()], [pick()], and [span()]. Inside of those functions, we figure out what rank names you want to filter on, then check against a reference dataset ([ranks_ref]) to allow ordered queries such as "all taxa between Class and Genus". If you provide rank names, we just use those, then do the filtering you requested. If you provide taxonomic names or ids, we figure out what rank names you are referring to, then proceed as in the previous sentence.
'>' all items above rank of x
'>=' all items above rank of x, inclusive
'<' all items below rank of x
'<=' all items below rank of x, inclusive
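How such an operator could be resolved against an ordered list of ranks is sketched below in base R. The `rank_order` vector is a shortened, hypothetical stand-in for the reference dataset, and `resolve_ranks()` is an illustrative helper, not a function from the package.

```r
# Hypothetical rank ordering, from highest to lowest.
rank_order <- c("kingdom", "phylum", "class", "order", "family", "genus", "species")

# Resolve a query such as "> genus" or "<= class" to a set of rank names.
resolve_ranks <- function(query) {
  query <- trimws(query)
  op <- "=="                        # default: the rank itself
  for (candidate in c(">=", "<=", ">", "<")) {
    if (startsWith(query, candidate)) {
      op <- candidate
      query <- trimws(substring(query, nchar(candidate) + 1))
      break
    }
  }
  pos <- match(query, rank_order)
  if (is.na(pos)) stop("unknown rank: ", query)
  idx <- seq_along(rank_order)
  keep <- switch(op,
    ">"  = idx < pos,               # higher ranks come earlier in rank_order
    ">=" = idx <= pos,
    "<"  = idx > pos,
    "<=" = idx >= pos,
    "==" = idx == pos
  )
  rank_order[keep]
}

above_genus <- resolve_ranks("> genus")    # kingdom through family
genus_down  <- resolve_ranks("<= genus")   # genus and species
```

Checking the two-character operators first keeps ">=" from being read as ">" followed by a rank name starting with "=".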
Ranks can be any character string in the set of acceptable rank names.
'nms' is named to avoid using 'names', which would collide with the function [base::names()] in base R. Any character taxonomic names can be passed in.
Ids are any alphanumeric taxonomic identifier. Some database providers use all digits, but some use a combination of digits and characters.
NSE is not supported at the moment, but may be in the future.
ranks("genus")
ranks("order", "genus")
ranks("> genus")
nms("Poaceae")
nms("Poaceae", "Poa")
nms("< Poaceae")
ids(4544)
ids(4544, 4479)
ids("< 4479")
Given a vector of names, return a list of data (usually lists/vectors) contained in a [taxonomy()] or [taxmap()] object. Each item will be named by taxon ids when possible.
obj$get_data(name = NULL, ...)
get_data(obj, name = NULL, ...)
obj |
A [taxonomy()] or [taxmap()] object |
name |
('character') Names of data to return. If not supplied, return all data listed in [all_names()]. |
... |
Passed to [all_names()]. Used to filter what kind of data is returned (e.g. columns in tables or function output?) if 'name' is not supplied or what kinds are allowed if 'name' is supplied. |
'list' of vectors or lists. Each vector or list will be named by associated taxon ids if possible.
Other NSE helpers: all_names(), data_used, names_used
# Get specific values
get_data(ex_taxmap, c("reaction", "n_legs", "taxon_ranks"))

# Get all values
get_data(ex_taxmap)
Given a vector of names, return a table of the indicated data contained in a [taxonomy()] or [taxmap()] object.
obj$get_data_frame(name = NULL, ...)
get_data_frame(obj, name = NULL, ...)
obj |
A [taxonomy()] or [taxmap()] object |
name |
('character') Names of data to return. If not supplied, return all data listed in [all_names()]. |
... |
Passed to [all_names()]. Used to filter what kind of data is returned (e.g. columns in tables or function output?) if 'name' is not supplied or what kinds are allowed if 'name' is supplied. |
Note: This function will not work with variables in datasets in [taxmap()] objects unless their rows correspond 1:1 with all taxa.
'data.frame'
# Get specific values
get_data_frame(ex_taxmap, c("taxon_names", "taxon_indexes", "is_stem"))
Get a data set from a taxmap object and complain if it does not exist.
obj |
A taxmap object |
data |
Dataset name, index, or a logical vector that indicates which dataset in 'obj$data' to return. |
## Not run:
# Get data set by name
get_dataset(ex_taxmap, "info")

# Get data set by index
get_dataset(ex_taxmap, 1)

# Get data set by T/F vector
get_dataset(ex_taxmap, startsWith(names(ex_taxmap$data), "i"))
## End(Not run)
Plots the distribution of values associated with a taxonomic classification/hierarchy. Taxonomic classifications can have multiple roots, resulting in multiple trees on the same plot. A tree consists of elements, element properties, conditions, and mapping properties, which are represented as parameters in the heat_tree object. The elements (e.g. nodes, edges, labels, and individual trees) are the infrastructure of the heat tree. The element properties (e.g. size and color) are characteristics that are manipulated by various data conditions and mapping properties. The element properties can be explicitly defined or automatically generated. The conditions are data (e.g. taxon statistics, such as abundance) represented in the taxmap/metacoder object. The mapping properties are parameters (e.g. transformations, range, interval, and layout) used to change the elements/element properties and how they are used to represent (or not represent) the various conditions.
heat_tree(...)

## S3 method for class 'Taxmap'
heat_tree(.input, ...)

## Default S3 method:
heat_tree(
  taxon_id,
  supertaxon_id,
  node_label = NA,
  edge_label = NA,
  tree_label = NA,
  node_size = 1,
  edge_size = node_size,
  node_label_size = node_size,
  edge_label_size = edge_size,
  tree_label_size = as.numeric(NA),
  node_color = "#999999",
  edge_color = node_color,
  tree_color = NA,
  node_label_color = "#000000",
  edge_label_color = "#000000",
  tree_label_color = "#000000",
  node_size_trans = "area",
  edge_size_trans = node_size_trans,
  node_label_size_trans = node_size_trans,
  edge_label_size_trans = edge_size_trans,
  tree_label_size_trans = "area",
  node_color_trans = "area",
  edge_color_trans = node_color_trans,
  tree_color_trans = "area",
  node_label_color_trans = "area",
  edge_label_color_trans = "area",
  tree_label_color_trans = "area",
  node_size_range = c(NA, NA),
  edge_size_range = c(NA, NA),
  node_label_size_range = c(NA, NA),
  edge_label_size_range = c(NA, NA),
  tree_label_size_range = c(NA, NA),
  node_color_range = quantative_palette(),
  edge_color_range = node_color_range,
  tree_color_range = quantative_palette(),
  node_label_color_range = quantative_palette(),
  edge_label_color_range = quantative_palette(),
  tree_label_color_range = quantative_palette(),
  node_size_interval = range(node_size, na.rm = TRUE, finite = TRUE),
  node_color_interval = NULL,
  edge_size_interval = range(edge_size, na.rm = TRUE, finite = TRUE),
  edge_color_interval = NULL,
  node_label_max = 500,
  edge_label_max = 500,
  tree_label_max = 500,
  overlap_avoidance = 1,
  margin_size = c(0, 0, 0, 0),
  layout = "reingold-tilford",
  initial_layout = "fruchterman-reingold",
  make_node_legend = TRUE,
  make_edge_legend = TRUE,
  title = NULL,
  title_size = 0.08,
  node_legend_title = "Nodes",
  edge_legend_title = "Edges",
  node_color_axis_label = NULL,
  node_size_axis_label = NULL,
  edge_color_axis_label = NULL,
  edge_size_axis_label = NULL,
  node_color_digits = 3,
  node_size_digits = 3,
  edge_color_digits = 3,
  edge_size_digits = 3,
  background_color = "#FFFFFF00",
  output_file = NULL,
  aspect_ratio = 1,
  repel_labels = TRUE,
  repel_force = 1,
  repel_iter = 1000,
  verbose = FALSE,
  ...
)
... |
(other named arguments) Passed to the igraph layout function used. |
.input |
An object of type taxmap |
taxon_id |
The unique ids of taxa. |
supertaxon_id |
The unique id of supertaxon |
node_label |
See details on labels. Default: no labels. |
edge_label |
See details on labels. Default: no labels. |
tree_label |
See details on labels. The label to display above each graph. The value of the root of each graph will be used. Default: None. |
node_size |
See details on size. Default: constant size. |
edge_size |
See details on size. Default: relative to node size. |
node_label_size |
See details on size. Default: relative to vertex size. |
edge_label_size |
See details on size. Default: relative to edge size. |
tree_label_size |
See details on size. Default: relative to graph size. |
node_color |
See details on colors. Default: grey. |
edge_color |
See details on colors. Default: same as node color. |
tree_color |
See details on colors. The value of the root of each graph will be used. Overwrites the node and edge color if specified. Default: Not used. |
node_label_color |
See details on colors. Default: black. |
edge_label_color |
See details on colors. Default: black. |
tree_label_color |
See details on colors. Default: black. |
node_size_trans |
See details on transformations. Default: "area". |
edge_size_trans |
See details on transformations. Default: same as node_size_trans. |
node_label_size_trans |
See details on transformations. Default: same as node_size_trans. |
edge_label_size_trans |
See details on transformations. Default: same as edge_size_trans. |
tree_label_size_trans |
See details on transformations. Default: "area". |
node_color_trans |
See details on transformations. Default: "area". |
edge_color_trans |
See details on transformations. Default: same as node color transformation. |
tree_color_trans |
See details on transformations. Default: "area". |
node_label_color_trans |
See details on transformations. Default: "area". |
edge_label_color_trans |
See details on transformations. Default: "area". |
tree_label_color_trans |
See details on transformations. Default: "area". |
node_size_range |
See details on ranges. Default: Optimize to balance overlaps and range size. |
edge_size_range |
See details on ranges. Default: relative to node size range. |
node_label_size_range |
See details on ranges. Default: relative to node size. |
edge_label_size_range |
See details on ranges. Default: relative to edge size. |
tree_label_size_range |
See details on ranges. Default: relative to tree size. |
node_color_range |
See details on ranges. Default: Color-blind friendly palette. |
edge_color_range |
See details on ranges. Default: same as node color. |
tree_color_range |
See details on ranges. Default: Color-blind friendly palette. |
node_label_color_range |
See details on ranges. Default: Color-blind friendly palette. |
edge_label_color_range |
See details on ranges. Default: Color-blind friendly palette. |
tree_label_color_range |
See details on ranges. Default: Color-blind friendly palette. |
node_size_interval |
See details on intervals. Default: The range of values in node_size. |
node_color_interval |
See details on intervals. Default: The range of values in node_color. |
edge_size_interval |
See details on intervals. Default: The range of values in edge_size. |
edge_color_interval |
See details on intervals. Default: The range of values in edge_color. |
node_label_max |
The maximum number of node labels. Default: 500. |
edge_label_max |
The maximum number of edge labels. Default: 500. |
tree_label_max |
The maximum number of tree labels. Default: 500. |
overlap_avoidance |
( |
margin_size |
( |
layout |
The layout algorithm used to position nodes. See details on layouts. Default: "reingold-tilford". |
initial_layout |
The layout algorithm used to set the initial position of nodes, passed as input to the layout algorithm. |
make_node_legend |
if TRUE, make legend for node size/color mappings. |
make_edge_legend |
if TRUE, make legend for edge size/color mappings. |
title |
Name to print above the graph. |
title_size |
The size of the title relative to the rest of the graph. |
node_legend_title |
The title of the legend for node data. Can be 'NA' or 'NULL' to remove the title. |
edge_legend_title |
The title of the legend for edge data. Can be 'NA' or 'NULL' to remove the title. |
node_color_axis_label |
The label on the scale axis corresponding to node_color. |
node_size_axis_label |
The label on the scale axis corresponding to node_size. |
edge_color_axis_label |
The label on the scale axis corresponding to edge_color. |
edge_size_axis_label |
The label on the scale axis corresponding to edge_size. |
node_color_digits |
The number of significant figures used for the numbers on the scale axis corresponding to node_color. |
node_size_digits |
The number of significant figures used for the numbers on the scale axis corresponding to node_size. |
edge_color_digits |
The number of significant figures used for the numbers on the scale axis corresponding to edge_color. |
edge_size_digits |
The number of significant figures used for the numbers on the scale axis corresponding to edge_size. |
background_color |
The background color of the plot. Default: Transparent |
output_file |
The path to one or more files to save the plot in using |
aspect_ratio |
The aspect_ratio of the plot. |
repel_labels |
If |
repel_force |
The force with which overlapping labels will be repelled from each other. |
repel_iter |
The number of iterations used when repelling labels |
verbose |
If |
The labels of nodes, edges, and trees can be added. Node labels are centered over their node. Edge labels are displayed over edges, in the same orientation. Tree labels are displayed over their tree.
Accepts a vector the same length as taxon_id or a factor of its length.
The size of nodes, edges, labels, and trees can be mapped to various conditions. This is useful for displaying statistics for taxa, such as abundance. Only the relative size of the condition is used, not the values themselves. The <element>_size_trans (transformation) parameter can be used to make the size mapping non-linear. The <element>_size_range parameter can be used to proportionately change the size of an element based on the condition mapped to that element. The <element>_size_interval parameter can be used to change the limit at which a condition will be graphically represented as the same size as the minimum/maximum <element>_size_range.
Accepts a numeric vector the same length as taxon_id or a factor of its length.
The colors of nodes, edges, labels, and trees can be mapped to various conditions. This is useful for visually highlighting/clustering groups of taxa. Only the relative size of the condition is used, not the values themselves. The <element>_color_trans (transformation) parameter can be used to make the color mapping non-linear. The <element>_color_range parameter can be used to proportionately change the color of an element based on the condition mapped to that element. The <element>_color_interval parameter can be used to change the limit at which a condition will be graphically represented as the same color as the minimum/maximum <element>_color_range.
Accepts a vector the same length as taxon_id or a factor of its length. If a numeric vector is given, it is mapped to a color scale. Hex values or color names can be used (e.g. #000000 or "black").
Mapping Properties
Before any conditions specified are mapped to an element property (color/size), they can be transformed to make the mapping non-linear. Any of the transformations listed below can be used by specifying their name. A customized function can also be supplied to do the transformation.
"linear": Proportional to radius/diameter of node
"area": Circular area; better perceptual accuracy than "linear"
"log10 linear": Log base 10 of radius
"log2 linear": Log base 2 of radius
"ln linear": Log base e of radius
"log10 area": Log base 10 of circular area
"log2 area": Log base 2 of circular area
"ln area": Log base e of circular area
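The difference between mapping a value to the radius and mapping it to the circular area can be sketched in base R. The two functions below are hypothetical illustrations of the "linear" and "area" ideas, assuming that an area mapping means the radius grows with the square root of the value; they are not taken from the package source.

```r
# Hypothetical transformation functions; names and formulas are assumptions.
linear_trans <- function(x) x              # the value drives the radius directly
area_trans   <- function(x) sqrt(x / pi)   # the value drives the circle's area

values <- c(1, 4, 16)

# Radii relative to the smallest value under each transformation.
linear_radius <- linear_trans(values) / linear_trans(values)[1]
area_radius   <- area_trans(values) / area_trans(values)[1]

# Relative circle areas that a viewer would actually see.
linear_area <- linear_radius^2   # grows with the square of the value
area_area   <- area_radius^2     # grows in proportion to the value
```

With the radius mapping, a value 16 times larger is drawn with 256 times the area, which is why an area mapping tends to read more accurately. A custom transformation would presumably have the same shape: a function that takes a numeric vector and returns a numeric vector of the same length.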
The displayed range of colors and sizes can be explicitly defined or automatically generated. When explicitly used, the size range will proportionately increase/decrease the size of a particular element. Size ranges are specified by supplying a numeric vector with two values: the minimum and maximum. The units used should be between 0 and 1, representing the proportion of a dimension of the graph. Since the dimensions of the graph are determined by layout, and not always square, the value that 1 corresponds to is the square root of the graph area (i.e. the side of a square with the same area as the plotted space). Color ranges can be any number of color values as either HEX codes (e.g. #000000) or color names (e.g. "black").
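As a rough illustration of what a size range and a color range do, the base R sketch below rescales a statistic onto a two-value size range and interpolates a color range with grDevices. The helper `rescale_to_range()` and the example values are hypothetical; the package's own scaling may differ in detail.

```r
# Hypothetical helper: linearly rescale values onto a size range.
rescale_to_range <- function(x, size_range) {
  span <- max(x) - min(x)
  if (span == 0) {
    return(rep(mean(size_range), length(x)))   # constant input: use the midpoint
  }
  size_range[1] + (x - min(x)) / span * (size_range[2] - size_range[1])
}

n_obs <- c(1, 5, 9)

# Sizes as a proportion of the square root of the graph area.
sizes <- rescale_to_range(n_obs, c(0.01, 0.1))

# A color range is a vector of colors; interpolate between them.
palette_fun <- grDevices::colorRampPalette(c("black", "#FFFFFF"))
colors <- palette_fun(5)
```

Setting both values of a size range to the same number, as in `c(0.02, 0.02)`, would make every element the same size regardless of the statistic.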
Layouts determine the position of node elements on the graph. They are implemented using the igraph package. Any additional arguments passed to heat_tree are passed to the igraph function used. The following character values are understood:
Use as_tree. A circular tree-like layout.
Use with_dh. A type of simulated annealing.
Use with_gem. A force-directed layout.
Use with_graphopt. A force-directed layout.
Use with_mds. Multidimensional scaling.
Use with_fr. A force-directed layout.
Use with_kk. A layout based on a physical model of springs.
Use with_lgl. Meant for larger graphs.
Use with_drl. A force-directed layout.
This is the minimum and maximum of values displayed on the legend scales. Intervals are specified by supplying a numeric vector with two values: the minimum and maximum. When explicitly used, the <element>_<property>_interval will redefine the way the actual conditional values are represented by setting a limit for the <element>_<property>. Any condition below the minimum of the <element>_<property>_interval will be graphically represented the same as a condition at that minimum. Any value above the maximum of the <element>_<property>_interval will be graphically represented the same as a value at that maximum.
By default, the minimum and maximum are the range of the values supplied for the <element>_<property>. Setting a custom interval is useful for making <element>_<properties> in multiple graphs correspond to the same conditions, or for setting logical boundaries (such as c(0,1) for proportions).
Note that this is different from the <element>_<property>_range mapping property, which determines the size/color of graphed elements.
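The effect of an interval on the values being mapped can be sketched as a clamp in base R. `clamp_to_interval()` is a hypothetical helper used only to illustrate the behavior described above.

```r
# Hypothetical helper: values outside the interval are treated as if they
# sat exactly on the nearest interval boundary.
clamp_to_interval <- function(x, interval) {
  pmin(pmax(x, interval[1]), interval[2])
}

n_obs <- c(2, 10, 50, 100, 400)
clamped <- clamp_to_interval(n_obs, c(10, 100))
```

In this sketch the values 2 and 400 end up drawn like 10 and 100, so two plots that share an interval use the same size or color for the same value.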
This package includes code from the R package ggrepel to handle label overlap avoidance with permission from the author of ggrepel Kamil Slowikowski. We included the code instead of depending on ggrepel because we are using internal functions to ggrepel that might change in the future. We thank Kamil Slowikowski for letting us use his code and would like to acknowledge his implementation of the label overlap avoidance used in metacoder.
## Not run:
# Parse dataset for plotting
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                    class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                    class_regex = "^(.+)__(.+)$")

# Default appearance:
# No parameters are needed, but the default tree is not too useful
heat_tree(x)

# A good place to start:
# There will always be "taxon_names" and "n_obs" variables, so this is a
# good place to start. This will show the number of OTUs in this case.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs)

# Plotting read depth:
# To plot read depth, you first need to add up the number of reads per taxon.
# The function `calc_taxon_abund` is good for this.
x$data$taxon_counts <- calc_taxon_abund(x, data = "tax_data")
x$data$taxon_counts$total <- rowSums(x$data$taxon_counts[, -1]) # -1 = taxon_id column
heat_tree(x, node_label = taxon_names, node_size = total, node_color = total)

# Plotting multiple variables:
# You can plot up to 4 quantitative variables using node/edge size/color, but it
# is usually best to use 2 or 3. The plot below uses node size for number of
# OTUs, node color for number of reads, and edge color for number of samples
x$data$n_samples <- calc_n_samples(x, data = "taxon_counts")
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = total,
          edge_color = n_samples)

# Different layouts:
# You can use any layout implemented by igraph. You can also specify an
# initial layout to seed the main layout with.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          layout = "davidson-harel")
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          layout = "davidson-harel", initial_layout = "reingold-tilford")

# Axis labels:
# You can add custom labels to the legends
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = total,
          edge_color = n_samples, node_size_axis_label = "Number of OTUs",
          node_color_axis_label = "Number of reads",
          edge_color_axis_label = "Number of samples")

# Overlap avoidance:
# You can change how much node overlap avoidance is used.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          overlap_avoidance = .5)

# Label overlap avoidance:
# You can modify how label scattering is handled using the `repel_force` and
# `repel_iter` options. You can turn off label scattering using the
# `repel_labels` option.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          repel_force = 2, repel_iter = 20000)
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          repel_labels = FALSE)

# Setting the size of graph elements:
# You can force nodes, edges, and labels to be a specific size/color range instead
# of letting the function optimize it. These options end in `_range`.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_size_range = c(0.01, .1))
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          edge_color_range = c("black", "#FFFFFF"))
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_label_size_range = c(0.02, 0.02))

# Setting the transformation used:
# You can change how raw statistics are converted to color/size using options
# ending in `_trans`.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_size_trans = "log10 area")

# Setting the interval displayed:
# By default, the whole range of the statistic provided will be displayed.
# You can set what range of values are displayed using options ending in `_interval`.
heat_tree(x, node_label = taxon_names, node_size = n_obs, node_color = n_obs,
          node_size_interval = c(10, 100))
## End(Not run)
Plot a matrix of heat trees for showing pairwise comparisons. A larger,
labelled tree serves as a key for the matrix of smaller unlabelled trees. The
data for this function is typically created with compare_groups.
heat_tree_matrix( obj, data, label_small_trees = FALSE, key_size = 0.6, seed = 1, output_file = NULL, row_label_color = diverging_palette()[3], col_label_color = diverging_palette()[1], row_label_size = 12, col_label_size = 12, ..., dataset = NULL )
obj |
A |
data |
The name of a table in |
label_small_trees |
If |
key_size |
The size of the key tree relative to the whole graph. For example, 0.5 means half the width/height of the graph. |
seed |
The random seed used to make the graphs. |
output_file |
The path to one or more files to save the plot in using |
row_label_color |
The color of the row labels on the right side of the matrix. Default: based on the node_color_range. |
col_label_color |
The color of the column labels along the top of the matrix. Default: based on the node_color_range. |
row_label_size |
The size of the row labels on the right side of the matrix. Default: 12. |
col_label_size |
The size of the column labels along the top of the matrix. Default: 12. |
... |
Passed to |
dataset |
DEPRECATED. Use "data" instead. |
## Not run:
# Parse dataset for plotting
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                    class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                    class_regex = "^(.+)__(.+)$")
# Convert counts to proportions
x$data$otu_table <- calc_obs_props(x, data = "tax_data", cols = hmp_samples$sample_id)
# Get per-taxon counts
x$data$tax_table <- calc_taxon_abund(x, data = "otu_table", cols = hmp_samples$sample_id)
# Calculate difference between treatments
x$data$diff_table <- compare_groups(x, data = "tax_table",
                                    cols = hmp_samples$sample_id,
                                    groups = hmp_samples$body_site)
# Plot results (might take a few minutes)
heat_tree_matrix(x,
                 data = "diff_table",
                 node_size = n_obs,
                 node_label = taxon_names,
                 node_color = log2_median_ratio,
                 node_color_range = diverging_palette(),
                 node_color_trans = "linear",
                 node_color_interval = c(-3, 3),
                 edge_color_interval = c(-3, 3),
                 node_size_axis_label = "Number of OTUs",
                 node_color_axis_label = "Log2 ratio median proportions")
## End(Not run)
NOTE: This will soon be deprecated. Make a set of many [hierarchy()] class objects. This is just a thin wrapper over a standard list.
hierarchies(..., .list = NULL)
... |
Any number of objects of class [hierarchy()] |
.list |
Any number of objects of class [hierarchy()] in a list |
An 'R6Class' object of class [hierarchy()]
Other classes: hierarchy(), taxa(), taxmap(), taxon(), taxon_database(), taxon_id(), taxon_name(), taxon_rank(), taxonomy()
A class containing an ordered list of [taxon()] objects that represent a hierarchical classification.
hierarchy(..., .list = NULL)
... |
Any number of objects of class 'Taxon' or taxonomic names as character strings |
.list |
An alternative to the '...' input. Any number of objects of class [taxon()] or character vectors in a list. Cannot be used with '...'. |
On initialization, taxa are sorted if they have ranks with a known order.
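The sorting step can be pictured with a small base R sketch. This is not the package's implementation: `rank_order` and `taxa_tbl` are hypothetical stand-ins for the known rank order and the taxa supplied to `hierarchy()`.

```r
# Hypothetical known rank order, broadest first (an assumption for illustration)
rank_order <- c("family", "genus", "species")

# Taxa supplied in arbitrary order, as in hierarchy(z, y, x)
taxa_tbl <- data.frame(
  name = c("Poa annua", "Poa", "Poaceae"),
  rank = c("species", "genus", "family"),
  stringsAsFactors = FALSE
)

# Order rows by the position of each rank in the known order
sorted_tbl <- taxa_tbl[order(match(taxa_tbl$rank, rank_order)), ]
print(sorted_tbl$name)
```

Ranks missing from `rank_order` would get `NA` from `match()` and be placed last by `order()`; how the package treats unknown ranks is not covered by this sketch.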
**Methods**
Remove 'Taxon' elements by rank name, taxon name or taxon ID. The change happens in place, so you don't need to assign output to a new object. returns self - rank_names (character) a vector of rank names
Select 'Taxon' elements by rank name, taxon name or taxon ID. The change happens in place, so you don't need to assign output to a new object. returns self - rank_names (character) a vector of rank names
An 'R6Class' object of class 'Hierarchy'
Other classes: hierarchies(), taxa(), taxmap(), taxon(), taxon_database(), taxon_id(), taxon_name(), taxon_rank(), taxonomy()
(x <- taxon(
  name = taxon_name("Poaceae"),
  rank = taxon_rank("family"),
  id = taxon_id(4479)
))
(y <- taxon(
  name = taxon_name("Poa"),
  rank = taxon_rank("genus"),
  id = taxon_id(4544)
))
(z <- taxon(
  name = taxon_name("Poa annua"),
  rank = taxon_rank("species"),
  id = taxon_id(93036)
))
(res <- hierarchy(z, y, x))
res$taxa
res$ranklist
# null taxa
x <- taxon(NULL)
(res <- hierarchy(x, x, x))
## similar to hierarchy(), but `taxa` slot is not empty
Changes the font of a taxon ID column in a table print out.
highlight_taxon_ids(table_text, header_index, row_indexes)
table_text |
The print out of the table in a character vector, one element per line. |
header_index |
The row index that contains the table column names |
row_indexes |
The indexes of the rows to be formatted. |
A subset of the Human Microbiome Project abundance matrix produced by QIIME.
It contains OTU ids, taxonomic lineages, and the read counts for 50 samples.
See hmp_samples for the matching dataset of sample information.
A 1,000 x 52 tibble.
The 50 samples were randomly selected such that there were 10 in each of 5 treatments: "Saliva", "Throat", "Stool", "Right_Antecubital_fossa", "Anterior_nares". For each treatment, there were 5 samples from men and 5 from women.
Subset from data available at https://www.hmpdacc.org/hmp/HMQCP/
Other hmp_data: hmp_samples
The sample information for a subset of the Human Microbiome Project data. It
contains the sample ID, sex, and body site for each sample in the abundance
matrix stored in hmp_otus. The "sample_id" column corresponds
to the column names of hmp_otus.
A 50 x 3 tibble.
The 50 samples were randomly selected such that there were 10 in each of 5 treatments: "Saliva", "Throat", "Stool", "Right_Antecubital_fossa", "Anterior_nares". For each treatment, there were 5 samples from men and 5 from women. "Right_Antecubital_fossa" was renamed to "Skin" and "Anterior_nares" to "Nose".
Subset from data available at https://www.hmpdacc.org/hmp/HMQCP/
Other hmp_data: hmp_otus
Get classification strings of taxa in an object of type [taxonomy()] or [taxmap()] composed of taxon IDs. Each classification is constructed by concatenating the taxon ids of the given taxon and its supertaxa.
obj$id_classifications(sep = ";")
id_classifications(obj, sep = ";")
obj |
([taxonomy()] or [taxmap()]) |
sep |
('character' of length 1) The character(s) to place between taxon IDs |
'character'
Other taxonomy data functions: classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Get classifications of IDs for each taxon
id_classifications(ex_taxmap)
# Use a different separator
id_classifications(ex_taxmap, sep = '|')
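How an ID classification is assembled from a taxon and its supertaxa can be sketched in base R. The `supertaxon` vector and the helper `id_classification()` are hypothetical, and the root-first ordering is an assumption made for illustration rather than a statement about the package internals.

```r
# Hypothetical toy taxonomy: each taxon ID maps to its supertaxon ID (NA for roots)
supertaxon <- c(a = NA, b = "a", c = "b", d = "b")

# Walk from a taxon up to its root, then join the IDs root-first
id_classification <- function(id, parents, sep = ";") {
  path <- id
  while (!is.na(parents[[id]])) {
    id <- parents[[id]]
    path <- c(id, path)
  }
  paste(path, collapse = sep)
}

result <- vapply(names(supertaxon), id_classification, character(1),
                 parents = supertaxon, sep = ";")
print(result)
```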
Return the "internode" taxa for a [taxonomy()] or [taxmap()] object. An internode is any taxon with a single immediate supertaxon and a single immediate subtaxon. They can be removed from a tree without any loss of information on the relative relationship between remaining taxa. Can also be used to get the internodes of a subset of taxa.
obj$internodes(subset = NULL, value = "taxon_indexes")
internodes(obj, subset = NULL, value = "taxon_indexes")
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes used to subset the tree prior to determining internodes. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. Note that internodes are determined after the filtering, so a given taxon might be an internode on the unfiltered tree, but not an internode on the filtered tree. |
value |
What data to return. This is usually the name of a column in a table in 'obj$data'. Any result of [all_names()] can be used, but it usually only makes sense to use data that corresponds to taxa 1:1, such as [taxon_ranks()]. By default, taxon indexes are returned. |
'character'
Other taxonomy indexing functions: branches(), leaves(), roots(), stems(), subtaxa(), supertaxa()
## Not run:
# Return indexes of internode taxa
internodes(ex_taxmap)
# Return indexes for a subset of taxa
internodes(ex_taxmap, subset = 2:17)
internodes(ex_taxmap, subset = n_obs > 1)
# Return something besides taxon indexes
internodes(ex_taxmap, value = "taxon_names")
## End(Not run)
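The internode definition (exactly one immediate supertaxon and exactly one immediate subtaxon) can be checked on a toy tree in base R. The `supertaxon` vector is a hypothetical example, not data from the package.

```r
# Hypothetical toy taxonomy: taxon ID -> supertaxon ID (NA for roots)
supertaxon <- c(a = NA, b = "a", c = "b", d = "c", e = "c")

# Number of immediate subtaxa of each taxon (NA parents are ignored by table())
n_sub <- table(factor(supertaxon, levels = names(supertaxon)))

# An internode has a supertaxon and exactly one subtaxon
is_internode_flag <- !is.na(supertaxon) & as.vector(n_sub) == 1
print(is_internode_flag)
```

Here only `b` qualifies: `a` has a single subtaxon but no supertaxon, and `c` has two subtaxa.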
Find taxa with ambiguous names, such as "unknown" or "uncultured".
is_ambiguous( taxon_names, unknown = TRUE, uncultured = TRUE, name_regex = ".", ignore_case = TRUE )
taxon_names |
A |
unknown |
If |
uncultured |
If |
name_regex |
The regex code to match a valid character in a taxon name. For example, "[a-z]" would mean taxon names can only be lower case letters. |
ignore_case |
If |
If you encounter a taxon name that represents an ambiguous taxon that is not filtered out by this function, let us know and we will add it.
TRUE/FALSE vector corresponding to taxon_names
is_ambiguous(c("unknown", "uncultured", "homo sapiens", "kfdsjfdljsdf"))
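The kind of test this function performs can be approximated with a regular expression. The sketch below is a deliberately simplified stand-in: `ambiguous_patterns` and `looks_ambiguous()` are hypothetical, and the real function may recognize more ambiguous names than the two words shown here.

```r
# Hypothetical, short pattern list (an assumption for illustration)
ambiguous_patterns <- c("unknown", "uncultured")

looks_ambiguous <- function(taxon_names, ignore_case = TRUE) {
  pattern <- paste(ambiguous_patterns, collapse = "|")
  grepl(pattern, taxon_names, ignore.case = ignore_case)
}

flags <- looks_ambiguous(c("unknown", "uncultured", "homo sapiens", "kfdsjfdljsdf"))
print(flags)
```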
Test if taxa are branches in a [taxonomy()] or [taxmap()] object. Branches are taxa in the interior of the tree that are not [roots()], [stems()], or [leaves()].
obj$is_branch()
is_branch(obj)
obj |
The [taxonomy()] or [taxmap()] object. |
A 'logical' of length equal to the number of taxa.
Other taxonomy data functions: classifications(), id_classifications(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Test which taxon IDs correspond to branches
is_branch(ex_taxmap)
# Filter out branches
filter_taxa(ex_taxmap, ! is_branch)
Test if taxa are "internodes" in a [taxonomy()] or [taxmap()] object. An internode is any taxon with a single immediate supertaxon and a single immediate subtaxon. They can be removed from a tree without any loss of information on the relative relationship between remaining taxa.
obj$is_internode()
is_internode(obj)
obj |
The [taxonomy()] or [taxmap()] object. |
A 'logical' of length equal to the number of taxa.
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Test which taxon IDs correspond to internodes
is_internode(ex_taxmap)
# Filter out internodes
filter_taxa(ex_taxmap, ! is_internode)
Test if taxa are leaves in a [taxonomy()] or [taxmap()] object. Leaves are taxa without subtaxa, typically species.
obj$is_leaf()
is_leaf(obj)
obj |
The [taxonomy()] or [taxmap()] object. |
A 'logical' of length equal to the number of taxa.
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Test which taxon IDs correspond to leaves
is_leaf(ex_taxmap)
# Filter out leaves
filter_taxa(ex_taxmap, ! is_leaf)
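The leaf test ("a taxon without subtaxa") reduces to asking whether a taxon ID ever appears as someone else's supertaxon. A minimal base R sketch, using a hypothetical `supertaxon` vector rather than a real taxmap object:

```r
# Hypothetical toy taxonomy: taxon ID -> supertaxon ID (NA for roots)
supertaxon <- c(a = NA, b = "a", c = "a", d = "b")

# A leaf is a taxon that is never listed as a supertaxon
is_leaf_flag <- !(names(supertaxon) %in% supertaxon)
names(is_leaf_flag) <- names(supertaxon)
print(is_leaf_flag)

# Keeping only the non-leaf taxa, in the spirit of filtering with `! is_leaf`
non_leaves <- names(supertaxon)[!is_leaf_flag]
print(non_leaves)
```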
Test if taxa are roots in a [taxonomy()] or [taxmap()] object. Roots are taxa without supertaxa, typically things like "Bacteria", or "Life".
obj$is_root()
is_root(obj)
obj |
The [taxonomy()] or [taxmap()] object. |
A 'logical' of length equal to the number of taxa.
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Test which taxon IDs correspond to roots
is_root(ex_taxmap)
# Filter out roots
filter_taxa(ex_taxmap, ! is_root)
Test if taxa are stems in a [taxonomy()] or [taxmap()] object. Stems are taxa from the [roots()] taxa to the first taxon with more than one subtaxon. These can usually be filtered out of the taxonomy without removing any information on how the remaining taxa are related.
obj$is_stem()
is_stem(obj)
obj |
The [taxonomy()] or [taxmap()] object. |
A 'logical' of length equal to the number of taxa.
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Test which taxon IDs correspond to stems
is_stem(ex_taxmap)
# Filter out stems
filter_taxa(ex_taxmap, ! is_stem)
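The stem definition can be traced by walking down from each root while there is exactly one subtaxon. The sketch below uses a hypothetical `supertaxon` vector and assumes that both the root and the first taxon with more than one subtaxon count as part of the stem; whether the package includes those endpoints is not confirmed here.

```r
# Hypothetical toy taxonomy: taxon ID -> supertaxon ID (NA for roots)
supertaxon <- c(a = NA, b = "a", c = "b", d = "c", e = "c")

# Immediate subtaxa of one taxon
subtaxa_of <- function(id) {
  names(supertaxon)[!is.na(supertaxon) & supertaxon == id]
}

stem_ids <- character(0)
for (root in names(supertaxon)[is.na(supertaxon)]) {
  current <- root
  repeat {
    stem_ids <- c(stem_ids, current)   # assumption: endpoints are included
    kids <- subtaxa_of(current)
    if (length(kids) != 1) break       # stop at a fork or at a leaf
    current <- kids
  }
}
print(stem_ids)
```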
Functions used to determine graph layout.
Calling the function with no parameters returns available function names.
Calling the function with only the name of a function returns that function.
Supplying a name and a graph object runs the layout function on the graph.
layout_functions( name = NULL, graph = NULL, intitial_coords = NULL, effort = 1, ... )
name |
( |
graph |
( |
intitial_coords |
( |
effort |
( |
... |
(other arguments) Passed to igraph layout function used. |
The names of available functions, a layout function, or a two-column matrix, depending on how arguments are provided.
# List available function names:
layout_functions()
# Execute layout function on graph:
layout_functions("davidson-harel", igraph::make_ring(5))
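The three calling modes described above (no arguments, name only, name plus input) follow a common dispatch pattern, sketched here in base R. `toy_layouts` and `toy_layout_functions()` are hypothetical; the toy layout takes a node count rather than a graph object so that the example runs without igraph.

```r
# Hypothetical registry with one toy layout (not one of the package's layouts)
toy_layouts <- list(
  circle = function(n) {
    angle <- 2 * pi * (seq_len(n) - 1) / n
    cbind(x = cos(angle), y = sin(angle))
  }
)

toy_layout_functions <- function(name = NULL, n_nodes = NULL) {
  if (is.null(name)) return(names(toy_layouts))       # no arguments: list names
  if (is.null(n_nodes)) return(toy_layouts[[name]])   # name only: return the function
  toy_layouts[[name]](n_nodes)                        # name and input: run the layout
}

available <- toy_layout_functions()
circle_fun <- toy_layout_functions("circle")
coords <- toy_layout_functions("circle", 5)
print(dim(coords))
```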
Return the leaf taxa for a [taxonomy()] or [taxmap()] object. Leaf taxa are taxa with no subtaxa.
obj$leaves(subset = NULL, recursive = TRUE, simplify = FALSE, value = "taxon_indexes")
leaves(obj, subset = NULL, recursive = TRUE, simplify = FALSE, value = "taxon_indexes")
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to find leaves for. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
recursive |
('logical' or 'numeric') If 'FALSE', only return the leaves if they occur one rank below the target taxa. If 'TRUE', return all of the leaves for each taxon. Positive numbers indicate the number of recursions (i.e. number of ranks below the target taxon to return). '1' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
simplify |
('logical') If 'TRUE', then combine all the results into a single vector of unique values. |
value |
What data to return. This is usually the name of a column in a table in 'obj$data'. Any result of 'all_names(obj)' can be used, but it usually only makes sense to use data that corresponds to taxa 1:1, such as [taxon_ranks()]. By default, taxon indexes are returned. |
'character'
Other taxonomy indexing functions: branches(), internodes(), roots(), stems(), subtaxa(), supertaxa()
# Return indexes of leaf taxa
leaves(ex_taxmap)
# Return indexes for a subset of taxa
leaves(ex_taxmap, subset = 2:17)
leaves(ex_taxmap, subset = taxon_names == "Plantae")
# Return something besides taxon indexes
leaves(ex_taxmap, value = "taxon_names")
leaves(ex_taxmap, subset = taxon_ranks == "genus", value = "taxon_names")
# Return a vector of all unique values
leaves(ex_taxmap, value = "taxon_names", simplify = TRUE)
# Only return leaves for their direct supertaxa
leaves(ex_taxmap, value = "taxon_names", recursive = FALSE)
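The effect of `recursive` and `simplify` can be pictured with a base R sketch on a toy tree. The `supertaxon` vector and the helpers below are hypothetical and only mimic the documented behavior; they are not the package's code.

```r
# Hypothetical toy taxonomy: taxon ID -> supertaxon ID (NA for roots)
supertaxon <- c(a = NA, b = "a", c = "a", d = "b", e = "b")

subtaxa_of <- function(id) {
  names(supertaxon)[!is.na(supertaxon) & supertaxon == id]
}
is_leaf_id <- function(id) length(subtaxa_of(id)) == 0

# Leaves below one taxon; with recursive = FALSE only direct leaf subtaxa count
leaves_below <- function(id, recursive = TRUE) {
  out <- character(0)
  for (kid in subtaxa_of(id)) {
    if (is_leaf_id(kid)) {
      out <- c(out, kid)
    } else if (recursive) {
      out <- c(out, leaves_below(kid, recursive = TRUE))
    }
  }
  out
}

ids <- setNames(nm = names(supertaxon))
all_leaves <- lapply(ids, leaves_below)
direct_leaves <- lapply(ids, leaves_below, recursive = FALSE)

# Analogue of simplify = TRUE: one vector of unique values
simplified <- unique(unlist(all_leaves, use.names = FALSE))
print(simplified)
```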
Apply a function to the leaves of each taxon. This is similar to using [leaves()] with [lapply()] or [sapply()].
obj$leaves_apply(func, subset = NULL, recursive = TRUE, simplify = FALSE, value = "taxon_indexes", ...)
leaves_apply(obj, func, subset = NULL, recursive = TRUE, simplify = FALSE, value = "taxon_indexes", ...)
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
func |
('function') The function to apply. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to use. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
recursive |
('logical' or 'numeric') If 'FALSE', only return the leaves if they occur one rank below the target taxa. If 'TRUE', return all of the leaves for each taxon. Positive numbers indicate the number of recursions (i.e. number of ranks below the target taxon to return). '1' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
simplify |
('logical') If 'TRUE', then combine all the results into a single vector of unique values. |
value |
What data to give to the function. Any result of 'all_names(obj)' can be used, but it usually only makes sense to use data that has an associated taxon id. |
... |
Extra arguments are passed to the function 'func'. |
# Count number of leaves under each taxon or its subtaxa
leaves_apply(ex_taxmap, length)
# Count number of leaves under each taxon
leaves_apply(ex_taxmap, length, recursive = FALSE)
# Converting output of leaves to upper case
leaves_apply(ex_taxmap, value = "taxon_names", toupper)
# Passing arguments to the function
leaves_apply(ex_taxmap, value = "taxon_names", paste0, collapse = ", ")
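Since the description compares this function to using [leaves()] with [lapply()] or [sapply()], the same idea can be shown on a plain list. `leaves_by_taxon` is a hypothetical list, assumed to have one element of leaf names per taxon.

```r
# Hypothetical per-taxon leaf names (made-up values for illustration)
leaves_by_taxon <- list(a = c("d", "e", "c"), b = c("d", "e"), c = character(0))

# Count leaves per taxon, analogous to leaves_apply(obj, length)
n_leaves_each <- sapply(leaves_by_taxon, length)
print(n_leaves_each)

# Extra arguments are forwarded to the function, as with `...`
joined <- lapply(leaves_by_taxon, paste0, collapse = ", ")
print(joined$a)
```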
Looks up taxonomic data from NCBI sequence IDs, taxon IDs, or taxon names that are present in a table, list, or vector. Also can incorporate additional associated datasets.
lookup_tax_data( tax_data, type, column = 1, datasets = list(), mappings = c(), database = "ncbi", include_tax_data = TRUE, use_database_ids = TRUE, ask = TRUE )
tax_data |
A table, list, or vector that contain sequence IDs, taxon IDs, or taxon names. * tables: The 'column' option must be used to specify which column contains the sequence IDs, taxon IDs, or taxon names. * lists: There must be only one item per list entry unless the 'column' option is used to specify what item to use in each list entry. * vectors: simply a vector of sequence IDs, taxon IDs, or taxon names. |
type |
What type of information can be used to look up the classifications. Takes one of the following values: * '"seq_id"': A database sequence ID with an associated classification (e.g. NCBI accession numbers). * '"taxon_id"': A reference database taxon ID (e.g. a NCBI taxon ID) * '"taxon_name"': A single taxon name (e.g. "Homo sapiens" or "Primates") * '"fuzzy_name"': A single taxon name, but check for misspellings first. Only use if you think there are misspellings. Using '"taxon_name"' is faster. |
column |
('character' or 'integer') The name or index of the column that contains information used to look up classifications. This only applies when a table or list is supplied to 'tax_data'. |
datasets |
Additional lists/vectors/tables that should be included in the resulting 'taxmap' object. The 'mappings' option is used to specify how these data sets relate to the 'tax_data' and, by inference, what taxa apply to each item. |
mappings |
(named 'character') This defines how the taxonomic information in 'tax_data' applies to data in 'datasets'. This option should have the same number of inputs as 'datasets', with values corresponding to each dataset. The names of the character vector specify what information in 'tax_data' is shared with info in each 'dataset', which is specified by the corresponding values of the character vector. If there are no shared variables, you can add 'NA' as a placeholder, but you could just leave that data out since it is not benefiting from being in the taxmap object. The names/values can be one of the following: * For tables, the names of columns can be used. * '"{{index}}"' : This means to use the index of rows/items * '"{{name}}"' : This means to use row/item names. * '"{{value}}"' : This means to use the values in vectors or lists. Lists will be converted to vectors using [unlist()]. |
database |
('character') The name of a database to use to look up classifications. Options include "ncbi", "itis", "eol", "col", "tropicos", and "nbn". |
include_tax_data |
('TRUE'/'FALSE') Whether or not to include 'tax_data' as a dataset, like those in 'datasets'. |
use_database_ids |
('TRUE'/'FALSE') Whether or not to use downloaded database taxon ids instead of arbitrary, automatically-generated taxon ids. |
ask |
('TRUE'/'FALSE') Whether or not to prompt the user for input. Currently, this would only happen when looking up the taxonomy of a taxon name with multiple matches. If 'FALSE', taxa with multiple hits are treated as if they do not exist in the database. This might change in the future if we can find an elegant way of handling this. |
If you have invalid inputs or a download fails for another reason, then there will be an "unknown" taxon ID as a placeholder and failed inputs will be assigned to this ID. You can remove these using [filter_taxa()] like so: 'filter_taxa(result, taxon_ids != "unknown")'. Add 'drop_obs = FALSE' if you want to keep the input data but remove the taxon.
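The idea behind the 'mappings' option, pairing a variable in 'tax_data' with a variable in another dataset, amounts to a keyed lookup. The base R sketch below illustrates a mapping like `c("species_id" = "id")`; the `taxon_id` values are made up, and this is not how the package performs the join internally.

```r
# Hypothetical 'tax_data' after classification: one taxon ID per row
species_data <- data.frame(
  species_id = c("A", "B", "C"),
  taxon_id = c("t1", "t2", "t3"),   # made-up taxon IDs
  stringsAsFactors = FALSE
)

# Hypothetical associated dataset sharing the key through its "id" column
abundance <- data.frame(
  id = c("A", "B", "C", "A"),
  counts = c(23, 4, 3, 34),
  stringsAsFactors = FALSE
)

# Mapping "species_id" = "id": look up each row's taxon through the shared key
abundance$taxon_id <- species_data$taxon_id[match(abundance$id, species_data$species_id)]
print(abundance)
```

Rows whose key has no match would receive `NA` from `match()`.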
Other parsers: extract_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
## Not run:
# Look up taxon names in vector from NCBI
lookup_tax_data(c("homo sapiens", "felis catus", "Solanaceae"), type = "taxon_name")
# Look up taxon names in list from NCBI
lookup_tax_data(list("homo sapiens", "felis catus", "Solanaceae"), type = "taxon_name")
# Look up taxon names in table from NCBI
my_table <- data.frame(name = c("homo sapiens", "felis catus"), decency = c("meh", "good"))
lookup_tax_data(my_table, type = "taxon_name", column = "name")
# Look up taxon names from NCBI with fuzzy matching
lookup_tax_data(c("homo sapienss", "feles catus", "Solanacese"), type = "fuzzy_name")
# Look up taxon names from a different database
lookup_tax_data(c("homo sapiens", "felis catus", "Solanaceae"), type = "taxon_name", database = "ITIS")
# Prevent asking questions for ambiguous taxon names
lookup_tax_data(c("homo sapiens", "felis catus", "Solanaceae"), type = "taxon_name", database = "ITIS", ask = FALSE)
# Look up taxon IDs from NCBI
lookup_tax_data(c("9689", "9694", "9643"), type = "taxon_id")
# Look up sequence IDs from NCBI
lookup_tax_data(c("AB548412", "FJ358423", "DQ334818"), type = "seq_id")
# Make up new taxon IDs instead of using the downloaded ones
lookup_tax_data(c("AB548412", "FJ358423", "DQ334818"), type = "seq_id", use_database_ids = FALSE)
# --- Parsing multiple datasets at once (advanced) ---
# The rest is one example for how to classify multiple datasets at once.
# Make example data with taxonomic classifications
species_data <- data.frame(tax = c("Mammalia;Carnivora;Felidae",
                                   "Mammalia;Carnivora;Felidae",
                                   "Mammalia;Carnivora;Ursidae"),
                           species = c("Panthera leo", "Panthera tigris", "Ursus americanus"),
                           species_id = c("A", "B", "C"))
# Make example data associated with the taxonomic data
# Note how this does not contain classifications, but
# does have a variable in common with "species_data" ("id" = "species_id")
abundance <- data.frame(id = c("A", "B", "C", "A", "B", "C"),
                        sample_id = c(1, 1, 1, 2, 2, 2),
                        counts = c(23, 4, 3, 34, 5, 13))
# Make another related data set named by species id
common_names <- c(A = "Lion", B = "Tiger", C = "Bear", "Oh my!")
# Make another related data set with no names
foods <- list(c("ungulates", "boar"), c("ungulates", "boar"), c("salmon", "fruit", "nuts"))
# Make a taxmap object with these three datasets
x = lookup_tax_data(species_data,
                    type = "taxon_name",
                    datasets = list(counts = abundance, my_names = common_names, foods = foods),
                    mappings = c("species_id" = "id", "species_id" = "{{name}}", "{{index}}" = "{{index}}"),
                    column = "species")
# Note how all the datasets have taxon ids now
x$data
# This allows for complex mappings between variables that other functions use
map_data(x, my_names, foods)
map_data(x, counts, my_names)
## End(Not run)
Attempts to save the abundance matrix stored as a table in a taxmap object in the
dada2 ASV abundance matrix format. If the taxmap object was created using
parse_dada2, then it should be able to replicate the format
exactly with the default settings.
make_dada2_asv_table(obj, asv_table = "asv_table", asv_id = "asv_id")
obj |
A taxmap object |
asv_table |
The name of the abundance matrix in the taxmap object to use. |
asv_id |
The name of the column in |
A numeric matrix with rows as samples and columns as ASVs.
Other writers: make_dada2_tax_table(), write_greengenes(), write_mothur_taxonomy(), write_rdp(), write_silva_fasta(), write_unite_general()
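The shape of the value returned by make_dada2_asv_table (a numeric matrix with samples as rows and ASVs as columns) can be reproduced from a per-ASV table in base R. The `asv_table` data frame and its sample columns `s1` and `s2` are hypothetical, and the sketch does not claim to match the package's conversion step for step.

```r
# Hypothetical per-ASV abundance table: one row per ASV, one column per sample
asv_table <- data.frame(
  asv_id = c("ASV1", "ASV2"),
  s1 = c(10, 0),
  s2 = c(3, 7),
  stringsAsFactors = FALSE
)

# Transpose the count columns so that samples become rows and ASVs become columns
asv_matrix <- t(as.matrix(asv_table[, c("s1", "s2")]))
colnames(asv_matrix) <- asv_table$asv_id
print(asv_matrix)
```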
Attempts to save the taxonomy information associated with an abundance matrix in a taxmap object
in the dada2 taxonomy matrix format. If the taxmap object was created using
parse_dada2, then it should be able to replicate the format exactly with the
default settings.
make_dada2_tax_table(obj, asv_table = "asv_table", asv_id = "asv_id")
obj |
A taxmap object |
asv_table |
The name of the abundance matrix in the taxmap object to use. |
asv_id |
The name of the column in |
A character matrix with rows as ASVs and columns as taxonomic ranks.
Other writers: make_dada2_asv_table(), write_greengenes(), write_mothur_taxonomy(), write_rdp(), write_silva_fasta(), write_unite_general()
Creates a named vector that maps the values of two variables associated with taxa in a [taxonomy()] or [taxmap()] object. Both values must be named by taxon ids.
obj$map_data(from, to, warn = TRUE) map_data(obj, from, to, warn = TRUE)
obj |
The [taxonomy()] or [taxmap()] object. |
from |
The value used to name the output. There will be one output value for each value in 'from'. Any variable that appears in [all_names()] can be used as if it was a variable on its own. |
to |
The value returned in the output. Any variable that appears in [all_names()] can be used as if it was a variable on its own. |
warn |
If 'TRUE', issue a warning if there are multiple unique values of 'to' for each value of 'from'. |
A vector of 'to' values named by values in 'from'.
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Mapping between two variables in `all_names(ex_taxmap)`
map_data(ex_taxmap, from = taxon_names, to = n_legs > 0)

# Mapping with external variables
x = c("d" = "looks like a cat",
      "h" = "big scary cats",
      "i" = "smaller cats",
      "m" = "might eat you",
      "n" = "Meow! (Feed me!)")
map_data(ex_taxmap, from = taxon_names, to = x)
Creates a named vector that maps the values of two variables associated with taxa in a [taxonomy()] or [taxmap()] object without using Non-Standard Evaluation (NSE). Both values must be named by taxon ids. This is the same as [map_data()] without NSE and can be useful in some odd cases where NSE fails to work as expected.
obj$map_data_(from, to) map_data_(obj, from, to)
obj |
The [taxonomy()] or [taxmap()] object. |
from |
The value used to name the output. There will be one output value for each value in 'from'. |
to |
The value returned in the output. |
A vector of 'to' values named by values in 'from'.
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
x = c("d" = "looks like a cat",
      "h" = "big scary cats",
      "i" = "smaller cats",
      "m" = "might eat you",
      "n" = "Meow! (Feed me!)")
map_data_(ex_taxmap, from = ex_taxmap$taxon_names(), to = x)
A package for planning and analysis of amplicon metagenomics research projects.
The goal of the metacoder package is to provide a set of tools for:
* Standardized parsing of taxonomic information from diverse resources.
* Visualization of statistics distributed over taxonomic classifications.
* Evaluating potential metabarcoding primers for taxonomic specificity.
* Providing flexible functions for analyzing taxonomic and abundance data.
To accomplish these goals, metacoder leverages resources from other R packages, interfaces with external programs, and provides novel functions where needed to allow for entire analyses within R.
The full documentation can be found online at https://grunwaldlab.github.io/metacoder_documentation/.
There is also a short vignette included for offline use that can be accessed by the following code:
browseVignettes(package = "metacoder")
Plotting:
In silico PCR:
Analysis:
Parsers:
Writers:
Database querying:
These are the classes users would typically interact with:
* [taxon]: A class used to define a single taxon. Many other classes in the 'taxa' package include one or more objects of this class.
* : Stores one or more [taxon] objects. This is just a thin wrapper for a list of [taxon] objects.
* [hierarchy]: A class containing an ordered list of [taxon] objects that represent a hierarchical classification.
* [hierarchies]: A list of taxonomic classifications. This is just a thin wrapper for a list of [hierarchy] objects.
* [taxonomy]: A taxonomy composed of [taxon] objects organized in a tree structure. This differs from the [hierarchies] class in how the [taxon] objects are stored. Unlike a [hierarchies] object, each unique taxon is stored only once and the relationships between taxa are stored in an edgelist.
* [taxmap]: A class designed to store a taxonomy and associated user-defined data. This class builds on the [taxonomy] class. User-defined data can be stored in the list 'obj$data', where 'obj' is a taxmap object. Any number of user-defined lists, vectors, or tables mapped to taxa can be manipulated in a cohesive way such that relationships between taxa and data are preserved.
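The difference between storing whole classifications (as in [hierarchies]) and storing each taxon once with an edge list (as in [taxonomy]) can be sketched in base R. This is only an illustration, not the package's implementation: the column names 'from' and 'to' are assumptions, and taxa are keyed by name here for brevity, whereas the package keys them by taxon ID.

```r
# Two classifications that share their first two taxa
classifications <- list(
  c("Mammalia", "Carnivora", "Felidae"),
  c("Mammalia", "Carnivora", "Ursidae")
)

# Turn each classification into supertaxon -> subtaxon pairs (NA marks a root)
edges <- do.call(rbind, lapply(classifications, function(x) {
  data.frame(from = c(NA, head(x, -1)), to = x, stringsAsFactors = FALSE)
}))

# Dropping repeated pairs stores each unique taxon only once
edge_list <- unique(edges)
edge_list
```

The shared taxa "Mammalia" and "Carnivora" appear twice in 'classifications' but only once each in the 'to' column of 'edge_list'.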
These classes are mostly components for the larger classes above and would not typically be used on their own.
* [taxon_database]: Used to store information about taxonomy databases.
* [taxon_id]: Used to store taxon IDs, either arbitrary or from a particular taxonomy database.
* [taxon_name]: Used to store taxon names, either arbitrary or from a particular taxonomy database.
* [taxon_rank]: Used to store taxon ranks (e.g. species, family), either arbitrary or from a particular taxonomy database.
These are some of the more important functions used to filter data in classes that store multiple taxa, like [hierarchies], [taxmap], and [taxonomy].
* [filter_taxa]: Filter taxa in a [taxonomy] or [taxmap] object with a series of conditions. Relationships between remaining taxa and user-defined data are preserved (there are many options controlling this).
* [filter_obs]: Filter user-defined data in a [taxmap] object with a series of conditions. Relationships between remaining taxa and user-defined data are preserved (there are many options controlling this).
* [sample_n_taxa]: Randomly sample taxa. Has the same abilities as [filter_taxa].
* [sample_n_obs]: Randomly sample observations. Has the same abilities as [filter_obs].
* [mutate_obs]: Add datasets or columns to datasets in the 'data' list of [taxmap] objects.
* [pick]: Pick out specific taxa, while others are dropped in [hierarchy] and [hierarchies] objects.
* [pop]: Pop out taxa (drop them) in [hierarchy] and [hierarchies] objects.
* [span]: Select a range of taxa, either by two names, or relational operators in [hierarchy] and [hierarchies] objects.
There are lots of functions for getting information for each taxon.
* [subtaxa]: Return data for the subtaxa of each taxon in a [taxonomy] or [taxmap] object.
* [supertaxa]: Return data for the supertaxa of each taxon in a [taxonomy] or [taxmap] object.
* [roots]: Return data for the roots of each taxon in a [taxonomy] or [taxmap] object.
* [leaves]: Return data for the leaves of each taxon in a [taxonomy] or [taxmap] object.
* [obs]: Return user-specific data for each taxon and all of its subtaxa in a [taxonomy] or [taxmap] object.
Note, this is mostly of interest to developers and advanced users.
The classes in the 'taxa' package are mostly [R6](https://adv-r.hadley.nz/r6.html) classes ([R6Class]). A few of the simpler ones ( and [hierarchies]) are [S3](https://adv-r.hadley.nz/s3.html) instead. R6 classes are different than most R objects because they are [mutable](https://en.wikipedia.org/wiki/Immutable_object) (e.g. a function can change its input without returning it). In this, they are more similar to class systems in [object-oriented](https://en.wikipedia.org/wiki/Object-oriented_programming) languages like Python. As in other object-oriented class systems, functions are thought to "belong" to classes (i.e. the data), rather than functions existing independently of the data. For example, the function 'print' in R exists apart from what it is printing, although it will change how it prints based on the class of the data passed to it. In fact, a user can make a custom print method for their own class by defining a function called 'print.myclassname'. In contrast, the functions that operate on R6 objects are "packaged" with the data they operate on. For example, a print method of an object for an R6 class might be called like 'my_data$print()' instead of 'print(my_data)'.
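The contrast between an S3 method and an R6 method can be sketched with a toy class. The class 'Counter' and its 'add' method are hypothetical and exist only for this illustration; the sketch assumes the 'R6' package is installed.

```r
library(R6)

# S3: the generic `print()` lives apart from the data and dispatches on class
my_s3 <- structure(list(n = 3), class = "myclassname")
print.myclassname <- function(x, ...) {
  cat("<myclassname with n =", x$n, ">\n")
  invisible(x)
}
print(my_s3)

# R6: methods are packaged with the data and can change the object in place
Counter <- R6Class("Counter",
  public = list(
    n = 0,
    add = function(by = 1) {
      self$n <- self$n + by
      invisible(self)
    },
    print = function(...) {
      cat("<Counter with n =", self$n, ">\n")
      invisible(self)
    }
  )
)

my_data <- Counter$new()
my_data$add(2)    # modifies `my_data` without any reassignment
my_data$print()
```

Note that 'my_data$add(2)' changes 'my_data' even though its result is never assigned, which is the mutability described above.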
Note, you will need to read the previous section to fully understand this one.
Since the R6 function syntax (e.g. 'my_data$print()') might be confusing to many R users, all functions in 'taxa' also have S3 versions. For example, the [filter_taxa()] function can be called on a [taxmap] object called 'my_obj' like 'my_obj$filter_taxa(...)' (the R6 syntax) or 'filter_taxa(my_obj, ...)' (the S3 syntax). For some functions, these two ways of calling the function can have different effects. For functions that do not return a modified version of the input (e.g. [subtaxa()]), the two ways have identical behavior. However, functions like [filter_taxa()], which modify their inputs, actually change the object passed to them as the first argument as well as returning that object. For example,
'my_obj <- filter_taxa(my_obj, ...)'
and
'my_obj$filter_taxa(...)'
and
'new_obj <- my_obj$filter_taxa(...)'
all replace 'my_obj' with the filtered result, but
'new_obj <- filter_taxa(my_obj, ...)'
will not modify 'my_obj'.
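The behavior described above can be checked directly. This is a minimal sketch that assumes the 'ex_taxmap' example object shipped with the package; the filter 'n_supertaxa > 0' is used only because it removes some taxa, and '$clone(deep = TRUE)' is a standard R6 feature used here so the example data itself is left untouched.

```r
library(metacoder)

# Work on a copy so the packaged example object is not altered
my_obj <- ex_taxmap$clone(deep = TRUE)
n_before <- length(taxon_ids(my_obj))

# S3 syntax: a filtered copy is returned and `my_obj` keeps all of its taxa
new_obj <- filter_taxa(my_obj, n_supertaxa > 0)
length(taxon_ids(my_obj)) == n_before

# R6 syntax: `my_obj` itself is filtered, even without reassignment
my_obj$filter_taxa(n_supertaxa > 0)
length(taxon_ids(my_obj)) < n_before
```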
This is a rather advanced topic.
Like packages such as 'ggplot2' and [dplyr], the 'taxa' package uses non-standard evaluation to allow code to be more readable and shorter. In effect, there are variables that only "exist" inside a function call and depend on what is passed to that function as the first parameter (usually a class object). For example, in the 'dplyr' function [filter()], column names can be used as if they were independent variables. See '?dplyr::filter' for examples of this. The 'taxa' package builds on this idea.
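The general mechanism behind this kind of non-standard evaluation can be sketched in base R. The helper 'filter_rows' below is hypothetical and is not how 'dplyr' or 'taxa' are implemented; it only shows how a variable can "exist" solely inside a function call.

```r
# Hypothetical helper for illustration; not part of 'taxa' or 'dplyr'
filter_rows <- function(data, condition) {
  # Capture the unevaluated expression, then evaluate it with the columns of
  # `data` visible as variables; other names are looked up in the caller
  keep <- eval(substitute(condition), envir = data, enclos = parent.frame())
  data[keep, , drop = FALSE]
}

animals <- data.frame(name = c("lion", "tiger", "bear"),
                      weight = c(190, 220, 270),
                      stringsAsFactors = FALSE)
min_weight <- 200

# `weight` only "exists" inside the call; `min_weight` comes from the caller
heavy <- filter_rows(animals, weight > min_weight)
heavy$name
```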
For many functions that work on [taxonomy] or [taxmap] objects (e.g. [filter_taxa]), some functions that return per-taxon information (e.g. [taxon_names()]) can be referred to by just the name of the function. When one of these functions is referred to by name, the function is run on the relevant object and its value replaces the function name. For example,
'new_obj <- filter_taxa(my_obj, taxon_names == "Bacteria")'
is identical to:
'new_obj <- filter_taxa(my_obj, taxon_names(my_obj) == "Bacteria")'
which is identical to:
'new_obj <- filter_taxa(my_obj, my_obj$taxon_names() == "Bacteria")'
which is identical to:
'my_names <- taxon_names(my_obj)'
'new_obj <- filter_taxa(my_obj, my_names == "Bacteria")'
For 'taxmap' objects, you can also use the names of user-defined lists and vectors, and the names of columns in user-defined tables, that are stored in the 'obj$data' list. See [filter_taxa()] for examples. You can even add your own functions that are called by name by adding them to the 'obj$funcs' list. For any object with functions that use non-standard evaluation, you can see what values can be used with [all_names()] like 'all_names(obj)'.
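A short check of the equivalence described above, assuming the 'ex_taxmap' example object shipped with the package and its "genus" rank; any other per-taxon function listed by 'all_names()' could be used in the same way.

```r
library(metacoder)

# Names that can be used as if they were variables for this object
available <- all_names(ex_taxmap)
head(available)

# Referring to the function by name and calling it explicitly
# should select the same taxa
by_name <- filter_taxa(ex_taxmap, taxon_ranks == "genus")
by_call <- filter_taxa(ex_taxmap, taxon_ranks(ex_taxmap) == "genus")
identical(taxon_ids(by_name), taxon_ids(by_call))
```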
Various elements of the 'taxa' package were inspired by the [dplyr] and [taxize] packages. This package started as parts of the 'metacoder' and 'binomen' packages. There are also many dependencies that make 'taxa' possible.
Find a problem? Have a suggestion? Have a question? Please submit an issue at our [GitHub repository](https://github.com/ropensci/taxa):
[https://github.com/ropensci/taxa/issues](https://github.com/ropensci/taxa/issues)
A GitHub account is free and easy to set up. We welcome feedback! If you don't want to use GitHub for some reason, feel free to email us. We do prefer posting to GitHub since it allows others who might have the same issue to see our conversation. It also helps us keep track of what problems we need to address.
Want to contribute code or make a change to the code? Great, thank you! Please [fork](https://help.github.com/articles/fork-a-repo/) our GitHub repository and submit a [pull request](https://help.github.com/articles/about-pull-requests/).
Zachary Foster and Niklaus Grunwald
Add columns to tables in 'obj$data' in [taxmap()] objects. See [dplyr::mutate()] for the inspiration for this function and more information. Calling the function using the 'obj$mutate_obs(...)' style edits "obj" in place, unlike most R functions. However, calling the function using the 'mutate_obs(obj, ...)' style imitates R's traditional copy-on-modify semantics, so "obj" would not be changed; instead a changed version would be returned, like most R functions.
obj$mutate_obs(data, ...) mutate_obs(obj, data, ...)
obj |
An object of type [taxmap()] |
data |
Dataset name, index, or a logical vector that indicates which dataset in 'obj$data' to add columns to. |
... |
One or more named columns to add. Newly created columns can be referenced in the same function call. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
target |
DEPRECATED. Use "data" instead. |
An object of type [taxmap()]
Other taxmap manipulation functions: arrange_obs(), arrange_taxa(), filter_obs(), filter_taxa(), sample_frac_obs(), sample_frac_taxa(), sample_n_obs(), sample_n_taxa(), select_obs(), transmute_obs()
# Add column to existing tables
mutate_obs(ex_taxmap, "info",
           new_col = "Im new",
           newer_col = paste0(new_col, "er!"))

# Create columns in a new table
mutate_obs(ex_taxmap, "new_table",
           nums = 1:10,
           squared = nums ^ 2)

# Add a new vector
mutate_obs(ex_taxmap, "new_vector", 1:10)

# Add a new list
mutate_obs(ex_taxmap, "new_list", list(1, 2))
Get number of leaves for each taxon in an object of type [taxonomy()] or [taxmap()]
obj$n_leaves() n_leaves(obj)
obj |
([taxonomy()] or [taxmap()]) |
numeric
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Get number of leaves for each taxon
n_leaves(ex_taxmap)

# Filter taxa based on number of leaves
filter_taxa(ex_taxmap, n_leaves > 0)
Get number of leaves for each taxon in an object of type [taxonomy()] or [taxmap()], not including leaves of subtaxa etc.
obj$n_leaves_1() n_leaves_1(obj)
obj |
([taxonomy()] or [taxmap()]) |
numeric
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Get number of leaves for each taxon
n_leaves_1(ex_taxmap)

# Filter taxa based on number of leaves
filter_taxa(ex_taxmap, n_leaves_1 > 0)
Count observations for each taxon in a data set in a [taxmap()] object. This includes observations for the specific taxon and the observations of its subtaxa. "Observations" in this sense are the items (for lists/vectors) or rows (for tables) in a dataset. By default, observations in the first data set in the [taxmap()] object are used. For example, if the data set is a table, then a value of 3 for a taxon means that there are 3 rows in that table assigned to that taxon or one of its subtaxa.
obj$n_obs(data) n_obs(obj, data)
obj |
([taxmap()]) |
data |
Dataset name, index, or a logical vector that indicates which dataset in 'obj$data' to count observations in. |
target |
DEPRECATED. Use "data" instead. |
'numeric'
Other taxmap data functions: n_obs_1()
# Get number of observations for each taxon in first dataset
n_obs(ex_taxmap)

# Get number of observations in a specified data set
n_obs(ex_taxmap, "info")
n_obs(ex_taxmap, "abund")

# Filter taxa using number of observations in the first table
filter_taxa(ex_taxmap, n_obs > 1)
Count observations for each taxon in a data set in a [taxmap()] object. This includes observations for the specific taxon but NOT the observations of its subtaxa. "Observations" in this sense are the items (for lists/vectors) or rows (for tables) in a dataset. By default, observations in the first data set in the [taxmap()] object are used. For example, if the data set is a table, then a value of 3 for a taxon means that there are 3 rows in that table assigned to that taxon.
obj$n_obs_1(data) n_obs_1(obj, data)
obj |
([taxmap()]) |
data |
Dataset name, index, or a logical vector that indicates which dataset in 'obj$data' to count observations in. |
target |
DEPRECATED. Use "data" instead. |
'numeric'
Other taxmap data functions: n_obs()
# Get number of observations for each taxon in first dataset
n_obs_1(ex_taxmap)

# Get number of observations in a specified data set
n_obs_1(ex_taxmap, "info")
n_obs_1(ex_taxmap, "abund")

# Filter taxa using number of observations in the first table
filter_taxa(ex_taxmap, n_obs_1 > 0)
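The relationship between [n_obs()] and [n_obs_1()] can be checked with a small sketch. It assumes the 'ex_taxmap' example object and its "info" table, and uses [is_leaf()] to pick out taxa without subtaxa.

```r
library(metacoder)

with_subtaxa <- n_obs(ex_taxmap, "info")    # taxon plus all of its subtaxa
direct_only  <- n_obs_1(ex_taxmap, "info")  # the taxon itself only

# Counts that include subtaxa can never be smaller than the direct counts
all(with_subtaxa >= direct_only)

# For taxa without subtaxa, the two counts are the same
leaf <- is_leaf(ex_taxmap)
all(with_subtaxa[leaf] == direct_only[leaf])
```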
Get number of subtaxa for each taxon in an object of type [taxonomy()] or [taxmap()]
obj$n_subtaxa() n_subtaxa(obj)
obj |
([taxonomy()] or [taxmap()]) |
numeric
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Count number of subtaxa within each taxon
n_subtaxa(ex_taxmap)

# Filter taxa based on number of subtaxa
# (this command removes all leaves or "tips" of the tree)
filter_taxa(ex_taxmap, n_subtaxa > 0)
Get number of immediate subtaxa for each taxon in an object of type [taxonomy()] or [taxmap()], not including subtaxa of subtaxa etc.
obj$n_subtaxa_1() n_subtaxa_1(obj)
obj |
([taxonomy()] or [taxmap()]) |
numeric
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Count number of immediate subtaxa in each taxon
n_subtaxa_1(ex_taxmap)

# Filter taxa based on number of subtaxa
# (this command removes all leaves or "tips" of the tree)
filter_taxa(ex_taxmap, n_subtaxa_1 > 0)
Get number of supertaxa for each taxon in an object of type [taxonomy()] or [taxmap()].
obj$n_supertaxa() n_supertaxa(obj)
obj |
([taxonomy()] or [taxmap()]) |
numeric
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Count number of supertaxa that contain each taxon
n_supertaxa(ex_taxmap)

# Filter taxa based on the number of supertaxa
# (this command removes all root taxa)
filter_taxa(ex_taxmap, n_supertaxa > 0)
Get number of immediate supertaxa (i.e. not supertaxa of supertaxa, etc) for each taxon in an object of type [taxonomy()] or [taxmap()]. This should always be either 1 or 0.
obj$n_supertaxa_1() n_supertaxa_1(obj)
obj |
([taxonomy()] or [taxmap()]) |
numeric
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), taxon_ids(), taxon_indexes(), taxon_names(), taxon_ranks()
# Test for the presence of supertaxa containing each taxon
n_supertaxa_1(ex_taxmap)

# Filter taxa based on the presence of supertaxa
# (this command removes all root taxa)
filter_taxa(ex_taxmap, n_supertaxa_1 > 0)
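The statements made about these counting functions can be verified with a few checks. This is a sketch that assumes the 'ex_taxmap' example object; [is_root()] is used to identify taxa with no supertaxa.

```r
library(metacoder)

all_levels <- n_supertaxa(ex_taxmap)    # every supertaxon up to the root
one_level  <- n_supertaxa_1(ex_taxmap)  # the immediate supertaxon only

# A taxon has at most one immediate supertaxon
all(one_level %in% c(0, 1))

# Taxa with no supertaxa at all are exactly the roots
all((all_levels == 0) == is_root(ex_taxmap))

# Immediate subtaxa are a subset of all subtaxa
all(n_subtaxa(ex_taxmap) >= n_subtaxa_1(ex_taxmap))
```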
Downloads a sample of sequences meant to evenly capture the diversity of a given taxon. Can be used to get a shallow sampling of vast groups.
CAUTION: This function can make MANY queries to Genbank depending on arguments given and can take a very long time. Choose your arguments carefully to avoid long waits and needlessly stressing NCBI's servers. Use a downloaded database and a parser from the taxa package when possible.
ncbi_taxon_sample( name = NULL, id = NULL, target_rank, min_counts = NULL, max_counts = NULL, interpolate_min = TRUE, interpolate_max = TRUE, min_children = NULL, max_children = NULL, seqrange = "1:3000", getrelated = FALSE, fuzzy = TRUE, limit = 10, entrez_query = NULL, hypothetical = FALSE, verbose = TRUE )
name |
( |
id |
( |
target_rank |
( |
min_counts |
(named |
max_counts |
(named |
interpolate_min |
( |
interpolate_max |
( |
min_children |
(named |
max_children |
(named |
seqrange |
(character) Sequence range, as e.g., "1:1000". This is the range of sequence lengths to search for. So "1:1000" means search for sequences from 1 to 1000 characters in length. |
getrelated |
(logical) If TRUE, gets the longest sequences of a species in the same genus as the one searched for. If FALSE, returns nothing if no match found. |
fuzzy |
(logical) Whether to do fuzzy taxonomic ID search or exact
search. If |
limit |
( |
entrez_query |
( |
hypothetical |
( |
verbose |
( |
## Not run:
# Look up 5 ITS sequences from each fungal class
data <- ncbi_taxon_sample(name = "Fungi", target_rank = "class", limit = 5,
                          entrez_query = '"internal transcribed spacer"[All Fields]')

# Look up taxonomic information for sequences
obj <- lookup_tax_data(data, type = "seq_id", column = "gi_no")

# Plot information
filter_taxa(obj, taxon_names == "Fungi", subtaxa = TRUE) %>%
  heat_tree(node_label = taxon_names, node_color = n_obs, node_size = n_obs)

## End(Not run)
Given a [taxmap()] object, return data associated with each taxon in a given table included in that [taxmap()] object.
obj$obs(data, value = NULL, subset = NULL, recursive = TRUE, simplify = FALSE) obs(obj, data, value = NULL, subset = NULL, recursive = TRUE, simplify = FALSE)
obj |
([taxmap()]) The [taxmap()] object containing taxon information to be queried. |
data |
Either the name of something in 'obj$data' that has taxon information or an external object with taxon information. For tables, there must be a column named "taxon_id" and lists/vectors must be named by taxon ID. |
value |
What data to return. This is usually the name of a column in a table in 'obj$data'. Any result of 'all_names(obj)' can be used. If the value used has names, it is assumed that the names are taxon ids and the taxon ids are used to look up the correct values. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to find observations for. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
recursive |
('logical' or 'numeric') If 'FALSE', only return the observations assigned to the specified input taxa, not subtaxa. If 'TRUE', return all the observations of every subtaxon, etc. Positive numbers indicate the number of ranks below each taxon to get observations for. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
simplify |
('logical') If 'TRUE', then combine all the results into a single vector of unique observation indexes. |
If 'simplify = FALSE', then a list of vectors of observation indexes is returned corresponding to the 'data' argument. If 'simplify = TRUE', then the observation indexes for all 'data' taxa are returned in a single vector.
# Get indexes of rows corresponding to each taxon
obs(ex_taxmap, "info")

# Get only a subset of taxon indexes
obs(ex_taxmap, "info", subset = 1:2)

# Get only a subset of taxon IDs
obs(ex_taxmap, "info", subset = c("b", "c"))

# Get only a subset of taxa using logical tests
obs(ex_taxmap, "info", subset = taxon_ranks == "genus")

# Only return indexes of rows assigned to each taxon explicitly
obs(ex_taxmap, "info", recursive = FALSE)

# Lump all row indexes in a single vector
obs(ex_taxmap, "info", simplify = TRUE)

# Return values from a dataset instead of indexes
obs(ex_taxmap, "info", value = "name")
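The effect of the 'recursive' option can be seen by comparing the two results taxon by taxon. This is a sketch that assumes the 'ex_taxmap' example object and its "info" table.

```r
library(metacoder)

recursive_obs <- obs(ex_taxmap, "info")                     # includes subtaxa
direct_obs    <- obs(ex_taxmap, "info", recursive = FALSE)  # taxon itself only

# Every directly assigned observation also appears in the recursive result
contained <- mapply(function(d, r) all(d %in% r), direct_obs, recursive_obs)
all(contained)

# Number of observations each taxon gains from its subtaxa
lengths(recursive_obs) - lengths(direct_obs)
```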
Apply a function to data for the observations for each taxon. This is similar to using [obs()] with [lapply()] or [sapply()].
obj$obs_apply(data, func, simplify = FALSE, value = NULL, subset = NULL, recursive = TRUE, ...) obs_apply(obj, data, func, simplify = FALSE, value = NULL, subset = NULL, recursive = TRUE, ...)
obj |
The [taxmap()] object containing taxon information to be queried. |
data |
Either the name of something in 'obj$data' that has taxon information or an external object with taxon information. For tables, there must be a column named "taxon_id" and lists/vectors must be named by taxon ID. |
func |
('function') The function to apply. |
simplify |
('logical') If 'TRUE', convert lists to vectors. |
value |
What data to give to the function. This is usually the name of a column in a table in 'obj$data'. Any result of 'all_names(obj)' can be used, but it usually only makes sense to use columns in the dataset specified by the 'data' option. By default, the indexes of observations in 'data' are returned. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to use. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
recursive |
('logical' or 'numeric') If 'FALSE', only return the observations assigned to the specified input taxa, not subtaxa. If 'TRUE', return all the observations of every subtaxon, etc. Positive numbers indicate the number of ranks below each taxon to get observations for. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
... |
Extra arguments are passed to the function. |
# Find the average number of legs in each taxon
obs_apply(ex_taxmap, "info", mean, value = "n_legs", simplify = TRUE)

# One way to implement `n_obs` and find the number of observations per taxon
obs_apply(ex_taxmap, "info", length, simplify = TRUE)
Convert the ASV table and taxonomy table returned by dada2 into a taxmap object. An example of the input format can be found by following the dada2 tutorial here: https://benjjneb.github.io/dada2/tutorial.html
parse_dada2( seq_table, tax_table, class_key = "taxon_name", class_regex = "(.*)", include_match = TRUE )
seq_table |
The ASV abundance matrix, with rows as samples and columns as ASV ids or sequences |
tax_table |
The table with taxonomic classifications for ASVs, with ASVs in rows and taxonomic ranks as columns. |
class_key |
('character' of length 1) The identity of the capturing groups defined using 'class_regex'. The length of 'class_key' must be equal to the number of capturing groups specified in 'class_regex'. Any names added to the terms will be used as column names in the output. At least one '"taxon_name"' must be specified. Only '"info"' can be used multiple times. Each term must be one of those described below: * 'taxon_name': The name of a taxon. Not necessarily unique, but are interpretable by a particular 'database'. Requires an internet connection. * 'taxon_rank': The rank of the taxon. This will be used to add rank info into the output object that can be accessed by 'out$taxon_ranks()'. * 'info': Arbitrary taxon info you want included in the output. Can be used more than once. |
class_regex |
('character' of length 1) A regular expression with capturing groups indicating the locations of data for each taxon in the 'class' term in the 'key' argument. The identity of the information must be specified using the 'class_key' argument. The 'class_sep' option can be used to split the classification into data for each taxon before matching. If 'class_sep' is 'NULL', each match of 'class_regex' defines a taxon in the classification. |
include_match |
('logical' of length 1) If 'TRUE', include the part of the input matched by 'class_regex' in the output object. |
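The input shapes described above can be sketched with toy data. This is only an illustration: the sample names, sequences, and taxa are invented, and the call to `parse_dada2()` runs only if metacoder is installed.

```r
# Toy ASV abundance matrix: rows are samples, columns are ASV sequences.
# All values are made up for illustration.
asvs <- c("ACGTACGT", "TTGACCGA")
seq_table <- matrix(c(10L, 0L,
                      3L,  7L),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("sample_1", "sample_2"), asvs))

# Toy taxonomy matrix: rows are ASVs, columns are taxonomic ranks.
tax_table <- matrix(c("Bacteria", "Firmicutes",
                      "Bacteria", "Proteobacteria"),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(asvs, c("Kingdom", "Phylum")))

# The ASVs that are columns of one table are the rows of the other.
stopifnot(identical(colnames(seq_table), rownames(tax_table)))

# Only attempt the conversion when the package is available.
if (requireNamespace("metacoder", quietly = TRUE)) {
  obj <- metacoder::parse_dada2(seq_table, tax_table)
  print(obj)
}
```

In a real analysis both tables would come from dada2 itself rather than being typed by hand.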
Other parsers: extract_tax_data(), lookup_tax_data(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
Parses the Greengenes database.
parse_greengenes(tax_file, seq_file = NULL)
tax_file |
( |
seq_file |
( |
The taxonomy input file has a format like:
228054 k__Bacteria; p__Cyanobacteria; c__Synechococcophycideae; o__Synech...
844608 k__Bacteria; p__Cyanobacteria; c__Synechococcophycideae; o__Synech...
...
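To make the structure of a taxonomy line concrete, here is a minimal base R sketch that splits one line into its ID and ranked taxa. It assumes the ID and lineage are separated by a tab and uses only the first three ranks of the example; it is not how `parse_greengenes()` is implemented.

```r
# One taxonomy line: an ID, a tab (assumed), then a ranked lineage.
gg_line <- "228054\tk__Bacteria; p__Cyanobacteria; c__Synechococcophycideae"

# Split the ID from the lineage, then the lineage into taxa.
fields <- strsplit(gg_line, "\t", fixed = TRUE)[[1]]
gg_id <- fields[1]
taxa <- strsplit(fields[2], ";\\s*")[[1]]

# Each taxon has the form "<rank letter>__<name>".
gg_taxa <- data.frame(
  rank = sub("__.*$", "", taxa),
  name = sub("^[a-z]__", "", taxa),
  stringsAsFactors = FALSE
)
print(gg_taxa)
```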
The optional sequence file has a format like:
>1111886
AACGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGAGACCTTCGGGTCTAGTGGCGCACGGGTGCGTA...
>1111885
AGAGTTTGATCCTGGCTCAGAATGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGTACGAGAAATCCCGAGC...
...
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
Parse the '*.tax.summary' file that is returned by the 'Classify.seqs' command in mothur.
parse_mothur_tax_summary(file = NULL, text = NULL, table = NULL)
file |
( |
text |
( |
table |
( |
The input file has a format like:
taxlevel rankID taxon daughterlevels total A B C
0 0 Root 2 242 84 84 74
1 0.1 Bacteria 50 242 84 84 74
2 0.1.2 Actinobacteria 38 13 0 13 0
3 0.1.2.3 Actinomycetaceae-Bifidobacteriaceae 10 13 0 13 0
4 0.1.2.3.7 Bifidobacteriaceae 6 13 0 13 0
5 0.1.2.3.7.2 Bifidobacterium_choerinum_et_rel. 8 13 0 13 0
6 0.1.2.3.7.2.1 Bifidobacterium_angulatum_et_rel. 1 11 0 11 0
7 0.1.2.3.7.2.1.1 unclassified 1 11 0 11 0
8 0.1.2.3.7.2.1.1.1 unclassified 1 11 0 11 0
9 0.1.2.3.7.2.1.1.1.1 unclassified 1 11 0 11 0
10 0.1.2.3.7.2.1.1.1.1.1 unclassified 1 11 0 11 0
11 0.1.2.3.7.2.1.1.1.1.1.1 unclassified 1 11 0 11 0
12 0.1.2.3.7.2.1.1.1.1.1.1.1 unclassified 1 11 0 11 0
6 0.1.2.3.7.2.5 Bifidobacterium_longum_et_rel. 1 2 0 2 0
7 0.1.2.3.7.2.5.1 unclassified 1 2 0 2 0
8 0.1.2.3.7.2.5.1.1 unclassified 1 2 0 2 0
9 0.1.2.3.7.2.5.1.1.1 unclassified 1 2 0 2 0
or
taxon total A B C
"k__Bacteria";"p__Actinobacteria";"c__Actinobacteria";... 1 0 1 0
"k__Bacteria";"p__Actinobacteria";"c__Actinobacteria";... 1 0 1 0
"k__Bacteria";"p__Actinobacteria";"c__Actinobacteria";... 1 0 1 0
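In the first format, the `rankID` column appears to encode the hierarchy: dropping the last `.<number>` from a rankID gives the rankID of the parent. The base R sketch below illustrates that reading on the first few rows. It assumes whitespace-delimited input and is not the code used by `parse_mothur_tax_summary()`.

```r
# The first rows of the first format, assumed to be whitespace-delimited.
summary_text <- "taxlevel rankID taxon daughterlevels total A B C
0 0 Root 2 242 84 84 74
1 0.1 Bacteria 50 242 84 84 74
2 0.1.2 Actinobacteria 38 13 0 13 0
3 0.1.2.3 Actinomycetaceae-Bifidobacteriaceae 10 13 0 13 0"

# Read rankID as text so that "0.1" is not converted to a number.
tax_summary <- read.table(text = summary_text, header = TRUE,
                          colClasses = c(rankID = "character"),
                          stringsAsFactors = FALSE)

# Drop the last ".<number>" to get the parent's rankID; the root has none.
parent_id <- ifelse(grepl(".", tax_summary$rankID, fixed = TRUE),
                    sub("\\.[0-9]+$", "", tax_summary$rankID),
                    NA_character_)
tax_summary$parent <- tax_summary$taxon[match(parent_id, tax_summary$rankID)]
print(tax_summary[, c("rankID", "taxon", "parent")])
```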
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
Parse the '*.taxonomy' file that is returned by the 'Classify.seqs' command in mothur. If confidence scores are present, they are included in the output.
parse_mothur_taxonomy(file = NULL, text = NULL)
file |
( |
text |
( |
The input file has a format like:
AY457915 Bacteria(100);Firmicutes(99);Clostridiales(99);Johnsone...
AY457914 Bacteria(100);Firmicutes(100);Clostridiales(100);Johnso...
AY457913 Bacteria(100);Firmicutes(100);Clostridiales(100);Johnso...
AY457912 Bacteria(100);Firmicutes(99);Clostridiales(99);Johnsone...
AY457911 Bacteria(100);Firmicutes(99);Clostridiales(98);Ruminoco...
or...
AY457915 Bacteria;Firmicutes;Clostridiales;Johnsonella_et_rel.;J...
AY457914 Bacteria;Firmicutes;Clostridiales;Johnsonella_et_rel.;J...
AY457913 Bacteria;Firmicutes;Clostridiales;Johnsonella_et_rel.;J...
AY457912 Bacteria;Firmicutes;Clostridiales;Johnsonella_et_rel.;J...
AY457911 Bacteria;Firmicutes;Clostridiales;Ruminococcus_et_rel.;...
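The difference between the two formats is the optional confidence score in parentheses after each taxon. A small base R sketch of separating names from scores follows. The helper `split_classification()` is hypothetical, written only for this illustration, and the classifications are shortened to three taxa.

```r
# One classification in each of the two formats, shortened to three taxa.
with_scores <- "Bacteria(100);Firmicutes(99);Clostridiales(99)"
without_scores <- "Bacteria;Firmicutes;Clostridiales"

# Hypothetical helper: split a classification into names and optional scores.
split_classification <- function(x) {
  taxa <- strsplit(x, ";", fixed = TRUE)[[1]]
  has_score <- grepl("\\([0-9]+\\)$", taxa)
  # Scores stay NA when a taxon has no "(number)" suffix.
  score <- rep(NA_real_, length(taxa))
  score[has_score] <- as.numeric(
    sub("^.*\\(([0-9]+)\\)$", "\\1", taxa[has_score])
  )
  data.frame(name = sub("\\([0-9]+\\)$", "", taxa),
             score = score,
             stringsAsFactors = FALSE)
}

print(split_classification(with_scores))
print(split_classification(without_scores))
```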
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
Parse a Newick file into a taxmap object.
parse_newick(file = NULL, text = NULL)
file |
( |
text |
( |
The input file has a format like:
(ant:17, (bat:31, cow:22):7, dog:22, (elk:33, fox:12):40);
(dog:20, (elephant:30, horse:60):20):50;
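As a rough guide to reading this format, each leaf is written as `name:branch_length`, while a branch length that follows a closing parenthesis belongs to an internal node. The base R sketch below pulls out only the leaves of the first tree with a regular expression. It is not a Newick parser and says nothing about how `parse_newick()` works internally.

```r
# The first tree from the example.
newick <- "(ant:17, (bat:31, cow:22):7, dog:22, (elk:33, fox:12):40);"

# Leaves look like "name:length"; lengths after ")" have no name and are skipped.
leaf_matches <- regmatches(newick, gregexpr("[A-Za-z_]+:[0-9.]+", newick))[[1]]
leaves <- data.frame(
  name = sub(":.*$", "", leaf_matches),
  branch_length = as.numeric(sub("^.*:", "", leaf_matches)),
  stringsAsFactors = FALSE
)
print(leaves)
```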
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
Parses a phylo object from the ape package.
parse_phylo(obj)
obj |
A phylo object from the ape package. |
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
Converts a phyloseq object to a taxmap object.
parse_phyloseq(obj, class_regex = "(.*)", class_key = "taxon_name")
obj |
A phyloseq object |
class_regex |
A regular expression used to parse data in the taxon
names. There must be a capture group (a pair of parentheses) for each item
in |
class_key |
('character') The identity of the capturing groups defined using 'class_regex'. The length of 'class_key' must be equal to the number of capturing groups specified in 'class_regex'. Any names added to the terms will be used as column names in the output. At least one '"taxon_name"' must be specified. Only '"info"' can be used multiple times. Each term must be one of those described below: * 'taxon_name': The name of a taxon. Not necessarily unique, but interpretable by a particular 'database'. Requires an internet connection. * 'taxon_rank': The rank of the taxon. This will be used to add rank info into the output object that can be accessed by 'out$taxon_ranks()'. * 'info': Arbitrary taxon info you want included in the output. Can be used more than once. |
A taxmap object
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
## Not run:
# Install phyloseq to get example data
# install.packages('BiocManager')
# BiocManager::install('phyloseq')

# Parse example dataset
library(phyloseq)
data(GlobalPatterns)
x <- parse_phyloseq(GlobalPatterns)

# Plot data
heat_tree(x,
          node_size = n_obs,
          node_color = n_obs,
          node_label = taxon_names,
          tree_label = taxon_names)

## End(Not run)
Parses the output file from EMBOSS primersearch into a data.frame with rows corresponding to predicted amplicons and their associated information.
parse_primersearch(file_path)
file_path |
The path to a primersearch output file. |
A data frame with each row corresponding to amplicon data
Parses a file in BIOM format from QIIME into a taxmap object. This also seems to work with files from MEGAN. I have not tested if it works with other BIOM files.
parse_qiime_biom(file, class_regex = "(.*)", class_key = "taxon_name")
file |
( |
class_regex |
A regular expression used to parse data in the taxon
names. There must be a capture group (a pair of parentheses) for each item
in |
class_key |
('character') The identity of the capturing groups defined using 'class_regex'. The length of 'class_key' must be equal to the number of capturing groups specified in 'class_regex'. Any names added to the terms will be used as column names in the output. At least one '"taxon_name"' must be specified. Only '"info"' can be used multiple times. Each term must be one of those described below: * 'taxon_name': The name of a taxon. Not necessarily unique, but interpretable by a particular 'database'. Requires an internet connection. * 'taxon_rank': The rank of the taxon. This will be used to add rank info into the output object that can be accessed by 'out$taxon_ranks()'. * 'info': Arbitrary taxon info you want included in the output. Can be used more than once. |
This function was inspired by the tutorial created by Geoffrey Zahn at http://geoffreyzahn.com/getting-your-otu-table-into-r/.
A taxmap object
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
Parses an RDP reference FASTA file.
parse_rdp(input = NULL, file = NULL, include_seqs = TRUE, add_species = FALSE)
input |
(
Either "input" or "file" must be supplied but not both. |
file |
The path to a FASTA file containing sequences to use. Either "input" or "file" must be supplied but not both. |
include_seqs |
( |
add_species |
( |
The input file has a format like:
>S000448483 Sparassis crispa; MBUH-PIRJO&ILKKA94-1587/ss5 Lineage=Root;rootrank;Fun...
ggattcccctagtaactgcgagtgaagcgggaagagctcaaatttaaaatctggcggcgtcctcgtcgtccgagttgtaa
tctggagaagcgacatccgcgctggaccgtgtacaagtctcttggaaaagagcgtcgtagagggtgacaatcccgtcttt
...
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_silva_fasta(), parse_tax_data(), parse_ubiome(), parse_unite_general()
Parses a SILVA FASTA file that can be found at https://www.arb-silva.de/no_cache/download/archive/release_128/Exports/.
parse_silva_fasta(file = NULL, input = NULL, include_seqs = TRUE)
file |
The path to a FASTA file containing sequences to use. Either "input" or "file" must be supplied but not both. |
input |
(
Either "input" or "file" must be supplied but not both. |
include_seqs |
( |
The input file has a format like:
>GCVF01000431.1.2369 Bacteria;Proteobacteria;Gammaproteobacteria;Oceanospiril...
CGUGCACGGUGGAUGCCUUGGCAGCCAGAGGCGAUGAAGGACGUUGUAGCCUGCGAUAAGCUCCGGUUAGGUGGCAAACA
ACCGUUUGACCCGGAGAUCUCCGAAUGGGGCAACCCACCCGUUGUAAGGCGGGUAUCACCGACUGAAUCCAUAGGUCGGU
...
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_tax_data(), parse_ubiome(), parse_unite_general()
Reads taxonomic information and associated data in tables, lists, and vectors and stores it in a [taxmap()] object. [Taxonomic classifications](https://en.wikipedia.org/wiki/Taxonomy_(biology)#Classifying_organisms) must be present.
parse_tax_data( tax_data, datasets = list(), class_cols = 1, class_sep = ";", sep_is_regex = FALSE, class_key = "taxon_name", class_regex = "(.*)", class_reversed = FALSE, include_match = TRUE, mappings = c(), include_tax_data = TRUE, named_by_rank = FALSE )
tax_data |
A table, list, or vector that contains the names of taxa that represent [taxonomic classifications](https://en.wikipedia.org/wiki/Taxonomy_(biology)#Classifying_organisms). Accepted representations of classifications include: * A list/vector or table with column(s) of taxon names: Something like '"Animalia;Chordata;Mammalia;Primates;Hominidae;Homo"'. The separator(s) used (";" in this example) can be changed with the 'class_sep' option. For tables, the classification can be spread over multiple columns and the separator(s) will be applied to each column, although each column could just be single taxon names with no separator. Use the 'class_cols' option to specify which columns have taxon names. * A list in which each entry is a classification. For example, 'list(c("Animalia", "Chordata", "Mammalia", "Primates", "Hominidae", "Homo"), ...)'. * A list of data.frames where each represents a classification with one taxon per row. The column that contains taxon names is specified using the 'class_cols' option. In this instance, it only makes sense to specify a single column. |
datasets |
Additional lists/vectors/tables that should be included in the resulting 'taxmap' object. The 'mappings' option is used to specify how these data sets relate to the 'tax_data' and, by inference, what taxa apply to each item. |
class_cols |
('character' or 'integer') The names or indexes of columns that contain classifications if the first input is a table. If multiple columns are specified, they will be combined in the order given. Negative column indexes mean "every column besides these columns". |
class_sep |
('character') One or more separators that delineate taxon names in a classification. For example, if one column had '"Homo sapiens"' and another had '"Animalia;Chordata;Mammalia;Primates;Hominidae"', then 'class_sep = c(" ", ";")'. All separators are applied to each column so order does not matter. |
sep_is_regex |
('TRUE'/'FALSE') Whether or not 'class_sep' should be used as a [regular expression](https://en.wikipedia.org/wiki/Regular_expression). |
class_key |
('character') The identity of the capturing groups defined using 'class_regex'. The length of 'class_key' must be equal to the number of capturing groups specified in 'class_regex'. Any names added to the terms will be used as column names in the output. At least one '"taxon_name"' must be specified. Only '"info"' can be used multiple times. Each term must be one of those described below: * 'taxon_name': The name of a taxon. Not necessarily unique, but interpretable by a particular 'database'. Requires an internet connection. * 'taxon_rank': The rank of the taxon. This will be used to add rank info into the output object that can be accessed by 'out$taxon_ranks()'. * 'info': Arbitrary taxon info you want included in the output. Can be used more than once. |
class_regex |
('character' of length 1) A regular expression with capturing groups indicating the locations of data for each taxon in the 'class' term in the 'key' argument. The identity of the information must be specified using the 'class_key' argument. The 'class_sep' option can be used to split the classification into data for each taxon before matching. If 'class_sep' is 'NULL', each match of 'class_regex' defines a taxon in the classification. |
class_reversed |
If 'TRUE', then classifications go from specific to general. For example: 'Abditomys latidens : Muridae : Rodentia : Mammalia : Chordata'. |
include_match |
('logical' of length 1) If 'TRUE', include the part of the input matched by 'class_regex' in the output object. |
mappings |
(named 'character') This defines how the taxonomic information in 'tax_data' applies to each data set in 'datasets'. This option should have the same number of inputs as 'datasets', with values corresponding to each data set. The names of the character vector specify what information in 'tax_data' is shared with info in each 'dataset', which is specified by the corresponding values of the character vector. If there are no shared variables, you can add 'NA' as a placeholder, but you could just leave that data out since it is not benefiting from being in the taxmap object. The names/values can be one of the following: * For tables, the names of columns can be used. * '"{{index}}"' : This means to use the index of rows/items * '"{{name}}"' : This means to use row/item names. * '"{{value}}"' : This means to use the values in vectors or lists. Lists will be converted to vectors using [unlist()]. |
include_tax_data |
('TRUE'/'FALSE') Whether or not to include 'tax_data' as a dataset, like those in 'datasets'. |
named_by_rank |
('TRUE'/'FALSE') If 'TRUE' and the input is a table with columns named by ranks or a list of vectors with each vector named by ranks, include that rank info in the output object, so it can be accessed by 'out$taxon_ranks()'. If 'TRUE', taxa with different ranks, but the same name and location in the taxonomy, will be considered different taxa. Cannot be used with the 'sep', 'class_regex', or 'class_key' options. |
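The relationship between 'class_regex' and 'class_key' can be illustrated with base R alone. The sketch below applies the pattern and key from this function's examples to a single taxon and labels each capturing group with the matching name from the key. It is only an illustration of the idea, not the package's internal matching code.

```r
# One taxon from a classification, after splitting on class_sep = ";".
taxon <- "Carnivora_order_2"

# The pattern and key used in the parse_tax_data() examples.
class_regex <- "(.+)_(.+)_([0-9]+)"
class_key <- c(my_name = "taxon_name", a_rank = "taxon_rank", some_num = "info")

# regexec() returns the full match followed by one entry per capturing group.
groups <- regmatches(taxon, regexec(class_regex, taxon))[[1]]

# There must be exactly one key term per capturing group.
stopifnot(length(groups) - 1 == length(class_key))

# Label each captured value with the name given in class_key.
captured <- setNames(groups[-1], names(class_key))
print(captured)
```

Here the first group supplies the taxon name, the second the rank, and the third is carried along as arbitrary info.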
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_ubiome(), parse_unite_general()
# Read a vector of classifications
my_taxa <- c("Mammalia;Carnivora;Felidae",
             "Mammalia;Carnivora;Felidae",
             "Mammalia;Carnivora;Ursidae")
parse_tax_data(my_taxa, class_sep = ";")

# Read a list of classifications
my_taxa <- list("Mammalia;Carnivora;Felidae",
                "Mammalia;Carnivora;Felidae",
                "Mammalia;Carnivora;Ursidae")
parse_tax_data(my_taxa, class_sep = ";")

# Read classifications in a table in a single column
species_data <- data.frame(tax = c("Mammalia;Carnivora;Felidae",
                                   "Mammalia;Carnivora;Felidae",
                                   "Mammalia;Carnivora;Ursidae"),
                           species_id = c("A", "B", "C"))
parse_tax_data(species_data, class_sep = ";", class_cols = "tax")

# Read classifications in a table in multiple columns
species_data <- data.frame(lineage = c("Mammalia;Carnivora;Felidae",
                                       "Mammalia;Carnivora;Felidae",
                                       "Mammalia;Carnivora;Ursidae"),
                           species = c("Panthera leo",
                                       "Panthera tigris",
                                       "Ursus americanus"),
                           species_id = c("A", "B", "C"))
parse_tax_data(species_data, class_sep = c(" ", ";"),
               class_cols = c("lineage", "species"))

# Read classification tables with one column per rank
species_data <- data.frame(class = c("Mammalia", "Mammalia", "Mammalia"),
                           order = c("Carnivora", "Carnivora", "Carnivora"),
                           family = c("Felidae", "Felidae", "Ursidae"),
                           genus = c("Panthera", "Panthera", "Ursus"),
                           species = c("leo", "tigris", "americanus"),
                           species_id = c("A", "B", "C"))
parse_tax_data(species_data, class_cols = 1:5)
parse_tax_data(species_data, class_cols = 1:5,
               named_by_rank = TRUE) # makes `taxon_ranks()` work

# Classifications with extra information
my_taxa <- c("Mammalia_class_1;Carnivora_order_2;Felidae_genus_3",
             "Mammalia_class_1;Carnivora_order_2;Felidae_genus_3",
             "Mammalia_class_1;Carnivora_order_2;Ursidae_genus_3")
parse_tax_data(my_taxa, class_sep = ";",
               class_regex = "(.+)_(.+)_([0-9]+)",
               class_key = c(my_name = "taxon_name",
                             a_rank = "taxon_rank",
                             some_num = "info"))

# --- Parsing multiple datasets at once (advanced) ---
# The rest is one example for how to classify multiple datasets at once.

# Make example data with taxonomic classifications
species_data <- data.frame(tax = c("Mammalia;Carnivora;Felidae",
                                   "Mammalia;Carnivora;Felidae",
                                   "Mammalia;Carnivora;Ursidae"),
                           species = c("Panthera leo",
                                       "Panthera tigris",
                                       "Ursus americanus"),
                           species_id = c("A", "B", "C"))

# Make example data associated with the taxonomic data
# Note how this does not contain classifications, but
# does have a variable in common with "species_data" ("id" = "species_id")
abundance <- data.frame(id = c("A", "B", "C", "A", "B", "C"),
                        sample_id = c(1, 1, 1, 2, 2, 2),
                        counts = c(23, 4, 3, 34, 5, 13))

# Make another related data set named by species id
common_names <- c(A = "Lion", B = "Tiger", C = "Bear", "Oh my!")

# Make another related data set with no names
foods <- list(c("ungulates", "boar"),
              c("ungulates", "boar"),
              c("salmon", "fruit", "nuts"))

# Make a taxmap object with these three datasets
x <- parse_tax_data(species_data,
                    datasets = list(counts = abundance,
                                    my_names = common_names,
                                    foods = foods),
                    mappings = c("species_id" = "id",
                                 "species_id" = "{{name}}",
                                 "{{index}}" = "{{index}}"),
                    class_cols = c("tax", "species"),
                    class_sep = c(" ", ";"))

# Note how all the datasets have taxon ids now
x$data

# This allows for complex mappings between variables that other functions use
map_data(x, my_names, foods)
map_data(x, counts, my_names)
Converts the uBiome file format to taxmap. NOTE: This is experimental and might not work if uBiome changes their format. Contact the maintainers if you encounter problems.
parse_ubiome(file = NULL, table = NULL)
file |
( |
table |
( |
The input file has a format like:
tax_name,tax_rank,count,count_norm,taxon,parent
root,root,29393,1011911,1,
Bacteria,superkingdom,29047,1000000,2,131567
Campylobacter,genus,23,791,194,72294
Flavobacterium,genus,264,9088,237,49546
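The format is ordinary CSV in which the `taxon` and `parent` columns appear to form an edge list of numeric taxon IDs. A minimal base R sketch of reading it is shown below. It only illustrates the layout; `parse_ubiome()` should be used to build an actual taxmap object.

```r
# The example rows, as comma-separated text.
ubiome_text <- "tax_name,tax_rank,count,count_norm,taxon,parent
root,root,29393,1011911,1,
Bacteria,superkingdom,29047,1000000,2,131567
Campylobacter,genus,23,791,194,72294
Flavobacterium,genus,264,9088,237,49546"

ubiome <- read.csv(text = ubiome_text, stringsAsFactors = FALSE)

# Each row links a taxon ID to its parent's ID. The root row has an
# empty parent field, which read.csv() reads as NA.
edges <- ubiome[, c("taxon", "parent", "tax_name", "tax_rank")]
print(edges)
```

Note that a parent ID need not appear as a `taxon` in the same excerpt, as with 131567 here.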
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_unite_general()
Parse the UNITE general release FASTA file.
parse_unite_general(input = NULL, file = NULL, include_seqs = TRUE)
input |
(
Either "input" or "file" must be supplied but not both. |
file |
The path to a FASTA file containing sequences to use. Either "input" or "file" must be supplied but not both. |
include_seqs |
( |
The input file has a format like:
>Glomeromycota_sp|KJ484724|SH523877.07FU|reps|k__Fungi;p__Glomeromycota;c__unid...
ATAATTTGCCGAACCTAGCGTTAGCGCGAGGTTCTGCGATCAACACTTATATTTAAAACCCAACTCTTAAATTTTGTAT...
Other parsers: extract_tax_data(), lookup_tax_data(), parse_dada2(), parse_edge_list(), parse_greengenes(), parse_mothur_tax_summary(), parse_mothur_taxonomy(), parse_newick(), parse_phylo(), parse_phyloseq(), parse_qiime_biom(), parse_rdp(), parse_silva_fasta(), parse_tax_data(), parse_ubiome()
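For the UNITE header shown in the format example for parse_unite_general(), the fields are separated by `|` and the last field is a ranked lineage. The base R sketch below splits one header accordingly. The field names in `unite_record` are assumptions made for this illustration, and the truncated lineage is cut after the phylum rather than guessed.

```r
# Header from the example, with the truncated lineage cut after the phylum.
header <- ">Glomeromycota_sp|KJ484724|SH523877.07FU|reps|k__Fungi;p__Glomeromycota"

# Fields are separated by "|"; the last field is the ranked lineage.
fields <- strsplit(sub("^>", "", header), "|", fixed = TRUE)[[1]]
lineage <- strsplit(fields[5], ";", fixed = TRUE)[[1]]

# Field names below are illustrative guesses, not taken from the package.
unite_record <- list(
  name = fields[1],
  accession = fields[2],
  sh_id = fields[3],
  type = fields[4],
  taxa = sub("^[a-z]__", "", lineage)
)
str(unite_record)
```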
A pair of primers is aligned against a set of sequences. A taxmap object with two tables is returned: a table with information for each predicted amplicon, including the quality of match, and a table with per-taxon amplification statistics. Requires the EMBOSS tool kit (https://emboss.sourceforge.net/) to be installed.
primersearch(obj, seqs, forward, reverse, mismatch = 5, clone = TRUE)
obj |
A |
seqs |
The sequences to do in silico PCR on. This can be any variable in
|
forward |
( |
reverse |
( |
mismatch |
An integer vector of length 1. The percentage of mismatches allowed. |
clone |
If |
It can be confusing how the primer sequence relates to the binding sites on a reference database sequence. A simplified diagram can help. For example, if the top strand below (5' -> 3') is the database sequence, the forward primer has the same sequence as the target region, since it will bind to the other strand (3' -> 5') during PCR and extend on the 3' end. However, the reverse primer must bind to the database strand, so it will have to be the complement of the reference sequence. It also has to be reversed to make it in the standard 5' -> 3' orientation. Therefore, the reverse primer must be the reverse complement of its binding site on the reference sequence.
Primer 1: 5' AAGTACCTTAACGGAATTATAG 3'
Primer 2: 5' GCTCCACCTACGAAACGAAT 3'

                               <- TAAGCAAAGCATCCACCTCG 5'
5' ...AAGTACCTTAACGGAATTATAG......ATTCGTTTCGTAGGTGGAGC... 3'

3' ...TTCATGGAATTGCCTTAATATC......TAAGCAAAGCATCCACCTCG... 5'
   5' AAGTACCTTAACGGAATTATAG ->
However, a database might have either the top or the bottom strand as a reference sequence. Since one implies the sequence of the other, either is valid, but this is another source of confusion. If we take the diagram above and rotate it 180 degrees, it would mean the same thing, but which primer we would want to call "forward" and which we would want to call "reverse" would change. Databases of a single locus (e.g. Greengenes) will likely have a convention for which strand will be present, so relative to this convention, there is a distinct "forward" and "reverse". However, computers don't know about this convention, so the "forward" primer is whichever primer has the same sequence as its binding region in the database (as opposed to the reverse complement). For this reason, primersearch will redefine which primer is "forward" and which is "reverse" based on how it binds the reference sequence. See the example code in primersearch_raw for a demonstration of this.
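The relationship between the reverse primer and its binding site can be checked directly. The base R sketch below defines a small helper, `rev_comp()` (a name chosen for this illustration, not a metacoder function), and applies it to the two primers in the diagram. It handles plain A/C/G/T only.

```r
# Reverse complement of a DNA sequence (A/C/G/T only; IUPAC ambiguity
# codes are not handled in this sketch).
rev_comp <- function(seq) {
  bases <- strsplit(chartr("ACGT", "TGCA", seq), "")[[1]]
  paste(rev(bases), collapse = "")
}

# Primer 2 and its binding site on the top (database) strand of the diagram.
reverse_primer <- "GCTCCACCTACGAAACGAAT"
binding_site <- "ATTCGTTTCGTAGGTGGAGC"

# The reverse primer is the reverse complement of its binding site.
stopifnot(rev_comp(reverse_primer) == binding_site)

# Primer 1 is identical to its binding site, so no conversion is needed.
forward_primer <- "AAGTACCTTAACGGAATTATAG"
```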
A copy of the input taxmap object with two tables added. One table contains amplicon information with one row per predicted amplicon with the following info:
   (f_primer)
   5' AAGTACCTTAACGGAATTATAG ->        (r_primer)
                               <- TAAGCAAAGCATCCACCTCG 5'
5' ...AAGTACCTTAACGGAATTATAG......ATTCGTTTCGTAGGTGGAGC... 3'
      ^                    ^      ^                  ^
      f_start          f_end      r_start        r_end

      |--------------------||----||------------------|
             f_match       amplicon     r_match
      |----------------------------------------------|
                          product
The taxon IDs for the sequence.
The index of the input sequence.
The sequence of the forward primer.
The sequence of the reverse primer.
The number of mismatches on the forward primer.
The number of mismatches on the reverse primer.
The start location of the forward primer.
The end location of the forward primer.
The start location of the reverse primer.
The end location of the reverse primer.
The sequence matched by the forward primer.
The sequence matched by the reverse primer.
The sequence amplified by the primers, not including the primers.
The sequence amplified by the primers including the primers. This simulates a real PCR product.
The other table contains per-taxon information about the PCR, with one row per taxon. It has the following columns:
Taxon IDs.
The number of sequences used as input.
The number of sequences that had at least one amplicon.
The number of amplicons. Might be more than one per sequence.
If at least one sequence of that taxon had at least one amplicon.
If at least one sequence had at least two amplicons.
The proportion of sequences with at least one amplicon.
The median amplicon length.
The minimum amplicon length.
The maximum amplicon length.
The median product length.
The minimum product length.
The maximum product length.
The command-line tool "primersearch" from the EMBOSS tool kit is needed to use this function. How you install EMBOSS will depend on your operating system:
Linux:
Open up a terminal and type:
sudo apt-get install emboss
Mac OSX:
The easiest way to install EMBOSS on OSX is to use homebrew. After installing homebrew, open up a terminal and type:
brew install homebrew/science/emboss
Windows:
There is an installer for Windows here:
ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.5.0.0-setup.exe
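After installing, it can be useful to confirm that the `primersearch` executable is visible from R. The sketch below uses base R's `Sys.which()` for that. It assumes the executable is expected on the system PATH, which may not cover every installation layout.

```r
# Sys.which() returns "" when the executable is not found on the PATH.
primersearch_path <- Sys.which("primersearch")

if (nzchar(primersearch_path)) {
  message("Found primersearch at: ", primersearch_path)
} else {
  message("primersearch was not found; install EMBOSS or add it to the PATH.")
}
```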
## Not run:
# Get example FASTA file
fasta_path <- system.file(file.path("extdata", "silva_subset.fa"),
                          package = "metacoder")

# Parse the FASTA file as a taxmap object
obj <- parse_silva_fasta(file = fasta_path)

# Simulate PCR with primersearch
# Have to replace Us with Ts in sequences since primersearch
# does not understand Us.
obj <- primersearch(obj,
                    gsub(silva_seq, pattern = "U", replacement = "T"),
                    forward = c("U519F" = "CAGYMGCCRCGGKAAHACC"),
                    reverse = c("Arch806R" = "GGACTACNSGGGTMTCTAAT"),
                    mismatch = 10)

# Plot what did not amplify
obj %>%
  filter_taxa(prop_amplified < 1) %>%
  heat_tree(node_label = taxon_names,
            node_color = prop_amplified,
            node_color_range = c("grey", "red", "purple", "green"),
            node_color_trans = "linear",
            node_color_axis_label = "Proportion amplified",
            node_size = n_obs,
            node_size_axis_label = "Number of sequences",
            layout = "da",
            initial_layout = "re")

## End(Not run)
A pair of primers is aligned against a set of sequences. The location of the best hits, quality of match, and predicted amplicons are returned. Requires the EMBOSS tool kit (https://emboss.sourceforge.net/) to be installed.
primersearch_raw(input = NULL, file = NULL, forward, reverse, mismatch = 5)
input |
(
Either "input" or "file" must be supplied but not both. |
file |
The path to a FASTA file containing sequences to use. Either "input" or "file" must be supplied but not both. |
forward |
( |
reverse |
( |
mismatch |
An integer vector of length 1. The percentage of mismatches allowed. |
It can be confusing how the primer sequence relates to the binding sites on a reference database sequence. A simplified diagram can help. For example, if the top strand below (5' -> 3') is the database sequence, the forward primer has the same sequence as the target region, since it will bind to the other strand (3' -> 5') during PCR and extend on the 3' end. However, the reverse primer must bind to the database strand, so it will have to be the complement of the reference sequence. It also has to be reversed to make it in the standard 5' -> 3' orientation. Therefore, the reverse primer must be the reverse complement of its binding site on the reference sequence.
Primer 1: 5' AAGTACCTTAACGGAATTATAG 3'
Primer 2: 5' GCTCCACCTACGAAACGAAT   3'

                               <- TAAGCAAAGCATCCACCTCG 5'
5' ...AAGTACCTTAACGGAATTATAG......ATTCGTTTCGTAGGTGGAGC... 3'

3' ...TTCATGGAATTGCCTTAATATC......TAAGCAAAGCATCCACCTCG... 5'
   5' AAGTACCTTAACGGAATTATAG ->
However, a database might have either the top or the bottom strand as a reference sequence. Since one implies the sequence of the other, either is valid, but this is another source of confusion. If we take the diagram above and rotate it 180 degrees, it would mean the same thing, but which primer we would want to call "forward" and which we would want to call "reverse" would change. Databases of a single locus (e.g. Greengenes) will likely have a convention for which strand will be present, so relative to this convention, there is a distinct "forward" and "reverse". However, computers don't know about this convention, so the "forward" primer is whichever primer has the same sequence as its binding region in the database (as opposed to the reverse complement). For this reason, primersearch will redefine which primer is "forward" and which is "reverse" based on how it binds the reference sequence. See the example code for a demonstration of this.
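The orientation rule described above can be sketched in base R. This is only an illustration under simplifying assumptions: it uses exact matching with no mismatches or ambiguity codes, and the helpers `simple_rev_comp()` and `orient_primers()` are hypothetical names that are not part of metacoder or EMBOSS.

```r
# Reverse complement for plain A/C/G/T sequences (no ambiguity codes)
simple_rev_comp <- function(seq) {
  complemented <- chartr("ACGT", "TGCA", seq)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

# Call a primer "forward" if it matches the reference directly and
# "reverse" if its reverse complement matches the reference
orient_primers <- function(reference, primer_a, primer_b) {
  a_direct  <- grepl(primer_a, reference, fixed = TRUE)
  b_direct  <- grepl(primer_b, reference, fixed = TRUE)
  a_revcomp <- grepl(simple_rev_comp(primer_a), reference, fixed = TRUE)
  b_revcomp <- grepl(simple_rev_comp(primer_b), reference, fixed = TRUE)
  if (a_direct && b_revcomp) {
    c(forward = primer_a, reverse = primer_b)
  } else if (b_direct && a_revcomp) {
    c(forward = primer_b, reverse = primer_a)
  } else {
    NULL  # no exact-match amplicon on this reference
  }
}

# The two strands from the diagram, with the unknown middle replaced by Ns
top_strand    <- "AAGTACCTTAACGGAATTATAGNNNNNNATTCGTTTCGTAGGTGGAGC"
bottom_strand <- simple_rev_comp(top_strand)
primer_1 <- "AAGTACCTTAACGGAATTATAG"
primer_2 <- "GCTCCACCTACGAAACGAAT"

res_top    <- orient_primers(top_strand, primer_1, primer_2)
res_bottom <- orient_primers(bottom_strand, primer_1, primer_2)
```

With the top strand as the reference, `primer_1` is labeled "forward"; with the bottom strand as the reference, the labels swap, which is the behavior the paragraph above describes.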
A table with one row per predicted amplicon with the following info:
      (f_primer)
   5' AAGTACCTTAACGGAATTATAG ->   (r_primer)
                               <- TAAGCAAAGCATCCACCTCG 5'
5' ...AAGTACCTTAACGGAATTATAG......ATTCGTTTCGTAGGTGGAGC... 3'
      ^                    ^      ^                  ^
      f_start              f_end  r_start            r_end

      |--------------------||----||------------------|
             f_match       amplicon     r_match
      |----------------------------------------------|
                          product

f_mismatch: The number of mismatches on the forward primer
r_mismatch: The number of mismatches on the reverse primer
input: The index of the input sequence
The command-line tool "primersearch" from the EMBOSS tool kit is needed to use this function. How you install EMBOSS will depend on your operating system:
Linux:
Open up a terminal and type:
sudo apt-get install emboss
Mac OSX:
The easiest way to install EMBOSS on OSX is to use homebrew. After installing homebrew, open up a terminal and type:
brew install homebrew/science/emboss
Windows:
There is an installer for Windows here:
ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.5.0.0-setup.exe
## Not run: 
### Dummy test data set ###

primer_1_site <- "AAGTACCTTAACGGAATTATAG"
primer_2_site <- "ATTCGTTTCGTAGGTGGAGC"
amplicon <- "NNNAGTGGATAGATAGGGGTTCTGTGGCGTTTGGGAATTAAAGATTAGAGANNN"
seq_1 <- paste0("AA", primer_1_site, amplicon, primer_2_site, "AAAA")
seq_2 <- rev_comp(seq_1)
f_primer <- "ACGTACCTTAACGGAATTATAG" # Note the "C" mismatch at position 2
r_primer <- rev_comp(primer_2_site)
seqs <- c(a = seq_1, b = seq_2)

result <- primersearch_raw(seqs, forward = f_primer, reverse = r_primer)

### Real data set ###

# Get example FASTA file
fasta_path <- system.file(file.path("extdata", "silva_subset.fa"),
                          package = "metacoder")

# Parse the FASTA file as a taxmap object
obj <- parse_silva_fasta(file = fasta_path)

# Simulate PCR with primersearch
pcr_result <- primersearch_raw(obj$data$tax_data$silva_seq,
                               forward = c("U519F" = "CAGYMGCCRCGGKAAHACC"),
                               reverse = c("Arch806R" = "GGACTACNSGGGTMTCTAAT"),
                               mismatch = 10)

# Add result to input table
# NOTE: We want to add a function to handle running pcr on a
#       taxmap object directly, but we are still trying to figure out
#       the best way to implement it. For now, do the following:
obj$data$pcr <- pcr_result
obj$data$pcr$taxon_id <- obj$data$tax_data$taxon_id[pcr_result$input]

# Visualize which taxa were amplified
# This works because only amplicons are returned by `primersearch`
n_amplified <- unlist(obj$obs_apply("pcr",
                                    function(x) length(unique(x)),
                                    value = "input"))
prop_amped <- n_amplified / obj$n_obs()
heat_tree(obj,
          node_label = taxon_names,
          node_color = prop_amped,
          node_color_range = c("grey", "red", "purple", "green"),
          node_color_trans = "linear",
          node_color_axis_label = "Proportion amplified",
          node_size = n_obs,
          node_size_axis_label = "Number of sequences",
          layout = "da",
          initial_layout = "re")
## End(Not run)
Print a text-based tree of a [taxonomy()] or [taxmap()] object.
obj |
A |
value |
What data to return. Default is taxon names. Any result of [all_names()] can be used, but it usually only makes sense to use data with one value per taxon, like taxon names. |
print_tree(ex_taxmap)
Returns the default color palette for qualitative data
qualitative_palette()
character of hex color codes
qualitative_palette()
Returns the default color palette for quantitative data.
quantative_palette()
character of hex color codes
quantative_palette()
Composed of two columns:
rankid - the ordered identifier value. lower values mean higher rank
ranks - all the rank names that belong to the same level, with different variants that mean essentially the same thing
For a given table in a taxmap object, rarefy counts to a constant total. This is a wrapper around rrarefy that automatically detects which columns are numeric and handles the reformatting needed to use tibbles.
rarefy_obs(obj, data, sample_size = NULL, cols = NULL, other_cols = FALSE, out_names = NULL, dataset = NULL)
obj |
A |
data |
The name of a table in |
sample_size |
The sample size counts will be rarefied to. This can be either a single integer or a vector of integers of equal length to the number of columns. |
cols |
The columns in
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), counts_to_presence(), zero_low_counts()
## Not run: 
# Parse data for examples
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Rarefy all numeric columns
rarefy_obs(x, "tax_data")

# Rarefy a subset of columns
rarefy_obs(x, "tax_data", cols = c("700035949", "700097855", "700100489"))
rarefy_obs(x, "tax_data", cols = 4:6)
rarefy_obs(x, "tax_data", cols = startsWith(colnames(x$data$tax_data), "70001"))

# Including all other columns in output
rarefy_obs(x, "tax_data", other_cols = TRUE)

# Including specific columns in output
rarefy_obs(x, "tax_data", cols = c("700035949", "700097855", "700100489"),
           other_cols = 2:3)

# Rename output columns
rarefy_obs(x, "tax_data", cols = c("700035949", "700097855", "700100489"),
           out_names = c("a", "b", "c"))
## End(Not run)
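To make "rarefy counts to a constant total" concrete, here is a minimal base R sketch of rarefying a single column of counts by subsampling individual reads without replacement. It illustrates the general idea only; it is not the code used by rarefy_obs or rrarefy, and `rarefy_counts()` and the example counts are hypothetical.

```r
# Rarefy one vector of per-taxon counts to a fixed total
rarefy_counts <- function(counts, sample_size) {
  stopifnot(sample_size <= sum(counts))
  # Expand counts into one entry per read, labeled by taxon index
  reads <- rep(seq_along(counts), times = counts)
  # Draw reads without replacement, then tabulate back to per-taxon counts
  picked <- reads[sample.int(length(reads), size = sample_size)]
  tabulate(picked, nbins = length(counts))
}

set.seed(1)
counts <- c(otu_a = 50, otu_b = 30, otu_c = 0, otu_d = 20)
rarefied <- rarefy_counts(counts, sample_size = 40)
sum(rarefied)  # equals sample_size by construction
```

Because reads are drawn without replacement, no taxon can end up with more reads than it started with, and taxa with zero counts stay at zero.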
Reads a FASTA file. This is the FASTA parser for metacoder. It simply tries to read a FASTA file into a named character vector with minimal fuss. It does not do any checks for valid characters etc. Other FASTA parsers you might want to consider include read.FASTA or read.fasta.
read_fasta(file_path)
file_path |
( |
named character vector
# Get example FASTA file
fasta_path <- system.file(file.path("extdata", "silva_subset.fa"),
                          package = "metacoder")

# Read fasta file
my_seqs <- read_fasta(fasta_path)
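For readers curious what "read a FASTA file into a named character vector" involves, the following base R sketch shows one way to do it. It is an assumption-laden illustration, not the read_fasta implementation: `read_fasta_sketch()` is a hypothetical helper, and it assumes headers start with ">" and that sequence lines may wrap.

```r
read_fasta_sketch <- function(file_path) {
  lines <- readLines(file_path)
  lines <- lines[nzchar(lines)]          # drop blank lines
  is_header <- startsWith(lines, ">")
  headers <- sub("^>", "", lines[is_header])
  # Assign each sequence line to the most recent header; using a factor
  # keeps records that have a header but no sequence lines
  record <- factor(cumsum(is_header)[!is_header], levels = seq_along(headers))
  seqs <- vapply(split(lines[!is_header], record),
                 paste0, character(1), collapse = "")
  names(seqs) <- headers
  seqs
}

# Write a small temporary FASTA file and read it back
tmp <- tempfile(fileext = ".fa")
writeLines(c(">seq1 first", "ACGT", "TTAA", ">seq2", "GGCC"), tmp)
sketch_seqs <- read_fasta_sketch(tmp)
```

Like the parser described above, this sketch does no validation of sequence characters.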
Remove the names of parent taxa in the beginning of their children's names in a taxonomy or taxmap object. This is useful for removing genus names in species binomials.
obj$remove_redundant_names() remove_redundant_names(obj)
obj |
A |
A taxonomy or taxmap object
# Remove genus names from species taxa
species_data <- c("Carnivora;Felidae;Panthera;Panthera leo",
                  "Carnivora;Felidae;Panthera;Panthera tigris",
                  "Carnivora;Ursidae;Ursus;Ursus americanus")
obj <- parse_tax_data(species_data, class_sep = ";")
remove_redundant_names(obj)
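The effect on individual names can be sketched with plain character vectors. This is a simplified illustration that assumes the parent name is separated from the rest of the child name by a single space; `strip_parent_name()` is a hypothetical helper, not the function's actual implementation.

```r
# Remove the parent's name from the start of each child's name
strip_parent_name <- function(child, parent) {
  prefix <- paste0(parent, " ")
  has_prefix <- startsWith(child, prefix)
  ifelse(has_prefix, substring(child, nchar(prefix) + 1), child)
}

species <- c("Panthera leo", "Panthera tigris", "Ursus americanus")
genera  <- c("Panthera", "Panthera", "Ursus")
stripped <- strip_parent_name(species, genera)
```

Names that do not begin with their parent's name are left unchanged.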
Replace taxon ids in a [taxmap()] or [taxonomy()] object.
obj$replace_taxon_ids(new_ids) replace_taxon_ids(obj, new_ids)
obj |
The [taxonomy()] or [taxmap()] object. |
new_ids |
A vector of new ids, one per taxon. They must be unique and in the same order as the corresponding ids in 'obj$taxon_ids()'. |
A [taxonomy()] or [taxmap()] object with new taxon ids
# Replace taxon IDs with numbers
replace_taxon_ids(ex_taxmap, seq_len(length(ex_taxmap$taxa)))

# Make taxon IDs capital letters
replace_taxon_ids(ex_taxmap, toupper(taxon_ids(ex_taxmap)))
Make the reverse complement of one or more sequences stored as a character vector. This is a wrapper for comp for character vectors instead of lists of character vectors with one value per letter. IUPAC ambiguity codes are handled and the upper/lower case is preserved.
rev_comp(seqs)
seqs |
A character vector with one element per sequence. |
Other sequence transformations: complement(), reverse()
rev_comp(c("aagtgGGTGaa", "AAGTGGT"))
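As an illustration of what handling IUPAC ambiguity codes and preserving case can mean, here is a base R sketch built on `chartr()`. It is not the rev_comp implementation; `iupac_rev_comp()` is a hypothetical helper, and mapping "U" to "A" is an assumption of this sketch.

```r
iupac_rev_comp <- function(seqs) {
  # Each code in `from` is complemented by the code at the same position in `to`
  from <- "ACGTUMRWSYKVHDBN"
  to   <- "TGCAAKYWSRMBDHVN"
  complemented <- chartr(paste0(from, tolower(from)),
                         paste0(to, tolower(to)),
                         seqs)
  # Reverse each sequence character by character
  reversed <- vapply(strsplit(complemented, ""),
                     function(chars) paste(rev(chars), collapse = ""),
                     character(1))
  stats::setNames(reversed, names(seqs))
}

rc_example <- iupac_rev_comp(c("aagtgGGTGaa", "AAGTGGT"))
rc_primer  <- iupac_rev_comp("GGACTACNSGGGTMTCTAAT")
```

Here `rc_example` is `c("ttCACCcactt", "ACCACTT")`, and the ambiguity code "M" in the primer is complemented to "K".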
Find the reverse of one or more sequences stored as a character vector. This is a wrapper for rev for character vectors instead of lists of character vectors with one value per letter.
reverse(seqs)
seqs |
A character vector with one element per sequence. |
Other sequence transformations: complement(), rev_comp()
reverse(c("aagtgGGTGaa", "AAGTGGT"))
Return the root taxa for a [taxonomy()] or [taxmap()] object. Can also be used to get the roots of a subset of taxa.
obj$roots(subset = NULL, value = "taxon_indexes") roots(obj, subset = NULL, value = "taxon_indexes")
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to find roots for. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
value |
What data to return. This is usually the name of a column in a table in 'obj$data'. Any result of 'all_names(obj)' can be used, but it usually only makes sense to use data that corresponds to taxa 1:1, such as [taxon_ranks()]. By default, taxon indexes are returned. |
'character'
Other taxonomy indexing functions: branches(), internodes(), leaves(), stems(), subtaxa(), supertaxa()
# Return indexes of root taxa
roots(ex_taxmap)

# Return indexes for a subset of taxa
roots(ex_taxmap, subset = 2:17)

# Return something besides taxon indexes
roots(ex_taxmap, value = "taxon_names")
Randomly sample some proportion of observations from a [taxmap()] object. Weights can be specified for observations or their taxa. See [dplyr::sample_frac()] for the inspiration for this function. Calling the function using the 'obj$sample_frac_obs(...)' style edits "obj" in place, unlike most R functions. However, calling the function using the 'sample_frac_obs(obj, ...)' style imitates R's traditional copy-on-modify semantics, so "obj" would not be changed; instead, a changed version would be returned, like most R functions.
obj$sample_frac_obs(data, size, replace = FALSE, taxon_weight = NULL, obs_weight = NULL, use_supertaxa = TRUE, collapse_func = mean, ...) sample_frac_obs(obj, data, size, replace = FALSE, taxon_weight = NULL, obs_weight = NULL, use_supertaxa = TRUE, collapse_func = mean, ...)
obj |
([taxmap()]) The object to sample from. |
data |
Dataset names, indexes, or a logical vector that indicates which datasets in 'obj$data' to sample. If multiple datasets are sampled at once, then they must be the same length. |
size |
('numeric' of length 1) The proportion of observations to sample. |
replace |
('logical' of length 1) If 'TRUE', sample with replacement. |
taxon_weight |
('numeric') Non-negative sampling weights of each taxon. If 'use_supertaxa' is 'TRUE', the weights for each taxon in an observation's classification are supplied to 'collapse_func' to get the observation weight. If 'obs_weight' is also specified, the two weights are multiplied (after 'taxon_weight' for each observation is calculated). |
obs_weight |
('numeric') Sampling weights of each observation. If 'taxon_weight' is also specified, the two weights are multiplied (after 'taxon_weight' for each observation is calculated). |
use_supertaxa |
('logical' or 'numeric' of length 1) Affects how 'taxon_weight' is used. If 'TRUE', the weights for each taxon in an observation's classification are supplied to 'collapse_func' to get the observation weight. If 'FALSE', only the taxon the observation is assigned to is considered. Positive numbers indicate the number of ranks above each taxon to use. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
collapse_func |
('function' of length 1) If the 'taxon_weight' option is used and 'use_supertaxa' is 'TRUE', the weights for each taxon in an observation's classification are supplied to 'collapse_func' to get the observation weight. This function should take a numeric vector and return a single number. |
... |
Additional options are passed to [filter_obs()]. |
target |
DEPRECATED. Use "data" instead. |
An object of type [taxmap()]
Other taxmap manipulation functions: arrange_obs(), arrange_taxa(), filter_obs(), filter_taxa(), mutate_obs(), sample_frac_taxa(), sample_n_obs(), sample_n_taxa(), select_obs(), transmute_obs()
# Sample half of the rows from a table
sample_frac_obs(ex_taxmap, "info", 0.5)

# Sample multiple datasets at once
sample_frac_obs(ex_taxmap, c("info", "phylopic_ids", "foods"), 0.5)
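The way 'taxon_weight', 'collapse_func', and 'obs_weight' combine, as described in the arguments above, can be sketched in base R. This is only an illustration of the documented rule, not the package's code; the classifications, weights, and variable names below are hypothetical.

```r
# Hypothetical data: the classification (taxon plus supertaxa) of each observation
classifications <- list(
  obs_1 = c("mammalia", "felidae", "panthera"),
  obs_2 = c("mammalia", "ursidae"),
  obs_3 = c("plantae")
)
taxon_weight <- c(mammalia = 1, felidae = 4, panthera = 1,
                  ursidae = 2, plantae = 0.5)
obs_weight <- c(obs_1 = 1, obs_2 = 2, obs_3 = 1)
collapse_func <- mean

# Step 1: collapse the taxon weights along each classification to one number
collapsed <- vapply(classifications,
                    function(taxa) collapse_func(taxon_weight[taxa]),
                    numeric(1))

# Step 2: multiply by the per-observation weights
final_weight <- collapsed * obs_weight

# Step 3: sample a proportion of observations using the combined weights
set.seed(42)
n_sampled <- round(0.5 * length(final_weight))
sampled <- sample(names(final_weight), size = n_sampled, prob = final_weight)
```

With these inputs the combined weights are 2, 3, and 0.5, so the third observation is the least likely to be drawn.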
Randomly sample some proportion of taxa from a [taxonomy()] or [taxmap()] object. Weights can be specified for taxa or the observations assigned to them. See [dplyr::sample_frac()] for the inspiration for this function.
obj$sample_frac_taxa(size, taxon_weight = NULL, obs_weight = NULL, obs_target = NULL, use_subtaxa = TRUE, collapse_func = mean, ...) sample_frac_taxa(obj, size, taxon_weight = NULL, obs_weight = NULL, obs_target = NULL, use_subtaxa = TRUE, collapse_func = mean, ...)
obj |
([taxonomy()] or [taxmap()]) The object to sample from. |
size |
('numeric' of length 1) The proportion of taxa to sample. |
taxon_weight |
('numeric') Non-negative sampling weights of each taxon. If 'obs_weight' is also specified, the two weights are multiplied (after 'obs_weight' for each taxon is calculated). |
obs_weight |
('numeric') This option only applies to [taxmap()] objects. Sampling weights of each observation. The weights for each observation assigned to a given taxon are supplied to 'collapse_func' to get the taxon weight. If 'use_subtaxa' is 'TRUE', then the observations assigned to every subtaxon are also used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. If 'taxon_weight' is also specified, the two weights are multiplied (after 'obs_weight' for each taxon is calculated). 'obs_target' must be used with this option. |
obs_target |
('character' of length 1) This option only applies to [taxmap()] objects. The name of the data set in 'obj$data' that values in 'obs_weight' correspond to. Must be used when 'obs_weight' is used. |
use_subtaxa |
('logical' or 'numeric' of length 1) Affects how the 'obs_weight' option is used. If 'TRUE', the observations assigned to every subtaxon of a taxon are also used to calculate that taxon's weight. If 'FALSE', only the observations assigned directly to the taxon are considered. Positive numbers indicate the number of ranks below each taxon to use. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
collapse_func |
('function' of length 1) If 'obs_weight' is used, the weights for each observation assigned to a given taxon are supplied to 'collapse_func' to get the taxon weight. This function should take a numeric vector and return a single number. |
... |
Additional options are passed to [filter_taxa()]. |
An object of type [taxonomy()] or [taxmap()]
Other taxmap manipulation functions: arrange_obs(), arrange_taxa(), filter_obs(), filter_taxa(), mutate_obs(), sample_frac_obs(), sample_n_obs(), sample_n_taxa(), select_obs(), transmute_obs()
# Sample half of the taxa
sample_frac_taxa(ex_taxmap, 0.5, supertaxa = TRUE)
Randomly sample some number of observations from a [taxmap()] object. Weights can be specified for observations or the taxa they are classified by. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. See [dplyr::sample_n()] for the inspiration for this function. Calling the function using the 'obj$sample_n_obs(...)' style edits "obj" in place, unlike most R functions. However, calling the function using the 'sample_n_obs(obj, ...)' style imitates R's traditional copy-on-modify semantics, so "obj" would not be changed; instead, a changed version would be returned, like most R functions.
obj$sample_n_obs(data, size, replace = FALSE, taxon_weight = NULL, obs_weight = NULL, use_supertaxa = TRUE, collapse_func = mean, ...) sample_n_obs(obj, data, size, replace = FALSE, taxon_weight = NULL, obs_weight = NULL, use_supertaxa = TRUE, collapse_func = mean, ...)
obj |
([taxmap()]) The object to sample from. |
data |
Dataset names, indexes, or a logical vector that indicates which datasets in 'obj$data' to sample. If multiple datasets are sampled at once, then they must be the same length. |
size |
('numeric' of length 1) The number of observations to sample. |
replace |
('logical' of length 1) If 'TRUE', sample with replacement. |
taxon_weight |
('numeric') Non-negative sampling weights of each taxon. If 'use_supertaxa' is 'TRUE', the weights for each taxon in an observation's classification are supplied to 'collapse_func' to get the observation weight. If 'obs_weight' is also specified, the two weights are multiplied (after 'taxon_weight' for each observation is calculated). |
obs_weight |
('numeric') Sampling weights of each observation. If 'taxon_weight' is also specified, the two weights are multiplied (after 'taxon_weight' for each observation is calculated). |
use_supertaxa |
('logical' or 'numeric' of length 1) Affects how 'taxon_weight' is used. If 'TRUE', the weights for each taxon in an observation's classification (the taxon and all of its supertaxa) are supplied to 'collapse_func' to get the observation weight. If 'FALSE', only the taxon the observation is assigned to is considered. Positive numbers indicate the number of ranks above each taxon to use. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
collapse_func |
('function' of length 1) If the 'taxon_weight' option is used and 'use_supertaxa' is 'TRUE', the weights for each taxon in an observation's classification are supplied to 'collapse_func' to get the observation weight. This function should take a numeric vector and return a single number. |
... |
Additional options are passed to [filter_obs()]. |
target |
DEPRECATED. Use "data" instead. |
An object of type [taxmap()]
Other taxmap manipulation functions: arrange_obs(), arrange_taxa(), filter_obs(), filter_taxa(), mutate_obs(), sample_frac_obs(), sample_frac_taxa(), sample_n_taxa(), select_obs(), transmute_obs()
# Sample 2 rows without replacement
sample_n_obs(ex_taxmap, "info", 2)
sample_n_obs(ex_taxmap, "foods", 2)

# Sample with replacement
sample_n_obs(ex_taxmap, "info", 10, replace = TRUE)

# Sample some rows more often than others
sample_n_obs(ex_taxmap, "info", 3, obs_weight = n_legs)

# Sample multiple datasets at once
sample_n_obs(ex_taxmap, c("info", "phylopic_ids", "foods"), 3)
Randomly sample some number of taxa from a [taxonomy()] or [taxmap()] object. Weights can be specified for taxa or the observations assigned to them. See [dplyr::sample_n()] for the inspiration for this function.
obj$sample_n_taxa(size, taxon_weight = NULL, obs_weight = NULL, obs_target = NULL, use_subtaxa = TRUE, collapse_func = mean, ...) sample_n_taxa(obj, size, taxon_weight = NULL, obs_weight = NULL, obs_target = NULL, use_subtaxa = TRUE, collapse_func = mean, ...)
obj |
([taxonomy()] or [taxmap()]) The object to sample from. |
size |
('numeric' of length 1) The number of taxa to sample. |
taxon_weight |
('numeric') Non-negative sampling weights of each taxon. If 'obs_weight' is also specified, the two weights are multiplied (after 'obs_weight' for each taxon is calculated). |
obs_weight |
('numeric') This option only applies to [taxmap()] objects. Sampling weights of each observation. The weights for each observation assigned to a given taxon are supplied to 'collapse_func' to get the taxon weight. If 'use_subtaxa' is 'TRUE', then the observations assigned to every subtaxon are also used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. If 'taxon_weight' is also specified, the two weights are multiplied (after 'obs_weight' for each taxon is calculated). 'obs_target' must be used with this option. |
obs_target |
('character' of length 1) This option only applies to [taxmap()] objects. The name of the data set in 'obj$data' that values in 'obs_weight' correspond to. Must be used when 'obs_weight' is used. |
use_subtaxa |
('logical' or 'numeric' of length 1) Affects how the 'obs_weight' option is used. If 'TRUE', the observations assigned to every subtaxon of a taxon are also used to calculate that taxon's weight. If 'FALSE', only the observations assigned directly to the taxon are considered. Positive numbers indicate the number of ranks below each taxon to use. '0' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
collapse_func |
('function' of length 1) If 'obs_weight' is used, the weights for each observation assigned to a given taxon are supplied to 'collapse_func' to get the taxon weight. This function should take a numeric vector and return a single number. |
... |
Additional options are passed to [filter_taxa()]. |
An object of type [taxonomy()] or [taxmap()]
Other taxmap manipulation functions: arrange_obs(), arrange_taxa(), filter_obs(), filter_taxa(), mutate_obs(), sample_frac_obs(), sample_frac_taxa(), sample_n_obs(), select_obs(), transmute_obs()
# Randomly sample three taxa
sample_n_taxa(ex_taxmap, 3)

# Include supertaxa
sample_n_taxa(ex_taxmap, 3, supertaxa = TRUE)

# Include subtaxa
sample_n_taxa(ex_taxmap, 1, subtaxa = TRUE)

# Sample some taxa more often than others
sample_n_taxa(ex_taxmap, 3, supertaxa = TRUE,
              obs_weight = n_legs, obs_target = "info")
Subsets columns in a [taxmap()] object. Takes and returns a [taxmap()] object. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. See [dplyr::select()] for the inspiration for this function and more information. Calling the function using the 'obj$select_obs(...)' style edits "obj" in place, unlike most R functions. However, calling the function using the 'select_obs(obj, ...)' style imitates R's traditional copy-on-modify semantics, so "obj" would not be changed; instead, a changed version would be returned, like most R functions.
obj$select_obs(data, ...) select_obs(obj, data, ...)
obj |
An object of type [taxmap()] |
data |
Dataset names, indexes, or a logical vector that indicates which tables in 'obj$data' to subset columns in. Multiple tables can be subset at once. |
... |
One or more column names to return in the new object. Each can be one of two things:
To match column names with a character vector, use 'matches("my_col_name")'. To match a logical vector, convert it to a column index using 'which'. |
target |
DEPRECATED. Use "data" instead. |
An object of type [taxmap()]
Other taxmap manipulation functions: arrange_obs(), arrange_taxa(), filter_obs(), filter_taxa(), mutate_obs(), sample_frac_obs(), sample_frac_taxa(), sample_n_obs(), sample_n_taxa(), transmute_obs()
# Selecting a column by name
select_obs(ex_taxmap, "info", dangerous)

# Selecting a column by index
select_obs(ex_taxmap, "info", 3)

# Selecting a column by regular expressions
select_obs(ex_taxmap, "info", matches("^n"))
Return the stem taxa for a [taxonomy()] or a [taxmap()] object. Stem taxa are all those from the roots to the first taxon with more than one subtaxon.
obj$stems(subset = NULL, simplify = FALSE, value = "taxon_indexes", exclude_leaves = FALSE) stems(obj, subset = NULL, simplify = FALSE, value = "taxon_indexes", exclude_leaves = FALSE)
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to find stems for. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
value |
What data to return. This is usually the name of a column in a table in 'obj$data'. Any result of 'all_names(obj)' can be used, but it usually only makes sense to use data that corresponds to taxa 1:1, such as [taxon_ranks()]. By default, taxon indexes are returned. |
simplify |
('logical') If 'TRUE', then combine all the results into a single vector of unique values. |
exclude_leaves |
('logical') If 'TRUE', then do not include taxa with no subtaxa. |
'character'
Other taxonomy indexing functions: branches(), internodes(), leaves(), roots(), subtaxa(), supertaxa()
# Return indexes of stem taxa
stems(ex_taxmap)

# Return indexes for a subset of taxa
stems(ex_taxmap, subset = 2:17)

# Return something besides taxon indexes
stems(ex_taxmap, value = "taxon_names")

# Return a vector instead of a list
stems(ex_taxmap, value = "taxon_names", simplify = TRUE)
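The definition of stem taxa given above (all taxa from a root down to the first taxon with more than one subtaxon) can be sketched on a small parent table in base R. This illustrates the definition only; `find_stems()` and the toy taxonomy are hypothetical and do not reflect how the taxmap object stores its tree.

```r
# Walk down from each root while the current taxon has exactly one subtaxon
find_stems <- function(taxon_ids, supertaxon_ids) {
  roots <- taxon_ids[is.na(supertaxon_ids)]
  lapply(stats::setNames(roots, roots), function(root) {
    stem <- root
    current <- root
    repeat {
      children <- taxon_ids[!is.na(supertaxon_ids) & supertaxon_ids == current]
      if (length(children) != 1) break  # branch point or leaf reached
      current <- children
      stem <- c(stem, current)
    }
    stem
  })
}

# Toy taxonomy: a -> b -> (c, d) and e -> f
taxon_ids      <- c("a", "b", "c", "d", "e", "f")
supertaxon_ids <- c(NA,  "a", "b", "b", NA,  "e")
stem_list <- find_stems(taxon_ids, supertaxon_ids)
```

In this toy tree the stem starting at root "a" is "a", "b" (stopping at the branch point "b"), and the stem starting at root "e" is "e", "f", which ends at a leaf because leaves are not excluded in this sketch.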
Return data for the subtaxa of each taxon in a [taxonomy()] or [taxmap()] object.
obj$subtaxa(subset = NULL, recursive = TRUE, simplify = FALSE, include_input = FALSE, value = "taxon_indexes")
subtaxa(obj, subset = NULL, recursive = TRUE, simplify = FALSE, include_input = FALSE, value = "taxon_indexes")
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to find subtaxa for. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
recursive |
('logical' or 'numeric') If 'FALSE', only return the subtaxa one rank below the target taxa. If 'TRUE', return all the subtaxa of every subtaxon, etc. Positive numbers indicate the number of ranks below the target taxa to return, so '1' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. Since the algorithm is optimized for traversing all of large trees, 'numeric' values greater than 0 for this option actually take slightly longer to compute than either 'TRUE' or 'FALSE'. |
simplify |
('logical') If 'TRUE', then combine all the results into a single vector of unique values. |
include_input |
('logical') If 'TRUE', the input taxa are included in the output |
value |
What data to return. This is usually the name of a column in a table in 'obj$data'. Any result of [all_names()] can be used, but it usually only makes sense to use data that corresponds to taxa 1:1, such as [taxon_ranks()]. By default, taxon indexes are returned. |
If 'simplify = FALSE', then a list of vectors is returned corresponding to the 'subset' argument. If 'simplify = TRUE', then the unique values are returned in a single vector.
Other taxonomy indexing functions: branches(), internodes(), leaves(), roots(), stems(), supertaxa()
# Return the indexes for subtaxa for each taxon
subtaxa(ex_taxmap)

# Only return data for some taxa using taxon indexes
subtaxa(ex_taxmap, subset = 1:3)

# Only return data for some taxa using taxon ids
subtaxa(ex_taxmap, subset = c("d", "e"))

# Only return data for some taxa using logical tests
subtaxa(ex_taxmap, subset = taxon_ranks == "genus")

# Only return subtaxa one level below
subtaxa(ex_taxmap, recursive = FALSE)

# Only return subtaxa some number of ranks below
subtaxa(ex_taxmap, recursive = 2)

# Return something besides taxon indexes
subtaxa(ex_taxmap, value = "taxon_names")
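The effect of the 'recursive' option can be sketched in base R on a toy edge list. This is an illustration under stated assumptions rather than the package's algorithm: the `edges` table and the helpers `children()` and `subtaxa_of()` are hypothetical, a `depth` argument stands in for 'recursive' ('Inf' for 'TRUE', '1' for 'FALSE'), and the order of the returned IDs is a choice of the sketch.

```r
# Toy edge list (hypothetical IDs): a -> b, c; b -> d; d -> e
edges <- data.frame(
  from = c(NA, "a", "a", "b", "d"),
  to   = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Direct subtaxa of one taxon
children <- function(id) edges$to[!is.na(edges$from) & edges$from == id]

# depth = Inf plays the role of recursive = TRUE,
# depth = 1 plays the role of recursive = FALSE
subtaxa_of <- function(id, depth = Inf) {
  if (depth < 1) return(character(0))
  direct <- children(id)
  deeper <- unlist(lapply(direct, subtaxa_of, depth = depth - 1))
  c(direct, deeper)
}

all_below <- subtaxa_of("a")             # every rank below "a"
one_below <- subtaxa_of("a", depth = 1)  # immediate subtaxa only
two_below <- subtaxa_of("a", depth = 2)  # two ranks below "a"
```

With this tree, `one_below` holds only "b" and "c", while `two_below` also reaches "d" but not "e".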
Apply a function to the subtaxa for each taxon. This is similar to using [subtaxa()] with [lapply()] or [sapply()].
obj$subtaxa_apply(func, subset = NULL, recursive = TRUE, simplify = FALSE, include_input = FALSE, value = "taxon_indexes", ...)
subtaxa_apply(obj, func, subset = NULL, recursive = TRUE, simplify = FALSE, include_input = FALSE, value = "taxon_indexes", ...)
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
func |
('function') The function to apply. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to use. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
recursive |
('logical' or 'numeric') If 'FALSE', only return the subtaxa one rank below the target taxa. If 'TRUE', return all the subtaxa of every subtaxon, etc. Positive numbers indicate the number of recursions (i.e. number of ranks below the target taxon to return). '1' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
simplify |
('logical') If 'TRUE', then combine all the results into a single vector of unique values. |
include_input |
('logical') If 'TRUE', the input taxa are included in the output |
value |
What data to give to the function. Any result of 'all_names(obj)' can be used, but it usually only makes sense to use data that has an associated taxon id. |
... |
Extra arguments are passed to the function. |
# Count number of subtaxa in each taxon
subtaxa_apply(ex_taxmap, length)

# Paste all the subtaxon names for each taxon
subtaxa_apply(ex_taxmap, value = "taxon_names", recursive = FALSE, paste0, collapse = ", ")
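The comparison with [lapply()] and [sapply()] can be made concrete with a plain list. The sketch below assumes a hypothetical list `subtaxa_names` shaped like the output of [subtaxa()] with 'value = "taxon_names"' (one vector per taxon, named by taxon ID); the IDs and names are made up for illustration.

```r
# Hypothetical per-taxon result: one vector of subtaxon names per taxon ID
subtaxa_names <- list(
  b = c("Felidae", "Notoryctidae"),
  c = c("Panthera", "Felis"),
  d = character(0)
)

# Counting subtaxa: the same idea as applying `length` to each taxon
n_sub <- sapply(subtaxa_names, length)

# Extra arguments (here `collapse`) are forwarded to the applied function
pasted <- lapply(subtaxa_names, paste0, collapse = ", ")
```

A taxon with no subtaxa, like "d" here, yields a count of 0 and an empty string.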
Return data for supertaxa (i.e. all taxa the target taxa are a part of) of each taxon in a [taxonomy()] or [taxmap()] object.
obj$supertaxa(subset = NULL, recursive = TRUE, simplify = FALSE, include_input = FALSE, value = "taxon_indexes", na = FALSE)
supertaxa(obj, subset = NULL, recursive = TRUE, simplify = FALSE, include_input = FALSE, value = "taxon_indexes", na = FALSE)
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to find supertaxa for. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
recursive |
('logical' or 'numeric') If 'FALSE', only return the supertaxa one rank above the target taxa. If 'TRUE', return all the supertaxa of every supertaxon, etc. Positive numbers indicate the number of recursions (i.e. number of ranks above the target taxon to return). '1' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
simplify |
('logical') If 'TRUE', then combine all the results into a single vector of unique values. |
include_input |
('logical') If 'TRUE', the input taxa are included in the output |
value |
What data to return. Any result of [all_names()] can be used, but it usually only makes sense to use data that has an associated taxon id. |
na |
('logical') If 'TRUE', return 'NA' where information is not available. |
If 'simplify = FALSE', then a list of vectors is returned corresponding to the 'subset' argument. If 'simplify = TRUE', then unique values are returned in a single vector.
Other taxonomy indexing functions: branches(), internodes(), leaves(), roots(), stems(), subtaxa()
# Return the indexes for supertaxa for each taxon
supertaxa(ex_taxmap)

# Only return data for some taxa using taxon indexes
supertaxa(ex_taxmap, subset = 1:3)

# Only return data for some taxa using taxon ids
supertaxa(ex_taxmap, subset = c("d", "e"))

# Only return data for some taxa using logical tests
supertaxa(ex_taxmap, subset = taxon_ranks == "species")

# Only return supertaxa one level above
supertaxa(ex_taxmap, recursive = FALSE)

# Only return supertaxa some number of ranks above
supertaxa(ex_taxmap, recursive = 2)

# Return something besides taxon indexes
supertaxa(ex_taxmap, value = "taxon_names")
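Finding supertaxa amounts to walking up an edge list until a root is reached. The base R sketch below shows that walk on a toy example. The `edges` table and the helpers `parent_of()` and `supertaxa_of()` are hypothetical, and returning the nearest supertaxon first is a choice of the sketch, not a statement about the package's output order.

```r
# Toy edge list (hypothetical IDs): a -> b; b -> c, d
edges <- data.frame(
  from = c(NA, "a", "b", "b"),
  to   = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Supertaxon of one taxon (NA for a root)
parent_of <- function(id) edges$from[match(id, edges$to)]

# Follow supertaxon links until a root is reached
supertaxa_of <- function(id, include_input = FALSE) {
  out <- if (include_input) id else character(0)
  current <- parent_of(id)
  while (!is.na(current)) {
    out <- c(out, current)
    current <- parent_of(current)
  }
  out
}

# One vector of supertaxa per taxon, named by taxon ID
result <- lapply(setNames(edges$to, edges$to), supertaxa_of)
```

The root "a" has no supertaxa, so its entry is an empty vector.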
Apply a function to the supertaxa for each taxon. This is similar to using [supertaxa()] with [lapply()] or [sapply()].
obj$supertaxa_apply(func, subset = NULL, recursive = TRUE, simplify = FALSE, include_input = FALSE, value = "taxon_indexes", na = FALSE, ...)
supertaxa_apply(obj, func, subset = NULL, recursive = TRUE, simplify = FALSE, include_input = FALSE, value = "taxon_indexes", na = FALSE, ...)
obj |
The [taxonomy()] or [taxmap()] object containing taxon information to be queried. |
func |
('function') The function to apply. |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes of taxa to use. Default: All taxa in 'obj' will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
recursive |
('logical' or 'numeric') If 'FALSE', only return the supertaxa one rank above the target taxa. If 'TRUE', return all the supertaxa of every supertaxon, etc. Positive numbers indicate the number of recursions (i.e. number of ranks above the target taxon to return). '1' is equivalent to 'FALSE'. Negative numbers are equivalent to 'TRUE'. |
simplify |
('logical') If 'TRUE', then combine all the results into a single vector of unique values. |
include_input |
('logical') If 'TRUE', the input taxa are included in the output |
value |
What data to give to the function. Any result of 'all_names(obj)' can be used, but it usually only makes sense to use data that has an associated taxon id. |
na |
('logical') If 'TRUE', return 'NA' where information is not available. |
... |
Extra arguments are passed to the function. |
# Get number of supertaxa that each taxon is contained in
supertaxa_apply(ex_taxmap, length)

# Get classifications for each taxon
# Note: this is easier to do with `classifications()`
supertaxa_apply(ex_taxmap, paste, collapse = ";", include_input = TRUE, value = "taxon_names")
Stores one or more [taxon()] objects. This is just a thin wrapper for a list of [taxon()] objects.
taxa(..., .list = NULL)
... |
Any number of objects of class [taxon()] |
.list |
An alternative to the '...' input. Any number of objects of class [taxon()]. Cannot be used with '...'. |
This is the documentation for the class called 'taxa'. If you are looking for the documentation for the package as a whole: [taxa-package].
An 'R6Class' object of class 'Taxon'
Other classes: hierarchies(), hierarchy(), taxmap(), taxon(), taxon_database(), taxon_id(), taxon_name(), taxon_rank(), taxonomy()
(a <- taxon(
  name = taxon_name("Poa annua"),
  rank = taxon_rank("species"),
  id = taxon_id(93036)
))
taxa(a, a, a)

# a null set
x <- taxon(NULL)
taxa(x, x, x)

# combo non-null and null
taxa(a, x, a)
A class designed to store a taxonomy and associated information. This class builds on the [taxonomy()] class. User defined data can be stored in the list 'obj$data', where 'obj' is a taxmap object. Data that is associated with taxa can be manipulated in a variety of ways using functions like [filter_taxa()] and [filter_obs()]. To associate the items of lists/vectors with taxa, name them by [taxon_ids()]. For tables, add a column named 'taxon_id' that stores [taxon_ids()].
taxmap(..., .list = NULL, data = NULL, funcs = list(), named_by_rank = FALSE)
... |
Any number of objects of class [hierarchy()] or character vectors. |
.list |
An alternative to the '...' input. Any number of objects of class [hierarchy()] or character vectors in a list. Cannot be used with '...'. |
data |
A list of tables with data associated with the taxa. |
funcs |
A named list of functions to include in the class. Referring to the names of these in functions like [filter_taxa()] will execute the function and return the results. If the function has at least one argument, the taxmap object is passed to it. |
named_by_rank |
('TRUE'/'FALSE') If 'TRUE' and the input is a list of vectors with each vector named by ranks, include that rank info in the output object, so it can be accessed by 'out$taxon_ranks()'. If 'TRUE', taxa with different ranks, but the same name and location in the taxonomy, will be considered different taxa. |
To initialize a 'taxmap' object with associated data sets, use the parsing functions [parse_tax_data()], [lookup_tax_data()], and [extract_tax_data()].
On initialization, the function sorts the taxon list by rank (if rank information is available). See [ranks_ref] for the reference rank names and orders.
An 'R6Class' object of class [taxmap()]
Other classes: hierarchies(), hierarchy(), taxa(), taxon(), taxon_database(), taxon_id(), taxon_name(), taxon_rank(), taxonomy()
# The code below shows how to construct a taxmap object from scratch.
# Typically, taxmap objects would be the output of a parsing function,
# not created from scratch, but this is for demonstration purposes.
notoryctidae <- taxon(name = taxon_name("Notoryctidae"), rank = taxon_rank("family"), id = taxon_id(4479))
notoryctes <- taxon(name = taxon_name("Notoryctes"), rank = taxon_rank("genus"), id = taxon_id(4544))
typhlops <- taxon(name = taxon_name("typhlops"), rank = taxon_rank("species"), id = taxon_id(93036))
mammalia <- taxon(name = taxon_name("Mammalia"), rank = taxon_rank("class"), id = taxon_id(9681))
felidae <- taxon(name = taxon_name("Felidae"), rank = taxon_rank("family"), id = taxon_id(9681))
felis <- taxon(name = taxon_name("Felis"), rank = taxon_rank("genus"), id = taxon_id(9682))
catus <- taxon(name = taxon_name("catus"), rank = taxon_rank("species"), id = taxon_id(9685))
panthera <- taxon(name = taxon_name("Panthera"), rank = taxon_rank("genus"), id = taxon_id(146712))
tigris <- taxon(name = taxon_name("tigris"), rank = taxon_rank("species"), id = taxon_id(9696))
plantae <- taxon(name = taxon_name("Plantae"), rank = taxon_rank("kingdom"), id = taxon_id(33090))
solanaceae <- taxon(name = taxon_name("Solanaceae"), rank = taxon_rank("family"), id = taxon_id(4070))
solanum <- taxon(name = taxon_name("Solanum"), rank = taxon_rank("genus"), id = taxon_id(4107))
lycopersicum <- taxon(name = taxon_name("lycopersicum"), rank = taxon_rank("species"), id = taxon_id(49274))
tuberosum <- taxon(name = taxon_name("tuberosum"), rank = taxon_rank("species"), id = taxon_id(4113))
homo <- taxon(name = taxon_name("homo"), rank = taxon_rank("genus"), id = taxon_id(9605))
sapiens <- taxon(name = taxon_name("sapiens"), rank = taxon_rank("species"), id = taxon_id(9606))
hominidae <- taxon(name = taxon_name("Hominidae"), rank = taxon_rank("family"), id = taxon_id(9604))
unidentified <- taxon(name = taxon_name("unidentified"))

# Combine the taxa into one hierarchy (classification) per organism
tiger <- hierarchy(mammalia, felidae, panthera, tigris)
cat <- hierarchy(mammalia, felidae, felis, catus)
human <- hierarchy(mammalia, hominidae, homo, sapiens)
mole <- hierarchy(mammalia, notoryctidae, notoryctes, typhlops)
tomato <- hierarchy(plantae, solanaceae, solanum, lycopersicum)
potato <- hierarchy(plantae, solanaceae, solanum, tuberosum)
potato_partial <- hierarchy(solanaceae, solanum, tuberosum)
unidentified_animal <- hierarchy(mammalia, unidentified)
unidentified_plant <- hierarchy(plantae, unidentified)

# User-defined data sets to store in the taxmap object
info <- data.frame(stringsAsFactors = FALSE,
                   name = c("tiger", "cat", "mole", "human", "tomato", "potato"),
                   n_legs = c(4, 4, 4, 2, 0, 0),
                   dangerous = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE))
abund <- data.frame(code = rep(c("T", "C", "M", "H"), 2),
                    sample_id = rep(c("A", "B"), each = 2),
                    count = c(1,2,5,2,6,2,4,0),
                    taxon_index = rep(1:4, 2))
phylopic_ids <- c("e148eabb-f138-43c6-b1e4-5cda2180485a",
                  "12899ba0-9923-4feb-a7f9-758c3c7d5e13",
                  "11b783d5-af1c-4f4e-8ab5-a51470652b47",
                  "9fae30cd-fb59-4a81-a39c-e1826a35f612",
                  "b6400f39-345a-4711-ab4f-92fd4e22cb1a",
                  "63604565-0406-460b-8cb8-1abe954b3f3a")
foods <- list(c("mammals", "birds"),
              c("cat food", "mice"),
              c("insects"),
              c("Most things, but especially anything rare or expensive"),
              c("light", "dirt"),
              c("light", "dirt"))

# A user-defined function; the taxmap object is passed to it as `x`
reaction <- function(x) {
  ifelse(x$data$info$dangerous,
         paste0("Watch out! That ", x$data$info$name, " might attack!"),
         paste0("No worries; its just a ", x$data$info$name, "."))
}

ex_taxmap <- taxmap(tiger, cat, mole, human, tomato, potato,
                    data = list(info = info,
                                phylopic_ids = phylopic_ids,
                                foods = foods,
                                abund = abund),
                    funcs = list(reaction = reaction))
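The two conventions for associating data with taxa (naming vector or list items by taxon ID, and adding a 'taxon_id' column to tables) can be pictured in base R. The sketch below does not use the package at all: the taxon IDs, the `n_legs` vector, and the `abund` table are hypothetical, and the subsetting only imitates how keeping a subset of taxa could carry over to data linked by ID.

```r
# Hypothetical taxon IDs
taxon_ids <- c("b", "c", "d")

# A vector is linked to taxa through its names
n_legs <- c(b = 4, c = 4, d = 0)

# A table is linked to taxa through a "taxon_id" column;
# several rows may refer to the same taxon
abund <- data.frame(
  taxon_id = c("b", "b", "c", "d"),
  count    = c(1, 5, 2, 6),
  stringsAsFactors = FALSE
)

# Keeping only some taxa selects the matching items in both data sets
keep <- c("b", "c")
n_legs_kept <- n_legs[names(n_legs) %in% keep]
abund_kept  <- abund[abund$taxon_id %in% keep, ]
```

Because the link is the ID rather than the position, the table can hold more rows than there are taxa.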
A class used to define a single taxon. Most other classes in the taxa package include one or more objects of this class.
taxon(name, rank = NULL, id = NULL, authority = NULL)
name |
A TaxonName object (see [taxon_name()]) or a character string. If a character string is passed in, it is coerced to a TaxonName object internally. Required. |
rank |
A TaxonRank object (see [taxon_rank()]) or a character string. If a character string is passed in, it is coerced to a TaxonRank object internally. Required. |
id |
A TaxonId object (see [taxon_id()]), a numeric/integer value, or a character string. If a numeric/integer/character value is passed in, it is coerced to a TaxonId object internally. Required. |
authority |
(character) a character string, optional |
Note that there is a special use case of this function - you can pass 'NULL' as the first parameter to get an empty 'taxon' object. It makes sense to retain the original behavior where nothing passed in to the first parameter leads to an error, and thus creating a 'NULL' taxon is done very explicitly.
An 'R6Class' object of class 'Taxon'
Other classes: hierarchies(), hierarchy(), taxa(), taxmap(), taxon_database(), taxon_id(), taxon_name(), taxon_rank(), taxonomy()
(x <- taxon(
  name = taxon_name("Poa annua"),
  rank = taxon_rank("species"),
  id = taxon_id(93036)
))
x$name
x$rank
x$id

# a null taxon object
taxon(NULL)

## with all NULL objects from the other classes
taxon(
  name = taxon_name(NULL),
  rank = taxon_rank(NULL),
  id = taxon_id(NULL)
)
Used to store information about taxonomy databases. This is typically used to store where taxon information came from in [taxon()] objects.
taxon_database(name = NULL, url = NULL, description = NULL, id_regex = NULL)
name |
(character) name of the database |
url |
(character) url for the database |
description |
(character) description of the database |
id_regex |
(character) a regular expression for the IDs used by the database |
An 'R6Class' object of class 'TaxonDatabase'
[database_list]
Other classes: hierarchies(), hierarchy(), taxa(), taxmap(), taxon(), taxon_id(), taxon_name(), taxon_rank(), taxonomy()
# create a database entry
(x <- taxon_database(
  "ncbi",
  "http://www.ncbi.nlm.nih.gov/taxonomy",
  "NCBI Taxonomy Database",
  "*"
))
x$name
x$url

# use pre-created database objects
database_list
database_list$ncbi
Used to store taxon IDs, either arbitrary or from a taxonomy database. This is typically used to store taxon IDs in [taxon()] objects.
taxon_id(id, database = NULL)
id |
(character/integer/numeric) a taxonomic id, required |
database |
(database) database class object, optional |
An 'R6Class' object of class 'TaxonId'
Other classes: hierarchies(), hierarchy(), taxa(), taxmap(), taxon(), taxon_database(), taxon_name(), taxon_rank(), taxonomy()
(x <- taxon_id(12345))
x$id
x$database

(x <- taxon_id(
  12345,
  database_list$ncbi
))
x$id
x$database

# a null taxon_id object
taxon_id(NULL)
Return the taxon IDs in a [taxonomy()] or [taxmap()] object. They are in the order they appear in the edge list.
obj$taxon_ids()
taxon_ids(obj)
obj |
The [taxonomy()] or [taxmap()] object. |
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_indexes(), taxon_names(), taxon_ranks()
# Return the taxon IDs for each taxon
taxon_ids(ex_taxmap)

# Filter using taxon IDs
filter_taxa(ex_taxmap, ! taxon_ids %in% c("c", "d"))
Return the taxon indexes in a [taxonomy()] or [taxmap()] object. They are the indexes of the edge list rows.
obj$taxon_indexes()
taxon_indexes(obj)
obj |
The [taxonomy()] or [taxmap()] object. |
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_names(), taxon_ranks()
# Return the indexes for each taxon
taxon_indexes(ex_taxmap)

# Use in another function (trivial example; 1:4 would work too)
filter_taxa(ex_taxmap, taxon_indexes < 5)
Used to store the names of taxa. This is typically used to store taxon names in [taxon()] objects.
taxon_name(name, database = NULL)
name |
(character) a taxonomic name. required |
database |
(character) database class object, optional |
An 'R6Class' object of class 'TaxonName'
Other classes: hierarchies(), hierarchy(), taxa(), taxmap(), taxon(), taxon_database(), taxon_id(), taxon_rank(), taxonomy()
(poa <- taxon_name("Poa"))
(undef <- taxon_name("undefined"))
(sp1 <- taxon_name("species 1"))
(poa_annua <- taxon_name("Poa annua"))
(x <- taxon_name("Poa annua L."))
x$name
x$database

(x <- taxon_name(
  "Poa annua",
  database_list$ncbi
))
x$name
x$database

# a null taxon_name object
taxon_name(NULL)
Return the taxon names in a [taxonomy()] or [taxmap()] object. They are in the order they appear in the edge list.
obj$taxon_names()
taxon_names(obj)
obj |
The [taxonomy()] or [taxmap()] object. |
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_ranks()
# Return the names for each taxon
taxon_names(ex_taxmap)

# Filter by taxon name
filter_taxa(ex_taxmap, taxon_names == "Felidae", subtaxa = TRUE)
Stores the rank of a taxon. This is typically used to store the ranks of taxa in [taxon()] objects.
taxon_rank(name, database = NULL)
name |
(character) rank name. required |
database |
(character) database class object, optional |
An 'R6Class' object of class 'TaxonRank'
Other classes: hierarchies(), hierarchy(), taxa(), taxmap(), taxon(), taxon_database(), taxon_id(), taxon_name(), taxonomy()
taxon_rank("species")
taxon_rank("genus")
taxon_rank("kingdom")

(x <- taxon_rank(
  "species",
  database_list$ncbi
))
x$rank
x$database

# a null taxon_rank object
taxon_rank(NULL)
Return the taxon ranks in a [taxonomy()] or [taxmap()] object. They are in the order taxa appear in the edge list.
obj$taxon_ranks()
taxon_ranks(obj)
obj |
The [taxonomy()] or [taxmap()] object. |
Other taxonomy data functions: classifications(), id_classifications(), is_branch(), is_internode(), is_leaf(), is_root(), is_stem(), map_data(), map_data_(), n_leaves(), n_leaves_1(), n_subtaxa(), n_subtaxa_1(), n_supertaxa(), n_supertaxa_1(), taxon_ids(), taxon_indexes(), taxon_names()
# Get ranks for each taxon
taxon_ranks(ex_taxmap)

# Filter by rank
filter_taxa(ex_taxmap, taxon_ranks == "family", supertaxa = TRUE)
Stores a taxonomy composed of [taxon()] objects organized in a tree structure. This differs from the [hierarchies()] class in how the [taxon()] objects are stored. Unlike [hierarchies()], each taxon is only stored once and the relationships between taxa are stored in an [edge list](https://en.wikipedia.org/wiki/Adjacency_list).
taxonomy(..., .list = NULL, named_by_rank = FALSE)
... |
Any number of objects of class [hierarchy()] or character vectors. |
.list |
An alternative to the '...' input. Any number of objects of class [hierarchy()] or character vectors in a list. Cannot be used with '...'. |
named_by_rank |
('TRUE'/'FALSE') If 'TRUE' and the input is a list of vectors with each vector named by ranks, include that rank info in the output object, so it can be accessed by 'out$taxon_ranks()'. If 'TRUE', taxa with different ranks, but the same name and location in the taxonomy, will be considered different taxa. |
An 'R6Class' object of class 'Taxonomy'
Other classes: hierarchies(), hierarchy(), taxa(), taxmap(), taxon(), taxon_database(), taxon_id(), taxon_name(), taxon_rank()
# Making a taxonomy object with vectors
taxonomy(c("mammalia", "felidae", "panthera", "tigris"),
         c("mammalia", "felidae", "panthera", "leo"),
         c("mammalia", "felidae", "felis", "catus"))

# Making a taxonomy object from scratch
# Note: This information would usually come from a parsing function.
# This is just for demonstration.
x <- taxon(name = taxon_name("Notoryctidae"), rank = taxon_rank("family"), id = taxon_id(4479))
y <- taxon(name = taxon_name("Notoryctes"), rank = taxon_rank("genus"), id = taxon_id(4544))
z <- taxon(name = taxon_name("Notoryctes typhlops"), rank = taxon_rank("species"), id = taxon_id(93036))
a <- taxon(name = taxon_name("Mammalia"), rank = taxon_rank("class"), id = taxon_id(9681))
b <- taxon(name = taxon_name("Felidae"), rank = taxon_rank("family"), id = taxon_id(9681))
cc <- taxon(name = taxon_name("Puma"), rank = taxon_rank("genus"), id = taxon_id(146712))
d <- taxon(name = taxon_name("Puma concolor"), rank = taxon_rank("species"), id = taxon_id(9696))
m <- taxon(name = taxon_name("Panthera"), rank = taxon_rank("genus"), id = taxon_id(146712))
n <- taxon(name = taxon_name("Panthera tigris"), rank = taxon_rank("species"), id = taxon_id(9696))

(hier1 <- hierarchy(z, y, x, a))
(hier2 <- hierarchy(cc, b, a, d))
(hier3 <- hierarchy(n, m, b, a))
(hrs <- hierarchies(hier1, hier2, hier3))

ex_taxonomy <- taxonomy(hier1, hier2, hier3)
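The idea of storing each taxon once and keeping the relationships in an edge list can be sketched in base R using the same three classifications as the vector example. This is an illustration only, not how the class is implemented. The names `lineages`, `pairs`, and `edge_list` are hypothetical, and the sketch identifies a taxon by its name and supertaxon, which is a simplification.

```r
# Three classifications, root first
lineages <- list(
  c("mammalia", "felidae", "panthera", "tigris"),
  c("mammalia", "felidae", "panthera", "leo"),
  c("mammalia", "felidae", "felis", "catus")
)

# One (supertaxon, taxon) pair per step of each lineage; roots get NA
pairs <- do.call(rbind, lapply(lineages, function(x) {
  data.frame(from = c(NA, x[-length(x)]), to = x, stringsAsFactors = FALSE)
}))

# Taxa shared between lineages collapse to a single row
edge_list <- unique(pairs)
rownames(edge_list) <- NULL
```

The three lineages contribute 12 pairs in total, but only 7 distinct taxa remain in `edge_list`, since "mammalia", "felidae", and "panthera" are shared.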
Convert per-taxon information, like taxon names, to a table of taxa (rows) by ranks (columns).
obj |
A |
subset |
Taxon IDs, TRUE/FALSE vector, or taxon indexes to find supertaxa for. Default: All leaves will be used. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
value |
What data to return. Default is taxon names. Any result of [all_names()] can be used, but it usually only makes sense to use data with one value per taxon, like taxon names. |
use_ranks |
Which ranks to use. Must be one of the following:
* 'NULL' (the default): If there is rank information, use the ranks that appear in the lineage with the most ranks. Otherwise, assume the number of supertaxa corresponds to rank and use placeholders for the rank column names in the output.
* 'TRUE': Use the ranks that appear in the lineage with the most ranks. An error will occur if no rank information is available.
* 'FALSE': Assume the number of supertaxa corresponds to rank and use placeholders for the rank column names in the output. Do not use included rank information.
* 'character': The names of the ranks to use. Requires included rank information.
* 'numeric': The "depth" of the ranks to use. These are equal to 'n_supertaxa' + 1. |
add_id_col |
If 'TRUE', include a taxon ID column. |
A tibble of taxa (rows) by ranks (columns).
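The shape of the result, and the "lineage with the most ranks" rule described for 'use_ranks', can be sketched in base R. This is only an illustration under stated assumptions: the 'lineages' list is made-up data, and the code is not how taxonomy_table() is implemented.

```r
# Hypothetical lineages, one per leaf taxon, named by rank
lineages <- list(
  c(class = "Mammalia", family = "Felidae", genus = "Panthera", species = "tigris"),
  c(class = "Mammalia", family = "Felidae", genus = "Felis", species = "catus"),
  c(class = "Mammalia", family = "Hominidae", genus = "Homo")
)

# Use the ranks that appear in the lineage with the most ranks
ranks <- names(lineages[[which.max(lengths(lineages))]])

# One row per lineage, one column per rank; missing ranks become NA
tab <- do.call(rbind, lapply(lineages, function(x) x[ranks]))
colnames(tab) <- ranks
as.data.frame(tab, stringsAsFactors = FALSE)
```

The third lineage has no species, so its "species" cell is NA.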
# Make a table of taxon names
taxonomy_table(ex_taxmap)

# Use a different value
taxonomy_table(ex_taxmap, value = "taxon_ids")

# Return a subset of taxa
taxonomy_table(ex_taxmap, subset = taxon_ranks == "genus")

# Use arbitrary rank names based on depth
taxonomy_table(ex_taxmap, use_ranks = FALSE)
Replace columns of tables in 'obj$data' in [taxmap()] objects. See [dplyr::transmute()] for the inspiration for this function and more information. Calling the function using the 'obj$transmute_obs(...)' style edits "obj" in place, unlike most R functions. However, calling the function using the 'transmute_obs(obj, ...)' style imitates R's traditional copy-on-modify semantics, so "obj" is not changed; instead, a changed version is returned, as with most R functions.
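The difference between the two calling styles can be sketched with a plain R environment, which has reference semantics. This is a base R illustration only: 'make_obj', 'set_in_place', and 'set_copy' are hypothetical names, not metacoder functions, and the sketch does not show how taxmap objects are implemented.

```r
# Hypothetical stand-in for an object with reference semantics
make_obj <- function() {
  obj <- new.env()
  obj$data <- data.frame(name = c("a", "b"), stringsAsFactors = FALSE)
  obj
}

# In-place style: modifies the object it is given
set_in_place <- function(obj) {
  obj$data <- data.frame(new_col = paste0(obj$data$name, "!!!"))
  invisible(obj)
}

# Copy style: clone first, so the input is left untouched
set_copy <- function(obj) {
  copy <- list2env(as.list(obj, all.names = TRUE), envir = new.env())
  set_in_place(copy)
}

x <- make_obj()
y <- set_copy(x)
colnames(x$data)  # still "name"
colnames(y$data)  # "new_col"

set_in_place(x)
colnames(x$data)  # now "new_col"
```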
obj$transmute_obs(data, ...)
transmute_obs(obj, data, ...)
obj |
An object of type [taxmap()] |
data |
Dataset name, index, or a logical vector that indicates which dataset in 'obj$data' to use. |
... |
One or more named columns to add. Newly created columns can be referenced in the same function call. Any variable name that appears in [all_names()] can be used as if it was a vector on its own. |
target |
DEPRECATED. Use "data" instead. |
An object of type [taxmap()]
Other taxmap manipulation functions: arrange_obs(), arrange_taxa(), filter_obs(), filter_taxa(), mutate_obs(), sample_frac_obs(), sample_frac_taxa(), sample_n_obs(), sample_n_taxa(), select_obs()
# Replace columns in a table with new columns
transmute_obs(ex_taxmap, "info", new_col = paste0(name, "!!!"))
Attempts to save taxonomic and sequence information of a taxmap object in the Greengenes output format. If the taxmap object was created using parse_greengenes, then it should be able to replicate the format exactly with the default settings.
write_greengenes(
  obj,
  tax_file = NULL,
  seq_file = NULL,
  tax_names = obj$get_data("taxon_names")[[1]],
  ranks = obj$get_data("gg_rank")[[1]],
  ids = obj$get_data("gg_id")[[1]],
  sequences = obj$get_data("gg_seq")[[1]]
)
obj |
A taxmap object |
tax_file |
( |
seq_file |
( |
tax_names |
( |
ranks |
( |
ids |
( |
sequences |
( |
The taxonomy output file has a format like:
228054 k__Bacteria; p__Cyanobacteria; c__Synechococcophycideae; o__Synech...
844608 k__Bacteria; p__Cyanobacteria; c__Synechococcophycideae; o__Synech...
...
The optional sequence file has a format like:
>1111886
AACGAACGCTGGCGGCATGCCTAACACATGCAAGTCGAACGAGACCTTCGGGTCTAGTGGCGCACGGGTGCGTA...
>1111885
AGAGTTTGATCCTGGCTCAGAATGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGTACGAGAAATCCCGAGC...
...
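A minimal base R sketch of how lines in the two formats above could be assembled. It is not the code write_greengenes() uses. The vectors 'ids', 'lineages', and 'seqs' are made-up data, 'format_lineage' is a hypothetical helper, and the tab between the ID and the lineage is an assumption.

```r
# Illustrative data (hypothetical, not from a real Greengenes release)
ids <- c("1001", "1002")
lineages <- list(
  c(k = "Bacteria", p = "Cyanobacteria"),
  c(k = "Bacteria", p = "Firmicutes")
)
seqs <- c("AACGAACGCT", "AGAGTTTGAT")

# Taxonomy lines: "<id><TAB>k__Name; p__Name; ..."
format_lineage <- function(x) paste0(names(x), "__", x, collapse = "; ")
tax_lines <- paste(ids, vapply(lineages, format_lineage, character(1)), sep = "\t")

# Sequence lines: a ">id" header followed by the sequence
seq_lines <- as.vector(rbind(paste0(">", ids), seqs))

# Write the taxonomy lines to a temporary file and read them back
tax_path <- tempfile(fileext = ".txt")
writeLines(tax_lines, tax_path)
readLines(tax_path)
```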
Other writers: make_dada2_asv_table(), make_dada2_tax_table(), write_mothur_taxonomy(), write_rdp(), write_silva_fasta(), write_unite_general()
Attempts to save taxonomic information of a taxmap object in the mothur '*.taxonomy' format. If the taxmap object was created using parse_mothur_taxonomy, then it should be able to replicate the format exactly with the default settings.
write_mothur_taxonomy(
  obj,
  file,
  tax_names = obj$get_data("taxon_names")[[1]],
  ids = obj$get_data("sequence_id")[[1]],
  scores = NULL
)
obj |
A taxmap object |
file |
( |
tax_names |
( |
ids |
( |
scores |
( |
The output file has a format like:
AY457915 Bacteria(100);Firmicutes(99);Clostridiales(99);Johnsone...
AY457914 Bacteria(100);Firmicutes(100);Clostridiales(100);Johnso...
AY457913 Bacteria(100);Firmicutes(100);Clostridiales(100);Johnso...
AY457912 Bacteria(100);Firmicutes(99);Clostridiales(99);Johnsone...
AY457911 Bacteria(100);Firmicutes(99);Clostridiales(98);Ruminoco...
or...
AY457915 Bacteria;Firmicutes;Clostridiales;Johnsonella_et_rel.;J...
AY457914 Bacteria;Firmicutes;Clostridiales;Johnsonella_et_rel.;J...
AY457913 Bacteria;Firmicutes;Clostridiales;Johnsonella_et_rel.;J...
AY457912 Bacteria;Firmicutes;Clostridiales;Johnsonella_et_rel.;J...
AY457911 Bacteria;Firmicutes;Clostridiales;Ruminococcus_et_rel.;...
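The two variants above differ only in whether each taxon name carries a score in parentheses. A rough base R sketch of that difference follows. 'format_mothur_line' is a hypothetical helper, not part of metacoder; the tab after the ID is an assumption, and any trailing delimiter the real format may require is left out.

```r
# Hypothetical helper: build one line, with or without per-taxon scores
format_mothur_line <- function(id, taxa, scores = NULL) {
  if (!is.null(scores)) {
    # Append each score in parentheses after its taxon name
    taxa <- paste0(taxa, "(", scores, ")")
  }
  paste0(id, "\t", paste0(taxa, collapse = ";"))
}

# With scores
format_mothur_line("AY457915", c("Bacteria", "Firmicutes"), scores = c(100, 99))

# Without scores
format_mothur_line("AY457915", c("Bacteria", "Firmicutes"))
```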
Other writers: make_dada2_asv_table(), make_dada2_tax_table(), write_greengenes(), write_rdp(), write_silva_fasta(), write_unite_general()
Attempts to save taxonomic and sequence information of a taxmap object in the RDP FASTA format. If the taxmap object was created using parse_rdp, then it should be able to replicate the format exactly with the default settings.
write_rdp(
  obj,
  file,
  tax_names = obj$get_data("taxon_names")[[1]],
  ranks = obj$get_data("rdp_rank")[[1]],
  ids = obj$get_data("rdp_id")[[1]],
  info = obj$get_data("seq_name")[[1]],
  sequences = obj$get_data("rdp_seq")[[1]]
)
obj |
A taxmap object |
file |
( |
tax_names |
( |
ranks |
( |
ids |
( |
info |
( |
sequences |
( |
The output file has a format like:
>S000448483 Sparassis crispa; MBUH-PIRJO&ILKKA94-1587/ss5 Lineage=Root;rootrank;Fun...
ggattcccctagtaactgcgagtgaagcgggaagagctcaaatttaaaatctggcggcgtcctcgtcgtccgagttgtaa
tctggagaagcgacatccgcgctggaccgtgtacaagtctcttggaaaagagcgtcgtagagggtgacaatcccgtcttt
...
Other writers: make_dada2_asv_table(), make_dada2_tax_table(), write_greengenes(), write_mothur_taxonomy(), write_silva_fasta(), write_unite_general()
Attempts to save taxonomic and sequence information of a taxmap object in the SILVA FASTA format. If the taxmap object was created using parse_silva_fasta, then it should be able to replicate the format exactly with the default settings.
write_silva_fasta(
  obj,
  file,
  tax_names = obj$get_data("taxon_names")[[1]],
  other_names = obj$get_data("other_name")[[1]],
  ids = obj$get_data("ncbi_id")[[1]],
  start = obj$get_data("start_pos")[[1]],
  end = obj$get_data("end_pos")[[1]],
  sequences = obj$get_data("silva_seq")[[1]]
)
obj |
A taxmap object |
file |
( |
tax_names |
( |
other_names |
( |
ids |
( |
start |
( |
end |
( |
sequences |
( |
The output file has a format like:
>GCVF01000431.1.2369 Bacteria;Proteobacteria;Gammaproteobacteria;Oceanospiril...
CGUGCACGGUGGAUGCCUUGGCAGCCAGAGGCGAUGAAGGACGUUGUAGCCUGCGAUAAGCUCCGGUUAGGUGGCAAACA
ACCGUUUGACCCGGAGAUCUCCGAAUGGGGCAACCCACCCGUUGUAAGGCGGGUAUCACCGACUGAAUCCAUAGGUCGGU
...
Other writers: make_dada2_asv_table(), make_dada2_tax_table(), write_greengenes(), write_mothur_taxonomy(), write_rdp(), write_unite_general()
Attempts to save taxonomic and sequence information of a taxmap object in the UNITE general FASTA format. If the taxmap object was created using parse_unite_general, then it should be able to replicate the format exactly with the default settings.
write_unite_general(
  obj,
  file,
  tax_names = obj$get_data("taxon_names")[[1]],
  ranks = obj$get_data("unite_rank")[[1]],
  sequences = obj$get_data("unite_seq")[[1]],
  seq_name = obj$get_data("organism")[[1]],
  ids = obj$get_data("unite_id")[[1]],
  gb_acc = obj$get_data("acc_num")[[1]],
  type = obj$get_data("unite_type")[[1]]
)
obj |
A taxmap object |
file |
( |
tax_names |
( |
ranks |
( |
sequences |
( |
seq_name |
( |
ids |
( |
gb_acc |
( |
type |
( |
The output file has a format like:
>Glomeromycota_sp|KJ484724|SH523877.07FU|reps|k__Fungi;p__Glomeromycota;c__unid...
ATAATTTGCCGAACCTAGCGTTAGCGCGAGGTTCTGCGATCAACACTTATATTTAAAACCCAACTCTTAAATTTTGTAT...
...
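The header line above appears to be a set of "|"-separated fields followed by a rank-prefixed lineage. A base R sketch of assembling such a header follows. The mapping of the 'seq_name', 'gb_acc', 'ids', and 'type' arguments onto field positions is an assumption read off the example, the lineage is cut to two ranks because the original is truncated, and this is not the package's own code.

```r
# Values taken from the visible part of the example header above
seq_name <- "Glomeromycota_sp"
gb_acc   <- "KJ484724"
id       <- "SH523877.07FU"
type     <- "reps"
lineage  <- c(k = "Fungi", p = "Glomeromycota")  # shortened for illustration

# Rank-prefixed lineage, e.g. "k__Fungi;p__Glomeromycota"
lineage_str <- paste0(names(lineage), "__", lineage, collapse = ";")

# Assumed field order: name | accession | id | type | lineage
header <- paste0(">", paste(seq_name, gb_acc, id, type, lineage_str, sep = "|"))
header
```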
Other writers: make_dada2_asv_table(), make_dada2_tax_table(), write_greengenes(), write_mothur_taxonomy(), write_rdp(), write_silva_fasta()
For a given table in a taxmap object, convert all counts below a minimum number to zero. This is useful for effectively removing "singletons", "doubletons", or other low abundance counts.
zero_low_counts(
  obj,
  data,
  min_count = 2,
  use_total = FALSE,
  cols = NULL,
  other_cols = FALSE,
  out_names = NULL,
  dataset = NULL
)
obj |
A |
data |
The name of a table in |
min_count |
The minimum number of counts needed for a count to remain unchanged. Any count less than this will be converted to a zero. For example, |
use_total |
If |
cols |
The columns in
|
other_cols |
Preserve in the output non-target columns present in the input data. New columns will always be on the end. The "taxon_id" column will be preserved in the front. Takes one of the following inputs:
|
out_names |
The names of count columns in the output. Must be the same
length and order as |
dataset |
DEPRECATED. Use "data" instead. |
A tibble
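The per-count behavior described for 'min_count' can be sketched in base R on an ordinary data frame. This is an illustration, not the function's implementation: the 'counts' table and its column names are made up, and the 'use_total' option is not covered.

```r
# Hypothetical count table: one row per taxon, one column per sample
counts <- data.frame(
  taxon_id = c("a", "b", "c"),
  s1 = c(0, 1, 5),
  s2 = c(2, 1, 1)
)
min_count <- 2
count_cols <- c("s1", "s2")

# Replace every count below the minimum with zero; leave other columns alone
zeroed <- counts
zeroed[count_cols] <- lapply(
  counts[count_cols],
  function(x) ifelse(x < min_count, 0, x)
)
zeroed
```

With 'min_count = 2', the singleton counts (1) become 0, while counts of 2 or more are unchanged.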
Other calculations: calc_diff_abund_deseq2(), calc_group_mean(), calc_group_median(), calc_group_rsd(), calc_group_stat(), calc_n_samples(), calc_obs_props(), calc_prop_samples(), calc_taxon_abund(), compare_groups(), counts_to_presence(), rarefy_obs()
## Not run:
# Parse data for examples
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Default use
zero_low_counts(x, "tax_data")

# Use only a subset of columns
zero_low_counts(x, "tax_data", cols = c("700035949", "700097855", "700100489"))
zero_low_counts(x, "tax_data", cols = 4:6)
zero_low_counts(x, "tax_data", cols = startsWith(colnames(x$data$tax_data), "70001"))

# Including all other columns in output
zero_low_counts(x, "tax_data", other_cols = TRUE)

# Including specific columns in output
zero_low_counts(x, "tax_data", cols = c("700035949", "700097855", "700100489"),
                other_cols = 2:3)

# Rename output columns
zero_low_counts(x, "tax_data", cols = c("700035949", "700097855", "700100489"),
                out_names = c("a", "b", "c"))

## End(Not run)