Overview
Tidyproteomics is an R package for the post-processing and analysis of quantitative proteomic data, accomplished through a simplified S3 data object and corresponding functions. At a high level, this package supports:
- data importing
- data filtering
- data visualization
- quantitative normalization & imputation
- two-sample expression & term enrichment analysis
- protein inference, sequence coverage and visualization
The objective of tidyproteomics is to simplify the post-analysis of
many proteomics projects by providing an R framework for the analysis
and integration of methods and algorithms. The goal is to provide a set
of functional steps for processing your data, a record of that processing,
and methods for visualization. It is intended to work much like the
tidyverse, which provides data processing functions that can be piped
together for cleaner, more easily understood code. For details, see
vignette("workflow-publication").
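The idea of piped steps that also keep a record of processing can be sketched in base R. This is a toy illustration only: the `toy_object` class and the `new_toy()`, `toy_filter()` and `toy_log2()` helpers are hypothetical names invented here, and they are not the tidyproteomics data object or its API. The data values are made up.

```r
# Toy S3 object: a data table plus a running log of operations
# (hypothetical structure, not the tidyproteomics object)
new_toy <- function(quant) {
  structure(list(quant = quant, operations = character(0)), class = "toy_object")
}

# Each step returns a modified object and appends a note to the log
toy_filter <- function(obj, pattern) {
  keep <- grepl(pattern, obj$quant$protein)
  obj$quant <- obj$quant[keep, , drop = FALSE]
  obj$operations <- c(obj$operations, paste0("filtered proteins by '", pattern, "'"))
  obj
}

toy_log2 <- function(obj) {
  obj$quant$abundance <- log2(obj$quant$abundance)
  obj$operations <- c(obj$operations, "log2 transformed abundance")
  obj
}

quant <- data.frame(
  protein   = c("RPL3", "RPS6", "ACTB"),
  abundance = c(1024, 2048, 4096)
)

# The native pipe requires R >= 4.1
result <- new_toy(quant) |> toy_filter("^RP") |> toy_log2()

# The log lists each step in the order it was applied
result$operations
```

Because every step takes the object as its first argument and returns it, steps can be chained in any order and the log travels with the data.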
While several well-developed and exceptional tools can perform the same analyses, they are often tied to specific up-stream inputs, perform only a portion of the desired analysis, or have limited licensing availability.
This package was designed to allow for expansion and integration of other algorithms, methods and workflows in addition to providing access to different data formats via exported up-stream analyses. It is also intended to be open for review, improvement and bug fixing.
Package overview
data manipulation
Reference vignette("importing") and vignette("subsetting").
- import() - imports data from several sources into the tidyproteomics data object
- subset() - subset a tidyproteomics data object by a given regex
- reassign() - quickly reassign data to different sample sets
- merge() - combines multiple imported data sets into a single object
- export_quant() - exports a tidyproteomics data object to .csv, .tsv, .xlsx or .rds
- export_analysis() - exports a tidyproteomics data object to .csv, .tsv, .xlsx or .rds
basic analysis
- summary() - provides a quick accounting of the number of proteins observed
- plot_counts() - provides a quick bar chart for the number of proteins observed
- plot_quantrank() - provides a quick plot on quantitative expression for all proteins observed
normalization
Reference vignette("normalizing")
- normalize() - normalize the raw data from a tidyproteomics data object
- select_normalization() - use a weighted scheme to automatically pick the best normalization method, or manually set one for down-stream analysis
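As a rough illustration of what normalization does, the sketch below applies median centering to log2 abundances in base R. Median centering is one common approach, chosen here for brevity; it is not presented as the package's implementation. The matrix layout (proteins in rows, samples in columns) and all values are assumptions made for the example.

```r
# Made-up abundance matrix: rows are proteins, columns are samples.
# matrix() fills by column, so s2 is 2x and s3 is 4x the values of s1.
raw <- matrix(
  c(100, 200, 400,
    200, 400, 800,
    400, 800, 1600),
  nrow = 3,
  dimnames = list(c("P1", "P2", "P3"), c("s1", "s2", "s3"))
)

log_raw <- log2(raw)

# Median of each sample (column) on the log2 scale
sample_medians <- apply(log_raw, 2, median, na.rm = TRUE)

# Shift each sample so its median matches the mean of all sample medians
shift <- sample_medians - mean(sample_medians)
log_norm <- sweep(log_raw, 2, shift, FUN = "-")

# After centering, every sample has the same median
apply(log_norm, 2, median)
```

Working on the log2 scale turns a constant loading difference between samples into a constant offset, which is why a simple subtraction is enough in this toy case.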
impute missing values
Reference vignette("imputing")
- impute() - impute missing values from a tidyproteomics data object
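To make the idea of imputation concrete, here is a minimal base-R sketch that replaces each missing value with the smallest observed value in the same sample. Minimum-value imputation is only one simple strategy, used here as an assumption for illustration; it does not describe what impute() does internally. The helper name `impute_min()` and the data are hypothetical.

```r
# Made-up abundance matrix with missing values (rows: proteins, columns: samples).
# matrix() fills by column: s1 = 10, NA, 30 and s2 = 12, 22, NA.
abund <- matrix(
  c(10, NA, 30,
    12, 22, NA),
  nrow = 3,
  dimnames = list(c("P1", "P2", "P3"), c("s1", "s2"))
)

# Hypothetical helper: fill NAs with the minimum observed value of the vector
impute_min <- function(x) {
  x[is.na(x)] <- min(x, na.rm = TRUE)
  x
}

# Apply per sample (column)
imputed <- apply(abund, 2, impute_min)
imputed
```

Filling with a low value reflects the assumption that values are missing because they fell below the detection limit; other assumptions call for other methods.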
data visualization
- plot_normalization() - a boxplot of the raw and normalized values
- plot_variation_cv() - a scatter plot of raw and normalized CV and dynamic range values
- plot_variation_pca() - a scatter plot of raw and normalized PCA values
- plot_dynamic_range() - a 2d density plot of raw and normalized CVs by log10 abundance
- plot_venn() - a Venn accounting diagram of protein overlap between samples
- plot_euler() - an Euler accounting diagram of protein overlap between samples
- plot_pca() - a scatter plot of PCA values for the selected normalized data values
- plot_heatmap() - a heatmap of protein by sample for the selected normalized data values, clustered in both dimensions
two-sample analysis
expression differences
- expression() - calculates the two-sample statistical differences for each protein
- plot_volcano() - a scatter plot of log2-foldchange by p-values for a given expression test
- plot_proportion() - a scatter plot of log2-foldchange by proportional-expression for a given expression test
- plot_compexp() - a scatter plot comparison of two expression tests to visualize the intersection / difference
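The quantities behind a volcano plot, a log2 fold change and a p-value per protein, can be sketched in base R as follows. This assumes a Welch t-test and Benjamini-Hochberg adjustment purely for illustration; the test that expression() applies may differ. The sample names and abundance values are made up.

```r
# Made-up log2 abundances: three replicates each for control and knockdown
control   <- rbind(P1 = c(10.0, 10.2, 9.8),  P2 = c(8.0, 8.1, 7.9))
knockdown <- rbind(P1 = c(12.0, 12.2, 11.8), P2 = c(8.0, 7.9, 8.1))

# On the log2 scale, the fold change is a difference of group means
log2_fc <- rowMeans(knockdown) - rowMeans(control)

# Welch two-sample t-test for each protein
p_value <- sapply(rownames(control), function(p) {
  t.test(knockdown[p, ], control[p, ])$p.value
})

# Benjamini-Hochberg adjustment for multiple testing
adj_p_value <- p.adjust(p_value, method = "BH")

data.frame(log2_fc, p_value, adj_p_value)
```

In this toy data P1 shifts by two log2 units between groups while P2 does not change, so only P1 would stand out on a plot of fold change against p-value.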
term enrichment
- enrichment() - term enrichment for a given expression test using Wilcoxon Rank Sum
- plot_enrichment() - a bubble plot visualization of term enrichment for a given expression test
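A Wilcoxon rank sum test for term enrichment can be illustrated in base R by comparing the fold changes of proteins annotated with a term against all other proteins. This is a minimal sketch of the statistical idea under made-up data and a hypothetical annotation; it does not reproduce how enrichment() prepares its inputs or reports results.

```r
# Made-up log2 fold changes for ten proteins
log2_fc <- c(2.1, 1.8, 2.5, 1.9, 0.1, -0.2, 0.3, -0.1, 0.0, 0.2)
names(log2_fc) <- paste0("P", 1:10)

# Hypothetical annotation: the first four proteins carry the term of interest
in_term <- names(log2_fc) %in% c("P1", "P2", "P3", "P4")

# Two-sided Wilcoxon rank sum test: annotated proteins vs. all others
test <- wilcox.test(log2_fc[in_term], log2_fc[!in_term])
test$p.value
```

Because the test uses ranks rather than raw values, it asks whether the annotated proteins tend to sit higher or lower in the fold-change ordering, without assuming a particular distribution.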