Overview
Tidyproteomics is an R package for the post-processing and analysis of quantitative proteomic data. This is accomplished through a simplified S3 data object and corresponding functions. At a high level, the package supports:
 data importing
 data filtering
 data visualization
 quantitative normalization & imputation
 two-sample expression & term enrichment analysis
 protein inference, sequence coverage and visualization
The objective of tidyproteomics is to simplify the post-analysis of
many proteomics projects by providing an R framework for the analysis
and integration of methods and algorithms. The goal is to provide a set
of functional steps for processing your data, a record of that processing,
and methods for visualization. It is intended to work much like the
tidyverse, which provides data processing functions that can be piped
together into cleaner, easily understood code. Reference
vignette("workflowpublication").
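The idea of piped processing steps that also keep a record of what was done can be sketched in base R. This is a toy illustration only, not the package's implementation: the object layout and the names `toy_data()`, `toy_filter()` and `toy_log2()` are hypothetical and exist purely for this example.

```r
# Toy S3 object: a quantitative table plus a log of the operations applied.
# All names here (toy_data, toy_filter, toy_log2) are hypothetical.
toy_data <- function(quant) {
  structure(list(quant = quant, operations = character()), class = "toy_data")
}

# Keep only rows whose protein identifier matches a regular expression,
# and record the step in the operations log.
toy_filter <- function(x, pattern) {
  x$quant <- x$quant[grepl(pattern, x$quant$protein), , drop = FALSE]
  x$operations <- c(x$operations, paste0("filter: protein matches '", pattern, "'"))
  x
}

# Log2-transform the abundance column and record the step.
toy_log2 <- function(x) {
  x$quant$abundance <- log2(x$quant$abundance)
  x$operations <- c(x$operations, "transform: log2 abundance")
  x
}

# Made-up example values
quant <- data.frame(
  protein   = c("P1", "P2", "Q1", "P3"),
  abundance = c(1024, 2048, 512, 4096)
)

# The native pipe requires R >= 4.1
result <- toy_data(quant) |>
  toy_filter("^P") |>
  toy_log2()

print(result$quant)
print(result$operations)
```

Each step takes the object and returns it, so steps compose left to right and the `operations` element ends up describing the full processing history.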
While several well-developed and exceptional tools are available to perform the same analyses, they are often tied to specific upstream inputs, perform only a portion of the desired analysis, or have limited licensing availability.
This package was designed to allow for expansion and integration of other algorithms, methods and workflows in addition to providing access to different data formats via exported upstream analyses. It is also intended to be open for review, improvement and bug fixing.
Package overview
data manipulation
Reference vignette("importing") and vignette("subsetting")

import()
 imports data from several sources into the tidyproteomics data object 
subset()
 subset a tidyproteomics data object by a given regex 
reassign()
 quickly reassign data to different sample sets 
merge()
 combines multiple imported data sets into a single object 
export_quant()
 exports a tidyproteomics data object to .csv, .tsv, .xlsx or .rds 
export_analysis()
 exports a tidyproteomics data object to .csv, .tsv, .xlsx or .rds
basic analysis

summary()
 provides a quick accounting of the number of proteins observed 
plot_counts()
 provides a quick bar chart for the number of proteins observed 
plot_quantrank()
 provides a quick plot on quantitative expression for all proteins observed
normalization
Reference vignette("normalizing")

normalize()
 normalize the raw data from a tidyproteomics data object 
select_normalization()
 use a weighted scheme to automatically pick the best normalization method, or manually set one for downstream analysis
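As a rough illustration of what a normalization step does, the sketch below applies median centering to a small matrix of log2 abundances in base R. Median centering is chosen here only as a familiar example; this overview does not list the methods the package offers, and the data and variable names are made up for the sketch.

```r
# Hypothetical log2 abundances: rows are proteins, columns are samples.
# matrix() fills column by column, so s1 = (10, 12, 14), and so on.
log2_abund <- matrix(
  c(10, 12, 14,
    11, 13, 15,
     9, 11, 13),
  nrow = 3,
  dimnames = list(c("P1", "P2", "P3"), c("s1", "s2", "s3"))
)

# Median of each sample (column), ignoring missing values
sample_medians <- apply(log2_abund, 2, median, na.rm = TRUE)

# Shift every sample so its median equals the mean of all sample medians
target <- mean(sample_medians)
normalized <- sweep(log2_abund, 2, sample_medians - target, FUN = "-")

print(sample_medians)
print(apply(normalized, 2, median, na.rm = TRUE))
```

After the shift, all samples share the same median, which is the property a boxplot of raw versus normalized values is meant to reveal.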
impute missing values
Reference vignette("imputing")

impute()
 impute missing values from a tidyproteomics data object
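To make the idea of imputation concrete, here is a minimal base R sketch that replaces each missing value with the smallest observed value in the same sample. It assumes a simple minimum-value strategy for illustration; it is not a description of how `impute()` works, and `impute_min()` is a hypothetical helper.

```r
# Hypothetical log2 abundances with missing values (NA).
# Columns: s1 = (10, NA, 14), s2 = (11, 13, NA)
abund <- matrix(
  c(10, NA, 14,
    11, 13, NA),
  nrow = 3,
  dimnames = list(c("P1", "P2", "P3"), c("s1", "s2"))
)

# Hypothetical helper: fill each sample's missing values with that
# sample's minimum observed value.
impute_min <- function(m) {
  for (j in seq_len(ncol(m))) {
    missing <- is.na(m[, j])
    m[missing, j] <- min(m[, j], na.rm = TRUE)
  }
  m
}

imputed <- impute_min(abund)
print(imputed)
```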
data visualization

plot_normalization()
 a boxplot of the raw and normalized values 
plot_variation_cv()
 a scatter plot of raw and normalized CV and dynamic range values 
plot_variation_pca()
 a scatter plot of raw and normalized PCA values 
plot_dynamic_range()
 a 2d density plot of raw and normalized CVs by log10 abundance 
plot_venn()
 a Venn accounting diagram of protein overlap between samples 
plot_euler()
 an Euler accounting diagram of protein overlap between samples 
plot_pca()
 a scatter plot of PCA values for the selected normalized data values 
plot_heatmap()
 a heatmap of protein by sample for the selected normalized data values, clustered in both dimensions
two-sample analysis
expression differences

expression()
 calculates the two-sample statistical differences for each protein 
plot_volcano()
 a scatter plot of log2 fold change by p-values for a given expression test 
plot_proportion()
 a scatter plot of log2 fold change by proportional expression for a given expression test 
plot_compexp()
 a scatter plot comparison of two expression tests to visualize the intersection / difference
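The quantities behind a two-sample expression test, a log2 fold change and a p-value per protein, can be sketched in base R as follows. This assumes a Welch t-test on already log2-transformed values and Benjamini-Hochberg adjustment purely for illustration; the statistical test used by `expression()` is not specified in this overview, and the data are made up.

```r
# Hypothetical log2 abundances: rows are proteins, columns are replicates.
control <- rbind(P1 = c(10.0, 10.2, 9.8), P2 = c(12.1, 11.9, 12.0))
treated <- rbind(P1 = c(12.0, 12.1, 11.9), P2 = c(12.0, 12.2, 11.8))

# On the log2 scale, the fold change is a difference of group means.
results <- data.frame(
  protein         = rownames(control),
  log2_foldchange = rowMeans(treated) - rowMeans(control),
  p_value         = vapply(
    seq_len(nrow(control)),
    function(i) t.test(treated[i, ], control[i, ])$p.value,
    numeric(1)
  )
)

# Adjust for multiple testing (Benjamini-Hochberg)
results$adj_p_value <- p.adjust(results$p_value, method = "BH")

print(results)
```

A volcano plot is then simply `log2_foldchange` on the x-axis against a transformed p-value (commonly `-log10`) on the y-axis.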
term enrichment

enrichment()
 term enrichment for a given expression test using Wilcoxon Rank Sum 
plot_enrichment()
 a bubble plot visualization of term enrichment for a given expression test
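A Wilcoxon rank sum approach to term enrichment can be illustrated in base R by comparing the fold changes of proteins annotated with a term against all remaining proteins. This is a sketch under stated assumptions, not the package's implementation: the annotation list, the fold-change values and the helper `enrich_term()` are all hypothetical.

```r
# Hypothetical log2 fold changes from an expression test
log2_fc <- c(P1 = 2.1, P2 = 1.8, P3 = 2.4, P4 = -0.1,
             P5 = 0.2, P6 = -0.3, P7 = 0.1, P8 = 0.0)

# Hypothetical term annotations: term name -> member proteins
annotations <- list(
  term_a = c("P1", "P2", "P3"),
  term_b = c("P4", "P5", "P6")
)

# Hypothetical helper: test whether a term's members are shifted
# relative to all other proteins using the Wilcoxon rank sum test.
enrich_term <- function(members, values) {
  in_term <- names(values) %in% members
  test <- wilcox.test(values[in_term], values[!in_term])
  data.frame(
    size      = sum(in_term),
    mean_diff = mean(values[in_term]) - mean(values[!in_term]),
    p_value   = test$p.value
  )
}

enrichment_table <- do.call(rbind, lapply(annotations, enrich_term, values = log2_fc))
print(enrichment_table)
```

Because the test is rank based, it asks whether a term's proteins sit consistently higher or lower than the rest, without requiring a fold-change cutoff.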