Skip to contents

Tidyproteomics is an R package for the post processing and analysis of quantitative proteomic data. Accomplished through a simplified S3 data object and corrisponing function. This package supports at a high level:

  • data importing
  • data filtering
  • data visualization
  • quantitative normalization & imputation
  • two-sample expression & term enrichment analysis
  • protein inference, sequence coverage and visualization

The objective of tidyproteomics is to simplify the post analysis of many proteomics projects by providing an R framework for the analysis and integration of methods and algorithms. The goal to is provide a set of functional steps to processing your data, a record of that processing and methods for visualization. It is intended to be much like how the tidyverse provides data processing functions that can be piped together for easily understood and cleaner code. Reference the vignette("workflow-publication").

While there are several well developed and exceptional tools available to perform the exact same analysis, they are often tied to specific up-stream inputs, perform only a portion of the desired analysis, or have limited licensing availability.

This package was designed to allow for expansion and integration of other algorithms, methods and workflows in addition to providing access to different data formats via exported up-stream analyses. It is also intended to be open for review, improvement and bug fixing.

Package overview

data manipulation

Reference vignette("importing") and vignette("subsetting")

  • import() - imports data from several sources into the tidyproteomics data object
  • subset() - subset a tidyproteomics data object by a given regex
  • reassign() - quickly reassign data to different sample sets
  • merge() - combines multiple imported data sets into a single object
  • export_quant() - exports a tidyproteomics data object to .csv, .tsv, .xlsx or .rds
  • export_analysis() - exports a tidyproteomics data object to .csv, .tsv, .xlsx or .rds

basic analysis

  • summary() - provides a quick accounting of the number of proteins observed
  • plot_counts() - provides a quick bar chart for the number of proteins observed
  • plot_quantrank() - provides a quick plot on quantitative expression for all proteins observed

normalization

Reference vignette("normalizing")

  • normalize() - normalize the raw data from a tidyproteomics data object
  • select_normalization() - use a weighted scheme to automatically pick the best normalization method, or manually set one for down-stream analysis

impute missing values

Reference vignette("imputing")

  • impute() - impute missing values from a tidyproteomics data object

data visualization

  • plot_normalization() - a boxplot of the raw and normalized values
  • plot_variation_cv() - a scatter plot of raw and normalized CV and dynamic range values
  • plot_variation_pca() - a scatter plot of raw and normalized PCA values
  • plot_dynamic_range() - a 2d density plot of raw and normalized CVs by log10 abundance
  • plot_venn() - a Venn accounting diagram of protein overlap between samples
  • plot_euler() - a Euler accounting diagram of protein overlap between samples
  • plot_pca() - a scatter plot of PCA values for the selected normalized data values
  • plot_heatmap() - a heatmap of protein by sample for the selected normalized data values, clustered in both dimensions

two-sample analysis

expression differences

  • expression() - calculates the two-sample statistical differences for each protein
  • plot_volcano() - a scatter plot of log2-foldchange by p-values for a given expression test
  • plot_proportion() - a scatter plot of log2-foldchange by proportional-expression for a given expression test
  • plot_compexp() - a scatter plot comparison of two expression tests to visualize the intersection / difference

term enrichment

  • enrichment() - term enrichment for a given expression test using Wilcoxon Rank Sum
  • plot_enrichment() - a bubble plot visualization of term enrichment for a given expression test

Workflows

A simple work flow for importing data and summarizing

library("tidyproteomics")
hela_proteins <- path_to_package_data("proteins") %>%
  import("ProteomeDiscoverer", "proteins") 

hela_proteins %>% summary()
hela_proteins %>% summary(by = 'sample') 
hela_proteins %>% summary(by = 'contamination')