Import and Summarize • tidyproteomics

The following is a simple workflow for importing and summarizing a data set. The tidyproteomics data object has a defined print() function that summarizes the data contents, while the summary() function provides a statistical summary of the quantitative data, with values as described in Table 1.

Table 1 - summary table description

Column	Accounting
first	group (e.g. sample name)
files	integer number present in group
proteins	integer number present in group
protein_groups	integer number present in group
peptides	integer number present in group
peptides_unique	integer number present in group
quantifiable	fraction of non-zero values (between 0 and 1)
CVs	coefficient of variation of quantitative abundance (sd/mean)
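
As a rough illustration of the last two rows of Table 1, the sketch below computes both values on a small made-up abundance table using base R only. It is not the package's implementation: the data frame `toy` and its column names are hypothetical, and it assumes that missing measurements are recorded as zero and that CVs are taken per protein across replicates and then summarized by the median.

```r
# Hypothetical toy data: 3 proteins x 3 replicates, zero = not quantified
toy <- data.frame(
  protein   = rep(c("P1", "P2", "P3"), each = 3),
  replicate = rep(1:3, times = 3),
  abundance = c(100, 110, 90,   # P1
                50,  0,   70,   # P2, one missing value
                10,  12,  14)   # P3
)

# quantifiable: fraction of non-zero abundance values
quantifiable <- mean(toy$abundance > 0)

# CV per protein: sd / mean over the non-zero replicate values
cv_per_protein <- tapply(
  toy$abundance, toy$protein,
  function(x) {
    x <- x[x > 0]
    sd(x) / mean(x)
  }
)

# Assumption: a single CV is reported as the median across proteins
cv_summary <- median(cv_per_protein)

round(quantifiable, 3)
round(cv_per_protein, 3)
round(cv_summary, 3)
```

The per-protein step matters because a CV computed over all values at once would mostly reflect differences between proteins rather than variability between replicates.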

library("dplyr")
library("tidyproteomics")

Importing Data

# path_to_package_data() locates example data bundled with this package;
# for your own project, load local data instead
# example: 
# your_proteins <- "./data/your_exported_results.xlsx" %>%
#   import("ProteomeDiscoverer", "proteins")

hela_proteins <- path_to_package_data("p97KD_HCT116") %>%
  import("ProteomeDiscoverer", "proteins") %>%
  # change the sample labels
  reassign('sample', 'ctl', 'control') %>%
  reassign('sample', 'p97', 'knockdown')
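
For readers unfamiliar with the relabeling step, the following base R sketch shows the general idea on a plain data frame. It is only an analogy: `relabel()` is a hypothetical helper, not part of tidyproteomics, the `experiments` table is invented, and exact matching of the old label is an assumption that may differ from how reassign() matches values.

```r
# Hypothetical experiment table with the short labels as imported
experiments <- data.frame(
  sample_file = c("F1", "F2", "F3", "F4", "F5", "F6"),
  sample      = c("ctl", "p97", "p97", "ctl", "ctl", "p97"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace one label in one column (exact match assumed)
relabel <- function(data, column, from, to) {
  data[[column]][data[[column]] == from] <- to
  data
}

experiments <- relabel(experiments, "sample", "ctl", "control")
experiments <- relabel(experiments, "sample", "p97", "knockdown")

# Count files per relabeled sample
table(experiments$sample)
```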

Print Data Contents

Printing the imported data object, or simply evaluating it at the console, shows a summary of its contents.

hela_proteins
#> 
#> ── Quantitative Proteomics Data Object ──
#> 
#> Origin          ProteomeDiscoverer 
#>                 proteins (10.67 MB) 
#> Composition     6 files 
#>                 2 samples (control, knockdown) 
#> Quantitation    7055 proteins 
#>                 4 log10 dynamic range 
#>                 28.8% missing values 
#>  *imputed        
#> Accounting      (4) num_peptides num_psms num_unique_peptides imputed 
#> Annotations     (9) description biological_process cellular_component molecular_function
#>                 gene_id_entrez gene_name wiki_pathway reactome_pathway
#>                 gene_id_ensemble 
#>

As more operations are performed on the data, the printout summarizes more of its contents, here adding an Analyses entry.

hela_proteins %>%
  expression(knockdown/control) %>%
  enrichment(knockdown/control, .terms = 'biological_process') %>%
  enrichment(knockdown/control, .terms = 'molecular_function')
#> ℹ .. expression::t_test testing knockdown / control
#> ✔ .. expression::t_test testing knockdown / control [3.5s]
#> 
#> ℹ .. enrichment::gsea testing knockdown / control by term biological_process
#> ✔ .. enrichment::gsea testing knockdown / control by term biological_process [1…
#> 
#> ℹ .. enrichment::gsea testing knockdown / control by term molecular_function
#> ✔ .. enrichment::gsea testing knockdown / control by term molecular_function [5…
#> 
#> ── Quantitative Proteomics Data Object ──
#> 
#> Origin          ProteomeDiscoverer 
#>                 proteins (11.41 MB) 
#> Composition     6 files 
#>                 2 samples (control, knockdown) 
#> Quantitation    7055 proteins 
#>                 4 log10 dynamic range 
#>                 28.8% missing values 
#>  *imputed        
#> Accounting      (4) num_peptides num_psms num_unique_peptides imputed 
#> Annotations     (9) description biological_process cellular_component molecular_function
#>                 gene_id_entrez gene_name wiki_pathway reactome_pathway
#>                 gene_id_ensemble 
#> Analyses        (1) 
#>                 knockdown/control -> expression & enrichment (biological_process, molecular_function) 
#>

Summarize Quantitative Data

Use the explicit summary() function to summarize the data, in this case globally.

hela_proteins %>% summary()
#> ── Summary: global ──
#> 
#>  proteins peptides peptides_unique quantifiable  CVs
#>      7055    66329           58706        0.908 0.25
#>

Here is a summary grouped by sample name.

hela_proteins %>% summary(by = 'sample') 
#> 
#> ── Summary: sample ──
#> 
#>     sample proteins peptides peptides_unique quantifiable  CVs
#>    control     7055    66329           58706        0.908 0.16
#>  knockdown     7055    66329           58706        0.909 0.21
#>

A summary that includes contamination, where contaminants are proteins whose description contains ‘CRAP’, as in the crap-ome.

hela_proteins %>% summary(contamination = 'CRAP') 
#> 
#> ── Summary: contamination ──
#> 
#>     sample replicate native   BSA Keratin    Other Trypsin sample_id
#>    control         1  92.7% 3.66%   3.56%  0.0023%    0.1%  9e6ed3ba
#>    control         2    92% 4.02%   3.89% 0.00205%  0.123%  cc56fc1d
#>    control         3    92% 4.01%    3.9% 0.00208%  0.113%  6a21f7a9
#>  knockdown         1    92% 4.01%   3.88% 0.00183%  0.125%  966be57f
#>  knockdown         2  92.7% 3.66%   3.59%  0.0023% 0.0648%  79a98e41
#>  knockdown         3  92.2% 3.89%   3.82% 0.00232% 0.0679%  9f804505
#>                 import_file sample_file
#>  p97KD_HCT116_proteins.xlsx          F1
#>  p97KD_HCT116_proteins.xlsx          F4
#>  p97KD_HCT116_proteins.xlsx          F5
#>  p97KD_HCT116_proteins.xlsx          F2
#>  p97KD_HCT116_proteins.xlsx          F3
#>  p97KD_HCT116_proteins.xlsx          F6
#>
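
To make the percentages above more concrete, here is a minimal base R sketch of one way such a breakdown could be computed: flag proteins whose description matches a pattern, then express their summed abundance as a share of the total. The data and column names are invented for illustration, and using abundance shares (rather than protein counts) is an assumption; the package's actual calculation may differ.

```r
# Hypothetical protein table for a single sample
proteins <- data.frame(
  description = c("Serum albumin CRAP", "Keratin type I CRAP",
                  "Transitional ER ATPase", "60S ribosomal protein L3"),
  abundance   = c(4, 4, 80, 12),
  stringsAsFactors = FALSE
)

# Flag entries whose description contains the literal pattern
is_contaminant <- grepl("CRAP", proteins$description, fixed = TRUE)

# Share of total abundance from contaminants vs. everything else ("native")
total <- sum(proteins$abundance)
shares <- c(
  native      = sum(proteins$abundance[!is_contaminant]) / total,
  contaminant = sum(proteins$abundance[is_contaminant]) / total
)

# Format the shares as percentages
sprintf("%s: %.1f%%", names(shares), 100 * shares)
```

Swapping the pattern for another term, such as "ribosom", would partition the same table along a different description keyword.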

A summary of contamination where we instead specify that the description contains ‘ribosome’.

hela_proteins %>% summary(contamination = "ribosome") 
#> 
#> ── Summary: contamination ──
#> 
#>     sample replicate native ribosome sample_id                import_file
#>    control         1  99.8%   0.155%  9e6ed3ba p97KD_HCT116_proteins.xlsx
#>    control         2  99.8%    0.15%  cc56fc1d p97KD_HCT116_proteins.xlsx
#>    control         3  99.8%   0.156%  6a21f7a9 p97KD_HCT116_proteins.xlsx
#>  knockdown         1  99.8%   0.171%  966be57f p97KD_HCT116_proteins.xlsx
#>  knockdown         2  99.8%   0.166%  79a98e41 p97KD_HCT116_proteins.xlsx
#>  knockdown         3  99.8%   0.164%  9f804505 p97KD_HCT116_proteins.xlsx
#>  sample_file
#>           F1
#>           F4
#>           F5
#>           F2
#>           F3
#>           F6
#>

A summary based on a term set in the provided annotations

hela_proteins %>% summary('biological_process')
#> 
#> ── Summary: biological_process ──
#> 
#>                biological_process proteins peptides peptides_unique
#>                cell communication        9      100              93
#>                        cell death        1        3               1
#>              cell differentiation        3        9               9
#>                       cell growth      104     1419             839
#>  cell organization and biogenesis       17      241             241
#>                cell proliferation     7055    66329           58706
#>       cellular component movement        6       13              11
#>              cellular homeostasis      324     2854            2631
#>                       coagulation        9       68              58
#>                       conjugation      181     1460            1240
#>                  defense response       15       83              76
#>                       development       38      180             164
#>                 metabolic process      342     2804            2422
#>  quantifiable   CVs
#>         0.920 0.200
#>         1.000 0.340
#>         0.389 0.305
#>         0.803 0.280
#>         0.967 0.210
#>         0.908 0.250
#>         0.679 0.410
#>         0.938 0.260
#>         0.885 0.220
#>         0.893 0.260
#>         0.709 0.275
#>         0.730 0.245
#>         0.886 0.250
#>