Import and Summarize
workflow-simple.Rmd
The following is a simple workflow for importing and summarizing a data set. The tidyproteomics data object has a defined print() function that summarizes the data contents, while the summary() function provides a statistical summary of the quantitative data, with values as described in Table 1.
Table 1 - summary table description
Column | Accounting |
---|---|
first | group (e.g. sample name) |
files | integer number present in group |
proteins | integer number present in group |
protein_groups | integer number present in group |
peptides | integer number present in group |
peptides_unique | integer number present in group |
quantifiable | ratio of non-zero values to all values |
CVs | coefficient of variation of the quantitative abundance (sd/mean) |
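To make the last two rows of Table 1 concrete, here is a minimal base R sketch of a non-zero ratio and an sd/mean coefficient of variation on an invented abundance matrix. The matrix, the variable names, and the choice to reduce per-protein CVs with a median are assumptions for illustration only; the package's own summary() may compute these values differently.

```r
# Toy abundance matrix: rows are proteins, columns are replicate files.
# All values are invented; 0 marks a missing measurement.
abundance <- matrix(
  c(100, 110,  90,
     50,   0,  55,
      0,   0,  20,
    200, 210, 190),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("P", 1:4), c("F1", "F2", "F3"))
)

# quantifiable: ratio of non-zero values to all values
quantifiable <- mean(abundance > 0)

# CV per protein: sd / mean over the non-zero replicates
# (undefined when fewer than two values remain)
cv_per_protein <- apply(abundance, 1, function(x) {
  x <- x[x > 0]
  if (length(x) < 2) return(NA_real_)
  sd(x) / mean(x)
})

# Assumption: collapse to a single number with the median across proteins
cv_summary <- median(cv_per_protein, na.rm = TRUE)

print(quantifiable)
print(round(cv_per_protein, 3))
print(round(cv_summary, 3))
```

A protein with a single observed value has no defined CV here, which is one reason a summary CV can rest on fewer proteins than the protein count suggests.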
Importing Data
# path_to_package_data() loads data specific to this package
# for your project load local data
# example:
# your_proteins <- "./data/your_exported_results.xlsx" %>%
# import("ProteomeDiscoverer", "proteins")
hela_proteins <- path_to_package_data("p97KD_HCT116") %>%
  import("ProteomeDiscoverer", "proteins") %>%
  # change the sample labels
  reassign('sample', 'ctl', 'control') %>%
  reassign('sample', 'p97', 'knockdown')
Print Data Contents
Printing the imported data object, or simply evaluating it at the console, shows a summary of its contents:
hela_proteins
#>
#> ── Quantitative Proteomics Data Object ──
#>
#> Origin ProteomeDiscoverer
#> proteins (10.67 MB)
#> Composition 6 files
#> 2 samples (control, knockdown)
#> Quantitation 7055 proteins
#> 4 log10 dynamic range
#> 28.8% missing values
#> *imputed
#> Accounting (4) num_peptides num_psms num_unique_peptides imputed
#> Annotations (9) description biological_process cellular_component molecular_function
#> gene_id_entrez gene_name wiki_pathway reactome_pathway
#> gene_id_ensemble
#>
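The "log10 dynamic range" and "missing values" lines in the printout can be read as simple statistics over the abundance values. The sketch below shows one plausible way to compute such numbers in base R on an invented vector; it is an assumption about what the figures mean, not the package's implementation.

```r
# Invented abundances for illustration; NA marks a missing value.
abundance <- c(1.2e4, 3.5e6, NA, 8.9e7, 4.0e5, NA, 2.1e8, 9.9e4)

# Keep only the observed values for the range calculation
observed <- abundance[!is.na(abundance)]

# Dynamic range in orders of magnitude: log10 of largest over smallest value
dynamic_range <- log10(max(observed) / min(observed))

# Percentage of values that are missing
missing_pct <- 100 * mean(is.na(abundance))

print(round(dynamic_range, 2))
print(missing_pct)
```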
As more operations are performed on the data, the printed summary grows to include them, here adding an Analyses entry:
hela_proteins %>%
  expression(knockdown/control) %>%
  enrichment(knockdown/control, .terms = 'biological_process') %>%
  enrichment(knockdown/control, .terms = 'molecular_function')
#> ℹ .. expression::t_test testing knockdown / control
#> ✔ .. expression::t_test testing knockdown / control [3.3s]
#>
#> ℹ .. enrichment::gsea testing knockdown / control by term biological_process
#> ✔ .. enrichment::gsea testing knockdown / control by term biological_process [1…
#>
#> ℹ .. enrichment::gsea testing knockdown / control by term molecular_function
#> ✔ .. enrichment::gsea testing knockdown / control by term molecular_function [5…
#>
#> ── Quantitative Proteomics Data Object ──
#>
#> Origin ProteomeDiscoverer
#> proteins (11.41 MB)
#> Composition 6 files
#> 2 samples (control, knockdown)
#> Quantitation 7055 proteins
#> 4 log10 dynamic range
#> 28.8% missing values
#> *imputed
#> Accounting (4) num_peptides num_psms num_unique_peptides imputed
#> Annotations (9) description biological_process cellular_component molecular_function
#> gene_id_entrez gene_name wiki_pathway reactome_pathway
#> gene_id_ensemble
#> Analyses (1)
#> knockdown/control -> expression & enrichment (biological_process, molecular_function)
#>
Summarize Quantitative Data
Use the explicit summary() function to summarize the data, in this case globally.
hela_proteins %>% summary()
#> ── Summary: global ──
#>
#> proteins peptides peptides_unique quantifiable CVs
#> 7055 66329 58706 0.908 0.25
#>
Here is a summary grouped by sample name:
hela_proteins %>% summary(by = 'sample')
#>
#> ── Summary: sample ──
#>
#> sample proteins peptides peptides_unique quantifiable CVs
#> control 7055 66329 58706 0.908 0.16
#> knockdown 7055 66329 58706 0.909 0.21
#>
A summary of contamination, where contaminants are the proteins whose description contains ‘CRAP’, as in the CRAPome:
hela_proteins %>% summary(contamination = 'CRAP')
#>
#> ── Summary: contamination ──
#>
#> sample replicate native BSA Keratin Other Trypsin sample_id
#> control 1 92.7% 3.66% 3.56% 0.0023% 0.1% 9e6ed3ba
#> control 2 92% 4.02% 3.89% 0.00205% 0.123% cc56fc1d
#> control 3 92% 4.01% 3.9% 0.00208% 0.113% 6a21f7a9
#> knockdown 1 92% 4.01% 3.88% 0.00183% 0.125% 966be57f
#> knockdown 2 92.7% 3.66% 3.59% 0.0023% 0.0648% 79a98e41
#> knockdown 3 92.2% 3.89% 3.82% 0.00232% 0.0679% 9f804505
#> import_file sample_file
#> p97KD_HCT116_proteins.xlsx F1
#> p97KD_HCT116_proteins.xlsx F4
#> p97KD_HCT116_proteins.xlsx F5
#> p97KD_HCT116_proteins.xlsx F2
#> p97KD_HCT116_proteins.xlsx F3
#> p97KD_HCT116_proteins.xlsx F6
#>
A contamination summary where we instead flag the proteins whose description contains ‘ribosome’:
hela_proteins %>% summary(contamination = "ribosome")
#>
#> ── Summary: contamination ──
#>
#> sample replicate native ribosome sample_id import_file
#> control 1 99.8% 0.155% 9e6ed3ba p97KD_HCT116_proteins.xlsx
#> control 2 99.8% 0.15% cc56fc1d p97KD_HCT116_proteins.xlsx
#> control 3 99.8% 0.156% 6a21f7a9 p97KD_HCT116_proteins.xlsx
#> knockdown 1 99.8% 0.171% 966be57f p97KD_HCT116_proteins.xlsx
#> knockdown 2 99.8% 0.166% 79a98e41 p97KD_HCT116_proteins.xlsx
#> knockdown 3 99.8% 0.164% 9f804505 p97KD_HCT116_proteins.xlsx
#> sample_file
#> F1
#> F4
#> F5
#> F2
#> F3
#> F6
#>
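The contamination summaries above report what share of the total abundance belongs to proteins whose description matches a pattern. A minimal base R sketch of that idea follows, assuming a simple two-way split into matching and "native" proteins. The data frame, its column names, and the abundance values are hypothetical, and the package additionally breaks ‘CRAP’ matches into classes such as BSA and Keratin, which this sketch does not attempt.

```r
# Hypothetical protein table; descriptions and abundances are invented.
proteins <- data.frame(
  description = c("Serum albumin CRAP", "Keratin type II CRAP",
                  "60S ribosomal protein L3", "Actin, cytoplasmic 1",
                  "Trypsin CRAP"),
  abundance = c(40, 35, 15, 900, 10),
  stringsAsFactors = FALSE
)

pattern <- "CRAP"

# Flag proteins whose description contains the pattern (literal match)
is_match <- grepl(pattern, proteins$description, fixed = TRUE)

# Sum abundance per class and express it as a percentage of the total
class_label <- ifelse(is_match, pattern, "native")
pct <- 100 * tapply(proteins$abundance, class_label, sum) / sum(proteins$abundance)

print(pct)
```

Because the match is made on the free-text description, the choice of pattern decides what counts as a contaminant, so it is worth checking which descriptions a pattern actually captures before interpreting the percentages.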
A summary grouped by a term set from the provided annotations:
hela_proteins %>% summary('biological_process')
#>
#> ── Summary: biological_process ──
#>
#> biological_process proteins peptides peptides_unique
#> cell communication 9 100 93
#> cell death 1 3 1
#> cell differentiation 3 9 9
#> cell growth 104 1419 839
#> cell organization and biogenesis 17 241 241
#> cell proliferation 7055 66329 58706
#> cellular component movement 6 13 11
#> cellular homeostasis 324 2854 2631
#> coagulation 9 68 58
#> conjugation 181 1460 1240
#> defense response 15 83 76
#> development 38 180 164
#> metabolic process 342 2804 2422
#> quantifiable CVs
#> 0.920 0.200
#> 1.000 0.340
#> 0.389 0.305
#> 0.803 0.280
#> 0.967 0.210
#> 0.908 0.250
#> 0.679 0.410
#> 0.938 0.260
#> 0.885 0.220
#> 0.893 0.260
#> 0.709 0.275
#> 0.730 0.245
#> 0.886 0.250
#>
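Grouping by an annotation term differs from grouping by sample in that one protein may carry several terms. The sketch below illustrates this with base R, assuming annotations are available as a long table of protein/term pairs. The table layout, names, and counts are hypothetical and are not taken from the tidyproteomics data object.

```r
# Hypothetical long-format annotations: one row per protein/term pair.
annotations <- data.frame(
  protein = c("P1", "P1", "P2", "P3", "P3", "P4"),
  term = c("cell growth", "metabolic process", "metabolic process",
           "cell growth", "cell death", "metabolic process"),
  stringsAsFactors = FALSE
)

# Invented peptide counts per protein
peptides <- c(P1 = 12, P2 = 3, P3 = 7, P4 = 20)
annotations$peptides <- unname(peptides[annotations$protein])

# Count distinct proteins and total peptides within each term
proteins_per_term <- tapply(annotations$protein, annotations$term,
                            function(p) length(unique(p)))
peptides_per_term <- tapply(annotations$peptides, annotations$term, sum)

by_term <- data.frame(
  term = names(proteins_per_term),
  proteins = as.integer(proteins_per_term),
  peptides = as.numeric(peptides_per_term)
)

print(by_term)
```

In this toy data P1 and P3 each appear under two terms, so the per-term protein counts add up to more than the four distinct proteins.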