
collapse() produces a protein-based tidyproteomics data-object from a peptide-based tidyproteomics data-object.


collapse(
  data = NULL,
  collapse_to = "protein",
  assign_by = c("all-possible", "razor-local", "razor-global", "non-homologous"),
  top_n = Inf,
  split_abundance = FALSE,
  fasta_path = NULL,
  .verbose = TRUE,
  .function = fsum
)



a tidyproteomics data-object


a character string representing the final aggregation point. Conventionally this is the protein name or id; however, if a gene_name or any other term exists in the annotations table of the data-object, peptides can be aggregated to that term instead.


the method by which to combine peptides into proteins. all-possible allows a peptide's quantitative value to be included in all assigned proteins. razor-local assigns each shared peptide to the protein of highest probability only within a sample class, so peptides from another sample group that would change the protein of highest probability are not accounted for in this scheme. (Razor peptides are peptides shared between proteins: a peptide that could belong to different proteins is assigned to the protein most likely to actually be present in the sample, so the shared peptide contributes only to the identification score of the protein group with the highest probability of being in the sample.) razor-global determines the protein of highest probability using all available peptides in the data set. non-homologous uses only the abundance values from peptides that have a single unique identity.
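The assignment modes described above can be illustrated on a toy peptide-to-protein map. This is a minimal base R sketch, not the package's implementation: the data frame `pep`, its column names, and the use of peptide count as a stand-in for "highest probability" are all assumptions made for illustration.

```r
# Toy peptide table (hypothetical): one row per peptide-protein match.
pep <- data.frame(
  peptide   = c("AAK", "CCR", "DDK", "DDK", "EEK", "EEK"),
  protein   = c("P1",  "P1",  "P1",  "P2",  "P2",  "P3"),
  abundance = c(10,    20,    30,    30,    5,     5)
)

# all-possible: every match contributes to every assigned protein.
all_possible <- tapply(pep$abundance, pep$protein, sum)

# non-homologous: keep only peptides that map to exactly one protein.
n_proteins <- table(pep$peptide)
is_unique  <- as.vector(n_proteins[pep$peptide]) == 1
unique_pep <- pep[is_unique, ]
non_homologous <- tapply(unique_pep$abundance, unique_pep$protein, sum)

# razor (simplified): give each shared peptide to the matching protein with
# the most peptides overall. Peptide count is an ASSUMED proxy for
# "highest probability"; ties go to the first protein in row order.
n_peptides <- table(pep$protein)
rows_by_peptide <- split(seq_len(nrow(pep)), pep$peptide)
razor_rows <- unname(unlist(lapply(rows_by_peptide, function(i) {
  i[which.max(as.vector(n_peptides[pep$protein[i]]))]
})))
razor_pep <- pep[razor_rows, ]
razor <- tapply(razor_pep$abundance, razor_pep$protein, sum)
```

In this sketch the ranking is computed once over the whole table, which loosely corresponds to the razor-global idea; a razor-local variant would repeat the same ranking separately within each sample group.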


a numeric indicating the number (N) of peptides whose summed abundances account for the protein quantitative value. This assumes that peptides have been summed across charge states.
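As a rough illustration of limiting the sum to N peptides, the base R sketch below ranks peptides by abundance and sums the first N. The helper `sum_top_n()` and the example values are hypothetical, and ranking by abundance is an assumption; the package may select peptides differently.

```r
# Hypothetical peptide abundances for one protein, already summed across
# charge states.
peptide_abundance <- c(AAK = 120, CCR = 45, DDK = 300, EEK = NA, FFR = 80)

# Sum the N most abundant peptides; top_n = Inf keeps every observed peptide.
sum_top_n <- function(x, top_n = Inf) {
  x <- sort(x, decreasing = TRUE)        # sort() drops NA values by default
  n <- min(top_n, length(x))             # never ask for more than exist
  sum(x[seq_len(n)])
}

sum_top_n(peptide_abundance, top_n = 3)  # 300 + 120 + 80
sum_top_n(peptide_abundance)             # all four observed values
```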


(experimental) a boolean indicating whether abundances for razor peptides should be split according to protein prevalence, that is, the proportion of total abundance among all proteins that share a particular peptide.
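The proportional split can be pictured with a small arithmetic sketch in base R. The protein totals and the shared peptide abundance are made-up values, and computing prevalence from each protein's total abundance is an assumption about how the proportion might be derived, not a description of the package internals.

```r
# Hypothetical total abundance of two proteins that share one peptide.
protein_abundance <- c(P1 = 300, P2 = 100)

# Hypothetical abundance of the shared (razor) peptide.
shared_abundance <- 80

# Split the shared abundance in proportion to each protein's share of the
# combined abundance of the proteins containing this peptide.
proportion  <- protein_abundance / sum(protein_abundance)
split_share <- shared_abundance * proportion

split_share  # P1 receives three quarters, P2 one quarter
```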


if supplied, the FASTA file at this path will be used to fill in annotation values such as description, protein_name and gene_name


a boolean


an assignable protein abundance summary function. fsum, fmean, fgeomean and fmedian have been constructed because NAs must be removed. The default is fsum(), defined as fsum <- function(x){base::sum(x, na.rm = TRUE)}, where x is the vector of peptide abundances assigned to that protein by the assign_by method. Note: peptides that have a 0 or NA quantitative value are still used to determine razor assignments, because the sequence was observed and only its quantitative value is missing.
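To make the shape of such a summary function concrete, the sketch below repeats the fsum definition quoted above and adds NA-removing analogues for the mean, median and geometric mean. The `_sketch` functions are assumptions written for illustration and are not copied from the package; in particular, dropping non-positive values before taking the geometric mean is a choice made here.

```r
# Default, as quoted in the description above.
fsum <- function(x) { base::sum(x, na.rm = TRUE) }

# Assumed analogues following the same NA-removing pattern.
fmean_sketch    <- function(x) { base::mean(x, na.rm = TRUE) }
fmedian_sketch  <- function(x) { stats::median(x, na.rm = TRUE) }
fgeomean_sketch <- function(x) {
  x <- x[!is.na(x) & x > 0]   # log() is undefined for values <= 0
  exp(base::mean(log(x)))
}

# x is the vector of peptide abundances assigned to one protein.
peptides <- c(2, 8, NA)
fsum(peptides)             # 10
fmean_sketch(peptides)     # 5
fmedian_sketch(peptides)   # 5
fgeomean_sketch(peptides)  # 4, up to floating-point rounding
```

Any function of this shape, taking a numeric vector and returning a single number with NAs removed, should be suitable to pass as the .function argument.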


a tidyproteomics data-object


library(dplyr, warn.conflicts = FALSE)
# data <- hela_peptides %>% collapse()
# data %>% summary("sample")