Main method for imputing missing values

Usage

impute(
  data = NULL,
  .function = base::min,
  method = c("row", "column", "matrix"),
  group_by_sample = FALSE,
  cores = 2
)

Arguments

data

a tidyproteomics list data-object

.function

summary statistic function. Default is base::min; other examples include max, mean, and sum. Note that NAs are removed in the function call.
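The idea of filling missing values with a summary statistic can be sketched in base R. This is only an illustration, assuming that NAs are dropped with stats::na.omit before the function is called; the helper name fill_with is hypothetical and is not part of tidyproteomics.

```r
# Hypothetical helper for illustration only; not the package's implementation.
fill_with <- function(x, .f = base::min) {
  observed <- as.vector(stats::na.omit(x))  # drop NAs before summarizing
  x[is.na(x)] <- .f(observed)               # replace each NA with the summary value
  x
}

abundance <- c(10, NA, 30, 20, NA)

filled_min    <- fill_with(abundance)                 # NAs replaced by the minimum
filled_median <- fill_with(abundance, stats::median)  # NAs replaced by the median
print(filled_min)
print(filled_median)
```

Choosing min gives a conservative, low-abundance fill, while median or mean pulls imputed values toward the center of the observed distribution.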

method

a character string indicating the imputation method ("row", "column", or "matrix"). Consider a data matrix with peptides/proteins as rows and datasets as columns. The 'row' method imputes between samples, using the values observed for a given peptide/protein, while the 'column' method imputes within a dataset, using the values observed in that dataset. The 'randomforest' function imputes using data from all rows and columns (the "matrix") without bias toward sample groups. If the imputation is biased toward sample groups, expression differences will be biased toward those groups as well. If the sample groups should be biased (for example, in a gene deletion), it is suggested to impute using the min function and the 'within' method.
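The difference between the 'row' and 'column' methods can be sketched on a toy matrix in base R. This is a minimal illustration of the two directions of imputation, not the package's implementation; the matrix m and the helper fill_na are hypothetical.

```r
# Toy matrix: rows are proteins, columns are datasets.
m <- matrix(
  c(10, NA, 12,
    NA, 50, 55,
     7,  8, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("P1", "P2", "P3"), c("s1", "s2", "s3"))
)

# Hypothetical helper: fill NAs in a vector with a summary of its observed values.
fill_na <- function(x, .f = base::min) {
  x[is.na(x)] <- .f(as.vector(stats::na.omit(x)))
  x
}

# 'row': each protein is filled from its own values across datasets.
by_row <- t(apply(m, 1, fill_na))

# 'column': each dataset is filled from its own values across proteins.
by_col <- apply(m, 2, fill_na)

print(by_row)
print(by_col)
```

In this sketch the missing value for P2 in s1 becomes 50 by row (the lowest value seen for P2) but 7 by column (the lowest value seen in s1), which shows how strongly the choice of direction can change the result.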

group_by_sample

a boolean indicating whether the data should be grouped by sample name, which biases the imputation to within that sample.
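The effect of grouping by sample can be sketched in base R with stats::ave. This is an illustration under assumed toy data, not the package's implementation; the vectors values and sample_name and the helper fill_na are hypothetical.

```r
# Toy data: one protein measured in three replicates of two samples.
values      <- c(100, NA, 110, 5, 6, NA)
sample_name <- c("control", "control", "control",
                 "knockdown", "knockdown", "knockdown")

# Hypothetical helper: fill NAs in a vector with a summary of its observed values.
fill_na <- function(x, .f = base::min) {
  x[is.na(x)] <- .f(as.vector(stats::na.omit(x)))
  x
}

# Ungrouped: every NA receives the minimum over all samples.
pooled <- fill_na(values)

# Grouped: each NA receives the minimum of its own sample only.
grouped <- ave(values, sample_name, FUN = fill_na)

print(pooled)
print(grouped)
```

Without grouping, the missing control replicate is filled with a knockdown-level value, which shrinks the apparent difference between the samples. With grouping, it is filled from the control replicates, which preserves that difference.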

cores

the number of threads used to speed up the calculation

Value

a tidyproteomics list data-object

Examples

library(dplyr, warn.conflicts = FALSE)
library(tidyproteomics)
hela_proteins %>% summary("sample")
#> ── Summary: sample ──
#> 
#>     sample proteins peptides peptides_unique quantifiable  CVs
#>    control     7055    66329           58706        0.908 0.16
#>  knockdown     7055    66329           58706        0.909 0.21
#> 

hela_proteins %>% impute(.function = stats::median) %>% summary("sample")
#>  Imputing by row using the function base::quote function (x, na.rm = FALSE, ..
#>  Imputing by row using the function base::quote function (x, na.rm = FALSE, ..
#> 
#>  ... 1919 values imputed
#> 
#> ── Summary: sample ──
#> 
#>     sample proteins peptides peptides_unique quantifiable  CVs
#>    control     7055    66329           58706        0.931 0.16
#>  knockdown     7055    66329           58706        0.931 0.20
#> 

# impute.randomforest works on the whole data matrix (see the method argument),
# so the default method = "row" fails with "imput data must be a matrix".
# Passing method = "matrix" should avoid that error:
hela_proteins %>% impute(.function = impute.randomforest, method = "matrix") %>% summary("sample")