Evaluate different nmfLasso solutions by bootstrap for K (unknown) somatic mutational signatures, given a set of observations x. The estimation can become slow, because of memory usage and intensive computation, when a large number of bootstrap repetitions is requested or when the analysis covers a wide range of signatures (K). In that case, consider splitting the computation into several smaller runs.
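One way to split a wide range of K into smaller runs is sketched below. The helper split_ranks and the chunk size are hypothetical and not part of the package; how the per-chunk results are best combined afterwards is left open here.

```r
# Hypothetical helper (not part of the package): split a vector of ranks
# into consecutive chunks of at most chunk_size elements.
split_ranks <- function(K, chunk_size) {
  split(K, ceiling(seq_along(K) / chunk_size))
}

chunks <- split_ranks(3:10, chunk_size = 3)
# chunks is a list of three integer vectors: 3:5, 6:8 and 9:10

# Each chunk could then be passed as K in a separate call, for example:
# results <- lapply(chunks, function(k) {
#   nmfLassoBootstrap(x = patients, K = k,
#                     background_signature = background)
# })
```

Running the chunks in separate R sessions, rather than in one lapply call, may further limit peak memory usage.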

nmfLassoBootstrap(
  x,
  K = 3:10,
  starting_beta = NULL,
  background_signature = NULL,
  normalize_counts = TRUE,
  nmf_runs = 10,
  bootstrap_repetitions = 50,
  iterations = 30,
  max_iterations_lasso = 10000,
  num_processes = Inf,
  seed = NULL,
  verbose = TRUE,
  log_file = ""
)

Arguments

x

count matrix for a set of n patients and 96 trinucleotides.

K

a range of numeric values (each greater than 1) indicating the numbers of signatures to be discovered.

starting_beta

a list of starting beta values, one for each configuration of K. If it is NULL, the starting betas are estimated by NMF.

background_signature

background signature to be used. If not provided, a warning is thrown and an initial value for it is estimated by NMF. If starting_beta is not NULL, this parameter is ignored.

normalize_counts

if TRUE, the input count matrix x is normalized such that all patients have the same number of mutations.
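The kind of normalization described above can be illustrated with a minimal sketch. The rescaling constant target_total is an assumption chosen for illustration and is not taken from the package; the toy matrix has 4 columns rather than 96.

```r
# Toy count matrix: 3 patients (rows) by 4 categories (columns);
# the real input has 96 trinucleotide columns.
x <- matrix(c(10, 20, 30, 40,
               1,  1,  1,  1,
               5,  0, 15,  0),
            nrow = 3, byrow = TRUE)

target_total <- 1000  # assumed constant, not taken from the package

# Divide each row by its total, then rescale every row to the same total.
x_norm <- x / rowSums(x) * target_total

rowSums(x_norm)  # every patient now has the same total count
```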

nmf_runs

number of iterations (minimum 1) of NMF to be performed for a robust estimation of the starting beta. If starting_beta is not NULL, this parameter is ignored.

bootstrap_repetitions

Number of times the bootstrap should be repeated. Higher values result in better estimates, but are computationally more expensive.

iterations

Number of iterations to be performed. Each iteration corresponds to a first step where beta is fitted and a second step where alpha is fitted.

max_iterations_lasso

Maximum number of iterations to be performed during the sparsification via Lasso.

num_processes

Number of processes to be used during parallel execution. To execute in single process mode, this parameter needs to be set to either NA or NULL.

seed

Seed for reproducibility.

verbose

boolean; whether to print all messages.

log_file

log file to which output is written during parallel execution. If parallel execution is disabled, this parameter is ignored.

Value

A list of 3 elements: stability, RSS and evar. Here, stability reports the estimated cosine similarity for alpha and beta at each bootstrap repetition; RSS reports the estimated residual sum of squares for each configuration; finally, evar reports the explained variance.
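As a rough illustration of the three quantities named above, the sketch below computes them on a toy factorization in base R. It assumes that x is approximated by alpha %*% beta and that explained variance is defined as 1 - RSS / sum(x^2); both are assumptions for illustration, and the package's internal definitions may differ. The function cosine_similarity is a hypothetical helper.

```r
# Toy factorization: x is approximated by alpha %*% beta (assumed layout).
set.seed(1)
alpha <- matrix(runif(6), nrow = 3)   # 3 patients x 2 signatures
beta  <- matrix(runif(8), nrow = 2)   # 2 signatures x 4 categories
x     <- alpha %*% beta + matrix(rnorm(12, sd = 0.01), nrow = 3)

# Residual sum of squares of the reconstruction.
rss <- sum((x - alpha %*% beta)^2)

# Explained variance, here taken as 1 - RSS / sum(x^2) (an assumption).
evar <- 1 - rss / sum(x^2)

# Hypothetical helper: cosine similarity between two numeric vectors.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Compare the two toy signatures with each other.
cosine_similarity(beta[1, ], beta[2, ])
```

In a bootstrap setting, a cosine similarity close to 1 between the signatures fitted on resampled data and those fitted on the original data would indicate a stable solution.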

Examples

data(background)
data(patients)
res = nmfLassoBootstrap(x=patients[1:100,],
     K=3:5,
     background_signature=background,
     nmf_runs=1,
     bootstrap_repetitions=2,
     num_processes=NA,
     seed=12345)
#> Performing a total of 2 bootstrap repetitions to assess nmfLasso solutions for different K ranks... 
#> Computing the initial values of beta by standard NMF for K equals to 3... 
#> Computing the initial values of beta by standard NMF for K equals to 4... 
#> Computing the initial values of beta by standard NMF for K equals to 5... 
#> Starting bootstrap for a total of 2 repetitions... 
#> Performing repetition 1 out of 2... 
#> Progress 50%... 
#> Performing repetition 2 out of 2... 
#> Progress 100%...