Estimate the range of lambda values for beta to be considered in the signature inference. Note that values of lambda that are too small result in dense signatures, while values that are too large lead to a poor fit of the counts.

lambdaRangeBetaEvaluation(
  x,
  K = 5,
  beta = NULL,
  background_signature = NULL,
  normalize_counts = TRUE,
  nmf_runs = 10,
  lambda_values = c(0.01, 0.05, 0.1, 0.2),
  iterations = 30,
  max_iterations_lasso = 10000,
  num_processes = Inf,
  seed = NULL,
  verbose = TRUE,
  log_file = ""
)

Arguments

x

count matrix for a set of n patients and 96 trinucleotides.

K

numeric value (minimum 2) indicating the number of signatures to be discovered.

beta

starting beta for the estimation. If it is NULL, starting beta is estimated by NMF.

background_signature

background signature to be used. If not provided, a warning is thrown and an initial value for it is estimated by NMF. If beta is not NULL, this parameter is ignored.

normalize_counts

if true, the input count matrix x is normalized such that all patients have the same number of mutations.

nmf_runs

number of iterations (minimum 1) of NMF to be performed for a robust estimation of starting beta. If beta is not NULL, this parameter is ignored.

lambda_values

values of the LASSO penalty to be tested for beta. Each value should be greater than 0 and at most 1, where 1 is the penalty that would shrink all the signatures to 0 within one step. The higher the value, the sparser the resulting signatures, but values that are too large may reduce the fit of the observed counts.

iterations

Number of iterations to be performed. Each iteration corresponds to a first step where beta is fitted and a second step where alpha is fitted.

max_iterations_lasso

Maximum number of iterations to be performed during the sparsification via Lasso.

num_processes

Number of processes to be used during parallel execution. To execute in single process mode, this parameter needs to be set to either NA or NULL.

seed

Seed for reproducibility.

verbose

boolean; should all messages be printed?

log_file

log file to which outputs are written during parallel execution. If parallel execution is disabled, this parameter is ignored.
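
As an illustration of what normalize_counts describes, the following base R sketch rescales a toy count matrix so that every patient (row) has the same total number of mutations. The helper name normalize_rows and the target total of 1000 are assumptions made for this example; the scaling constant actually used by the function is not stated on this page.

```r
# Toy count matrix: 3 patients (rows) x 4 mutation categories (columns).
# A real input would have 96 trinucleotide columns.
x <- matrix(c( 10,  0,  5,  5,
              100, 50, 25, 25,
                1,  1,  1,  1),
            nrow = 3, byrow = TRUE)

# Hypothetical helper: rescale each row so that it sums to `target`.
# Dividing a matrix by a vector of length nrow() recycles down the columns,
# so every entry is divided by the total of its own row.
normalize_rows <- function(counts, target = 1000) {
  (counts / rowSums(counts)) * target
}

x_norm <- normalize_rows(x)
rowSums(x_norm)
```

A patient with zero mutations would produce NaN values under this scheme, so such rows would need to be removed beforehand.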

Value

A list containing the results of the function nmfLasso for each tested value of lambda. This function makes it possible to identify a good range of lambda values for beta. Keep in mind that values that are too small generate dense solutions, while values that are too large lead to a poor fit. This behavior is reflected in the values of loglik_progression, which should be increasing: values of lambda that are too small result in an unstable log-likelihood across iterations, while values that are too large make the log-likelihood drop.
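
The check on loglik_progression described above can be sketched in base R as follows. The traces are mock numbers invented for illustration, and is_non_decreasing is a hypothetical helper, not part of the package. With a real result, each trace would be taken from the loglik_progression element of the corresponding nmfLasso output; how those outputs are indexed within the returned list is not covered here.

```r
# Mock log-likelihood traces, one per tested lambda (illustrative values only).
loglik_by_lambda <- list(
  "0.01" = c(-1200, -1150, -1170, -1140, -1160),  # unstable: goes up and down
  "0.05" = c(-1250, -1200, -1180, -1170, -1165),  # steadily increasing
  "0.20" = c(-1300, -1320, -1350, -1400, -1460)   # dropping
)

# Hypothetical helper: TRUE if the trace never decreases by more than `tol`
# between two consecutive iterations.
is_non_decreasing <- function(loglik, tol = 0) {
  all(diff(loglik) >= -tol)
}

# One logical flag per lambda, named after the lambda value.
stable <- vapply(loglik_by_lambda, is_non_decreasing, logical(1))

# Lambda values whose log-likelihood behaves as expected.
names(stable)[stable]
```

In this mock setting only the middle value passes the check, which mirrors the guidance above: the smallest lambda oscillates and the largest makes the log-likelihood fall.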

Examples

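# Load the example background signature and patient count matrix
data(background)
data(patients)
# Test two lambda values on the first 100 patients in single-process mode
res <- lambdaRangeBetaEvaluation(x = patients[1:100, ],
                                 K = 5,
                                 background_signature = background,
                                 nmf_runs = 1,
                                 lambda_values = c(0.01, 0.05),
                                 num_processes = NA,
                                 seed = 12345)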
#> Computing the initial values of beta by standard NMF... 
#> Performing estimation of lambda range for beta... 
#> Progress 50%... 
#> Progress 100%...