Assess different nmfLasso solutions by cross-validation for K (unknown) somatic mutational signatures, given a set of observations x. The estimation can slow down because of memory usage and intensive computation when a large number of cross-validation repetitions is requested and when the grid search covers many configurations. In that case, consider splitting the computation into multiple smaller sets.
nmfLassoCV(
  x,
  K = 3:10,
  starting_beta = NULL,
  background_signature = NULL,
  normalize_counts = TRUE,
  nmf_runs = 10,
  lambda_values_alpha = c(0, 0.01, 0.05, 0.1),
  lambda_values_beta = c(0, 0.01, 0.05, 0.1),
  cross_validation_entries = 0.01,
  cross_validation_iterations = 5,
  cross_validation_repetitions = 50,
  iterations = 30,
  max_iterations_lasso = 10000,
  num_processes = Inf,
  seed = NULL,
  verbose = TRUE,
  log_file = ""
)
x: Count matrix for a set of n patients and 96 trinucleotides.
K: A range of numeric values (each greater than 1) indicating the number of signatures to be discovered.
starting_beta: A list of starting beta values, one for each configuration of K. If it is NULL, the starting betas are estimated by NMF.
background_signature: Background signature to be used. If not provided, a warning is thrown and an initial value for it is estimated by NMF. If starting_beta is not NULL, this parameter is ignored.
normalize_counts: If TRUE, the input count matrix x is normalized such that all patients have the same number of mutations.
nmf_runs: Number of iterations (minimum 1) of NMF to be performed for a robust estimation of the starting beta. If starting_beta is not NULL, this parameter is ignored.
lambda_values_alpha: Values of the LASSO penalty to be used for alpha, between 0 and 1. Each value should be greater than 0. 1 is the value of LASSO that would shrink all the signatures to 0 within one step. The higher the values in lambda_values_alpha, the sparser the resulting exposures, but values that are too large may reduce the fit to the observed counts.
lambda_values_beta: Values of the LASSO penalty to be used for beta, between 0 and 1. Each value should be greater than 0. 1 is the value of LASSO that would shrink all the signatures to 0 within one step. The higher the values in lambda_values_beta, the sparser the resulting signatures, but values that are too large may reduce the fit to the observed counts.
cross_validation_entries: Proportion of cells in the count matrix to be replaced by 0s during cross-validation (the default, 0.01, corresponds to 1% of the cells).
cross_validation_iterations: For each configuration, the signatures are first discovered from a matrix in which a percentage of the values has been replaced by 0s, which may result in a poor fit. These entries are then predicted and replaced with the predicted values. This parameter is the number of restarts to be performed to improve this estimate and obtain more stable solutions.
cross_validation_repetitions: Number of times the cross-validation should be repeated. Higher values give better estimates but are computationally more expensive.
iterations: Number of iterations to be performed. Each iteration consists of a first step where beta is fitted and a second step where alpha is fitted.
max_iterations_lasso: Maximum number of iterations to be performed during the sparsification via LASSO.
num_processes: Number of processes to be used during parallel execution. To run in single-process mode, set this parameter to either NA or NULL.
seed: Seed for reproducibility.
verbose: Boolean. If TRUE, all messages are printed.
log_file: Log file where outputs are printed during parallel execution. If parallel execution is disabled, this parameter is ignored.
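The hold-out step described for cross_validation_entries can be illustrated with a minimal base R sketch. It is not the package's internal code: the toy matrix, the variable names and the column-mean "prediction" are assumptions made only for illustration, and the package would use its fitted exposures and signatures in place of that stand-in.

```r
# Toy count matrix: 20 hypothetical patients x 96 trinucleotide contexts.
set.seed(1)
x <- matrix(rpois(20 * 96, lambda = 5), nrow = 20, ncol = 96)

# Proportion of cells to hold out, mirroring cross_validation_entries = 0.01.
cross_validation_entries <- 0.01
n_held_out <- round(length(x) * cross_validation_entries)

# Pick the held-out cells at random and replace them with 0s.
held_out <- sample(seq_along(x), size = n_held_out, replace = FALSE)
x_cv <- x
x_cv[held_out] <- 0

# Stand-in for a fitted model (an assumption for this sketch): predict every
# cell with the column mean of the masked matrix.
predicted <- matrix(colMeans(x_cv), nrow = nrow(x), ncol = ncol(x), byrow = TRUE)

# Mean squared error restricted to the held-out cells.
mse <- mean((x[held_out] - predicted[held_out])^2)
```

Restricting the error to the held-out cells is what makes it a measure of prediction rather than of fit to data the model has already seen.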
A list of 2 elements: grid_search_mse and grid_search_loglik. Here, grid_search_mse reports the mean squared error for each configuration of the performed cross-validation; grid_search_loglik reports, for each configuration, the number of times the algorithm converged.
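One possible way to compare configurations by their cross-validation error is sketched below in base R. The exact layout of grid_search_mse is not described here, so the data frame cv_results is a hypothetical stand-in with made-up values, and the choice of the median as summary statistic is also an assumption.

```r
# Hypothetical stand-in for per-repetition cross-validation errors
# (NOT the real structure of grid_search_mse; values are made up).
cv_results <- expand.grid(K = 3:5, lambda_beta = c(0, 0.05), repetition = 1:2)
cv_results$mse <- c(4.1, 3.2, 3.6, 4.0, 2.9, 3.5,
                    4.3, 3.0, 3.8, 4.2, 3.1, 3.4)

# Summarize each configuration by the median error across repetitions.
cv_summary <- aggregate(mse ~ K + lambda_beta, data = cv_results, FUN = median)

# Keep the configuration with the lowest summarized error.
best <- cv_summary[which.min(cv_summary$mse), ]
```

With real results, the same summarize-then-minimize pattern applies once the errors have been rearranged into one row per configuration and repetition.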
data(background)
data(patients)
res = nmfLassoCV(x=patients[1:100,],
                 K=3:5,
                 background_signature=background,
                 nmf_runs=1,
                 lambda_values_alpha=c(0.00),
                 lambda_values_beta=c(0.00),
                 cross_validation_repetitions=2,
                 num_processes=NA,
                 seed=12345)
#> Performing a grid search to estimate the best values of K and lambda with a total of 2 cross validation repetitions...
#> Computing the initial values of beta by standard NMF for K equals to 3...
#> Computing the initial values of beta by standard NMF for K equals to 4...
#> Computing the initial values of beta by standard NMF for K equals to 5...
#> Starting cross validation with a total of 2 repetitions...
#> Performing repetition 1 out of 2...
#> Performing cross validation iteration 1 out of 5...
#> Performing cross validation iteration 2 out of 5...
#> Performing cross validation iteration 3 out of 5...
#> Performing cross validation iteration 4 out of 5...
#> Performing cross validation iteration 5 out of 5...
#> Progress 33.333%...
#> Performing cross validation iteration 1 out of 5...
#> Performing cross validation iteration 2 out of 5...
#> Performing cross validation iteration 3 out of 5...
#> Performing cross validation iteration 4 out of 5...
#> Performing cross validation iteration 5 out of 5...
#> Progress 66.667%...
#> Performing cross validation iteration 1 out of 5...
#> Performing cross validation iteration 2 out of 5...
#> Performing cross validation iteration 3 out of 5...
#> Performing cross validation iteration 4 out of 5...
#> Performing cross validation iteration 5 out of 5...
#> Progress 100%...
#> Performing repetition 2 out of 2...
#> Performing cross validation iteration 1 out of 5...
#> Performing cross validation iteration 2 out of 5...
#> Performing cross validation iteration 3 out of 5...
#> Performing cross validation iteration 4 out of 5...
#> Performing cross validation iteration 5 out of 5...
#> Progress 33.333%...
#> Performing cross validation iteration 1 out of 5...
#> Performing cross validation iteration 2 out of 5...
#> Performing cross validation iteration 3 out of 5...
#> Performing cross validation iteration 4 out of 5...
#> Performing cross validation iteration 5 out of 5...
#> Progress 66.667%...
#> Performing cross validation iteration 1 out of 5...
#> Performing cross validation iteration 2 out of 5...
#> Performing cross validation iteration 3 out of 5...
#> Performing cross validation iteration 4 out of 5...
#> Performing cross validation iteration 5 out of 5...
#> Progress 100%...
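The advice in the description about splitting a large computation into smaller sets could look like the following sketch, which partitions the values of K into chunks using base R only. The chunk size and the variable names are arbitrary assumptions. The call to nmfLassoCV is left commented out so that the block runs without the package, and how the per-chunk results should be combined is left open.

```r
# Full range of K values to explore (the default of nmfLassoCV).
K_all <- 3:10

# Split K into chunks of at most 2 values each (arbitrary chunk size).
chunk_size <- 2
K_chunks <- split(K_all, ceiling(seq_along(K_all) / chunk_size))

# Each chunk could then be passed to its own call, for example:
# res_list <- lapply(K_chunks, function(k) {
#   nmfLassoCV(x = patients[1:100, ],
#              K = k,
#              background_signature = background,
#              nmf_runs = 1,
#              lambda_values_alpha = c(0.00),
#              lambda_values_beta = c(0.00),
#              cross_validation_repetitions = 2,
#              num_processes = NA,
#              seed = 12345)
# })
```

The same idea applies to lambda_values_alpha and lambda_values_beta: each smaller call covers fewer configurations and therefore needs less memory at once.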