Assess different signaturesDecomposition solutions by cross-validation: given a set of observations x and the signatures beta discovered by signaturesDecomposition, estimate the cross-validation error for each number K of somatic mutational signatures.

signaturesCV(
  x,
  beta,
  normalize_counts = FALSE,
  cross_validation_entries = 0.01,
  cross_validation_iterations = 5,
  cross_validation_repetitions = 100,
  num_processes = Inf,
  verbose = TRUE
)

Arguments

x

Counts matrix for a set of n patients and m categories. These can be, e.g., SBS, MNV or CN counts; in the case of SBS it would be an n patients x 96 trinucleotides matrix.

beta

A set of inferred signatures, as returned by the signaturesDecomposition function.

normalize_counts

If TRUE, the input counts matrix x is normalized such that all patients have the same number of mutations.
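As a rough illustration of what such a normalization could look like, the base R sketch below rescales every row of a toy counts matrix to a common total. The helper normalize_rows and its target total (the mean of the observed row totals) are assumptions made for this example; the exact scaling used internally by signaturesCV is not documented here and may differ.

```r
# Toy counts matrix: 3 patients (rows) x 4 mutation categories (columns).
counts <- matrix(c(10,  0,  5,  5,
                   100, 50, 25, 25,
                   1,   1,  1,  1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("patient", 1:3), paste0("cat", 1:4)))

# Hypothetical helper (not part of the package): rescale each row so that
# all patients share the same total number of mutations.
# The target total is an assumption: the mean of the observed row totals.
normalize_rows <- function(x, target = mean(rowSums(x))) {
  totals <- rowSums(x)
  totals[totals == 0] <- 1  # avoid division by zero for empty rows
  # Dividing a matrix by a vector of length nrow(x) divides row i by totals[i].
  x / totals * target
}

normalized <- normalize_rows(counts)
rowSums(normalized)  # every patient now has the same total
```

After normalization, only the relative distribution of mutations across categories distinguishes one patient from another.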

cross_validation_entries

Percentage of cells in the counts matrix to be replaced by 0s during cross-validation, expressed as a fraction (the default 0.01 corresponds to 1% of the cells).
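The masking step can be pictured with a small base R sketch. It assumes that the value is interpreted as a fraction of all cells (so 0.01 means 1%) and that cells are chosen uniformly at random; the toy matrix is simulated, and the package's own sampling code may differ in detail.

```r
set.seed(1)
# Simulated toy data: 200 patients x 96 categories of Poisson counts.
counts <- matrix(rpois(200 * 96, lambda = 20), nrow = 200, ncol = 96)

cross_validation_entries <- 0.01  # same value as the default in the usage

# Pick a random subset of cells (by linear index) and set them to 0.
n_masked <- round(length(counts) * cross_validation_entries)
masked_cells <- sample(length(counts), size = n_masked)
masked_counts <- counts
masked_counts[masked_cells] <- 0

n_masked  # 192 of the 19200 cells are held out
```

The original values at masked_cells are what the cross-validation error is later measured against.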

cross_validation_iterations

For each configuration, the signatures are first fitted from a matrix in which a percentage of values has been replaced by 0s, which may result in a poor fit. The held-out entries are then predicted and replaced with the predicted values. This parameter is the number of restarts performed to improve this estimate and obtain more stable solutions.
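The fit, predict and replace loop described above can be sketched in base R. To keep the example self-contained, the signature fit is swapped for a rank-k truncated SVD reconstruction; this is a stand-in chosen for illustration, not the model fitted by signaturesCV, and the data and the helper low_rank_fit are hypothetical.

```r
set.seed(2)
# Toy non-negative data of exact rank 2: 30 patients x 12 categories.
alpha_true <- matrix(runif(30 * 2, min = 1, max = 10), nrow = 30)
beta_true  <- matrix(runif(2 * 12), nrow = 2)
counts <- alpha_true %*% beta_true

# Hold out 5% of the cells by replacing them with 0.
held_out <- sample(length(counts), size = round(0.05 * length(counts)))
working <- counts
working[held_out] <- 0

# Stand-in for the signature fit (assumption): rank-k SVD reconstruction.
low_rank_fit <- function(x, k) {
  s <- svd(x, nu = k, nv = k)
  s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
}

cross_validation_iterations <- 5
mse <- numeric(cross_validation_iterations)
for (i in seq_len(cross_validation_iterations)) {
  # Fit on the current working matrix and predict every cell.
  predicted <- low_rank_fit(working, k = 2)
  # Error is measured only on the held-out cells, against the true values.
  mse[i] <- mean((counts[held_out] - predicted[held_out])^2)
  # Replace the held-out cells with their predictions before the next restart.
  working[held_out] <- predicted[held_out]
}
mse
```

The first fit sees artificial zeros, so its predictions for the held-out cells are pulled toward 0; later restarts start from more plausible values, which is why the error on the held-out cells typically shrinks across iterations.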

cross_validation_repetitions

Number of times cross-validation should be repeated. Higher values result in better estimates, but are computationally more expensive.

num_processes

Number of processes to be used during parallel execution. To execute in single-process mode, set this parameter to either NA or NULL.

verbose

Boolean. Whether to print information messages.

Value

A list of 2 elements: estimates and summary. Here, estimates reports the mean squared error for each configuration of performed cross-validation; summary reports mean and median values for each value of K.
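To show how per-configuration errors relate to a per-K summary, the base R sketch below aggregates a toy matrix of mean squared errors by mean and median and picks the rank with the lowest median. The numbers are illustrative only, not real results, and the layout (repetitions in rows, values of K in columns) is an assumption about how the errors could be arranged rather than the documented structure of the returned object.

```r
# Illustrative MSE values only (not real results): rows are cross-validation
# repetitions, columns are candidate numbers of signatures K.
mse <- matrix(c(0.52, 0.31, 0.35,
                0.49, 0.28, 0.40,
                0.55, 0.30, 0.33),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("rep", 1:3), paste0("K=", 3:5)))

# Mean and median error for each value of K.
rank_summary <- rbind(mean   = colMeans(mse),
                      median = apply(mse, 2, median))

# Choose the K with the lowest median error.
best_K <- names(which.min(rank_summary["median", ]))
best_K  # "K=4"
```

Using the median rather than the mean makes the choice less sensitive to a single repetition with an unusually poor fit.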

Examples

data(background)
data(patients)
set.seed(12345)
sigs <- signaturesDecomposition(x = patients[seq_len(3),seq_len(2)],
                                K = 3:4,
                                background_signature = background[seq_len(2)],
                                nmf_runs = 2,
                                num_processes = 1)
#> Performing signatures discovery and rank estimation...
#> Performing inference for K=3...
#> Performing NMF run 1 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Performing NMF run 2 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Performing inference for K=4...
#> Performing NMF run 1 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Performing NMF run 2 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
set.seed(12345)
res <- signaturesCV(x = patients[seq_len(3),seq_len(2)],
                    beta = sigs$beta,
                    cross_validation_iterations = 2,
                    cross_validation_repetitions = 2,
                    num_processes = 1)
#> Estimating the optimal number of signatures with a total of 2 cross-validation repetitions...
#> Performing repetition 1 out of 2...
#> Performing estimation for K=3...
#> Performing cross-validation iteration 1 out of 2...
#> Performing cross-validation iteration 2 out of 2...
#> Progress 50%...
#> Performing estimation for K=4...
#> Performing cross-validation iteration 1 out of 2...
#> Performing cross-validation iteration 2 out of 2...
#> Progress 100%...
#> Performing repetition 2 out of 2...
#> Performing estimation for K=3...
#> Performing cross-validation iteration 1 out of 2...
#> Performing cross-validation iteration 2 out of 2...
#> Progress 50%...
#> Performing estimation for K=4...
#> Performing cross-validation iteration 1 out of 2...
#> Performing cross-validation iteration 2 out of 2...
#> Progress 100%...