Perform signature discovery and rank estimation for a range of K somatic mutational signatures, given a set of observed counts x. This function can be used to estimate different types of mutational signatures, such as SBS (single base substitutions) and MNV (multi-nucleotide variants) (see Degasperi, Andrea, et al. 'Substitution mutational signatures in whole-genome–sequenced cancers in the UK population.' Science 376.6591 (2022): abl9283), CX (chromosomal instability) (see Drews, Ruben M., et al. 'A pan-cancer compendium of chromosomal instability.' Nature 606.7916 (2022): 976-983), and CN (copy number) signatures (see Steele, Christopher D., et al. 'Signatures of copy number alterations in human cancer.' Nature 606.7916 (2022): 984-991).

signaturesDecomposition(
  x,
  K,
  background_signature = NULL,
  normalize_counts = FALSE,
  nmf_runs = 100,
  num_processes = Inf,
  verbose = TRUE
)

Arguments

x

Counts matrix for a set of n patients and m categories. These can be, e.g., SBS, MNV, CX or CN counts; in the case of SBS it would be an n patients x 96 trinucleotides matrix.
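
As an illustration of the expected shape of x for SBS data, the following base R sketch builds an empty n patients x 96 trinucleotides matrix. The category labels (e.g., "A[C>A]A") and the patient names are assumptions made for this example; the labels used by the package datasets may differ.

```r
# Hypothetical SBS layout: 6 substitution types x 4 left bases x 4 right bases = 96.
bases <- c("A", "C", "G", "T")
subs <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
grid <- expand.grid(right = bases, left = bases, sub = subs,
                    stringsAsFactors = FALSE)
# Assumed label format: left base, substitution in brackets, right base.
categories <- paste0(grid$left, "[", grid$sub, "]", grid$right)

# Empty counts matrix for 3 hypothetical patients (rows) x 96 categories (columns).
x <- matrix(0L, nrow = 3, ncol = length(categories),
            dimnames = list(paste0("patient_", 1:3), categories))
dim(x)
```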

K

Either one value or a range of numeric values (each of them greater than 0) indicating the number of signatures to be considered.

background_signature

Background signature to be used.

normalize_counts

If TRUE, the input counts matrix x is normalized such that all patients have the same number of mutations.
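
A minimal base R sketch of what such a normalization could look like is given below. It rescales each patient (row) to a common total; the choice of the median row total as the target and the toy values are assumptions for illustration only, and the normalization actually applied by the function may differ.

```r
# Toy counts matrix: 3 patients x 4 categories (hypothetical values).
x <- matrix(c(10,  0,  5, 5,
               2,  2,  2, 2,
               0, 30, 10, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("P", 1:3), paste0("C", 1:4)))

# Assumed target: the median number of mutations per patient.
target <- median(rowSums(x))

# Dividing a matrix by a vector of length nrow(x) recycles down the columns,
# so every entry in row i is divided by the total of row i.
x_norm <- x / rowSums(x) * target

rowSums(x_norm)
```

Note that a patient with zero mutations would produce NaN values under this sketch, so such rows would need to be handled separately.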

nmf_runs

Number of iterations (minimum 1) of NMF to be performed for a robust estimation of beta.

num_processes

Number of processes to be used during parallel execution. To execute in single process mode, this parameter needs to be set to either NA or NULL.

verbose

Boolean. If TRUE, information messages are printed.

Value

A list with the discovered signatures and related rank measures. It includes 5 elements:

alpha: list of matrices of the discovered exposure values for each possible rank in the range K.

beta: list of matrices of the discovered signatures for each possible rank in the range K.

unexplained_mutations: number of unexplained mutations per sample.

cosine_similarity: cosine similarity comparing input data x and predictions for each rank in the range K.

measures: a data.frame containing the quality measures for each possible rank in the range K.
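
To make the cosine similarity measure more concrete, the sketch below compares toy observed counts with predictions in base R. It assumes that predictions are obtained as the product of an exposure matrix (alpha, patients x signatures) and a signature matrix (beta, signatures x categories), and that the similarity is computed per patient; both are assumptions for illustration, and the helper `cosine` is hypothetical rather than part of the package.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cosine <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy exposures (2 patients x 2 signatures) and signatures (2 x 3 categories).
alpha <- matrix(c(10,  5,
                   0, 20), nrow = 2, byrow = TRUE)
beta <- matrix(c(0.5, 0.5, 0.0,
                 0.1, 0.2, 0.7), nrow = 2, byrow = TRUE)

# Assumed prediction model: x_hat = alpha %*% beta.
x_hat <- alpha %*% beta

# Toy observed counts: predictions plus a few extra mutations.
x <- x_hat + matrix(c(1, 0, 0,
                      0, 0, 2), nrow = 2, byrow = TRUE)

# One similarity value per patient (row).
sim <- vapply(seq_len(nrow(x)),
              function(i) cosine(x[i, ], x_hat[i, ]),
              numeric(1))
sim
```

Values close to 1 indicate that the predicted counts reproduce the shape of the observed profile well.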

Examples

data(background)
data(patients)
set.seed(12345)
res <- signaturesDecomposition(x = patients[seq_len(3),seq_len(2)],
                               K = 3:4,
                               background_signature = background[seq_len(2)],
                               nmf_runs = 2,
                               num_processes = 1)
#> Performing signatures discovery and rank estimation...
#> Performing inference for K=3...
#> Performing NMF run 1 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Performing NMF run 2 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Performing inference for K=4...
#> Performing NMF run 1 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Performing NMF run 2 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold