Perform signature discovery and rank estimation for a range of K somatic mutational signatures, given a set of observed counts x. This function can estimate several types of mutational signatures: SBS (single base substitution) and MNV (multi-nucleotide variant) signatures (see Degasperi, Andrea, et al. 'Substitution mutational signatures in whole-genome–sequenced cancers in the UK population.' Science 376.6591 (2022): abl9283), CX (chromosomal instability) signatures (see Drews, Ruben M., et al. 'A pan-cancer compendium of chromosomal instability.' Nature 606.7916 (2022): 976-983), and CN (copy number) signatures (see Steele, Christopher D., et al. 'Signatures of copy number alterations in human cancer.' Nature 606.7916 (2022): 984-991).
signaturesDecomposition(
  x,
  K,
  background_signature = NULL,
  normalize_counts = FALSE,
  nmf_runs = 100,
  num_processes = Inf,
  verbose = TRUE
)
x: Counts matrix for a set of n patients and m categories. These can be, e.g., SBS, MNV, CX or CN counts; for SBS, it would be an n patients x 96 trinucleotides matrix.
K: Either a single value or a range of numeric values (each greater than 0) indicating the number of signatures to be considered.
background_signature: Background signature to be used.
normalize_counts: If TRUE, the input counts matrix x is normalized so that all patients have the same number of mutations.
nmf_runs: Number of NMF iterations (minimum 1) to be performed for a robust estimation of beta.
num_processes: Number of processes to be used during parallel execution. To execute in single process mode, this parameter needs to be set to either NA or NULL.
verbose: Boolean. If TRUE, informational messages are printed.
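To make the effect of normalize_counts more concrete, here is a minimal base R sketch of row-wise normalization on a toy matrix. The helper normalize_rows and the choice of the median total as the common target are assumptions made for illustration; the scaling actually applied by signaturesDecomposition is not stated here and may differ.

```r
# Toy counts matrix: 3 patients (rows) x 4 categories (columns).
x <- matrix(c(10,  0,  5,  5,
               2,  2,  2,  2,
              40, 20, 30, 10),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("patient", 1:3), paste0("cat", 1:4)))

# Hypothetical helper (not part of the package): rescale each row so that all
# patients share the same total. The target defaults to the median of the
# original per-patient totals, which is an assumption of this sketch.
normalize_rows <- function(counts, target = median(rowSums(counts))) {
  totals <- rowSums(counts)
  if (any(totals == 0)) stop("every patient needs at least one mutation")
  # A vector of length nrow(counts) is recycled down the columns, so each
  # entry is divided by the total of its own row.
  counts / totals * target
}

x_norm <- normalize_rows(x)
rowSums(x_norm)  # every patient now has the same total
```

After normalization the relative profile of each patient is unchanged; only the overall mutation burden is equalized across patients.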
A list with the discovered signatures and related rank measures. It includes 5 elements:
alpha: list of matrices of the discovered exposure values for each possible rank in the range K.
beta: list of matrices of the discovered signatures for each possible rank in the range K.
unexplained_mutations: number of unexplained mutations per sample.
cosine_similarity: cosine similarity comparing input data x and predictions for each rank in the range K.
measures: a data.frame containing the quality measures for each possible rank in the range K.
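The cosine similarity between the input data and the predictions can be illustrated with a small base R sketch. It assumes that the prediction for a given rank is the product of an exposure matrix (patients x signatures) and a signature matrix (signatures x categories), and that similarity is computed per patient; both the matrix orientation and the per-patient aggregation are assumptions, and all values below are toy data rather than package output.

```r
# Cosine similarity between two numeric vectors.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy exposures (2 patients x 2 signatures) and signatures
# (2 signatures x 3 categories); orientation is assumed for illustration.
alpha <- matrix(c(10, 2,
                   3, 8), nrow = 2, byrow = TRUE)
beta <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)

# Toy observed counts (2 patients x 3 categories).
x <- matrix(c(8, 2, 2,
              3, 3, 5), nrow = 2, byrow = TRUE)

# Reconstructed counts under the assumed model x ~ alpha %*% beta.
predicted <- alpha %*% beta

# One similarity value per patient, comparing observed and predicted rows.
per_patient <- vapply(
  seq_len(nrow(x)),
  function(i) cosine_similarity(x[i, ], predicted[i, ]),
  numeric(1)
)
per_patient
```

Values close to 1 indicate that the reconstructed profile points in nearly the same direction as the observed one, regardless of the total number of mutations.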
data(background)
data(patients)
set.seed(12345)
res <- signaturesDecomposition(x = patients[seq_len(3), seq_len(2)],
                               K = 3:4,
                               background_signature = background[seq_len(2)],
                               nmf_runs = 2,
                               num_processes = 1)
#> Performing signatures discovery and rank estimation...
#> Performing inference for K=3...
#> Performing NMF run 1 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Performing NMF run 2 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Performing inference for K=4...
#> Performing NMF run 1 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Performing NMF run 2 out of 2...
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold