Title: | Nuclear Penalized Multinomial Regression |
---|---|
Description: | Fit multinomial logistic regression with a penalty on the nuclear norm of the estimated regression coefficient matrix, using proximal gradient descent. |
Authors: | Scott Powers, Trevor Hastie, Robert Tibshirani |
Maintainer: | Scott Powers <[email protected]> |
License: | GPL-2 |
Version: | 1.3.1 |
Built: | 2024-11-05 02:55:53 UTC |
Source: | https://github.com/cran/npmr |
As an alternative to an l1- or l2-penalty on multinomial logistic regression, this package fits multinomial regression with a penalty on the nuclear norm of the fitted regression coefficient matrix. The result is often a matrix of reduced rank, leveraging structure among the response classes so that the likelihood of one class informs the likelihood of other classes. Proximal gradient descent is used to solve the NPMR optimization problem.
The primary functions in the package are npmr, which solves nuclear penalized multinomial regression for a sequence of input values for the regularization parameter lambda, and cv.npmr, which chooses the optimal value of the regularization parameter lambda via cross validation. Both npmr and cv.npmr have predict and plot methods.
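A minimal sketch of this workflow (illustrative only; X is a covariate matrix, Y a response factor or indicator matrix, and Xnew a hypothetical matrix of new observations):

fit = npmr(X, Y, lambda = exp(seq(4, -4)))   # fit along a lambda path
cvfit = cv.npmr(X, Y, nfolds = 5)            # or select lambda by cross validation
plot(cvfit)                                  # biplot at the selected lambda
probs = predict(cvfit, Xnew)                 # predicted class probabilities for new data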
Scott Powers, Trevor Hastie, Rob Tibshirani
Maintainer: Scott Powers <[email protected]>
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
# Fit NPMR to simulated data
K = 5
n = 1000
m = 10000
p = 10
r = 2

# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C)            # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p)    # covariate matrix with iid Gaussian entries
eta = X %*% B
P = exp(eta)/rowSums(exp(eta))
Y = t(apply(P, 1, rmultinom, n = 1, size = 1))

# Simulate test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))

# Fit NPMR for a sequence of lambda values without CV:
fit2 = npmr(X, Y, lambda = exp(seq(7, -2)))

# Print the NPMR fit:
fit2

# Produce a biplot:
plot(fit2, lambda = 20)

# Compute mean test error using the predict function (for each value of lambda):
getloss = function(pred, Y) {
    -mean(log(rowSums(Y*pred)))
}
apply(predict(fit2, Xtest), 3, getloss, Ytest)
Divide the training data into folds. Hold out each fold and fit NPMR for a range of regularization values on the remaining data, testing the result on the held-out fold. After the optimal value of the regularization parameter is determined, fit NPMR with this tuning parameter to the whole training set.
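The sketch below illustrates this fold-based procedure by hand (for exposition only; cv.npmr performs these steps internally). It assumes a covariate matrix X, an indicator response matrix Y, and the multinomial log-likelihood as the cross validation loss:

nfolds = 10
foldid = sample(rep(1:nfolds, length = nrow(X)))
lambda = exp(seq(7, -2))
cv.error = rep(0, length(lambda))
for (k in 1:nfolds) {
    hold = foldid == k
    fit = npmr(X[!hold, ], Y[!hold, ], lambda = lambda)
    pred = predict(fit, X[hold, ])    # array: sum(hold) x ncol(Y) x length(lambda)
    cv.error = cv.error +
        apply(pred, 3, function(P) -sum(log(rowSums(Y[hold, ] * P))))
}
best = lambda[which.min(cv.error)]
fit.final = npmr(X, Y, lambda = best)  # refit on the whole training set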
cv.npmr(X, Y, lambda = exp(seq(7, -2)), s = 0.1/max(X), eps = 1e-06,
        group = NULL, accelerated = TRUE, B.init = NULL, b.init = NULL,
        foldid = NULL, nfolds = 10)
X |
Covariate matrix. May be in sparse form from the Matrix package. |
Y |
Multinomial response. May be (1) a vector of length equal to nrow(X), which will be interpreted as a factor, with levels representing response classes; or (2) a matrix with nrow(Y) = nrow(X), in which each row has exactly one 1 indicating the response class for that observation, with the remaining entries of the row being zero. |
lambda |
Vector of regularization parameter values for penalizing nuclear norm. Default is a wide range of values. We suggest that the user choose this sequence via trial and error. If the model takes too long to fit, try larger values of lambda. If cross validation error is strictly increasing or strictly decreasing over the range of lambda specified, try extending the range in the direction of the smallest cross validation error. |
s |
Step size for proximal gradient descent |
eps |
Convergence threshold. When the relative change in the objective function after an iteration drops below this threshold, the algorithm halts. |
group |
Vector of length equal to number of variables (ncol(X) and nrow(B)). Variables in the same group indexed by a POSITIVE integer will be penalized together (the nuclear norm of the sub-matrix of the regression coefficients will be penalized). Variables without positive integers will NOT be penalized. Default is NULL, which means there are no sub-groups; nuclear norm of entire coefficient matrix is penalized. |
accelerated |
Logical. Should accelerated proximal gradient descent be used? Default is TRUE. |
B.init |
Initial value of the regression coefficient matrix for proximal gradient descent |
b.init |
Initial value of the regression intercept vector for proximal gradient descent |
foldid |
Vector of length equal to nrow(X). Specifies folds for cross validation. |
nfolds |
Number of folds for cross validation. Ignored if foldid is specified. |
An object of class “cv.npmr” with values:
call |
the call that produced this object |
error |
A vector of total cross validation error for each value of lambda |
fit |
An object of class npmr fit to the entire training data set |
lambda.min |
The value of lambda with minimum cross validation error |
lambda |
The input sequence of regularization parameter values for which cross validation error was calculated |
n |
number of rows in the input covariate matrix |
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
npmr, predict.cv.npmr, print.cv.npmr, plot.cv.npmr
# Fit NPMR to simulated data
K = 5
n = 1000
m = 10000
p = 10
r = 2

# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C)            # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p)    # covariate matrix with iid Gaussian entries
eta = X %*% B
P = exp(eta)/rowSums(exp(eta))
Y = t(apply(P, 1, rmultinom, n = 1, size = 1))
fold = sample(rep(1:10, length = nrow(X)))

# Simulate test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))

# Fit NPMR with cross validation over a sequence of lambda values:
fit2 = cv.npmr(X, Y, lambda = exp(seq(7, -2)), foldid = fold)

# Print the NPMR fit:
fit2

# Produce a biplot:
plot(fit2)

# Compute mean test error using the predict function:
-mean(log(rowSums(Ytest*predict(fit2, Xtest))))
Computes the log-likelihood of the fitted regression parameters given the data observed. Intended for internal use only.
logL(B, b, X, Y)
B |
Regression coefficient matrix |
b |
Regression intercept vector |
X |
Covariate matrix |
Y |
Multinomial response matrix |
The log-likelihood of B and b given X and Y
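For reference, a self-contained sketch of this computation under the model's softmax parameterization (assumed form, not the package's internal code; B is ncol(X) x ncol(Y), b has length ncol(Y), and Y is an indicator matrix):

logL.sketch = function(B, b, X, Y) {
    eta = sweep(X %*% B, 2, b, "+")    # linear predictors
    P = exp(eta)/rowSums(exp(eta))     # softmax class probabilities
    sum(log(rowSums(Y * P)))           # log-likelihood of the observed classes
}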
Scott Powers, Trevor Hastie, Rob Tibshirani
Fit a multinomial logistic regression model for a sequence of regularization parameters penalizing the nuclear norm of the regression coefficient matrix.
npmr(X, Y, lambda = exp(seq(7, -2)), s = 0.1/max(X), eps = 1e-06,
     group = NULL, accelerated = TRUE, B.init = NULL, b.init = NULL,
     quiet = TRUE)
X |
Covariate matrix. May be in sparse form from the Matrix package. |
Y |
Multinomial response. May be (1) a vector of length equal to nrow(X), which will be interpreted as a factor, with levels representing response classes; or (2) a matrix with nrow(Y) = nrow(X), in which each row has exactly one 1 indicating the response class for that observation, with the remaining entries of the row being zero. |
lambda |
Vector of regularization parameter values for penalizing nuclear norm. Default is a wide range of values. We suggest that the user choose this sequence via trial and error. If the model takes too long to fit, try larger values of lambda. |
s |
Step size for proximal gradient descent |
eps |
Convergence threshold. When the relative change in the objective function after an iteration drops below this threshold, the algorithm halts. |
group |
Vector of length equal to number of variables (ncol(X) and nrow(B)). Variables in the same group indexed by a POSITIVE integer will be penalized together (the nuclear norm of the sub-matrix of the regression coefficients will be penalized). Variables without positive integers will NOT be penalized. Default is NULL, which means there are no sub-groups; nuclear norm of entire coefficient matrix is penalized. |
accelerated |
Logical. Should accelerated proximal gradient descent be used? Default is TRUE. |
B.init |
Initial value of the regression coefficient matrix for proximal gradient descent |
b.init |
Initial value of the regression intercept vector for proximal gradient descent |
quiet |
Logical. Should output be silenced? If not, print the value of the objective function after each step of proximal gradient descent. Perhaps useful for debugging. Default is TRUE. |
In multinomial regression (in contrast with Gaussian regression or logistic regression) there is a matrix of regression coefficients, not just a vector. NPMR fits a logistic multinomial regression with a penalty on the nuclear norm of this regression coefficient matrix B. Specifically, the objective is
-loglik(B,b|X,Y) + lambda*||B||_*
where ||.||_* denotes the nuclear norm. This implementation solves the problem using proximal gradient descent, which iteratively steps in the direction of the negative gradient of the loss function and soft-thresholds the singular values of the result.
This function makes available, through the group argument, the option of dividing the regression coefficient matrix into sub-matrices (by row) and penalizing the sum of the nuclear norms of these sub-matrices. Rows (corresponding to variables) can be left unpenalized in this way.
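The soft-thresholding step can be sketched as follows (an illustration of the operation; the exported prox function is the package's implementation):

prox.sketch = function(M, t) {
    s = svd(M)
    d = pmax(s$d - t, 0)                           # soft-threshold the singular values
    s$u %*% diag(d, nrow = length(d)) %*% t(s$v)   # zeroed singular values give a reduced-rank result
}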
An object of class “npmr” with values:
call |
the call that produced this object |
B |
A 3-dimensional array, with dimensions (ncol(X), ncol(Y), length(lambda)). For each value of lambda, this array stores the corresponding fitted regression coefficient matrix. |
b |
A matrix storing, for each value of lambda, the fitted intercept vector (of length ncol(Y)) |
objective |
A vector of length equal to the length of lambda, giving the value of the NPMR objective function at the solution for each value of lambda |
lambda |
The input sequence of values for the regularization parameter, for each of which NPMR has been solved. |
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
cv.npmr, predict.npmr, print.npmr, plot.npmr
# Fit NPMR to simulated data
K = 5
n = 1000
m = 10000
p = 10
r = 2

# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C)            # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p)    # covariate matrix with iid Gaussian entries
eta = X %*% B
P = exp(eta)/rowSums(exp(eta))
Y = t(apply(P, 1, rmultinom, n = 1, size = 1))

# Simulate test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))

# Fit NPMR for a sequence of lambda values without CV:
fit2 = npmr(X, Y, lambda = exp(seq(7, -2)))

# Print the NPMR fit:
fit2

# Produce a biplot:
plot(fit2, lambda = 20)

# Compute mean test error using the predict function (for each value of lambda):
getloss = function(pred, Y) {
    -mean(log(rowSums(Y*pred)))
}
apply(predict(fit2, Xtest), 3, getloss, Ytest)
Returns the nuclear norm of a matrix, which is the sum of its singular values, obtained through a singular value decomposition. Intended for internal use only.
nuclear(B)
B |
a matrix |
the nuclear norm of the matrix
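Equivalently, as a one-line sketch of the computation described above:

nuclear.sketch = function(B) sum(svd(B)$d)
nuclear.sketch(diag(c(3, 1)))   # singular values 3 and 1, so the nuclear norm is 4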
Scott Powers, Trevor Hastie, Rob Tibshirani
logL, objective, objectiveFast
Return the objective function of the data and the fitted parameters for nuclear penalized multinomial regression. The objective is the sum of the negative log-likelihood and the product of the regularization parameter and nuclear norm of the fitted regression coefficient matrix. Intended for internal use only.
objective(B, b, X, Y, lambda)
B |
fitted regression coefficient matrix |
b |
fitted regression intercept vector |
X |
covariate matrix |
Y |
multinomial response matrix |
lambda |
regularization parameter (may be a vector of values) |
a vector of objective values for the NPMR optimization problem, one for each
value of lambda
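A self-contained sketch of this quantity under the model's softmax parameterization (assumed form, not the package's internal code; lambda is taken as a scalar here):

objective.sketch = function(B, b, X, Y, lambda) {
    eta = sweep(X %*% B, 2, b, "+")
    P = exp(eta)/rowSums(exp(eta))              # fitted class probabilities
    negloglik = -sum(log(rowSums(Y * P)))       # negative multinomial log-likelihood
    negloglik + lambda * sum(svd(B)$d)          # plus the nuclear-norm penalty
}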
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
Computes the objective function for NPMR more quickly than objective by leveraging pre-computed fitted values, the computation of which is the bottleneck in evaluating the objective. Intended for internal use only.
objectiveFast(B, P, W, lambda)
B |
fitted regression coefficient matrix |
P |
matrix of fitted multinomial class probabilities |
W |
vector containing the indices of the nonzero (observed-class) entries of the response matrix Y |
lambda |
regularization parameter (may be a vector of values) |
a vector of objective values for the NPMR optimization problem, one for each
value of lambda
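A hedged sketch of the idea (an assumed reconstruction, not the package's code): with the fitted probability matrix P already in hand, the negative log-likelihood reduces to summing the log-probabilities of the observed classes, here assumed to be indexed by W:

objectiveFast.sketch = function(B, P, W, lambda) {
    -sum(log(P[W])) + lambda * sum(svd(B)$d)   # W assumed to index the observed-class entries of P
}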
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
Iterates steps of proximal gradient descent until convergence, by repeatedly taking steps in the direction of the negative of the gradient and soft-thresholding the singular values of the result. Intended for internal use only.
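A single (unaccelerated) update of this kind can be sketched as follows, assuming a fixed step size s (illustration only; PGDnpmr is the package's implementation, including acceleration, grouping, and the convergence check):

pgd.step.sketch = function(B, b, X, Y, lambda, s) {
    eta = sweep(X %*% B, 2, b, "+")
    P = exp(eta)/rowSums(exp(eta))               # current fitted probabilities
    gradB = t(X) %*% (P - Y)                     # gradient of the negative log-likelihood in B
    gradb = colSums(P - Y)                       # gradient in the intercepts
    sv = svd(B - s * gradB)                      # gradient step, then...
    d = pmax(sv$d - s * lambda, 0)               # ...soft-threshold the singular values
    list(B = sv$u %*% diag(d, nrow = length(d)) %*% t(sv$v),
         b = b - s * gradb)                      # intercepts are not penalized
}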
PGDnpmr(B, b, X, Y, lambda, s, group = NULL, accelerated = TRUE,
        eps = 1e-07, maxit = 1e+05, quiet = TRUE)
B |
Initial regression coefficient matrix |
b |
Initial intercept vector |
X |
Covariate matrix. May be in sparse form from the Matrix package. |
Y |
Response matrix. Each row has exactly one 1 indicating response category for that observation. All other entries are zero. |
lambda |
Vector of regularization parameter values for penalizing nuclear norm |
s |
Step size for proximal gradient descent |
group |
Vector of length equal to number of variables (ncol(X) and nrow(B)). Variables in the same group indexed by a POSITIVE integer will be penalized together (the nuclear norm of the sub-matrix of the regression coefficients will be penalized). Variables without positive integers will NOT be penalized. Default is NULL, which means there are no sub-groups; nuclear norm of entire coefficient matrix is penalized. |
accelerated |
Logical. Should accelerated proximal gradient descent be used? Default is TRUE. |
eps |
Convergence threshold. When the relative change in the objective function after an iteration drops below this threshold, the algorithm halts. |
maxit |
Maximum number of iterations for proximal gradient descent. |
quiet |
Logical. Should output be silenced? If not, print the value of the objective function after each step of proximal gradient descent. Perhaps useful for debugging. Default is TRUE. |
B |
Optimal value of the regression coefficient matrix at convergence |
b |
Optimal value of the regression intercept vector at convergence |
objectivePath |
Vector showing the value of the objective function at each step in proximal gradient descent |
time |
Time taken until convergence |
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
npmr, prox, objective, objectiveFast
Plots features (in orange) by their weights on the first two latent variables in the singular value decomposition of the regression coefficient matrix. Plots response classes (as blue arrows) by their loadings on the first two latent variables. Does this for the regression coefficient matrix fit with the value of lambda that led to the minimum cross validation error among all those tried.
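The underlying construction can be sketched directly from the singular value decomposition of a fitted coefficient matrix B (with ncol(X) rows and ncol(Y) columns), for readers who want the biplot idea outside the plot method (illustration only, not the package's plotting code):

s = svd(B)
feat = s$u[, 1:2] %*% diag(s$d[1:2])    # feature weights on the first two latent variables
cls = s$v[, 1:2]                        # class loadings on the same latent variables
plot(feat, col = "orange", pch = 20,
     xlab = "Latent variable 1", ylab = "Latent variable 2")
arrows(0, 0, cls[, 1], cls[, 2], col = "blue")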
## S3 method for class 'cv.npmr'
plot(x, feature.names = TRUE, ...)
x |
an object of class cv.npmr |
feature.names |
logical. Should the names of the covariates be used in the plot? If FALSE, use a standard plotting symbol instead. Default is TRUE. |
... |
additional arguments to be passed to plot |
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
# Fit NPMR to simulated data
K = 5
n = 1000
m = 10000
p = 10
r = 2

# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C)            # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p)    # covariate matrix with iid Gaussian entries
eta = X %*% B
P = exp(eta)/rowSums(exp(eta))
Y = t(apply(P, 1, rmultinom, n = 1, size = 1))
fold = sample(rep(1:10, length = nrow(X)))

# Simulate test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))

# Fit NPMR with cross validation over a sequence of lambda values:
fit2 = cv.npmr(X, Y, lambda = exp(seq(7, -2)), foldid = fold)

# Produce a biplot:
plot(fit2)
Plots features (in orange) by their weights on the first two latent variables in the singular value decomposition of the regression coefficient matrix. Plots response classes (as blue arrows) by their loadings on the first two latent variables. Does this for the regression coefficient matrix fit with the value of lambda, among all those tried, that is closest to the value specified.
## S3 method for class 'npmr'
plot(x, lambda, feature.names = TRUE, ...)
x |
an object of class npmr |
lambda |
a single regularization parameter value |
feature.names |
logical. Should the names of the covariates be used in the plot? If FALSE, use a standard plotting symbol instead. Default is TRUE. |
... |
additional arguments to be passed to plot |
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
# Fit NPMR to simulated data
K = 5
n = 1000
m = 10000
p = 10
r = 2

# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C)            # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p)    # covariate matrix with iid Gaussian entries
eta = X %*% B
P = exp(eta)/rowSums(exp(eta))
Y = t(apply(P, 1, rmultinom, n = 1, size = 1))

# Simulate test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))

# Fit NPMR for a sequence of lambda values without CV:
fit2 = npmr(X, Y, lambda = exp(seq(7, -2)))

# Produce a biplot:
plot(fit2, lambda = 20)
Return predicted response class probabilities from a cross-validated NPMR model, using the value of the regularization parameter that led to the minimum cross validation error.
## S3 method for class 'cv.npmr'
predict(object, newx, ...)
object |
an object of class cv.npmr |
newx |
covariate matrix for which to make response class probability predictions. Must have the same number of columns as the covariate matrix used to fit the model. |
... |
ignored |
A matrix giving the predicted probability that each row of newx belongs to each class, corresponding to the value of the regularization parameter that led to the minimum cross validation error.
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
# Fit NPMR to simulated data
K = 5
n = 1000
m = 10000
p = 10
r = 2

# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C)            # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p)    # covariate matrix with iid Gaussian entries
eta = X %*% B
P = exp(eta)/rowSums(exp(eta))
Y = t(apply(P, 1, rmultinom, n = 1, size = 1))
fold = sample(rep(1:10, length = nrow(X)))

# Simulate test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))

# Fit NPMR with cross validation over a sequence of lambda values:
fit2 = cv.npmr(X, Y, lambda = exp(seq(7, -2)), foldid = fold)

# Compute mean test error using the predict function:
-mean(log(rowSums(Ytest*predict(fit2, Xtest))))
Return predicted response class probabilities from a fitted NPMR model, for each value of lambda on which the NPMR model was originally fit.
## S3 method for class 'npmr'
predict(object, newx, ...)
object |
an object of class npmr |
newx |
covariate matrix for which to make response class probability predictions. Must have the same number of columns as the covariate matrix used to fit the model. |
... |
ignored |
A 3-dimensional array, with dimensions (nrow(newx), ncol(Y), length(lambda)). For each value of lambda, this array stores the predicted response class probabilities for each observation.
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
# Fit NPMR to simulated data
K = 5
n = 1000
m = 10000
p = 10
r = 2

# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C)            # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p)    # covariate matrix with iid Gaussian entries
eta = X %*% B
P = exp(eta)/rowSums(exp(eta))
Y = t(apply(P, 1, rmultinom, n = 1, size = 1))

# Simulate test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))

# Fit NPMR for a sequence of lambda values without CV:
fit2 = npmr(X, Y, lambda = exp(seq(7, -2)))

# Compute mean test error using the predict function (for each value of lambda):
getloss = function(pred, Y) {
    -mean(log(rowSums(Y*pred)))
}
apply(predict(fit2, Xtest), 3, getloss, Ytest)
Print (1) the call that produced the cv.npmr object; (2) the value of the regularization parameter lambda that led to the minimum cross validation error; (3) the rank of the fitted regression coefficient matrix; and (4) the per-observation cross validation error using the optimal lambda.
## S3 method for class 'cv.npmr'
print(x, ...)
x |
an object of class cv.npmr |
... |
ignored |
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
# Fit NPMR to simulated data
K = 5
n = 1000
m = 10000
p = 10
r = 2

# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C)            # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p)    # covariate matrix with iid Gaussian entries
eta = X %*% B
P = exp(eta)/rowSums(exp(eta))
Y = t(apply(P, 1, rmultinom, n = 1, size = 1))
fold = sample(rep(1:10, length = nrow(X)))

# Simulate test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))

# Fit NPMR with cross validation over a sequence of lambda values:
fit2 = cv.npmr(X, Y, lambda = exp(seq(7, -2)), foldid = fold)

# Print the NPMR fit:
fit2
Print the call that produced the npmr object and a data frame showing, for each value of the regularization parameter on which the NPMR object was fit, the rank of the resulting regression coefficient matrix and the corresponding value of the NPMR objective function.
## S3 method for class 'npmr'
print(x, ...)
x |
an object of class npmr |
... |
ignored |
Scott Powers, Trevor Hastie, Rob Tibshirani
Scott Powers, Trevor Hastie and Rob Tibshirani (2016). “Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball.” In prep.
# Fit NPMR to simulated data
K = 5
n = 1000
m = 10000
p = 10
r = 2

# Simulated training data
set.seed(8369)
A = matrix(rnorm(p*r), p, r)
C = matrix(rnorm(K*r), K, r)
B = tcrossprod(A, C)            # low-rank coefficient matrix
X = matrix(rnorm(n*p), n, p)    # covariate matrix with iid Gaussian entries
eta = X %*% B
P = exp(eta)/rowSums(exp(eta))
Y = t(apply(P, 1, rmultinom, n = 1, size = 1))

# Simulate test data
Xtest = matrix(rnorm(m*p), m, p)
etatest = Xtest %*% B
Ptest = exp(etatest)/rowSums(exp(etatest))
Ytest = t(apply(Ptest, 1, rmultinom, n = 1, size = 1))

# Fit NPMR for a sequence of lambda values without CV:
fit2 = npmr(X, Y, lambda = exp(seq(7, -2)))

# Print the NPMR fit:
fit2
Return the value of the proximal operator of the nuclear norm (scaled by threshold) applied to a matrix.
prox(B, threshold, group)
B |
matrix |
threshold |
scaling factor applied to the nuclear norm. In proximal gradient descent for NPMR, this is the product of the step size and the regularization parameter lambda |
group |
Vector of length equal to number of variables, i.e. nrow(B). Variables in the same group indexed by a POSITIVE integer will be penalized together (the nuclear norm of the sub-matrix of the regression coefficients will be penalized). Variables without positive integers will NOT be penalized. Default is NULL, which means there are no sub-groups; nuclear norm of entire coefficient matrix is penalized. |
The value of the proximal operator of the nuclear norm (scaled by threshold) applied to B.
Scott Powers, Trevor Hastie, Rob Tibshirani
Neal Parikh and Stephen Boyd (2013) “Proximal algorithms.” Foundations and Trends in Optimization 1, 3:123-231.
Speaker-independent recognition of the eleven steady-state vowels of British English, using a specified training set of LPC-derived log area ratios.
A data frame with 990 observations on the following 12 variables.
y
Class label indicating vowel spoken
subset
a factor with levels test and train
x.1
a numeric vector
x.2
a numeric vector
x.3
a numeric vector
x.4
a numeric vector
x.5
a numeric vector
x.6
a numeric vector
x.7
a numeric vector
x.8
a numeric vector
x.9
a numeric vector
x.10
a numeric vector
The speech signals were low pass filtered at 4.7kHz and then digitised to 12 bits with a 10kHz sampling rate. Twelfth order linear predictive analysis was carried out on six 512 sample Hamming windowed segments from the steady part of the vowel. The reflection coefficients were used to calculate 10 log area parameters, giving a 10 dimensional input space. For a general introduction to speech processing and an explanation of this technique see Rabiner and Schafer [RabinerSchafer78].
Each speaker thus yielded six frames of speech from eleven vowels. This gave 528 frames from the eight speakers used to train the networks and 462 frames from the seven speakers used to test the networks.
The eleven vowels, along with words demonstrating their sound, are: i (heed), I (hid), E (head), A (had), a: (hard), Y (hud), O (hod), C: (hoard), U (hood), u: (who'd), 3: (heard).
https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/vowel/
D. H. Deterding (1989). "Speaker Normalisation for Automatic Speech Recognition." PhD thesis, University of Cambridge.
data(Vowel)
summary(Vowel)
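As a hedged sketch (not part of the package's examples), NPMR could be fit to the designated training subset of these data roughly as follows, using the column names described above:

data(Vowel)
train = Vowel$subset == "train"
X = as.matrix(Vowel[train, paste0("x.", 1:10)])
Y = Vowel$y[train]                   # factor response
cvfit = cv.npmr(X, Y, nfolds = 5)
plot(cvfit)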