Title: Customized Training for Lasso and Elastic-Net Regularized Generalized Linear Models
Description: Customized training is a simple technique for transductive learning, applicable when the test covariates are known at the time of training. The method identifies a subset of the training set to serve as the training set for each of a few identified subsets of the test set. This package implements customized training for the glmnet() and cv.glmnet() functions.
Authors: Scott Powers [aut, cre], Trevor Hastie [aut], Robert Tibshirani [aut]
Maintainer: Scott Powers <[email protected]>
License: GPL-2
Version: 1.3
Built: 2025-02-22 02:38:05 UTC
Source: https://github.com/saberpowers/customizedtraining
Fit a lasso-regularized model using customized training
customizedGlmnet(xTrain, yTrain, xTest, groupid = NULL, G = NULL,
                 family = c("gaussian", "binomial", "multinomial"),
                 dendrogram = NULL, dendrogramTestIndices = NULL)
xTrain: an n-by-p matrix of training covariates
yTrain: a length-n vector of training responses. Numeric for family = "gaussian"; factor or character for family = "binomial" or "multinomial"
xTest: an m-by-p matrix of test covariates
groupid: an optional length-m vector of group memberships for the test set. If specified, customized training subsets are identified using the union of nearest-neighbor sets for each test group. Either groupid or G must be specified
G: a positive integer indicating the number of clusters for the joint clustering of the test and training data. Ignored if groupid is specified
family: response type
dendrogram: optional output from hclust on the joint test and training data
dendrogramTestIndices: optional set of indices (corresponding to dendrogram) held out in cross-validation. Used by cv.customizedGlmnet
Customized training subsets of the training data are identified in one of two ways. (1) If groupid is specified, the test data are grouped accordingly; for each test group, the 10 nearest neighbors of each observation in the group are found, and the union of these nearest-neighbor sets is used as the customized training set. (2) If G is specified, the test and training data are jointly clustered using hierarchical clustering with complete linkage; within each cluster, the training data serve as the customized training subset for the test data in that cluster. Once the customized training subsets have been identified, glmnet is used to fit an l1-regularized regression model to each.
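As a concrete illustration of method (2), the following minimal sketch (base R only, not the package's internal code; all object names are illustrative) jointly clusters stacked training and test rows with complete linkage and reads off the customized training subset for one test cluster:

# Illustrative sketch of method (2); not the package source.
set.seed(1)
x.train = matrix(rnorm(60 * 5), 60, 5)    # 60 hypothetical training rows
x.test = matrix(rnorm(20 * 5), 20, 5)     # 20 hypothetical test rows
x.all = rbind(x.train, x.test)            # stack train and test for joint clustering
cluster = cutree(hclust(dist(x.all), method = "complete"), k = 3)   # G = 3
train.cluster = cluster[1:60]             # cluster labels for training rows
test.cluster = cluster[60 + 1:20]         # cluster labels for test rows
# Customized training subset for the cluster containing the first test row:
ct.subset = which(train.cluster == test.cluster[1])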
an object of class customizedGlmnet, a list whose components include:
- the call that produced this object
- a list containing the customized training subsets for each test group
- a list containing the glmnet fit for each test group
- a length-m vector containing the group memberships of the test data
- a list containing train (the input xTrain) and test (the input xTest), as specified in the function call
- the training response vector (specified in the function call)
- the response type (specified in the function call)
- the fit of glmnet to the entire training set using standard (non-customized) training
require(glmnet)

# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1

set.seed(5914)

beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)

x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]

# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = customizedGlmnet(x.train, y.train, x.test, G = 3, family = "gaussian")

# Print the customized training model fit:
fit1

# Extract nonzero regression coefficients for each group:
nonzero(fit1, lambda = 10)

# Compute test error using the predict function:
mean((y.test - predict(fit1, lambda = 10))^2)

# Plot nonzero coefficients by group:
plot(fit1, lambda = 10)

# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = customizedGlmnet(x.train, y.train, x.test, group.id)

# Print the customized training model fit:
fit2

# Extract nonzero regression coefficients for each group:
nonzero(fit2, lambda = 10)

# Compute test error using the predict function:
mean((y.test - predict(fit2, lambda = 10))^2)

# Plot nonzero coefficients by group:
plot(fit2, lambda = 10)
Does k-fold cross-validation for customizedGlmnet and returns values for G and lambda
cv.customizedGlmnet(xTrain, yTrain, xTest = NULL, groupid = NULL, Gs = NULL,
                    dendrogram = NULL, dendrogramCV = NULL, lambda = NULL,
                    nfolds = 10, foldid = NULL, keep = FALSE,
                    family = c("gaussian", "binomial", "multinomial"),
                    verbose = FALSE)
xTrain: an n-by-p matrix of training covariates
yTrain: a length-n vector of training responses. Numeric for family = "gaussian"; factor or character for family = "binomial" or "multinomial"
xTest: an m-by-p matrix of test covariates. May be left NULL, in which case cross-validation predictions are made internally on the training set and no test predictions are returned
groupid: an optional length-m vector of group memberships for the test set. If specified, customized training subsets are identified using the union of nearest-neighbor sets for each test group, in which case cross-validation is used only to select the regularization parameter lambda
Gs: a vector of positive integers indicating the numbers of clusters over which to perform cross-validation to determine the best number. Ignored if groupid is specified
dendrogram: optional output from hclust on the joint test and training data
dendrogramCV: optional output from hclust on the training data, used for clustering within cross-validation
lambda: sequence of values to use for the regularization parameter lambda. Recommended to leave as NULL and allow glmnet to choose the sequence
nfolds: number of folds; default is 10. Ignored if foldid is specified
foldid: an optional length-n vector of fold memberships used for cross-validation
keep: should fitted values on the training set from cross-validation be included in the output? Default is FALSE
family: response type
verbose: should progress be printed to the console as folds are evaluated during cross-validation? Default is FALSE
an object of class cv.customizedGlmnet, a list whose components include:
- the call that produced this object
- G.min: unless groupid is specified, the number of clusters minimizing CV error
- the sequence of values considered for the regularization parameter lambda
- lambda.min: the value of the regularization parameter lambda minimizing CV error
- a matrix containing the CV error for each combination of G and lambda
- a customizedGlmnet object fit using G.min and lambda.min. Only returned if xTest is not NULL
- a length-m vector of predictions for the test set, using the tuning parameters which minimize cross-validation error. Only returned if xTest is not NULL
- a list of nonzero variables for each customized training set, using G.min and lambda.min. Only returned if xTest is not NULL
- an array containing fitted values on the training set from cross-validation. Only returned if keep is TRUE
require(glmnet)

# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1

set.seed(5914)

beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)

x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]

foldid = sample(rep(1:10, length = nrow(x.train)))

# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = cv.customizedGlmnet(x.train, y.train, x.test, Gs = c(1, 2, 3, 5),
    family = "gaussian", foldid = foldid)

# Print the optimal number of groups and value of lambda:
fit1$G.min
fit1$lambda.min

# Print the customized training model fit:
fit1

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit1))^2)

# Plot nonzero coefficients by group:
plot(fit1)

# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
foldid = apply(z == 1, 1, which)[1:n]
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = cv.customizedGlmnet(x.train, y.train, x.test, group.id, foldid = foldid)

# Print the optimal value of lambda:
fit2$lambda.min

# Print the customized training model fit:
fit2

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit2))^2)

# Plot nonzero coefficients by group:
plot(fit2)

# Example 3: If there is no test set, but the training set is organized into
# blocks, you can do cross-validation with these blocks as the basis for the
# customized training sets.
fit3 = cv.customizedGlmnet(x.train, y.train, foldid = foldid)

# Print the optimal value of lambda:
fit3$lambda.min

# Print the customized training model fit:
fit3

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit3))^2)

# Plot nonzero coefficients by group:
plot(fit3)
nonzero is a generic function for returning the set of variables selected by a model.

nonzero(object, ...)
object: a model object for which the set of selected variables is desired
...: additional arguments, e.g. tuning parameters
The form of the value returned by nonzero depends on the class of its argument. See the documentation of the particular methods for details of what is produced by that method.
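For readers unfamiliar with S3 dispatch, a generic of this kind is, roughly, a one-line function (a minimal sketch, not the package source):

# Minimal sketch of S3 dispatch for a generic such as nonzero.
nonzero = function(object, ...) UseMethod("nonzero")
# A call like nonzero(fit, lambda = 10) then dispatches to
# nonzero.customizedGlmnet, nonzero.singleton, etc., based on class(fit).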
nonzero method for a customizedGlmnet object: returns a list of vectors of selected variables for each separate glmnet model fit by a customizedGlmnet object.
## S3 method for class 'customizedGlmnet'
nonzero(object, lambda = NULL, ...)
object: a fitted customizedGlmnet object
lambda: value of the regularization parameter to use. Must be specified
...: ignored
a list of vectors, each vector representing one of the glmnet models fit by the customizedGlmnet model. Each vector gives the indices of the variables selected by that model
require(glmnet)

# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1

set.seed(5914)

beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)

x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]

# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = customizedGlmnet(x.train, y.train, x.test, G = 3, family = "gaussian")

# Extract nonzero regression coefficients for each group:
nonzero(fit1, lambda = 10)

# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = customizedGlmnet(x.train, y.train, x.test, group.id)

# Extract nonzero regression coefficients for each group:
nonzero(fit2, lambda = 10)
nonzero method for a singleton object: returns NULL. Intended for internal use only.
## S3 method for class 'singleton'
nonzero(object, ...)
object: an object of class singleton
...: ignored
Produces a plot with a row for each customized training submodel, showing the variables selected in that subset, with variables along the horizontal axis.
## S3 method for class 'customizedGlmnet'
plot(x, lambda, ...)
x: a fitted customizedGlmnet object
lambda: regularization parameter. Required
...: ignored
the function returns silently
require(glmnet)

# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1

set.seed(5914)

beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)

x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]

# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = customizedGlmnet(x.train, y.train, x.test, G = 3, family = "gaussian")

# Plot nonzero coefficients by group:
plot(fit1, lambda = 10)

# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = customizedGlmnet(x.train, y.train, x.test, group.id)

# Plot nonzero coefficients by group:
plot(fit2, lambda = 10)
Produces a plot with a row for each customized training submodel, showing the variables selected in that subset, with variables along the horizontal axis. The lambda used is the one which minimizes CV error.
## S3 method for class 'cv.customizedGlmnet'
plot(x, ...)
x: a fitted cv.customizedGlmnet object
...: ignored
the function returns silently
require(glmnet)

# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1

set.seed(5914)

beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)

x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]

foldid = sample(rep(1:10, length = nrow(x.train)))

# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = cv.customizedGlmnet(x.train, y.train, x.test, Gs = c(1, 2, 3, 5),
    family = "gaussian", foldid = foldid)

# Print the optimal number of groups and value of lambda:
fit1$G.min
fit1$lambda.min

# Print the customized training model fit:
fit1

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit1))^2)

# Plot nonzero coefficients by group:
plot(fit1)

# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
foldid = apply(z == 1, 1, which)[1:n]
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = cv.customizedGlmnet(x.train, y.train, x.test, group.id, foldid = foldid)

# Print the optimal value of lambda:
fit2$lambda.min

# Print the customized training model fit:
fit2

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit2))^2)

# Plot nonzero coefficients by group:
plot(fit2)

# Example 3: If there is no test set, but the training set is organized into
# blocks, you can do cross-validation with these blocks as the basis for the
# customized training sets.
fit3 = cv.customizedGlmnet(x.train, y.train, foldid = foldid)

# Print the optimal value of lambda:
fit3$lambda.min

# Print the customized training model fit:
fit3

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit3))^2)

# Plot nonzero coefficients by group:
plot(fit3)
predict method for a customizedGlmnet object: returns predictions for the test set provided at the time of fitting the customizedGlmnet object.
## S3 method for class 'customizedGlmnet'
predict(object, lambda, type = c("response", "class"), ...)
object: a fitted customizedGlmnet object
lambda: regularization parameter
type: type of prediction; currently only "response" and "class" are supported. Type "response" returns fitted values for the "gaussian" family and fitted probabilities for the "binomial" and "multinomial" families. Type "class" applies only to the "binomial" and "multinomial" families and returns the class with the highest fitted probability
...: ignored
a vector of predictions corresponding to the test data input to the model at the time of fitting
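As a hedged sketch of the type argument (the data, the choice of G, and the lambda value below are illustrative only, not taken from the package documentation):

# Illustrative only: type = "response" vs. type = "class" for a binomial fit.
require(glmnet)
set.seed(1)
x.train = matrix(rnorm(100 * 10), 100, 10)
y.train = factor(ifelse(x.train[, 1] + rnorm(100) > 0, "a", "b"))
x.test = matrix(rnorm(40 * 10), 40, 10)
fit = customizedGlmnet(x.train, y.train, x.test, G = 2, family = "binomial")
predict(fit, lambda = 0.1, type = "response")[1:5]  # fitted probabilities
predict(fit, lambda = 0.1, type = "class")[1:5]     # predicted class labels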
require(glmnet)

# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1

set.seed(5914)

beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)

x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]

# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = customizedGlmnet(x.train, y.train, x.test, G = 3, family = "gaussian")

# Compute test error using the predict function:
mean((y.test - predict(fit1, lambda = 10))^2)

# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = customizedGlmnet(x.train, y.train, x.test, group.id)

# Compute test error using the predict function:
mean((y.test - predict(fit2, lambda = 10))^2)
predict method for a cv.customizedGlmnet object: returns predictions for the test set provided at the time of fitting, using the regularization parameter which minimizes CV error.
## S3 method for class 'cv.customizedGlmnet'
predict(object, ...)
object: a fitted cv.customizedGlmnet object
...: additional arguments passed to predict.customizedGlmnet
a vector of predictions corresponding to the test set provided when the model was fit. The results are for the regularization parameter chosen by cross-validation
require(glmnet)

# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1

set.seed(5914)

beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)

x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]

foldid = sample(rep(1:10, length = nrow(x.train)))

# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = cv.customizedGlmnet(x.train, y.train, x.test, Gs = c(1, 2, 3, 5),
    family = "gaussian", foldid = foldid)

# Print the optimal number of groups and value of lambda:
fit1$G.min
fit1$lambda.min

# Print the customized training model fit:
fit1

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit1))^2)

# Plot nonzero coefficients by group:
plot(fit1)

# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
foldid = apply(z == 1, 1, which)[1:n]
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = cv.customizedGlmnet(x.train, y.train, x.test, group.id, foldid = foldid)

# Print the optimal value of lambda:
fit2$lambda.min

# Print the customized training model fit:
fit2

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit2))^2)

# Plot nonzero coefficients by group:
plot(fit2)

# Example 3: If there is no test set, but the training set is organized into
# blocks, you can do cross-validation with these blocks as the basis for the
# customized training sets.
fit3 = cv.customizedGlmnet(x.train, y.train, foldid = foldid)

# Print the optimal value of lambda:
fit3$lambda.min

# Print the customized training model fit:
fit3

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit3))^2)

# Plot nonzero coefficients by group:
plot(fit3)
predict method for a singleton object: returns the value stored in the singleton. Intended for internal use only.
## S3 method for class 'singleton'
predict(object, type = c("response", "class", "nonzero"), ...)
object: an object of class singleton
type: type of prediction to be returned: "response", "class", or "nonzero"
...: ignored
The value of the singleton
print method for a customizedGlmnet object: prints the numbers of training observations and test observations in each submodel of the customizedGlmnet fit.
## S3 method for class 'customizedGlmnet'
print(x, ...)
x: a fitted customizedGlmnet object
...: ignored
require(glmnet)

# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1

set.seed(5914)

beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)

x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]

# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = customizedGlmnet(x.train, y.train, x.test, G = 3, family = "gaussian")

# Print the customized training model fit:
fit1

# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = customizedGlmnet(x.train, y.train, x.test, group.id)

# Print the customized training model fit:
fit2
print method for a cv.customizedGlmnet object: prints the number of customized training subsets chosen by cross-validation and the number of variables selected in each training subset.
## S3 method for class 'cv.customizedGlmnet'
print(x, ...)
x: a fitted cv.customizedGlmnet object
...: ignored
require(glmnet)

# Simulate synthetic data
n = m = 150
p = 50
q = 5
K = 3
sigmaC = 10
sigmaX = sigmaY = 1

set.seed(5914)

beta = matrix(0, nrow = p, ncol = K)
for (k in 1:K) beta[sample(1:p, q), k] = 1
c = matrix(rnorm(K*p, 0, sigmaC), K, p)
eta = rnorm(K)
pi = (exp(eta)+1)/sum(exp(eta)+1)
z = t(rmultinom(m + n, 1, pi))
x = crossprod(t(z), c) + matrix(rnorm((m + n)*p, 0, sigmaX), m + n, p)
y = rowSums(z*(crossprod(t(x), beta))) + rnorm(m + n, 0, sigmaY)

x.train = x[1:n, ]
y.train = y[1:n]
x.test = x[n + 1:m, ]
y.test = y[n + 1:m]

foldid = sample(rep(1:10, length = nrow(x.train)))

# Example 1: Use clustering to fit the customized training model to training
# and test data with no predefined test-set blocks
fit1 = cv.customizedGlmnet(x.train, y.train, x.test, Gs = c(1, 2, 3, 5),
    family = "gaussian", foldid = foldid)

# Print the optimal number of groups and value of lambda:
fit1$G.min
fit1$lambda.min

# Print the customized training model fit:
fit1

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit1))^2)

# Plot nonzero coefficients by group:
plot(fit1)

# Example 2: If the test set has predefined blocks, use these blocks to define
# the customized training sets, instead of using clustering.
foldid = apply(z == 1, 1, which)[1:n]
group.id = apply(z == 1, 1, which)[n + 1:m]
fit2 = cv.customizedGlmnet(x.train, y.train, x.test, group.id, foldid = foldid)

# Print the optimal value of lambda:
fit2$lambda.min

# Print the customized training model fit:
fit2

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit2))^2)

# Plot nonzero coefficients by group:
plot(fit2)

# Example 3: If there is no test set, but the training set is organized into
# blocks, you can do cross-validation with these blocks as the basis for the
# customized training sets.
fit3 = cv.customizedGlmnet(x.train, y.train, foldid = foldid)

# Print the optimal value of lambda:
fit3$lambda.min

# Print the customized training model fit:
fit3

# Compute test error using the predict function:
mean((y[n + 1:m] - predict(fit3))^2)

# Plot nonzero coefficients by group:
plot(fit3)
Speaker-independent recognition of the eleven steady-state vowels of British English, using a specified training set of LPC-derived log area ratios.
A data frame with 990 observations on the following 12 variables:
y: class label indicating the vowel spoken
subset: a factor with levels test and train
x.1: a numeric vector
x.2: a numeric vector
x.3: a numeric vector
x.4: a numeric vector
x.5: a numeric vector
x.6: a numeric vector
x.7: a numeric vector
x.8: a numeric vector
x.9: a numeric vector
x.10: a numeric vector
The speech signals were low-pass filtered at 4.7 kHz and then digitised to 12 bits at a 10 kHz sampling rate. Twelfth-order linear predictive analysis was carried out on six 512-sample Hamming-windowed segments from the steady part of the vowel. The reflection coefficients were used to calculate 10 log area parameters, giving a 10-dimensional input space. For a general introduction to speech processing and an explanation of this technique, see Rabiner and Schafer [RabinerSchafer78].
Each speaker thus yielded six frames of speech from eleven vowels. This gave 528 frames from the eight speakers used to train the networks and 462 frames from the seven speakers used to test the networks.
The eleven vowels, along with words demonstrating their sound, are: i (heed), I (hid), E (head), A (had), a: (hard), Y (hud), O (hod), C: (hoard), U (hood), u: (who'd), 3: (heard).
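As an illustration of the documented format, a minimal sketch (using only the column names described above) of recovering the predefined train/test split:

# Minimal sketch of splitting Vowel by its predefined subset factor.
data(Vowel)
train = Vowel[Vowel$subset == "train", ]
test = Vowel[Vowel$subset == "test", ]
x.train = as.matrix(train[, paste0("x.", 1:10)])
y.train = train$y
table(Vowel$subset)   # 462 test frames, 528 training frames (see above)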
https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/vowel/
D. H. Deterding, 1989, University of Cambridge, "Speaker Normalisation for Automatic Speech Recognition", submitted for PhD.
data(Vowel)
summary(Vowel)