PLS_Toolbox Documentation: crossval

crossval

Purpose

Cross-validation for PCA, PLS, MLR, PCR, and LWR.

Synopsis

 

results = crossval(x,y,rm,cvi,ncomp,options)

[press,cumpress,rmsecv,rmsec,cvpred,misclassed] = crossval(x,y,rm,cvi,ncomp,options)

Description

CROSSVAL performs cross-validation for regression methods (PCR, PLS, MLR, and LWR) and for principal components analysis (PCA). Inputs are the predictor variable matrix x, the predicted variable y (y is empty [] for rm = 'pca'), the regression method rm, the cross-validation method cvi, and the maximum number of latent variables or components ncomp.

rm  = 'pca'  performs cross-validation for PCA,

rm  = 'mlr'  performs cross-validation for MLR,

rm  = 'pcr'  performs cross-validation for PCR,

rm  = 'nip'  performs cross-validation for PLS using NIPALS,

rm  = 'sim'  performs cross-validation for PLS using SIMPLS, and

rm  = 'lwr'  performs cross-validation for LWR.

cvi can be either 1) a cell array {cvm splits iter} naming one of the cross-validation methods cvm listed below together with its required parameters, or 2) a vector defining user-specified cross-validation groups.

 

cvi = {'loo'};             leave-one-out cross-validation,

cvi = {'vet' splits};      venetian blinds (every n-th sample together),

cvi = {'con' splits};      contiguous blocks, and

cvi = {'rnd' splits iter}; random subsets.

 

Except for leave-one-out, all methods require the number of data splits, splits, to be provided. Random subsets ('rnd') also require the number of iterations, iter.
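The grouping schemes listed above can be illustrated with a short, self-contained sketch. It does not call PLS_Toolbox; it simply assigns each of n samples to a test group using the common textbook definitions of leave-one-out, venetian blinds, contiguous blocks, and random subsets. The variable names and the exact assignment rules (for example, which block receives the leftover samples when n is not a multiple of splits, and how the random orderings are drawn) are assumptions and may differ from what crossval does internally.

```matlab
% Illustrative sketch (not PLS_Toolbox code): assign n samples to
% cross-validation test groups using common definitions of each scheme.
n      = 10;   % number of samples (rows of x), hypothetical
splits = 3;    % number of data splits
iter   = 5;    % number of random iterations (for 'rnd')

% 'loo': every sample forms its own test group
loo_groups = 1:n;

% 'vet': venetian blinds, every splits-th sample falls in the same group
vet_groups = mod(0:n-1, splits) + 1;

% 'con': contiguous blocks of roughly n/splits neighbouring samples
con_groups = ceil((1:n) * splits / n);

% 'rnd': venetian-style groups applied to a random ordering of the samples,
% redrawn for each iteration (one row per iteration)
rnd_groups = zeros(iter, n);
for k = 1:iter
  rnd_groups(k, randperm(n)) = vet_groups;
end
```

With n = 10 and splits = 3, vet_groups is [1 2 3 1 2 3 1 2 3 1] and con_groups is [1 1 1 2 2 2 3 3 3 3], so venetian blinds interleave the groups while contiguous blocks keep neighbouring rows together.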

For user-defined cross-validation, cvi is a vector of integers with one element per row of x, i.e. length(cvi) = size(x,1) when x is of class "double", or length(cvi) = size(x.data,1) when x is of class "dataset". The values of cvi define the test subsets; each element cvi(i) is interpreted as:

cvi(i) = -2  the sample is always in the test set,

cvi(i) = -1  the sample is always in the calibration set,

cvi(i) =  0  the sample is never used, and

cvi(i) =  1, 2, 3, ... assigns the sample to test subset 1, 2, 3, and so on.
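The rules above determine, for each positive subset number, which samples are predicted and which are used to build the model. The following sketch shows one way to turn a user-defined cvi vector into calibration and test index sets. It assumes that samples marked -2 join every test set and that the calibration set consists of the samples marked -1 plus those belonging to the other subsets; the example cvi values and the variable names are hypothetical.

```matlab
% Illustrative sketch: split a user-defined cvi vector into
% calibration/test index sets, one pair per positive subset number.
cvi = [-1 -2 0 1 2 3 1 2 3 1];   % hypothetical assignment for 10 samples

subsets  = unique(cvi(cvi > 0)); % subset numbers present (here 1, 2, 3)
test_idx = cell(1, numel(subsets));
cal_idx  = cell(1, numel(subsets));
for k = 1:numel(subsets)
  s = subsets(k);
  % test set: the current subset plus samples that are always tested (-2)
  test_idx{k} = find(cvi == s | cvi == -2);
  % calibration set: all remaining samples except never-used ones (0)
  cal_idx{k}  = find(cvi ~= s & cvi ~= -2 & cvi ~= 0);
end
```

For subset 1, test_idx{1} is [2 4 7 10] and cal_idx{1} is [1 5 6 8 9]; sample 3 (cvi = 0) appears in neither set.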

The optional input options is a structure containing one or more of the following fields:

name: 'options', name indicating that this is an options structure,

display: [ 'off' | {'on'} ] Governs output to the command window,

plots: [ 'none' | {'final'} ] Governs plotting,

preprocessing: {[1]} Controls preprocessing. The default is mean centering (1). It can be input in two ways:

  a) as a single value: 0 = none, 1 = mean centering, 2 = autoscaling, or

  b) as {xp yp}, a cell array containing preprocessing structures for the X- and Y-blocks (see PREPROCESS); for PCA use {xp []}. To apply preprocessing within each subset, use pre = {xp yp}; (or pre = {xp []}; for PCA). To skip preprocessing of each subset, use pre = {[] []}; or pre = 0 (zero).

threshold: {[]} Alternative PLSDA threshold level (default = [] = automatic),

structureoutput: [ {'no'} | 'yes' ] Governs output variables. 'yes' returns a structure instead of individual variables; 'yes' is the default if only one output is requested.

rmsec: [ 'no' | {'yes'} ] Governs calculation of RMSEC. When set to 'no', calculation of the "all variables" model is skipped (unless it is specifically required for plots or requested with multiple outputs).
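Preprocessing "of each subset" is generally understood to mean that the preprocessing parameters are recalculated from the calibration samples of each split and then applied to the held-out samples, so that the test samples do not influence their own centering or scaling. The sketch below shows this principle for autoscaling on a tiny hypothetical matrix using plain MATLAB arithmetic; it is an illustration under that assumption, not the PREPROCESS implementation.

```matlab
% Illustrative sketch: autoscale using statistics from the calibration rows only.
X    = [1 2; 3 4; 5 6; 7 8];   % hypothetical data, 4 samples x 2 variables
cal  = [1 2 3];                % calibration rows for this split
test = 4;                      % held-out row for this split

mu = mean(X(cal,:), 1);        % column means of the calibration rows
sd = std(X(cal,:), 0, 1);      % column standard deviations (N-1 normalization)

% apply the calibration-derived parameters to both subsets
Xcal  = bsxfun(@rdivide, bsxfun(@minus, X(cal,:),  mu), sd);
Xtest = bsxfun(@rdivide, bsxfun(@minus, X(test,:), mu), sd);
```

Here mu is [3 4] and sd is [2 2], so the held-out row [7 8] is scaled to [2 2] using statistics that do not include it.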

Outputs are the predictive residual error sum of squares (PRESS) for each subset, press; the cumulative PRESS, cumpress; the root mean square error of cross-validation (RMSECV), rmsecv; the root mean square error of calibration (RMSEC), rmsec; the cross-validated predictions for the y-block (if any), cvpred; and the fractional misclassifications, misclassed. Misclassifications are reported only if the y-block is a logical vector (i.e. discrete classes). When options.plots is not 'none', the routine also plots both RMSECV and RMSEC.
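The error statistics above follow the usual cross-validation definitions. The sketch below computes PRESS per subset, cumulative PRESS, and RMSECV from a vector of reference values y, a matrix of cross-validated predictions cvpred (one column per number of components), and a group vector. It assumes a single y column, that cumpress is the sum of press over subsets, and that RMSECV = sqrt(cumpress/n); the toolbox's exact normalization (for example with repeated random subsets) may differ, and all data are hypothetical.

```matlab
% Illustrative sketch: PRESS, cumulative PRESS and RMSECV from
% cross-validated predictions.
y      = [1; 2; 3; 4];          % hypothetical reference values (n x 1)
cvpred = [1 1; 2 3; 2 3; 4 4];  % hypothetical CV predictions (n x ncomp)
groups = [1 2 1 2];             % test group of each sample

nsplits = max(groups);
ncomp   = size(cvpred, 2);
press   = zeros(nsplits, ncomp);
for k = 1:nsplits
  idx = (groups == k);
  % sum of squared prediction errors for the samples left out in split k
  press(k,:) = sum(bsxfun(@minus, cvpred(idx,:), y(idx)).^2, 1);
end

cumpress = sum(press, 1);                % total PRESS for each number of components
rmsecv   = sqrt(cumpress / numel(y));    % root mean square error of cross-validation
```

For these values press is [1 0; 0 1], cumpress is [1 1], and rmsecv is [0.5 0.5].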

Examples

[press,cumpress] = crossval(x,y,'nip',{'loo'},10);

[press,cumpress] = crossval(x,y,'pcr',{'vet',3},10);

[press,cumpress] = crossval(x,y,'nip',{'con',5},10);

[press,cumpress] = crossval(x,y,'sim',{'rnd',3,20},10);

 

pre = {preprocess('autoscale') preprocess('autoscale')};

opts.preprocessing = pre;

opts.plots = 'none';

[press,cumpress] = crossval(x,y,'sim',{'rnd',3,20},10,opts);

 

[press,cumpress] = crossval(x,[],'pca',{'loo'},10);

[press,cumpress] = crossval(x,[],'pca',{'vet',3},10);

[press,cumpress] = crossval(x,[],'pca',{'con',5},10);
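A user-defined cvi vector can be passed in place of the cell syntax. A common reason to define groups by hand is to keep replicate measurements of the same sample in the same test subset. The sketch below builds such a vector; it assumes x has 12 rows holding 4 samples measured in triplicate in consecutive rows, and nrep, splits, and sample_id are hypothetical names.

```matlab
% Illustrative sketch: keep replicate rows of the same sample together.
n      = 12;   % rows of x (4 samples x 3 replicates), hypothetical
nrep   = 3;    % consecutive replicate rows per sample
splits = 2;    % number of test subsets

sample_id = ceil((1:n) / nrep);              % 1 1 1 2 2 2 3 3 3 4 4 4
cvi       = mod(sample_id - 1, splits) + 1;  % venetian blinds over samples
```

The resulting vector, [1 1 1 2 2 2 1 1 1 2 2 2], can then be passed as the cross-validation argument, e.g. [press,cumpress] = crossval(x,y,'sim',cvi,10);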

See Also

pca, pcr, pls, preprocess, ncrossval

