David del Val
Software Engineer · Mathematician
Currently working in the aerospace sector and researching functional data analysis techniques.
Partial least squares (PLS) is a family of dimensionality reduction techniques formulated in the context of regression problems that take advantage of linear dependencies between the predictor and target variables. Specifically, the PLS components are determined by projecting onto directions along which the cross-covariance between projections within the spaces of the predictor and of the target variables is maximized. In doing so, PLS combines the optimization criterion of principal component analysis (PCA), which consists in maximizing the variance along directions within the space of predictor variables, and the maximization of the correlation of these projections with linear combinations of the target variables. The components extracted by PLS can then be utilized to formulate models that are simpler and, in some cases, more accurate than those based on the original observations.
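As a minimal illustration of this criterion (a sketch on synthetic data, not the formulation developed in this work), the first PLS direction for a scalar response can be computed directly: subject to a unit-norm constraint, the direction that maximizes the covariance between the projected predictors and the response is proportional to the empirical cross-covariance. scikit-learn's PLSRegression is used only as a cross-check.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(100)

# Center predictors and response so that inner products are (co)variances.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Subject to ||w|| = 1, the direction maximizing Cov(Xc w, yc) is
# proportional to the empirical cross-covariance Xc^T yc.
w1 = Xc.T @ yc
w1 /= np.linalg.norm(w1)
t1 = Xc @ w1  # scores of the first PLS component

# Cross-check (up to sign) against scikit-learn's NIPALS-based implementation.
pls = PLSRegression(n_components=1, scale=False).fit(X, y)
print(np.allclose(np.abs(w1), np.abs(pls.x_weights_[:, 0])))  # True
```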
In this work, a general formulation of PLS is presented for regression problems with scalar response, assuming that the predictor variables are elements of a Hilbert space. Besides the standard (Euclidean) inner product, this space is endowed with a generalized, conjugate inner product defined under the metric induced by the inverse of the covariance operator of the predictor variables. PLS is an iterative process whose goal is to identify a sequence of subspaces of increasing dimension. These subspaces are the linear span of a set of elements in the Hilbert space that form a basis, which is not necessarily orthogonal. At each iteration, the PLS basis is enlarged by incorporating the element of the Hilbert space for which the covariance with the target variable is maximized, subject to some constraints. Depending on the types of constraints considered, different PLS bases that span the same subspace can be identified. If orthogonality with the previous basis elements is enforced, one obtains the orthogonal PLS basis computed in the NIPALS algorithm. The conjugate basis is obtained by imposing a conjugacy relation defined in terms of the generalized inner product. This basis can be constructed using the conjugate gradients algorithm. It is shown that both the orthogonal and the conjugate bases span a sequence of Krylov subspaces defined in terms of the covariance operator of the predictor variables and the cross-covariance between the predictor and target variables. This allows the identification of a third PLS basis: the Krylov basis, which contains the elements obtained by repeatedly applying the regressor covariance operator onto the cross-covariance. The generality of the formulation makes it possible to apply PLS not only to multivariate and functional data, which naturally reside in Hilbert spaces, but also to more complex mathematical objects, such as graphs or texts, by mapping them (e.g., through kernel embeddings) onto elements of a Hilbert space.
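The Krylov characterization can be checked numerically in the multivariate case. The sketch below (synthetic data; scikit-learn's NIPALS-based PLSRegression provides the orthogonal basis) builds the Krylov basis from the empirical covariance and cross-covariance and verifies that it spans the same subspace as the PLS weight vectors, by checking that all principal angles between the two subspaces are numerically zero.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p, k = 200, 8, 4
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

Xc, yc = X - X.mean(axis=0), y - y.mean()
A, b = Xc.T @ Xc, Xc.T @ yc  # regressor covariance (up to 1/n) and cross-covariance

# Krylov basis: b, A b, A^2 b, ..., A^(k-1) b (normalized for numerical stability).
K = np.empty((p, k))
v = b
for j in range(k):
    v = v / np.linalg.norm(v)
    K[:, j] = v
    v = A @ v

# Orthogonal (NIPALS) basis of PLS weight vectors computed by scikit-learn.
W = PLSRegression(n_components=k, scale=False).fit(X, y).x_weights_

# Both bases span the same k-dimensional subspace: all principal angles
# between the two subspaces are (numerically) zero, i.e. their cosines are 1.
Qk, _ = np.linalg.qr(K)
Qw, _ = np.linalg.qr(W)
cosines = np.linalg.svd(Qk.T @ Qw, compute_uv=False)
print(np.allclose(cosines, 1.0))  # True
```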
Based on the connection with conjugate gradients, it is possible to analyze the convergence of PLS to ordinary least squares (OLS) in multiple linear regression problems with multivariate predictors and scalar response. In particular, it is possible to derive an upper bound on the difference between the PLS and OLS regression coefficients as a function of the number of components considered in PLS. This bound depends only on the distribution of the eigenvalues of the covariance matrix of the predictor variables. When the number of components is equal to the number of distinct eigenvalues of this covariance matrix, the PLS regression coefficient coincides with the one computed using OLS. In practice, if the eigenvalues are grouped in clusters, PLS provides an accurate approximation to the OLS regression coefficient when the number of components considered equals the number of clusters in the spectrum of the regressors’ covariance matrix.
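A limiting case of this result is easy to illustrate numerically: if the predictors are constructed so that their sample covariance matrix is a multiple of the identity (a single distinct eigenvalue), one PLS component already reproduces the OLS fit. The sketch below uses synthetic data and scikit-learn; it is an illustration of the stated result, not one of the experiments reported in this work.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 150, 6
X0 = rng.standard_normal((n, p))

# Build predictors whose sample covariance matrix is a multiple of the
# identity: orthonormal, zero-mean columns give a single distinct eigenvalue.
Q, _ = np.linalg.qr(X0 - X0.mean(axis=0))
y = Q @ rng.standard_normal(p) + 0.05 * rng.standard_normal(n)

ols = LinearRegression().fit(Q, y)
pls1 = PLSRegression(n_components=1, scale=False).fit(Q, y)

# With a single distinct eigenvalue, one PLS component recovers the OLS fit.
print(np.allclose(ols.predict(Q), pls1.predict(Q).ravel()))  # True
```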
Finally, a series of experiments on real-world datasets is carried out to assess the performance of PLS as a dimensionality reduction method, especially in comparison with PCA. Both multivariate and functional datasets are considered. In the problems analyzed, assuming a linear regression model, PLS is more effective than PCA when few components are used, while the differences become smaller as the number of components increases. Additional experiments are carried out in which PCA and PLS are used as a preprocessing step in combination with more general predictors. Specifically, the first components that result from the analysis are used as inputs of non-linear regressors, such as support vector machines, neural networks, and random forests. The results of this empirical evaluation show that PLS can be an effective dimensionality reduction method in real-world problems even when the dependencies between the predictor and the target variables are non-linear.
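A sketch of this kind of two-stage procedure is shown below. It uses scikit-learn's diabetes dataset purely as a stand-in for the datasets of the empirical study, with a random forest fitted on the first PLS components.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Extract a few PLS components and use them as inputs of a non-linear regressor.
pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
rf = RandomForestRegressor(random_state=0).fit(pls.transform(X_tr), y_tr)
print(rf.score(pls.transform(X_te), y_te))  # R^2 on the held-out data
```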
Functional data analysis (FDA) is the branch of statistics that studies quantities that depend on continuous parameters, such as temperature variations at a geographical location or the heart’s electrical activity during the cardiac cycle. These data are, in principle, infinite-dimensional because the observed functions can be evaluated at any point within their domain. In this context, it can be helpful to identify components that capture relevant information. These components can then be utilized to formulate models that are simpler, typically more interpretable and, in some cases, more accurate than those based on the original observations.
In this work, two dimensionality reduction techniques are explored. In principal component analysis (PCA), the components are obtained by maximizing the explained variance of a set of variables. Partial least squares (PLS) seeks to maximize the covariance between the components of two groups or blocks of variables. These dimensionality reduction techniques can be extended to deal with functional data. They provide a way to summarize the information of trajectories in an infinite-dimensional space by projecting onto a finite-dimensional one. Moreover, one can apply these techniques in functional regression methods to address problems such as multicollinearity or excessive roughness of the fitted coefficients. In particular, to address this last issue, roughness penalty terms can be added to the optimization criteria of the dimensionality reduction process to smooth the projection functions.
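As a rough sketch of how such a penalty can be used, the snippet below applies scikit-fda's FPCA with an L2 penalty on the second derivative of the components. The import paths and parameter names (`regularization`, `regularization_parameter`) are those of recent scikit-fda versions and may differ in other releases; the Berkeley growth curves are used only as an example dataset.

```python
from skfda.datasets import fetch_growth
from skfda.misc.operators import LinearDifferentialOperator
from skfda.misc.regularization import L2Regularization
from skfda.preprocessing.dim_reduction import FPCA

X, y = fetch_growth(return_X_y=True)  # Berkeley growth curves (FDataGrid)

# Penalizing the L2 norm of the second derivative smooths the principal
# components by adding a roughness penalty to the PCA optimization criterion.
fpca = FPCA(
    n_components=3,
    regularization=L2Regularization(
        LinearDifferentialOperator(2), regularization_parameter=0.1
    ),
)
scores = fpca.fit_transform(X)  # (n_samples, 3) array of component scores
```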
This project aims to design and implement computational tools for functional PCA and PLS, and to integrate them into the scikit-fda library. Scikit-fda is a Python library for data analysis and machine learning with functional data. It seeks to provide an alternative to popular R packages such as fda or fda.usc in one of the most commonly used languages for data analysis and machine learning. Furthermore, scikit-fda provides an interface compatible with scikit-learn. Thus, the interface of the library should be familiar to many users of the Python scientific ecosystem, and the functionality of scikit-learn can be reused.
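Because the functional transformers follow the scikit-learn estimator API, they can be combined directly with scikit-learn tools. The sketch below (dataset, classifier, and hyperparameter grid chosen purely for illustration) chains FPCA with a scikit-learn classifier in a Pipeline and selects the number of components by cross-validation.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

from skfda.datasets import fetch_growth
from skfda.preprocessing.dim_reduction import FPCA

X, y = fetch_growth(return_X_y=True)  # growth curves and sex labels

# FPCA acts as a scikit-learn transformer, so it can be placed in a Pipeline
# and tuned with the usual model-selection utilities.
pipe = Pipeline([("fpca", FPCA(n_components=3)), ("knn", KNeighborsClassifier())])
grid = GridSearchCV(pipe, {"fpca__n_components": [2, 3, 4]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```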
Finally, three examples on real datasets showcase the effectiveness of these methods. The first two examples demonstrate the regularization capabilities of each of the regression methods considered. The last example focuses on circumstances in which PLS can be considerably superior to PCA.
Dimensionality reduction techniques have become very useful tools for working with high-dimensional datasets. In functional data analysis (FDA), they are even more powerful: they can be used to project data from an infinite-dimensional (functional) space onto a finite-dimensional one, making the data considerably easier to analyze. One of the methods used to accomplish these goals is partial least squares (PLS). However, the rationale behind it is rather different from that of other dimensionality reduction methods, such as principal component analysis (PCA) or canonical correlation analysis (CCA). This work intends to introduce PLS and explore its properties from a new point of view. In doing so, we hope to produce an easy-to-follow explanation for those already familiar with PCA or CCA, but not necessarily with PLS. Moreover, this document consolidates results that are already known but whose analysis or proofs are hard to find in the PLS literature.
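For reference, the optimization criteria of the three methods can be contrasted as follows (standard formulations, with centered variable blocks $X$ and $Y$; $w$ and $c$ denote projection directions):

```latex
\begin{aligned}
\text{PCA:} \quad & \max_{\lVert w \rVert = 1} \operatorname{Var}\!\left(w^{\top} X\right),\\
\text{CCA:} \quad & \max_{w,\, c} \operatorname{Corr}\!\left(w^{\top} X,\; c^{\top} Y\right),\\
\text{PLS:} \quad & \max_{\lVert w \rVert = \lVert c \rVert = 1} \operatorname{Cov}\!\left(w^{\top} X,\; c^{\top} Y\right).
\end{aligned}
```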
One of the fields where PLS shines the most is fitting linear regression models. Therefore, we will cover the steps required to adapt the PLS dimensionality reduction technique to perform linear regression. The scalar response model is of particular interest: in that case, we will prove that PLS is equivalent to a regularization of the ordinary least squares (OLS) estimator. Moreover, the changes that PLS requires to handle functional data will be covered, and numerical results will be presented to showcase the derived properties.
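For the scalar-response case, this regularization view can be summarized through a standard characterization of PLS (stated here in the sample version, with centered data, $\hat{\Sigma}_{XX}$ the empirical covariance of the predictors and $\hat{\sigma}_{Xy}$ the empirical cross-covariance): the $k$-component PLS estimator solves the least squares problem restricted to a Krylov subspace,

```latex
\hat{\beta}^{(k)}_{\mathrm{PLS}}
  = \operatorname*{arg\,min}_{\beta \,\in\, \mathcal{K}_k(\hat{\Sigma}_{XX},\, \hat{\sigma}_{Xy})}
    \sum_{i=1}^{n} \left( y_i - \langle x_i, \beta \rangle \right)^2,
\qquad
\mathcal{K}_k(A, b) = \operatorname{span}\{\, b,\, A b,\, \dots,\, A^{k-1} b \,\}.
```

Restricting the number of components therefore acts as a form of regularization of OLS, and the two estimators coincide when $k$ equals the number of distinct eigenvalues of $\hat{\Sigma}_{XX}$.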