David del Val
Software Engineer · Mathematician
Currently working in the aerospace sector and researching functional data analysis techniques.
Partial least squares (PLS) is a family of dimensionality reduction techniques formulated in the context of regression problems that take advantage of linear dependencies between the predictor and target variables. Specifically, the PLS components are determined by projecting onto directions along which the cross-covariance between projections within the spaces of the predictor and of the target variables is maximized. In doing so, PLS combines the optimization criterion of principal component analysis (PCA), which consists in maximizing the variance along directions within the space of predictor variables, and the maximization of the correlation of these projections with linear combinations of the target variables. The components extracted by PLS can then be utilized to formulate models that are simpler and, in some cases, more accurate than those based on the original observations.
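As a minimal illustration of this criterion (a sketch on synthetic data, not the formulation developed in this work), the first PLS direction for a scalar response can be computed directly: subject to a unit-norm constraint, the direction that maximizes the covariance between the projected predictors and the response is proportional to the empirical cross-covariance. scikit-learn's PLSRegression is used only as a cross-check.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(100)

# Center predictors and response so that inner products are (co)variances.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Subject to ||w|| = 1, the direction maximizing Cov(Xc w, yc) is
# proportional to the empirical cross-covariance Xc^T yc.
w1 = Xc.T @ yc
w1 /= np.linalg.norm(w1)
t1 = Xc @ w1  # scores of the first PLS component

# Cross-check (up to sign) against scikit-learn's NIPALS-based implementation.
pls = PLSRegression(n_components=1, scale=False).fit(X, y)
print(np.allclose(np.abs(w1), np.abs(pls.x_weights_[:, 0])))  # True
```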
In this work, a general formulation of PLS is presented for regression problems with scalar response, assuming that the predictor variables are elements of a Hilbert space. Besides the standard (Euclidean) inner product, this space is endowed with a generalized, conjugate inner product defined under the metric induced by the inverse of the covariance operator of the predictor variables. PLS is an iterative process whose goal is to identify a sequence of subspaces of increasing dimension. These subspaces are the linear span of a set of elements in the Hilbert space that form a basis, which is not necessarily orthogonal. At each iteration, the PLS basis is enlarged by incorporating the element of the Hilbert space for which the covariance with the target variable is maximized, subject to some constraints. Depending on the types of constraints considered, different PLS bases that span the same subspace can be identified. If orthogonality with the previous basis elements is enforced, one obtains the orthogonal PLS basis computed in the NIPALS algorithm. The conjugate basis is obtained by imposing a conjugacy relation defined in terms of the generalized inner product. This basis can be constructed using the conjugate gradients algorithm. It is shown that both the orthogonal and the conjugate bases span a sequence of Krylov subspaces defined in terms of the covariance operator of the predictor variables and the cross-covariance between the predictor and target variables. This allows the identification of a third PLS basis: the Krylov basis, which contains the elements obtained by repeatedly applying the regressor covariance operator onto the cross-covariance. The generality of the formulation makes it possible to apply PLS not only to multivariate and functional data, which naturally reside in Hilbert spaces, but also to more complex mathematical objects, such as graphs or texts, by mapping them (e.g., through kernel embeddings) onto elements of a Hilbert space.
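The Krylov characterization can be checked numerically in the multivariate case. The sketch below (synthetic data; scikit-learn's NIPALS-based PLSRegression provides the orthogonal basis) builds the Krylov basis from the empirical covariance and cross-covariance and verifies that it spans the same subspace as the PLS weight vectors, by checking that all principal angles between the two subspaces are numerically zero.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p, k = 200, 8, 4
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

Xc, yc = X - X.mean(axis=0), y - y.mean()
A, b = Xc.T @ Xc, Xc.T @ yc  # regressor covariance (up to 1/n) and cross-covariance

# Krylov basis: b, A b, A^2 b, ..., A^(k-1) b (normalized for numerical stability).
K = np.empty((p, k))
v = b
for j in range(k):
    v = v / np.linalg.norm(v)
    K[:, j] = v
    v = A @ v

# Orthogonal (NIPALS) basis of PLS weight vectors computed by scikit-learn.
W = PLSRegression(n_components=k, scale=False).fit(X, y).x_weights_

# Both bases span the same k-dimensional subspace: all principal angles
# between the two subspaces are (numerically) zero, i.e. their cosines are 1.
Qk, _ = np.linalg.qr(K)
Qw, _ = np.linalg.qr(W)
cosines = np.linalg.svd(Qk.T @ Qw, compute_uv=False)
print(np.allclose(cosines, 1.0))  # True
```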
Based on the connection with conjugate gradients, it is possible to analyze the convergence of PLS to ordinary least squares (OLS) in multiple linear regression problems with multivariate predictors and scalar response. In particular, it is possible to derive an upper bound on the difference between the PLS and OLS regression coefficients as a function of the number of components considered in PLS. This bound depends only on the distribution of the eigenvalues of the covariance matrix of the predictor variables. When the number of components is equal to the number of distinct eigenvalues of this covariance matrix, the PLS regression coefficient coincides with the one computed using OLS. In practice, if the eigenvalues are grouped in clusters, PLS provides an accurate approximation to the OLS regression coefficient when the number of components considered equals the number of clusters in the spectrum of the regressors’ covariance matrix.
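A limiting case of this result is easy to illustrate numerically: if the predictors are constructed so that their sample covariance matrix is a multiple of the identity (a single distinct eigenvalue), one PLS component already reproduces the OLS fit. The sketch below uses synthetic data and scikit-learn; it is an illustration of the stated result, not one of the experiments reported in this work.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 150, 6
X0 = rng.standard_normal((n, p))

# Build predictors whose sample covariance matrix is a multiple of the
# identity: orthonormal, zero-mean columns give a single distinct eigenvalue.
Q, _ = np.linalg.qr(X0 - X0.mean(axis=0))
y = Q @ rng.standard_normal(p) + 0.05 * rng.standard_normal(n)

ols = LinearRegression().fit(Q, y)
pls1 = PLSRegression(n_components=1, scale=False).fit(Q, y)

# With a single distinct eigenvalue, one PLS component recovers the OLS fit.
print(np.allclose(ols.predict(Q), pls1.predict(Q).ravel()))  # True
```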
Finally, a series of experiments on real-world datasets is carried out to assess the performance of PLS as a dimensionality reduction method, especially in comparison with PCA. Both multivariate and functional datasets are considered. In the problems analyzed, assuming a linear regression model, PLS is more effective than PCA when few components are used, while the differences become smaller as the number of components increases. Additional experiments are carried out in which PCA and PLS are used as a preprocessing step in combination with more general predictors. Specifically, the first components that result from the analysis are used as inputs of non-linear regressors, such as support vector machines, neural networks, and random forests. The results of this empirical evaluation show that PLS can be an effective dimensionality reduction method in real-world problems even when the dependencies between the predictor and the target variables are non-linear.
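A sketch of this kind of two-stage procedure is shown below. It uses scikit-learn's diabetes dataset purely as a stand-in for the datasets of the empirical study, with a random forest fitted on the first PLS components.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Extract a few PLS components and use them as inputs of a non-linear regressor.
pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
rf = RandomForestRegressor(random_state=0).fit(pls.transform(X_tr), y_tr)
print(rf.score(pls.transform(X_te), y_te))  # R^2 on the held-out data
```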
Functional data analysis (FDA) is the branch of statistics that studies quantities that depend on continuous parameters, such as temperature variations at a geographical location or the heart’s electrical activity during the cardiac cycle. These data are, in principle, infinite-dimensional because the observed functions can be evaluated at any point within their domain. In this context, it can be helpful to identify components that capture relevant information. These components can then be utilized to formulate models that are simpler, typically more interpretable and, in some cases, more accurate than those based on the original observations.
In this work, two dimensionality reduction techniques are explored. In principal component analysis (PCA), the components are obtained by maximizing the explained variance of a set of variables. Partial least squares (PLS) seeks to maximize the covariance between the components of two groups or blocks of variables. These dimensionality reduction techniques can be extended to deal with functional data. They provide a way to summarize the information of trajectories in an infinite-dimensional space by projecting onto a finite-dimensional one. Moreover, one can apply these techniques in functional regression methods to address problems such as multicollinearity or excessive roughness of the fitted coefficients. In particular, to address this last issue, roughness penalty terms can be added to the optimization criteria of the dimensionality reduction process to smooth the projection functions.
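As a rough sketch of how such a penalty can be used, the snippet below applies scikit-fda's FPCA with an L2 penalty on the second derivative of the components. The import paths and parameter names (`regularization`, `regularization_parameter`) are those of recent scikit-fda versions and may differ in other releases; the Berkeley growth curves are used only as an example dataset.

```python
from skfda.datasets import fetch_growth
from skfda.misc.operators import LinearDifferentialOperator
from skfda.misc.regularization import L2Regularization
from skfda.preprocessing.dim_reduction import FPCA

X, y = fetch_growth(return_X_y=True)  # Berkeley growth curves (FDataGrid)

# Penalizing the L2 norm of the second derivative smooths the principal
# components by adding a roughness penalty to the PCA optimization criterion.
fpca = FPCA(
    n_components=3,
    regularization=L2Regularization(
        LinearDifferentialOperator(2), regularization_parameter=0.1
    ),
)
scores = fpca.fit_transform(X)  # (n_samples, 3) array of component scores
```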
This project aims to design and implement computational tools for functional PCA and PLS, and to integrate them into the scikit-fda library. Scikit-fda is a Python library for data analysis and machine learning with functional data. It seeks to provide an alternative to popular R packages such as fda or fda.usc in one of the most commonly used languages for data analysis and machine learning. Furthermore, scikit-fda provides an interface compatible with scikit-learn. Thus, the interface of the library should be familiar to many users of the Python scientific ecosystem, and the functionality of scikit-learn can be reused.
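Because the functional transformers follow the scikit-learn estimator API, they can be combined directly with scikit-learn tools. The sketch below (dataset, classifier, and hyperparameter grid chosen purely for illustration) chains FPCA with a scikit-learn classifier in a Pipeline and selects the number of components by cross-validation.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

from skfda.datasets import fetch_growth
from skfda.preprocessing.dim_reduction import FPCA

X, y = fetch_growth(return_X_y=True)  # growth curves and sex labels

# FPCA acts as a scikit-learn transformer, so it can be placed in a Pipeline
# and tuned with the usual model-selection utilities.
pipe = Pipeline([("fpca", FPCA(n_components=3)), ("knn", KNeighborsClassifier())])
grid = GridSearchCV(pipe, {"fpca__n_components": [2, 3, 4]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```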
Finally, three examples on real datasets showcase the effectiveness of these methods. The first two examples demonstrate the regularization capabilities of each of the regression methods considered. The last example focuses on circumstances in which PLS can be considerably superior to PCA.
Dimensionality reduction techniques have become very useful tools for working with high-dimensional datasets. In functional data analysis (FDA), they are even more powerful: they can be used to project data from an infinite-dimensional (functional) space onto a finite-dimensional one, making the data considerably easier to analyze. One of the methods used to accomplish these goals is partial least squares (PLS). However, the rationale behind it is rather different from that of other dimensionality reduction methods, such as principal component analysis (PCA) or canonical correlation analysis (CCA). This work intends to introduce PLS and explore its properties from a new point of view. In doing so, we hope to produce an easy-to-follow explanation for those already familiar with PCA or CCA, but not necessarily with PLS. Moreover, this document consolidates results that are already known but whose analysis or proofs are hard to find in the PLS literature.
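For reference, the optimization criteria of the three methods can be contrasted as follows (standard formulations, with centered variable blocks $X$ and $Y$; $w$ and $c$ denote projection directions):

```latex
\begin{aligned}
\text{PCA:} \quad & \max_{\lVert w \rVert = 1} \operatorname{Var}\!\left(w^{\top} X\right),\\
\text{CCA:} \quad & \max_{w,\, c} \operatorname{Corr}\!\left(w^{\top} X,\; c^{\top} Y\right),\\
\text{PLS:} \quad & \max_{\lVert w \rVert = \lVert c \rVert = 1} \operatorname{Cov}\!\left(w^{\top} X,\; c^{\top} Y\right).
\end{aligned}
```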
One of the fields where PLS shines the most is fitting linear regression models. Therefore, we will cover the steps required to adapt the PLS dimensionality reduction technique to perform linear regression. The scalar response model is of particular interest: in that case, we will prove that PLS is equivalent to a regularization of the ordinary least squares (OLS) estimator. Moreover, the changes that PLS requires to handle functional data will be covered, and numerical results will be presented to showcase the derived properties.
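For the scalar-response case, this regularization view can be summarized through a standard characterization of PLS (stated here in the sample version, with centered data, $\hat{\Sigma}_{XX}$ the empirical covariance of the predictors and $\hat{\sigma}_{Xy}$ the empirical cross-covariance): the $k$-component PLS estimator solves the least squares problem restricted to a Krylov subspace,

```latex
\hat{\beta}^{(k)}_{\mathrm{PLS}}
  = \operatorname*{arg\,min}_{\beta \,\in\, \mathcal{K}_k(\hat{\Sigma}_{XX},\, \hat{\sigma}_{Xy})}
    \sum_{i=1}^{n} \left( y_i - \langle x_i, \beta \rangle \right)^2,
\qquad
\mathcal{K}_k(A, b) = \operatorname{span}\{\, b,\, A b,\, \dots,\, A^{k-1} b \,\}.
```

Restricting the number of components therefore acts as a form of regularization of OLS, and the two estimators coincide when $k$ equals the number of distinct eigenvalues of $\hat{\Sigma}_{XX}$.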