College of Science & Engineering
Twin Cities
Recent years have seen an increase in data-driven approaches to scientific discovery and technological innovation. Datasets of enormous size and variety are routinely used to explore new phenomena and train algorithms. As data volume and complexity grow, however, data quality often decreases: data can be plagued by noise, outliers, missing values, and other forms of information loss. These researchers are investigating two complementary methodologies for processing corrupted data. The first class of methods assumes that the data has an underlying low-rank structure that is perturbed by both additive noise and linear filters. New results from random matrix theory will be developed for signal recovery problems in this setting, including the estimation of covariance and distance matrices. The second class of methods will exploit geometric structures in data. Novel methods from computational harmonic analysis will be devised both to learn the geometry of a dataset by uncovering relational information between data points and to use this geometric information for clustering, regression, and other tasks.