College of Liberal Arts
Twin Cities
At the heart of data science lies uncertainty quantification, particularly for complex modeling procedures, such as deep neural networks, that involve tuning. This project concerns a novel data perturbation simulator (DPS) that generates synthetic data replicating a raw sample for statistical inference, including both numerical data and unstructured data such as text. DPS also permits estimation of a sample's distribution or density, and hence of the sampling distribution of any statistic and its distributional characteristics. On this basis, the researchers are developing a Monte Carlo data perturbation inference framework (MCDP) with a statistical guarantee of its validity. In pivotal inference, MCDP yields valid conclusions without a reference sample for estimating the data-generating mechanism, as if one had performed simulations. In non-pivotal inference, MCDP uses an independent reference sample to separate distribution estimation from inference, yielding credible conclusions even with limited data. Finally, the group will focus on post-inference, generative inference, and natural language inference to demonstrate MCDP's potential as an inference tool for complex problems.
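The abstract does not specify how DPS generates its synthetic samples. The sketch below therefore illustrates only the general Monte Carlo idea it describes: draw many perturbed replicates of a raw sample, recompute a statistic on each, and read off the approximate sampling distribution. As a stand-in for DPS, it uses a smoothed bootstrap (resampling plus Gaussian noise). This is not the project's method, and the function names `perturb_sample`, `mc_sampling_distribution`, and `percentile_interval` and the `bandwidth` parameter are hypothetical.

```python
import random
import statistics


def perturb_sample(data, bandwidth, rng):
    """Draw one synthetic replicate of the same size as `data`.

    Hypothetical stand-in for a data perturbation simulator: a smoothed
    bootstrap that resamples the raw data and adds Gaussian noise.
    """
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in data]


def mc_sampling_distribution(data, statistic, n_sims=2000, bandwidth=0.1, seed=0):
    """Monte Carlo approximation of the sampling distribution of `statistic`."""
    rng = random.Random(seed)
    return [statistic(perturb_sample(data, bandwidth, rng)) for _ in range(n_sims)]


def percentile_interval(draws, level=0.95):
    """Equal-tailed interval taken from the sorted Monte Carlo draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    lo_idx = int(((1 - level) / 2) * last)
    hi_idx = int((1 - (1 - level) / 2) * last)
    return ordered[lo_idx], ordered[hi_idx]


if __name__ == "__main__":
    rng = random.Random(42)
    raw = [rng.gauss(5.0, 2.0) for _ in range(50)]  # toy data, not from the project
    draws = mc_sampling_distribution(raw, statistics.median)
    print("MC mean of the median:", statistics.fmean(draws))
    print("approx. 95% interval:", percentile_interval(draws))
```

Because the statistic is passed in as a function, the same loop applies to any statistic, which mirrors the abstract's point that an estimated data distribution yields the sampling distribution of any statistic. The validity guarantees claimed for MCDP are not reproduced by this toy sketch.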