Professor George Karypis

CSENG Computer Science & Eng
College of Science & Engineering
Twin Cities

Project Title:

High-Performance and Big Data Research

This group's research spans the fields of high-performance computing, graph learning, natural language processing, machine learning methods for computational chemistry and materials science, and recommender systems.

Research in the area of high-performance computing focuses on the design and implementation of scalable algorithms to tackle the bottleneck of training machine learning models on the large-scale datasets that arise in real-world applications. The group addresses this challenge in two ways: partitioning and distributing large datasets across multiple computation clusters for distributed training, and reducing the memory cost of the model while maintaining performance. The group develops tools for distributed graph partitioning and memory-efficient graph neural networks. These tools are used extensively for training graph neural networks on real-world graphs with billions of nodes and edges.
Research in graph learning falls into four categories: unsupervised graph representation learning; knowledge graph-based question answering; graph neural networks (GNNs) on heterogeneous graphs; and distillation approaches for GNNs. Unsupervised graph representation learning aims to effectively encode the topological structure of graphs, as well as node and edge features, into node embeddings and graph embeddings for downstream graph-related applications. Knowledge graph-based question answering aims to extract information from knowledge graphs to answer questions posed in natural language. GNNs on heterogeneous graphs aim to capture information that is multiple hops away in a heterogeneous graph while avoiding the over-smoothing problem. Distillation for GNNs allows knowledge to be transferred from graphs to graph-free models, which are much more lightweight during inference.
Natural language processing research focuses on the study of large language models (LLMs) and their applications in question answering, text segmentation, information retrieval, and citation analysis. Since LLMs operate on unstructured text, the researchers explore techniques for injecting structured knowledge, such as semantics from knowledge graphs, into LLMs.
In computational chemistry and materials science, the researchers aim to tackle two essential challenges: neural representation learning and label-efficient neural network training. They use GNNs to encode geometric structure and interatomic interactions into neural molecule and material representations for downstream quantum predictions. To reduce the cost of obtaining labels for training neural networks, they propose weakly supervised learning methods that leverage noisy labels to improve the generalizability of neural networks. The group's methods achieve state-of-the-art performance and can help accelerate molecule screening and drug discovery.
In recommender systems, the researchers focus on designing and developing methods to improve the quality of recommendations served to users of the system. After thoroughly exploring large-scale datasets, they have identified certain fundamental characteristics that affect the performance of existing recommendation schemes. This has led to the development of new Top-N recommendation methods that outperform the state of the art while also being efficient and readily applicable in large-scale settings. In the era of deep learning, the researchers push Top-N recommendation performance even further by leveraging cutting-edge deep learning methods for better user modeling and item representation learning.

Project Investigators

Professor George Karypis

Petros Karypis

Konstantinos Mavromatis

Miguel Romero Calvo

Zeren Shui

Ancy Tom
