Publications
star(*) denotes equal contribution
2022

Sign and Basis Invariant Networks for Spectral Graph Representation Learning Derek Lim*, Joshua Robinson*, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, and Stefanie Jegelka Under review 2022 [Abs] [PDF]
Many machine learning tasks involve processing eigenvectors derived from data. Especially valuable are Laplacian eigenvectors, which capture useful structural information about graphs and other geometric objects. However, ambiguities arise when computing eigenvectors: for each eigenvector v, the sign flipped v is also an eigenvector. More generally, higher dimensional eigenspaces contain infinitely many choices of basis eigenvectors. These ambiguities make it a challenge to process eigenvectors and eigenspaces in a consistent way. In this work we introduce SignNet and BasisNet – new neural architectures that are invariant to all requisite symmetries and hence process collections of eigenspaces in a principled manner. Our networks are universal, i.e., they can approximate any continuous function of eigenvectors with the proper invariances. They are also theoretically strong for graph representation learning – they can approximate any spectral graph convolution, can compute spectral invariants that go beyond message passing neural networks, and can provably simulate previously proposed graph positional encodings. Experiments show the strength of our networks for learning spectral graph filters and learning graph positional encodings.

From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness Lingxiao Zhao, Wei Jin, Leman Akoglu, and Neil Shah ICLR 2022 [Abs] [PDF] [Slides] [Code]
Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node’s representation is computed recursively by aggregating representations (“messages”) from its immediate neighbors akin to a starshaped pattern. MPNNs are appealing for being efficient and scalable, however their expressiveness is upperbounded by the 1storder WeisfeilerLehman isomorphism test (1WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g., kegonets): in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than encoding of immediate neighbors only (i.e. a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to uplift any GNN. We call our proposed method GNNAK (GNN As Kernel), as the framework resembles a convolutional neural network by replacing the kernel with GNNs. Theoretically, we show that our framework is strictly more powerful than 1&2WL, and is not less powerful than 3WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new stateoftheart performance by large margins for several wellknown graph ML tasks; specifically, 0.08 MAE on ZINC, 74.79% and 86.887% accuracy on CIFAR10 and PATTERN respectively.

Graph Condensation for Graph Neural Networks Wei Jin, Lingxiao Zhao, Shichang Zhang, Yozen Liu, Jiliang Tang, and Neil Shah ICLR 2022 [Abs] [PDF] [Code]
Given the prevalence of largescale graphs in realworld applications, the storage and time for training neural models have raised increasing concerns. To alleviate the concerns, we propose and study the problem of graph condensation for graph neural networks (GNNs). Specifically, we aim to condense the large, original graph into a small, synthetic and highlyinformative graph, such that GNNs trained on the small graph and large graph have comparable performance. We approach the condensation problem by imitating the GNN training trajectory on the original graph through the optimization of a gradient matching loss and design a strategy to condense node futures and structural information simultaneously. Extensive experiments have demonstrated the effectiveness of the proposed framework in condensing different graph datasets into informative smaller graphs. In particular, we are able to approximate the original test accuracy by 95.3% on Reddit, 99.8% on Flickr and 99.0% on Citeseer, while reducing their graph size by more than 99.9%, and the condensed graphs can be used to train various GNN architectures.
2021

Fast Attributed Graph Embedding via Density of States Saurabh Sawlani, Lingxiao Zhao, and Leman Akoglu IEEE ICDM 2021 2021 [Abs] [PDF] [Code]
Given a nodeattributed graph, how can we efficiently represent it with few numerical features that expressively reflect its topology and attribute information? We propose ADOGE, for Attributed DOSbased Graph Embedding, based on density of states (DOS, a.k.a. spectral density) to tackle this problem. ADOGE is designed to fulfill a long desiderata of desirable characteristics. Most notably, it capitalizes on efficient approximation algorithms for DOS, that we extend to blend in node labels and attributes for the first time, making it fast and scalable for large attributed graphs and graph databases. Being based on the entire eigenspectrum of a graph, ADOGE can capture structural and attribute properties at multiple (“glocal”) scales. Moreover, it is unsupervised (i.e. agnostic to any specific objective) and lends itself to various interpretations, which makes it is suitable for exploratory graph mining tasks. Finally, it pro cesses each graph independent of others, making it amenable for streaming settings as well as parallelization. Through extensive experiments, we show the efficacy and efficiency of ADOGE on exploratory graph analysis and graph classification tasks, where it significantly outperforms unsupervised baselines and achieves competitive performance with modern supervised GNNs, while achieving the best tradeoff between accuracy and runtime.

Connecting Graph Convolutional Network and GraphRegularized PCA (Extended) Lingxiao Zhao, and Leman Akoglu Under review 2021 [Abs] [PDF] [Code]
Graph convolution operator of the GCN model is originally motivated from a localized ﬁrstorder approximation of spectral graph convolutions. This work stands on a different view; establishing a mathematical connection between graph convolution and graphregularized PCA (GPCA). Based on this connection, GCN architecture, shaped by stacking graph convolution layers, shares a close relationship with stacking GPCA. We empirically demonstrate that the unsupervised embeddings by GPCA paired with a 1 or 2layer MLP achieves similar or even better performance than GCN on semisupervised node classiﬁcation tasks across ﬁve datasets including Open Graph Benchmark 1. This suggests that the prowess of GCN is driven by graph based regularization. In addition, we extend GPCA to the (semi)supervised setting and show that it is equivalent to GPCA on a graph extended with “ghost” edges between nodes of the same label. Finally, we capitalize on the discovered relationship to design an effective initialization strategy based on stacking GPCA, enabling GCN to converge faster and achieve robust performance at large number of layers. Notably, the proposed initialization is generalpurpose and applies to other GNNs.

On Using Classification Datasets to Evaluate Graph Outlier Detection: Peculiar Observations and New Insights Lingxiao Zhao, and Leman Akoglu Big Data Journal, Special Issue on Evaluation and Experimental Design in Data Mining and Machine Learning, Aug. 2021 2021 [Abs] [PDF] [Code]
It is common practice of the outlier mining community to repurpose classification datasets toward evaluating various detection models. To that end, often a binary classification dataset is used, where samples from one of the classes is designated as the ‘inlier’ samples, and the other class is substantially downsampled to create the (groundtruth) ‘outlier’ samples. Graphlevel outlier detection (GLOD) is rarely studied but has many potentially influential realworld applications. In this study, we identify an intriguing issue with repurposing graph classification datasets for GLOD. We find that ROCAUC performance of the models changes significantly (“flips” from high to very low, even worse than random) depending on which class is downsampled. Interestingly, ROCAUCs on these two variants approximately sum to 1 and their performance gap is amplified with increasing propagations for a certain family of propagation based outlier detection models. We carefully study the graph embedding space produced by propagation based models and find two driving factors: (1) disparity between withinclass densities which is amplified by propagation, and (2) overlapping support (mixing of embeddings) across classes. We also study other graph embedding methods and downstream outlier detectors, and find that the intriguing “performance flip” issue still widely exists but which version of the downsample achieves higher performance may vary. Thoughtful analysis over comprehensive results further deeper our understanding of the established issue.
2020

PairNorm: Tackling Oversmoothing in GNNs Lingxiao Zhao, and Leman Akoglu ICLR 2020, Addis Ababa, Ethiopia 2020 [Abs] [PDF] [Slides] [Code] [Video]
The performance of graph neural nets (GNNs) is known to gradually decrease with increasing number of layers. This decay is partly attributed to oversmoothing, where repeated graph convolutions eventually make node embeddings indistinguishable. We take a closer look at two different interpretations, aiming to quantify oversmoothing. Our main contribution is PAIRNORM, a novel normalization layer that is based on a careful analysis of the graph convolution operator, which prevents all node embeddings from becoming too similar. What is more, PAIRNORM is fast, easy to implement without any change to network architecture nor any additional parameters, and is broadly applicable to any GNN. Experiments on realworld graphs demonstrate that PAIRNORM makes deeper GCN, GAT, and SGC models more robust against oversmoothing, and significantly boosts performance for a new problem setting that benefits from deeper GNNs.

Connecting Graph Convolutional Network and GraphRegularized PCA Lingxiao Zhao, and Leman Akoglu ICML 2020 GRLP Workshop 2020 [Abs] [PDF]
Graph convolution operator of the GCN model is originally motivated from a localized firstorder approximation of spectral graph convolutions.This work stands on a different view; establishing a connection between graph convolution and graphregularized PCA. Based on this connection, GCN architecture, shaped by stacking graph convolution layers, shares a close relationship with stacking graphregularized PCA (GPCA). We empirically demonstrate that the unsupervised embeddings by GPCA paired with a logistic regression classifier achieves similar performance to GCN on semisupervised node classification tasks. Further, we capitalize on the discovered relationship to design an effective initialization strategy for GCN based on stacking GPCA.

Generalizing Graph Neural Networks Beyond Homophily Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra arXiv preprint arXiv:2006.11468 2020 [Abs] [PDF]
We investigate the representation power of graph neural networks in the semisupervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features. Most existing GNNs fail to generalize to this setting, and are even outperformed by models that ignore the graph structure (e.g., multilayer perceptrons). Motivated by this limitation, we identify a set of key designs – ego and neighborembedding separation, higherorder neighborhoods, and combination of intermediate representations – that boost learning from the graph structure under heterophily, and combine them into a new graph convolutional neural network, H2GCN. Going beyond the traditional benchmarks with strong homophily, our empirical analysis on synthetic and real networks shows that, thanks to the identified designs, H2GCN has consistently strong performance across the full spectrum of lowtohigh homophily, unlike competitive prior models without them.

Graph Unrolling Networks: Interpretable Neural Networks for Graph Signal Denoising Siheng Chen, Yonina C Eldar, and Lingxiao Zhao arXiv preprint arXiv:2006.01301 2020 [Abs] [PDF]
We propose an interpretable graph neural network framework to denoise single or multiple noisy graph signals. The proposed graph unrolling networks expand algorithm unrolling to the graph domain and provide an interpretation of the architecture design from a signal processing perspective. We unroll an iterative denoising algorithm by mapping each iteration into a single network layer where the feedforward process is equivalent to iteratively denoising graph signals. We train the graph unrolling networks through unsupervised learning, where the input noisy graph signals are used to supervise the networks. By leveraging the learning ability of neural networks, we adaptively capture appropriate priors from input noisy graph signals, instead of manually choosing signal priors. A core component of graph unrolling networks is the edgeweightsharing graph convolution operation, which parameterizes each edge weight by a trainable kernel function where the trainable parameters are shared by all the edges. The proposed convolution is permutationequivariant and can flexibly adjust the edge weights to various graph signals. We then consider two special cases of this class of networks, graph unrolling sparse coding (GUSC) and graph unrolling trend filtering (GUTF), by unrolling sparse coding and trend filtering, respectively. To validate the proposed methods, we conduct extensive experiments on both realworld datasets and simulated datasets, and demonstrate that our methods have smaller denoising errors than conventional denoising algorithms and stateoftheart graph neural networks. For denoising a single smooth graph signal, the normalized mean square error of the proposed networks is around 40% and 60% lower than that of graph Laplacian denoising and graph wavelets, respectively.
2018

A Quest for Structure: Jointly Learning the Graph Structure and SemiSupervised Classification Xuan Wu*, Lingxiao Zhao*, and Leman Akoglu In Proceedings of the 27th ACM International Conference on Information and Knowledge Management 2018 [Abs] [PDF] [Slides] [Code]
How should one construct a graph from the input pointcloud data for graphbased SSL? In this work we introduce a new, parallel graph learning framework (called PGlearn) for the graph construction step of SSL. Our solution has two main ingredients: (1) a gradientbased optimization of the edge weights (more specifically, different kernel bandwidths in each dimension) based on a validation loss function, and (2) a parallel hyperparameter search algorithm with an adaptive resource allocation scheme. In essence, (1) allows us to search around a (random) initial hyperparameter configuration for a better one with lower validation loss. Since the search space of hyperparameters is huge for highdimensional problems, (2) empowers our gradientbased search to go through as many different initial configurations as possible, where runs for relatively unpromising starting configurations are terminated early to allocate the time for others. As such, PGlearn is a carefullydesigned hybrid of random and adaptive search. Through experiments on multiclass classification problems, we show that PGlearn significantly outperforms a variety of existing graph construction schemes in accuracy (per fixed time budget for hyperparameter tuning), and scales more effectively to high dimensional problems.