Publications
Star (*) denotes equal contribution.
2024

Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation. Lingxiao Zhao, Xueying Ding, and Leman Akoglu. arXiv preprint arXiv:2402.03687, 2024. [Abs] [PDF]
Graph generation has been dominated by autoregressive models due to their simplicity and effectiveness, despite their sensitivity to ordering. Yet diffusion models have garnered increasing attention, as they offer comparable performance while being permutation-invariant. Current graph diffusion models generate graphs in a one-shot fashion, but they require extra features and thousands of denoising steps to achieve optimal performance. We introduce PARD, a Permutation-invariant Auto-Regressive Diffusion model that integrates diffusion models with autoregressive methods. PARD harnesses the effectiveness and efficiency of the autoregressive model while maintaining permutation invariance without ordering sensitivity. Specifically, we show that, contrary to sets, elements in a graph are not entirely unordered: there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network. To ensure efficiency while remaining expressive, we further propose a higher-order graph transformer, which integrates the transformer with PPGN. Like GPT, we extend the higher-order graph transformer to support parallel training of all blocks. Without any extra features, PARD achieves state-of-the-art performance on molecular and non-molecular datasets, and scales to large datasets like MOSES, which contains 1.9M molecules.

Improving and Unifying Discrete & Continuous-time Discrete Denoising Diffusion. Lingxiao Zhao, Xueying Ding, Lijun Yu, and Leman Akoglu. arXiv preprint arXiv:2402.03701, 2024. [Abs] [PDF]
Discrete diffusion models have seen a surge of attention with applications to naturally discrete data such as language and graphs. Although discrete-time discrete diffusion has been established for a while, only recently did Campbell et al. (2022) introduce the first framework for continuous-time discrete diffusion. However, their training and sampling processes differ significantly from the discrete-time version, necessitating non-trivial approximations for tractability. In this paper, we first present a series of mathematical simplifications of the variational lower bound that enable more accurate and easy-to-optimize training for discrete diffusion. In addition, we derive a simple formulation for backward denoising that enables exact and accelerated sampling and, importantly, an elegant unification of discrete-time and continuous-time discrete diffusion. Thanks to simpler analytical formulations, both forward and now also backward probabilities can flexibly accommodate any noise distribution, including different noise distributions for multi-element objects. Experiments show that our proposed USD3 (for Unified Simplified Discrete Denoising Diffusion) outperforms all SOTA baselines on established datasets. We open-source our unified code at https://github.com/LingxiaoShawn/USD3.

Descriptive Kernel Convolution Network with Improved Random Walk Kernel. Meng-Chieh Lee*, Lingxiao Zhao*, and Leman Akoglu. 2024. [Abs] [PDF]
Graph kernels used to be the dominant approach to feature engineering for structured data, but have been superseded by modern GNNs, as the former lack learnability. Recently, a suite of Kernel Convolution Networks (KCNs) successfully revitalized graph kernels by introducing learnability, convolving the input with learnable hidden graphs using a certain graph kernel. The random walk kernel (RWK) has been used as the default kernel in many KCNs, gaining increasing attention. In this paper, we first revisit the RWK and its current usage in KCNs, revealing several shortcomings of the existing designs, and propose an improved graph kernel, RWK+, by introducing color-matching random walks and deriving its efficient computation. We then propose RWK+CN, a KCN that uses RWK+ as the core kernel to learn descriptive graph features with an unsupervised objective, which cannot be achieved by GNNs. Further, by unrolling RWK+, we discover its connection with a regular GCN layer and propose a novel GNN layer, RWK+Conv. In the first part of the experiments, we demonstrate the descriptive learning ability of RWK+CN with the improved random walk kernel RWK+ on unsupervised pattern mining tasks; in the second part, we show the effectiveness of RWK+ for a variety of KCN architectures and supervised graph learning tasks, and demonstrate the expressiveness of the RWK+Conv layer, especially on graph-level tasks. RWK+ and RWK+Conv adapt to various real-world applications, including web applications such as bot detection in a web-scale Twitter social network and community classification in Reddit social interaction networks.
2023

ADAMM: Anomaly Detection of Attributed Multi-graphs with Metadata: A Unified Neural Network Approach. Konstantinos Sotiropoulos*, Lingxiao Zhao*, Pierre Jinghong Liang, and Leman Akoglu. In IEEE International Conference on Big Data (BigData), 2023. [Abs] [PDF]
Given a complex graph database of node- and edge-attributed multi-graphs, as well as associated metadata for each graph, how can we spot the anomalous instances? Many real-world problems can be cast as graph inference tasks where the graph representation could capture complex relational phenomena (e.g., transactions among financial accounts in a journal entry), along with metadata reflecting tabular features (e.g., approver, effective date, etc.). While numerous anomaly detectors based on Graph Neural Networks (GNNs) have been proposed, none are capable of directly handling directed graphs with multi-edges and self-loops. Furthermore, the simultaneous handling of relational and tabular features remains an unexplored area. In this work, we propose ADAMM, a novel graph neural network model that handles directed multi-graphs, providing a unified end-to-end architecture that fuses metadata and graph-level representation learning through an unsupervised anomaly detection objective. Experiments on datasets from two different domains, namely general-ledger journal entries from different firms (accounting) and human GPS trajectories from thousands of individuals (urban mobility), validate ADAMM's generality and its effectiveness in detecting expert-guided and ground-truth anomalies. Notably, ADAMM outperforms existing baselines that handle the two data modalities (graph and metadata) separately with post hoc synthesis efforts.

DSV: An Alignment Validation Loss for Self-Supervised Outlier Model Selection. Jaemin Yoo, Yue Zhao, Lingxiao Zhao, and Leman Akoglu. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2023. [Abs] [PDF]
Self-supervised learning (SSL) has proven effective in solving various problems by generating internal supervisory signals. Unsupervised anomaly detection, which faces the high cost of obtaining true labels, is an area that can greatly benefit from SSL. However, recent literature suggests that tuning the hyperparameters (HPs) of data augmentation functions is crucial to the success of SSL-based anomaly detection (SSAD), yet a systematic method for doing so remains unknown. In this work, we propose DSV (Discordance and Separability Validation), an unsupervised validation loss to select high-performing detection models with effective augmentation HPs. DSV captures the alignment between an augmentation function and the anomaly-generating mechanism with surrogate losses, which approximate the discordance and separability of test data, respectively. As a result, evaluation via DSV leads to selecting an effective SSAD model that exhibits better alignment, which in turn yields high detection accuracy. We theoretically derive the degree of approximation conducted by the surrogate losses and empirically show that DSV outperforms a wide range of baselines on 21 real-world tasks.

End-to-End Augmentation Hyperparameter Tuning for Self-Supervised Anomaly Detection. Jaemin Yoo, Lingxiao Zhao, and Leman Akoglu. arXiv preprint arXiv:2306.12033, 2023. [Abs] [PDF]
Self-supervised learning (SSL) has emerged as a promising paradigm that presents self-generated supervisory signals to real-world problems, bypassing the extensive manual labeling burden. SSL is especially attractive for unsupervised tasks such as anomaly detection, where labeled anomalies are often nonexistent or costly to obtain. While self-supervised anomaly detection (SSAD) has seen a recent surge of interest, the literature has failed to treat data augmentation as a hyperparameter. Meanwhile, recent works have reported that the choice of augmentation has a significant impact on detection performance. In this paper, we introduce ST-SSAD (Self-Tuning Self-Supervised Anomaly Detection), the first systematic approach to SSAD that rigorously tunes augmentation. To this end, our work presents two key contributions. The first is a new unsupervised validation loss that quantifies the alignment between the augmented training data and the (unlabeled) test data. In principle, we adopt transduction, quantifying the extent to which augmentation mimics the true anomaly-generating mechanism, in contrast to augmenting data with arbitrary pseudo-anomalies without regard to the test data. Second, we present new differentiable augmentation functions, allowing data augmentation hyperparameter(s) to be tuned end-to-end via our proposed validation loss. Experiments on two testbeds with semantic class anomalies and subtle industrial defects show that systematically tuning augmentation offers significant performance gains over current practices.

Density of States for Fast Embedding Node-Attributed Graphs. Lingxiao Zhao, Saurabh Sawlani, and Leman Akoglu. Knowledge and Information Systems (KAIS), 2023. [PDF]

Sign and Basis Invariant Networks for Spectral Graph Representation Learning. Derek Lim*, Joshua Robinson*, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, and Stefanie Jegelka. ICLR 2023. [Abs] [PDF]
Many machine learning tasks involve processing eigenvectors derived from data. Especially valuable are Laplacian eigenvectors, which capture useful structural information about graphs and other geometric objects. However, ambiguities arise when computing eigenvectors: for each eigenvector v, the sign-flipped −v is also an eigenvector. More generally, higher-dimensional eigenspaces contain infinitely many choices of basis eigenvectors. These ambiguities make it a challenge to process eigenvectors and eigenspaces in a consistent way. In this work, we introduce SignNet and BasisNet – new neural architectures that are invariant to all requisite symmetries and hence process collections of eigenspaces in a principled manner. Our networks are universal, i.e., they can approximate any continuous function of eigenvectors with the proper invariances. They are also theoretically strong for graph representation learning: they can approximate any spectral graph convolution, can compute spectral invariants that go beyond message-passing neural networks, and can provably simulate previously proposed graph positional encodings. Experiments show the strength of our networks for learning spectral graph filters and learning graph positional encodings.
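For a concrete feel of the sign symmetry discussed above, here is a minimal NumPy sketch of a sign-invariant encoder in the f(v) = ρ(φ(v) + φ(−v)) template that SignNet builds on. The toy `phi` and `rho` below are hypothetical stand-ins for the learned networks, not the paper's architecture.

```python
import numpy as np

def sign_invariant_encode(v, phi, rho):
    """Encode an eigenvector v so the output is invariant to v -> -v.

    Applies phi to both v and -v and sums the results (killing the sign
    ambiguity), then post-processes with rho: f(v) = rho(phi(v) + phi(-v)).
    phi and rho stand in for learned networks; here they are plain callables.
    """
    return rho(phi(v) + phi(-v))

# Toy phi/rho (hypothetical stand-ins, not the paper's networks).
phi = lambda v: np.stack([v, v ** 2], axis=-1)  # per-node features
rho = lambda h: h.sum(axis=0)                   # permutation-invariant readout

v = np.array([0.6, -0.8, 0.0])
assert np.allclose(sign_invariant_encode(v, phi, rho),
                   sign_invariant_encode(-v, phi, rho))  # sign-invariant
```

Because the odd part of φ cancels in φ(v) + φ(−v), only sign-invariant information about v survives, regardless of which sign the eigensolver returned.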
2022

A Practical, Progressively-Expressive GNN. Lingxiao Zhao, Louis Härtel, Neil Shah, and Leman Akoglu. NeurIPS 2022. [Abs] [PDF] [Poster] [Slides] [Code]
Message passing neural networks (MPNNs) have become a dominant flavor of graph neural networks (GNNs) in recent years. Yet, MPNNs come with notable limitations; namely, they are at most as powerful as the 1-dimensional Weisfeiler-Leman (1-WL) test in distinguishing graphs in a graph isomorphism testing framework. To this end, researchers have drawn inspiration from the k-WL hierarchy to develop more expressive GNNs. However, current k-WL-equivalent GNNs are not practical even for small values of k, as k-WL becomes combinatorially more complex as k grows. At the same time, several works have found great empirical success in graph learning tasks without highly expressive models, implying that chasing expressiveness with a "coarse-grained ruler" of expressivity like k-WL is often unneeded in practical tasks. To truly understand the expressiveness-complexity trade-off, one desires a more "fine-grained ruler" that can increase expressiveness more gradually. Our work puts forth such a proposal: namely, we first propose the (k, c)(≤)-SETWL hierarchy with greatly reduced complexity from k-WL, achieved by moving from k-tuples of nodes to sets of ≤k nodes defined over ≤c connected components in the induced original graph. We show favorable theoretical results for this model in relation to k-WL, and concretize it via (k, c)(≤)-SETGNN, which is as expressive as (k, c)(≤)-SETWL. Our model is practical and progressively expressive, increasing in power with k and c. We demonstrate its effectiveness on several benchmark datasets, achieving several state-of-the-art results with runtime and memory usage applicable to practical graphs.

Hyperparameter Sensitivity in Deep Outlier Detection: Analysis and a Scalable Hyper-Ensemble Solution. Xueying Ding, Lingxiao Zhao, and Leman Akoglu. NeurIPS 2022. [Abs] [PDF] [Code]
Outlier detection (OD) literature exhibits numerous algorithms, as it applies to diverse domains. However, given a new detection task, it is unclear how to choose an algorithm to use, nor how to set its hyperparameter(s) (HPs) in unsupervised settings. HP tuning is an ever-growing problem with the arrival of many new detectors based on deep learning. While they have appealing properties such as task-driven representation learning and end-to-end optimization, deep models come with a long list of HPs. Surprisingly, the issue of model selection has been "the elephant in the room" of the outlier mining literature: a significant factor in unlocking the utmost potential of deep methods, yet little is said or done to systematically tackle it. In the first part of this paper, we conduct the first large-scale analysis of the HP sensitivity of deep OD methods, and through more than 35,000 trained models, quantitatively demonstrate that model selection is inevitable. Next, we design an HP-robust and scalable deep hyper-ensemble model called ROBOD that assembles models with varying HP configurations, bypassing the choice paralysis. Importantly, we introduce novel strategies to speed up ensemble training, such as parameter sharing, batch/simultaneous training, and data subsampling, that allow us to train fewer models with fewer parameters. Extensive experiments on both image and tabular datasets show that ROBOD achieves and retains robust, state-of-the-art detection performance compared to its modern counterparts, while taking only 2-10% of the time of the naive hyper-ensemble with independent training.

Graph-level Anomaly Detection with Unsupervised GNNs. Lingxiao Zhao, Saurabh Sawlani, Arvind Srinivasan, and Leman Akoglu. ICDM (Short Paper), 2022. [Abs] [PDF] [Code]
Graph-based anomaly detection finds numerous applications in the real world. Thus, there exists extensive literature on the topic, which has recently shifted toward deep detection models due to advances in deep learning and graph neural networks (GNNs). A vast majority of prior work focuses on detecting node/edge/subgraph anomalies within a single graph, with much less work on graph-level anomaly detection in a graph database. This work aims to fill two gaps in the literature: we (1) design GLAM, an end-to-end graph-level anomaly detection model based on GNNs, and (2) focus on unsupervised model selection, which is notoriously hard due to the lack of any labels, yet especially critical for deep NN-based models with a long list of hyperparameters. Further, we propose a new pooling strategy for graph-level embedding, called MMD-pooling, that is geared toward detecting distribution anomalies, which has not been considered before. Through extensive experiments on 15 real-world datasets, we show that (i) GLAM outperforms node-level and two-stage (i.e., not end-to-end) baselines, and (ii) model selection picks a significantly more effective model than expectation (i.e., average) – without using any labels – among candidates with otherwise large variation in performance.

From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. Lingxiao Zhao, Wei Jin, Leman Akoglu, and Neil Shah. ICLR 2022. [Abs] [PDF] [Poster] [Slides] [Code]
Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations ("messages") from its immediate neighbors, akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable; however, their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g., k-egonets): in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than the encoding of immediate neighbors only (i.e., a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to uplift any GNN. We call our proposed method GNN-AK (GNN As Kernel), as the framework resembles a convolutional neural network with the kernel replaced by a GNN. Theoretically, we show that our framework is strictly more powerful than 1- and 2-WL, and is not less powerful than 3-WL. We also design subgraph sampling strategies that greatly reduce the memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins on several well-known graph ML tasks; specifically, 0.08 MAE on ZINC, and 74.79% and 86.887% accuracy on CIFAR10 and PATTERN, respectively.

Graph Condensation for Graph Neural Networks. Wei Jin, Lingxiao Zhao, Shichang Zhang, Yozen Liu, Jiliang Tang, and Neil Shah. ICLR 2022. [Abs] [PDF] [Code]
Given the prevalence of large-scale graphs in real-world applications, the storage and time required for training neural models have raised increasing concerns. To alleviate these concerns, we propose and study the problem of graph condensation for graph neural networks (GNNs). Specifically, we aim to condense the large, original graph into a small, synthetic, and highly informative graph, such that GNNs trained on the small graph and the large graph have comparable performance. We approach the condensation problem by imitating the GNN training trajectory on the original graph through the optimization of a gradient matching loss, and design a strategy to condense node features and structural information simultaneously. Extensive experiments have demonstrated the effectiveness of the proposed framework in condensing different graph datasets into informative smaller graphs. In particular, we are able to approximate the original test accuracy to 95.3% on Reddit, 99.8% on Flickr, and 99.0% on Citeseer, while reducing the graph size by more than 99.9%, and the condensed graphs can be used to train various GNN architectures.
2021

Fast Attributed Graph Embedding via Density of States. Saurabh Sawlani, Lingxiao Zhao, and Leman Akoglu. IEEE ICDM, 2021. [Abs] [PDF] [Code]
Given a node-attributed graph, how can we efficiently represent it with few numerical features that expressively reflect its topology and attribute information? We propose A-DOGE, for Attributed DOS-based Graph Embedding, based on the density of states (DOS, a.k.a. spectral density), to tackle this problem. A-DOGE is designed to fulfill a long list of desirable characteristics. Most notably, it capitalizes on efficient approximation algorithms for DOS, which we extend to blend in node labels and attributes for the first time, making it fast and scalable for large attributed graphs and graph databases. Being based on the entire eigenspectrum of a graph, A-DOGE can capture structural and attribute properties at multiple ("glocal") scales. Moreover, it is unsupervised (i.e., agnostic to any specific objective) and lends itself to various interpretations, which makes it suitable for exploratory graph mining tasks. Finally, it processes each graph independently of others, making it amenable to streaming settings as well as parallelization. Through extensive experiments, we show the efficacy and efficiency of A-DOGE on exploratory graph analysis and graph classification tasks, where it significantly outperforms unsupervised baselines and achieves competitive performance with modern supervised GNNs, while achieving the best trade-off between accuracy and runtime.

Connecting Graph Convolutional Network and Graph-Regularized PCA (Extended). Lingxiao Zhao and Leman Akoglu. Under review, 2021. [Abs] [PDF] [Code]
The graph convolution operator of the GCN model was originally motivated by a localized first-order approximation of spectral graph convolutions. This work takes a different view, establishing a mathematical connection between graph convolution and graph-regularized PCA (GPCA). Based on this connection, the GCN architecture, shaped by stacking graph convolution layers, shares a close relationship with stacked GPCA. We empirically demonstrate that the unsupervised embeddings from GPCA paired with a 1- or 2-layer MLP achieve similar or even better performance than GCN on semi-supervised node classification tasks across five datasets, including the Open Graph Benchmark. This suggests that the prowess of GCN is driven by graph-based regularization. In addition, we extend GPCA to the (semi-)supervised setting and show that it is equivalent to GPCA on a graph extended with "ghost" edges between nodes of the same label. Finally, we capitalize on the discovered relationship to design an effective initialization strategy based on stacked GPCA, enabling GCN to converge faster and achieve robust performance at a large number of layers. Notably, the proposed initialization is general-purpose and applies to other GNNs.

On Using Classification Datasets to Evaluate Graph Outlier Detection: Peculiar Observations and New Insights. Lingxiao Zhao and Leman Akoglu. Big Data Journal, Special Issue on Evaluation and Experimental Design in Data Mining and Machine Learning, Aug. 2021. [Abs] [PDF] [Code]
It is common practice in the outlier mining community to repurpose classification datasets for evaluating various detection models. To that end, often a binary classification dataset is used, where samples from one of the classes are designated as the 'inlier' samples, and the other class is substantially down-sampled to create the (ground-truth) 'outlier' samples. Graph-level outlier detection (GLOD) is rarely studied but has many potentially influential real-world applications. In this study, we identify an intriguing issue with repurposing graph classification datasets for GLOD. We find that the ROC-AUC performance of the models changes significantly ("flips" from high to very low, even worse than random) depending on which class is down-sampled. Interestingly, the ROC-AUCs on these two variants approximately sum to 1, and their performance gap is amplified with increasing propagations for a certain family of propagation-based outlier detection models. We carefully study the graph embedding space produced by propagation-based models and find two driving factors: (1) disparity between within-class densities, which is amplified by propagation, and (2) overlapping support (mixing of embeddings) across classes. We also study other graph embedding methods and downstream outlier detectors, and find that the intriguing "performance flip" issue still widely exists, though which down-sampled version achieves higher performance may vary. Thoughtful analysis of comprehensive results further deepens our understanding of the established issue.
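The "sum to 1" observation has a simple exact special case that can be checked directly: with a detector's scores held fixed, swapping which class is called the outlier exactly complements the ROC-AUC. (The paper's setting additionally re-down-samples each variant, so there the sum is only approximate.) A small pure-Python demonstration, using the rank-based (Mann-Whitney) AUC formula:

```python
def roc_auc(labels, scores):
    """ROC-AUC via the rank (Mann-Whitney U) formulation, pure Python.

    labels: 1 = outlier, 0 = inlier; scores: higher = more outlier-like.
    Counts the fraction of (outlier, inlier) pairs the scores rank
    correctly, with ties counted as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.4, 0.35, 0.1]  # a fixed detector's outlier scores
labels = [1, 1, 0, 0, 0]             # class A designated as outliers
flipped = [1 - y for y in labels]    # class B designated as outliers instead

auc_a = roc_auc(labels, scores)
auc_b = roc_auc(flipped, scores)
assert abs(auc_a + auc_b - 1.0) < 1e-12  # the two AUCs sum exactly to 1
```

Swapping the class roles turns every correctly ranked pair into an incorrectly ranked one, which is why a detector that looks excellent on one down-sampled variant looks worse than random on the other.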

Graph Unrolling Networks: Interpretable Neural Networks for Graph Signal Denoising. Siheng Chen, Yonina C. Eldar, and Lingxiao Zhao. IEEE Transactions on Signal Processing, 2021. [Abs] [PDF]
We propose an interpretable graph neural network framework to denoise single or multiple noisy graph signals. The proposed graph unrolling networks expand algorithm unrolling to the graph domain and provide an interpretation of the architecture design from a signal processing perspective. We unroll an iterative denoising algorithm by mapping each iteration into a single network layer, where the feed-forward process is equivalent to iteratively denoising graph signals. We train the graph unrolling networks through unsupervised learning, where the input noisy graph signals are used to supervise the networks. By leveraging the learning ability of neural networks, we adaptively capture appropriate priors from the input noisy graph signals, instead of manually choosing signal priors. A core component of graph unrolling networks is the edge-weight-sharing graph convolution operation, which parameterizes each edge weight by a trainable kernel function whose trainable parameters are shared by all edges. The proposed convolution is permutation-equivariant and can flexibly adjust the edge weights for various graph signals. We then consider two special cases of this class of networks, graph unrolling sparse coding (GUSC) and graph unrolling trend filtering (GUTF), by unrolling sparse coding and trend filtering, respectively. To validate the proposed methods, we conduct extensive experiments on both real-world and simulated datasets, and demonstrate that our methods have smaller denoising errors than conventional denoising algorithms and state-of-the-art graph neural networks. For denoising a single smooth graph signal, the normalized mean square error of the proposed networks is around 40% and 60% lower than that of graph Laplacian denoising and graph wavelets, respectively.
2020

PairNorm: Tackling Oversmoothing in GNNs. Lingxiao Zhao and Leman Akoglu. ICLR 2020, Addis Ababa, Ethiopia. [Abs] [PDF] [Slides] [Code] [Video]
The performance of graph neural nets (GNNs) is known to gradually decrease with an increasing number of layers. This decay is partly attributed to oversmoothing, where repeated graph convolutions eventually make node embeddings indistinguishable. We take a closer look at two different interpretations, aiming to quantify oversmoothing. Our main contribution is PairNorm, a novel normalization layer based on a careful analysis of the graph convolution operator, which prevents all node embeddings from becoming too similar. What is more, PairNorm is fast, easy to implement without any change to the network architecture or any additional parameters, and is broadly applicable to any GNN. Experiments on real-world graphs demonstrate that PairNorm makes deeper GCN, GAT, and SGC models more robust against oversmoothing, and significantly boosts performance for a new problem setting that benefits from deeper GNNs.
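For intuition, here is a minimal NumPy sketch of a center-then-rescale normalization in the spirit of PairNorm: centering removes the shared component that repeated convolutions push all nodes toward, and rescaling fixes the mean squared row norm, which keeps the total pairwise distance between embeddings from collapsing. The scale `s` and this exact variant are simplifying assumptions for illustration, not a faithful reproduction of the paper's layer.

```python
import numpy as np

def pair_norm(x, s=1.0):
    """Center-and-rescale normalization in the spirit of PairNorm.

    x: (n, d) node embeddings. Subtracting the column mean centers the
    embeddings; dividing by the RMS row norm then fixes the mean squared
    row norm at s**2, so embeddings cannot all shrink toward one point.
    (A sketch under simplifying assumptions; s=1.0 is an assumed default.)
    """
    x = x - x.mean(axis=0, keepdims=True)          # center per feature
    scale = np.sqrt((x ** 2).sum(axis=1).mean())   # RMS row norm
    return s * x / scale

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
out = pair_norm(x)
# Rows are now centered and the mean squared row norm equals s**2 = 1.
assert np.allclose(out.mean(axis=0), 0.0)
assert np.isclose((out ** 2).sum(axis=1).mean(), 1.0)
```

Applied after each graph convolution, such a layer keeps the embedding spread roughly constant across depth, which is the mechanism the abstract credits for resisting oversmoothing.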

Connecting Graph Convolutional Network and Graph-Regularized PCA. Lingxiao Zhao and Leman Akoglu. ICML 2020 GRL+ Workshop. [Abs] [PDF] [Code]
The graph convolution operator of the GCN model was originally motivated by a localized first-order approximation of spectral graph convolutions. This work takes a different view, establishing a connection between graph convolution and graph-regularized PCA (GPCA). Based on this connection, the GCN architecture, shaped by stacking graph convolution layers, shares a close relationship with stacked GPCA. We empirically demonstrate that the unsupervised embeddings from GPCA paired with a logistic regression classifier achieve similar performance to GCN on semi-supervised node classification tasks. Further, we capitalize on the discovered relationship to design an effective initialization strategy for GCN based on stacked GPCA.

Generalizing Graph Neural Networks Beyond Homophily. Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. NeurIPS 2020. [Abs] [PDF] [Code]
We investigate the representation power of graph neural networks in the semi-supervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features. Most existing GNNs fail to generalize to this setting, and are even outperformed by models that ignore the graph structure (e.g., multilayer perceptrons). Motivated by this limitation, we identify a set of key designs – ego- and neighbor-embedding separation, higher-order neighborhoods, and combination of intermediate representations – that boost learning from the graph structure under heterophily, and combine them into a new graph convolutional neural network, H2GCN. Going beyond the traditional benchmarks with strong homophily, our empirical analysis on synthetic and real networks shows that, thanks to the identified designs, H2GCN has consistently strong performance across the full spectrum of low-to-high homophily, unlike competitive prior models without them.
2018

A Quest for Structure: Jointly Learning the Graph Structure and Semi-Supervised Classification. Xuan Wu*, Lingxiao Zhao*, and Leman Akoglu. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018. [Abs] [PDF] [Slides] [Code]
How should one construct a graph from input point-cloud data for graph-based SSL? In this work, we introduce a new parallel graph learning framework (called PG-learn) for the graph construction step of SSL. Our solution has two main ingredients: (1) a gradient-based optimization of the edge weights (more specifically, of different kernel bandwidths in each dimension) based on a validation loss function, and (2) a parallel hyperparameter search algorithm with an adaptive resource allocation scheme. In essence, (1) allows us to search around a (random) initial hyperparameter configuration for a better one with lower validation loss. Since the search space of hyperparameters is huge for high-dimensional problems, (2) empowers our gradient-based search to go through as many different initial configurations as possible, where runs for relatively unpromising starting configurations are terminated early to allocate the time to others. As such, PG-learn is a carefully designed hybrid of random and adaptive search. Through experiments on multi-class classification problems, we show that PG-learn significantly outperforms a variety of existing graph construction schemes in accuracy (for a fixed time budget for hyperparameter tuning), and scales more effectively to high-dimensional problems.