This is the list of all accepted papers. You can find more information about each submission by following its Forum link or by visiting the OpenReview Workshop Website.

# Statistics

We received 53 submissions in total, of which we accepted 10 spotlights and 29 poster presentations. All submissions received at least two reviews, which were then read, weighed, and considered independently by members of the organising committee.

Congratulations to all authors!

# Spotlights

$k$-simplex2vec: a simplicial extension of node2vec (Spotlight presentation)
Celia Hacker

We present a novel method of associating Euclidean features with simplicial complexes, providing a way to use them as input to statistical and machine learning tools. This method extends the node2vec algorithm to simplices of higher dimensions, providing insight into the structure of a simplicial complex, or into the higher-order interactions in a graph.

Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics (Spotlight presentation)
Yair Schiff • Payel Das • Vijil Chenthamarakshan • Karthikeyan Natesan Ramamurthy

Deep generative models are increasingly becoming an integral part of the in silico molecule design pipeline and have the dual goals of learning the chemical and structural features that render candidate molecules viable while also being flexible enough to generate novel designs. Specifically, Variational Autoencoders (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions in such a way that the latent space of the encoder network is smooth. Therefore, novel candidates can be found by sampling from this latent space. However, the space of possible architectures and hyperparameters is vast, and choosing the best combination for in silico discovery has important implications for downstream success. It is therefore important to develop a principled methodology for distinguishing how well a given generative model is able to learn salient molecular features. In this work, we propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features of molecular datasets by correlating latent space metrics with metrics from the field of topological data analysis (TDA). We apply our evaluation methodology to a VAE trained on SMILES strings and show that 3D topology information is consistently encoded throughout the latent space of the model.

giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration (Spotlight presentation)
Guillaume Tauzin • Umberto Lupo • Lewis Tunstall • Julian Burella Perez • Matteo Caorsi • Wojciech Reise • Anibal Maximiliano Medina-Mardones • Alberto Dassatti • Kathryn Hess

We introduce giotto-tda, a Python library that integrates high-performance topological data analysis with machine learning via a scikit-learn-compatible API and state-of-the-art C++ implementations. The library's ability to handle various types of data is rooted in a wide range of preprocessing techniques, and its strong focus on data exploration and interpretability is aided by an intuitive plotting API. Source code, binaries, examples, and documentation can be found at https://github.com/giotto-ai/giotto-tda

Hypothesis classes with a unique persistence diagram are nonuniformly learnable (Spotlight presentation)
Nicholas Bishop • Thomas Davies • Long Tran-Thanh

Persistence-based summaries are increasingly integrated into deep learning through topological loss functions or regularisers. The implicit role of a topological term in a loss function is to restrict the class of functions in which we are learning (the hypothesis class) to those with a specific topology. Although doing so has had empirical success, to the best of our knowledge there exists no result in the literature that theoretically justifies this restriction. Given a binary classifier in the plane with a Morse-like decision boundary, we prove that the hypothesis class defined by restricting the topology of the possible decision boundaries to those with a unique persistence diagram results in a nonuniformly learnable class of functions. In doing so, we provide a statistical learning theoretic justification for the use of persistence-based summaries in loss functions.

Multidimensional Persistence Module Classification via Lattice-Theoretic Convolutions (Spotlight presentation)
Hans Matthew Riess • Jakob Hansen

Multiparameter persistent homology has been largely neglected as an input to machine learning algorithms. We consider the use of lattice-based convolutional neural network layers as a tool for the analysis of features arising from multiparameter persistence modules. We find that these show promise as an alternative to convolutions for the classification of multidimensional persistence modules.

Permutation invariant networks to learn Wasserstein metrics (Spotlight presentation)
Arijit Sehanobish • Neal G Ravindra • David van Dijk

Understanding the space of probability measures on a metric space equipped with a Wasserstein distance is one of the fundamental questions in mathematical analysis. The Wasserstein metric has received a lot of attention in the machine learning community especially for its principled way of comparing distributions. In this work, we use a permutation invariant network to map samples from probability measures into a low-dimensional space such that the Euclidean distance between the encoded samples reflects the Wasserstein distance between probability measures. We show that our network can generalize to correctly compute distances between unseen densities. We also show that these networks can learn the first and the second moments of probability distributions.
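
For reference, the target quantity can be computed exactly in one dimension. For two empirical distributions with the same number of uniformly weighted samples, the optimal coupling pairs the sorted samples. The sketch below computes this ground-truth distance. It is a hypothetical helper, not the authors' network or code. Sorting also makes the result invariant to the order of the samples, which is the same symmetry a permutation invariant encoder is designed to respect.

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """p-Wasserstein distance between two 1-D empirical distributions.

    Assumes equal sample sizes and uniform weights, in which case the optimal
    transport plan matches the i-th smallest value of x with that of y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

print(wasserstein_1d([0, 1], [1, 2]))        # 1.0
print(wasserstein_1d([2, 0, 1], [0, 1, 2]))  # 0.0: sample order does not matter
```

Distances computed this way could serve as regression targets for the Euclidean distances between encoded samples. That training setup is an assumption here and is not taken from the paper.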

Quantifying barley morphology using the Euler characteristic transform (Spotlight presentation)
Erik J Amezquita • Michelle Quigley • Tim Ophelders • Jacob Landis • Elizabeth Munch • Daniel Chitwood • Daniel Koenig

Shape is foundational to biology. Observing and documenting shape has fueled biological understanding, and from this perspective, it is also a type of data. The vision of topological data analysis, that data is shape and shape is data, will be relevant as biology transitions into a data-driven era where meaningful interpretation of large data sets is a limiting factor. We focus first on quantifying the morphology of barley spikes and seeds using topological descriptors based on the Euler characteristic. We then successfully train a support vector machine to classify 28 different varieties of barley based solely on the shape of their grains.
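
The Euler characteristic transform is built from Euler characteristic curves. For a direction $v$, each simplex enters the filtration at the largest height $\langle x, v\rangle$ among its vertices. The value $\chi(t)$ is then the number of vertices, minus the number of edges, plus the number of triangles present at height $t$. The sketch below computes one such curve. The function name and inputs are hypothetical, and it does not reproduce how the paper chooses directions, thresholds, or complexes for the barley scans.

```python
import numpy as np

def euler_characteristic_curve(vertices, simplices, direction, thresholds):
    """Euler characteristic of sublevel sets of the height x -> <x, direction>.

    vertices:   (n, d) vertex coordinates
    simplices:  vertex-index tuples for every simplex, including (i,) for vertices
    thresholds: heights t at which chi(t) is evaluated
    """
    heights = np.asarray(vertices, dtype=float) @ np.asarray(direction, dtype=float)
    t = np.asarray(thresholds, dtype=float)
    chi = np.zeros(len(t), dtype=int)
    for simplex in simplices:
        entry = heights[list(simplex)].max()  # enters once all its vertices have
        sign = (-1) ** (len(simplex) - 1)     # +1 vertex, -1 edge, +1 triangle
        chi += sign * (t >= entry)
    return chi

# A filled right triangle, scanned from bottom to top.
pts = [[0, 0], [1, 0], [0, 1]]
triangle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
print(euler_characteristic_curve(pts, triangle, [0, 1], [-1, 0, 1]))  # [0 1 1]
```

Stacking such curves over many directions gives a vector that can be used as input to a classifier such as a support vector machine.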

Sheaf Neural Networks (Spotlight presentation)
Jakob Hansen • Thomas Gebhart

We generalize graph convolutional networks by extending the diffusion operation underlying this class of graph neural networks. These *sheaf neural networks* are based on the *sheaf Laplacian*, a generalization of the graph Laplacian that encodes additional relational structure parameterized by the underlying graph. The sheaf Laplacian and associated matrices provide an extended version of the diffusion operation in graph convolutional networks, yielding a proper generalization for domains where relations between nodes are non-constant, asymmetric, and varying in dimension. We show that the resulting sheaf neural networks can outperform graph convolutional networks in domains where relations between nodes are asymmetric and signed.
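
Below is a minimal sketch of the textbook sheaf Laplacian construction. It assumes that all node and edge stalks share one dimension and that the graph has no self-loops. The coboundary $\delta$ sends node data to edge data through the restriction maps, and the sheaf Laplacian is $L = \delta^\top \delta$. The function and argument names are hypothetical. The sketch covers only the Laplacian, not the authors' network layer.

```python
import numpy as np

def sheaf_laplacian(num_nodes, edges, restriction, stalk_dim):
    """Sheaf Laplacian L = delta^T delta of a cellular sheaf on a graph.

    edges:       list of (u, v) pairs; each edge is oriented u -> v
    restriction: dict mapping (node, edge_index) to a (stalk_dim, stalk_dim)
                 matrix, the restriction map from the node stalk to the edge stalk
    """
    d = stalk_dim
    delta = np.zeros((len(edges) * d, num_nodes * d))
    for k, (u, v) in enumerate(edges):
        rows = slice(k * d, (k + 1) * d)
        # (delta x)_e = F_{u <= e} x_u - F_{v <= e} x_v
        delta[rows, u * d:(u + 1) * d] = restriction[(u, k)]
        delta[rows, v * d:(v + 1) * d] = -restriction[(v, k)]
    return delta.T @ delta

# Path graph 0 - 1 - 2 with 1-dimensional stalks and identity restriction maps.
path_edges = [(0, 1), (1, 2)]
identity = {(0, 0): np.eye(1), (1, 0): np.eye(1), (1, 1): np.eye(1), (2, 1): np.eye(1)}
print(sheaf_laplacian(3, path_edges, identity, 1))  # the ordinary graph Laplacian
```

With 1-dimensional stalks and identity restriction maps, $L$ reduces to the ordinary graph Laplacian. Setting one restriction map to $-1$ instead encodes a signed relation: the sections in the kernel of $L$ then satisfy $x_u = -x_v$.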

Topo Sampler: A Topology Constrained Noise Sampling for GANs (Spotlight presentation)

This work studies disconnected manifold learning in generative models in the light of point-set topology and persistent homology. Under this formalism, the topological similarity of latent space in generative models with the underlying manifold of data distribution facilitates better generalization. To achieve this, we introduce a topology-constrained noise sampler, responsible for mapping the samples from Gaussian spheres to a latent embedding space, which in turn is constrained to be topologically similar to the manifold underlying the data distribution. We study the effectiveness of this method in GANs for learning disconnected manifolds. This is ongoing research, with the current report containing preliminary empirical experiments.

Weighting vectors for machine learning: numerical harmonic analysis applied to boundary detection (Spotlight presentation)
Eric Bunch • Daniel Dickinson • Jeffery Kline • Glenn Fung

Metric space magnitude, an active subject of research in algebraic topology, aims to quantify the effective number of distinct points in a space. The contribution of each point to a metric space’s global magnitude, which is encoded by the *weighting vector*, captures much of the underlying geometry of the original metric space. When the metric space is Euclidean, the weighting vector also serves as an effective tool for boundary detection. This allows the weighting vector to serve as the foundation of novel algorithms for classic machine learning tasks such as classification, outlier detection and active learning. We demonstrate, using experiments and comparisons on classic benchmark datasets, the promise of the proposed magnitude and weighting vector-based approaches.
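
Concretely, take a finite point set with pairwise distances $d_{ij}$ and a scale $t > 0$. The weighting vector $w$ solves $Zw = \mathbf{1}$ with $Z_{ij} = e^{-t\,d_{ij}}$, and the magnitude is $\sum_i w_i$. The sketch below assumes Euclidean points, for which $Z$ is positive definite and the system is therefore solvable. The function names are hypothetical, and the sketch does not reproduce the paper's boundary-detection or learning algorithms.

```python
import numpy as np

def weighting_vector(points, t=1.0):
    """Weighting vector of a finite Euclidean point set at scale t.

    Solves Z w = 1 with Z[i, j] = exp(-t * ||x_i - x_j||). Assumes distinct points.
    """
    X = np.asarray(points, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Z = np.exp(-t * dists)
    return np.linalg.solve(Z, np.ones(len(X)))

def magnitude(points, t=1.0):
    """Magnitude of the point set: the sum of the weighting vector."""
    return float(weighting_vector(points, t).sum())

# Five evenly spaced points on a line: the endpoints carry the largest weights.
line = [[float(i)] for i in range(5)]
print(np.round(weighting_vector(line), 3))  # endpoints about 0.731, interior about 0.462
print(magnitude(line))
```

On evenly spaced points, the endpoints receive noticeably larger weights than the interior points. This is the boundary-detecting behaviour described above.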

# Accepted Posters

0-dimensional Homology Preserving Dimensionality Reduction with TopoMap
Harish Doraiswamy • Julien Tierny • Paulo J.S. Silva • Luis Gustavo Nonato • Cláudio Silva

This note presents TopoMap, a novel dimensionality reduction technique which provides topological guarantees during the mapping process. In particular, TopoMap performs the mapping from a high-dimensional space to a visual space, while preserving the 0-dimensional persistence diagram of the Rips filtration of the high-dimensional data, ensuring that the filtrations generate the same connected components when applied to the original as well as projected data. The presented case studies show that the topological guarantee provided by TopoMap not only brings confidence to the visual analytic process but also can be used to assist in the assessment of other projection methods.
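
The 0-dimensional part of this guarantee can be checked directly. In a Rips filtration every point is born at 0, and components merge exactly along the edges of a minimum spanning tree. The finite death times are therefore the MST edge lengths (taking edges to enter at their length; some conventions use half the length). The sketch below computes these deaths with Kruskal's algorithm. It is a hypothetical helper, not TopoMap itself. A projection preserves the 0-dimensional diagram exactly when it preserves this multiset of deaths.

```python
import itertools
import numpy as np

def rips_h0_deaths(points):
    """Finite death times of the 0-dimensional Rips persistence diagram.

    All points are born at 0; one component never dies. Kruskal's algorithm
    with union-find records the length of each edge that merges two components.
    """
    X = np.asarray(points, dtype=float)
    parent = list(range(len(X)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (float(np.linalg.norm(X[i] - X[j])), i, j)
        for i, j in itertools.combinations(range(len(X)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so one of them dies
            parent[ri] = rj
            deaths.append(length)
    return deaths

print(rips_h0_deaths([[0, 0], [1, 0], [3, 0]]))  # [1.0, 2.0]
```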

Application of Topological Data Analysis to Delirium Detection
Mari Kajitani • Ken Kobayashi • Yuichi Ike • Takehiko Yamanashi • Yuhei Umeda • Yoshimasa Kadooka • Gen Shinozaki

We propose a new scoring algorithm, based on topological data analysis, for detecting delirium from one-channel EEG. Numerical experiments demonstrated that our method achieved higher predictive performance than other existing methods.

Bifurcation Analysis using Zigzag Persistence
Sarah Tymochko • Elizabeth Munch • Firas Khasawneh

As bifurcations in a dynamical system are drastic behavioral changes, being able to detect when these bifurcations occur can be essential to understanding the system overall. While persistent homology has successfully been used in the field of dynamical systems, the most commonly used approaches have their limitations. Using zigzag persistence, we can simplify the methodology and capture topological changes through a collection of time series, rather than studying the topology of individual time series separately. Here we present Bifurcations using ZigZag (BuZZ), a method to detect Hopf bifurcations in dynamical systems.

Can neural networks learn persistent homology features?
Guido Montufar • Nina Otter • Yu Guang Wang

Topological data analysis uses tools from topology — the mathematical area that studies shapes — to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applications, one is interested in working with features associated with persistence diagrams rather than the diagrams themselves. In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.

Cell Complex Neural Networks
Mustafa Hajij • Kyle Istvan • Ghada Zamzmi

Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes that form important domains for practical applications. They also provide a combinatorial formalism that allows the inclusion of complicated relationships of restrictive structures such as graphs and meshes. In this paper, we propose **cell complex neural networks (CXNs)**, a general, combinatorial, and unifying construction for performing neural network-type computations on cell complexes. We introduce an inter-cellular message passing scheme on cell complexes that takes the topology of the underlying space into account and generalizes message passing schemes on graphs. Finally, we introduce a unified cell complex encoder-decoder framework that enables learning representations of the cells of a given complex in Euclidean spaces. In particular, we show how our cell complex autoencoder construction yields, as a special case, **cell2vec**, a generalization of node2vec.

Challenging Euclidean Topological Autoencoders
Michael Moor • Max Horn • Karsten Borgwardt • Bastian Rieck

Topological autoencoders (TopoAE) have demonstrated their capabilities for performing dimensionality reduction while at the same time preserving topological information of the input space. In its original formulation, this method relies on a Vietoris-Rips filtration of the data space, using the Euclidean metric as the base distance. It is commonly assumed that this distance is not sufficiently powerful to capture salient features of image data sets. We therefore investigate alternative choices of distances in the data space, which are generally considered to be more faithful for image data in comparison to the pixel distance. In our experiments on real-world image datasets, we find that the Euclidean formulation of TopoAE is surprisingly competitive with more elaborate, perceptually-inspired image distances.

Comparing Distance Metrics on Vectorized Persistence Summaries
Brittany Fasy • Yu Qin • Brian Summa • Carola Wenk

The persistence diagram (PD) is an important tool in topological data analysis for encoding an abstract representation of the homology of a shape at different scales. Various vectorizations of PDs are commonly used in machine learning applications; however, distances between vectorized persistence summaries may differ greatly from the distances between the original PDs. Surprisingly, no research has been carried out in this area before. In this work we compare distances between PDs and between different commonly used vectorizations. Our results give new insights into comparing vectorized persistence summaries and can be used to design better feature-based learning models based on PDs.

Deep Graph Mapper: Seeing Graphs Through the Neural Lens
Cristian Bodnar • Cătălina Cangea • Pietro Liò

Graph summarisation has received much attention lately, with various works tackling the challenge of defining pooling operators on data regions with arbitrary structures. These contrast with the grid-like structures encountered in image inputs, where techniques such as max-pooling have been enough to show empirical success. In this work, we merge the Mapper algorithm with the expressive power of graph neural networks to produce topologically grounded graph summaries. We demonstrate the suitability of Mapper as a topological framework for graph pooling by proving that Mapper is a generalisation of pooling methods based on soft cluster assignments. Building upon this, we show how easy it is to design novel pooling algorithms that obtain competitive results with other state-of-the-art methods.
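
The classic point-cloud Mapper construction works in four steps:

- Cover the range of a lens (filter) function with overlapping intervals.
- Cluster the points in each preimage.
- Make one node per cluster.
- Connect nodes whose clusters share points.

The sketch below makes illustrative assumptions that are not taken from the paper: equal-length intervals, single-linkage clustering cut at a distance `eps`, and hypothetical function and parameter names. Deep Graph Mapper itself operates on graphs and combines Mapper with graph neural networks, which this sketch does not attempt.

```python
import itertools
import numpy as np

def mapper_graph(points, lens, n_intervals=5, overlap=0.3, eps=0.5):
    """Minimal Mapper graph for a point cloud with a 1-D lens.

    Returns (nodes, edges): each node is a frozenset of point indices and
    each edge joins two nodes whose clusters share at least one point.
    """
    X = np.asarray(points, dtype=float)
    f = np.asarray(lens, dtype=float)
    lo, hi = f.min(), f.max()
    # n intervals of equal length, consecutive ones overlapping by `overlap`
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = hi if k == n_intervals - 1 else a + length  # guard against rounding
        unvisited = set(np.flatnonzero((f >= a) & (f <= b)).tolist())
        # single-linkage clusters: connected components of the eps-graph
        while unvisited:
            stack, component = [unvisited.pop()], set()
            while stack:
                i = stack.pop()
                component.add(i)
                near = [j for j in unvisited if np.linalg.norm(X[i] - X[j]) <= eps]
                unvisited.difference_update(near)
                stack.extend(near)
            nodes.append(frozenset(component))
    edges = [(p, q) for p, q in itertools.combinations(range(len(nodes)), 2)
             if nodes[p] & nodes[q]]
    return nodes, edges

# Points on a circle with the x-coordinate as lens: Mapper recovers a loop.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
nodes, edges = mapper_graph(circle, circle[:, 0], n_intervals=3, overlap=0.3, eps=0.2)
print(len(nodes), len(edges))  # 4 4: a 4-cycle
```

Each node groups a cluster of points, so every point is assigned to the node or nodes that contain it. This grouping is the sense in which Mapper can act like a (soft) cluster-assignment pooling step.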

Functorial Clustering via Simplicial Complexes
Dan Shiebler

We adapt previous research on topological unsupervised learning to characterize hierarchical overlapping clustering algorithms as functors that factor through a category of simplicial complexes. We first develop a pair of adjoint functors that map between simplicial complexes and the outputs of clustering algorithms. Next, we introduce the maximal and single linkage clustering algorithms as the respective compositions of the flagification and connected components functors with McInnes et al.'s finite singular set functor. We then adapt a theorem by Culbertson et al. to demonstrate that all other hierarchical overlapping clustering functors are refined by maximal linkage and refine single linkage.

Fuzzy c-Means Clustering for Persistence Diagrams
Thomas Davies • Jack Aspinall • Bryan Wilder • Long Tran-Thanh

Persistence diagrams concisely represent the topology of a point cloud whilst having strong theoretical guarantees. Most current approaches to integrating topological information into machine learning implicitly map persistence diagrams to a Hilbert space, resulting in deformation of the underlying metric structure whilst also generally requiring prior knowledge about the true topology of the space. In this paper we give an algorithm for Fuzzy c-Means (FCM) clustering directly on the space of persistence diagrams, enabling unsupervised learning that automatically captures the topological structure of data, with no prior knowledge or additional processing of persistence diagrams. We prove the same convergence guarantees as traditional FCM clustering: every convergent subsequence of iterates tends to a local minimum or saddle point. We end by presenting experiments where the fuzzy nature of our topological clustering is capitalised on: lattice structure classification in materials science and pre-trained model selection in machine learning.
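
The "fuzzy" part of the method rests on the standard fuzzy c-means membership update. With fuzzifier $m > 1$ and distance $d_{ij}$ from item $i$ to centre $j$, the membership is $u_{ij} = 1 / \sum_k (d_{ij}/d_{ik})^{2/(m-1)}$. The sketch below implements only this step, starting from a precomputed distance matrix. For persistence diagrams, the distances would come from a diagram metric such as a Wasserstein distance. The paper's centre update on the space of diagrams is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def fcm_memberships(dist, m=2.0):
    """Fuzzy c-means membership update from an (items x centres) distance matrix.

    u[i, j] = 1 / sum_k (dist[i, j] / dist[i, k]) ** (2 / (m - 1)); rows sum to 1.
    """
    d = np.asarray(dist, dtype=float)
    u = np.zeros_like(d)
    for i, row in enumerate(d):
        at_centre = row == 0
        if at_centre.any():
            # the item coincides with one or more centres: split membership among them
            u[i, at_centre] = 1.0 / at_centre.sum()
        else:
            ratios = (row[:, None] / row[None, :]) ** (2.0 / (m - 1.0))
            u[i] = 1.0 / ratios.sum(axis=1)
    return u

print(fcm_memberships([[1.0, 2.0]]))  # [[0.8 0.2]]
```

Larger values of $m$ spread membership more evenly across clusters. As $m$ approaches 1, the assignments approach hard clustering.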

Hotspot identification for Mapper graphs
Ciara Frances Loughrey • Anna Jurek-Loughrey • Nick Orr • Pawel Dlotko

The Mapper algorithm can be used to build graph-based representations of high-dimensional data capturing structurally interesting features such as loops, flares, or clusters. The graph can be further annotated by colouring its vertices, which helps locate regions of special interest. For instance, in many applications, such as precision medicine, Mapper graphs have been used to identify previously unknown, compactly localized subareas within the dataset that exhibit unique or unusual behaviours. This task, so far performed by a researcher, can be automated using hotspot analysis. In this work we propose a new algorithm for detecting hotspots in Mapper graphs, which automates the hotspot detection process. We demonstrate the performance of the algorithm on a number of artificial and real-world datasets. We further demonstrate how our algorithm can be used for the automatic selection of the Mapper lens functions.

Interpretable Phase Detection and Classification with Persistent Homology
Gregory Loges • Alex Cole • Gary Shiu

We apply persistent homology to the task of discovering and characterizing phase transitions, using lattice spin models from statistical physics for working examples. Persistence images provide a useful representation of the homological data for conducting statistical tasks. To identify the phase transitions, a simple logistic regression on these images is sufficient for the models we consider, and interpretable order parameters are then read from the weights of the regression. Magnetization, frustration and vortex-antivortex structure are identified as relevant features for characterizing phase transitions.
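
A persistence image is built in four steps:

- Map each diagram point $(b, d)$ to $(b, d - b)$.
- Weight it by a function of its persistence.
- Smooth it with a Gaussian.
- Sample the result on a grid.

The flattened pixels can then be fed to a logistic regression. Reshaping the regression weights to the grid shows which (birth, persistence) regions matter. The sketch below makes assumptions that are not taken from the paper:

- a linear weight normalized by the largest persistence;
- evaluation at pixel centres rather than integration over each pixel;
- hypothetical parameter names and defaults.

```python
import numpy as np

def persistence_image(diagram, resolution=20, sigma=0.1, extent=(0.0, 1.0, 0.0, 1.0)):
    """Persistence image of a finite diagram, sampled at pixel centres.

    diagram: (birth, death) pairs with finite death > birth
    extent:  (birth_min, birth_max, pers_min, pers_max) of the image window
    Returns an array whose rows index persistence and columns index birth.
    """
    D = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births, pers = D[:, 0], D[:, 1] - D[:, 0]
    bmin, bmax, pmin, pmax = extent
    # pixel-centre coordinates along each axis
    bs = bmin + (np.arange(resolution) + 0.5) * (bmax - bmin) / resolution
    ps = pmin + (np.arange(resolution) + 0.5) * (pmax - pmin) / resolution
    B, P = np.meshgrid(bs, ps)
    # linear weight in persistence, scaled so the most persistent point has weight 1
    weights = pers / pers.max() if pers.size and pers.max() > 0 else pers
    img = np.zeros_like(B)
    for b, p, w in zip(births, pers, weights):
        gauss = np.exp(-((B - b) ** 2 + (P - p) ** 2) / (2 * sigma**2))
        img += w * gauss / (2 * np.pi * sigma**2)
    return img

img = persistence_image([(0.275, 0.75), (0.1, 0.2)])
print(img.shape)  # (20, 20)
features = img.ravel()  # e.g. input features for a logistic regression
```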

Learning a manifold from a teacher’s demonstrations
Pei Wang • Arash Givchi • Patrick Shafto

We consider the problem of learning a manifold from a teacher's demonstrations. Extending existing approaches of learning from randomly sampled data points, we consider contexts where data may be chosen by a teacher. We analyze learning from teachers who can provide structured data such as individual examples (isolated data points) and demonstrations (sequences of points). Our analysis shows that, for the purpose of teaching the topology of a manifold, demonstrations can yield remarkable decreases in the number of data points required compared with teaching with randomly sampled points.

LUMAWIG: Un-bottling the bottleneck distance for zero dimensional persistence diagrams at scale
Paul Samuel Ignacio • Jay-Anne Bulauan • David Uminsky

We present LUMÁWIG, a novel, efficient algorithm to compute the zero-dimensional bottleneck distance between two persistence diagrams of a specific kind, which outperforms all other publicly available algorithms in runtime and accuracy. We bypass the overwhelming matching problem in previous implementations of the bottleneck distance, and prove that the zero-dimensional bottleneck distance can be recovered from a very small number of matching cases. LUMÁWIG also generally enjoys linear complexity, as shown by empirical tests. This allows us to scale TDA to data sets of sizes encountered in machine learning and to utilize persistence diagrams in a manner that goes beyond the simple use of the most persistent components.

Multi-parameter hierarchical clustering and beyond
Alexander Rolle

We survey recent progress on multi-parameter hierarchical clustering, which has developed in several directions since it was introduced by Carlsson–Mémoli in 2010. These lines of research show that tools originally developed in the setting of multi-parameter persistent homology can be applied more broadly, without linearizing via homology.

Multi-Parameter Persistent Homology is Practical (Extended Abstract)
Michael Kerber

Multi-parameter persistent homology is a branch of topological data analysis that is notorious for being more difficult than the standard (one-parameter) version, both in theory and for algorithmic problems. We report on three ongoing projects that demonstrate that multi-parameter methods are applicable to large data sets. For instance, natural bi-filtrations generalizing Vietoris-Rips or alpha filtrations for hundreds of thousands of points can be decomposed into their indecomposable parts within seconds.

Multiple Hypothesis Testing with Persistent Homology
Mikael Vejdemo-Johansson • Sayan Mukherjee

Multiple hypothesis testing requires a control procedure. Simply increasing the number of simulations or permutations to meet a Bonferroni-style threshold is prohibitively expensive. In this paper we propose a null-model-based approach to testing for acyclicity, coupled with a Family-Wise Error Rate (FWER) control method that does not suffer from these computational costs.

Novel Topological Shapes of Model Interpretability
Hendrik Jacob van Veen

The most accurate models can be the most challenging to interpret. This paper advances interpretability analysis by combining insights from Mapper with recent interpretable machine-learning research. By enforcing new visualization constraints on Mapper, we produce a globally-to-locally interpretable visualization of the Explainable Boosting Machine. We demonstrate the usefulness of our approach on three data sets: cervical cancer risk, propaganda Tweets, and a loan default data set that was artificially hardened with severe concept drift.

On The Topological Expressive Power of Neural Networks
Giovanni Petri • António Leitão

We propose a topological description of neural network expressive power. We adopt the topology of the space of decision boundaries realized by a neural architecture as a measure of its intrinsic expressive power. By sampling a large number of neural architectures with different sizes and designs, we show how such a measure of expressive power depends on properties of the architectures such as depth, width, and other related quantities.

Passive Encrypted IoT Device Fingerprinting with Persistent Homology
Joe Collins • Michaela Iorga • Dmitry Cousin • David Chapman

Internet of Things (IoT) devices are becoming increasingly prevalent. These devices can improve quality of life, but often present significant security risks to end users. In this work we present a novel persistent homology based method for fingerprinting IoT traffic. Traditional passive device fingerprinting methods directly inspect the packet attributes or contents within the captured traffic. However, techniques that fingerprint devices based on inter-packet arrival time (IAT) are an important area of research, as this feature is available even in encrypted traffic. We demonstrate that Topological Data Analysis (TDA) using persistent homology over IAT packet windows is a viable approach to obtain discriminative features for device fingerprinting. The clique complex construction and weighting function we present are efficient to compute and robust to shifts of the packet window. The 1-dimensional homology is calculated over the resulting filtered clique complex. We obtain a competitive accuracy of 95.34% on the UNSW IoT dataset by using a convolutional neural network to classify the corresponding persistence images.

Regularization of Persistent Homology Gradient Computation

Persistent homology is a method for computing the topological features present in given data. Recently, there has been much interest in integrating persistent homology as a computational step in neural networks or deep learning. For a computation to be integrated in this way, it must be differentiable. Computing the gradients of persistent homology is an ill-posed inverse problem with infinitely many solutions. Consequently, it is important to perform regularization so that the solution obtained agrees with known priors. In this work we propose a novel method for regularizing persistent homology gradient computation through the addition of a grouping term. This helps ensure that gradients are defined with respect to larger entities rather than individual points.

Research Directions to Validate Topological Models of Multi-Dimensional Data
Nello Blaser • Michael Aupetit

Various topological models of multi-dimensional data have been proposed for different applications. One of the main issues is to evaluate how correct these models are given the stochastic nature of the data source typical of exploratory data analysis and machine learning settings. We propose research directions to validate the quality of the Mapper and the Generative Simplicial Complex, two models that compute simplicial complexes from multi-dimensional data.

Simplicial 2-Complex Convolutional Neural Networks
Eric Bunch • Qian You • Glenn Fung • Vikas Singh

Recently, neural network architectures have been developed to accommodate data that has the structure of a graph or, more generally, a hypergraph. While useful, graph structures can be limiting. Hypergraph structures in general do not account for higher-order relations between their hyperedges. Simplicial complexes offer a middle ground, with a rich theory to draw on. We develop a convolutional neural network layer on simplicial 2-complexes.

Simplicial Neural Networks
Stefania Ebli • Michaël Defferrard • Gard Spreemann

We present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices—allowing us to consider richer data, including vector fields and $n$-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes. Code and data are available at https://github.com/stefaniaebli/simplicial_neural_networks.
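
Convolutions on simplicial complexes are commonly built from Hodge Laplacians, which generalize the graph Laplacian to $k$-simplices. Let $B_1$ (vertices by edges) and $B_2$ (edges by triangles) be the oriented boundary matrices. The Hodge Laplacians are then $L_0 = B_1 B_1^\top$, $L_1 = B_1^\top B_1 + B_2 B_2^\top$ and $L_2 = B_2^\top B_2$. The sketch below builds these matrices for a small complex. The helper names are hypothetical, and the exact convolution filters used in the paper are not reproduced here.

```python
import numpy as np

def boundary_matrices(num_vertices, edges, triangles):
    """Oriented boundary matrices B1 (vertices x edges) and B2 (edges x triangles).

    Simplices are sorted vertex tuples; each is oriented by increasing vertex order.
    """
    B1 = np.zeros((num_vertices, len(edges)))
    for k, (a, b) in enumerate(edges):
        B1[a, k], B1[b, k] = -1.0, 1.0  # boundary of [a, b] is b - a
    edge_index = {e: k for k, e in enumerate(edges)}
    B2 = np.zeros((len(edges), len(triangles)))
    for k, (a, b, c) in enumerate(triangles):
        # boundary of [a, b, c] is [b, c] - [a, c] + [a, b]
        B2[edge_index[(b, c)], k] += 1.0
        B2[edge_index[(a, c)], k] -= 1.0
        B2[edge_index[(a, b)], k] += 1.0
    return B1, B2

def hodge_laplacians(B1, B2):
    """Hodge Laplacians L0, L1, L2 of a simplicial 2-complex."""
    return B1 @ B1.T, B1.T @ B1 + B2 @ B2.T, B2.T @ B2

edges = [(0, 1), (0, 2), (1, 2)]
for triangles in ([], [(0, 1, 2)]):  # hollow, then filled triangle
    B1, B2 = boundary_matrices(3, edges, triangles)
    L0, L1, L2 = hodge_laplacians(B1, B2)
    print(len(edges) - np.linalg.matrix_rank(L1))  # dim ker L1: 1, then 0
```

Filling the triangle removes the harmonic 1-cycle: the kernel of $L_1$ drops from dimension 1 to 0, matching the first Betti number. Meanwhile, $L_0$ stays the ordinary graph Laplacian.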

Teaspoon: A comprehensive python package for topological signal processing
Audun D Myers • Melih Yesilli • Sarah Tymochko • Firas Khasawneh • Elizabeth Munch

The emerging field of topological signal processing brings methods from Topological Data Analysis (TDA) to create new tools for signal processing by incorporating aspects of shape. In this paper, we present an overview of the Python package teaspoon, which brings together available software for computing persistent homology, the main workhorse of TDA, with modules that expand the functionality of teaspoon as a state-of-the-art topological signal processing tool. These modules include methods for incorporating tools from machine learning, complex networks, information, and parameter selection, along with a dynamical systems library to streamline the creation and benchmarking of new methods. All code is open source with up-to-date documentation, making the code easy to use, in particular for signal processing experts with limited experience in topological methods.

Topological Convolutional Neural Networks
Ephy Love • Benjamin Filippenko • Vasileios Maroulas • Gunnar E. Carlsson

There is considerable interest in making convolutional neural networks (CNNs) that learn on less data, are better at generalizing, and are more easily interpreted. This work introduces the Topological CNN (TCNN), which encompasses several topologically defined convolutional methods. Manifolds with important relationships to the natural image space are used to parameterize image filters which are used as convolutional weights in a TCNN. These manifolds also parameterize slices in layers of a TCNN across which the weights are localized. We show evidence that TCNNs learn faster, on less data, with fewer learned parameters, and with greater generalizability and interpretability than conventional CNNs.

Topological Echoes of Primordial Physics in the Universe at Large Scales
Alex Cole • Matteo Biagetti • Gary Shiu

We present a pipeline for characterizing and constraining initial conditions in cosmology via persistent homology. The cosmological observable of interest is the cosmic web of large scale structure, and the initial conditions in question are non-Gaussianities (NG) of primordial density perturbations. We compute persistence diagrams and derived statistics for simulations of dark matter halos with Gaussian and non-Gaussian initial conditions. For computational reasons and to make contact with experimental observations, our pipeline computes persistence in sub-boxes of full simulations and simulations are subsampled to uniform halo number. We use simulations with large NG ($f_{\rm NL}^{\rm loc}=250$) as templates for identifying data with mild NG ($f_{\rm NL}^{\rm loc}=10$), and running the pipeline on several cubic volumes of size $40~(\textrm{Gpc/h})^{3}$, we detect $f_{\rm NL}^{\rm loc}=10$ at $97.5\%$ confidence on $\sim 85\%$ of the volumes for our best single statistic. Throughout we benefit from the interpretability of topological features as input for statistical inference, which allows us to make contact with previous first-principles calculations and make new predictions.

TOTOPO: Classifying univariate and multivariate time series with Topological Data Analysis
Pilyugina Polina • Rodrigo Rivera-Castro • Evgeny Burnaev

This work is devoted to a comprehensive study of topological data analysis for time series classification. Previous works have significant shortcomings, such as a lack of large-scale benchmarking or missing state-of-the-art methods. In this work, we propose TOTOPO for extracting topological descriptors from different types of persistence diagrams. The results suggest that TOTOPO significantly outperforms existing baselines in terms of accuracy. TOTOPO is also competitive with the state of the art, being the best on 20% of univariate and 40% of multivariate time series datasets. This work validates the hypothesis that TDA-based approaches are robust to small perturbations in data and are useful for cases where periodicity and shape help discriminate between classes.

Using topological autoencoders as a filtering function for global and local topology
Filip Cornell

Choosing a suitable filtering function for the Mapper algorithm can be difficult due to its arbitrariness and domain-specific requirements. Finding a general filtering function that can be applied across domains is therefore of interest, since it would improve the representation of manifolds in higher dimensions. In this extended abstract, we propose that topological autoencoders are a suitable candidate for this and report initial results strengthening this hypothesis for one set of high-dimensional manifolds. The results indicate a potential for an easier choice of filtering function when using the Mapper algorithm, allowing for a more general and descriptive representation of high-dimensional data.

Witness Autoencoder: Shaping the Latent Space with Witness Complexes
Simon Till Schönenberger • Anastasiia Varava • Vladislav Polianskii • Jen Jen Chung • Danica Kragic • Roland Siegwart

We present a Witness Autoencoder (W-AE), an autoencoder that captures geodesic distances of the data in the latent space. Our algorithm uses witness complexes to compute geodesic distance approximations on a mini-batch level, and leverages topological information from the entire dataset while performing batch-wise approximations. This way, our method can capture the global structure of the data even with a small batch size, which is beneficial for large-scale real-world data. We show that our method captures the structure of the manifold more accurately than the recently introduced topological autoencoder (TopoAE).