Abstracts: Spike Sorting and Reproducibility for Next Generation Electrophysiology

24 and 25 June 2019, Informatics Forum, University of Edinburgh

Talks

James Jun: Drift-resistant, real-time spike sorting based on anatomical similarity for high channel-count silicon probes

Extracellular electrophysiology can record the activity of a large neural population with single spike resolution. Spike sorting is then needed to resolve individual cellular activities by grouping together similar spike waveforms distributed at a subset of adjacent electrodes. Silicon probes are widely used to measure the spiking activities from behaving animals, but the probes can drift in the brain due to animal movements or tissue relaxation following a probe penetration. Probe drift often causes errors in conventional spike sorting methods that assume stationarity in spike waveforms and amplitudes. However, some of the latest silicon probes offer whole-shank coverage by electrodes of sufficient density that algorithms could potentially compensate for such drift along the probe axis. We introduce a drift-resistant spike sorting algorithm, IronClust, for high channel-count, high-density silicon probes, which accurately handles both gradual and rapid drift. We exploit the fact that a linearly drifting probe revisits anatomical locations at later times. We apply density-based clustering to temporal subsets of the spiking events when the probe occupied similar anatomical locations. This anatomical similarity between disjoint time segments is determined from activity histograms, which capture spatial structures in the spike amplitude distribution on each electrode, prior to clustering. For each spiking event, the clustering algorithm (DPCLUS) computes the distances to a subset of its neighbors selected by their peak channel locations, and by anatomical similarity. Based on the k-nearest neighbors, one then finds the density peaks based on the local density values, and the nearest distances to the higher-density neighbors, and recursively propagates the cluster memberships toward a decreasing density gradient. 
The accuracy of our algorithm was evaluated using validation datasets generated with a biophysically detailed neural network simulator (BioNet), covering four conditions: stationary, slow monotonic drift, sinusoidal drift, and fast random drift.

IronClust achieved ~8% error on the stationary dataset, and ~10% error on the gradual or random drift datasets, which significantly outperformed existing algorithms. We also found that additional columns of electrodes improve the sorting accuracy in all cases. By exploiting GPU code, IronClust achieves over 11x real-time sorting speed for 60 channels, which is over 2x faster than other leading algorithms. In conclusion, we present an accurate and scalable spike sorting tool that is resistant to probe drift, by taking advantage of anatomically-aware clustering and parallel computing.
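
The density-peak step described in this abstract can be illustrated with a small sketch. This is not the IronClust or DPCLUS code; it is a generic k-nearest-neighbor density-peak clustering under simplifying assumptions: all pairwise distances are computed (no neighbor selection by peak channel or anatomical similarity), the local density is taken as the inverse distance to the k-th neighbor, and the number of clusters is given by hand. The function name and the parameters `k` and `n_clusters` are hypothetical.

```python
import numpy as np

def density_peak_cluster(points, k=5, n_clusters=2):
    """Toy density-peak clustering on an (n, d) array of spike features."""
    n = len(points)
    # All pairwise Euclidean distances (acceptable only for small n).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Local density: inverse of the distance to the k-th nearest neighbor
    # (column 0 of the sorted distances is the point itself).
    knn_dist = np.sort(dist, axis=1)[:, k]
    rho = 1.0 / (knn_dist + 1e-12)
    # For every point, find the nearest point of higher density.
    order = np.argsort(-rho)          # indices sorted by decreasing density
    delta = np.full(n, np.inf)        # the densest point keeps delta = inf
    parent = np.full(n, -1)
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Density peaks: points that are both dense and far from any denser point.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Propagate memberships from dense to sparse points: each unlabeled
    # point inherits the label of its nearest higher-density neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

Calling `density_peak_cluster(features, k=5, n_clusters=2)` on a two-column feature array returns one integer label per spike. Because points are visited in order of decreasing density, every point's higher-density neighbor is already labeled when the point is reached.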

Olivier Winter: Spike Sorting Roadmap for Large Scale Neurophysiology at IBL

The International Brain Laboratory (IBL) is a collaboration amongst 21 neuroscience laboratories to explore the neural basis of decision making in the mouse brain. A standard neurophysiology experiment is replicated throughout the collaboration to provide statistical power and wide brain coverage. We expect on the order of 600-900 experimental sessions per year, each using 2 Neuropixels probes providing 384 recording sites each. We’ve built a data architecture to centralize and standardize the storage and availability of this electrophysiology data.

Spike sorting is paramount for the accuracy of subsequent analysis, but the larger than usual scope of the collaboration poses several challenges to overcome:

First, spike sorting results need to be replicable and consistent across subjects and labs so we can combine experiments in an overarching analysis. The dispersion of spike sorting results also has to be quantified using robust and useful metrics for quality control.
Another crucial point is to reduce the need for manual curation, not only to spare researchers a cumbersome task, but also to avoid a subjective and non-reproducible part of the spike-sorting process.
Finally, we need to support several versions of spike sorting corresponding to different algorithms and manual curations, as we need to assume the state of the art will evolve during the project lifespan.

Hernan Rey: How do we pick the right spike sorting approach for a given application?

In the last 15 years there have been unprecedented advances in the development of new technology to perform electrophysiological recordings, and there are more to come in the near future. The high volume of data that will be generated poses great challenges for spike sorting algorithms. As new approaches are proposed, I’ll emphasise in this talk the need for tools to compare their performance efficiently. However, I’ll also show that for a given application, certain features might be more important than others when it comes to choosing a particular sorting approach. Moreover, I’ll show that despite the increase in the channel count of new probes, there are still several applications in which single-electrode recordings are routinely used, such as invasive human recordings performed with depth electrodes implanted in patients with epilepsy or chronic implants for brain-machine interfaces.

Espen Hagen: Forward-model based generation of test data with ground truth for spike-sorting algorithms using ViSAPy and LFPy

New, silicon-based multielectrodes comprising hundreds of electrode contacts such as NeuroPixels probes (Jun et al. (2017) Nature 551:232-236) offer the possibility to record spike trains from thousands of neurons simultaneously. Automated methods for spike sorting data from such devices are being developed, but their validation requires benchmarking data sets with known ground-truth spike times. I will first present a general simulation tool for computing benchmarking data for evaluation of spike-sorting algorithms entitled ViSAPy (Virtual Spiking Activity in Python, https://github.com/espenhgn/ViSAPy, Hagen et al. (2015) J Neurosci Meth 245:182-204). The tool is based on a well-established biophysical forward-modeling scheme and is implemented as a Python package built on top of the neuronal simulator NEURON (https://neuron.yale.edu) and the Python tool LFPy (https://lfpy.rtfd.io; https://github.com/LFPy/LFPy; Linden et al. (2014) Front Neuroinf 7:41; Hagen et al. (2018) Front Neuroinf 12:92). In contrast to other efforts, history-dependent spike waveforms are accounted for by continuous simulations of activity in biophysically detailed multicompartment neuron models. I will also talk about the potential to generate test data with recurrent networks of multiple classes of such neuron models using a recent extension to networks in LFPy.

André Marques Smith: Recording from the same neuron using Neuropixel probes and patch-clamp - a ground-truth dataset

Extracellular recordings have a long history in neuroscience but methods to analyse them and sort action potentials from each neuron are notoriously challenging to develop and use. Recent advances in technology are enabling neuroscientists to record from up to 1,000 channels per probe, which is exciting but has also made solving analytical problems a key priority. For this reason, we obtained and shared a ground-truth dataset where we recorded from the same neuron in anaesthetised rats using patch-clamp and Neuropixels CMOS probes. This dataset is openly available and our goal is for it to be used to help develop and improve spike-sorting algorithms and other aspects of extracellular neurophysiology. In my talk I will present the results of initial exploratory and descriptive analyses on this dataset, particularly on the detectability of patch-clamp spikes on the extracellular probe, within-unit reliability of spike features and spatiotemporal dynamics of the action potential waveform across multiple channels.

Olivier Marre: Spike sorting: a few lessons from the retina

The retina has been termed “an approachable part of the brain”. Here I will show that this applies very well to spike sorting: many methods developed to sort spikes recorded with multi-electrode arrays in the retina have later proven useful for processing cortical recordings. I will review several methodological findings in the retina from our group and others, and show how they are also valid when addressing the more difficult problem of sorting data from cortical recordings. In particular, I will argue that clustering approaches need to be complemented by template matching, especially to properly reconstruct cross-correlograms between pairs of neurons. I will also discuss the caveats associated with the generation of “hybrid” ground truth datasets, which might only be fully resolved by the acquisition of real ground truth datasets. Finally, I will discuss why the quantitative comparison of spike sorting might need to be complemented by an active discussion on the performance metrics.

Pierre Yger: A real-time spike sorting software for hundreds of electrodes

Recent technological advances have made it possible to record simultaneously from tens to thousands of electrodes packed with high density. To analyze these extracellular data, scalable, accurate and semi-automated algorithms have been proposed to sort spikes from hundreds of recorded cells. However, these algorithms do not allow tracking the activity of individual neurons during the experiment, since the entire processing is run offline. This is a limitation for experiments where some decisions of the experimentalist could be guided by the recent activity of the recorded cells, and more generally for the design of closed-loop experiments. To address this issue we designed an online spike sorting software that accurately sorts spikes in real time for up to hundreds of electrodes. Our algorithm identifies the template waveforms and their spike times by combining a clustering algorithm and a greedy template matching algorithm. It handles well-known spike sorting issues such as misalignments in the spike detection or overlapping spike waveforms. The online clustering procedure can follow slow changes of the templates over time caused by slow drifts of the electrodes. Depending on the number of electrodes to process, it can be run on a few desktop computers that communicate through Ethernet ports.
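
To make the idea of greedy template matching concrete, here is a minimal single-channel sketch. It is not the software presented in the talk: it assumes the templates are already known, uses a fixed amplitude of 1 for every spike, and ignores multiple electrodes and drift. The function name and the `max_spikes` parameter are hypothetical. At each step it picks the template and time shift that most reduce the squared residual, subtracts that template, and repeats, which is how overlapping waveforms can be peeled apart.

```python
import numpy as np

def greedy_template_matching(signal, templates, max_spikes=100):
    """Return sorted (time, template_index) pairs and the final residual."""
    residual = np.asarray(signal, dtype=float).copy()
    norms = [float(np.dot(w, w)) for w in templates]
    spikes = []
    for _ in range(max_spikes):
        best_gain, best_k, best_t = 0.0, None, None
        for k, w in enumerate(templates):
            # gain[t] is the decrease in squared residual obtained by
            # subtracting template k at offset t: 2 * <r, w> - ||w||^2.
            gain = 2.0 * np.correlate(residual, w, mode="valid") - norms[k]
            t = int(np.argmax(gain))
            if gain[t] > best_gain:
                best_gain, best_k, best_t = gain[t], k, t
        if best_k is None:
            break  # no template placement improves the fit any further
        w = templates[best_k]
        residual[best_t:best_t + len(w)] -= w
        spikes.append((best_t, best_k))
    return sorted(spikes), residual
```

On noisy data one would normally require the gain to exceed a noise-dependent threshold rather than zero; that refinement is left out to keep the sketch short.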

Jeremy Magland: SpikeForest: a web-based spike sorting validation platform and analysis framework

As the collection of automated spike sorting software packages continues to grow, there is much uncertainty and folklore about the quality of their performance in various experimental conditions. Several papers report comparisons on a case-by-case basis, but there is a lack of standardized measures and validation data. Furthermore, there is a potential for bias, such as sub-optimal tuning of competing algorithms, and a focus on one brain region or probe type. Without a fair and transparent comparison, genuine progress in the field remains difficult.

We have addressed this challenge by developing SpikeForest, a reproducible, continuously updating platform which benchmarks the performance of spike sorting codes across a large curated database of electrophysiological recordings with ground truth. With contributions from over a dozen participating labs, our database includes hundreds of recordings, in various brain regions, with thousands of ground truth units (and growing). As well as extracellular recordings with paired intracellular ground truth, we include state-of-the-art simulated recordings, and hybrid synthetic datasets.

In collaboration with the SpikeInterface project, we have wrapped many popular sorting algorithms (including HerdingSpikes2, IronClust, JRCLUST, KiloSort, Kilosort2, Klusta, MountainSort4, SpyKING CIRCUS, Tridesclous, and YASS) under a common Python interface that performs automatic caching of results and guarantees reproducibility via Singularity containers and transparency of parameter choices. This also enables researchers themselves to install and run all tested sorters with a single interface.

The large scale of our analysis demands an automatically updating nightly batch on a high performance compute cluster, where hundreds of sorting jobs are run in parallel (> 20000 CPU/GPU hours). Results are uploaded to a MongoDB database which is then accessed by our public-facing web site. This site allows intuitive comparison of metrics (precision, recall, overall accuracy, and runtime) across all sorters and recordings, and interactive visual “drilling down” into each sorting output at the single unit or event channel-trace level. The web technology is built on Node.js/React. In cooperation with SpikeInterface, the SpikeForest framework will continuously validate community progress in automated spike sorting, and guide neuroscientists to an optimal choice of sorter and parameters for a wide range of probes and brain regions.

With James Jun, Liz Lovero, Alex Morley, Alex Barnett.
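
As an illustration of the metrics named in this abstract, the sketch below compares one sorted unit against one ground-truth unit. It is not SpikeForest code. It assumes spike times in seconds, a hypothetical matching tolerance `tol`, one-to-one matching by a simple sweep over the sorted times, and one common definition of accuracy, namely matches divided by matches plus misses plus false positives.

```python
import numpy as np

def compare_to_ground_truth(gt_times, sorted_times, tol=0.001):
    """Precision, recall and accuracy for one sorted unit vs. ground truth."""
    gt = np.sort(np.asarray(gt_times, dtype=float))
    st = np.sort(np.asarray(sorted_times, dtype=float))
    i = j = n_match = 0
    # Two-pointer sweep: each spike is matched at most once.
    while i < len(gt) and j < len(st):
        if abs(gt[i] - st[j]) <= tol:
            n_match += 1
            i += 1
            j += 1
        elif gt[i] < st[j]:
            i += 1   # ground-truth spike with no partner (a miss)
        else:
            j += 1   # sorted spike with no partner (a false positive)
    n_miss = len(gt) - n_match
    n_fp = len(st) - n_match
    total = n_match + n_miss + n_fp
    return {
        "precision": n_match / len(st) if len(st) else 0.0,
        "recall": n_match / len(gt) if len(gt) else 0.0,
        "accuracy": n_match / total if total else 0.0,
    }
```

In a full comparison each ground-truth unit would first be paired with its best-matching sorted unit; this sketch covers only the scoring of a single pair.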

Alessio Paolo Buccino: MEArec: a fast and customizable testbench simulator for ground-truth extracellular spiking activity

When recording neural activity from extracellular electrodes, both in-vivo and in-vitro, spike sorting is a required and essential processing step that allows for identification of single neurons’ activity. In recent years there has been an extensive development of methods and software packages that attempt to solve the spike sorting problem. However, spike sorting validation is complicated because it is an unsupervised problem in nature and it is hard to find universal metrics to evaluate its performance. While simultaneous recordings that combine extracellular and patch-clamp or juxtacellular techniques can provide useful ground-truth data, their use is limited by the fact that only a few cells can be measured at the same time. Alternatively, simulated ground-truth recordings can be used in order to evaluate and rank the performance of spike sorters. In this talk I will present MEArec, a Python based software which permits a flexible and fast simulation of extracellular recordings. MEArec allows users to generate extracellular recordings on various customizable electrode designs and can replicate various problematic aspects for spike sorting, such as bursting modulation and drifting. I will first introduce the design scheme of the simulator and then show some examples for using it as a testbench platform, in combination with SpikeInterface, for spike sorting development and evaluation.

Gerrit Hilgen: Large-scale, high-density recordings of light responses from mouse retinal ganglion cells

The advent of large-scale, high-density multielectrode arrays (MEAs) allows not only simultaneous recording from thousands of neurons but also sampling from a large area. These devices give unprecedented opportunities to decipher the encoding in small and large networks, without variability caused by pooling data from multiple experiments. Using the high-density large-scale CMOS-based Active Pixel Sensor (APS) MEA (Biocam, 3Brain, Landquart, CH) featuring 4,096 electrodes (42 µm spacing) arranged in a 64x64 configuration that covers an active area of 2.67x2.67 mm, we can record at nearly pan-retinal level from the RGC layer in the mouse retina. Here I present past and recent studies of light-evoked activity recorded simultaneously from 100s-1,000s of individual RGCs at pan-retinal level in the neonatal and adult mouse retina. I will show the ontogeny, from eye opening to maturity, of light-driven responses in mouse retinal ganglion cells. The responses to different stimuli not only revealed a complex developmental profile for ON, OFF and ON-OFF responses, but also unveiled differences between dorsal and ventral RGC responses, likely reflecting ecological requirements. Further, I will present a novel, interdisciplinary approach for classifying RGCs using pharmacogenetics combined with large-scale retinal electrophysiology and posthoc anatomy. This approach allows us to unravel how RGCs sharing gene expression respond to complex visual scenes. These findings were only possible because of the large scale of the MEA and the high-density distribution of the electrodes.

The projects received financial support from the Leverhulme Trust (RPG-2016-315), the BBSRC (BB/I023526/ and BB/P018440/1) and the 7th Framework Programme for Research of the European Commission, under Grant agreement no 600847: RENVISION project of the Future and Emerging Technologies (FET) programme.

Samuel Garcia: Reproducible evaluation of different spike sorters using spiketoolkit - concept and examples

Following recent progress in CMOS technology, a wave of scalable spike sorting algorithms has been developed and their respective papers published. In each paper, the authors benchmark their algorithm and compare its results to those of other popular sorting solutions. As the number and complexity of sorting algorithms increase, however, fair, exhaustive comparisons become difficult to achieve. This has left users (and even developers) with a fuzzy understanding of the strengths and weaknesses of new algorithms. In order to address this problem, we introduce SpikeToolkit, a Python package developed within the SpikeInterface framework. SpikeToolkit offers users a standardized way to launch and retrieve results from many popular spike sorting algorithms. Currently, we include wrappers for Kilosort, Kilosort2, Klusta, SpyKING CIRCUS, IronClust, HerdingSpikes2, MountainSort4, and Tridesclous, allowing developers to run new solutions with little effort. SpikeToolkit also provides a collection of standardized metrics and tools to compare the results of different sorters with and without ground-truth. Taken together, we provide a simple, fair, and efficient pipeline for comparing different spike sorters.

This presentation will include two examples of the SpikeToolkit pipeline applied to public, ground-truth datasets: an in-vitro recording (from a mouse retina) and in-silico recording (simulated with MEArec).

David Jaeckel: Challenges and Opportunities of Analyzing MEA Recordings of iPSC-derived Neurons

The availability of induced pluripotent stem cells (iPSCs) has had an enormous impact on the field of biological research and drug development. Providing an in-vitro model of human tissue, iPSCs have become an important tool for modeling and investigating human diseases. Today, cultured iPSC-derived neurons from healthy and diseased donors are widely used for drug screening, drug testing and to assess human safety of compounds for brain diseases. Several techniques exist to study and characterize cultured iPSC-derived neurons. High-density microelectrode arrays (HD-MEAs) allow measuring the network activity over weeks and at high spatiotemporal resolution. In the talk, I will present how the CMOS-based HD-MEA MaxOne can be used to measure activity from iPSC-derived neurons at various levels. I will discuss the importance of spike sorting for the analysis of neuronal data in the context of drug research. Furthermore, I will describe the challenges of analyzing data from iPSC-derived neurons, as compared to recordings from other preparations, such as primary neuronal cultures. The talk will conclude by describing the requirements that a spike sorter for high-throughput drug research must meet.

Felix Franke: Template-matching-based spike sorting for large scale high-density electrode arrays. Past and future challenges

Cole Hurwitz: SpikeInterface: An open-source framework for sorting, analysis, and evaluation of extracellular recordings

Recent breakthroughs in microelectronics have enabled high precision extracellular recording of thousands of neurons both in vitro and in vivo. While the increased data volume and complexity offers unprecedented opportunities for understanding brain function, it also heightens the need for standardized, reproducible analysis techniques. To this end, we developed SpikeInterface, a framework for extracting and analyzing relevant information from both raw and spike-sorted extracellular datasets of any established file format. SpikeInterface was designed to standardize how data is retrieved from files, rather than how it is stored, allowing users to access, sort, and analyze extracellular datasets with the same tools, regardless of the underlying file format. With this framework, we hope to facilitate standardized analysis and visualization of extracellular data, promote straightforward reuse of extracellular datasets, increase the reproducibility of electrophysiological studies using spike sorting software, and address issues of file format compatibility within electrophysiology research without creating yet another file format.

Posters

Carving out functional types of retinal ganglion cells from large-scale recordings of marmoset retinas

Fernando Rozenblit, Dario Protti, Tim Gollisch, University Medical Center Göttingen

Online template-matching based spike sorting for microelectrode arrays with hundreds of channels

Baptiste Lefebvre, Olivier Marre and Pierre Yger, Institut de la vision

Understanding how assemblies of neurons encode information requires recording of large populations of cells in the brain. In recent years, multi-electrode arrays and large silicon probes have been developed to record simultaneously from thousands of electrodes packed with a high density. These devices challenge the classical way to perform spike sorting. We developed a fast and accurate spike sorting algorithm, SpyKING CIRCUS, validated both with in vivo and in vitro ground truth experiments. This software appears as a general solution to sort, offline, spikes from large-scale extracellular recordings.

Here we present an implementation of this algorithm in an “online” mode, sorting spikes in real time while the data are acquired, to allow closed-loop experiments for high density electrophysiology. To achieve this goal, we built a robust architecture for distributed asynchronous computations and we propose a modified algorithm composed of two concurrent processes running continuously: a “template-finding” process to extract the cell templates, and a “template-matching” process to identify the spike times. This software appears as a promising solution for closed-loop experiments involving recordings with hundreds of electrodes.

Distributed extracellular recordings from 11 cortical regions in freely moving rats

Lorenza Calcaterra, UCL

SWAP: Slow Waves Analysis Pipeline

Giulia De Bonis, Istituto Nazionale di Fisica Nucleare (INFN)

Cortical slow oscillations are an emergent property of the cortical network, a hallmark of low complexity brain states like sleep, and represent a default activity pattern. We present a methodological approach for quantifying the spatial and temporal properties of this emergent activity. The analysis pipeline, named SWAP (Slow Waves Analysis Pipeline), detects Up and Down states, enables the characterization of the spatial dependency of their statistical properties, and supports the comparison of different subjects. The SWAP is implemented in a data-independent way, allowing its application to other data sets (acquired from different subjects, or with different recording tools), as well as to the outcome of numerical simulations. By using SWAP, we report statistically significant differences across cortical areas and cortical sites in slow oscillations (SO) observed in traces recorded from a custom 32-channel multi-electrode array in wild-type isoflurane-anesthetized mice. Computing cortical maps by interpolating the features of SO acquired at the electrode positions, we give evidence of gradients at the global scale along an oblique axis directed from fronto-lateral towards occipito-medial regions, further highlighting some heterogeneity within cortical areas.

Skellam Process with Resetting: A Statistical model for Simultaneously Recorded Neural Spike Trains

Reza Ramezan, University of Waterloo

With recent developments in data recording, the need for efficient multivariate statistical models for neural spike trains is both crucial and pressing. Among limited multivariate point process models for neurophysiological data is the newly developed biologically justified Skellam Process with Resetting (SPR). Motivated by the spike generation process and neural integration, SPR provides a flexible framework for the analysis of multiple spike trains. It is shown, through simulations and real data analyses, that SPR is a powerful point process model for the analysis of simultaneously recorded spike trains. The effect of misclassification due to spike sorting on the performance of the algorithm is also discussed.
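
The abstract does not spell out the model, so the following is only a toy reading of its name and motivation, not the authors' formulation. It assumes a single neuron whose excitatory and inhibitory inputs arrive as independent Poisson processes, so that their running difference is Skellam-distributed; a spike is emitted when that difference reaches a threshold, after which the count resets to zero. All names and parameters (`rate_exc`, `rate_inh`, `threshold`) are hypothetical, and the multivariate aspect of the model is not covered.

```python
import numpy as np

def simulate_spr(rate_exc, rate_inh, threshold, duration, seed=0):
    """Simulate spike times from a toy integrate-and-reset counting model."""
    rng = np.random.default_rng(seed)
    total_rate = rate_exc + rate_inh
    p_exc = rate_exc / total_rate
    t, level, spikes = 0.0, 0, []
    while True:
        # The merged input stream is Poisson with the summed rate.
        t += rng.exponential(1.0 / total_rate)
        if t >= duration:
            break
        # Each input is excitatory (+1) or inhibitory (-1).
        level += 1 if rng.random() < p_exc else -1
        if level >= threshold:
            spikes.append(t)
            level = 0  # resetting after each spike
    return np.array(spikes)
```

For example, `simulate_spr(300.0, 100.0, threshold=5, duration=10.0)` yields an increasing array of spike times within the 10-second window.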

Human pluripotent stem cell derived neuronal networks

Juha Heikkil, Tampere University

YASS

Peter Lee, Columbia University

Scalable Spike Source Localization in Dense Recordings using Amortized Variational Inference

Cole Hurwitz, University of Edinburgh

Tridesclous and pyacq: a real time spike sorting engine

Christophe Pouzat, Samuel Garcia, Centre de Recherche en Neuroscience de Lyon

Tridesclous is a new package for spike sorting, suitable for both online and offline use. Tridesclous is based on template matching, which resolves the problem of spike collision. The software also includes a user interface specifically designed to optimize the results. With critical elements of the processing chain written in OpenCL, which enables GPU computing, Tridesclous is fast and scalable, and enables spike sorting for high channel counts (>100).

pyacq is a framework for distributed acquisition/processing/visualization. It allows processing “Nodes” to be distributed across multiple machines, and interfaces with several devices commonly used in the field of multi-electrode electrophysiology (Blackrock, Multichannel, Measurement Computing, National Instruments, and soon the Intan Technologies board). This makes it possible to grab multi-channel signal chunks in real time, distribute them across a dedicated network and sort them with Tridesclous in real time. Together, these two packages provide experimentalists with a customisable system for real-time spike sorting without sacrificing sorting quality.

https://github.com/tridesclous/tridesclous

General Framework for Tracking Sparse and Drifting Neurons Over Long Term Recordings

Fernando J Chaure and Hernan G Rey, University of Leicester