As I did for past COSYNEs (2009, 2010, 2011), here is a summary of my personal experience this year. I loved both the main meeting and the workshops; it's definitely one of the best conferences. I had to present my own posters, so I couldn't see many others, and this selection is therefore severely subsampled. Also, I may not be remembering some details correctly. If you spot any mistakes, please let me know.

## Neural dynamics in neural coding

II-67. Jeffrey Seely, Matthew T. Kaufman, John Cunningham, Stephen Ryu, Krishna Shenoy, Mark Churchland. Dimensionality in motor cortex: differences between models and experiment

Is population activity in the motor cortex better explained by a tuning curve for each neuron, or by linear dynamics? To answer this question, they collapsed the trials of each experimental condition (motor output) into a firing-rate histogram, temporally interpolated so that all conditions have the same length. This gives a large 3-D array $A(n,c,t)$ indexed by neuron $n$, condition $c$, and time bin $t$, which can be sliced for two different low-rank approximations (PCA): one along conditions, which corresponds to tuning-curve-like structure, and one along time, which represents dynamical modes (basis solutions to a differential equation). They greedily chose whichever component, condition or dynamics, explained the most variance, subtracted it, and repeated. Real data turned out to be mostly dynamics, while tuning-based models generated data that were mostly condition-based. This is a pretty convincing argument using only very basic tools.
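Here is a minimal NumPy sketch of the greedy comparison as I understood it from the poster, not the authors' code. The function names are hypothetical, $A$ is assumed to be already mean-subtracted, and each "component" is taken to be the best rank-1 approximation of one unfolding of the array: unfolding to (neurons·time) × conditions gives a basis vector over conditions, and unfolding to (neurons·conditions) × time gives a basis vector over time.

```python
import numpy as np

def rank1_energy_and_approx(M):
    """Best rank-1 approximation of M (via SVD) and the variance it explains."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return s[0] ** 2, s[0] * np.outer(U[:, 0], Vt[0])

def greedy_mode_selection(A, n_steps):
    """A: (neurons, conditions, time), assumed mean-subtracted.
    At each step keep the condition-mode or time-mode rank-1 component that
    explains more variance, subtract it from the residual, and repeat."""
    N, C, T = A.shape
    R = A.astype(float).copy()
    picks = []
    for _ in range(n_steps):
        # Condition mode: (N*T) x C unfolding, singular vectors live over conditions.
        e_c, Mc = rank1_energy_and_approx(R.transpose(0, 2, 1).reshape(N * T, C))
        # Time mode: (N*C) x T unfolding, singular vectors live over time.
        e_t, Mt = rank1_energy_and_approx(R.reshape(N * C, T))
        if e_c >= e_t:
            picks.append(("condition", e_c))
            R = R - Mc.reshape(N, T, C).transpose(0, 2, 1)
        else:
            picks.append(("time", e_t))
            R = R - Mt.reshape(N, C, T)
    return picks, R

# Toy data: every neuron/condition pair follows one shared time course,
# so a single time-mode component should explain all of the variance.
rng = np.random.default_rng(0)
A_toy = rng.standard_normal((20, 8))[:, :, None] * np.sin(np.linspace(0, np.pi, 50))
picks, residual = greedy_mode_selection(A_toy, n_steps=1)
```

Swapping the roles (one fixed condition profile shared by all neurons, arbitrary time courses) makes the condition mode win instead.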

Workshop talk: David Sussillo. Rethinking gating: selective integration of sensory signals through network dynamics

Frontal Eye Field (FEF) spiking responses in a colored random dots task are analyzed within a dynamical systems framework. In this task, a contextual cue determines whether the monkey has to use the direction of the dots or the majority color to make the decision. (This talk is related to Valerio Mante's poster II-58, which I missed in the main meeting.) The question is how the monkey switches context: is there some sort of gating mechanism that controls whether the motion stimulus or the color stimulus reaches FEF? Or does all the information reach FEF, where the decision is formed?

Directions in the population firing rate (state) space are extracted by regression on the conditional firing rates: $r(t) = \beta_1(t)\, \text{choice} + \beta_2(t)\, \text{color} + \beta_3(t)\, \text{motion} + \text{interaction terms}$ (the neurons were recorded one at a time; they assume independence). The trajectory in the neural state space (reconstructed from the $\beta$'s) shows integration along the relevant stimulus dimension, while the irrelevant stimulus is encoded but not integrated.

They further built a recurrent neural network model trained with Hessian-free optimization [James Martens, Ilya Sutskever, Learning Recurrent Neural Networks with Hessian-Free Optimization, ICML 2011], and saw similar performance and dynamics by tuning only the stimulus noise level. He further showed a fixed point analysis of the trained network and a non-normal matrix decomposition to explain the integration along the line attractor. A similar talk given jointly by Mante and Sussillo at the Santa Fe Institute can be found online. EDIT: they gave another related talk at the Redwood Center, with online video.
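The regression step can be sketched with ordinary least squares, fit separately for each neuron and time bin. This is a minimal sketch under my own assumptions, not the authors' pipeline: the names `regression_axes` and `project` are hypothetical, interaction terms are dropped, all neurons are pretended to share the same trials (in the real data each neuron has its own trials), and the axes are simply orthogonalized with a QR decomposition at one reference time bin.

```python
import numpy as np

def regression_axes(rates, choice, color, motion):
    """rates: (trials, neurons, time); task variables: (trials,) arrays.
    Fits r = b0 + b1*choice + b2*color + b3*motion per neuron and time bin.
    Returns betas of shape (4, neurons, time); betas[k, :, t] is the population
    direction for regressor k at time t."""
    n_trials, n_neurons, n_time = rates.shape
    X = np.column_stack([np.ones(n_trials), choice, color, motion])  # design matrix
    Y = rates.reshape(n_trials, n_neurons * n_time)  # one column per (neuron, time)
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return betas.reshape(4, n_neurons, n_time)

def project(mean_traj, betas, t_ref):
    """Project a condition-averaged trajectory (neurons, time) onto orthonormalized
    choice/color/motion axes taken at time bin t_ref (QR keeps the choice axis first)."""
    Q, _ = np.linalg.qr(betas[1:, :, t_ref].T)  # neurons x 3 orthonormal basis
    return Q.T @ mean_traj                      # 3 x time
```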

## Learning to fire a precise temporal pattern from spike train input

I was pleased to find 3 very cool posters on learning the synaptic weights of an integrate-and-fire neuron driven by spike train input. It's the comeback of the tempotron (Gütig & Sompolinsky, Nature Neuroscience 2006)!

I-22. Robert Gütig. The multi-class tempotron: a neuron model for processing of sensory streams

The tempotron is a binary classifier: given an input spike pattern from many neurons, it indicates the class by either emitting a spike or not. Robert extended the tempotron to learn to fire a prescribed number of spikes, rather than just one or none. This is done by considering the rank-ordered peaks of the membrane voltage simultaneously, whereas the original tempotron deals only with the maximum peak. He showed that this can be used to detect an event or feature in time. Training requires only the number of occurrences of the event (it does not require tagging the time series with precise timings).
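For reference, here is a minimal sketch of the original binary tempotron rule, not Robert's multi-spike extension: when the neuron errs, each weight moves by its input's PSP at the time of the voltage maximum. The time constants, learning rate, toy data, and function names are my assumptions; each input fires a single spike for simplicity, and post-spike shunting is ignored since the binary decision only depends on whether the peak crosses threshold.

```python
import numpy as np

TAU, TAU_S = 15.0, 3.75               # membrane / synaptic time constants in ms (assumed)
T_GRID = np.arange(0.0, 500.0, 1.0)   # 1 ms grid over a 500 ms input pattern

def psp_kernel(t):
    """Double-exponential PSP kernel, scaled to peak at 1 and zero for t < 0."""
    t_peak = TAU * TAU_S / (TAU - TAU_S) * np.log(TAU / TAU_S)
    v0 = 1.0 / (np.exp(-t_peak / TAU) - np.exp(-t_peak / TAU_S))
    tt = np.maximum(t, 0.0)  # avoid evaluating the exponentials at negative times
    return np.where(t >= 0, v0 * (np.exp(-tt / TAU) - np.exp(-tt / TAU_S)), 0.0)

def train_tempotron(patterns, labels, theta=1.0, lr=0.02, max_epochs=2000, seed=0):
    """patterns: list of arrays holding one spike time per input neuron.
    labels: 1 = the neuron should fire at least once, 0 = it should stay silent."""
    n_inputs = len(patterns[0])
    w = np.random.default_rng(seed).normal(0.0, 0.01, n_inputs)
    # K[t, i] = PSP of input i at time t, so the voltage is V(t) = K @ w.
    kernels = [psp_kernel(T_GRID[:, None] - p[None, :]) for p in patterns]
    for _ in range(max_epochs):
        n_errors = 0
        for K, y in zip(kernels, labels):
            v = K @ w
            t_max = int(np.argmax(v))
            if (v[t_max] >= theta) != bool(y):
                n_errors += 1
                # Push the voltage peak up (missed target) or down (false alarm)
                # in proportion to each input's PSP at the time of the peak.
                w += (lr if y else -lr) * K[t_max]
        if n_errors == 0:
            break
    return w, kernels

def classify(w, K, theta=1.0):
    return bool((K @ w).max() >= theta)

# Toy problem: 10 random patterns over 100 inputs, half of them labeled "fire".
rng = np.random.default_rng(1)
patterns = [rng.uniform(0.0, 500.0, size=100) for _ in range(10)]
labels = [1, 0] * 5
w, kernels = train_tempotron(patterns, labels)
accuracy = np.mean([classify(w, K) == bool(y) for K, y in zip(kernels, labels)])
```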

II-23. Raoul-Martin Memmesheimer, Ran Rubin, Haim Sompolinsky. Learning precisely timed spiking responses

One of the main advantages of the tempotron is that it can use time as an extra degree of freedom, allowing a higher capacity (# of patterns / synapse) than the traditional perceptron. However, this is also a disadvantage, because the precise timing of the output spikes is not controlled. This poster describes a couple of simple iterative procedures for updating the weights and threshold of an integrate-and-fire (IF) neuron to produce a desired spiking pattern. The tricky part of such a task is the reset: an erroneous spike resets the membrane potential and changes the voltage trace that follows. Two algorithms, (1) first error learning and (2) high threshold learning, are proposed to overcome this difficulty. They showed that, as with the perceptron, the algorithms converge to a solution whenever one exists.
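A minimal sketch of one plausible reading of "first error learning", not the authors' code: simulate the neuron with resets, locate the earliest time bin where the output disagrees with the target, and make a perceptron-like update there, since everything after that point depends on resets that may themselves be wrong. Here timing must match bin-for-bin on a 1 ms grid (the poster's tolerance handling may differ), only the weights are updated (not the threshold), and the time constants and function names are assumptions.

```python
import numpy as np

TAU_M, TAU_S, DT = 10.0, 2.5, 1.0   # assumed time constants (ms) and time step

def psp_traces(input_spikes, n_steps):
    """P[t, i] = summed double-exponential PSPs of input i at time step t."""
    t = np.arange(n_steps) * DT
    P = np.zeros((n_steps, len(input_spikes)))
    for i, times in enumerate(input_spikes):
        for s in times:
            d = t - s
            m = d >= 0
            P[m, i] += np.exp(-d[m] / TAU_M) - np.exp(-d[m] / TAU_S)
    return P

def simulate(w, P, theta=1.0):
    """IF neuron with reset: each output spike subtracts theta from the voltage,
    and that subtraction decays with TAU_M (spike-response-model style)."""
    drive = P @ w
    out = np.zeros(len(drive), dtype=bool)
    reset = 0.0
    for t in range(len(drive)):
        reset *= np.exp(-DT / TAU_M)
        if drive[t] - reset >= theta:
            out[t] = True
            reset += theta
    return out

def first_error(out, target):
    """Index of the earliest time step where output and target disagree, or None."""
    bad = np.flatnonzero(out != target)
    return int(bad[0]) if bad.size else None

def first_error_learning(P, target, theta=1.0, lr=0.05, max_iter=5000, seed=0):
    """Correct only the earliest error on each iteration."""
    w = np.random.default_rng(seed).normal(0.0, 0.1, P.shape[1])
    for _ in range(max_iter):
        t_err = first_error(simulate(w, P, theta), target)
        if t_err is None:
            return w, True
        # Missed desired spike: potentiate; spurious spike: depress.
        w += (lr if target[t_err] else -lr) * P[t_err]
    return w, False
```

The returned flag reports whether the target pattern was reproduced within `max_iter` updates.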

II-39. Ran Rubin, Raoul-Martin Memmesheimer, Haim Sompolinsky. Support Vector Machines in Spiking Neurons with Non-Linear Dendrites

This is a companion poster to II-23. They extend the method to find a robust solution by maximizing the margin. Using an auxiliary voltage trace that assumes the resets happened a short time before each desired spike (as in the high threshold learning algorithm), they formulated the problem as a constrained, SVM-like optimization problem. They also proposed active dendrites as nonlinear positive semi-definite kernels (a pointwise nonlinearity applied to the original inner product).
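Why a pointwise nonlinearity on the inner product can still be a valid kernel: if $f$ has a Taylor series with nonnegative coefficients, the Gram matrix $f(\langle x_i, x_j\rangle)$ is a nonnegative sum of elementwise powers of a PSD matrix, which is PSD by the Schur product theorem. The quick numerical check below is my own illustration (the name `dendritic_kernel` is hypothetical); a nonlinearity such as tanh, which has negative Taylor coefficients, carries no such guarantee.

```python
import numpy as np

def dendritic_kernel(X, Y, f=np.exp):
    """k(x, y) = f(<x, y>): a pointwise nonlinearity applied to the linear inner product."""
    return f(X @ Y.T)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5)) / np.sqrt(5)
G_exp = dendritic_kernel(X, X)                              # exp: all coefficients positive
G_poly = dendritic_kernel(X, X, f=lambda u: (1.0 + u) ** 3)  # polynomial with positive coefficients
# Smallest eigenvalues should be nonnegative up to round-off.
min_eigs = [np.linalg.eigvalsh(G).min() for G in (G_exp, G_poly)]
```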

## Probabilistic modeling based neural/stimulus distances

Given a generative model for the data, similarity can be measured with divergences. Given a probabilistic model of a spiking neuron population, one can measure the similarity either between stimuli or between population responses; there were two posters, one for each idea, both using the Ising model.

I-7. Elad Ganmor, Ronen Segev, Elad Schneidman. Semantic organization of a neural population codebook and accurate decoding using a neural thesaurus

Trial-to-trial variability of the population response $P(r|s)$ was captured by an Ising model. Using Bayes' rule to obtain $P(s|r)$, they measured the Jensen-Shannon divergence between posteriors: $d(r_1, r_2) = D_{JS}(P(s|r_1), P(s|r_2))$ (this is not a metric, but its square root is). They considered only instantaneous responses (20 neurons, 10 ms bins, binarized), with no temporal structure. Applying hierarchical clustering to the test response patterns (forming the codebook), they showed that this method captures most of the mutual information with just a few clusters.
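A small sketch of the distance computation, assuming $P(r|s)$ is available as an explicit table (a hypothetical toy table stands in for the fitted Ising model here, and the function names are mine): Bayes' rule gives the posteriors, and the square root of their JS divergence gives a proper metric between response patterns.

```python
import numpy as np

def posteriors(p_r_given_s, p_s):
    """Bayes' rule. p_r_given_s: (n_stimuli, n_patterns); p_s: (n_stimuli,).
    Returns P(s | r) with shape (n_patterns, n_stimuli); every pattern must
    have nonzero probability."""
    joint = p_r_given_s * p_s[:, None]                  # P(s, r)
    return (joint / joint.sum(axis=0, keepdims=True)).T

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (between 0 and 1)."""
    m = 0.5 * (p + q)
    def kl(a, b):
        nz = a > 0
        return np.sum(a[nz] * np.log2(a[nz] / b[nz]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def response_distances(p_r_given_s, p_s):
    """Pairwise sqrt-JS distances between the posteriors of all response patterns."""
    post = posteriors(p_r_given_s, p_s)
    n = post.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.sqrt(max(js_divergence(post[i], post[j]), 0.0))
    return D

# Toy table: 3 stimuli x 4 response patterns (rows sum to 1), uniform prior.
p_r_given_s = np.array([[0.7, 0.2, 0.1, 0.0],
                        [0.1, 0.6, 0.2, 0.1],
                        [0.0, 0.1, 0.3, 0.6]])
D = response_distances(p_r_given_s, np.ones(3) / 3)
```

The matrix `D` could then be passed, in condensed form via `scipy.spatial.distance.squareform`, to `scipy.cluster.hierarchy.linkage` to build a codebook; whether the poster used this exact clustering procedure I don't know.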

I-35. Gasper Tkacik, Einat Granot-Atedgi, Ronen Segev, Elad Schneidman. Retinal metric: a stimulus distance measure derived from population neural responses

They used the symmetric Kullback-Leibler divergence between the stimulus-conditioned response distributions as a similarity measure in the stimulus space: $d(s_1, s_2) = D_{KL}^{sym}(P(r|s_1) \,\|\, P(r|s_2))$. The conditional distribution was modeled with a stimulus-driven maximum entropy Ising model in which only the fields depend on the stimulus, while the pairwise interaction terms do not: $P(r|s) = \frac{1}{Z(s)} \exp\left(\sum_i h_i(s) r_i + \sum_{i \neq j} J_{ij} r_i r_j\right)$. They did not use the JS divergence because it is difficult to compute for the Ising model. This similarity reveals which features of the stimulus the population really cares about.
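One reason the symmetric KL is convenient here (my own derivation from the model above, not necessarily the authors' argument): because $J$ is shared, $\log P(r|s_1) - \log P(r|s_2) = (h(s_1) - h(s_2))^\top r - \log Z(s_1) + \log Z(s_2)$, and the partition functions cancel in the symmetrized sum, leaving $D_{KL}^{sym} = (h(s_1) - h(s_2))^\top (\langle r \rangle_{s_1} - \langle r \rangle_{s_2})$. Only mean responses are needed, which can be estimated by sampling. The JS divergence involves the log of a mixture and does not simplify this way. The check below enumerates all states of a small, randomly generated model (function names are hypothetical).

```python
import itertools
import numpy as np

def ising_distribution(h, J):
    """Exact P(r) over all r in {0,1}^n (small n only). J is symmetric with zero
    diagonal, so r @ J @ r equals sum_{i != j} J_ij r_i r_j."""
    states = np.array(list(itertools.product([0, 1], repeat=len(h))), dtype=float)
    log_p = states @ h + np.einsum("ki,ij,kj->k", states, J, states)
    p = np.exp(log_p - log_p.max())
    return states, p / p.sum()

def symmetric_kl_bruteforce(h1, h2, J):
    _, p1 = ising_distribution(h1, J)
    _, p2 = ising_distribution(h2, J)
    return np.sum((p1 - p2) * np.log(p1 / p2))  # KL(p1||p2) + KL(p2||p1)

def symmetric_kl_closed_form(h1, h2, J):
    """(h1 - h2) . (<r>_1 - <r>_2); here the means come from exact enumeration."""
    states, p1 = ising_distribution(h1, J)
    _, p2 = ising_distribution(h2, J)
    return (h1 - h2) @ (states.T @ p1 - states.T @ p2)
```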

## Extending Spike Triggered Covariance

There were 3 closely related talks on spike triggered covariance (STC) analysis in the Characterizing Neural Responses to Structured and Naturalistic Stimuli workshop, organized by Kanaka Rajan and William Bialek.

Jonathan Pillow

His talk focused on Empirical Bayes (EB) methods for inference in hierarchical models. The first part was about Mijung Park's work on designing priors for receptive fields that are localized in space-time and in frequency. The second part, which was brief due to time constraints, was about a Bayesian extension of STC in which the number of receptive fields is inferred by EB.
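To make the EB idea concrete, here is a sketch with the simplest possible prior, an isotropic Gaussian (ridge) prior on a linear receptive field, rather than the localized priors from Mijung's work. The prior hyperparameter is chosen by maximizing the marginal likelihood (evidence), and the receptive field estimate is then the posterior mean. The function names, the grid search, and the known noise variance are all assumptions for illustration.

```python
import numpy as np

def log_evidence(X, y, alpha, sigma2):
    """log p(y | alpha, sigma2) for y = X w + noise, w ~ N(0, I/alpha), noise ~ N(0, sigma2 I).
    Marginally, y ~ N(0, C) with C = sigma2 I + X X^T / alpha."""
    n = len(y)
    C = sigma2 * np.eye(n) + (X @ X.T) / alpha
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, y)            # y^T C^{-1} y = z^T z
    return -0.5 * (z @ z) - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

def empirical_bayes_ridge(X, y, alphas, sigma2):
    """Pick the prior precision with the highest evidence, then return it together
    with the posterior mean receptive field under that prior."""
    scores = [log_evidence(X, y, a, sigma2) for a in alphas]
    a_best = alphas[int(np.argmax(scores))]
    d = X.shape[1]
    w_post = np.linalg.solve(X.T @ X + sigma2 * a_best * np.eye(d), X.T @ y)
    return a_best, w_post
```

Richer priors such as localized ones change only the prior covariance inside `C`; the evidence-maximization step stays the same.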

William Bialek

He talked about the full history from reverse correlation (de Boer 1968), to STC, to maximizing mutual information. He introduced Kanaka's work (arXiv:1201.0321v1 [q-bio.NC], also poster III-34, which I missed) on maximizing the mutual information between a quadratic projection of the stimulus and the response. This is an interesting extension of MID (see Sharpee's talk below). MID tends to degrade as the number of dimensions to extract increases, but their method seems to work better.
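For context on why MID gets harder with more dimensions, here is a sketch of the histogram estimate of information per spike carried by a single projection $x = s \cdot v$, namely $I(v) = \sum_x P(x|\text{spike}) \log_2 [P(x|\text{spike}) / P(x)]$, which MID maximizes over $v$. With $K$ projections, the histogram needs on the order of (bins)$^K$ cells. Quantile binning, the bin count, the toy model, and the function name are my assumptions.

```python
import numpy as np

def info_per_spike(stimuli, spikes, v, n_bins=20):
    """Histogram estimate (bits/spike) of the information carried by x = s . v.
    stimuli: (n_samples, dim); spikes: (n_samples,) spike counts."""
    x = stimuli @ v
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))  # equal-occupancy bins
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    p_x = np.bincount(idx, minlength=n_bins) / len(x)
    p_x_spk = np.bincount(idx, weights=spikes, minlength=n_bins) / spikes.sum()
    nz = p_x_spk > 0
    return np.sum(p_x_spk[nz] * np.log2(p_x_spk[nz] / p_x[nz]))

# Toy LN neuron driven only by the first stimulus dimension.
rng = np.random.default_rng(0)
S = rng.standard_normal((20000, 10))
k = np.eye(10)[0]
spikes = rng.binomial(1, 1 / (1 + np.exp(-(3 * S @ k - 2))))
info_true = info_per_spike(S, spikes, k)               # along the true filter
info_other = info_per_spike(S, spikes, np.eye(10)[1])  # along an irrelevant direction
```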

Tatyana Sharpee

Maximally informative dimensions (MID) aims to find the receptive fields of a linear-nonlinear cascade, regardless of the nonlinearity, by maximizing mutual information. This is an ideal goal; however, estimating and maximizing mutual information is very difficult in practice (as in the case of the information bottleneck), and implementations suffer from local optima and from the (histogram) parameterization. She presented an approach from the opposite direction: minimizing mutual information (or equivalently, maximizing the conditional entropy of the response given the stimulus). Using a maximum entropy model with the first two moments constrained, she derived a quadratic form of logistic regression as the model to fit to binary spiking responses: $P(spike|s) = \frac{1}{1+\exp(a + h \cdot s + s^\top J s)}$. This is closely related to our BSTC work, which has a similar quadratic-form Poisson regression model as a special case. (I missed the related poster by Ryan Rowekamp et al., II-35.)

Ref: J. D. Fitzgerald, R. J. Rowekamp, L. C. Sincich and T. O. Sharpee (2011), "Second order dimensionality reduction using minimum and maximum mutual information models", PLoS Computational Biology, 7(10): e1002249, doi:10.1371/journal.pcbi.1002249
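A minimal sketch of fitting this quadratic logistic model by plain gradient descent on the Bernoulli negative log-likelihood, using the sign convention of the formula above. The optimizer, learning rate, iteration count, and function names are my choices, not necessarily what Fitzgerald et al. used. The model is ordinary logistic regression on the features $[1, s_i, s_i s_j\ (i \le j)]$, so off-diagonal coefficients are split evenly between $J_{ij}$ and $J_{ji}$.

```python
import numpy as np

def quad_features(S):
    """Rows of [1, s_i, s_i s_j for i <= j]."""
    n, d = S.shape
    iu = np.triu_indices(d)
    quad = (S[:, :, None] * S[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([np.ones((n, 1)), S, quad]), iu

def fit_quadratic_logistic(S, y, lr=0.1, n_iter=2000):
    """Gradient descent for P(spike | s) = 1 / (1 + exp(a + h.s + s^T J s)).
    Returns a, h and a symmetric J."""
    Phi, iu = quad_features(S)
    theta = np.zeros(Phi.shape[1])   # coefficients of -(a + h.s + s^T J s)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Phi @ theta))  # model spike probability
        theta -= lr * Phi.T @ (p - y) / len(y)  # gradient of the mean NLL
    d = S.shape[1]
    a, h = -theta[0], -theta[1:d + 1]
    U = np.zeros((d, d))
    U[iu] = -theta[d + 1:]
    return a, h, (U + U.T) / 2

def nll(S, y, a, h, J):
    """Mean negative log-likelihood of binary spikes y under the model."""
    f = a + S @ h + np.einsum("ni,ij,nj->n", S, J, S)
    # -log P(spike) = log(1 + e^f); -log P(no spike) = log(1 + e^f) - f
    return np.mean(np.logaddexp(0.0, f) - (1.0 - y) * f)
```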

III-36. Brett Vintch, Andrew D Zaharia, Tony Movshon, Eero P Simoncelli. Fitting receptive fields in V1 and V2 as linear combinations of nonlinear subunits

I missed this one, but it is also highly related. Their low-complexity model generates the kind of filter set that is typically obtained from STC on V1 complex cells.
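To see how a subunit model can produce an STC-like set of filters, here is a hypothetical 1-D sketch (much simpler than the poster's model, and the function names are mine): one subunit filter is applied at every spatial shift, squared, and linearly combined. With a squaring nonlinearity, the response equals $s^\top M s$ with $M = \sum_j w_j k_j k_j^\top$, where $k_j$ is the filter shifted to position $j$. So the model has as many relevant quadratic dimensions as active subunits, all generated by a single filter.

```python
import numpy as np

def subunit_response(stim, k, weights, nonlin=np.square):
    """stim: (n_samples, n_pixels); k: (filter_len,);
    weights: (n_pixels - filter_len + 1,), one weight per subunit position."""
    # 'valid' correlation applies the same subunit filter at every shift.
    subunits = np.stack([np.correlate(s, k, mode="valid") for s in stim])
    return nonlin(subunits) @ weights

def equivalent_quadratic_form(k, weights, n_pixels):
    """For a squaring nonlinearity: M = sum_j w_j k_j k_j^T with k shifted to position j."""
    M = np.zeros((n_pixels, n_pixels))
    for j, w in enumerate(weights):
        kj = np.zeros(n_pixels)
        kj[j:j + len(k)] = k
        M += w * np.outer(kj, kj)
    return M
```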