This was my first time at CNS (the computational neuroscience conference, not to be confused with the cognitive neuroscience one with the same acronym). I was invited to give a talk at the workshop “Examining the dynamic nature of neural representations with the olfactory system”, organized by Chris Buckley, Thomas Nowotny, and Taro Toyoizumi. I presented my story on how bursting olfactory receptor neurons can form an instantaneous memory of the temporal structure of odor plume encounters, along with a bit of a related calcium imaging study. Below is my summary of the workshop talks I went to (the system identification and information theory workshops on the first day, and the olfactory workshop on the second day).
Garrett Stanley talked about system identification of the rat barrel cortex response to whisker deflection. He started by criticizing the white-noise Volterra series approach: it requires too much data. Instead, by designing a sequence of parametric stimuli that directly reveal 2nd and 3rd order interactions, he could fit a parametric form of the firing rate response with good predictive power [1]. As far as I can tell, it seemed like a rank-1 approximation of the 3rd order Volterra kernel. However, this model lacked the fine temporal latency, as well as the stimulus-intensity-dependent bimodal responses, which were later fixed by a better model with feedback [2].
Vladimir Brezina talked about modeling of feedback from muscle contractions onto a rhythmic central pattern generator in the crab heart. He used LNL and LN models to fit the response of 9 neurons and muscles in the crab heart. For the LNL system, he used a bilinear optimization of the squared error. However, for the spiking response of the LN model, instead of using the Bernoulli or Poisson likelihood (the GLM model), he used least squares to fit the parameters.
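To make the contrast between the two fitting criteria concrete, here is a minimal sketch on made-up data. It is not the model from the talk: the features, weights, exponential nonlinearity, and Newton solver are all assumptions chosen for illustration. It fits the same spike counts once by least squares and once by maximizing the Poisson likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 stimulus features plus a constant (bias) column.
n_samples = 5000
X = np.column_stack([rng.normal(size=(n_samples, 3)), np.ones(n_samples)])
w_true = np.array([0.5, -0.3, 0.2, -0.5])
y = rng.poisson(np.exp(X @ w_true))  # spike counts per bin

# Least squares: treats counts as a linear function of the features plus noise.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Poisson GLM with exponential nonlinearity:
# maximize sum(y * (X w) - exp(X w)) with Newton's method.
w_ml = np.zeros(X.shape[1])
for _ in range(25):
    rate = np.exp(X @ w_ml)
    gradient = X.T @ (y - rate)
    hessian = X.T @ (X * rate[:, None])
    w_ml = w_ml + np.linalg.solve(hessian, gradient)

print("true weights     :", w_true)
print("least squares    :", np.round(w_ls, 3))
print("Poisson likelihood:", np.round(w_ml, 3))
```

In this toy setting the likelihood fit recovers the generating weights, while the least-squares weights live on a different scale because they describe a linear, not exponential, relationship.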
Matthieu Louis gave a talk about optogenetically controlling the olfactory sensory neurons of Drosophila larvae. They built an impressive closed-loop system that can control the larva’s behavior as if it were in an odor gradient. They modeled the system as a black box with odor as input and behavior as output, skipping the model of the nervous system, and successfully predicted and controlled the behavior [3].
Daniel Coca talked about how fly photoreceptors can act as a nonlinear temporal filter that is optimized for detecting edges. He fit a NARMAX (nonlinear ARMA-X) model, analyzed it in the frequency domain, and found that the phase response is consistent with the phase congruency model for edge detection. Also, he explained how the system “linearizes” when stimulated with white Gaussian noise, although I couldn’t follow the details due to my lack of knowledge in nonlinear frequency domain analysis.
Tatyana Sharpee talked about sphere packing in the context of receptive fields of the retina, and conditional population firing rates of songbirds. For the receptive fields, she showed that to maximize the mutual information per unit lattice between a point source of light and the (binary) neural response of ganglion cells, elliptical receptive fields can help if the lattice is imperfect. For the songbird case, she showed that the noise correlation can change with training to improve separation (classification performance) of the conditional distributions, while the irrelevant stimuli became less separable.
Rava Azeredo da Silveira talked about how finely tuned correlation structure can immensely increase performance. Given two populations of neurons, each weakly tuned to a class (slightly higher firing rate for the preferred class), if the cross-population correlation is slightly higher than otherwise, the population response as a whole can be very certain about the class identity. He also talked about many other related things, such as asymptotics on required population size vs noise.
Shy Shoham talked about Linear-Nonlinear-Poisson (LNP) and Linear-Nonlinear-Hawkes (LNH) models, and how to relate spike train (output) correlations to Gaussian (input) correlations [4,5]. LNH has a similar form to GLM but the feedback is added outside the nonlinearity. He referred to the procedure of inferring the underlying latent AR process as correlation-distortion, and proposed to use it for studying neural point processes as AR models, and hence to apply Granger causality and other signal processing tools. He also talked about semi-blind system identification, where the goal is to infer the linear kernel of the model given only the autocorrelation of the input and the autocorrelation of the population spike trains (the phase ambiguity of the filter is resolved by choosing the minimum-phase filter).
Maxim Bazhenov talked about modeling the transient synchronization in the locust olfactory system as a network phenomenon (interaction between projection neurons (PNs) and local interneurons (LNs)). The pattern of synchronization of PNs over multiple LFP cycles is repeatable, and his model reproduces it. He showed an interesting illustration of the connectivity between LNs posed as a graph coloring problem [6]. Each cluster of LNs targets everybody outside their cluster, enabling synchrony within. The connectivity matrix effectively has blocks of zeros on the diagonal and ones off the diagonal, because they are inhibitory neurons.
Nitin Gupta gave a talk on lateral horn (LH) cells. The standard model has been that the inhibitory neurons in LH act as feed-forward inhibition to limit the integration time within the Kenyon cells (KCs). He identified a heterogeneous population of neurons in LH (see [7] for beautifully filled neurons). Among the ones that project to the mushroom body (where the KCs are), he found no evidence of GABA co-localization, suggesting that there is no feed-forward inhibition through LH. He proposed an alternative model for limiting integration time in KCs, namely feedback inhibition through the (non-spiking) GGNs.
Thomas Nowotny talked about how odor plume structure can help in separating mixtures of different sources, based on the results of [8]. He proposed a simple model of a lateral inhibition circuit among the glomeruli. The model showed counter-intuitive results for temporal mixtures of odors when linear decoding is used.
Kevin C. Daly gave a data-packed talk on the Manduca sexta (moth) olfactory system [9]. The oscillation he observed had a frequency modulation: it starts at a high frequency and quickly falls, and it is odor dependent. He criticized the use of continuous odor application, which may result in pathological responses (my wording), and instead showed responses to odor puffs. (Interestingly, the blank puffs decreased the response.) He also emphasized the importance of not cutting off the head of the animal, which preserves a pair of histamine neurons.
Aurel A. Lazar talked about a precise odor delivery system using laminar flows that can produce diverse temporal patterns of odor concentration with around 1% error. Using this system, they showed that the firing responses of the first two stages of the Drosophila olfactory pathway, receptor neurons and projection neurons, are both temporally differentiating. The two stages were not recorded simultaneously, but thanks to the repeatable stimuli and responses, the conclusion is well supported.
1. R. M. Webber and G. B. Stanley. Transient and steady-state dynamics of cortical adaptation, J. Neurophys., 95:2923-2932, 2006.
2. A. S. Boloori, R. A. Jenks, Gaelle Desbordes, and G. B. Stanley. Encoding and decoding cortical representations of tactile features in the vibrissa system, J. Neurosci., 30(30):9990-10005, 2010.
3. Gomez-Marin A, Stephens GJ, Louis M. Active sampling and decision making in Drosophila chemotaxis. Nature Communications 2:441. doi: 10.1038/ncomms1455 (2011).
4. Michael Krumin, Shy Shoham. Generation of Spike Trains with Controlled Auto- and Cross-Correlation Functions. Neural Computation. June 2009, Vol. 21, No. 6, Pages 1642-1664
5. Michael Krumin, Inna Reutsky, Shy Shoham. Correlation-Based Analysis and Generation of Multiple Spike Trains Using Hawkes Models with an Exogenous Input. Front Comput Neurosci. 2010; 4: 147
6. Assisi C, Stopfer M, Bazhenov M. Using the structure of inhibitory networks to unravel mechanisms of spatiotemporal patterning. Neuron. 2011 Jan 27;69(2):373-86.
7. Nitin Gupta, Mark Stopfer. Functional Analysis of a Higher Olfactory Center, the Lateral Horn. Journal of Neuroscience, 13 June 2012, 32(24): 8138-8148; doi: 10.1523/JNEUROSCI.1066-12.2012
8. Paul Szyszka, Jacob S. Stierle, Stephanie Biergans, C. Giovanni Galizia. The Speed of Smell: Odor-Object Segregation within Milliseconds. PLoS ONE, Vol. 7, No. 4. (27 April 2012), e36096, doi:10.1371/journal.pone.0036096
9. Daly KC, Galán RF, Peters OJ and Staudacher EM (2011) Detailed characterization of local field potential oscillations and their relationship to spike timing in the antennal lobe of the moth Manduca sexta. Front. Neuroeng. 4:12. doi: 10.3389/fneng.2011.00012
Il Memming Park: On halting problem route to incompleteness
Kenneth Latimer: On Roger Penrose’s Emperor’s new mind
Michael Buice: Algebra of Probable Inference
Ryan Usher: An Incomplete, Inconsistent, Undecidable and Unsatisfiable Look at the Colloquial Identity and Aesthetic Possibilities of Math or Logic
Jonathan Pillow: Do we live inside a Turing machine?
- Simulated human brain brings consciousness (“substance independence”)
- Large scale simulation of human brain + physical world around human is possible
- Alan Turing. (1936) On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, Series 2, 42: 230–265
- Michael Sipser. Introduction to the Theory of Computation (Memming’s halting problem proof followed this one)
- Roger Penrose. Emperor’s new mind
- Torkel Franzén. Gödel’s Theorem: An Incomplete Guide to Its Use and Abuse (recommended by Ryan)
- Richard T. Cox. Algebra of Probable Inference
- Cox, R. (1946). Probability, frequency and reasonable expectation. American Journal of Physics, 14(1), 1–13.
- E.T. Jaynes. Probability Theory: The Logic of Science
- Martin Davis. The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions (Dover Books on Mathematics) (collection of papers)
- Martin Davis. Computability and Unsolvability (Michael Buice: One of the most beautiful books written by humankind; an introduction to recursive function theory, computability, and Turing machines. One of the few books which does so in a complete and rigorous manner; also covers logic and Gödel’s theorem.)
- Bostrom, N. , 2003, Are You Living in a Computer Simulation?, Philosophical Quarterly (2003), Vol. 53, No. 211, pp. 243-255.
Primary olfactory receptor neurons (ORNs) bind to odor molecules in the medium and send action potentials to the brain. This signaling is not simply ON and OFF: each ORN has a delicate sensitivity to various odors and shows diverse temporal activation patterns. Using both electrophysiology and calcium-sensitive dye imaging, my collaborators Yuriy V. Bobkov and Kirill Y. Ukhanov studied the temporal aspect of lobster ORNs. The heterogeneous response patterns are well presented in a recent paper published in PLoS One. I was particularly interested in a special type of ORN called bursting ORNs. Bursting ORNs oscillate spontaneously, and the calcium imaging data allow population analysis. I was involved in the analysis to see if there’s any sign of synchrony, using a resampling-based burst-triggered averaging technique. It turns out that they rarely interact, if at all. Moreover, they have a wide range of oscillation periods. Since they are coupled through the environment (a filament of odor molecules in the medium), in natural environments or under controlled odor stimulation they sometimes synchronize, which is the subject of another paper under review.
Note: the publication actually has my first name as Ill instead of Il, which is silly and sick. I asked for a correction, but it seems PLoS One will only publish a note for the correction and not correct the actual article (because of the inconsistency it would cause for other indexing systems). This could have been fixed in the proof, if PLoS did proofs before final publication, but they don’t (presumably to lower costs). In my opinion, this is a flaw of PLoS journals. EDIT: there’s a note saying that my name is misspelled now.
As I did for past COSYNEs (2009, 2010, 2011), this is a summary of my personal experience this year. I loved both the main meeting and the workshops. It’s definitely one of the best conferences. I had to present my own posters, so I couldn’t see many others. Therefore, this selection is severely subsampled. Also, there might be details that I am not remembering correctly. If you spot any mistake, please let me know.
Neural dynamics in neural coding
II-67. Jeffrey Seely, Matthew T. Kaufman, John Cunningham, Stephen Ryu, Krishna Shenoy, Mark Churchland. Dimensionality in motor cortex: differences between models and experiment
Is population activity in the motor cortex well explained by tuning curves for each neuron, or is it better explained by linear dynamics? To answer this question, they collapsed each experimental condition (motor output) to a temporally interpolated histogram of the same length. A large 3-D array A(n,c,t) for n neurons, c conditions, and t time bins is constructed, and sliced with 2 different possible low-rank approximations (PCA): one across conditions, which implies tuning-curve-like characteristics, and the other across time, which represents the dynamic modes (basis solutions to a differential equation). They sequentially chose the component, either condition or dynamics, that explains the most variance, subtracted it, and repeated. They showed that real data are mostly explained by dynamics, while tuning-based models generate data that are mostly condition based. This is a pretty convincing argument using only very basic tools.
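The greedy procedure can be sketched roughly as follows. This is my loose reading of the description, not the authors' code: the two unfoldings, the rank-1 SVD step, and the synthetic data are all assumptions made for illustration. At each step the array is unfolded once with conditions as columns and once with time bins as columns, the rank-1 component capturing more variance is subtracted, and its type is recorded.

```python
import numpy as np

def best_rank1(mat):
    """Best rank-1 approximation of a matrix and the variance it captures."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return s[0] * np.outer(u[:, 0], vt[0]), s[0] ** 2

def greedy_decomposition(A, n_components=5):
    """Greedily peel condition-type or time-type components off A[n, c, t]."""
    n, c, t = A.shape
    residual = A.copy()
    labels, explained = [], []
    for _ in range(n_components):
        # Condition unfolding: rows are (neuron, time) pairs, columns are conditions.
        cond_mat = residual.transpose(0, 2, 1).reshape(n * t, c)
        # Time unfolding: rows are (neuron, condition) pairs, columns are time bins.
        time_mat = residual.reshape(n * c, t)
        cond_fit, cond_var = best_rank1(cond_mat)
        time_fit, time_var = best_rank1(time_mat)
        if cond_var >= time_var:
            residual = residual - cond_fit.reshape(n, t, c).transpose(0, 2, 1)
            labels.append("condition")
            explained.append(cond_var)
        else:
            residual = residual - time_fit.reshape(n, c, t)
            labels.append("time")
            explained.append(time_var)
    return labels, explained, residual

# Synthetic "dynamics-like" data: every (neuron, condition) trace is a mix of
# the same two temporal modes (hypothetical example, not real recordings).
rng = np.random.default_rng(0)
n, c, t = 20, 6, 30
phase = np.linspace(0, 2 * np.pi, t, endpoint=False)
modes = np.stack([np.sin(phase), np.cos(phase)])   # shape (2, t)
loadings = rng.normal(size=(n, c, 2))
A = np.einsum("nck,kt->nct", loadings, modes)

labels, explained, residual = greedy_decomposition(A, n_components=2)
print(labels)
```

For data built from shared temporal modes like this, the time-type components win at every step, which is the signature the poster attributes to the real recordings.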
Workshop talk: David Sussillo. Rethinking gating: selective integration of sensory signals through network dynamics
Frontal Eye Field (FEF) spiking responses during a colored random dots task, in which a contextual cue determines whether the monkey has to use the motion direction or the majority color of the dots to make the decision, were analyzed in a dynamical systems framework. (This talk is related to Valerio Mante’s poster II-58, but I missed it in the main meeting.) The question is how the monkey switches context: is there some sort of gating mechanism that controls whether the motion stimulus or the color stimulus reaches FEF? Or does all the information reach FEF, where the decision is formed? Directions in the population firing rate (state) space were extracted by regression on the conditional firing rates (the neurons were recorded one at a time; they assume independence). The trajectory in the neural state space (reconstructed from the regression coefficients) shows integration-like behavior along the relevant stimulus dimension, while the irrelevant dimension is encoded but not integrated. They further built a recurrent neural network model trained with Hessian-free optimization [James Martens, Ilya Sutskever, Learning Recurrent Neural Networks with Hessian-Free Optimization, ICML 2011] and saw similar performance and dynamics by tuning only the stimulus noise level. He further showed a fixed point analysis of the trained network and a non-normal matrix decomposition to explain the integration on the line attractor. A similar talk given jointly by Mante and Sussillo at the Santa Fe Institute can be found online. EDIT: they gave another related talk at the Redwood Center with online video.
Learning to fire a precise temporal pattern from spike train input
I was pleased to find 3 very cool posters related to learning synaptic weights of an integrate-and-fire neuron for spike train input. It is the comeback of the tempotron (Robert Gütig, Nature 2006)!
I-22. Robert Gütig. The multi-class tempotron: a neuron model for processing of sensory streams
The tempotron is a classifier that either emits a spike or not to indicate the class, given an input spike pattern from many neurons. Robert extended the tempotron to allow not just one or zero spikes, but to learn to fire a prescribed number of spikes. This is done by considering the rank-ordered membrane voltage peaks simultaneously, whereas the original tempotron only deals with the maximum peak. He showed that this could be used to detect an event or feature in time. The training is done by providing just the number of occurrences (it does not require tagging the time series with precise timings).
One of the main advantages of the tempotron is that it can use time as an extra degree of freedom, allowing a higher capacity (# of patterns / synapse) compared to the traditional perceptron. However, this is also a disadvantage because the precise timing of the spikes is not controlled. Poster II-23 describes a couple of simple iterative procedures for updating the weights and threshold of an integrate-and-fire (IF) neuron to produce a desired spiking pattern. The tricky part of such a task is the complication of the reset after erroneous spikes. Two algorithms, (1) first error learning and (2) high threshold learning, are proposed to overcome this difficulty. They showed that the algorithm converges to the solution in a similar fashion to the perceptron, if there is a solution.
II-39. Ran Rubin, Raoul-Martin Memmesheimer, Haim Sompolinsky. Support Vector Machines in Spiking Neurons with Non-Linear Dendrites
This is a companion poster to II-23. They extend the method to find a robust solution by maximizing the margin. Using an auxiliary voltage trace assuming the resets happened in a small time before the desired spike occurred (as in the high threshold learning algorithm), they formulated the problem as an SVM-like optimization problem with constraints. They also proposed active dendrites as nonlinear positive semi-definite kernels (point nonlinearity on the original inner product).
Probabilistic modeling based neural/stimulus distances
Measuring similarity given a generative system for the data can be done with divergences. Given a probabilistic spiking neuron population model, one can measure the similarity between the stimuli or between the population responses; there were two posters for each idea using the Ising model.
I-7. Elad Ganmor, Ronen Segev, Elad Schneidman. Semantic organization of a neural population codebook and accurate decoding using a neural thesaurus
Trial-to-trial variability of the population response was captured by an Ising model. Using Bayes’ rule, they measured the Jensen-Shannon divergence between the stimulus posteriors given two responses: d(r1, r2) = D_JS(P(s|r1) || P(s|r2)) (not a metric unless the square root is taken). They only consider the instantaneous response (20 neurons, 10 ms bin, binarized), and no temporal structure. Using hierarchical clustering (forming the codebook) on the test response patterns, they showed that such a method captures most of the mutual information with just a few clusters.
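As a small illustration of a Jensen-Shannon distance between responses, the sketch below uses a made-up likelihood table P(r|s) with three stimuli and three response patterns (hypothetical numbers, not the poster's Ising model). It applies Bayes' rule to get P(s|r) and then compares the posteriors of different responses.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p == 0 contribute nothing."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (symmetric, bounded by 1)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical encoding model: rows are stimuli s, columns are responses r.
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.6, 0.3],
                       [0.2, 0.2, 0.6]])

# Bayes' rule: P(s | r) = P(r | s) P(s) / P(r); each column is one posterior.
joint = likelihood * prior[:, None]
posterior = joint / joint.sum(axis=0, keepdims=True)

d01 = js_divergence(posterior[:, 0], posterior[:, 1])
d12 = js_divergence(posterior[:, 1], posterior[:, 2])
d02 = js_divergence(posterior[:, 0], posterior[:, 2])
print(d01, d12, d02)
```

Taking the square root of each value gives a proper metric, so the triangle inequality can be checked on the three square roots.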
I-35. Gasper Tkacik, Einat Granot-Atedgi, Ronen Segev, Elad Schneidman. Retinal metric: a stimulus distance measure derived from population neural responses
They used the symmetric Kullback-Leibler divergence between the stimulus-conditioned response distributions as a similarity measure between stimuli: d(s1, s2) = D_KL(P(r|s1) || P(r|s2)) + D_KL(P(r|s2) || P(r|s1)), for similarity in the stimulus space. The conditional distribution was modeled with a stimulus-driven maximum entropy Ising model where the higher-order interaction terms do not depend on the stimulus: P(r|s) ∝ exp(Σ_i h_i(s) r_i + Σ_{i<j} J_ij r_i r_j). They did not use the JS divergence because it is difficult to compute from the Ising model. This similarity reveals which features of the stimulus the population really cares about.
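For a handful of neurons, a stimulus distance of this kind can be computed by brute-force enumeration. The sketch below is a toy version under my own assumptions: four binary neurons, random stimulus-dependent fields `h`, and a shared random coupling matrix `J`. None of these numbers come from the poster, and the real model is fit to retinal data.

```python
import itertools
import numpy as np

def ising_distribution(h, J):
    """P(r) proportional to exp(h.r + sum_{i<j} J_ij r_i r_j) over all binary words r."""
    n = len(h)
    states = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    # J is strictly upper triangular, so r^T J r counts each pair once.
    log_weights = states @ h + np.einsum("ki,ij,kj->k", states, J, states)
    log_weights -= log_weights.max()  # numerical stability
    p = np.exp(log_weights)
    return p / p.sum()

def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p) in nats, for strictly positive p and q."""
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(1)
n_neurons = 4
J = np.triu(rng.normal(scale=0.5, size=(n_neurons, n_neurons)), k=1)  # shared couplings
h_s1 = rng.normal(size=n_neurons)  # fields driven by stimulus 1 (hypothetical)
h_s2 = rng.normal(size=n_neurons)  # fields driven by stimulus 2 (hypothetical)

p_s1 = ising_distribution(h_s1, J)
p_s2 = ising_distribution(h_s2, J)
distance = symmetric_kl(p_s1, p_s2)
print(distance)
```

Enumeration scales as 2^N, so this only works for small populations.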
Extending Spike Triggered Covariance
There were 3 very related talks about spike triggered covariance (STC) analysis in the Characterizing Neural Responses to Structured and Naturalistic Stimuli workshop organized by Kanaka Rajan and William Bialek.
His talk was focused on Empirical Bayes (EB) methods for the inference of hierarchical models. The first part was about Mijung Park’s work on spatio-temporal and frequency localized prior design for receptive fields. The second part, which was brief due to time constraints, was about a Bayesian extension of STC where the number of receptive fields is inferred by EB.
He talked about the full history from reverse correlation (de Boer 1968), to STC, to maximizing mutual information. He introduced Kanaka’s work (arXiv:1201.0321v1 [q-bio.NC], also poster III-34, but I missed it) on maximizing mutual information between quadratic projections of the stimulus and the response. This is an interesting extension of MID (see Sharpee’s talk below). MID tends to degrade as the number of dimensions to extract increases, but their method seems to work better.
Maximally informative dimensions (MID) aims at finding the receptive fields of a linear-nonlinear cascade regardless of the nonlinearity by maximizing mutual information. This is an ideal goal; however, estimation and maximization of mutual information is very difficult in practice (as in the case of the information bottleneck), and the implementation suffers from local optima and (histogram) parameterization. She presented an approach from the opposite direction: minimize mutual information (or equivalently, maximize the conditional entropy of the response given the stimulus). Using a maximum entropy model with the first two moments constrained, she derived a quadratic form of logistic regression as the model to fit to the binary spiking response: P(spike | s) = 1 / (1 + exp(a + h^T s + s^T J s)). This is closely related to our BSTC work, which has a similar quadratic form of Poisson regression model as a special case. (I missed the related poster by Ryan Rowekamp et al. II-35) Ref: J.D. Fitzgerald, R. J. Rowekamp, L.C. Sincich and T.O. Sharpee, (2011) “Second order dimensionality reduction using minimum and maximum mutual information models”, PLoS Computational Biology, 7(10): e1002249 doi:10.1371/journal.pcbi.1002249
III-36. Brett Vintch, Andrew D Zaharia, Tony Movshon, Eero P Simoncelli. Fitting receptive fields in V1 and V2 as linear combinations of nonlinear subunits
I missed this one, but this one is also highly related. They have a low complexity model that generates a set of filters that are generally obtained from STC on V1 complex cells.
Shannon’s entropy is a fundamental statistic that measures the uncertainty of a (discrete) distribution. It is a building block for mutual information which has numerous applications in statistics, communication, signal processing, machine learning and so on. In the context of neuroscience, entropy can measure the maximum capacity of a neuron, quantify the amount of noise, and also serve as a cost function for theoretical derivation of learning rules. Amount of information coded by neural spike trains about a stimulus can be measured by mutual information, and provides a fundamental limit for neural codes.
Unfortunately, estimating entropy or mutual information is notoriously difficult, especially when the number of observations is less than the number of possible symbols [1]. For neural data, this is often the case, due to the combinatorial nature of the symbols under consideration. If we consider binning a 100 ms window of spike trains from 10 neurons at a resolution of 1 ms, the total number of possible (binary) symbols becomes 2^1000 ≈ 10^301. Just to observe that many symbols once each, at one symbol per 100 ms window, one would need on the order of 10^292 years. Therefore, we must be clever. The question is how to extrapolate when you may have a severely under-sampled distribution.
In the literature, there have been many entropy estimators, and mutual information estimators based on them. We extend one of the best known entropy estimators, the NSB estimator [2,3], which is a Bayesian estimator with an approximately non-informative prior on entropy. This is achieved by mixing Dirichlet distributions appropriately. We have extended the procedure to the situation where the number of symbols with non-zero probability is unknown or arbitrarily large, by using a mixture of Pitman-Yor processes as the prior. The limit of the NSB estimator for infinitely many bins can be captured by a Dirichlet process mixture prior. The Pitman-Yor process is an extension of the Dirichlet process with an extra parameter. The advantage of using a Pitman-Yor mixture is that it can fit heavy-tailed distributions, and neural data (as well as many other natural phenomena) have heavy-tailed distributions. Our estimator shows significantly smaller bias for power-law-tailed generating processes as well as for spiking neural data.
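The size of the problem and the resulting bias are easy to check numerically. The sketch below counts the possible binary words for the binning described above, assuming each 1 ms bin holds 0 or 1 spike. It then shows how the naive plug-in entropy estimate falls short when a uniform distribution over a 1024-symbol alphabet (an arbitrary toy choice) is sampled only 100 times.

```python
import math
import numpy as np

# Number of possible binary words: 10 neurons x 100 bins of 1 ms each.
n_neurons, window_ms, bin_ms = 10, 100, 1
n_binary_bins = n_neurons * (window_ms // bin_ms)
n_symbols = 2 ** n_binary_bins
print(f"{n_binary_bins} binary bins -> about 10^{len(str(n_symbols)) - 1} symbols")

def plugin_entropy(samples):
    """Naive (maximum likelihood) entropy estimate in bits from observed symbols."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Under-sampled toy case: uniform distribution over 1024 symbols, only 100 samples.
rng = np.random.default_rng(0)
alphabet_size = 1024
true_entropy = math.log2(alphabet_size)  # 10 bits
small_sample = rng.integers(alphabet_size, size=100)
estimate = plugin_entropy(small_sample)
print(f"true entropy = {true_entropy:.2f} bits, plug-in estimate = {estimate:.2f} bits")
```

With N samples the plug-in estimate can never exceed log2(N) bits, which is why it is biased downward whenever the alphabet is larger than the sample.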
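To get a feel for why the extra Pitman-Yor parameter matters, here is a minimal sketch of the generalized Chinese restaurant construction. The function name and parameter values are my own choices for illustration, and this is not the estimator itself. With discount `d = 0` it reduces to the Dirichlet process, while `d > 0` keeps creating new symbols at a power-law rate, giving the heavy tails mentioned above.

```python
import numpy as np

def sample_pitman_yor_counts(n, d, alpha, rng):
    """Seat n customers by the Pitman-Yor Chinese restaurant rule; return table sizes.

    d is the discount (0 <= d < 1), alpha > 0 is the concentration parameter.
    """
    counts = []
    for i in range(n):
        k = len(counts)
        # A new table (unseen symbol) opens with probability (alpha + d*k) / (alpha + i).
        if rng.random() < (alpha + d * k) / (alpha + i):
            counts.append(1)
        else:
            # Otherwise join table j with probability proportional to (count_j - d).
            weights = np.array(counts, dtype=float) - d
            j = rng.choice(k, p=weights / weights.sum())
            counts[j] += 1
    return np.array(counts)

rng = np.random.default_rng(0)
n_samples = 2000
dp_counts = sample_pitman_yor_counts(n_samples, d=0.0, alpha=1.0, rng=rng)
py_counts = sample_pitman_yor_counts(n_samples, d=0.5, alpha=1.0, rng=rng)
print("distinct symbols, Dirichlet process:", len(dp_counts))
print("distinct symbols, Pitman-Yor (d=0.5):", len(py_counts))
```

The number of distinct symbols grows roughly logarithmically with the sample size for the Dirichlet process, and roughly as a power of the sample size when the discount is positive.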
If you’re at COSYNE 2012, details are presented as a poster titled “Bayesian entropy estimation for infinite neural alphabets” by Evan Archer, myself and Jonathan Pillow. Look for III-31 (Feb 25th, Saturday)
Update: preprint of this work can be found on the arXiv: Evan Archer*, Il Memming Park*, Jonathan Pillow. Bayesian Entropy Estimation for Countable Discrete Distributions. arXiv:1302.0328 (2013) (* equal contribution)
1. Liam Paninski. Estimation of Entropy and Mutual Information. Neural Computation, Vol. 15, No. 6. (1 June 2003), pp. 1191-1253, doi:10.1162/089976603321780272
2. I Nemenman, F Shafee, and W Bialek. Entropy and inference, revisited. NIPS 2001
3. I Nemenman, W Bialek, and R de Ruyter van Steveninck. Entropy and information in neural spike trains: Progress on the sampling problem. Phys. Rev. E, 69:056111, 2004.
In optimal experiment design (or active learning) one seeks an online strategy for function approximation (or system identification). It is particularly useful in situations where it is costly to obtain each sample. But what if the goal is to optimize a certain target instead of learning the entire function? For problems where parameter adjustment for maximum efficiency is required, for example, drug combination, neural micro-stimulation parameters, or aircraft design, one is often not interested in recovering the full system response, but only the optimal set of parameters. Therefore it makes sense to do active learning about the location of the optimal set of parameters, rather than about the full function.
So we decided to work on the problem under a Bayesian inference framework, and named the problem Active Bayesian Optimization (ABO). The main issue is the complexity of the posterior on the minimizer that we want to learn. Our effort based on approximation is briefly presented in this arXiv paper [1]. Unfortunately, we were not the first to think of the ABO problem. Villemonteix and colleagues [2] have presented the problem in a similar setup, using sampling techniques instead of approximation. We learned of this at the NIPS Bayesian optimization workshop (2011), where the referees told us about previous work. At the workshop, we also found another recent solution to the ABO problem by Hennig and Schuler [3]. They used a clever approximation to the multi-modal posterior of the minimizer with EP (expectation propagation). Approximate Bayesian inference techniques or clever prior design are definitely needed for ABO, and the initial solutions in [1-3] are somewhat slow and can be computationally intractable. This is an exciting area that has great potential to grow.
1. Il Memming Park, Marcel Nassar, Mijung Park. Active Bayesian Optimization: Minimizing Minimizer Entropy. arXiv:1202.2143v1 [stat.ME]
2. Julien Villemonteix, Emmanuel Vazquez, Eric Walter. An informational approach to the global optimization of expensive-to-evaluate functions. arXiv:cs/0611143v2 [cs.NA] (published in Journal of Global Optimization 2008)
3. Philipp Hennig and Christian J. Schuler. Entropy search for Information-Efficient global optimization. December 2011, arXiv:1112.1217
This was my second NIPS (see last year’s NIPS summary). It had a lower acceptance rate of 22% (I served as a reviewer last year and this year). I felt like there were more computational neuroscience related posters than last year (perhaps due to the location in Europe). Non-parametric Bayes, reinforcement learning (MDP), and sparse learning were still big, while there were fewer kernel-related posters. This post is a summary of my experience, and any errors are my own (please let me know if you find any).
Dynamical segmentation of single trials from population neural data
Biljana Petreska, M. Sahani, B. Yu, J. Cunningham, S. Ryu, K. Shenoy, Gopal Santhanam
A randomly switching piecewise-linear dynamical system model is constructed via discrete latent states. Given a state, the dynamics of the spiking neurons are assumed to be linear. This model is fit to 105 simultaneously recorded neurons (Utah array) during a motor task. The number of states was chosen heuristically. This is an unsupervised method that automatically captures the structure of the dynamics. The results suggest that neurons tend to be in a linear dynamical state both when waiting for the go-cue and during early movement, and go through nonlinear dynamical transitions in between.
Inferring spike-timing-dependent plasticity from spike train data
Ian H. Stevenson, Konrad P. Kording
Different synapses have different forms of STDP, and while spike train data are abundant, in vivo whole cell recordings are very difficult. Hence, learning the synaptic plasticity rule from spike train observations alone is of great importance. This is one of my long-term goals as well. They fit a unidirectionally coupled GLM with a binned weight modulation function that depends on the timing relative to the previous presynaptic spike. The results are promising for simulated models. I’d love to see it applied to well-controlled real data.
Active dendrites: adaptation to spike-based communication
Balázs B Ujfalussy, Máté Lengyel
In the presence of correlated presynaptic population activity, to compute a function of the presynaptic voltage online from spikes, the neuron has to be nonlinear. In particular, this paper links it to the nonlinear summation property of the dendrite. In previous work, Pfister, J., Dayan, P., Lengyel, M. (2010) explained the role of short-term plasticity (dynamical synapse model) as an optimal predictor of the presynaptic membrane potential for a single neuron. This work expands it to the population case.
From stochastic nonlinear integrate-and-fire to generalized linear models
Skander Mensi, Richard Naud, Wulfram Gerstner
This poster shows that given a stochastic (adaptive-exponential) leaky integrate-and-fire neuron model, it is possible to construct a nearly equivalent GLM (as a form of spike response model (SRM) with escape noise). The sub-threshold dynamics are linearized to provide the linear filter (corresponding to the impulse response) and the reset/refractoriness part of the history filter, while the spike-frequency adaptation is captured as a slower time scale component of the history filter. Then the link function can be estimated through empirical observation, and it is close to being linear. (I was totally thrown off by the notation, which denoted the probability of spiking given a membrane potential, not the marginal distribution of the model’s voltage.)
Gaussian process modulated renewal processes
Vinayak Rao, Yee Whye Teh
This is an extension of R. P. Adams, I. Murray and D. J.C. MacKay’s work on Poisson intensity estimation to hazard-rate-modulated renewal processes. The basic ideas are similar: use a sigmoidal link function, and use a point process thinning-like procedure to sample exactly.
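The thinning idea behind exact sampling can be sketched for the simpler Poisson case. This is only the classical thinning construction with a sigmoidal link, not the renewal-process sampler of the paper. To stay self-contained, a fixed sinusoid `g` stands in for a draw from a Gaussian process, which is an assumption made for illustration. Candidate events are drawn from a homogeneous process at the bounding rate, and each is kept with probability equal to the sigmoid of the latent function.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_by_thinning(rate_fn, rate_max, t_end, rng):
    """Sample an inhomogeneous Poisson process on [0, t_end] by thinning.

    rate_fn must satisfy 0 <= rate_fn(t) <= rate_max for all t.
    """
    n_candidates = rng.poisson(rate_max * t_end)
    candidates = np.sort(rng.uniform(0.0, t_end, size=n_candidates))
    # Keep each candidate independently with probability rate(t) / rate_max.
    keep = rng.uniform(size=n_candidates) < rate_fn(candidates) / rate_max
    return candidates[keep]

rng = np.random.default_rng(0)
rate_max, t_end = 50.0, 10.0

def g(t):
    # Hypothetical latent function; in the GP setting this would be a GP draw.
    return 3.0 * np.sin(t)

def rate_fn(t):
    return rate_max * sigmoid(g(t))

spikes = sample_by_thinning(rate_fn, rate_max, t_end, rng)
print(len(spikes), "events; first few:", np.round(spikes[:5], 3))
```

Because the sigmoid bounds the rate by `rate_max`, the latent function only needs to be evaluated at the finitely many candidate times, which is what makes exact sampling possible.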
Learning in Hilbert vs. Banach spaces: A measure embedding viewpoint
Bharath K. Sriperumbudur, Kenji Fukumizu, Gert R. G. Lanckriet
Kernel embedding of probability distributions and the induced divergence is an emerging direction in kernel methods. The divergence is related to the Bayes risk of the Parzen window classifier in particular, and this paper extends the results to Banach spaces. For a Banach space with a norm that is uniformly Fréchet differentiable and uniformly convex, there is a semi-inner product inducing a reproducing kernel Banach space (RKBS), which has properties analogous to an RKHS. They showed that the kernel embedding is injective when the kernel is the Fourier transform of a signed measure (c.f. Bochner’s theorem requires a positive measure for positive definiteness). The resulting divergence is not computable unless the semi-metric is of a special form, and the convergence rate turns out to be at best the same as in the RKHS case.
Modelling genetic variations with fragmentation-coagulation processes
Yee Whye Teh, Charles Blundell, Lloyd T. Elliott
Similar to the Chinese restaurant process (CRP) for clustering, a temporal evolution of clusters by fragmentation (breaking a table into two tables) and coagulation (merging two tables) can be described as a Fragmentation-Coagulation Process (FCP). They show that the FCP is exchangeable, reversible, and has the CRP as its asymptotic distribution.
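Since the FCP is built on top of the CRP, a quick sketch of CRP seating may help. This samples only the static CRP partition, not the fragmentation and coagulation dynamics of the paper, and `sample_crp` and `alpha` are names I picked for illustration.

```python
import random

def sample_crp(n_customers, alpha, rng):
    """Return the table sizes after seating n_customers with concentration alpha."""
    tables = []
    for n in range(n_customers):
        # New table with prob alpha / (n + alpha);
        # existing table k with prob size_k / (n + alpha).
        u = rng.random() * (n + alpha)
        if u < alpha:
            tables.append(1)
            continue
        u -= alpha
        for k, size in enumerate(tables):
            if u < size:
                tables[k] += 1
                break
            u -= size
        else:
            tables[-1] += 1  # guard against floating-point round-off
    return tables

tables = sample_crp(100, alpha=1.0, rng=random.Random(0))
```

In the FCP picture, each table here would later be allowed to split into two or merge with another as time evolves.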
Priors over recurrent continuous time processes [code]
Ardavan Saeedi, Alexandre Bouchard-Côté
This paper received the best student paper award this year, and Ardavan is only a master’s student! The problem he is interested in is a continuous-time series driven by a discrete latent state, with a partial observation process; for example, a recurrent disease with coarsely quantified states. He introduces the Gamma-exponential process, which gives a prior over infinite Markovian transition rate matrices, extends it to the hierarchical case, and shows how to do inference.
Kernel Beta process
Lu Ren, Yingjian Wang, David Dunson, Lawrence Carin (none of the authors made it to the conference)
The Beta process is a distribution over discrete random measures where each “stick” is in $[0, 1]$, but the sticks do not sum to 1 as they do in the Dirichlet process (DP). In this paper, they smooth the sticks in relation to covariates through a kernel, such that their heights are correlated. The kernel here does not have to be positive definite; it only has to be a bounded positive function (like a pdf).
I’m curious if a similar approach can be taken for the DP. (In fact, this was originally done in a similar fashion for the DP by Dunson and Park (2008), as the “kernel stick-breaking process”.)
Sparse estimation with structured dictionaries
Given an ill-posed problem $y = \Phi x$, where the dictionary $\Phi$ and the observation $y$ are known, under a sparsity assumption this can be solved with $\ell_1$ regularization when $\Phi$ is incoherent (roughly independent columns). However, when the dictionary is more structured, this can cause problems. This paper alleviates the problem by transforming the sparse variables, which effectively re-normalizes them. It turns out the solution is similar to iteratively reweighted $\ell_1$ with a different penalty. [Workshop version recording]
Sequence learning with hidden units in spiking neural networks
Johanni Brea, Walter Senn, Jean-Pascal Pfister
Given a point process, the problem is to train a spiking neural network composed of GLM units (including hidden units) that would generate the training patterns. The KL-divergence between the given point process and the one parameterized by the GLM is minimized by online gradient descent. The gradient requires marginalization over the spikes of the hidden units, so they developed an importance sampling scheme where the samples from the hidden units are obtained given the training spikes. The resulting training rule is Hebbian, and analogous to STDP. Results are shown for the case where the given distribution is a delta, that is, when the network has to produce exactly one pattern, and that pattern only.
I presented Bayesian Spike Triggered Covariance analysis as a poster:
Empirical models of spiking in neural populations
Jakob H. Macke, Lars Büsing, John P. Cunningham, Byron M. Yu, Krishna V. Shenoy, Maneesh Sahani
A comparison study between the coupled GLM and a latent variable model (Poisson linear dynamical system) for fitting motor cortex recordings (preparation phase only). While the GLM explicitly allows only coupling through the spiking history of the population, the latent variable model allows a low-dimensional hidden common input source with linear dynamics. They show that the latent variable model fits better and could reconstruct the cross-correlations while the GLM couldn’t. There was quite a bit of discussion on the floor after the oral presentation. The difference in performance was probably due to (1) the relatively large bin size (10 ms), and (2) the neurons being recorded with a Utah array, which means a low probability of direct connectivity. The coupled GLM was successfully applied to the retina, where the coupling is local and the neurons were very densely sampled, with a 0.1 ms bin size. It would be interesting to see further developments of latent variable models and GLMs in modeling such motor system data.
Hierarchical algorithms for χ-armed bandits
This was a non-Bayesian invited talk for the Bayesian optimization, experimental design and bandits workshop. He talked about his paper for the main conference, “Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness”. In this case, the smoothness assumption comes from $f(x^*) - f(x) \le \ell(x, x^*)$, where $\ell$ is a semi-metric, and hence the function is bounded from below around the global maximum by the semi-metric. Then, using a hierarchical partitioning of the input space that respects the semi-metric, one can get a bound on the function. (This assumption is neither weaker nor stronger than Lipschitz continuity, since the absolute value is missing and it only holds relative to the maximum.) When the knowledge of the semi-metric is perfect, the convergence rate of the simple regret (best function value) can be exponential (depending on the semi-metric; multiple semi-metrics can give the bound, but the convergence rate can differ). When the semi-metric is unknown, and one overestimates the exponent, for example, global convergence is not guaranteed.
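The known-semi-metric case can be sketched in a few lines for a 1D function. This is my own simplified illustration of optimistic optimization with binary splits, not the algorithm as presented in the talk; `doo_maximize` and `semi_metric` are hypothetical names, and `semi_metric(d)` is assumed to be nondecreasing and to upper-bound the semi-metric between the maximizer and any point within distance `d` of it.

```python
import heapq

def doo_maximize(f, lo, hi, n_evals, semi_metric):
    """Optimistically maximize f on [lo, hi] using at most n_evals evaluations."""
    def make_cell(a, b):
        center = 0.5 * (a + b)
        value = f(center)
        # Optimistic bound: valid for the cell that contains the maximizer.
        bound = value + semi_metric(0.5 * (b - a))
        return (-bound, a, b, center, value)  # negated for the min-heap

    heap = [make_cell(lo, hi)]
    best_x, best_value = heap[0][3], heap[0][4]
    evals = 1
    while evals + 2 <= n_evals:
        # Expand the cell with the highest optimistic bound.
        _, a, b, center, _ = heapq.heappop(heap)
        for child in (make_cell(a, center), make_cell(center, b)):
            heapq.heappush(heap, child)
            if child[4] > best_value:
                best_x, best_value = child[3], child[4]
        evals += 2
    return best_x, best_value

# Toy usage: f(x*) - f(x) = (x - 0.3)^2, so the semi-metric d**2 is exact.
best_x, best_value = doo_maximize(lambda x: -(x - 0.3) ** 2, 0.0, 1.0,
                                  n_evals=41, semi_metric=lambda d: d ** 2)
```

With an exact semi-metric, essentially only cells containing the maximizer are ever expanded, which is where the exponential convergence of the simple regret comes from in this toy case.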
When parallel experiments are possible, experimental design with batch sampling can improve the efficiency, but sequential design often performs better than batch design. Under the assumption that the maximum of the function has a known bound, and using the GP predictive covariance, they choose a set of points that are loosely independent, and could improve the criterion.
Future information minimization as PAC Bayes regularization in Reinforcement Learning
This was the last invited talk for the New frontiers in model order selection workshop. Tishby talked about reinforcement learning in a POMDP setup, but I couldn’t fully follow (in fact, it mostly went over my head). In a perception-action cycle, the Bellman equation describes the evolution of the world and the associated reward, and he describes a counterpart for the agent (mental state?) using an associated Bellman equation with information-to-go (mutual information with respect to a goal). Then he describes reinforcement learning as a coding problem (relating to Kraft’s inequality, which says a subtree of an optimal coding tree is an optimal coding tree). At some point, he reaches a PAC-Bayesian bound, and claims that reinforcement learning self-regularizes.
This was the first invited talk for the Philosophy and machine learning workshop. He talked about a broad range of philosophers (of science) and a couple of examples of their interaction with ML. The first example was Karl Popper’s idea of the complexity of a theory in terms of falsifiable dimensions, and its similarity to the VC dimension (see their paper in 2009 for details). The second example was Judea Pearl’s use of counterfactuals (due to David Lewis), and its impact on philosophy of science. He talked about what kinds of sciences can benefit from ML, certainly the ones with lots of data. He also went through many philosophers’ ideas, including those of Popper, Carnap, Kuhn, Lakatos, and Feyerabend. It is certainly a very fascinating area, but my impression was that we don’t have much to talk about yet.