Evan and I wrote a summary of the COSYNE 2014 workshop we organized!
Originally posted on Scalable models for high-dimensional neural data:
[ This blog post is collaboratively written by Evan and Memming ]
The Scalable Models workshop was a remarkable success! It attracted a huge crowd from the wee morning hours till the 7:30 pm close of the day. We attracted so much attention that we had to relocate from our original (tiny) allotted room (Superior A) to a (huge) lobby area (Golden Cliff). The talks offered both philosophical perspectives and methodological aspects, reflecting diverse viewpoints and approaches to high-dimensional neural data. Many of the discussions continued the next day in our sister workshop. Here we summarize each talk:
Konrad Körding – Big datasets of spike data: why it is coming and why it is useful
Shannon’s entropy is the fundamental building block of information theory – a theory of communication, compression, and randomness. Entropy has a very simple definition, $H = -\sum_i p_i \log p_i$, where $p_i$ is the probability of the i-th symbol. However, estimating entropy from observations is surprisingly difficult, and is still an active area of research. Typically, one does not have enough samples compared to the number of possible symbols (the so-called “undersampled regime”), there is no unbiased estimator [Paninski 2003], and the convergence rate of a consistent estimator can be arbitrarily slow [Antos and Kontoyiannis, 2001]. There are many estimators that aim to overcome these difficulties to some degree. Deciding which estimator to use can be overwhelming, so here’s my recommendation in the form of a flow chart:
Let me explain one by one. First of all, if you have continuous (analogue) observations, read the title of this post. CDM, PYM, DPM, and NSB are Bayesian estimators, meaning that they make explicit probabilistic assumptions. These estimators provide posterior distributions or credible intervals as well as point estimates of entropy. Note that the assumptions made by these estimators do not have to hold for them to be good entropy estimators. In fact, even if the true distribution lies outside the assumed class, these estimators are consistent, and often give reasonable answers even in the undersampled regime.
Nemenman-Shafee-Bialek (NSB) uses a mixture-of-Dirichlets prior to obtain an approximately uninformative implied prior on entropy. This significantly reduces the bias of the estimator in the undersampled regime because, a priori, the distribution could have any entropy.
Centered Dirichlet mixture (CDM) is a Bayesian estimator with a special prior designed for binary observations. It comes in two flavors, depending on whether your observations are close to independent (DBer) or the total number of 1's is a good summary statistic (DSyn).
Pitman-Yor mixture (PYM) and Dirichlet process mixture (DPM) are for an infinite or unknown number of symbols. In many cases, natural data have a vast number of possible symbols, as in the case of species samples or language, and have power-law (or scale-free) distributions. Power-law distributions can hide a lot of entropy in their tails, in which case PYM is recommended. If you expect exponentially decaying tail probabilities when sorted, then DPM is appropriate. See my previous post for more.
Non-Bayesian estimators come in many different flavors:
The best upper bound (BUB) estimator is a bias correction method which bounds the maximum error in entropy estimation.
The James-Stein (JS) estimator regularizes entropy by estimating a mixture of the uniform distribution and the empirical histogram with James-Stein shrinkage. The main advantage of JS is that it also produces an estimate of the distribution.
The Unseen estimator uses a Poissonization of the fingerprint and linear programming to find the likely underlying fingerprint, and uses its entropy as the estimate.
Other notable estimators include (1) a bias correction method by Panzeri & Treves (1996), which has been popular for a long time, (2) the Grassberger estimator, and (3) an asymptotic expansion of NSB that only works in the extremely undersampled regime and is inconsistent [Nemenman 2011]. These methods are faster than the others, if you need speed.
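To make the shrinkage idea concrete, here is a minimal sketch comparing the naive plug-in (maximum likelihood) entropy with the entropy of a James-Stein shrinkage estimate of the distribution. The shrinkage intensity follows the formula I recall from Hausser & Strimmer (2009); the function names are mine, and the number of symbols is assumed to be `len(counts)`. Treat it as an illustration rather than a replacement for the packages mentioned in this post.

```python
import math


def plugin_entropy(counts):
    """Maximum-likelihood ("plug-in") entropy in nats from symbol counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def js_shrinkage_entropy(counts):
    """Entropy (nats) of a James-Stein shrinkage estimate, plus the estimate.

    The empirical frequencies are shrunk toward the uniform distribution over
    len(counts) symbols (assumed to be the full alphabet).
    """
    n = sum(counts)
    k = len(counts)
    target = 1.0 / k
    freqs = [c / n for c in counts]
    denom = (n - 1) * sum((target - f) ** 2 for f in freqs)
    if denom == 0:
        lam = 1.0  # histogram is already uniform, or there is a single sample
    else:
        lam = (1.0 - sum(f * f for f in freqs)) / denom
        lam = min(1.0, max(0.0, lam))  # keep the intensity in [0, 1]
    shrunk = [lam * target + (1.0 - lam) * f for f in freqs]
    entropy = -sum(p * math.log(p) for p in shrunk if p > 0)
    return entropy, shrunk


counts = [9, 1, 0, 0]  # 10 samples over 4 possible symbols
h_ml = plugin_entropy(counts)
h_js, p_js = js_shrinkage_entropy(counts)
print(h_ml, h_js, p_js)
```

Because the uniform distribution has maximal entropy and entropy is concave, the shrunk estimate never has lower entropy than the plug-in estimate, which is the direction in which the plug-in bias needs correcting.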
There are many software packages available out there. Our estimators CDMentropy and PYMentropy are implemented for MATLAB under a BSD license (by now you have surely noticed that this is shameless self-promotion!). For R, some of these estimators are implemented in a package called entropy (on CRAN; written by the authors of the JS estimator). There’s also a Python package called pyentropy. Targeting a more neuroscience-specific audience, the Spike Train Analysis Toolkit contains a few estimators implemented in MATLAB/C.
- A. Antos and I. Kontoyiannis. Convergence properties of functional estimates for discrete distributions. Random Structures & Algorithms, 19(3-4):163–193, 2001.
- E. Archer*, I. M. Park*, and J. Pillow. Bayesian estimation of discrete entropy with mixtures of stick-breaking priors. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2024–2032. MIT Press, Cambridge, MA, 2012. [PYMentropy]
- E. Archer*, I. M. Park*, J. Pillow. Bayesian Entropy Estimation for Countable Discrete Distributions. arXiv:1302.0328, 2013. [PYMentropy]
- E. Archer, I. M. Park, and J. Pillow. Bayesian entropy estimation for binary spike train data using parametric prior knowledge. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, 2013. [CDMentropy]
- A. Chao and T. Shen. Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10(4):429–443, 2003. [CAE]
- P. Grassberger. Estimating the information content of symbol sequences and efficient codes. Information Theory, IEEE Transactions on, 35(3):669–675, 1989.
- J. Hausser and K. Strimmer. Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks. The Journal of Machine Learning Research, 10:1469–1484, 2009. [JS]
- I. Nemenman. Coincidences and estimation of entropies of random variables with large cardinalities. Entropy, 13(12):2013–2023, 2011. [Asymptotic NSB]
- I. Nemenman, F. Shafee, and W. Bialek. Entropy and inference, revisited. In Advances in Neural Information Processing Systems 14, pages 471–478. MIT Press, Cambridge, MA, 2002. [NSB]
- I. Nemenman, W. Bialek, and R. Van Steveninck. Entropy and information in neural spike trains: Progress on the sampling problem. Physical Review E, 69(5):056111, 2004. [NSB]
- L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15:1191–1253, 2003. [BUB]
- S. Panzeri and A. Treves. Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7:87–107, 1996.
- P. Valiant and G. Valiant. Estimating the Unseen: Improved Estimators for Entropy and other Properties. In Advances in Neural Information Processing Systems 26, pp. 2157-2165, 2013. [UNSEEN]
- V. Q. Vu, B. Yu, and R. E. Kass. Coverage-adjusted entropy estimation. Statistics in medicine, 26 (21):4039–4060, 2007. [CAE]
This year, NIPS (Neural Information Processing Systems) had a record registration of 1900+ (it has been growing over the years) with a 25% acceptance rate. This year, most of the reviews and rebuttals are also available online. I was one of the many who were live tweeting via #NIPS2013 throughout the main meeting and workshops.
Compared to previous years, it seemed like there was less machine learning in the invited/keynote talks. I also noticed more industrial engagement (Zuckerberg from Facebook was here (also this), and so was the Amazon drone) as well as increasing interest in neuroscience. My subjective list of trendy topics of the meeting: low-dimension, deep learning (and dropout), graphical models, theoretical neuroscience, computational neuroscience, big data, online learning, one-shot learning, calcium imaging. Next year, NIPS will be in Montreal, Canada.
I presented 3 papers in the main meeting (hence missed the first two days of poster sessions), and attended 2 workshops (High-Dimensional Statistical Inference in the Brain, Acquiring and analyzing the activity of large neural ensembles; Terry Sejnowski gave the first talk in both). Following are the talks/posters/papers that I found interesting as a computational neuroscientist / machine learning enthusiast.
He described how theoretical quantities in reinforcement learning such as TD-error correlate with neuromodulators such as dopamine. Then he went on to Q (max) and SARSA (mean) learning rules. The third point of the talk was the difference between model-based vs model-free reinforcement learning. Model-based learning can use how the world (state) is organized and plan accordingly, while model-free learns values associated with each state. Human fMRI evidence shows an interesting mixture of model-based and model-free learning.
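For my own reference, here is a minimal sketch of the two model-free update rules in their textbook tabular form, not anything specific from the talk. The table layout (`Q[state][action]`), the learning rate `alpha`, and the discount `gamma` are my assumptions. Q-learning bootstraps from the max over next actions, while SARSA bootstraps from the action actually taken next, which averages over the policy in expectation.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """Temporal-difference error: reward + gamma * V(next) - V(current)."""
    return reward + gamma * value_next - value_current


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best next action (the max)."""
    delta = td_error(r, max(Q[s_next].values()), Q[s][a], gamma)
    Q[s][a] += alpha * delta
    return delta


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the next action actually chosen."""
    delta = td_error(r, Q[s_next][a_next], Q[s][a], gamma)
    Q[s][a] += alpha * delta
    return delta


def make_table():
    # Hypothetical two-state, two-action value table for illustration.
    return {"s0": {"L": 0.0, "R": 0.0}, "s1": {"L": 1.0, "R": 2.0}}


Q_max = make_table()
delta_q = q_learning_update(Q_max, "s0", "L", 1.0, "s1")

Q_onpolicy = make_table()
delta_sarsa = sarsa_update(Q_onpolicy, "s0", "L", 1.0, "s1", "L")
print(delta_q, delta_sarsa)
```

Both rules are model-free: they only cache values per state-action pair and never use a model of how states follow each other.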
A Memory Frontier for Complex Synapses
Subhaneil Lahiri, Surya Ganguli
Despite the molecular complexity of the synapse, most systems-level neural models describe it as a scalar-valued strength. Biophysical evidence suggests discrete states within the synapse and discrete levels of synaptic strength, which is troublesome because memory will be quickly overwritten for discrete/binary-valued synapses. Surya talked about how to maximize memory capacity (measured as the area under the SNR over time) for synapses with hidden states, over all possible Markovian models. Using the first-passage time, they ordered the states and derived an upper bound. The area is bounded by $O(\sqrt{N}(M-1))$, where $M$ and $N$ denote the number of internal states per synapse and the number of synapses, respectively. Therefore, fewer synapses with more internal states are better for longer memory.
A theory of neural dimensionality, dynamics and measurement: the neuroscientist and the single neuron (workshop)
Several recent studies showed a low-dimensional state-space of trial-averaged population activity (e.g., Churchland et al. 2012, Mante et al. 2013). Surya asks what would happen to the PCA analysis of neural trajectories if we recorded from 1 billion neurons. He defines the participation ratio as a measure of dimensionality and, through a series of clever upper bounds, estimates the dimensionality of the neural state-space that would capture 95% of the variance given task complexity. In addition, assuming incoherence (mixed or complex tuning), neural measurements can be seen as random projections of the high-dimensional space; along with low-dimensional dynamics, the data recover the true dimension. He claims that in the current task designs, the neural state-space is limited by task complexity, and we would not see higher dimensions as we increase the number of simultaneously observed neurons.
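As a reminder of what the participation ratio measures, here is a tiny sketch using the definition I usually see, $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$, where $\lambda_i$ are the eigenvalues of the covariance matrix (the PCA variances). I am assuming this matches the definition used in the talk.

```python
def participation_ratio(eigenvalues):
    """Effective dimensionality of a variance spectrum.

    Equals the number of components when all variances are equal, and
    approaches 1 when a single component dominates.
    """
    total = sum(eigenvalues)
    return total * total / sum(v * v for v in eigenvalues)


print(participation_ratio([1.0, 1.0, 1.0, 1.0]))  # four equal components
print(participation_ratio([3.0, 0.0, 0.0]))       # one dominant component
print(participation_ratio([2.0, 1.0, 1.0]))       # somewhere in between
```

Unlike a hard "95% of the variance" threshold, this is a smooth function of the spectrum, which makes it convenient for deriving bounds.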
Distributions of high-dimensional network states as knowledge base for networks of spiking neurons in the brain (workshop)
In a series of papers (Büsing et al. 2011, Pecevski et al. 2011, Habenschuss et al. 2013), Maass showed how noisy spiking neural networks (SNNs) can perform probabilistic inference via sampling. From Boltzmann machines (maximum entropy models) to constraint satisfaction problems (e.g. Sudoku), noisy SNNs can be designed to sample from the posterior, and they converge exponentially fast from any initial state. This is done by irreversible MCMC sampling of the neurons, and it can be generalized to continuous time and state space.
Epigenetics in Cortex (workshop)
In a ketamine-based animal model of schizophrenia, which shows similarly decreased gamma-band activity in the prefrontal cortex and a decrease in PV+ inhibitory neurons, Aza and Zeb (DNA methylation inhibitors) are known to prevent this effect of ketamine. Furthermore, Lister et al. (2013) showed that a special type of DNA methylation (mCH) in the brain accumulates over the lifespan, coincides with synaptogenesis, and regulates gene expression.
Optimal Neural Population Codes for High-dimensional Stimulus Variables
Zhuo Wang, Alan Stocker, Daniel Lee
They extend the previous year’s paper to high dimensions.
What can slice physiology tell us about inferring functional connectivity from spikes? (workshop)
Our ability to infer functional connectivity among neurons is limited by data. Using current-injection, he investigated exactly how much data is required for detecting synapses of various strength under the generalized linear model (GLM). He showed interesting scaling plots both in terms of (square root of) firing rate and (inverse) amplitude of the post-synaptic current.
Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream (main)
Mechanisms Underlying visual object recognition: Humans vs. Neurons vs. machines (tutorial)
Daniel L. Yamins*, Ha Hong*, Charles Cadieu, James J. DiCarlo
They built a model that can predict the (average) activity of V4 and IT neurons in response to objects. Current computer vision methods do not perform well under the high variability induced by transformation, rotation, etc., while IT neuron responses seem to be quite invariant to them. By optimizing a collection of convolutional deep networks with different hyperparameter (structural parameter) regimes and combining them, they showed that they can predict the average IT (and V4) responses reasonably well.
Instead of maximizing mutual information between the features and the target variable for dimensionality reduction, they propose to minimize the dependence between the non-feature space and the joint of the target variable and the feature space. As a dependence measure, they use HSIC (Hilbert-Schmidt independence criterion: the squared distance between the joint and the product of marginals embedded in the Hilbert space). The optimization problem is non-convex, and determining the dimension of the feature space requires a series of hypothesis tests.
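For intuition about the dependence measure, here is a small sketch of the standard biased empirical HSIC estimator, $\frac{1}{n^2}\mathrm{tr}(KHLH)$, for scalar samples with a Gaussian kernel. This is only the dependence measure, not the paper's dimensionality reduction procedure; the bandwidth and function names are my own choices.

```python
import math


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in xs]
            for a in xs]


def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - 11^T/n."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2 for paired samples."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, bandwidth))
    L = gaussian_gram(ys, bandwidth)
    # trace(HKH L) equals the elementwise sum because L is symmetric.
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n * n)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
print(hsic(xs, xs))         # strongly dependent: clearly positive
print(hsic(xs, [1.0] * 5))  # constant second variable: zero up to rounding
```

With a characteristic kernel such as the Gaussian, the population HSIC is zero only under independence, which is what makes it usable as an objective here.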
Dimensionality, dynamics and (de)synchronisation in the auditory cortex (workshop)
Maneesh compared the underlying latent dynamical systems fit from the synchronized state (drowsy/inattentive/urethane/ketamine/xylazine) and the desynchronized state (awake/attentive/urethane+stimulus/fentanyl/medetomidine/midazolam). From the population response, he fit a 4-dimensional linear dynamical system, then transformed the dynamics matrix into a “true Schur form” such that 2 pairs of 2D dynamics could be visualized. He showed that the dynamics fit from either state were actually very similar.
Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions (main)
Extracting information from calcium imaging data (workshop)
Eftychios A. Pnevmatikakis, Liam Paninski
Eftychios has been developing various methods to infer spike trains from calcium imaging movies. He showed a compressive sensing framework in which spiking activity can be inferred. A plausible implementation can use a digital micromirror device that produces “random” binary patterns of pixels to project the activity.
Andreas Tolias (workshop talk)
Noise correlations in the brain are small (0.01 range; e.g., Renart et al. 2010). Anesthetized animals have higher firing rates and higher noise correlations (0.06 range). He showed how a latent variable model (GPFA) can be used to decompose the noise correlation into that of the latent and the rest. Using 3D acousto-optical deflectors (AOD), he is observing 500 neurons simultaneously. He (and Dimitri Yatsenko) used the latent-variable graphical lasso to enforce a sparse inverse covariance matrix, and found that the estimate is more accurate and very different from raw noise correlation estimates.
Whole-brain functional imaging and motor learning in the larval zebrafish (workshop)
Using light-sheet microscopy, he imaged the calcium activity of 80,000 neurons simultaneously (~80% of all the neurons) at a 1-2 Hz sampling frequency (Ahrens et al. 2013). From the big data recorded while the fish was stimulated visually, Jeremy Freeman and Misha analyzed the dynamics (with PCA) and the tuning to orienting stimuli, and made very cool 3D visualizations.
Normative models and identification of nonlinear neural representations (workshop)
In the first half of his talk, Matthias talked about probabilistic models of natural images (Theis et al. 2012), which I didn’t understand very well. In the latter half, he talked about an extension of the GQM (generalized quadratic model) called STM (spike-triggered mixture). The model is a GQM with quadratic term $\frac{1}{2}x^\top(\Sigma_0^{-1} - \Sigma_1^{-1})x$ if the spike-triggered and non-spike-triggered distributions are Gaussian with covariances $\Sigma_1$ and $\Sigma_0$, respectively. When both distributions are allowed to be mixtures of Gaussians, it turns out that the nonlinear function becomes a soft-max of quadratic terms, making it an LNLN model. [code on github]
Inferring neural population dynamics from multiple partial recordings of the same neural circuit
Srini Turaga, Lars Buesing, Adam M. Packer, Henry Dalgleish, Noah Pettit, Michael Hausser, Jakob Macke
Under certain observability conditions, they stitch together partially overlapping neural recordings to recover the joint covariance matrix. We read this paper earlier in the UT Austin computational neuroscience journal club.
Using “Poissonization” of the fingerprint (a.k.a. Zipf plot, count histogram, pattern, hist-hist, collision statistics, etc.), they find the simplest distribution such that the expected fingerprint is close to the observed fingerprint. This is done by first splitting the histogram into an “easy” part (symbols with many observations, more than the square root of the number of observations) and a “hard” part, then applying two linear programs to the hard part to optimize the (scaled) distance and the support. The algorithm, “UNSEEN”, has a free parameter that controls the error tolerance. Their theorem states that the total variation is bounded by $O(1/\sqrt{c})$ with only $c\,n/\log n$ samples, where $n$ denotes the support size. The resulting estimate of the fingerprint can be used to estimate entropy, unseen probability mass, support, and total variation. (code in appendix)
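In case the term is unfamiliar: the fingerprint is just the histogram of the histogram, i.e., how many symbols were seen exactly once, exactly twice, and so on. A minimal sketch (the function name is mine, and this is only the summary statistic, not the UNSEEN algorithm itself):

```python
from collections import Counter


def fingerprint(samples):
    """Map each occurrence count j to the number of symbols seen exactly j times."""
    counts = Counter(samples)              # symbol -> number of occurrences
    return dict(Counter(counts.values()))  # occurrences -> number of symbols


print(fingerprint("aabbbc"))  # a twice, b three times, c once
```

The fingerprint discards the symbol labels, which is harmless for label-invariant quantities such as entropy and support size.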
A simple example of Dirichlet process mixture inconsistency for the number of components
Jeffrey W. Miller, Matthew T. Harrison
They already showed that the number of clusters inferred from a DP mixture model is inconsistent (at the ICERM workshop 2012, and last year’s NIPS workshop). In this paper they show theoretical examples, one of which says: if the true distribution is a normal distribution, then the probability that the number of components inferred by the DPM (with $\alpha = 1$) is 1 goes to zero as the number of samples grows.
A Kernel Test for Three-Variable Interactions
Dino Sejdinovic, Arthur Gretton, Wicher Bergsma
To detect a 3-way interaction which has a ‘V’-structure, they made a kernelized version of the Lancaster interaction measure. Unfortunately, the Lancaster interaction measure is incorrect for 4+ variables, and the correct version becomes very complicated very quickly.
B-test: A Non-parametric, Low Variance Kernel Two-sample Test
Wojciech Zaremba, Arthur Gretton, Matthew Blaschko
This work brings both test power and computational speed (Gretton et al. 2012) to MMD by using a blocked estimator, making it more practical.
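Here is how I understand the blocked estimator, as a rough sketch: split the two samples into blocks, compute the unbiased MMD² statistic within each block, and average across blocks. The Gaussian kernel, the bandwidth, the restriction to scalar data, and the function names are my assumptions, and I am leaving out the actual test threshold.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased MMD^2 U-statistic for two samples of equal size m >= 2."""
    m = len(xs)
    total = 0.0
    for i in range(m):
        for j in range(m):
            if i != j:
                total += (kernel(xs[i], xs[j]) + kernel(ys[i], ys[j])
                          - kernel(xs[i], ys[j]) - kernel(xs[j], ys[i]))
    return total / (m * (m - 1))


def block_mmd2(xs, ys, block_size):
    """Average of per-block MMD^2 estimates, plus the per-block values."""
    if block_size < 2:
        raise ValueError("block_size must be at least 2")
    n_blocks = min(len(xs), len(ys)) // block_size
    if n_blocks == 0:
        raise ValueError("not enough samples for a single block")
    stats = [mmd2_unbiased(xs[b * block_size:(b + 1) * block_size],
                           ys[b * block_size:(b + 1) * block_size])
             for b in range(n_blocks)]
    return sum(stats) / n_blocks, stats


xs = [float(i) for i in range(8)]
ys = [x + 5.0 for x in xs]  # shifted copy, so the distributions differ
stat, per_block = block_mmd2(xs, ys, block_size=4)
print(stat, per_block)
```

The cost is quadratic only within a block, and averaging many per-block statistics is what makes a Gaussian approximation to the null distribution reasonable.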
Robust Spatial Filtering with Beta Divergence
Wojciech Samek, Duncan Blythe, Klaus-Robert Müller, Motoaki Kawanabe
This is a supervised dimensionality reduction technique. They connect the generalized eigenvalue problem to KL-divergence, and generalize it to beta-divergence to gain robustness to outliers in the data.
Optimizing Instructional Policies
Robert Lindsey, Michael Mozer, William J. Huggins, Harold Pashler
This paper presents a meta-active-learning problem where active learning is used to find the best policy to teach a system (e.g., human). This is related to curriculum learning, where examples are fed to the machine learning algorithm in a specially designed order (e.g., easy to hard). This gave me ideas to enhance Eleksius!
Reconciling “priors” & “priors” without prejudice?
Remi Gribonval, Pierre Machart
This paper connects Bayesian least squares (MMSE) estimation and MAP estimation under a Gaussian likelihood. Their theorem shows that the MMSE estimate under some prior is also a MAP estimate under some other prior (or, equivalently, a regularized least squares estimate).
The Computational NeuroScience (CNS) conference is held annually, alternating between America and Europe. This year it was held in Paris; next year it will be in Québec City, Canada. There are more theoretical and simulation-based studies than experimental studies. Among the experimental studies, there were a lot of oscillation- and synchrony-related subjects.
Disclaimer: I was occupied with several things, so I was not 100% attending the conference, so my selection is heavily biased. These notes are primarily for my future reference.
Simon Laughlin. The influence of metabolic energy on neural computation (keynote)
There are three main categories of energy cost in the brain: (1) maintenance, (2) spike generation, and (3) synapse. Assuming a finite energy budget for the brain, the optimal efficient coding strategy can vary from small number of neurons with high rate to large population with sparse coding [see Fig 3, Laughlin 2001]. Variation of cost ratios across animals may be associated with different coding strategies to optimize for energy/bits. He illustrated the balance through various laws of diminishing return plots. He emphasized reverse engineering the brain, and concluded with the 10 principles of neural design (transcribed from the slides thanks to the photo by @neuroflips):
(1) save on wire, (2) make components irreducibly small, (3) send only what is needed, (4) send at the lowest rate, (5) sparsify, (6) compute directly with analogue primitives, (7) mix analogue and digital, (8) adapt, match and learn, (9) complexify (elaborate to specialize), (10) compute with chemistry??????. (question marks are from the original slide)
Sophie Denève. Rescuing the spike (keynote)
She proposed that the observation of high trial-to-trial variability in spike trains from single neurons is due to degeneracy in the population encoding. There are many ways the presynaptic population can evoke similar membrane potential fluctuations of a linear readout neuron, and hence she claims that through precisely controlled lateral inhibition, the neural code is precise in the population level, but seems variable if we only observe a single neuron. She briefly mentioned how a linear dynamical system might be implemented in such a coding system, but it seemed limited as to what kind of computations can be achieved.
There were several noise correlation (joint variability in the population activity) related talks:
Joel Zylberberg et al. Consistency requirements determine optimal noise correlations in neural populations
The “sign rule” says that if the noise correlation has the opposite sign of the signal correlation, linear Fisher information (and OLE performance) is improved (see Fig 1, Averbeck, Latham, Pouget 2006). They proved a theorem confirming the sign rule in a general setup, and furthermore showed that the optimal noise correlation does NOT necessarily obey the sign rule (see Hu, Zylberberg, Shea-Brown 2013). Experiments from the retina do not obey the sign rule: noise correlation is positive even for cells tuned to the same direction. However, it is still near optimal according to their theory.
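The sign rule is easy to see in a two-neuron toy example of my own (not from the talk). Linear Fisher information is $f'^\top \Sigma^{-1} f'$, where $f'$ holds the tuning curve slopes and $\Sigma$ is the noise covariance; below I assume equal variances and a single noise correlation `rho`.

```python
def linear_fisher_two_neurons(slope1, slope2, rho, variance=1.0):
    """f'^T Sigma^{-1} f' for Sigma = variance * [[1, rho], [rho, 1]]."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # The inverse of the 2x2 covariance is [[1, -rho], [-rho, 1]]
    # divided by variance * (1 - rho^2).
    quad = slope1 ** 2 - 2.0 * rho * slope1 * slope2 + slope2 ** 2
    return quad / (variance * (1.0 - rho ** 2))


# Same-sign slopes (positive signal correlation): information is 2 / (1 + rho),
# so negative noise correlation helps and positive noise correlation hurts.
for rho in (-0.5, 0.0, 0.5):
    print(rho, linear_fisher_two_neurons(1.0, 1.0, rho))
```

For opposite-sign slopes the expression becomes 2 / (1 - rho), so the helpful sign of the noise correlation flips, as the rule says.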
Federico Carnevale et al. The role of neural correlations in a decision-making task
During a vibration detection task, cross-correlations among neurons in the premotor cortex (in a 250 ms window) were shown to be dependent on behavior (see Carnevale et al. 2012). Federico told me that there were no sharp peaks in the cross-correlation. He further extrapolated the choice probability to the network level based on multivariate Gaussian approximation, and a simplification to categorize neurons into two classes (transient or sustained response).
Alex Pouget and Peter Latham each gave talks in the Functional role of correlations workshop.
Both were on Fisher information and the effect of noise correlations. Pouget’s talk focused on “differential correlation”, which is noise along the direction in which the tuning curves encode information (noise that looks like signal). Peter talked about why there are so many neurons in the brain, using linear Fisher information and additive noise (but I forgot the details!).
On the first day of the workshop, I participated in the New approaches to spike train analysis and neuronal coding workshop organized by Conor Houghton and Thomas Kreuz.
Florian Mormann. Measuring spike-field coherence and spike train synchrony
He emphasized using nonparametric statistics for testing the circular variable of interest: the phase of the LFP oscillation conditioned on spike timings. In the second part, he talked about the spike-distance (see Kreuz 2012), which is a smooth, time-scale-invariant measure of instantaneous synchrony among spike trains.
Rodrigo Quian Quiroga. Extracting information in time patterns and correlations with wavelets
Using Haar wavelet time bins as the feature space, he proposed scale free linear analysis of spike trains. In addition, he proposed discovering relevant temporal structure through a feature selection using mutual information. The method doesn’t seem to be able to find higher order interactions between time bins.
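To picture the feature space, here is a sketch of an orthonormal Haar decomposition of a binned spike train. The binning and the coefficient ordering are my assumptions, not necessarily what was used in the talk. Each coefficient compares spike counts between two adjacent windows at some time scale, which is why a linear analysis on these features covers multiple scales at once.

```python
import math


def haar_transform(signal):
    """Orthonormal Haar coefficients of a sequence whose length is a power of two.

    Returns [overall average term, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Coarser levels are prepended so the finest details end up last.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details


binned_spikes = [1, 0, 0, 1, 2, 0, 0, 0]  # hypothetical spike counts per bin
coeffs = haar_transform(binned_spikes)
print(coeffs)
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared bin counts, so no information is lost before feature selection.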
Ralph Andrzejak. Detecting directional couplings between spiking signals and time-continuous signals
Using distance based directional coupling analysis (see Chicharro, Andrzejak 2009; Andrzejak, Kreuz 2011), he showed that it is possible to find unidirectional coupling between continuous signals and spike trains via spike train distances. He mentioned the possibility of using spectral Granger causality for a similar purpose.
Adrià Tauste Campo. Estimation of directed information between simultaneous spike trains in decision making
Bayesian conditional information estimation through the use of context-tree weighting was used to infer directional information (analogous to Granger causality, but with mutual information). A compact Markovian structure is learned for binary time series.
I presented a poster on Bayesian entropy estimation in the main meeting, and gave a talk about nonparametric (kernel) methods for spike trains in the workshop.
Last Sunday (April 28th, 2013) was the 8th Black board day (BBD), which is a small informal workshop I organize every year. It started 8 years ago on my hero Kurt Gödel’s 100th birthday. This year, I found out that April 30th (1916) is Claude Shannon’s birthday, so I decided the theme would be his information theory.
Andrew Tan: Holographic entanglement entropy
Andrew wanted to connect how space-time structure can be derived from holographic entanglement entropy, and furthermore to link it to graphical models such as the restricted Boltzmann machine. He gave overviews of quantum mechanics (deterministic linear dynamics of the quantum states), density matrix, von Neumann entropy, and entanglement entropy (entropy of a reduced density matrix, where we assume partial observation and marginalization over the rest). Then, he talked about the asymptotic behaviors of entropy for the ground state and critical regime, and introduced a parameterized form of Hamiltonian that gives rise to a specific dependence structure in space-time, and sketched what the dimension of boundary and area of the dependence structure are. Unfortunately, we did not have enough time to finish what he wanted to tell us (see Swingle 2012 for details).
Jonathan Pillow: Information Schminformation
Information theory is widely applied to neuroscience and sometimes to machine learning. Jonathan sympathized with Shannon’s note (1956) called “The Bandwagon”, which criticized the possible abuse/overselling of information theory. First, Jonathan focused on the derivation of a “universal” rate-distortion theory based on the “information bottleneck principle”. Then, he continued with his recent ideas on optimal neural codes under different Bayesian distortion functions. He showed a multiple-choice exam example where maximizing mutual information can be worse, and a linear neural coding example for different cost functions.
- C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal 27 (3): 379–423. 1948
- E. T. Jaynes. Information Theory and Statistical Mechanics. Physical Review Online Archive (Prola), Vol. 106, No. 4. (15 May 1957), pp. 620-630
- Brian Swingle. Entanglement renormalization and holography. Physical Review D, Vol. 86 (Sep 2012), 065007
- MIT OpenCourseWare lectures: Statistical Mechanics I: Statistical Mechanics of Particles, Statistical Mechanics II: Statistical Physics of Fields (recommended by Andrew Tan)
- C. E. Shannon. The Bandwagon. IRE Transactions on Information Theory, 1956
- N. Tishby, F. Pereira, W. Bialek. The Information Bottleneck Method. In Proceedings of the 37-th Annual Allerton Conference on Communication, Control and Computing (1999), pp. 368-377
Feb 28–Mar 5 was the 10th COSYNE meeting, and my 6th participation. Thanks to my wonderful collaborators, I had a total of 4 posters in the main meeting (Jonathan Pillow had 7, which tied with Larry Abbott for the largest number of abstracts). Hence, I didn’t have a chance to sample enough posters for the first two nights (I also noticed a few presentations that overlapped with NIPS 2012). I tried to be a bit more social this year; I organized a small (unofficial) Korean social (with the help of Kijung Yoon), a tweet-up, and enjoyed many social drinking nights. Following are my notes on what I found interesting.
Main meeting—Day 1
William Bialek. Are we asking the right questions?
Not all sensory information is equally important. Rather, Bialek claims that the information that can predict the future makes up the important bits. Since neurons only have access to the presynaptic neurons’ spiking patterns, this should be achieved by neural computation that predicts its own future patterns (presumably under some constraints to prevent trivial solutions). When such information is measured over time, at least in some neurons in the fly visual system, its decay is very slow: “Even a fly is not Markovian”. This indicates that the neuronal population state may be critical. (see Bialek, Nemenman, Tishby 2001)
Evan Archer, Il Memming Park, Jonathan W Pillow. Semi-parametric Bayesian entropy estimation for binary spike trains [see Evan's blog]
Jacob Yates, Il Memming Park, Lawrence Cormack, Jonathan W Pillow, Alexander Huk. Precise characterization of multiple LIP neurons in relation to stimulus and behavior
Jonathan W Pillow, Il Memming Park. Beyond Barlow: a Bayesian theory of efficient neural coding
Main meeting—Day 2
Eve Marder. The impact of degeneracy on system robustness
She stressed that there can be multiple implementations of the same functionality, a property she refers to as degeneracy. Her story was centered around modeling the lobster STG oscillation (side note: the connectome is not enough to predict behavior). Since there is rapid decay and rebuilding of receptors and channels, there must be homeostatic mechanisms that constantly tune parameters for the vital oscillatory bursting in the STG. There are multiple stable fixed points in the parameter space, and single-cell RNA quantification supports this.
Mark H Histed, John Maunsell. The cortical network can sum inputs linearly to guide behavioral decisions
Using optogenetics in behaving mice, they tried to resolve the synchrony vs rate code debate. He showed that, behaviorally, the population showed almost perfect integration of weak inputs and was not sensitive to synchrony. Hence, he claims that the brain may well operate on linear population codes.
Arnulf Graf, Richard Andersen. Learning to infer eye movement plans from populations of intraparietal neurons
Spike trains from monkey area LIP were used for an “eye-movement intention” based brain-machine interface. During the brain-control period, LIP neurons changed their tuning. Decoding was done with a MAP decoder which was updated online through the trials. To encourage(?) the monkey, the brain-control period had a different target distribution, and the decoder took this “behavioral history” or “prior” into account. Neurons with the lowest performance improved the most, demonstrating the ability of LIP neurons to swiftly change their firing pattern.
Il Memming Park, Evan Archer, Nicholas Priebe, Jonathan W Pillow. Got a moment or two? Neural models and linear dimensionality reduction
David Pfau, Eftychios A. Pnevmatikakis, Liam Paninski. Robust learning of low dimensional dynamics from large neural ensembles
Latent dynamics with an arbitrary noise process are recovered from high-dimensional spike train observations using low-rank optimization techniques (convex relaxation). Even a spike history filter can be included by assuming a low-rank matrix corrupted by sparse noise. It is a nice method, and I look forward to its application to real data.
Main meeting—Day 3
Carlos Brody. Neural substrates of decision-making in the rat
Using rats trained in a psychophysics factory on the Poisson click task, he showed that rats are noiseless integrators by fitting a detailed drift diffusion model with 8 (or 9?) parameters. From the model, he extracted detailed expected decision variable statistics related to activity in PPC and FOF (analogues of LIP and FEF in monkeys), which showed that FOF is more threshold-like and PPC is more integrator-like in their firing rate representations. However, pharmacologically disabling either area did not harm the rats' psychophysics, which indicates that the accumulation of sensory evidence happens somewhere earlier in the information processing. (Jeffrey Erlich said during the workshop that it might be auditory cortex.) [EDIT: Brody's Science paper is out.]
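To make "noiseless integrator" concrete, here is a minimal sketch of a click accumulator. It is not the 8 (or 9?) parameter model from the talk: the click rates, trial duration, noise level, and the function name `simulate_choices` are all my own assumptions for illustration. Each right click adds +1 and each left click adds -1 to a decision variable, optional diffusion noise accumulates over the trial, and the choice is the sign at the end.

```python
import numpy as np

def simulate_choices(rate_right, rate_left, duration, accumulator_noise_sd, n_trials, rng):
    """Fraction of 'right' choices from an accumulator of click differences.

    Clicks are Poisson with the given rates (clicks per second). The
    accumulator adds diffusion noise whose standard deviation grows as
    accumulator_noise_sd * sqrt(duration); 0 gives a noiseless integrator.
    """
    n_right = rng.poisson(rate_right * duration, size=n_trials)
    n_left = rng.poisson(rate_left * duration, size=n_trials)
    noise = accumulator_noise_sd * np.sqrt(duration) * rng.standard_normal(n_trials)
    decision_variable = n_right - n_left + noise
    choices = decision_variable > 0
    # Break exact ties (possible only without noise) with a fair coin flip
    ties = decision_variable == 0
    choices[ties] = rng.random(ties.sum()) < 0.5
    return choices.mean()

rng = np.random.default_rng(0)
# Hypothetical task parameters: 24 vs 16 clicks per second for 1 second
p_noiseless = simulate_choices(24.0, 16.0, 1.0, 0.0, 20000, rng)
p_noisy = simulate_choices(24.0, 16.0, 1.0, 6.0, 20000, rng)
print(f"noiseless: {p_noiseless:.3f}, noisy accumulator: {p_noisy:.3f}")
```

In this toy version, the noiseless integrator is limited only by the randomness of the clicks themselves, so it makes the correct choice more often than the noisy one.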
N. Parga, F. Carnevale, V. de Lafuente, R. Romo. On the role of neural correlations in decision-making tasks
I had a hard time understanding the speaker, but it was interesting to see how spike count correlations and a Gaussian assumption for decision making could accurately predict the choice probability.
Jonathan Aljadeff, Ronen Segev, Michael J. Berry II, Tatyana O. Sharpee. Singular dimensions in spike triggered ensembles of correlated stimuli
Due to the large concentration of eigenvalues in the stimulus covariance of natural scenes, they show that the result of spike triggered covariance analysis (using the difference between the raw STC and the stimulus covariance) contains a spurious component that corresponds to the largest eigenvalue. They argue this using random matrix theory and propose a correction that projects out the spurious dimension before STC analysis; surprisingly, they then recover more dimensions with eigenvalues larger than those of surrogates. I wonder if a model-based approach like GQM (or BSTC) would do a better job for those ill-conditioned stimulus distributions.
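For readers unfamiliar with the quantity being corrected, here is a minimal sketch of the basic analysis: the spike triggered covariance minus the stimulus covariance, followed by an eigendecomposition. It assumes white Gaussian stimuli and a simulated neuron with a quadratic nonlinearity, so it shows only the well-conditioned case, not the correlated natural-scene setting or the random matrix correction from the talk. The function name `stc_difference` and all parameter values are mine.

```python
import numpy as np

def stc_difference(stimuli, spike_counts):
    """Spike triggered covariance minus the raw stimulus covariance.

    stimuli: (n_samples, n_dims) array; spike_counts: (n_samples,) array.
    """
    stimuli = np.asarray(stimuli, dtype=float)
    weights = np.asarray(spike_counts, dtype=float)
    n_spikes = weights.sum()
    sta = weights @ stimuli / n_spikes              # spike triggered average
    centered = stimuli - sta
    stc = (centered * weights[:, None]).T @ centered / n_spikes
    stimulus_cov = np.cov(stimuli, rowvar=False)
    return stc - stimulus_cov

rng = np.random.default_rng(1)
n_samples, n_dims = 50000, 8
true_filter = np.zeros(n_dims)
true_filter[2] = 1.0                                # hypothetical one-dimensional feature
stimuli = rng.standard_normal((n_samples, n_dims))  # white Gaussian stimuli (assumption)
rate = 0.5 * (stimuli @ true_filter) ** 2           # symmetric nonlinearity, so the STA is near zero
spikes = rng.poisson(rate)

eigvals, eigvecs = np.linalg.eigh(stc_difference(stimuli, spikes))
top = eigvecs[:, np.argmax(np.abs(eigvals))]        # direction with the largest change in variance
print("overlap with true filter:", abs(top @ true_filter))
```

With white stimuli the only eigenvalue far from zero belongs to the true filter; the talk's point is that with strongly correlated stimuli this separation is no longer clean.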
Gergo Orban, Pierre-Olivier Polack, Peyman Golshani, Mate Lengyel. Stimulus-dependence of membrane potential and spike count variability in V1 of behaving mice
It is well known that the Fano factor of spike trains is reduced when a stimulus is given (e.g. Churchland et al. 2010). Gergo measured contrast-dependent trial-to-trial variability of V1 membrane potentials in awake mice. By computing the statistics from 5 out of 6 cycles of a repeated stimulus, he found that the variability is reduced as the contrast gets stronger. The spikes were clipped from the membrane potential for this analysis.
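As a reminder, the Fano factor is the variance of the spike count across trials divided by its mean. A minimal sketch, using simulated counts rather than any data from the talk (the helper name `fano_factor` and the two example distributions are my own choices):

```python
import numpy as np

def fano_factor(spike_counts):
    """Fano factor: variance of trial spike counts divided by their mean."""
    counts = np.asarray(spike_counts, dtype=float)
    mean = counts.mean()
    if mean == 0:
        raise ValueError("Fano factor is undefined when the mean count is zero")
    return counts.var(ddof=1) / mean

rng = np.random.default_rng(0)
# Hypothetical data: Poisson counts versus more regular counts with the same mean
poisson_counts = rng.poisson(lam=10.0, size=5000)
regular_counts = rng.binomial(n=20, p=0.5, size=5000)   # mean 10, variance 5

print("Poisson:", fano_factor(poisson_counts))
print("regular:", fano_factor(regular_counts))
```

A Poisson process gives a Fano factor near 1, so a drop below that level at stimulus onset indicates more regular spiking across trials.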
Jakob H Macke, Iain Murray, Peter Latham. How biased are maximum entropy models of neural population activity?
This was based on their NIPS 2011 paper with the same title. If you use a maximum entropy model as an entropy estimator, then, as with all entropy estimators, your estimate of entropy will be biased. They have an exact form of the bias, which is inversely proportional to the number of samples if the model class is right.
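The 1/N scaling is easiest to see in a much simpler setting than theirs. The sketch below is not the maximum entropy model bias from the talk; it uses the plain plug-in estimator on a uniform discrete distribution, whose first-order bias is the classical -(K-1)/(2N) nats for K symbols and N samples. The function names and the choice of K, N, and number of repeats are assumptions for illustration.

```python
import numpy as np

def plugin_entropy(counts):
    """Plug-in (maximum likelihood) entropy estimate in nats."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def mean_bias(n_symbols, n_samples, n_repeats, rng):
    """Average of (estimate - true entropy) for a uniform distribution."""
    probabilities = np.full(n_symbols, 1.0 / n_symbols)
    true_entropy = np.log(n_symbols)
    estimates = [
        plugin_entropy(rng.multinomial(n_samples, probabilities))
        for _ in range(n_repeats)
    ]
    return np.mean(estimates) - true_entropy

rng = np.random.default_rng(2)
n_symbols = 8
bias_small = mean_bias(n_symbols, 200, 4000, rng)   # N = 200
bias_large = mean_bias(n_symbols, 400, 4000, rng)   # N = 400
print("bias at N=200:", bias_small, "first-order prediction:", -(n_symbols - 1) / (2 * 200))
print("bias at N=400:", bias_large, "first-order prediction:", -(n_symbols - 1) / (2 * 400))
```

Doubling the number of samples roughly halves the (negative) bias, which is the same qualitative behavior described for the maximum entropy case.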
Ryan P Adams, Geoffrey Hinton, Richard Zemel. Unsupervised learning of latent spiking representations
By taking the limit of small bin size in an RBM, they built a point process model with continuous coupling to a hidden point process. The work still seems to be preliminary. They used a Gaussian process to constrain the coupling to be smooth.
Main meeting—Day 4
D. Acuna, M. Berniker, H. Fernandes, K. Kording. An investigation of how prior beliefs influence decision–making under uncertainty in a 2AFC task.
Subjects performing optimal Bayesian inference could be using several different strategies to generate behavior from the posterior; sampling from the posterior and MAP inference are compared. The strategies make different predictions for the just-noticeable difference (JND) as a function of prior uncertainty. They find that human subjects were consistent with MAP inference and not sampling.
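A toy version of the two strategies, assuming a Gaussian prior and Gaussian measurement noise (this is my own simplification, not the authors' task or model, and all names and parameter values are hypothetical). A MAP observer reports the posterior mode on every trial, while a sampling observer draws one sample from the posterior, which adds the posterior variance to the trial-to-trial spread of responses.

```python
import numpy as np

def simulate_estimates(stimulus, prior_mean, prior_sd, noise_sd, n_trials, rng, strategy):
    """Simulate trial-by-trial estimates under a Gaussian prior and likelihood."""
    measurements = stimulus + noise_sd * rng.standard_normal(n_trials)
    weight = prior_sd**2 / (prior_sd**2 + noise_sd**2)     # weight on the measurement
    post_mean = weight * measurements + (1 - weight) * prior_mean
    post_sd = np.sqrt(weight) * noise_sd                    # posterior standard deviation
    if strategy == "map":
        return post_mean                                    # Gaussian posterior: mode equals mean
    if strategy == "sampling":
        return post_mean + post_sd * rng.standard_normal(n_trials)
    raise ValueError(f"unknown strategy: {strategy}")

rng = np.random.default_rng(3)
map_spread = {}
sampling_spread = {}
for prior_sd in (0.5, 1.0, 2.0):
    map_spread[prior_sd] = simulate_estimates(0.0, 0.0, prior_sd, 1.0, 20000, rng, "map").std()
    sampling_spread[prior_sd] = simulate_estimates(0.0, 0.0, prior_sd, 1.0, 20000, rng, "sampling").std()
    print(prior_sd, map_spread[prior_sd], sampling_spread[prior_sd])
```

If one takes the response spread as a stand-in for the JND (an assumption on my part), the two strategies trace out different curves as the prior uncertainty changes, which is what makes them distinguishable in behavior.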
Phillip N. Sabes. On the duality of motor cortex: movement representation and dynamical machine
He poses the question of whether activity in the motor cortex is a representation (tuning curve) of motor-related variables or whether the motor cortex is just generating dynamics for motor output. Also, he says jPCA applied to feed-forward non-normal dynamics shows results similar to Churchland et al. 2012, so the dynamics are not necessarily oscillating. He suggested that dynamics is the way to interpret them, but in the end the neurons were also tuned.
Randy Bruno. The neocortical circuit is two circuits. (Why so many layers and cell types? workshop)
By studying the thalamic input to the cortex in a sedated animal, he discovered that these inputs synapse 80% onto layer 4 and 20% onto layer 5/6 (more 5 than 6; see Oberlaender et al. 2012). Blocking activity in L4 and above did not change the L5/6 membrane potential response to whisker deflection. He suggested that L1/2/3/4 and L5/6 are two different circuits that can function independently.
Alex Huk. Temporal dynamics of sensorimotor integration in the primate dorsal stream (Neural mechanism for orienting decisions across the animal kingdom workshop)
Matteo Carandini. Adaptation to stimulus statistics in visual cortex (Priors in perception, decision-making and physiology workshop)
Matteo showed adaptation in LGN and V1 due to changes in the input statistics. For LGN, the position of the stimulus was used, which in turn shifted V1 receptive fields (V1 didn’t adapt; it just didn’t know about the adaptation in LGN). For V1, random full-field orientation was used (as in Benucci et al. 2009) but with a sudden change in the distribution over orientations. The effect on V1 tuning could be explained by changes in the gain of each neuron and of each stimulus orientation. This equalized the population (firing rate) response. [EDIT: this is published in Nat Neurosci 2013]
Eero Simoncelli. Implicit embedding of prior probabilities in optimally efficient neural populations (Priors in perception, decision-making and physiology workshop)
Eero presented an elegant theory (work with Deep Ganguli presented at NIPS 2010; Evan’s review) of optimal tuning curves given the prior distribution. He showed that it explains 4 visual and 2 auditory neurophysiology and psychophysics data sets well.
Albert Lee. Cellular mechanisms underlying spatially-tuned firing in the hippocampus (Dendritic computation in neural circuits workshop)
Among the place cells there are also silent neurons in CA1. Using impressive whole-cell patch recordings of CA1 cells in awake, freely moving mice, he showed that the silent cells not only don’t spike but also lack tuned membrane potential fluctuations. However, by injecting current into a silent cell so that it would have a higher membrane potential (closer to the threshold), they successfully activated it and turned it into a place cell (Lee, Lin and Lee 2012).
Marina Garrett. Functional and structural mapping of mouse visual cortical areas (A new chapter in the study of functional maps in visual cortex workshop)
She used intrinsic imaging to find continuous retinotopic maps. Using the gradient of the retinotopy, in combination with the eccentricity map, she defined the borders of visual areas. She defined 9 (or 10?) areas surrounding V1 (Marshel et al. 2011). Several areas had temporal selectivity, while others had spatial selectivity, which are the hallmarks of the parietal and temporal pathways (dorsal and ventral in primates). She also found connectivity patterns that were increasingly multi-modal in higher areas.
I highly recommend these beautifully written texts.
1. The Selfish Gene (1976) by Richard Dawkins [worldcat][amazon]
This was my introduction to the meme of Darwinism, and to memes themselves. I read it when I was 13 or 14 years old. My world view has been fundamentally changed ever since.
2. The Emperor’s New Mind (1989) by Roger Penrose [worldcat][amazon]
I read this book when I was 18 years old. I liked it so much that I bought many copies and gave them to my friends as presents. It motivated me to study logic and computation theory as a means to understand the mind. Although the core idea is flawed, the book overall brought me great joy in thinking about what human minds can do, and how they can do it.
3. The Myth of Sisyphus (1942) by Albert Camus [worldcat][amazon]
In times of despair, when I thought I couldn’t understand this seemingly illogical world and was frustrated by its complexity, this book spoke dearly to me. I was 19 or 20 years old.
4. I am a Strange Loop (2007) by Douglas Hofstadter [worldcat][amazon]
Before this book, I was a pure reductionist (since I was little; my father is a physicist), trying to understand the world by going into the smaller scale of things. Now, I also think about what abstraction can bring to the table: understanding at a different, more humane level. I was in graduate school when it came out.