Correntropy RKHS and space of expectations

2009 November 16
by memming

Correntropy is a statistical quantity that captures nonlinear similarity between an indexed set of random variables, defined as V(i, j) = E[\kappa(X_i, X_j)], where X_i are the random variables, and \kappa(\cdot, \cdot) is a symmetric positive-semi-definite kernel [1]. Due to the reproducing kernel Hilbert space (RKHS) theory, any symmetric positive semi-definite kernel induces an RKHS, that is it can be used as an inner product in an extended space [2]. It is easy to show that correntropy is symmetric positive-definite, and hence it defines the so-called correntropy RKHS.

Each element in the correntropy RKHS is a functional that maps from an index of a random variable to a real value. In general, due to representer theorem, any functional in the space can be represented as, g(j) = \sum_i \alpha_i E[\kappa(X_i, X_j)], a linear combination of the mapped random variable vectors. When the random variables that constitute the space is rich enough, then the one can approximate expectations by estimating \alpha_i’s. However, unfortunately, this has little practical value, because the richness has to come from the joint of pairs of random variables. It is an interesting subject to ponder.

Note: correntropy RKHS should not be confused with the correntropy space which is an analogue of Parzen’s random process where the covariance is equivalent to an arbitrary Gaussian process.

  1. I. Santamaria, P. Pokharel, J. C. Príncipe. Generalized Correlation Function: Definition, Properties, and Application to Blind Equalization. IEEE Transactions on Signal Processing, Vol. 54, No. 6. (2006), pp. 2187-2197.
  2. Nachman Aronszajn. Theory of Reproducing Kernels. Transactions of the American Mathematical Society, Vol. 68, No. 3. (1950), pp. 337-404.

Hypothesis test on spike trains with IOCANE

2009 October 31
tags:
by memming
IOCANE logo represented as a balance

IOCANE project logo

IOCANE is an open source project I recently started. It is a set of MATLAB code that implements various divergence measures for spike train observation. Currently, I am focusing on multi-trial observation scenario where some external stimulus is given to the neural system, and trial-to-trial variability (noise) is observed. The question is whether we can tell if two sets of spike train observations are distinct. One can use summarizing statistics such as mean firing rate, PSTH, Fano factor, inter-spike interval distribution, etc and compare these. However, these methods implicitly assume a model, hence they are parametric.

I have recently developed a non-parametric divergence measure based on finite point process model (presented at SfN). The following table compares the performance in terms of statistical power (= 1- type II error given fixed type I error) on a hypothesis test to distinguish between two conditions. As you can see, for different artificial data sets I – IV, most methods don’t universally work; they are sensitive to what they are designed to be sensitive to. On the other hand, my proposed method (HL100 and HL10, the number denotes the smoothing bandwidth) are sensitive to all situation though sometimes it doesn’t perform the best.

Table of statistical power for 4 experiments

Non-parametric method is a generalist, while each of the parametric method is a specialist. One needs a generalist when the change is subtle or unknown. When large dataset with various possibility is given, a generalist would be a good choice to use in detecting changes. Given infinite number of samples, the proposed method is guaranteed to detect the change if there is a change. If you have a spike train dataset that defies standard analysis methods, why not try iocane? :D

P.S.1. I have updated iocane today to include the experiments that generated the above table.

P.S.2. Yes, iocane is the (fictional) toxin from princess bride. I was laughing so hard on that scene, that I had to have my next project named after it.

SfN 2009 Chicago

2009 October 26
by memming

Oct 17-21, 2009. Chicago was where I attended my first society for neuroscience conference. It was exciting to see over 32,000 people, all doing something related to neuroscience. It was diverse both in terms of research topics and quality of presentation. Many different kinds of people, some trying to hide their name tag, some trying to impress others, showed up. It was a great place to socialize and talk to others, although it was so big that I missed many posters and sessions that I wanted to listen to. Here’s my summary of the academic experiences.

Saturday

I hovered around the STDP session. I talked to Brent D. Doiron about how STDP in the dorsal cochlear nucleus (DCN) can maintain the temporal code (time to first spike) while changing the gain (threshold) [1]. It was interesting to see how the direct excitatory input and indirect inhibitory input gives rise to compound temporal response to the fusiform cells, and as the authors claim, specific firing pattern can change this response. One of the questions I had is how the higher order brain areas can induce STDP, since the post synaptic (fusiform) cells can be directly driven by the parallel fibers as well as the auditory nerve, creating timing patterns that induces STDP on those synapse could be challenging.

The presidential lecture was on abstract knowledge by Elizabeth S. Spelke. The ability of abstraction is one of the key qualities of intelligence that I would like to study in the future, so I was very excited. The talk was about studies of infants and animals using response time (RT) change when novel things are presented. Basically many animals have the ability to count small numbers up to 3 and tell the cardinality (ratio) difference of a set of objects for numbers. I was hoping she would talk how these abilities are obtained, and how they could be generalized, but perhaps I had too high expectations.

Sunday

I found a flavoprotein autofluorescence study in rat that showed single formant and complex formant activates different regions of the auditory cortex, and learning strengthened the response [2]. This suggests that the auditory system is specialized for different frequency structures at least in rats.

In retinal ganglion cells, the firing pattern is well described by the pairwise interaction and maximum entropy distributions (like Bialek’s group showed it in general). The poster by Andrea K. Barreiro and coworkers examined a simple model that could potentially have higher-order interaction between three neurons to see how well the theory holds [3,4,5]. Various input distributions were provided to see the deviation from pairwise maximum entropy model, and with the choice of bimodal input distribution, she could achieve significant higher-order correlation that is not explained by pairwise interactions. I thought this is of particular interest to my collaborator Sohan Seth.

Ted Berger’s lab had several interesting posters. Adjusted extracellular stimulation paradigm (using Volterra-Laguerre), unscented Kalman filter, weight analysis were nice things to see. (Posters FF117-FF121)

Monday

I had a poster presentation in the morning, but being the second author, I could still roam around. Chen presented an interesting poster of altering the population bursting activity of cultured neurons by using different ratio of inhibitory neurons (GABAnergics from striatum) and excitatory neurons (tagged and selected) [6]. They showed that when more inhibitory neurons are added, the burst structure enlongates, and become less periodic.

Mark Humphries’s poster on a new spike train clustering algorithm was very related to my research about spike train spaces in collaboration with Antonio Paiva [7].  Basically, he was applying the community finding algorithm by Newman [8] to the normalized spike train similarity measures, and using a heuristic he could find how many clusters should be found.

Tuesday

I was looking for some unsupervised learning work, and I found a poster on a computational model for unsupervised/supervised learning in the visual hierarchy, presented by Luis Bettencourt [9]. Using unsupervised Hebbian learning with feed-forward neural network comprised of S and C cells (simple and complex?), they investigated the number of images required for the classifier. I’d like to see extensions of the unsupervised part to spatio-temporal patterns.

In human learning nanosymposium, I was inspired by a talk by Pavao on online learning abstract concepts from examples [10]. They presented same class examples and different class examples and measured how well they learned. Each presentation of a pair is designed to provide one bit of information, because they would be different by a single feature. Interestingly, the subjects learned better from different class examples. Of course, in my mind I came up with another variant of Eleksius where the god presents such examples continuously.

Wednesday

I had my poster and a flight to catch, so I had to miss a lot the last day. I’d like to thank all the people that visited my poster. I’ve never had such a big crowd.

If you find any errors in this post, please let me know. Memming thanks the office of research and graduate studies and J. C. Pruitt family department of biomedical engineering for the partial support of his travel.

  1. B. D. Doiron, Y. Zhao, T. Tzounopoulos. Cell-specific synaptic plasticity modulates response threshold while preserving temporal coding in an auditory pathway. Program No. 41.3/C69. Society for Neuroscience, 2009
  2. M. Kudoh, G. Ogawa. Plastic changes in the anterior and dorsal auditory fields induced by discrimination learning of synthetic vowels in rats. Program No. 163.5/T3. Society for Neuroscience, 2009
  3. A. K. Barreiro, E. Shea-Brown, F. Rieke, J. Gjorgjieva. When are microcircuits well-modeled by pairwise interactions? Program No. 165.13/U29. Society for Neuroscience, 2009
  4. Jonathon Shlens, Greg Field, Jeff Gauthier, Matthew Grivich, Dumitru Petrusca, Alexander Sher, Alan Litke, E.J. Chichilnisky. The structure of multi-neuron firing patterns in primate retina. Journal of Neuroscience, 2006: 26, 8254-8266
  5. Jonathon Shlens, Greg Field , Jeff Gauthier, Martin Greschner, Alexander Sher, Alan Litke, E.J. Chichilnisky. The structure of large-scale synchronized firing in primate retina. Journal of Neuroscience, 2009: 29, 5022-5031
  6. X. Chen, R. Dzakpasu. Dynamical effects from altering the balance between excitation and inhibition in dissociated hippocampal networks. Program No. 322.7/D19. Society for Neuroscience, 2009
  7. M. D. Humphries. Spike-train communities: Finding groups of similar spike trains. Program No. 322.11/D23. Society for Neuroscience, 2009
  8. M. E. J. Newman, M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)
  9. L. M. Bettencourt, S. Brumby, V. Gintautas, M. I. Ham, S. Barr, P. Loxley, K. Sanbonmatsu, S. Swaminarayan, J. George, G. Kenyon, I. Nemenman. Image categorization through large-scale hierarchical models of the primate visual cortex. Program No. 652.2/Y13. Society for Neuroscience, 2009
  10. R. Pavao, J. Savietto, J. Sato, G. Xavier, A.F. Helene. Modeling sequence learning with informational entropy. Program No. 503.6. Society for Neuroscience 2009

Significance test for spike trains based on finite point process estimation

2009 October 15

SfN2009_Significance_test_posterI have been working on developing methods to test if two sets of spike trains come from a same probability law (point process) or not. I am going to presentation of the idea in public for the first time in society for neuroscience 2009, Chicago. It’s going to be on Oct 21st morning session, poster# 789.1/GG119. If you have spike train like data but suffering from their noisiness for analysis in terms of change detection, you maybe interested in this one. Please come visit my poster!

I am also a second author of another poster, 349.9/V18 on Oct 19th morning. This one is about the population coding in olfactory primary sensory system. This time the model prediction is made on the single neuron recordings.

Renewal process and perfect integrate-and-fire model with random threshold

2009 October 13

The family of integrate-and-fire (IF, aka IAF) models is a popular class I point neuron modeling. A typical IF neuron integrates the input current, and when the membrane potential reaches the fixed threshold it generates an action potential and resets its membrane potential. Since membrane potential is the only dynamic variable, each inter-spike interval (ISI) is independent from each other. Hence IF neurons are generators for renewal processes. This link from IF to renewal process is not one-to-one. There could be many models that generate the same renewal process. In some special cases with stationary input and noise statistics, the analytical solution for the inter-spike interval distribution of an IF model can be obtained through solving the first passage time density [1].

If we consider perfect integrate-and-fire (PIF) neuron with random threshold, a stationary renewal process can be easily mapped in a unique manner. A perfect integrator can be represented as \frac{\mathrm{d}\phi}{\mathrm{d}t} = \omega. We normalize the scale such that the phase like variable \phi represents time, and the integration time becomes \omega = 1. When \phi reaches the threshold, an action potential generation is marked and \phi is reset to 0. Without any randomness, this is the simplest oscillator. Now, let X be the random variable for the inter-spike interval of a stationary random process of interest. One can think of the process as a noisy oscillator. If we reset the threshold of the PIF neuron to be drawn from X each time it fires, it is easy to show that the PIF behaves exactly as the renewal process. Hence, the PIF with random threshold and stationary renewal process are one-to-one mapped (inspired by [2]).

A major advantage of PIF model compared to the renewal model is that we can implement (potentially non-stationary) input dependence naturally. Since the \phi behaves as the phase variable of a noisy oscillator, we can use the phase response curve (PRC) formalism. A PRC Z(\phi) is defined as the phase change to a infinitesimal perturbation of the oscillator. Therefore the PIF with small input x(t) can be well approximated with, \frac{\mathrm{d}\phi}{\mathrm{d}t} = \omega + Z(\phi) x(t) (e.g. [3]).

This model has two parameters to estimate: X and Z(\phi). X can be estimated from ISI distribution of spontaneous activity. Once we have X, Z(\phi) can be estimated by the distortion of the firing time distribution given small perturbation at different phases. However, for large \phi, it is difficult to estimate this quantity, simply because of the rarity of events with longer interval in general. Hence, a parametric approach is preferable for estimating the PRC.

References:

  1. Lindner’s lecture 2009
  2. Lars Wolff, Benjamin Lindner. Method to calculate the moments of the membrane voltage in a model neuron driven by multiplicative filtered shot noise. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), Vol. 77, No. 4. (2008), 041913.doi:10.1103/PhysRevE.77.041913
  3. Roberto F. Galan, Nicolas Fourcaud-Trocme, G. Bard Ermentrout, Nathaniel N. Urban. Correlation-Induced Synchronization of Oscillations in Olfactory Bulb Neurons. J. Neurosci., Vol. 26, No. 14. (5 April 2006), pp. 3646-3655. doi:10.1523/JNEUROSCI.4605-05.2006

Computational Neuroscience course at BCCN-Göttingen 2009 (Part 1)

2009 October 2

Last week (Sep 21th – 25th, 2009), I participated a five day computational neuroscience course organized by the Bernstein center for computational neuroscience. It was the 7th fall CNS course they organized at the Max-Planck-Institute for Dynamics and Self-Organization, Göttingen, Germany. The topics were mostly focused on theoretical approaches to modeling, and in vivo spontaneous activities. Each lecturer spoke for 3 hours, then students had to study papers and present.

Day 1: Dr. Jason Kerr: Population imaging in vivo: from the awake to the anesthetized

Using in vivo calcium imaging techniques combined with fast two-photon imaging, he showed how biased we are in terms of observing the neural activities. The traditional extracellular recording lead to bias on high firing rate and well tunned neurons, because the experimentalist will look for such cells. The hypothesis of rate coding and population tuning are mostly based on these biased sampling of cortical neurons. Similarly, blind patch clamp tends to find neurons with bigger somata (pl. for soma). He presented a figure (Figure 9 from [1]) to show how naturally different cortical layers have different firing rate profile; in general layer 4 and 6 have higher rate. Imaging techniques can be better sample the population, and he showed a method to extract action potential timings in the time scale of 64~128 ms from calcium transient signal. (His group has overcome interesting engineering challenges). They have combined patch clamp and calcium imaging to obtain the ground truth and training data and applied to other neurons to obtain the population activity (See [2] for details of his method). He also showed miniature imaging backpack system mounted on rodents for semi-freely behaving animals (fiber scope). In the second part of his lecture, he showed analysis of barrel cortex imaging results. Interestingly, the correlation structure found in spontaneous activity and evoked activity were very similar, and no spatial pattern was consistently appearing (15 neurons time binned with 128 ms window).

The paper assigned to my group for presentation was about the silent neurons in the brain [3]. The paper was straightforward and I found myself mostly agreeing with the authors. Various evidence including the imaging results show that more than 60% of the neuronal population of the cortex has very low spontaneous firing rate and will be ignored by conventional extracellular recording techniques. Usually the spike sorting method requires multiple instance of the same neuron to fire sufficiently often so that they can form a cluster; this creates a bias to ignore low firing rate neurons. These neurons may firing highly selectively as in the case of auditory or higher visual cortices, but in general their role is not clear. These neurons indicate that temporal codes rather than rate codes maybe of high importance for understanding these neurons. This was somewhat surprising because of the presence of homeostatic gain control mechanisms and developmental apoptosis of neurons that do not fire (I don’t have references to these claims, if anybody knows please let me know). According to various arguments such as maximizing mutual information, and reducing the metabolic cost of the firing of the neuron lead to more sparse encoding of the input, hence the temporal code in higher order areas can be expected, but when I asked a question, Dr. Kerr gave me the example of high firing rate neurons in working memory tasks.

Day 2: Dr. Benjamin Lindner: Interspike interval statistics and response properties of neurons in the fluctuation driven regime

Dr. Lindner lectured on statistics and power spectral density for point processes, and diffusion approximation for integrate-and-fire family of neurons. Also he talked about renewal processes, and serial correlation. A few facts that I was not familiar with: the inter-spike interval serial correlation  coefficient is defined as, \rho_k = \frac{\langle (X_{i+j} - \langle I_i \rangle) (I_i - \langle I_i \rangle) \rangle}{(I_i - \langle I_i \rangle)^2} and related to the power spectral density at 0 frequency as: S(0) = r_0 {CV}^2 \left[ 1 + 2 \sum_{k=1}^\infty \rho_k \right] where r_0 is the mean firing rate, and CV is the coefficient of variation of the (marginal) inter spike interval density. For renewal process, the power spectral density can be neatly expressed as: S(\omega) = r_0 \frac{1 - |\tilde p|^2}{|1 - \tilde p|^2}.

Neurons in vivo have various noise sources — mainly synaptic bombardment, and this can be approximated with a diffusion process. He assumed a fixed conductance based leaky integrate-and-fire neuron (one dimension) and Poisson stimulation with first-order synaptic smoothing as input, and then he showed a series of approximations that leads to a diffusion approximation [4,5]. This results in a standard form of stochastic differential equation C\frac{dV}{dt} = -g_0(V-E_0) + \sqrt{2D} \zeta(t) where g_0 is the effective conductance, D is the diffusion time constant, and $\zeta(t)$ is a white Gaussian noise. To simulate this Ornstein-Uhlenbeck process of the form \dot{x} = f(x) + \sqrt{2D} \zeta(t) in discrete time steps \Delta t is to use the recursive formula: x(t+\delta t) = x(t) +f(x(t)) \Delta t + \sqrt{2D \Delta t} a(t) where a(t) is a zero-mean unit variance Gaussian random variable. The probability density of the membrane voltage trace evolution is governed by the Fokker-Planck equation: \partial_t P(x,t) = \partial_x[ -f(x)P(x,t)] + D\partial_x^2 P(x,t) where the right hand side is the negative partial derivative of the probability current. By setting the correct boundary conditions, one can solve the first-passage time distribution (and hence the power spectral density), and response to periodic stimulation. Also the most important is the linear response approximation of input-output relation of firing rate which works better when the noise is strong. Noise linearizes the response [6].

Our group discussion paper was [7] where the authors show that a model with interval correlation could increase information transfer in a very cleverly chosen model. Two models A and B are introduces have almost same properties (rate and ISI distribution), except for the fact that A has serial correlation. Intuitively thinking, since the correlation will decrease the total entropy of the firing, the mutual information from the input to the output which is bounded by this value should decrease. However, the nonlinear property of the neuron gives rise to redistribution of energy in the spectral domain for spontaneous activity. Hence lower frequency input can be passed through the system better. Linear response approximation for perfect integrate-and-fire neuron and the lower bound of mutual information obtained from the coherence are used to derive the final result. The system is in a high noise region, so the decrease of output entropy is mostly for the noise and the increase in mutual information (lower bound) is very possible. I wonder how general this result is; simulations with more complex neuron models and actual computation of mutual information instead of using the lower bound would be an interesting follow up study.

  1. Sergej V. Girman, Yves Sauve, Raymond D. Lund. Receptive Field Properties of Single Neurons in Rat Primary Visual Cortex. J Neurophysiol, Vol. 82, No. 1. (1 July 1999), pp. 301-311.
  2. Jason N. D. Kerr and Winfried Denk. Imaging in vivo: watching the brain in action. Nature Reviews Neuroscience 9, 195-205 (March 2008) | doi:10.1038/nrn2338
  3. Shy Shoham, Daniel O’Connor, Ronen Segev. How silent is the brain: is there a “dark matter” problem in neuroscience? Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, Vol. 192, No. 8. (1 August 2006), pp. 777-784  doi:10.1007/s00359-006-0117-6
  4. Lars Wolff, Benjamin Lindner. Method to calculate the moments of the membrane voltage in a model neuron driven by multiplicative filtered shot noise. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), Vol. 77, No. 4. (2008), 041913. doi:10.1103/PhysRevE.77.041913
  5. A. Burkitt. A Review of the Integrate-and-fire Neuron Model: I. Homogeneous Synaptic Input. Biological Cybernetics, Vol. 95, No. 1. (1 July 2006), pp. 1-19. doi:10.1103/PhysRevE.77.041913
  6. L. F. Abbott, Carl van Vreeswijk. Asynchronous states in networks of pulse-coupled oscillators. Physical Review E, Vol. 48, No. 2. (1 Aug 1993), pp. 1483-1490. doi:10.1103/PhysRevE.48.1483
  7. Maurice J. Chacron, Benjamin Lindner, André Longtin. Noise Shaping by Interval Correlations Increases Information Transfer. Physical Review Letters, Vol. 92, No. 8. (25 Feb 2004), 080601.  10.1103/PhysRevLett.92.080601

Acknowledgement: Memming thanks Dr. John Harris for his generous travel support.

Point process spaces

2009 August 13

To compare, infer, and estimate point processes, it is essential to think about the space of all possible point processes. For example, Poisson process can be uniquely defined by its intensity function (firing rate profile) \lambda(t): \mathcal{T} \rightarrow \mathbb{R}^+ which is a bounded non-negative function over time. Hence the space L_\infty may seem as a nice representation space. However, L_\infty does not enforce the functions to be non-negative valued, hence it is an undesirably bigger space than all possible Poisson processes. This is similar to the well known problem of embedding probability density functions in a linear space — non-negativeness constraint and linearity doesn’t go very well with each other. We will discuss various representation, spaces and structures on the spaces for generic point processes in this post.

A simple point process (point process with no accumulation point) can have at most one event at one time. For a simple point process that we will assume for this post, the following representations all fully define the process:

  • Counting (random) process \mathbf{N}(t) which denotes the total number of events up to time t. A realization of a counting process is a non-decreasing non-negative integer valued right continuous path where the times of unit jump corresponds to events. Since \mathbf{N}(t) is a submartingale, using the Doob-Meyer decomposition, one can obtain a martingale and an increasing process (compensator process).
  • Random (counting) measure which is an analogy to a random variable where the range is a space of counting measures instead of the real line. The probability law induces a probability measure on the realizations (counting measures).
  • A probability distribution \{p_n\}_{n=0}^\infty for the probability of total number of events in the process, and a probability distribution \Pi_n(t_1, t_2, \ldots, t_n) symmetric over the permutation of its arguments (the timings are not ordered). [Ch 5, Daley and Vere-Jones]
  • Janossy measure \{J_n(t_1, \ldots, t_n)\}_{n=0}^\infty is a probability measure for space of ordered times t_1 < t_2 < \cdots < t_n. This is directly related to the above through J_n(t_1, \ldots, t_n) = p_n n! \Pi_n(t_1, \ldots, t_n). [Ch 5, Daley and Vere-Jones]
  • Conditional intensity function (conditional hazard function) \lambda^\ast(t) = \lambda(t|\mathcal{H}_t) which is really an alias for a family of conditional functions \lambda_n(t|t_1, \ldots, t_n), the intensity of the process given the events (t_n is the latest event). [Ch 7, Daley and Vere-Jones]

Space of probability measures over the realizations (either ordered, unordered, times or counting measures) is a straightforward point process space. The general theory of similarity, divergence, and similarities can be directly applied to this non-Euclidean space. For example, the family of f-divergences (such as Hellinger divergence) can be applied to these measures. This does not provide key structures such as linearity, and inner products that are often useful for practical learning tasks. However, a family of divergences known as Hilbertian metrics (Hein and Bousquet 2005; also includes Hellinger as a special case) allows not only a metic that can be embedded in a Hilbert space, but a reproducing kernel Hilbert space (RKHS) extension of the space. Although the linearity and inner product structure provided by the RKHS leads to non-probability measure (hence, non-point process) points as elements in the space, therefore making an approximation to the closest point process measure necessary.

On the other hand, it is tempting to extend the concept of L_2 space of intensity functions in the Poisson case [Paiva et. al. 2009] to the conditional intensity funciton for general class of point processes. We have defined the Hilbert space of Poisson processes with L_2 inner product between intensity functions (we assumed the intensity functions are square integrable.) Let \lambda^\ast(t) and \gamma^\ast(t) be conditional intensity functions, \int_\mathcal{T} \lambda^\ast(t) \gamma^\ast(t) \mathrm{d}t at first seems like a good candidate. However, one quickly realizes that this is a (random) quantity that depends on the realization which controls the conditioning term \mathcal{H}_t. Unlike the divergence measures we mentioned before, this quantity is dependent on the integrating measure \int_\Omega \int_\mathcal{T} \lambda^\ast(t) \gamma^\ast(t) \mathrm{d}t \mathrm{d}\mu \neq \int_\Omega \int_\mathcal{T} \lambda^\ast(t) \gamma^\ast(t) \mathrm{d}t \mathrm{d}\nu in general for \mu \neq \nu. It is possible to resolve it by converting the conditional intensity function back to the probability measure. Since the Janossy measure and conditional intensity function are closely related as J_n (t_1, \ldots, t_n) = \prod_i^n \lambda^\ast(t_i) \exp\left( - \int_\mathcal{T} \lambda^\ast(u) \mathrm{d}u \right), one can evaluate the reference measure independent Hilbertian metric or f-divergences to obtain a reference measure independent RKHS. Note that the linearity provided by the L_2 with a fixed measure is very different from that of the RKHS.

References

  • Andersen, Borgan, Gill and Keiding, “Statistical Models based on Counting Processes”, Springer-Verlag 1993
  • Snyder and Miller, “Random Point Porcesses in Time and Space”, Springer-Verlag 1991
  • Daley and Vere-Jones, “An Introduction to the Theory of Point Processes. Volume 1: Elementary Theory and Methods“, Springer 2003
  • Hein and Bousquet, “Hilbertian Metrics and Positive Definite Kernels on Probability Measures”, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics 2005

Hellinger divergence between two homogeneous Poisson processes

2009 July 14

Hellinger divergence/distance/metric/dissimilarity measures how different two probability measures are. I am interested in comparing spike train observations, hence I apply the Hellinger divergence to probability measures over all possible spike trains. The squared Hellinger divergence is defined as D^2(P,Q) = \int \left(\sqrt{P} - \sqrt{Q}\right)^2 where P and Q are probability measures.

In case of  the simplest point process, homogeneous Poisson process, the spike train space as well as the probability measure can be partitioned according to the number of events (action potentials) in each spike train. Since, the location statistic (probability given the total number of events) is exactly same for all homogeneous Poisson process, the integral in Hellinger divergence can be simply written as the following summation,

D^2(P,Q) = \sum_{n=0}^\infty \left( \sqrt{p_n} - \sqrt{q_n} \right)^2 = 2 - 2 \sum_{n=0}^\infty \sqrt{p_n q_n}

where p_n and q_n are the probability that n events occur. Let \Lambda_P and \Lambda_Q be the corresponding mean number of events, then p_n = \frac{1}{n!} \Lambda_P^n \exp(-\Lambda_P) and similarly for q_n; they follow the Poisson distribution. Substituting and rearranging the terms, we obtain,

D^2(P,Q) = 2 - 2 \exp\left(-\frac{1}{2} \left(\sqrt{\Lambda_P}-\sqrt{\Lambda_Q}\right)^2\right).

This is essentially same as just the divergence between two Poisson distributions. Let’s plot how it looks (using Matlab).

The divergence between <img src='http://l.wordpress.com/latex.php?latex=%5CLambda+%3D+0%2C4%2C8%2C12&bg=ffffff&fg=333333&s=0' alt='\Lambda = 0,4,8,12' title='\Lambda = 0,4,8,12' class='latex' /> and other rates between 0 to 16″ width=”300″ height=”187″ /><p class=The divergence between Lambda = 0,4,8,12 and other rates between 0 to 16

Squared Hellinger divergence as we defined it earlier ranges from 0 to 2. From the figure above, we can see that it is zero when the rates coincide and tends to 2 as the difference increases. I recently proposed a nonparameteric estimator for this measure, and it is currently under review. I would like to empirically verify the asymptotic consistency of the estimator with this result.

Note that previously proposed mCI based distance between point processes [Paiva et. al. 2009], related to van Rossum’s distance of spike trains, has a quadratic form (\Lambda_P - \Lambda_Q)^2. It only applies to Poisson process when no realization is given, which is a major limitation. The Hellinger divergence can be applied to arbitrary point processes.

Eleksius is an active learning problem

2009 June 16
by memming

I just realized that the previously mentioned game Eleksius is a very difficult form of active learning. This thought was triggered by the ICML 2009 tutorial on active learning from John Langford’s blog article.  Active learning is a machine learning technique where the machine actively chooses which query to ask to the oracle. This is particularly useful in combination with costly experiments. When one has a black box that has input and output, and want to find out the relation between input and output, but to obtain an output, one needs time and/or money; ffor example such as DNA microarray, drug discovery, space, high energy experiments, or even computer simulations. Active learning can be thought of as a clever way of designing experiments that depend on the previous experimental results. The key is to search the plausible hypothesis space efficiently through probing the best spots.

In other games such as Eleusis and some sequence puzzles, one also searches the hypothesis space of logical and mathematical rules, and one needs to find a “simple” solution, however, are not active learning problem since the player does not have the choice of next item.

Let X be the set of all objects, and Q be the set of all hypotheses. Each hypothesis (or question) q in Q is a binary function from X to {‘yes’, ‘no.’}. In Twenty questions, the player tries to find an unknown element x in X by using a few (less than or equal to twenty) questions as possible. The goal of Eleksius is to find the unknown q by using as few number of objects as possible.

The set of all possible binary functions on X has the cardinality 2^X (fancy way of saying the number of elements). Hence, in principle, finding Q should be more difficult just because of the size of the set. However, this is not entirely true. Arbitrary binary function on X is meaningless. Also, the function must be balanced, there are many objects corresponding to yes, and same for no. However, the most important factor is human perception and natural language. The question in god’s mind is some simple concept that can be clearly expressed in a few words at most. Therefore there is a structure in this space Q, some are preferred and some are not. How can we measure this? Is there a natural algorithm that a human or computer player can use to solve Eleksius?

Input Driven Synchrony of Oscillating Olfactory Receptor Neurons: Computational Modeling Study

2009 April 21
Input Driven Synchrony of Oscillating Olfactory Receptor Neurons: Computational Modeling Study

Input Driven Synchrony of Oscillating Olfactory Receptor Neurons: Computational Modeling Study

I have been working on modeling and analyzing spontaneously bursting olfactory receptor neurons of Lobsters. This work is done with collaboration with Yuriy Bobkov, a nice Russian electrophysiologist. He discovered that there are spontaneously oscillating cells among the primary sensory receptor neurons which is surprising because sensory neurons are supposed to send signals that they detect, but not generate signals by themselves. We had some hypothesis, and the only way to see the plausibility was to extrapolate single cell recording results to a full population model, because we thought that the population dynamics of these cell would represent temporal ques from the environment. I built a modified renewal neuron model, extrapolated the model to the population, and built a neural decoder. So if you are interested, and happend to be around Sarasota, Florida, USA on 23 April, 2009 (AChemS), please come and visit my poster.

P.S. I like posters with less text, because I know people will not read them if there is too much, but some of my coauthors insisted on putting full figure captions, so it looks kind of compact.