To compare, infer, and estimate point processes, it is essential to think about the space of all possible point processes. For example, a Poisson process is uniquely defined by its intensity function (firing rate profile) $\lambda(t): \mathcal{T} \rightarrow \mathbb{R}^+$, a bounded non-negative function over time. Hence the space $L_\infty$ may seem like a natural representation space. However, $L_\infty$ does not enforce non-negativity, so it is an undesirably larger space than that of all possible Poisson processes. This mirrors the well-known problem of embedding probability density functions in a linear space: the non-negativity constraint and linearity do not go well together. In this post we discuss various representations, spaces, and structures on those spaces for generic point processes.
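To make the intensity-function view concrete, here is a minimal sketch (assuming NumPy) of sampling an inhomogeneous Poisson process by thinning; the particular intensity $5(1+\sin t)$ and the function names are illustrative choices, not from any library:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_poisson_process(lam, T, lam_max):
    """Sample an inhomogeneous Poisson process on [0, T] by thinning.

    lam     : intensity function lambda(t), bounded above by lam_max
    returns : sorted array of event times
    """
    # Propose events from a homogeneous process with rate lam_max:
    # the count is Poisson(lam_max * T) and locations are uniform.
    n = rng.poisson(lam_max * T)
    proposals = rng.uniform(0.0, T, size=n)
    # Keep each proposal t with probability lam(t) / lam_max.
    keep = rng.uniform(size=n) < lam(proposals) / lam_max
    return np.sort(proposals[keep])

events = sample_poisson_process(lambda t: 5.0 * (1.0 + np.sin(t)),
                                T=10.0, lam_max=10.0)
```

The thinned realization is itself a point in the space under discussion: it is fully described by its event times, with the non-negativity of $\lambda$ baked into the acceptance probability.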

A simple point process (a point process with almost surely no coincident points) can have at most one event at any given time. For the simple point processes we assume throughout this post, each of the following representations fully defines the process:

• Counting (random) process $\mathbf{N}(t)$, which denotes the total number of events up to time $t$. A realization of a counting process is a non-decreasing, non-negative, integer-valued, right-continuous path whose unit jumps correspond to event times. Since $\mathbf{N}(t)$ is a submartingale, the Doob-Meyer decomposition yields a martingale plus an increasing process (the compensator process).
• Random (counting) measure, analogous to a random variable whose range is a space of counting measures rather than the real line. The probability law of the process induces a probability measure on the realizations (counting measures).
• A probability distribution $\{p_n\}_{n=0}^\infty$ over the total number of events, together with, for each $n$, a probability distribution $\Pi_n(t_1, t_2, \ldots, t_n)$ symmetric under permutation of its arguments (the timings are not ordered). [Ch 5, Daley and Vere-Jones]
• Janossy measures $\{J_n(t_1, \ldots, t_n)\}_{n=0}^\infty$, which form a probability measure on the space of ordered times $t_1 < t_2 < \cdots < t_n$. They are directly related to the above through $J_n(t_1, \ldots, t_n) = p_n \, n! \, \Pi_n(t_1, \ldots, t_n)$. [Ch 5, Daley and Vere-Jones]
• Conditional intensity function (conditional hazard function) $\lambda^\ast(t) = \lambda(t|\mathcal{H}_t)$, which is really an alias for a family of conditional functions $\lambda_n(t|t_1, \ldots, t_n)$: the intensity of the process given the past events ($t_n$ being the latest event). [Ch 7, Daley and Vere-Jones]
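To connect the last representation to the first, the sketch below samples from a conditional intensity via Ogata's thinning algorithm and recovers the counting process $\mathbf{N}(t)$ from the sampled times. The self-exciting (Hawkes-type) kernel and all parameter values are illustrative assumptions, chosen because its $\lambda^\ast(t)$ is non-increasing between events, which gives an easy local upper bound:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_hawkes(mu, alpha, beta, T):
    """Sample event times on [0, T] from the conditional intensity
    lambda*(t) = mu + sum_{t_i < t} alpha * exp(-beta (t - t_i))
    using Ogata's thinning algorithm."""
    events, t = [], 0.0
    while True:
        # Between events lambda* is non-increasing, so its current
        # value upper-bounds it until the next accepted event.
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            break
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() < lam_t / lam_bar:  # accept with prob lambda*(t) / bound
            events.append(t)
    return np.asarray(events)

events = sample_hawkes(mu=1.0, alpha=0.5, beta=2.0, T=20.0)
counting = lambda t: np.searchsorted(events, t, side="right")  # N(t)
```

Evaluating `counting` on a grid gives the non-decreasing, right-continuous, integer-valued path described in the first bullet.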

The space of probability measures over the realizations (ordered or unordered times, or counting measures) is a straightforward point process space. The general theory of similarity, divergence, and distance can be directly applied to this non-Euclidean space. For example, the family of f-divergences (such as the Hellinger divergence) applies to these measures. This does not, however, provide key structures such as linearity and inner products that are often useful for practical learning tasks. A family of divergences known as Hilbertian metrics (Hein and Bousquet 2005; it includes Hellinger as a special case) yields not only a metric that can be embedded in a Hilbert space, but also a reproducing kernel Hilbert space (RKHS) extension of the space. The linearity and inner product structure provided by the RKHS, however, admits elements that are not probability measures (hence not point processes), making an approximation by the closest point process measure necessary.
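As a sketch of applying such a divergence, the squared Hellinger distance between the total-count distributions $\{p_n\}$ of two point processes can be computed directly. Here the two processes are taken to be Poisson with mean counts 3 and 5 (an illustrative assumption), truncated to a finite support where the tail mass is negligible:

```python
import numpy as np
from math import exp, factorial

def hellinger_sq(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def poisson_pmf(mean, n):
    return exp(-mean) * mean ** n / factorial(n)

# Total-count distributions {p_n} of two Poisson processes with mean
# counts 3 and 5, truncated to n < 30 (tail mass is negligible there).
p = [poisson_pmf(3.0, n) for n in range(30)]
q = [poisson_pmf(5.0, n) for n in range(30)]
d2 = hellinger_sq(p, q)  # closed form: 1 - exp(-(sqrt(3) - sqrt(5))**2 / 2)
```

The map $p \mapsto \sqrt{p}$ embeds each distribution on the unit sphere of $\ell_2$, which is precisely why the Hellinger metric is Hilbertian and admits the RKHS construction mentioned above.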

On the other hand, it is tempting to extend the $L_2$ space of intensity functions in the Poisson case [Paiva et al. 2009] to the conditional intensity function for the general class of point processes. We previously defined the Hilbert space of Poisson processes with the $L_2$ inner product between intensity functions (assuming the intensity functions are square integrable). Let $\lambda^\ast(t)$ and $\gamma^\ast(t)$ be conditional intensity functions; $\int_\mathcal{T} \lambda^\ast(t) \gamma^\ast(t) \mathrm{d}t$ at first seems like a good candidate. However, one quickly realizes that this is a random quantity that depends on the realization through the conditioning term $\mathcal{H}_t$. Unlike the divergence measures mentioned before, this quantity depends on the integrating measure: $\int_\Omega \int_\mathcal{T} \lambda^\ast(t) \gamma^\ast(t) \mathrm{d}t \mathrm{d}\mu \neq \int_\Omega \int_\mathcal{T} \lambda^\ast(t) \gamma^\ast(t) \mathrm{d}t \mathrm{d}\nu$ in general for $\mu \neq \nu$. This can be resolved by converting the conditional intensity function back to a probability measure. Since the Janossy measure and the conditional intensity function are closely related via $J_n(t_1, \ldots, t_n) = \prod_{i=1}^n \lambda^\ast(t_i) \exp\left( - \int_\mathcal{T} \lambda^\ast(u) \mathrm{d}u \right)$, one can evaluate a reference-measure-independent Hilbertian metric or f-divergence to obtain a reference-measure-independent RKHS. Note that the linearity provided by $L_2$ with a fixed measure is very different from that of the RKHS.
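For the Poisson case, where the intensity functions are deterministic, the $L_2$ inner product of [Paiva et al. 2009] can be evaluated by simple quadrature. A minimal sketch, with illustrative intensity functions of my own choosing:

```python
import numpy as np

def l2_inner(f, g, T, n=100_001):
    """L2 inner product <f, g> = int_0^T f(t) g(t) dt, trapezoid rule."""
    t = np.linspace(0.0, T, n)
    y = f(t) * g(t)
    dt = t[1] - t[0]
    return dt * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

# Two deterministic Poisson intensity functions on [0, 10] (illustrative).
lam = lambda t: 2.0 + np.sin(t)
gam = lambda t: 1.0 + 0.5 * np.cos(t)

ip = l2_inner(lam, gam, T=10.0)  # inner product <lam, gam>
dist = np.sqrt(l2_inner(lambda t: lam(t) - gam(t),
                        lambda t: lam(t) - gam(t), T=10.0))  # induced distance
```

For conditional intensities this same integral would be realization-dependent, which is exactly the obstruction described above.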
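The Janossy-density relation above is also how likelihoods are computed in practice: the log-density of a realization is $\sum_i \log \lambda^\ast(t_i) - \int_\mathcal{T} \lambda^\ast(u)\,\mathrm{d}u$. A sketch for an exponential-kernel self-exciting intensity, an illustrative assumption chosen because its compensator has a closed form:

```python
import numpy as np

def hawkes_loglik(events, mu, alpha, beta, T):
    """Log Janossy density (log-likelihood) of ordered event times on [0, T]
    under lambda*(t) = mu + sum_{t_i < t} alpha * exp(-beta (t - t_i)):
    sum_i log lambda*(t_i) - int_0^T lambda*(u) du."""
    events = np.asarray(events, dtype=float)
    log_term = 0.0
    for i, t in enumerate(events):
        past = events[:i]
        log_term += np.log(mu + (alpha * np.exp(-beta * (t - past))).sum())
    # The compensator int_0^T lambda*(u) du is closed-form for this kernel.
    compensator = mu * T + (alpha / beta) * (1.0 - np.exp(-beta * (T - events))).sum()
    return log_term - compensator

ll = hawkes_loglik([1.0, 2.5, 6.0], mu=1.0, alpha=0.5, beta=2.0, T=10.0)
```

Evaluating this density for every $n$ and realization is what converts a conditional intensity back into the probability measure on which reference-measure-independent divergences can be computed.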

#### References

• Andersen, Borgan, Gill and Keiding, “Statistical Models based on Counting Processes”, Springer-Verlag 1993
• Snyder and Miller, “Random Point Processes in Time and Space”, Springer-Verlag 1991
• Daley and Vere-Jones, “An Introduction to the Theory of Point Processes. Volume 1: Elementary Theory and Methods“, Springer 2003
• Hein and Bousquet, “Hilbertian Metrics and Positive Definite Kernels on Probability Measures”, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics 2005
• Paiva, Park, and Príncipe, “A Reproducing Kernel Hilbert Space Framework for Spike Train Signal Processing”, Neural Computation 2009