An interesting connection between sloppy model analysis and projection operators

Categories: Sloppy Models, Projection Operators

Author: David Jordan

Published: May 13, 2024

This note concerns various ways I have been thinking about basis functions, with connections to several other fields I have been interested in: projection operators, Koopman and transfer operators, coordinate transformations, “sloppy model analysis”, Hilbert spaces and reproducing kernel Hilbert spaces, and function approximation.

Function approximation is a useful tool; it is no coincidence that artificial neural networks of the multilayer feedforward variety 1 2 are provably universal function approximators. Projection in a Hilbert space of functions is one method of function approximation. A familiar example of a Hilbert space projection method is Fourier analysis, where an arbitrary function is represented by its projection onto a basis set of functions, in this case sinusoids. Function projection relies on having an inner product on the function space; here we will use the \(L^2\) inner product defined as \[ \langle f(t),g(t)\rangle = \int_a^b f(t)\, g(t)\, dt\] With an increasing number of terms in the Fourier series, we can approximate a given function to arbitrary accuracy. Truncating the series is a form of projection: we are projecting the infinite-dimensional vector that represents the function onto the subspace spanned by only a finite set of modes, for example the lower-frequency modes. The projection need not be a frequency cutoff; one could project the function onto an arbitrarily chosen subspace. For example, a custom compression for a particular function might keep the n modes with the highest power, as in the sketch below. This is an example of what I call “Concentration of Dimension”3 and may provide a basis for understanding the emergence of low dimensionality in biological systems, and in particular how these systems are capable of both canalization and plasticity.
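To make this concrete, here is a minimal sketch (my own illustration, using the QuadGK package that also appears in the code further down; the test function, basis size, and number of retained modes are arbitrary choices): project a function onto a truncated Fourier basis on \([0,1]\) via the \(L^2\) inner product, then keep only the \(k\) modes with the largest coefficients rather than applying a frequency cutoff.

```julia
using QuadGK

f(t) = exp(-3t) * sin(8π * t)    # an arbitrary test function

# An orthonormal Fourier basis on [0, 1]: {1, √2 cos(2πnt), √2 sin(2πnt)}
basis = vcat([t -> 1.0],
             [t -> sqrt(2) * cos(2π * n * t) for n in 1:10],
             [t -> sqrt(2) * sin(2π * n * t) for n in 1:10])

# L² inner products ⟨f, bᵢ⟩ = ∫₀¹ f(t) bᵢ(t) dt give the projection coefficients
coeffs = [quadgk(t -> f(t) * b(t), 0, 1)[1] for b in basis]

# Keep only the k largest-magnitude modes — a projection that is not a frequency cutoff
k = 5
keep = sortperm(abs.(coeffs), rev=true)[1:k]
f_approx(t) = sum(coeffs[i] * basis[i](t) for i in keep)

@show f(0.3), f_approx(0.3)
```

With only a handful of modes the reconstruction is rough; letting \(k\) grow recovers the function to arbitrary accuracy, which is the sense in which truncation is a projection.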

In general we can represent an arbitrary function in any basis by projecting it onto the subspace spanned by those basis functions. It is easiest if the functions form an orthonormal basis, as they do in the Fourier series example, but this is not necessary; in fact we don’t even have to orthogonalize the basis first if we can compute and invert the Gram matrix (the matrix of inner products between the basis functions). This is the basic idea behind regularization in function approximation and techniques such as kernel regression. This should sound eerily familiar re: Orthogonality!

At this point I would like to present a simple example which will also highlight the connection to sloppy model analysis. Taylor series approximation is a well-known example of function approximation in a polynomial basis, usually motivated as a local equivalence between the first n derivatives of the function and those of the polynomial approximation. However, we can also approximate a function by projecting it onto polynomial basis functions. For example, instead of the second-order Taylor approximation of a function \(f(t)\), we can take the projection of \(f(t)\) onto the subspace spanned by \(1\), \(t\), and \(t^2\). In a procedure similar to what we did with orthogonality, we can compute this projection even though the basis functions are not orthonormal. For this example let’s use \(f(t) = \sin(t)\) and the interval \([a,b] = [0,1]\) to match the conditions in Sethna’s work. As a reminder, we are going to use function-space projection to find the coefficients \(w_i\) in the nth-order polynomial approximation \(\sum_{i=0}^n w_it^i\). This can be written as the minimization problem \[\min_{w\in \mathbb{R}^{n+1}}\left\lVert f-\sum_{i=0}^n w_it^i \right\rVert\] noting that \[ \left\lVert f-\sum_{i=0}^n w_it^i \right\rVert^2 = \Big\langle f-\sum_{i=0}^n w_it^i,\ f-\sum_{i=0}^n w_it^i \Big\rangle\]

and, using the inner product defined above, we can derive the projection in terms of the Gram matrix: the optimal coefficients are \[\begin{align}w^* &= G^{-1} \left[ \begin{matrix} \langle f,b_0\rangle \\ \langle f,b_1\rangle \\ \langle f,b_2\rangle \\ \vdots \\ \langle f,b_n\rangle \end{matrix} \right]\end{align}\]

with the Gram matrix \(G\) being the \((n+1)\times(n+1)\) matrix of inner products between the basis functions. If the basis functions form an orthonormal basis, then the Gram matrix is the identity matrix and is its own inverse. However, this need not be true, and in general the Gram matrix is \[ G = \left[ \begin{matrix} \langle b_0,b_0\rangle & \langle b_1,b_0\rangle & \cdots & \langle b_n,b_0\rangle \\ \langle b_0,b_1\rangle & \langle b_1,b_1\rangle & \cdots & \langle b_n,b_1\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle b_0,b_n\rangle & \langle b_1,b_n\rangle & \cdots & \langle b_n,b_n\rangle \end{matrix} \right]\]
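To make the formula concrete, here is a small generic sketch (the names `inner` and `project` are my own, not a library API): build the Gram matrix of basis inner products with QuadGK and solve \(Gw = \langle f, b_i\rangle\) for the projection coefficients.

```julia
using QuadGK

# L² inner product ⟨f, g⟩ = ∫ₐᵇ f(t) g(t) dt
inner(f, g, a, b) = quadgk(t -> f(t) * g(t), a, b)[1]

# Project f onto the span of `basis` on [a, b]: w* = G⁻¹ ⟨f, b⟩
function project(f, basis, a, b)
    G = [inner(bi, bj, a, b) for bi in basis, bj in basis]  # Gram matrix
    v = [inner(f, bi, a, b) for bi in basis]                # vector of ⟨f, bᵢ⟩
    return G \ v
end

# For example, project sin onto the monomial basis {1, t, t²} on [0, 1]
w = project(sin, [t -> 1.0, t -> t, t -> t^2], 0, 1)
```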

With our definition of the inner product and the monomial basis functions, we can compute this Gram matrix explicitly for the polynomial projection.

\[\begin{align} \langle b_0,b_0\rangle &= \int_0^1(1\cdot 1)\,dt = t\Big|_0^1 = 1 \\ \langle b_0,b_1\rangle = \langle b_1,b_0\rangle &= \int_0^1(1\cdot t)\,dt = \frac{t^2}{2}\Big|_0^1 = \frac{1}{2} \\ \langle b_1,b_1\rangle &= \int_0^1(t\cdot t)\,dt = \frac{t^3}{3}\Big|_0^1 = \frac{1}{3} \\ \langle b_0,b_2\rangle = \langle b_2,b_0\rangle &= \int_0^1(1\cdot t^2)\,dt = \frac{t^3}{3}\Big|_0^1 = \frac{1}{3} \\ \langle b_1,b_2\rangle = \langle b_2,b_1\rangle &= \int_0^1(t\cdot t^2)\,dt = \frac{t^4}{4}\Big|_0^1 = \frac{1}{4} \\ \langle b_2,b_2\rangle &= \int_0^1(t^2\cdot t^2)\,dt = \frac{t^5}{5}\Big|_0^1 = \frac{1}{5} \end{align}\]
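These entries can also be checked numerically; a short sketch using QuadGK:

```julia
using QuadGK

# Gram matrix of the monomial basis {1, t, t²} on [0, 1], computed by quadrature
G = [quadgk(t -> t^i * t^j, 0, 1)[1] for i in 0:2, j in 0:2]
@show G ≈ [1/(i + j + 1) for i in 0:2, j in 0:2]   # true
```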

So in this case, the final weights are given by

\[ \begin{align} w^* &= G^{-1} \left[ \begin{matrix} \langle \sin(t),1\rangle \\ \langle \sin(t),t\rangle \\ \langle \sin(t),t^2\rangle \end{matrix} \right] \\ &= \left[ \begin{matrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{matrix} \right]^{-1}\left[ \begin{matrix} 0.4597 \\ 0.3012 \\ 0.2232 \end{matrix} \right] \end{align}\]

This gives the following result:

::: {#cell-non-linear-fit .cell execution_count=1}

```julia
using Plots
using QuadGK

t = range(0, 1, length=100)
p = plot(
    xlabel="time", ylabel="y",
    title="y = sin(t)",
    legend=true,
    background_color=:black,
    background_color_outside=:black,
    fg_color=:white,
    titlefontcolor=:white,
    guidefontcolor=:white,
    tickfontcolor=:white
)
plot!(p, t, sin.(t), label="sin(t)")

# Second-order Taylor expansion of sin(t) about t = 0: sin(0) + cos(0)*t - (sin(0)/2)*t^2
plot!(p, t, sin(0) .+ cos(0) .* t .- (sin(0) / 2) .* t.^2, label="Taylor series")

# Right-hand side of the projection formula: the inner products ⟨sin, bᵢ⟩
f1(t) = sin(t) * 1    # integrand of ⟨sin(t), 1⟩
f2(t) = sin(t) * t    # integrand of ⟨sin(t), t⟩
f3(t) = sin(t) * t^2  # integrand of ⟨sin(t), t²⟩
integrals = map(f -> quadgk(f, 0, 1)[1], [f1, f2, f3])

# Gram (Hilbert) matrix of monomial inner products on [0, 1]
G = [1/(i + j - 1) for i in 1:3, j in 1:3]

# Projection coefficients w* = G⁻¹ ⟨f, b⟩
coeffs = inv(G) * integrals

t1 = range(0, 1, length=10)
plot!(p, t1, coeffs[1] .+ coeffs[2] .* t1 .+ coeffs[3] .* t1.^2,
      marker=(:circle, 5), label="Projection Operator")
plot!(p, xlim=(0.0, 1.1))
display(p)
```

Polynomial fits using either Taylor series approximation or the optimal projection

:::

Now let us look at the general Gram matrix in this case: it is the Hilbert matrix. This matrix is notoriously ill conditioned (quantified in the short check below), which means that its inverse greatly magnifies small differences in the input. \[\begin{align}G &= H, \quad H_{ij}=\frac{1}{i+j+1} \quad (i,j = 0,1,2,\dots) \\ &= \left[ \begin{matrix} 1 & \frac{1}{2} & \frac{1}{3} & \cdots \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \cdots \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{matrix} \right]\end{align}\] I was struck when this matrix appeared because I had seen it before in the sloppy model literature4, but derived in a very different context. In sloppy model analysis, we are interested in the parameter sensitivity of a continuous least-squares regression. This sensitivity is captured by the Hessian of the fit function with respect to the parameters at the best-fit point, so in this case we are looking at the matrix given by

\[ H_{nm} = \frac{\partial^2}{\partial w_n \partial w_m}\left(\frac{1}{2}\int_0^1\Big(\sum_i w_it^i-\sum_i w_i^*t^i\Big)^2 dt \right)\]
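As an aside, the ill-conditioning mentioned above is easy to quantify. A quick check of my own using the standard LinearAlgebra library shows that the Hilbert matrix has eigenvalues spread over many orders of magnitude, reminiscent of the stiff-and-sloppy eigenvalue spectra described in the sloppy model literature.

```julia
using LinearAlgebra

# Condition number and eigenvalue spread of the n×n Hilbert matrix
for n in (3, 6, 10)
    H = [1 / (i + j + 1) for i in 0:n-1, j in 0:n-1]
    λ = eigvals(Symmetric(H))
    println("n = $n: cond(H) ≈ ", round(cond(H), sigdigits=3),
            ", eigenvalues range from ", round(minimum(λ), sigdigits=3),
            " to ", round(maximum(λ), sigdigits=3))
end
```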

To my surprise, evaluating this Hessian gives exactly the same matrix as the Gram matrix of the projection operator.

\[\begin{align}H_{n,m} &= \frac{\partial^2}{\partial w_n \partial w_m}\left (\frac{1}{2}\int_0^1\Big(\sum_i w_it^i-\sum_i w_i^*t^i\Big)^2 dt\right ) \\ &= \frac{1}{2}\int_0^1 \frac{\partial^2}{\partial w_n \partial w_m} \Big(\sum_i w_it^i-\sum_i w_i^*t^i\Big)^2 dt \\ &= \frac{1}{2}\int_0^1 \frac{\partial^2}{\partial w_n \partial w_m} \left ( \Big(\sum_i w_it^i\Big)^2-2\Big(\sum_i w_it^i\Big)\Big(\sum_j w_j^*t^j\Big)+\Big(\sum_i w_i^*t^i\Big)^2 \right )dt \\ &= \frac{1}{2}\int_0^1 \frac{\partial}{\partial w_n} \left ( 2\Big(\sum_i w_it^i\Big)t^m-2\,t^m\Big(\sum_j w_j^*t^j\Big) \right )dt \\ &= \frac{1}{2}\int_0^1 2\,t^n t^m \,dt \\ &= \int_0^1 t^{n+m}\, dt \\ &= \frac{1}{n+m+1}t^{n+m+1}\Big|_0^1 \\ &= \frac{1}{n+m+1} \end{align}\]
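This can also be verified numerically. Below is a rough sketch of my own (not taken from the paper): build the continuous least-squares cost with QuadGK and estimate its Hessian by central finite differences. Because the cost is exactly quadratic in \(w\), the estimate should match \(1/(n+m+1)\), to rounding error, at any expansion point.

```julia
using QuadGK

wstar = [0.0, 1.0, -0.5]   # arbitrary "true" coefficients w*
model(w, t) = sum(w[i+1] * t^i for i in 0:length(w)-1)

# Least-squares cost C(w) = ½ ∫₀¹ (Σᵢ wᵢtⁱ − Σᵢ wᵢ*tⁱ)² dt
C(w) = 0.5 * quadgk(t -> (model(w, t) - model(wstar, t))^2, 0, 1)[1]

# Hessian by central finite differences (exact for a quadratic cost, up to rounding)
function fd_hessian(f, w; h=1e-3)
    n = length(w)
    H = zeros(n, n)
    for a in 1:n, b in 1:n
        ea, eb = zeros(n), zeros(n)
        ea[a] = h
        eb[b] = h
        H[a, b] = (f(w + ea + eb) - f(w + ea - eb) -
                   f(w - ea + eb) + f(w - ea - eb)) / (4 * h^2)
    end
    return H
end

H = fd_hessian(C, [0.3, -0.2, 0.7])                # any expansion point works
Hilbert = [1/(i + j + 1) for i in 0:2, j in 0:2]   # 1/(n+m+1), indices from 0
@show maximum(abs.(H .- Hilbert))                  # ≈ 0 up to rounding error
```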

This leads me to believe that I am on the right track in thinking about my worm developmental biology project, my worm behavior project, and our non-equilibrium work in terms of projection operator theory (a convergence which emerged very unexpectedly at three different scales of inquiry).

Footnotes

  1. Hartman EJ, Keeler JD & Kowalski JM (1990) Layered neural networks with Gaussian hidden units as universal approximations. Neural Comput 2: 210–215↩︎

  2. Hornik K, Stinchcombe M & White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2: 359–366↩︎

  3. Jordan DJ & Miska EA (2023) Canalisation and plasticity on the developmental manifold of Caenorhabditis elegans. Mol Syst Biol: e11835↩︎

  4. Waterfall JJ, Casey FP, Gutenkunst RN, Brown KS, Myers CR, Brouwer PW, Elser V & Sethna JP (2006) Sloppy-model universality class and the Vandermonde matrix. Phys Rev Lett 97: 150601↩︎