Table of contents

Back to the main page
1 Non-linear structure-formation

Non-linear structure-formation

Figure: The linear power-spectrum today compared to the result of full non-linear simulations. We see that scales $k\gtrsim 0.1 h/$Mpc have gone non-linear.

Linear perturbation theory is all we need to study the CMB (with some small exceptions, e.g. gravitational lensing of the CMB requires us knowing the non-linear power-spectrum), however for structure formation the density contrast goes non-linear in the late Universe and linear perturbation theory breaks down. To be able to predict clustering on smaller scales the main approach is to do numerical simulations. Our Universe is complicated and there are many levels for which we can perform such simulations: with dark matter and baryonic gas, including effects of supernova explosion and super massive black holes at the center of galaxies, including radiation transfer of photons and so on and so on. We will here focus on the simplest case of pure dark matter simulations where all mass is treated as collisionless. This is a very good approximation for many things, however it has its shortcomings in that we (obviously) cannot form galaxies, but we can form the dark matter halos for which galaxies form inside and we can compute statistical observables like the matter power-spectrum quite accurately. Here we will go through the basic theory and algorithms for performing such simulations from generating initial conditions, performing the numerical simulations and anayzing the output. We will also briefly talk about how one can make it more realistic by adding in more physics and some other approaches to non-linear structure formation than N-body simulations.

The Vlasov-Poisson equations

We already know the equation that we really want to solve to determine the non-linear evolution for a collisionless fluid: the full fucking Boltzmann equation. We have already derived this one in linear perturbation theory, but lets do it again and not discarding any second order terms like we did in our previous derivation. We start with the distribution function $f = f(t,\vec{x},\vec{p})$ and write the Boltzmann equation as $$\frac{df}{dt} = \frac{\partial f}{\partial t} + \nabla_x f \cdot \frac{d\vec{x}}{dt} + \nabla_p f \cdot \frac{d\vec{p}}{dt} = 0$$ where the momentum is defined (slightly differently as we did before) as $$\vec{p} = m a^2 \frac{d\vec{x}}{dt}$$ and the geodesic equation for this quantity gives us $$\frac{d\vec{p}}{dt} = -m\nabla_x \Phi$$ and as usual the Poisson equation gives us $$\nabla_x^2 \Phi = 4\pi G a^2 (\rho - \overline{\rho})$$ where $$\rho = m\int \frac{d^3p}{(2\pi)^3} f$$ The full Boltzmann equation can then be written $$\frac{\partial f}{\partial t} + \nabla_x f \cdot \frac{\vec{p}}{ma^2} - m\nabla_p f \cdot \nabla_x \Phi = 0$$ $$\nabla_x^2 \Phi = 4\pi G a^2 (m\int \frac{d^3p}{(2\pi)^3} f - \overline{\rho})$$ This closed system is the so-called Vlasov-Poisson system and governs the evolution of a self-gravitating collisionless fluid of particles of mass $m$. The only problem with it is that it's $7$ dimensional so terriby expensive to solve numerically. We are not going to attempt this, though some people have solved this equation numerically. Instead we are going to use the N-body technique to approximate the distribution function by using a set of tracer particles to sample it instead. This is the main way of simulating non-linear structure formation of cold dark matter in cosmology today. We take $$f = \sum_{i=1}^N \delta(\vec{x} - \vec{x}_i(t)) \delta(\vec{p} - \vec{p}_i(t))$$ and by substituting this into the Vlasov-Poisson system we get $$\frac{d^2\vec{x}}{d\tau^2} = -\nabla \Phi$$ which is just the usual Newtonian laws of motion (in an expanding Universe, but this is "hidden" by the choice of coordinates $d\tau = \frac{dt}{a^2}$) for how the particles move. The density needed in the Poisson equation can then formally (this is not how its done in practice, but we'll get back to this) be computed as $$\rho = \sum_{i=1}^N m\delta(\vec{x} - \vec{x}_i(t))$$ As long as we have enough particles we will be able to accurately sample the distrbution function. Thus the rough outline of a simulation is as follows

  • Generate a set of particles $\{\vec{x}_i,\vec{v}_i\}_{i=1}^N$ in a periodic box with comoving size $B$.
  • Then start time-stepping. For each time-step we compute the force $\nabla\Phi$, for example by binning the paricles to a grid to get $\rho$ and solving the Poisson equation (for example by Fourier transforms).
  • Then we update the positions and velocities of the particles using some integration scheme for example (Leapfrog) $$\vec{x}_{i+1} = \vec{x}_i + \Delta \tau \vec{v}_{i+1/2}$$ $$\vec{v}_{i+1/2} = \vec{v}_{i-1/2} - \Delta \tau (\nabla\Phi)_i$$
  • Do this until we reach the present time.

From the particles we can compute anything we want: power-spectra, locate halos, computing lensing of light around structures etc. etc. etc. In the next section we will go through in more detail how to do these steps (and the various algorithms that have been invented for this purpose): the initial conditions, the time-stepping, how to compute forces and how to analyze the data.

Lagrangian Perturbation Theory

The kind of perturbation theory we go through in this course is Eulerian perturbation theory: we look at perturbations to fluid quantities. Another common way of doing perturbation theory is to consider particles making up the fluid and follow their trajectories. This is called Lagrangian perturbation theory and (among many other things) it's used to generate initial conditions (particle positions and velocities) for doing non-linear simulations of structure formation. On small scales the density contrast $\delta$ will grow to the order of unity for which linear perturbation theory breaks down and to be able to study structure formation in this regime we need simulations. In this short note we will go through the basics of how one can do this.

We will in this note use a time-coordinate $\tau$ which is defined via ${\rm d}\tau = \frac{{\rm d}t}{a^2}$ and dots below are time-derivatives wrt $\tau$. This is easier since the geodesic equation $$\frac{d^2{\bf x}}{dt^2} + 2H\frac{d{\bf x}}{dt} = - \frac{1}{a^2}\nabla\Phi$$ becomes $$\frac{d^2 {\bf x}}{d\tau^2} = -\nabla\phi$$ where $\phi = a^2\Phi$ so the equation looks just as in Newtonian gravity where $\phi$ is described by the Poisson equation $$\nabla^2 \phi = \beta \delta$$ where $\beta = 4\pi G a^4 \overline{\rho} = \frac{3}{2}\Omega_M a$.

Figure: The Lagrangian coordinates ${\bf q}$ and ${\bf x} = {\bf q} + {\bf \Psi}({\bf q},\tau)$

In Lagrangian perturbation theory we describe the position ${\bf x}$ of a particle as a function of its initial position ${\bf q}$ as: $${\bf x} = {\bf q} + {\bf \Psi}({\bf q},\tau)$$ where $\Psi$ is the so-called displacement field (not to be confused with the meric perturbation $\Psi$ - this is an unrelated quantity). If we imagine the points in the figure above representing point masses then we have the same number of points at any time. This mass conservation can be written $$\rho({\bf x,\tau}){\rm d}^3{\bf x} = \rho({\bf q}){\rm d}^3{\bf q}$$ where $\rho({\bf x,\tau})$ is the density of points. We also define the density contrast by $$\delta = \frac{\rho({\bf x},\tau)}{\rho({\bf q})}-1$$ and mass conservation then gives us $$\delta = \frac{1}{J}-1$$ where $J = \left|\frac{d^3{\bf x}}{d^3{\bf q}}\right|$ is the determinant of the Jacobian of the transformation between ${\bf q}$ and ${\bf x}$.

We are now ready to start deriving equations for the displacement field. Starting with the Poisson equation and the geodesic equation $$\nabla^2_{\bf x} \phi = \beta \delta$$ $$\ddot{{\bf x}} = -\nabla_{\bf x} \phi$$ we take the gradient $\nabla_{\bf x}$ of the last equation to arrive at (we use a subscript ${\bf x}$ to not confuse it with a gradient wrt ${\bf q}$ which is easy to do in these kinds of calculations) $$\nabla_{\bf x} \cdot \ddot{{\bf x}} = -\nabla_{\bf x}^2 \phi = \beta\delta \implies J\nabla_{\bf x} \cdot \ddot{{\bf x}} = \beta (J-1)$$ We now expand in a perturbative series $${\bf \Psi} = \epsilon {\bf \Psi}^{(1)} + \epsilon^2 {\bf \Psi}^{(2)} + \ldots$$ where $\epsilon = 1$ is a parameter there just to keep track of the order or each term. A term like ${\bf \Psi}^{(1)}$ is first order, terms like ${\bf \Psi}^{(2)}$ and ${\bf \Psi}^{(1)}\cdot {\bf \Psi}^{(1)}$ are both second order (the order is the sum of the superscripts $(\cdot)$ of each term) and so on. When we now will derive equations in perturbation theory we will do this order by order. For example if we have an equation $$A + B\epsilon + C\epsilon^2 + \ldots = a + b\epsilon + c\epsilon^2 + \ldots$$ Then the equation to zeroth order simply says $A=a$. Going to first order we get $B=b$ and so on.

Let's compute the Jacobian to first order. The matrix $\frac{\partial {\bf x}_i}{\partial {\bf q}_j}$ has components $\delta_{ij} + {\bf \Psi}_{i,j}$ where a comma denotes a derivative wrt the coordinate that follows. Note that these are ${\bf q}$ derivatives and not ${\bf x}$. We have $$ \begin{align} J = \left|\begin{array}{ccc} 1 + {\bf \Psi}_{1,1} & {\bf \Psi}_{1,2} & {\bf \Psi}_{1,3} \\ {\bf \Psi}_{2,1} & 1 + {\bf \Psi}_{2,2} & {\bf \Psi}_{2,3} \\ {\bf \Psi}_{3,1} & {\bf \Psi}_{3,2} & 1 + {\bf \Psi}_{3,3} \\ \end{array}\right| = 1 + ({\bf \Psi}_{1,1} + {\bf \Psi}_{2,2} + {\bf \Psi}_{3,3}) + \nonumber\\ ({\bf \Psi}_{1,1}{\bf \Psi}_{2,2}+{\bf \Psi}_{2,2}{\bf \Psi}_{3,3}+{\bf \Psi}_{3,3}{\bf \Psi}_{1,1} - {\bf \Psi}_{1,2}{\bf \Psi}_{2,1} - {\bf \Psi}_{2,3}{\bf \Psi}_{3,2} - {\bf \Psi}_{3,1}{\bf \Psi}_{1,3}) + \ldots \end{align} $$ where the omitted terms have three factors of ${\bf \Psi}$ (so will be atleast third order when we do a perturbative expansion). Thus to first order we simply have $J = 1 + \epsilon \nabla_{\bf q}\cdot {\bf \Psi}^{(1)} + \mathcal{O}(\epsilon^2)$. The next term we need is $$\nabla_{\bf x} \cdot \ddot{{\bf x}} = \nabla_{\bf x} \cdot \ddot{{\bf \Psi}}$$ To compute this we can write $$\nabla_{{\bf x}}\cdot \ddot{{\bf \Psi}} = \frac{d{\bf q}_k}{d{\bf x}_i}\frac{d}{d{\bf q}_k}\ddot{{\bf \Psi}}_i$$ We now need the inverse, $\frac{d{\bf q}_k}{d{\bf x}_i}$, of the matrix $\frac{d{\bf x}_i}{d{\bf q}_j} = \delta_{ij} + {\bf \Psi}_{i,j}$. This matrix satisifes $$\frac{d{\bf q}_k}{d{\bf x}_i}\frac{d{\bf x}_i}{d{\bf q}_j} = \delta_{kj}$$ (a matrix times multiplied by its inverse is the identity matrix). To zeroth order we see that $\frac{d{\bf q}_k}{d{\bf x}_i} = \delta_{ki} + \mathcal{O}(\epsilon)$. Taking $\frac{d{\bf q}_k}{d{\bf x}_i} = \delta_{ki} + Q_{ki}$ where $Q_{ki} = \mathcal{O}(\epsilon)$ we find that to first order $\frac{d{\bf q}_k}{d{\bf x}_i} = \delta_{ki} - {\bf \Psi}^{(1)}_{k,i}\epsilon + \mathcal{O}(\epsilon^2)$. And can continue this way to recursively compute it to any order. One can also make a more systematic derivation by writing the inverse in terms of the jacobian and minors of the original matrix, but for us this simple method is good enough. Using this, to first order we simply get $\nabla_{\bf x} \cdot \ddot{{\bf x}} = \epsilon\nabla_{\bf q}\cdot \ddot{{\bf \Psi}}^{(1)} + \mathcal{O}(\epsilon^2)$ and this gives us the first order equation $$\nabla_{\bf q}\cdot \ddot{{\bf \Psi}}^{(1)} = \beta \nabla_{\bf q}\cdot {\bf \Psi}^{(1)}$$ The initial conditions we have to specify is the initial density. Recall that $$\delta = \frac{1}{J}-1$$ and to first order we get $$\delta^{(1)} = \frac{1}{1 + \nabla_{\bf q}\cdot {\bf \Psi}^{(1)}\epsilon + \mathcal{O}(\epsilon^2)} - 1 = -\nabla_{\bf q}\cdot {\bf \Psi}^{(1)}\epsilon + \mathcal{O}(\epsilon^2)$$ where we used a Taylor expansion $(1+x)^{n} = 1 + nx + \mathcal{O}(x^2)$. Therefore, given an initial overdensity $\delta_{\rm ini}$ (which has to be small!) we can compute the initial displacment field as $$\nabla_{\bf q}\cdot {\bf \Psi}_{\rm ini}^{(1)} = -\delta_{\rm ini}$$ where ${\bf \Psi}_{\rm ini}^{(1)} = {\bf \Psi}^{(1)}({\bf q},\tau_{\rm ini})$.

To compute the time-evolution of ${\bf \Psi}$ notice that the equation is separable. We can write ${\bf \Psi}^{(1)}({\bf q},\tau) = D_1(\tau) {\bf \Psi}^{(1)}({\bf q},\tau_{\rm ini})$ to get $$\ddot{D_1} \cdot \nabla_{{\bf q}} {\bf \Psi}^{(1)}({\bf q},\tau_{\rm ini}) = \beta D_1\cdot {\bf \Psi}^{(1)}({\bf q},\tau_{\rm ini})$$ which gives us two equation: one PDE for the ${\bf q}$ dependence $$\nabla_{{\bf q}} {\bf \Psi}^{(1)}({\bf q},\tau_{\rm ini}) = {\bf \Psi}^{(1)}({\bf q},\tau_{\rm ini})$$ and one ODE for the $\tau$ dependence $$\ddot{D_1} = \beta D_1$$ We can simplify the PDE a bit more and reduce it from a vector potential down to a simpler scalar potential. For any vector field ${\bf \Psi}$ in 3D we can perform a Helmholz-decomposition ${\bf \Psi} = \nabla\phi + \nabla\times {\bf A}$ and if ${\bf \Psi}$ is irrotational then ${\bf A} = 0$. For our applications of this to structure formation we have that vorticity is decaying (and has no known source in the early Universe) so the velocity field should be irrotational to a good approximation (one can show that vorticity is generated at third order in perturbation theory). This allows us to write the displacment field as the gradient of a scalar-field ${\bf \Psi}^{(1)} = \nabla \phi^{(1)}$ for which the PDE simplifies to a simple Poisson equation $$\nabla^2 \phi^{(1)} = -\delta_{\rm ini}$$ This is easily solved using Fourier transforms $$\phi^{(1)}_{\rm ini} = \mathcal{F}^{-1}[\frac{1}{k^2} \mathcal{F}[\delta_{\rm ini}]]$$ The ODE we got is the same equation as we have for the Eulerian matter density contrast $\delta_m$ on sub-horizon scales, the so-called growth-factor: $$\ddot{D_1} = \beta D_1$$ where (by definition) $D_1(\tau_{\rm ini}) = 1$ and the other initial condition follows from knowing the growing mode solution in the matter dominated era $D_1\propto a$ so $\frac{d\log D_1}{d\log a} = 1$. This allows us to create particles from a given density-field as follows

  • Generate a density-field at some redshift (say a gaussian random field generated from a linear power-spectrum)
  • Compute the growth-factors and the displacement field by solving the PDE above using Fourier transforms
  • Make a uniform distribution of particles (these are the initial ${\bf q}$ positions)
  • Displace the particles according to ${\bf x} = {\bf q} + {\bf \Psi}$
  • Give the particles a velocity according to $\dot{{\bf x}} = \dot{{\bf \Psi}}$ (notice that to first order this is just $\frac{\dot{D_1}}{D_1} {\bf \Psi}$ where the derivative of the growth-factor is evaluated at the initial redshift)

One can go to further and compute equations for any order you want. The next order is left as an exercise and gives the result ($\Psi^{(2)}({\bf q},\tau) = D_2(\tau)\nabla_{\bf q} \phi^{(2)}_{\rm ini}({\bf q})$) $$\nabla^2_{\bf q} \phi^{(2)}_{\rm ini} = \left(\frac{D_2}{D_1^2}\right)_{\rm ini}[(\phi^{(1)}_{\rm ini,11})^2 + (\phi^{(1)}_{\rm ini,22})^2 + (\phi^{(1)}_{\rm ini,33})^2 - (\phi^{(1)}_{\rm ini,12})^2 - (\phi^{(1)}_{\rm ini,23})^2 - (\phi^{(1)}_{\rm ini,31})^2]$$ $$\ddot{D_2} = \beta(D_2 - D_1^2)$$ where $\phi^{(1)}_{\rm ini,ij} = \frac{\partial^2 \phi^{(1)}_{\rm ini}}{\partial {\bf q}_i \partial {\bf q}_j}$ and similar for the other terms. The growing mode in a matter dominated Universe has $D_2 = -\frac{3}{7}D_1^2$ so the prefactor $\left(\frac{D_2}{D_1^2}\right)_{\rm ini} = - \frac{3}{7}$ and the initial conditions for $D_2$ are $D_2(\tau_{\rm ini}) = -\frac{3}{7}$ (and $\frac{dD_2}{d\log a} = -\frac{6}{7}D_1^2 = -\frac{6}{7}$). The PDE above is easily solved using Fourier transforms, e.g.: $$\phi^{(1)}_{\rm ini,12} = \mathcal{F}^{-1}[-k_1k_2 \mathcal{F}[\phi^{(1)}_{\rm ini}]]$$ so from $\phi^{(1)}_{\rm ini}$ in Fourier-space we only need to multiply the grid by $k$'s and transform back to real-space to compute the right hand side in the PDE above (we need to do $6$ Fourier transforms to get all the $11,22,33,12,23,31$ terms needed). Given the right hand-side in real-space we can solve for $\phi^{(2)}_{\rm ini}$ in Fourier-space by again using Fourier transforms: $$F[\phi^{(2)}_{\rm ini}] = -\frac{1}{k^2} \mathcal{F}[S_{\rm right-hand-side-term}]$$ From the displacement potential we get the displacement field from more Fourier transforms $${\bf \Psi}^{(i)}_{\rm ini} = \mathcal{F}^{-1}[i\vec{k} \mathcal{F}[\phi_{\rm ini}^{(i)}]]$$ Given $\delta_{\rm ini}$ in real-space then we need a total of $1+3=4$ FFT's to compute $\Psi^{(1)}$. We already then have $F[\phi^{(1)}_{\rm ini}]$ so the next order requires $6+1+3=10$ FFT's to compute $\Psi^{(2)}$.

See the bottom of page 20 - page 23 of these notes for more details about Lagrangian perturbation theory.

Algorithms for solving the Poisson equation

The most important part of an ${\it N}$-body simulation is how one computes forces. This can be done in a vareity of ways.

Direct summation:

The conceptually simplest method is direct-summation: $$\vec{F}_i = \sum_{j\not= i}\frac{Gm_im_j}{|{\bf r}_i - {\bf r}_j|^2}\frac{({\bf r}_j-{\bf r}_i)}{|{\bf r}_i - {\bf r}_j|}$$ One issue here happens if we have close encounters ${\bf r}_i \approx {\bf r}_j$. Then the force blows up and things go bad! One way of solving this is to use a force-softening: $$|{\bf r}_i - {\bf r}_j|^2\to |{\bf r}_i - {\bf r}_j|^2 + \epsilon^2$$ where $\epsilon$ is the so-called force-softening length. This is effectively assigning some extent to our point-particles (though they are still allowed to pass through each others). The time-complexity for this method is $\mathcal{O}(N_p^2)$ as we have to loop over every particle to compute a single force (by symmetry, well Newtons 3rd law, we can cut the cost down by a factor of $2$ so we only need to compute $N_p(N_p-1)/2$ forces). This quickly becomes very expensive as the number of particles $N_p$ grows. Another issue to be aware of is that if we have a periodic box then we should sum the forces from all (infinite) periodic images. This is often done using Ewald summation. This applies to all methods (except those realying on solving the Poisson equation directly for which this is baked into the boundary conditions).

Particle Mesh (Fourier transforms):

The simplest method to implement numerically is the so-called Particle Mesh method. We put an uniform grid over the simulation box, assign particles to the grid to compute the density field and then solve the Poisson equation using Fourier transforms. Taking Fourier transforms of $$\nabla^2\Phi = \beta\delta$$ $${\bf F} = \nabla\Phi$$ i.e. $$-k^2\hat{\Phi} = \beta\hat{\delta}$$ $${\bf \hat{F}} = i\vec{k}\hat{\Phi}$$ and solving for the force we find $${\bf F} = \mathcal{F}^{-1}[-i\vec{k}\frac{\beta\mathcal{F}[\delta]}{k^2}]$$ where $\mathcal{F}$ denotes a Fourier transform. Thus we can compute the force-field in the whole box with just $4$ Fourier transforms. A Fourier transform has time-complexity $\mathcal{O}(N_g\log(N_g))$ where $N_g$ is the (total) number of grid-cells. There are efficient implementations of Fourier transforms out there so one can use such existing libraries to do the work for us. Given the force-field we simply interpolate it to the particle positions to get the force. This is the great advantage of this method: simplicity and speed. The downside with this method is that the force resolution is the size of the grid-cells (e.g. if we have $256^3$ cells in a box of length $B$ then the force resolution is $B/256$). This means we need a very large grid to be able to resolve small-scale forces which again is costly both in terms of runtime and in terms of the memory needed to store the grid(s). This method is great for simulating the large scale structure of the Universe, but not so good at resolving small scale structures (halos). If the force resolution is larger than typical halos then the resulting halos will be much more "puffier" than they should be.

Adaptive Particle Mesh (Multigrid relaxation):

One way of getting past the problems of having a high force resolution is to have adaptively refined grids, but then one usually cannot use Fourier transforms (easily) as most implementations require a complete and regular grid. If the grid is not complete and regular then one have to use other methods for solving the Poisson equation. One such method is relaxion. If we are in one spatial dimension then we can discretize it it (here using a three-point stencil) $$\frac{\Phi_{i+1}+\Phi_{i-1}-2\Phi_{i}}{h^2} = \beta\delta_i$$ which we can write as $$\mathcal{L}_i = \frac{\Phi_{i+1}+\Phi_{i-1}-2\Phi_{i}}{h^2} - \beta\delta_i$$ Writing it in this way we now want to find the "root" of $\mathcal{L}$. We can make a guess for $\Phi_i$ across the grid and the correct this value using Newton's method $$\Phi_i \leftarrow \Phi_i - \frac{\mathcal{L}_i}{\frac{d\mathcal{L}_i}{d\Phi_i}} = \frac{\Phi_{i+1}+\Phi_{i-1} - h^2\beta\delta_i}{2}$$ This is usually done in a "checker-board" way: i.e. we loop over every "even" cell in the grid and change the value and then we do the same with the "odd" cells (the reason is that $\Phi_i$ is changed on the basis of $\Phi_{i\pm 1}$). One can then loop until we have convergence. This usually converges quickly and is straight forward to implement. The problem is that convergence is typically very slow! We update $\Phi_i$ on the basis of only the nearest neighbors so the "speed" a change in on position $\Phi_i$ "propagates" to another site $\Phi_j$ is just one grid cell per sweep. What this means is, if we consider the Fourier components of the solution, then small-scale modes converges quickly, but large-scale modes converges slowly. The solution to this is to have a stack of grids (thereby the name Multigrid) with larger and larger grid-cells (down to the case where we have $2^3$ cells covering the whole box). We do a bit of solving on the original grid, interpolate it down to a coarser grid - solve - interpolate further down - solve etc until we reach the coarsest level. Then we interpolate up - solve - etc. until we reach the original grid. One such roundtrip is called a cycle and we typcially converge after a few cycles. The convergence criterion is usually based on the RMS value of $\mathcal{L}_i$ across the grid. The generalization to higher dimensions is stright forward (i.e. we get $\mathcal{L}_{ijk} = \frac{\Phi_{i+1,j,k} + \Phi_{i-1,j,k} - 2\Phi_{i,j,k}}{h^2} + \ldots$ in 3D). A big advantage of this method is that it allows us to also solve non-linear equations, whereas Fourier transforms are only good for linear PDEs.

Tree algorithm:

In this approach we consider groups of particles at a large distance to be a single entity and compute the force due to the group rather than sum over individual particles. The most popular algorithm in this class is the The Barnes–Hut algorithm. It recursively divides the particles into groups by storing them in an octree (in 3D). Each node in this tree represents a region of the three-dimensional space. The topmost node represents the whole space, and its eight children represent the eight octants of the space. The space is recursively subdivided into octants until each subdivision contains 0 or 1 bodies (some regions do not have bodies in all of their octants). To calculate the net force on a particular particle, the nodes of the tree are traversed, starting from the root. If the center of mass of an internal node is sufficiently far from the body, the bodies contained in that part of the tree are treated as a single particle whose position and mass is respectively the center of mass and total mass of the internal node. If the internal node is sufficiently close to the body, the process is repeated for each of its children. Whether a node is or isn't sufficiently far away from a body, depends on the quotient $s / d$, where s is the width of the region represented by the internal node, and d is the distance between the body and the node's center of mass. This means we don't have to loop over every pair of particles and significantly speed up the force calculation and brings it down from $\mathcal{O}(N_p^2)$ for direct summation down to $\mathcal{O}(N_p\log(N_p))$.

Hybrid method (Tree-PM):

Tree-PM is one great algorithm that combines the best of two worlds. The force can be split into a "short-range" and a "long-range" component. A tree is used to compute the short-range component and Fourier transforms is used to compute the long-range component. To see how this can be done we start with the Poisson equation in Fourier space $$\Phi = -\frac{\beta\delta}{k^2}$$ and write this as $$\Phi = -\frac{\beta\delta}{k^2} e^{-(kr_s)^2} -\frac{\beta\delta}{k^2}(1-e^{-(kr_s)^2}) \equiv \Phi_{\rm short-range} + \Phi_{\rm long-range}$$ where $r_s$ is some length-scale (defining what we mean by short and long). The long range component is computed in the same was as we usually solve it except we include the extra factor $(1-e^{-(kr_s)^2})$ multiplying the density contrast: $${\bf F} = \mathcal{F}^{-1}\left[-i\vec{k}\frac{\mathcal{F}[\delta]}{k^2}\cdot (1-e^{-(kr_s)^2}\right]$$ The short-range force in real-space (computed by taking the inverse Fourier transform of the expression above and differentiating it to get the force) corresponds to $${\bf F} = -\frac{Gm{\bf r}}{r^2}\left[\text{erfc}(r/2r_s) + \frac{r}{r_s\sqrt{\pi}}e^{-(r/2r_s)^2})\right]$$ where erfc is the so-called error-function. We can now apply the tree algorithm to compute the short-range force using the expression above and combine it with the longe-range force, computed using Fourier transforms, to get the total force.

Figure: The short, long and total force using the Tree-PM method. Taken from this paper.

Fast approximate methods (COLA):


TODO: B-spline interpolations: Nearest Grid Point (NGP), Cloud in Cell (CIC), Triangular Shaped Cloud (TSC), ...

Algorithms for time-integration

The geodesic equation we need to evolve in an ${\it N}$-body simulation $$\frac{d^2{\bf x}}{d\tau^2} = -\nabla\Phi$$ can be written as a system of two first order equation $$\frac{d{\bf x}}{d\tau} = {\bf v}$$ $$\frac{d{\bf v}}{d\tau} = -\nabla\Phi$$ These equations need to be discretized and there are many ways of doing it (Eulers method, Runge Kutta, ...), but its important to choose a method suitable for the problem at hand. First of all the step of evolving the velocity is expensive at it requires us to compute the force for all particles thus we want to be able to do as few force evaluations per unit step in time. We also want a method that is good at dealing with orbits. In a simulations most of the mass will end up in halos and a typical particle in halos will end up doing many orbits from entering to the simulation is over. It turns out that the usual simple methods like Euler and Runge Kutta are not great at orbits: the particle will tend to either gain or loose energy over time. A better method is a method that explicitly conserves energy: a so-called sympletic algorithm. Such algorithms always exists for systems derivable from a Hamiltonian and the simplest one is the so-called Leapfrog algorithm $${\bf x}_{n+1} = {\bf x}_n + {\bf v}_{n+1/2}\Delta\tau$$ $${\bf v}_{n+1/2} = {\bf v}_{n-1/2} + (-\nabla\Phi)_n\Delta\tau$$ The velocities and positions are displaced by one half time-step (achieved by for example taking a half-step in velocity at the start of the simulation).

Figure: Image by V. Springel (article). Simulations of a Keplerian orbit using different discretisation schemes. The analytical solution is in blue. We see that after enough orbits the numerical solution start deviating from this. For the second order Runge Kutta the orbits gets larger and larger and consequently the energy grows with time. For the fourth order Runge Kutta the orbits (and therefore the energy) decays in time. Of these three algorithms, only leapfrog is able to conserve energy and keep the shape of the orbit (though it does precess over time).

Leapfrog is $\mathcal{O}(\Delta\tau^2)$ accurate in time which is good enough for most purposes as long as the time-step is not too large. If this is not enough then there are higher order symplectic algorithms which goes using the name Yoshida methods that can be used.

Algorithms for power-spectrum estimation

The power-spectrum is just the square of the Fourier components. Starting with an overdensity field $\delta(\vec{x})$ we can compute this by first taking the (discrete) Fourier transform to get $\hat{\delta}(\vec{k})$, define some bins $k_1,k_2,\ldots,k_i = k_1 + i\Delta k$ and estimate the power-spectrum by computing $$P(k) = \left\lt |\delta(\vec{k})|^2\right\gt = \frac{\sum_{k-\Delta k/2 \lt |\vec{k}| \lt k + \Delta k/2}|\delta(\vec{k})|^2}{\sum_{k-\Delta k/2 \lt |\vec{k}| \lt k + \Delta k/2} 1}$$ where the mean is taken over all Fourier modes with $k-\Delta k/2 \lt |\vec{k}| \lt k + \Delta k/2$ where $\Delta k$ defines the size of the binning we use. There are some effects that are important to be aware off that one should correct to get accurate results.

By discretizing the density field on a grid we only sample it at discrete points $(\frac{i}{N},\frac{j}{N},\frac{k}{N})B$ with $i,j,k\in \{0,1,\ldots,N-1\}$ and the resulting Fourier transforms will then only gives us the Fourier modes $(\frac{i}{N},\frac{j}{N},\frac{k}{N})\frac{2\pi}{B}$ with $i,j,k\in\{-N/2,\ldots,-1,0,1,\ldots,N/2\}$. The largest frequency we can sample is to so-called Nyquist frequency $k_{\rm ny} = \frac{N}{2}\frac{2\pi}{B}$ corresponding to a wavelength equal to twice the grid-size $\Delta x = B/N$. The smallest frequency we have access to is the so-called fundamental mode $k_{\rm fund} = \frac{2\pi}{B}$ corresponding to a wavelength the size of the box. These two scales sets the resolution of a numerical power-spectrum.

Sample variance:

Remember cosmic variance? We have the same effect in simulations. If the density field we started off with is a gaussian random field then for a given wavenumber $k$ we only have $N_k \simeq \frac{4\pi k^2\Delta k}{k_{\rm fund}^3}$ modes availiable to estimate the power-spectrum from so there is a sample variance $$\Delta P(k) = \frac{P(k)}{\sqrt{N_k}}$$ associated with this and the power-spectrum will have a big uncertainity on the largest scales. One can beat down this variance by running many simulations with different initial conditions (which effectivey gives us more modes) or use a bigger boxsize such that the modes of interest are well sampled. One "trick" of getting around this almost completely, that turns out to work very well and not give biased results for a wide range of observables even though its not a perfect gaussian random field anymore, is to use so-called amplitude fixed initial conditions (see initial conditions section). This can also be combined with paired simulations: we run two simulations where the second one has all the phases inverted (to an overdensity becomes an underdensity) and compute the mean power-spectra of these two simulations.

Discrete Fourier transform:

The cosmological density field is continuous, while in a simulation we often discretize it on a grid to obtain the density field at $N^3$ points $(i_xB/N_,i_yB/N,i_zB/N)$ with $i_x = 0,1,2,\ldots, N-1$ and similary for $y$ and $z$. We can then perform a discrete Fourier transform by computing the sum $$f(\vec{k}) = \sum_{\vec{x}} f(\vec{x}) e^{-i\vec{k}\cdot\vec{x}}$$ to get the Fourier modes on a grid $\vec{k} = (2\pi i_x/BN, 2\pi i_y/BN, 2\pi i_z/BN)$ with $i_x = -N/2, \ldots, -1,0,1,\ldots, N/2-1$ and similary for $y$ and $z$. The inverse transform is given by $$f(\vec{x}) = \frac{(2\pi)^3}{B^3}\sum_{\vec{k}} f(\vec{k}) e^{i\vec{k}\cdot\vec{x}}$$ where the prefactor is the discrete equivalent of the $\frac{1}{(2\pi)^3}$ factor we need in continuous Fourier transform / inverse transform. There exists fast algorithms for computing such Fourier transforms that can do these sums with a $N\log(N)$ complexity (for example in C/C++ one can download and use the FFTW library).


When we compute the density field by assigning particles to a grid we compute $W*\delta$ where $W$ is our density assignment kernel. By the convolution theorem $\mathcal{F}[A*B] = \mathcal{F}[A]\mathcal{F}[B]$ so the fourier transform so the Fourier transform we compute is really $$\delta_{\rm Computed}(\vec{k}) = W(\vec{k})\delta(\vec{k})$$ The B-spline density assignement kernels are defined as $W = H$ for NGP ($H$ being the tophat), $W = H*H$ for CIC and in general $W = H*\ldots *H$ for a $N$th order B-spline kernel. The Fourier transform of a 1D tophat of size $\Delta x$ (in this case the grid-spacing) is simply $\frac{\sin(k\Delta x/2)}{k\Delta x/2}$ so in 3 dimensions the $N$th B-spline kernel is $$W(\vec{k}) = \left(\frac{\sin(k_x\Delta x/2)}{k_x\Delta x/2} \frac{\sin(k_y\Delta x)}{k_y\Delta x/2}\frac{\sin(k_z\Delta x/2)}{k_z\Delta x/2}\right)^N$$ Thus even though the density assignment for high order looks very complicated it has a simple Fourier space window function. The density field we want, $\delta(\vec{k})$, can be found from the one we compute, $\delta_{\rm Computed}(\vec{k})$, by simply dividing by the window function. This is called deconvolution (we undo the convolution that we did when doing the density field assignement). In the figure below we show the result of this deconvolution.


Aliasing occurs whenever the use of discrete elements to capture or produce a continuous signal causes frequency ambiguity. When a signal being sampled has components at frequencies higher than the Nyquist frequency $k = \pi / B$ (the sampling frequency we use that is defined by the grid-size we use) then every periodic signal with a frequency greater than the Nyquist frequency will look exactly like some other periodic signal at a frequency less than the Nyquist frequency which will be folded into the frequencies we have when we compute the dicrete Fourier transform. This comes into play when we compute the Fourier transform of the density field. Due to the finite sampling of the convolved density field we get this aliasing effect and the density field will get periodic images. The easiest way to get around this issue is to use a large enough grid such that aliasing is not a problem for the modes you are interested in, but that can be computationally expensive. There is luckily a much easier method which is to use a technique called interlacing.

Interacing consists of computing two density fields. One by assigning particles to a grid and one by assigning it to a grid that is displaced by $\Delta x/2$ in each direction. In practice this is done by adding $(\Delta x/2, \Delta x/2, \Delta x/2)$ to the particle positions when assigning this second density field. The Fourier transform of the two density fields are the same up to their aliasing properties and has the nice effect that if we add the two Fourier transforms $\frac{\delta_1(\vec{k})+ \delta_2(\vec{k})}{2}$ then we cancel out the leading alias contribution and therefore allows us to get trustable power-spectra to much larger $k$ (up to the Nyquist frequency) than without.

Shot-noise correction:

The power-spectrum is the Fourier transform of the two-point correlation function which counts how many pairs of particles we have at a given separation. Whenever we work with point-particles then the particle and itself will be at zero separation. Thus the self-correlation of particles leads to $\xi(r=0) \propto N_{\rm particles}$ which is equivalent to a broadband correction $\frac{1}{n}$ to $P(k)$ where $n = N_{\rm particles} / B^3$ is the number density of the particles in our box. The power-spectrum we compute must be corrected for this by subtracting this so-called shot-noise contribution. This is straight forward, we simply subtract off a constant to the power-spectrum we have computed $$P(k) \to P(k) - \frac{1}{n}$$ This is mainly an issue when we work with galaxies (or halos) where the number-density is not very high. In the figure below we show the effect of shot-noise.

Direct summation:

The overdensity field in our simulation box can be exactly represented as $$\frac{\rho}{\overline{\rho}} = 1 + \delta(\vec{x}) = V\frac{1}{N_{\rm particles}}\sum_{i=1}^{N_{\rm particles}} \frac{m_i}{\overline{m}} \delta_D(\vec{x} - \vec{x}_i)$$ where $m_i,\vec{x}_i$ is the mass/position of particle $i$, $\overline{m} = \sum m_i/N_{\rm particles}$ is the mean particle mass, $V=B^3$ is the volume of the box, and $\delta_D$ the Dirac delta function. We can use the disrete Fourier transform of the delta function $$\delta_D(\vec{x}) = \frac{1}{V}\sum_{\vec{k}} e^{i\vec{k}\cdot \vec{x}}$$ to get $$\delta(\vec{k}) = -\frac{V\delta_{\vec{k},0}}{(2\pi)^3} + \frac{V}{(2\pi)^3}\frac{1}{N_{\rm particles}}\sum_{i=1}^{N_{\rm particles}} \frac{m_i}{\overline{m}} e^{-i\vec{k}\cdot \vec{x}_i}$$ This allows us to compute exactly the Fourier components of the density field from which we can estimate the power-spectrum. Doing it this way we have no issue with aliasing and we don't need to do any deconvolution (but shot-noise still needs to be subtracted), however its very expensive as computing this sum for all the same grid-points as we would compute it for by performing the Fourier transform requires $N^3\cdot N_{\rm particles}$ operations! This method is therefore mainly used as a test of other faster methods.

Algorithms for halo-finding

TODO: Friends-of-friends, Phase-space FoF, spherical overdensity, unbinding, ...

Other things

TODO: massive neutrinos, relativistic effects, finite-box effects, mode-coupling, baryonic effects, ...