Table of contents

Back to the main page

3D Integrals

In this course we will encounter many 3D integrals, so lets review quickly how to deal with these. In most situations its most convenient to attack these using spherical coordinates: $x = r \sin\theta \sin\phi$, $y = r \sin\theta \cos\phi$, $z = r \cos\theta$. In such coordinates we have $$\int_{\mathbb{R}^3} f(\vec{x})d^3 \vec{x} = \int_0^\infty dr r^2 \int_0^{2\pi} d\phi \int_{-\pi}^\pi \sin\theta d\theta f(\vec{x}) = \int_0^\infty dr r^2 \int_0^{2\pi} d\phi \int_{-1}^1 d\cos\theta f(\vec{x})$$ where the extra terms $r^2 \sin\theta$ is simply the Jacobian of the coordinate transformation. If the function $f$ only depends on $r = |\vec{x}|$ then this simplifies down to a simple 1D integral $$\int_{\mathbb{R}^3} f(\vec{x})d^3 \vec{x} = 4\pi \int_0^\infty r^2 f(r) dr$$ This is not hard to understand: we are simply integrating over all space by summing up spherical-shells (in which the function is constant) which has volume given by surface area times thickness $4\pi r^2 \cdot dr$.

Some useful observations about these kinds of integrals that will be useful in this course:

  • If the integrand is anti-symmetric then the integral vanishes by symmetry $$\int_{\mathbb{R}^3} d^3\vec{x} f(|\vec{x}|)\vec{x} = 0$$
  • When dealing with Fourier transforms we often have integrals like $$\int_{\mathbb{R}^3} d^3\vec{x} f(|\vec{x}|) e^{i \vec{x} \cdot \vec{k}}$$ To evaluate this its convenient to rotate the coordinate system such that $\vec{k}$ points in the $z$-direction. Then $\vec{x} \cdot \vec{k} = r k \cos\theta$ where $r = |\vec{x}|$ and going to spherical coordinates we get $$\int_0^\infty dr r^2 \int_0^{2\pi} d\phi \int_{0}^\pi d\theta \sin\theta f(r) e^{i rk\cos\theta} = 2\pi \int_0^\infty f(r)r^2dr \int_{-1}^1 d\cos\theta e^{ir k \cos\theta} = 4\pi \int_0^\infty f(r)r^2\frac{\sin(rk)}{rk}$$ which is now a normal 1D integral.
  • The Dirac Delta Function as an integral $$\delta^{(3)}(\vec{x}) = \int_{\mathbb{R}^3} \frac{d^3\vec{k}}{(2\pi)^3} e^{i\vec{k}\cdot\vec{x}} $$
  • Solution of Poisson's equation on integral form $$\nabla^2 \Phi = 4\pi G \rho \iff \Phi = -4\pi G\int_{\mathbb{R}^3} \frac{\rho(\vec{y})}{4\pi|\vec{x} - \vec{y}|}d^3\vec{y}$$
  • Gauss divergence theorem $$\int_V \nabla\cdot \vec{F} d^3\vec{x} = \int_{\partial V} \vec{F} \cdot d\vec{S}$$
  • Integration by parts $$\int_V f\nabla g d^3\vec{x} = \int_{\partial V} fg \vec{n}\cdot d\vec{S} - \int_V g\nabla f d^3\vec{x}$$ Where the first integral on the right hand side is over the boundary of the volume $V$ we integrate over and $\vec{n}$ is a unit normal vector to the boundary surface. For cases where $V = \mathbb{R}^3$ (the whole space) and $fg$ decays fast enough as $|\vec{x}| \to \infty$ (as we often just assume) then the first term vanishes and we have $$\int_{\mathbb{R}^3} f\nabla g d^3\vec{x} = -\int_{\mathbb{R}^3} g\nabla f d^3\vec{x}$$

Boltzmann Integrals

In the lectures we enounted integrals like $\int_0^\infty \frac{x^2}{e^{x} \pm 1}dx$ and $\int_0^\infty x^2 e^{-x^2}dx$. Here we give some useful relations for evaluating these integrals:

$$\int_0^\infty \frac{x^{s-1}}{e^x-1}dx = \zeta(s)\Gamma(s)$$ $$\int_0^\infty \frac{x^{s-1}}{e^x+1}dx = \eta(s)\Gamma(s)$$ where $\zeta(s)$ is the Riemann zeta-function, $\eta(s) = (1-2^{1-s})\zeta(s)$ is the Dirichlet eta-function and $\Gamma(s)$ is the Gamma-function which for integer arguments is simply the factorial $\Gamma(n) = (n-1)!$. For even integer values of $s$ then $\zeta(s)$ can be expressed as a rational multiple of $\pi^s$ for example $\zeta(2) = \frac{\pi^2}{6}$ and $\zeta(4) = \frac{\pi^4}{90}$. The odd values are not expressible in closed form, but we have the numerical values $\zeta(3) \approx 1.202056$ and $\zeta(5) \approx 1.036928$. See the problem set for how to derive these formulas.

Another useful integral is the gaussian integral $$\int_0^\infty e^{-ax^2}dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}$$ $$\int_{-\infty}^\infty e^{-ax^2}dx = \sqrt{\frac{\pi}{a}}$$ And the related integrals on the form $\int_0^\infty x^{2n} e^{-x^2}dx$ can be solved by considering $$f(a) = \int_0^\infty e^{-ax^2}dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}$$ and then taking the $n$'th $a$-derivative to get $$\frac{d^n f(a)}{da^n} = \int_0^\infty (-1)^n x^{2n}e^{-ax^2}dx$$ Thus we only need to compute the $n$'th derivative of $f$ and take $a=1$ to get the integral we are after.

Fourier Transforms

The Fourier transform is a mathematical transform that decomposes a function into its constituent frequencies, such as the expression of a musical chord in terms of the volumes and frequencies of its constituent notes. The Fourier transform of a function of time is a complex-valued function of frequency, whose magnitude (absolute value) represents the amount of that frequency present in the original function, and whose argument is the phase offset of the basic sinusoid in that frequency. For a function of a single variable the transform (the convention we use in this course) and its inverse transform is defined as $$f(x) = \int_{-\infty}^\infty \hat{f}(k) e^{ikx}dk$$ $$\hat{f}(k) = \frac{1}{2\pi}\int_{-\infty}^\infty f(x) e^{-ikx}dx$$ We will typically use a "hat" $\hat{f}$ or $\mathcal{F}(f)$ to denote the Fourier transform. Another common notation/convention people use is to simply use $f$ for both the function and its transform and let the argument (or the context) tell you which of these we are talking about.

The most important properties of the transform is

  • It's a linear operator $\mathcal{F}(af+bg) = a\mathcal{F}(f) + b\mathcal{F}(g)$
  • The transforms of a derivative: $\mathcal{F}(\frac{df}{dx}) = ik \mathcal{F}(f)$. This is by far the most important property for us. It can be derived from the definition using integration by parts.
  • The transform of a product: $\mathcal{F}(fg) = \mathcal{F}(f) * \mathcal{F}(g)$ where $*$ denotes the convolution $(f*g)(x) \equiv \int_{-\infty}^\infty f(y)g(x-y)dy$. An equivalent formulation of this property is: $\mathcal{F}(f*g) = \mathcal{F}(f)\mathcal{F}(g)$.

In three spatial dimensions we have $$f(\vec{x}) = \int_{-\infty}^\infty \hat{f}(\vec{k}) e^{i\vec{k}\cdot \vec{x}}d^3k$$ $$\hat{f}(\vec{k}) = \frac{1}{(2\pi)^3}\int_{-\infty}^\infty f(\vec{x}) e^{-i\vec{k}\cdot \vec{x}}d^3x$$ and the derivative law is here $\mathcal{F}(\nabla f) = i\vec{k} \mathcal{F}(f)$.

Why is the Fourier transform so useful when working with linear PDEs (Videos: Solving the Heat Equation with the Fourier Transform, Solving PDEs with the FFT in Python)? Consider for example the wave-equation $$\frac{d^2u}{dt^2} - \nabla^2 u = 0$$ If we now take the Fourier transform (with respect to $x,y,z$) of this equation then we get $$\frac{d^2\hat{u}}{dt^2} + k^2 \hat{u} = 0$$ since $\mathcal{F}[\nabla^2 f] = (i\vec{k}) \cdot (i\vec{k}) \mathcal{F}[f] = -k^2 \mathcal{F}[f]$ and $\mathcal{F}[\frac{d^2u}{dt^2}] = \frac{d^2}{dt^2}\mathcal{F}[u]$ (the time-derivative can be pulled out of the integral). We see that the Fourier transform has reduced the PDE down to simple ODEs - one for each of the frequencies - (in this case with the simple solution $\hat{u}(k,t) = A(k)e^{ikt} + B(k)e^{-ikt}$).

This gives us a simple method for solving PDEs:

  • Take the Fourier transform of the initial condition and use this to set the initial condition for the ODEs (the $A(k),B(k)$).
  • Solve the ODE for each mode to get $\hat{u}(k,t)$.
  • Take the inverse Fourier transform to get $u(x,t)$.

This method applies for a wide class of linear PDEs and not just this simple example and often in cosmology we won't even need to go back to real-space as common observables are often expressed in terms of the Fourier components so once we have solved for $\hat{u}(k,t)$ we are done (but we'll get back to this later in the course). This also illustrates a very important property about the Fourier components: the evolution of a given mode $\hat{u}(k,t)$ only depends on itself, i.e. the ODE we end up with does not mix modes with different $k$'s. We can therefore understand the evolution of a given scale $k$ by only looking at this scale (we don't need to derive the full, complicated, solution to know what happens on a particular scale)!

Legendre Multipoles

The Legendre polynomials $P_{\ell}(x)$ arises naturally whenever one solves Laplace's equation $\nabla^2 \phi = 0$ using separation of variables in spherical coordinates. The eigenfunctions of the angular part of the Laplacian operator are the spherical harmonics (which we will encounter later on), of which the Legendre polynomials is a subset that is invariant when we do rotations about the polar axis. Whenever we have rotational symmetry then the Legendre polynomials tend to make their appearance. They were first introduced by Adrien-Marie Legendre as the coefficients in the expansion of the Newtonian potential $$\frac{1}{\left|\mathbf{x}-\mathbf{x}^{\prime}\right|}=\frac{1}{\sqrt{r^{2}+r^{\prime 2}-2 r r^{\prime} \cos \theta}}=\sum_{\ell=0}^{\infty} \frac{r^{\prime \ell}}{r^{\ell+1}} P_{\ell}(\cos \theta)$$ where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{x}^{\prime}$. This kind of expansion is quite useful when we encounter functions of vectors arguments like $f(\vec{x}, \vec{x}')$ that only depend on one or two of the norms $x,x'$ plus the angle between them $\cos\theta \equiv \frac{\vec{x} \cdot \vec{x}'}{xx'}$.

If we for example have two closely spaced opposite charges (an electric dipole) at positions ${\bf r}_{\pm}$ then the electric potential $$\phi({\bf r}) = \frac{kq}{|{\bf r} - {\bf r_+}|} - \frac{kq}{|{\bf r} - {\bf r_-}|}$$ can be Taylor expanded in powers of $(d/R)$ to get $$\phi({\bf r}) = \frac{q({\bf d}\cdot {\bf r})}{r^3} + \mathcal{O}(d/R)^2 = \frac{qd}{r^2}\cos\theta + \mathcal{O}(d/R)^2$$ where ${\bf d} = {\bf r_+} - {\bf r_-}$ is the separation between the two charges and $\cos\theta \equiv \frac{({\bf d}\cdot {\bf r})}{dr}$ is the angle between the separation vector of the two charges and the position vector (if $\cos\theta = 0$ then we are looking the charges head on and they cancel eachother out and if $\cos\theta = 1$ we are mid between the two charges). So at distances much larger than the separation of the two charges $r\gg d$ the total field is just a simple dipole: $\phi \approx \frac{qd}{r^2}P_1(\cos\theta)$. If we have four charges at the corners of a rectangle the electric field becomes a quadrupole: $\phi \propto \frac{3\cos^2\theta-1}{2} = P_2(\cos\theta)$.

    Figure 1: A dipole ($\ell=1$) and quadrupole ($\ell=2$) pattern. Image taken from Wikipedia (1, 2).

    We will encounter these functions when studying the Boltzmann equation for photons and massless neutrinos and they also play a big role when expanding the CMB temperature field in spherical harmonics. A monopole in the photon distribution at some point represents a uniform temperature in all direction, a dipole represents having "hot" photons on one side and "cold" photons on the other side and so on (see Figure 1, imagine standing in the center. Red represents "hot" and blue "cold" photons). Lets give some key properties of these functions. The Legendre polynomials are orthogonal in the sense that $$\int_{-1}^1 P_\ell(x) P_{\ell'}(x)dx = \frac{2}{2\ell + 1}\delta_{\ell\ell'} $$ and they form a complete set of basis functions. We can therefore expand any function $f(\cos\theta)$ depending on an angle $\theta$ as $$f(\cos\theta) = \sum_{\ell=0}^\infty f_\ell P_\ell(\cos\theta)$$ where $$f_\ell = \frac{2\ell+1}{2}\int_{-1}^1 f(x) P_\ell(x)dx$$ The expansion coefficients $f_\ell$ are called the (Legendre) multipoles. The $\ell=0$ term is called the monopole and is nothing but the average of $f$ over all angles. The $\ell=1$ term is called the dipole, $\ell=2$ is the quadrupole, $\ell=3$ the octopole, $\ell=4$ the hexadecapole and so on. Often we only need the first few multipoles to accurately describe the angular dependence of a function which makes this expansion very useful.

    The first few polynomials are given by $$P_0(x) = 1$$ $$P_1(x) = x$$ $$P_2(x) = \frac{3x^2-1}{2}$$ and some special values are $P_\ell(1) = 1$ and $P_\ell(-1) = (-1)^\ell$. We also have $P_{2\ell}(-x) = P_{2\ell}(x)$ and $P_{2\ell+1}(-x) = -P_{2\ell+1}(x)$ (i.e. $P_\ell$ is an even function for even $\ell$ and an odd function for odd $\ell$). $P_\ell(x)$ is a polynomial of degree $\ell$ with $\ell$ roots in $[-1,1]$ and satisfy the very useful recurrence relation $$(\ell + 1)P_{\ell+1}(x) = (2\ell+1)xP_{\ell}(x) - \ell P_{\ell-1}(x)$$ In this course we will only need to know the explicit form for the first few polyomials and the recurrence relation listed above.

    For a more thorough review of Legendre polynomials see this.

    Gaussian random fields

    A random variable $x$ is fully determined by specifying its probability distribution function (PDF) $f(x)$. For a simple 1D PDF the probabillity of the random variable having a value in $[x,x+dx]$ is simply $f(x)dx$ and the total probabillity is unity so $\int_{-\infty}^\infty f(x)dx = 1$. A gaussian (or normal) distribution has a PDF given by $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ where $\mu$ is the mean and $\sigma^2$ is the variance, but below we will just consider $\mu=0$. Various statistical properties (expectation values) can be computed as integrals (called moments of the distribution) $$\left\lt x \right\gt = \int_{-\infty}^\infty xf(x)dx = 0$$ $$\left\lt x^2 \right\gt = \int_{-\infty}^\infty x^2f(x)dx = \sigma^2$$ $$\left\lt x^3 \right\gt = \int_{-\infty}^\infty x^3f(x)dx = 0$$ $$\left\lt x^4 \right\gt = \int_{-\infty}^\infty x^4f(x)dx = 3\sigma^4$$ $$\left\lt x^{2n+1} \right\gt = \int_{-\infty}^\infty x^{2n+1}f(x)dx = 0$$ $$\left\lt x^{2n} \right\gt = \int_{-\infty}^\infty x^{2n}f(x)dx = (1\cdot 3\cdot 5 \cdots \cdot (2n-1))\sigma^{2n}$$ If you know all the moments then you know all there is to know about the distribution. You also do that if you know the PDF, however in practice we cannot measure the PDF directly so in physics the expectation values (moments) are the key quantities. By symmetry (since $f$ is an even function) all the odd ordered moments $\left\lt x^{2n+1} \right\gt$ vanish. All the even moments are given in terms of the second order moment, e.g. $\left\lt x^4 \right\gt = 3(\left\lt x^2 \right\gt )^2$. Thus the second order moment $\left\lt x^2 \right\gt$ contains all the information about the distribution. This is perhaps not so surprising in this simple case as the PDF only has one free parameter and the second order moments is this free parameter so obviously this is the case, but as we shall see this also holds in the more complicated situation of a multivariate gaussian distribution where its not so obvious. Having done a simple 1D gaussian lets talk about a gaussian random field. We consider having many gaussian random variables $x_1,x_2,\ldots,x_n$ at different ("all") points in space with a PDF $$f(x_1,\ldots,x_n) = \frac{1}{\sqrt{2\pi}^n |\det \xi_{ij}|^{1/2}}e^{- \frac{1}{2}\sum x_i (\xi^{-1}_{ij} x_j)}$$ where $\xi_{ij} \equiv \left\lt x_ix_j\right\gt$ is the two-point correlation function of $x_i$ and $x_j$ and tells us about the correlation between the value of the gaussian random field at $x_i$ and $x_j$. This is a so-called multivariate gaussian distribution. If all $x_i$'s are independent then $\xi_{ij} = \sigma_i^2\delta_{ij}$ is diagonal and $f(x_1,\ldots,x_n) = \prod_i f(x_i,\sigma_i)$ is simply a product of one dimensional gaussians. The PDF is normalized in the sense that $$\prod_{i=1}^n(\int_{-\infty}^\infty dx_i)f(x_1,\ldots,x_n) = 1$$ as can be checked by performing the integral explicitly (which would explain why we have exactly that prefactor to the PDF). The moments of the distribution is $$\left\lt x_i\right\gt = \prod_{i=1}^n(\int_{-\infty}^\infty dx_i)f(x_1,\ldots,x_n) x_i = 0$$ $$\left\lt x_ix_j\right\gt = \prod_{i=1}^n(\int_{-\infty}^\infty dx_i)f(x_1,\ldots,x_n) x_ix_j \equiv \xi_{ij}$$ $$\left\lt x_ix_jx_k\right\gt = \prod_{i=1}^n(\int_{-\infty}^\infty dx_i)f(x_1,\ldots,x_n) x_ix_jx_k = 0$$ $$\left\lt x_ix_jx_kx_l\right\gt = \prod_{i=1}^n(\int_{-\infty}^\infty dx_i)f(x_1,\ldots,x_n) x_ix_jx_kx_l = \xi_{ij}\xi_{kl} + \xi_{ik}\xi_{jl} + \xi_{il}\xi_{jk}$$ There is a simple way of writing down the $k$th moment: start with $x_{i_1}x_{i_2}\cdots x_{i_k}$ and write down all different ways you can group (contract) two and two terms untill all the $x_i$'s are tied up in groups. For odd $k$ this is not possible (we will always have one left over) so the result is $0$ and for even $k$ each contraction gives a factor of the two-point function. For example for $k=4$ the possible ways to contract two and two terms are: $(ij)(kl)$, $(ik)(jl)$, $(il)(jk)$ which is the result above (and this also explains the $3$ in $3\sigma^4$ for the fourth moment of the 1D gaussian). The nice way about this is that the problem of evaluating complicated multidimensional integrals is reduced to simple combinatorics. To see if you understand this try to compute $\left\lt x^6\right\gt$ for a 1D gaussian using this rule and check the result by evaluating the integral. This result is known as Wick's theorem (a result you will get much more familiar with if you study quantum field theory where it plays a central role).

    In the discussion above we had $n$ gaussians at $n$ points. The natural generalization is to now consider that we have one of these gaussians at every point in space. We introduce a spatial coordinate $\vec{q}$ and let $x(\vec{q})$ denote the value of the random variable at the point $\vec{q}$. The two point correlation function we introduced above is $$\xi_{ij} = \left\lt x(\vec{q}_i) x(\vec{q}_j)\right\gt$$ Let $\vec{r}_{ij}$ be s vector connecting the points $\vec{q}_i$ and $\vec{q}_j$, i.e. $\vec{r}_{ij} = \vec{q}_j - \vec{q}_i.$. Then $$\xi_{ij} = \left\lt x(\vec{q}_i) x(\vec{q}_i + \vec{r}_{ij})\right\gt$$ so the two point function can be though of depending on the point $\vec{q}_i$ and the separation $ \vec{r}_{ij}$ (why this is useful will become clear below). In cosmology we think the cosmological principle applies: the Universe is statistically homogenous and isotropic. Lets see what this assumption implies for the two-point function of cosmological observables. Homogenity implies translational invariance so $\xi_{ij}$ cannot depend on $\vec{q}_i$ (otherwise there would be special places in the Universe) and $\xi_{ij}$ must be a function of $\vec{r}_{ij}$ only. Isotropy implies rotational invariane so $\xi_{ij}$ cannot depend on the direction of $\vec{r}_{ij}$ (otherwise there would be special directions in our Universe). Thus $\xi_{ij}$ can only be a function of the norm of $\vec{r}_{ij}$, i.e. $$\xi_{ij} = \xi_{ij}(r_{ij})$$ where $r_{ij} = |\vec{r}_{ij}|$. We see that the assumption of statistical homognity and isotropy implies that the two point function only depends on the distance between points. In this course we won't deal much with the two-point function in real space, but we will mainly work in Fourier space. Why? Lets consider the Fourier transform of our random field $x(\vec{q})$ by writing $\hat{x}(\vec{k}) = \int d^3q e^{-i\vec{k}\cdot \vec{q}}x(\vec{q})$. Consider the two-point function in Fourier space which we define as (the complex conjugate is just for convenince) $$C_{ij} = \left\lt \hat{x}(\vec{k}_i) \hat{x}^*(\vec{k}_j)\right\gt = \int d^3 q_a d^3q_b e^{-ik_iq_a + ik_jq_b} \left\lt x(\vec{q}_a)x(\vec{q}_b) \right\gt = (2\pi)^3\delta(\vec{k}_i - \vec{k}_j)P(k_i)$$ where $P(\vec{k}_i) = \int d^3r e^{i\vec{k}_i\cdot \vec{r}} \xi(\vec{r})$ is the power-spectrum. Translation and rotation invariance impies that this only depends on the norm of $\vec{k}_i$. Notice the great simpification that has happened when we went to Fourier space. In real space we have a correlation between any two points $\xi_{ij}$ while in Fourier space the equivalent correlation matrix is diagonal so every mode is independent of all the other ones. The PDF is $$f(\hat{x}(\vec{k}_1), \hat{x}(\vec{k}_2), \ldots, \hat{x}(\vec{k}_n)) = \frac{1}{\sqrt{2\pi\prod_i P(k_i)}^n} e^{- \frac{1}{2} \sum_i \frac{|\hat{x}(\vec{k}_i)|^2}{P(k_i)}} = \prod_i \frac{1}{\sqrt{2\pi P(k_i)}} e^{- \frac{1}{2} \frac{|\hat{x}(\vec{k}_i)|^2}{P(k_i)}}$$ which is simply a product of $n$ individual gaussians. This gives an additional justification for why working in Fourier space is so neat: when dealing with gaussian random fields (which we expect the initial conditions of our Universe was) and when the cosmological principle applies then each of the modes are uncorrelated and can be treated independently. Later we will see another reason, the fact that in linear perturbation theory the different modes evolve completely independently (and in Fourier space PDEs become simple ODEs). This should be enough to convince you that Fourier space is awesome and the right tool for the job when it comes to studying the CMB.

    See Lectures on non-gaussianity by Eiichiro Komatsu (for which these notes are based on) and Lectures on gaussian random fields by M. Dijkstra for some more detailed discussion on the theory of gaussian random fields and also about non-gaussianity which is an active research field in cosmology.


    Lets start with giving the definitions and then coming back to what it means. Given a 3D random field $f(\vec{x})$ we can expand it in fourier modes $$f(\vec{x}) = \int \frac{d^3k}{(2\pi)^3} e^{i\vec{k}\cdot\vec{x}}\hat{f}(\vec{k})$$ The power-spectrum $P(k)$ is defined as $$\left\lt \hat{f}(\vec{k})\hat{f}(\vec{k}')\right\gt = (2\pi)^3\delta^{(3)}(\vec{k} + \vec{k}')P(k)$$ where the left hand side is the ensamble mean, i.e. the mean over many different realisations of the random field. If the field $f$ is real then $\hat{f}^*(\vec{k}') = \hat{f}^*(-\vec{k}')$ so we also have the equivalent definition $$\left\lt \hat{f}(\vec{k})\hat{f}^*(\vec{k}')\right\gt = (2\pi)^3\delta^{(3)}(\vec{k} - \vec{k}')P(k)$$ where the left hand side for $\vec{k} = \vec{k}'$ is $|\hat{f}(\vec{k})|^2$ so the power-spectrum is nothing but the norm of the fourier coefficients. Roughly speaking this tells us how large fluctuations there are in the field $f$ on different scales. For example a guassian random field with a flat power-spectrum $P = $ constant don't have any features with a particular size (white noise). Likewise given a field on the surface of a 3D sphere $f(\hat{n})$ (where $\hat{n}$ denotes the direction or equivalently the spherical angles) then we can expand it in spherical harmonics $$f(\hat{n}) = \sum_{\ell=0}^\infty\sum_{m=-\ell}^\ell a_{\ell m} Y_{\ell m}(\hat{n})$$ The power-spectrum $C_\ell$ is here defined to be $$\left\lt a_{\ell m} a^*_{\ell' m'}\right\gt = \delta_{\ell\ell'} \delta_{m m'}C_\ell$$ where again the left hand side is an ensamble mean.

    Why is this relevant? When dealing with random fields we cannot predict the value in any certain place as its, well, random. The key observables are instead correlation functions: how are the values of the field at different separations correlated. All the statistical information contained in the field (that tells us about the PDF "that was used" to generate it, i.e. the physical stocastic processes) is contained in the full set of $N$-point correlation functions. The simplest one is the two-point correation function and its fourier transform the power-spectrum. These are related via $$\xi(r) = \int_0^\infty \frac{\sin(kr)}{kr} \frac{k^3P(k)}{2\pi^2} \frac{dk}{k}$$ $\xi(r)$ has a more direct interpretation. For example if $f$ is the galaxy density field and you have a galaxy at some position then $\xi(r)d^3r$ is the excess probabillity, compared to the case where galaxies where uniformly distributed in space, of finding another galaxy at a distance $[r,r+dr]$ from the first one. We will mainly focus on (the matter) $P(k)$ in this course. The power-spectum of a field describes the amount of fluctuations in that field as a function of scale. If we are given a field $f(\vec{x})$ whose fourier transform is $\hat{f}(\vec{k})$ then the definition above tells us we can estimate the power-spectrum as $$P(k) = \frac{1}{N}\sum_{k - \Delta k/2 \lt |\vec{k}| \lt k + \Delta k/2} |\hat{f}(\vec{k})|^2$$ where $N$ is the number of elements in the sum so this is nothing but the mean of all fourier modes corresponding to a given wave-number $k$. The initial conditions of our Universe are believed to be very close to that of a gaussian random field. A perfectly gaussian random field has the convenient property that all the statistical information we can learn about the field is contained in just the power-spectrum (higher order moments are functions of the power-spectrum). This is therefore a very important observable! We will deal with two kinds of power-spectra in this course: the (3D) power-spectrum of the dark matter density field (which we can probe with galaxy and weak lensing surveys) and which is defined as above with $f = \delta(k)$ being the density contrast. The second one is the (2D) angular power-spectrum of the temperature fluctuations $\delta T$ of the microwave background radiation. For the former we consider the square of the density contrast in fourier space while for the latter we consider the square of the spherical harmonics coefficients (which is basically the equivalent of the fourier coefficients, but on the surface of a sphere). Note that the theoretical definition involves the ensamble mean: the mean over many realisation of our Universe. However as you might have noticed, we only have one Universe to do our measurements in! This is where the ergodic hypotesis enters: ensamble average is equal to a spatial average taken over one realisation of the field. For a gaussian random field this is a theorem so its not a big issue as long as we have enough modes (number of $\vec{k}$'s) for a given scale $k = |\vec{k}|$ to estimate the power-spectum from; then the mean of the square of these modes will be close to the theoretical spectrum it was drawn from. For modes the size of the horizon today $k \sim H_0$ we can only observe a few modes (and for larger scales - smaller $k$ - we simply cannot measure any modes). This leads to what we call cosmic variance: there is a fundamental uncertainity associated with what we can say about the largest modes in the density field or in the CMB temperature fluctuations no matter how good the observations are - we are simply limited by statistics (or if you will: by the lack of having more Universes to measure things in). We will come back to this more thoroughly in later lectures.

    Spherical harmonics

    Spherical harmonics $Y_{\ell m}$ (the indices here are just indices - its not a tensor and it does not matter if they are up or down) are orthogonal basis functions for functions defined on a sphere (Video: Spherical harmonics animation). They are eigenfunctions of the square of the angular momentum operator $\hat{L} = {\bf x}\times \nabla$ (i.e. $\hat{L}^2 Y_{\ell m} = \ell(\ell+1)Y_{\ell m}$) and pop up when trying to solve Laplace equation in spherical coordinates using separation of variables. These properties makes them the natural choice when we are going to expand the CMB temperature field. Let's review some properties (this depends on the particular convention one uses - there are different ways to normalize the $Y$'s so you might see different formulas in different places):

    Given a function $f(\hat{n})$ where $\hat{n} = (\theta,\phi)$ denotes a point on the surface of a sphere then we can expand $$f(\hat{n}) = \sum_{\ell=0}^\infty\sum_{m=-\ell}^\ell a_{\ell m} Y_{\ell m}(\hat{n})$$ where the numbers $a_{\ell m}$ (these are the equivalent of fourier modes in 3D) are given by $$a_{\ell m} = \int d\Omega_{\hat{n}} f(\hat{n})Y_{\ell m}^*(\hat{n})$$ and $d\Omega_{\hat{n}} = \sin\theta d\theta d\phi$ is the usual spherical surface element. This formula is a direct consequence of the orthogonality property $$\int d\Omega_{\hat{n}} Y_{\ell m}(\hat{n}) Y^*_{\ell' m'}(\hat{n}) = \delta_{\ell\ell'}\delta_{mm'}$$ which can be through of as the equivalent of the property $\frac{1}{2\pi}\int_0^{2\pi} e^{in x} e^{i m x} dx = \delta_{n,-m}$ for Fourier series (or $\int e^{ikx} e^{ik'x}dx = 2\pi \delta^{(3)}(k+k')$ for the fourier transform).

    An explicit expression for $Y_{\ell m}$ (which we will not need very much): $$Y_{\ell m}(\theta,\phi) = (-1)^m\sqrt{\frac{(2\ell+1)(\ell-m)!}{4\pi(\ell+m)!}}P_{\ell m}(\cos(\theta))e^{im\varphi}$$ The expansion of Legendre polynomials in spherical harmonics: $$\mathcal{P}_\ell(\hat{a}\cdot \hat{b}) = \frac{4\pi}{2\ell+1}\sum_{m=-\ell}^\ell Y_{\ell m}(\hat{a}) Y^*_{\ell m}(\hat{b})$$ where $P_{\ell m}$ are associated Legendre polynomials and $\hat{a},\hat{b}$ two unit vectors. When $m=0$ we have $P_{\ell m} = \mathcal{P}_{\ell}$ the usual Legendre polynomials (the ones we used when expanding the temperature perturbations in the Boltzmann equation in multipoles).

    Another useful relation that we will need (which follows from the above is) $$\int d\Omega_{\hat{n}} \mathcal{P}_\ell(\hat{k}\cdot\hat{n}) Y_{\ell'm}^*(\hat{n}) = \delta_{\ell\ell'} \frac{4\pi}{2\ell+1} Y_{\ell m}^*(\hat{k}) $$ where $\hat{k},\hat{n}$ are two unit vectors. A word on notation: we will sometimes write $Y_{\ell m}(\theta,\phi)$ and sometimes $Y_{\ell m}(\hat{n})$ where $\hat{n}$ is some direction (unit vector). These are equivalent and with the latter we simply mean the first where the angles $(\theta,\phi)$ are those associated with $\hat{n}$ (i.e. the angles we get when $\hat{n}$ is written in spherical coordinates).

    Spherical Bessel Functions

    When doing the line of sight integration (for a flat geometry) we encountered the integral $$j_\ell(x) = \frac{i^\ell}{2}\int_{-1}^1 P_\ell(\mu) e^{-i\mu x} d\mu$$ This is called the spherical bessel function. From the definition above its seen to just be the expansion coefficients when we expand the function $e^{-i\mu x}$ in Legendre polynomials: $$e^{-i\mu x} = \sum_{\ell = 0}^\infty \frac{2\ell + 1}{i^\ell}j_\ell(x)P_\ell(\mu)$$ The simplest ones are $$j_0(x) = \frac{\sin(x)}{x}$$ $$j_1(x) = \frac{\sin(x)}{x^2} - \frac{\cos(x)}{x}$$ The general behavior of this function is to grow from zero towards a maximum around $x\sim \ell$ and then starts to osscillate (with period $\sim 2\pi$) with decaying amplitude ($\propto 1/x$) for large $x$. There are some more info about this function in the numerical notes.

    Lagrangian description of General Relativity

    One way of getting the field equations of General Relativity is to put up an action functional and have the field equation following from the principle of least action just as is common to define theories in field theory. The Lagrangian we need for this is the so-called Einstein-Hilbert lagrangian $L = \frac{R}{16\pi G} \sqrt{-g}$ and the corresponding action $$S = \int [\frac{R}{16\pi G} + \mathcal{L}_{\rm matter}(g_{\mu\nu})]\sqrt{-g}d^4x$$ where $\mathcal{L}_{\rm matter}(g_{\mu\nu})$ is the lagrangian of all energy components of the Universe (which depends on the metric, for example a massless scalar-field has $\mathcal{L} = \frac{1}{2}g^{\mu\nu}\frac{\partial \phi}{\partial x^\mu}\frac{\partial \phi}{\partial x^\nu}$). Performing a variation $\delta g^{\mu\nu}$ of the metric the action changes as (there are many important details missing here, see a proper textbook like Carroll's or the link above for a full derviation) $$\delta S = \int [\frac{R_{\mu\nu}}{16\pi G} - \frac{Rg_{\mu\nu}}{32\pi G} + \frac{1}{\sqrt{-g}}\frac{\partial(\sqrt{-g}\mathcal{L}_{\rm matter}(g_{\mu\nu}))}{\partial g^{\mu\nu}}]\delta g^{\mu\nu}\sqrt{-g}d^4x$$ Requiring a stationary action $\delta S = 0$ and defining the energy-momentum tensor as $T_{\mu\nu} \equiv \frac{2}{\sqrt{-g}}\frac{\partial(\sqrt{-g}\mathcal{L}_{\rm matter}(g_{\mu\nu}))}{\partial g^{\mu\nu}}$ then we get $$R_{\mu\nu} -\frac{1}{2}Rg_{\mu\nu} = 8\pi G T_{\mu\nu}$$ Now why would we do it this way? First of all this is how the other theories (the standard model) is formulated so it puts General Relativity into the same formulation as the rest of the fundamental physical theories we have (which is also a formulation that is very useful if one wants to try to quantize gravity just as we do for the other forces). This formulation also allows us to much more easily propose other theories of gravity instead of "guessing" field equations. For example we could replace $R$ by a general function $f(R)$ and voila we have a modified gravity theory ($f(R)$ gravity, see also Modified gravity and Cosmology) that we can derive field equations for. The free function can then be designed to only be relevant in certain regimes (e.g. high or low Ricci curvature $R$) like for example $f(R) = R + \alpha R^2$ is a simple model proposed for getting inflation (the Starobinski model). Working at the level of the action also allows us to much more easily ensure that certain symmetries (like the symmetries that underlies the standard model) are respected by the mode. Even if you don't want to work with model building and is perfectly happy to only do "practical GR" its still good to know about this formulation as you might encounter it in talks and research papers.