## Hardy and the zeta function, Part 2

After that brief hiatus, we return to the proof of Hardy’s theorem that the Riemann zeta function has infinitely many zeros on the real line; probably best to go and brush up on part one first.

Two things remain to complete the proof; to show that

$\displaystyle \left\lvert\int_T^{2T}W(t)Z(t)\,\mathrm{d}t\right\rvert \ \ \ \ \ (1)$

is small and that

$\displaystyle \int_T^{2T}W(t)\lvert Z(t)\rvert\,\mathrm{d}t$

is large, for some suitably chosen kernel ${W:\mathbb{R}\rightarrow\mathbb{R}_+}$ and all large ${T\in\mathbb{R}}$.

For no particular reason, we’ll deal with the latter part first. By our choice of ${Z(t)}$ we need to give a lower bound for

$\displaystyle \int_T^{2T}W(t)\lvert \zeta(1/2+it)\rvert\,\mathrm{d}t. \ \ \ \ \ (2)$

Of course, it’s very difficult to estimate this integral without first deciding what our choice of kernel ${W(t)}$ is going to be. The reason we want to introduce the kernel is to help us estimate (1), by allowing us to control the cancellation. We can actually deal with the mean values of ${\lvert \zeta(1/2+it)\rvert}$ quite well, so all we need from the kernel at the moment is that it won’t ruin things too much. In particular, because we want to show that (2) is large, it’s enough to know that ${W(t)}$ doesn’t decay too quickly, and then we can just use the trivial bound.

Making a (somewhat) arbitrary choice about what this means, all that we’ll assume about the kernel ${W(t)}$ in this post is that it satisfies

$\displaystyle W(t)\gg t^{-1/4}$

so that we can bound (2) by

$\displaystyle \int_T^{2T}W(t)\lvert \zeta(1/2+it)\rvert\,\mathrm{d}t\gg T^{-1/4}\int_T^{2T}\lvert \zeta(1/2+it)\rvert\,\mathrm{d}t.$

All that we need now is a bound such as:

$\displaystyle \left\lvert \int_{T}^{2T} \zeta(1/2+it)\,\mathrm{d}t\right\rvert \gg T. \ \ \ \ \ (3)$

Using the above and the triangle inequality we deduce that

$\displaystyle \int_T^{2T}W(t)\lvert Z(t)\rvert\,\mathrm{d}t\gg T^{3/4}$

which is good enough combined with the upcoming estimates for (1) to yield Hardy’s theorem.

Great! All we need to do now is prove (3), a mean-value estimate for the zeta function. This is classical analytic number theory stuff, and there is a standard approach to dealing with integrals involving the Riemann zeta function: using Cauchy’s theorem move the path of integration across to the half-plane ${\Re(s)>1}$, picking up some acceptable error along the way, and then using the nice Dirichlet series for the zeta function in that half-plane to do things explicitly.

Let ${\mathcal{C}_1}$ be the path from ${1/2+2Ti}$ to ${2+2Ti}$ and let ${\mathcal{C}_2}$ be the path from ${2+Ti}$ to ${1/2+Ti}$. Since ${\zeta(s)}$ only has a pole at ${s=1}$, and as long as ${T>0}$ we certainly avoid that, it follows from Cauchy’s theorem that

$\displaystyle \int_{T}^{2T} \zeta(1/2+it)\,\mathrm{d}t=\int_{T}^{2T} \zeta(2+it)\,\mathrm{d}t-\left(\int_{\mathcal{C}_1}+\int_{\mathcal{C}_2}\right)\zeta(s)\,\mathrm{d}s.$

Let’s first deal with the ‘error terms’ here. Since we’re happy with a lower bound of order ${T}$ we can dispatch these error terms using the trivial bound as long as we knew that

$\displaystyle \zeta(\sigma+iT)\ll T^{1/2},$

uniformly for ${1/2\leq \sigma\leq 2}$, say. Of course, we need something about the zeta function beyond its Dirichlet series representation here, but we don’t need much, just the identity

$\displaystyle \zeta(s)=\sum_{n\leq x}n^{-s}+\frac{x^{1-s}}{s-1}+\frac{\{x\}}{x^s}-s\int_{x}^\infty \{ u\} u^{-s-1}\,\mathrm{d}u,$

valid for any ${\sigma>0}$ and ${x>0}$ (assuming ${s\neq 1}$). This identity comes from the Dirichlet series for $zeta(s)$ using partial summation – see any text on analytic number theory for details.

We now bound everything very crudely for ${s=\sigma+iT}$ where ${1/2\leq \sigma\leq 2}$, giving

$\displaystyle \zeta(s)\ll \sum_{n\leq x}n^{-1/2}+\frac{x^{1/2}}{T}+\frac{1}{x^{1/2}}+Tx^{-1/2}\ll x^{1/2}+Tx^{-1/2}$

and we’re done by choosing ${x=T}$. That’s the error terms dealt with, so it only remains to show that

$\displaystyle \int_{T}^{2T} \zeta(2+it)\,\mathrm{d}t\gg T.$

Finally we’re on safe ground, since we’re only dealing with the zeta function in the half-plane ${\Re(s)>1}$, where we have a Dirichlet series and everything is rosy. So

$\displaystyle \int_{T}^{2T} \zeta(2+it)\,\mathrm{d}t=\sum_{n=1}^\infty \int_T^{2T}n^{-2-it}\,\mathrm{d}t= T+O(1)$

where the ${T}$ term comes from the ${n=1}$ summand and the other summands are bounded above by an absolute constant.

That’s it, we’re done. To recap, we’ve shown that

$\displaystyle \int_T^{2T}W(t)\lvert Z(t)\rvert\,\mathrm{d}t\gg T^{3/4}$

using only the assumption that ${W(t)\gg t^{-1/4}}$ and some quite crude information about the Riemann zeta function – all that one needs is knowledge of Cauchy’s theorem, the Dirichlet series representation for the zeta function and a dash of partial summation. We’ll need slightly more for the rest of the proof in the next post, but for now, Hardy’s theorem is not as formidable as it might seem!

## Symmetrization

We call a random variable ${X}$ (which is just a measurable function ${X:\Sigma\rightarrow\mathbb{R}}$ where ${\Sigma}$ is a probability measure space) symmetric if ${X}$ is identically distributed to ${-X}$. Symmetric random variables are often a lot more pleasant to handle. Much of what is true for symmetric random variables is also true in general, but both the proofs and statements are lot more complicated. For this reason, it would be very useful to know that we can always assume, in some sense, that a random variable is symmetric. This process, finding a related random variable which is symmetric, is known as symmetrization. With a good understanding of this process many proofs in probability become vastly simpler: one can first reduce to the symmetric case, and then just do it by hand.

Fix a random variable ${X}$. One obvious related random variable which is symmetric is ${X-X'}$, where ${X'}$ is a random variable identical in distribution to ${X}$. We denote this by ${X^s}$, and it is clearly symmetric. Furthermore, note that ${\mathbb{E}(X^s)=0}$.

Not only is ${X^s}$ symmetric, it is also closely related to the original random variable ${X}$, as its definition suggests. For example, for every ${x}$ and ${a}$ we have

$\displaystyle \begin{array}{rcl} \mathbb{P}(\lvert X^s\rvert\geq x)&=&\mathbb{P}(\lvert X-a-(X'-a)\rvert\geq x)\\ &\leq&2\mathbb{P}(\lvert X-a\rvert\geq x/2)\end{array}$

and so the tail probability for ${X}$ is roughly bounded below by that of ${X^s}$. If ${\mathbb{E}(X)=0}$ originally, then ${X}$ and ${X^s}$ are particularly strongly related. For example, for any ${r\geq 1}$ and real number ${x}$,

$\displaystyle \lvert x\rvert^r=\lvert x+\mathbb{E}(X)\rvert^r\leq \mathbb{E}\lvert x+X\rvert^r,$

and hence in particular,

$\displaystyle \mathbb{E}\lvert X\rvert^r\leq\mathbb{E}\lvert X^s\rvert^r.$

As a quick illustration of the power of symmetric random variables, we mention the following.

Lemma 1 If ${X}$ and ${Y}$ are independent random variables with finite moments of order ${r\geq 2}$ and ${Y}$ is symmetric then

$\displaystyle \mathbb{E}\lvert X\rvert^r+\mathbb{E}\lvert Y\rvert^r\leq\mathbb{E}\lvert X+Y\rvert^r.$

To prove this, we use a consequence of Clarkson’s inequality: for any ${x,y\in\mathbb{R}}$ we have

$\displaystyle \lvert x+y\rvert^r+\lvert x-y\rvert^r\geq 2(\lvert x\rvert^r+\lvert y\rvert^r).$

The lemma follows after taking expectations of both sides and noting that ${X+Y}$ is identically distributed to ${X-Y}$.

Posted in Uncategorized | 1 Comment

## Every large number is the sum of 13 cubes

An important function that arises from Waring’s problem is the the following. For any ${k}$, let ${G(k)}$ be the least ${s}$ such that, for all ${n}$ sufficiently large (how large can depend on ${k}$ and ${s}$) there exist positive integers ${n_1,\hdots,n_s}$ such that

$\displaystyle n=n_1^k+\cdots+n_s^k.$

Waring’s problem itself says that ${G(k)}$ actually exists for all ${k\geq 2}$. In other words, past some (possible very large) integer, all integers are the sum of at most some constant number of ${k}$th powers. The classical case is Lagrange’s theorem, which states that ${G(2)=4}$. In fact, we can make the stronger statement that all integers, not just the large ones, are the sum of at most four squares.

In this post, I’ll give a nice bound for ${G(3)}$ (and hence, in particular, ${G(3)}$ actually exists). I’ll be following section 21.2 of Hardy and Wright.

Theorem 1

$\displaystyle G(3)\leq 13.$

In what follows, ${z}$ will always denote an integer congruent to 1 modulo 6. Define

$\displaystyle \phi(z)=11z^9+(z^3+1)^3+125z^3=12z^9+3z^6+128z^3+1,$

and

$\displaystyle \psi(z)=14z^9.$

For large ${z}$ we have ${\phi(z+6)<\psi(z)}$ — a quick computation shows that this is true for ${z\geq 115}$, for example. It follows that if we define the interval ${I_z=[\phi(z),\psi(z)]}$ then these overlap, and so for ${z\geq 115}$ these intervals overlap. In particular, every ${n\geq 10^{20}>14(115^9)}$ is in one of these intervals.

It suffices to show, therefore, that if ${n\in I_z}$ for some fixed ${z}$ then we can write ${n}$ as the sum of at most ${13}$ cubes.

Let ${r}$, ${s}$, and ${N}$ be the smallest positive integers such that

$\displaystyle n\equiv 6r\pmod{z^3},$

$\displaystyle n\equiv s+4\pmod{6},$

and

$\displaystyle N=(r+1)^5+(r-1)^3+2(z^3-r)^3+(sz)^3.$

In particular, note that ${N}$ is the sum of 5 cubes. Furthermore,

$\displaystyle 8z^9<\phi(z)-N\leq n-N<14z^9.$

Also,

$\displaystyle N\equiv (r+1)^3-(r-1)^3-2r^3\equiv 6r\equiv n\equiv n-8z^9\pmod{z^3}.$

Similarly, ${N\equiv n-8z^9\pmod{6}}$. Hence ${n-N-8z^9}$ is a multiple of ${6z^3}$, so we can write

$\displaystyle n=N+8z^9+6mz^3$

for some ${0. If ${m}$ were some constant, we could stop here having shown that ${G(3)\leq 12}$, but unfortunately ${m}$ could be increasing with ${z}$, and hence ${n}$, so a further reduction is required.

Here we use the fact mentioned above, that ${G(2)\leq 4}$. Hence we can write ${m=x_1^2+x_2^2+x_3^2+x_4^2}$. Plugging this into the above and doing some elementary algebra proves the identity

$\displaystyle n=N+\sum_{i=1}^4\left( (z^3+x_i)^3+(z^3-x_i)^3\right).$

We have written ${n}$ as the sum of 13 cubes, and all that remains is to check that these are positive cubes — this follows from the fact that ${0\leq x_i, so here is where we needed ${0. The proof is complete.

A couple of final notes:

1) By checking all ${n<10^{20}}$ by hand, or by being clever first and then checking by hand, it can be shown that in fact every integer is the sum of at most 13 cubes.

2) The right answer is ${G(3)=9}$. This can be proved in a way similar to the above if we assumed that every integer which is not of the form ${4^a(8m+7)}$ is actually the sum of three squares. This is true, but difficult.

## Hardy and the zeta function, Part 1

Today we’ll begin with an interesting and important theorem about the Riemann zeta function. Recall that this is defined when ${\Re(s)>1}$ by the absolutely convergent series

$\displaystyle \zeta(s)=\sum_{n=1}^\infty n^{-s}.$

This can be analytically continued to a function meromorphic on the entire complex plane, holomorphic everywhere except for a simple pole at ${s=1}$. Also, we know that it obeys the following functional equation whenever ${s\neq1}$:

$\displaystyle \zeta(s)=\zeta(1-s)2^s\pi^{s-1}\Gamma(1-s)\sin(\pi s/2).$

It isn’t hard to show that ${\zeta(s)\neq0}$ whenever ${s>1}$, and hence the functional equation shows that apart from possible zeros in the critical strip ${0\leq s\leq 1}$ the only zeros of ${\zeta}$ are the so-called ‘trivial zeroes’ at ${s=-2,-4,\hdots}$.

The functional equation also shows that zeros are symmetrically distributed about the line ${\Re(s)=1/2}$. The infamous Riemann hypothesis conjectures that this symmetry collapses so that all non-trivial zeros of ${\zeta}$ occur on the line ${\Re(s)=1/2}$.

This is a whirlwind summary, and we’ll introduce or elaborate on other properties of ${\zeta}$ as they are needed. It is not trivial that ${\zeta}$ should have any zeros in the critical strip at all, though it actually does. We now know the first ${10^{13}}$ or so zeros, and sure enough, they all lie on the critical line ${\Re(s)=1/2}$.

It had been shown soon after its introduction that ${\zeta}$ had infinitely many non-trivial zeros. Very little was known about the distribution of these zeros in regards to the critical line was known until 1914 when Hardy proved the following.

Theorem 1 There are infinitely many zeros of ${\zeta}$ on the critical line ${\Re(s)=1/2}$.

This is a result seemingly more often quoted than proved, and the proof is rather technical and difficult. I have always found it astounding, however, and intrigued by what methods could accomplish this feat. Hence I shall blog the proof of this theorem in a ‘top down’ fashion: first giving the outline of the proof, and then filling in the technical lemmas as needed. I will be following closely the proof given in Montgomery and Vaughan’s “Multiplicative Number Theory” in chapter 14.

Firstly, we note that the function we are really bothered about for this theorem is the function of the real variable ${t}$ defined by

$\displaystyle \zeta(1/2+it).$

After all, we’re not going to even attempt to discover anything about ${\zeta}$ away from this critical line. If we can show that this function of ${t}$ has infinitely many zeros, then we’re done. This should be easier for a start, since now we’re only dealing with a function of a real variable rather than a complex variable.

But, unfortunately, this function is not always real-valued itself. Looking back at the functional equation, we can see why — ${\zeta}$ has a fundamentally ‘asymmetric’ nature. To fix this, we recast the functional equation by first defining

$\displaystyle \xi(s)=\frac{s(s-1)}{2}\zeta(s)\Gamma(s/2)\pi^{-s/2},$

and then noting that after some analytic manipulation the functional equation can be rephrased as

$\displaystyle \xi(s)=\xi(1-s)$

for all ${s\in\mathbb{C}}$. In particular, since we also have the reflection property ${\xi(\overline{s})=\overline{\xi(s)}}$, we see that this functional equation implies that ${\xi(1/2+it)}$ is real for all real ${t}$. Furthermore, ${\xi(1/2+it)}$ has a zero at ${t\in\mathbb{R}}$ if and only if ${\zeta(1/2+it)}$ does.

It suffices, then, to show that the function ${\xi(1/2+it)}$ has infinitely many zeros ${t\in\mathbb{R}}$. This is already much easier to handle, since it is a function from ${\mathbb{R}}$ to ${\mathbb{R}}$. This is nearly the actual function we will consider. However, in some ways ${\zeta}$ remains easier to analyse than ${\xi}$, and so we’ll normalise so that the absolute value of the function always agrees with that of ${\zeta(1/2+it)}$.

Putting all this together, we can finally define the Hardy ${Z}$-function as

$\displaystyle Z(t)=f(t)\xi(1/2+it)$

where the function ${f:\mathbb{R}\rightarrow\mathbb{R}}$ is chosen so that ${\lvert Z(t)\rvert=\lvert\zeta(1/2+it)\rvert}$ for all ${t\in\mathbb{R}}$. We can be explicit, looking at the definition of ${\xi}$, and define

$\displaystyle Z(t)=\zeta(1/2+it)\frac{\Gamma(1/4+it/2)\pi^{-1/4-it/2}}{\lvert\Gamma(1/4+it/2)\pi^{-1/4-it/2}\rvert}.$

The function ${Z(t)}$ will change sign ${\gamma}$ if and only if ${\zeta(1/2+i\gamma)}$ has a zero at ${1/2+i\gamma}$ of odd multiplicity. There is an easy way to detect when a function has a sign change in the interval ${[T,2T]}$ — if and only if the inequality

$\displaystyle \left\lvert\int_T^{2T}Z(t)\,\mbox{d}t\right\rvert<\int_T^{2T}\lvert Z(t)\rvert\,\mbox{d}t$

is sharp. If we could show that was true for arbitrarily large ${T}$, we’d be done. This is possible, but hard.

To make things easier, we will first multiply by some suitably chosen kernel ${W(t)}$. We’ll make sure that ${W(t)>0}$ for all ${t\in\mathbb{R}}$, so it won’t mess things up. It will hopefully make the analysis simpler.

We’ll choose ${W(t)}$ later. For now, remember that we’ve boiled down the proof to deriving a contradiction from the assumption that for all sufficiently large ${T}$,

$\displaystyle \left\lvert\int_T^{2T}W(t)Z(t)\,\mbox{d}t\right\rvert=\int_T^{2T}W(t)\lvert Z(t)\rvert\,\mbox{d}t.$

Posted in Uncategorized | 3 Comments

## Szemerèdi Regularity Lemma

Welcome to the first in our series about Great Lemmas of Mathematics. We’ll start with one of the greats from combinatorics, perhaps the most useful tool in combinatorics and graph theory: the Szemerèdi Regularity Lemma (SRL). This was first proved by Szemerèdi (unsurprisingly) in his 1975 paper “On sets of integers containing no $k$ elements in arithmetic progression” as part of his proof of what is now known as Szemerèdi’s theorem – a cornerstone of additive combinatorics and one which I’m sure we’ll come back to in a future post.

This lemma turned out to be very powerful and versatile, and new and exciting results which use SRL have appeared at regular intervals ever since. Gowers has compared the SRL to playing the same role in combinatorics as the classification of finite simple groups has in finite group theory, providing a rough classification of all graphs, and offering a strategy for proving results in (finite) graph theory: first show it’s true for all graphs which obey very nice structural properties, then use the SRL to show that any graph behaves like one of these.

To state it precisely, we need to set up some notation. First, recall that a graph is just a finite set (called vertices) together with a set of pairs of vertices (called edges). We suppose that our graphs have no loops (i.e. edges connecting a vertex with itself) and that the edges aren’t directed or have any special weights attached to them.

Suppose we have two sets of vertices $U$ and $V$. Define the density $d(U,V)$ between $U$ and $V$ to be the total number of edges between them divided by the total number of edges possible between them (which is $\lvert U\rvert\lvert V\rvert$).  For any parameter $\epsilon>0$ we say that $U$ and $V$ are $\epsilon$-regular if we have

$\lvert d(U',V')-d(U,V)\rvert<\epsilon$

whenever $U'$ and $V'$ are subsets of $U$ and $V$ respectively with sizes at least $\epsilon\lvert U\rvert$ and $\epsilon\lvert V\rvert$.

Szemerèdi Regularity Lemma

For any graph $\epsilon>0$ there exists a $K(\epsilon)$ such that any graph $G$ with vertex set $V$ can be broken into $k\leq K$  chunks $V_1,\hdots, V_k$ such that the following hold:

1. $\lvert V_i\rvert\leq \epsilon \lvert V\rvert$,
2. All the $V_i$ differ in size by at most one element from one another, and
3. All except $\epsilon k^2$ of the pairs $(V_i,V_j)$ are $\epsilon$-regular.

What does this really say? How this is usually used is to first consider the graph you get by letting each vertex chunk $V_i$ be a single vertex and joining two vertices if their corresponding chunks are an $\epsilon$-regular pair. You can then use regularity to deduce that if the collapsed graph contains, for example, a certain subgraph (such as a triangle) then the original graph must as well. Edges inside the vertex chunks, as well as between non-regular pairs form only a tiny proportion of the total number of edges so they don’t really matter.

A picture, however, is worth a thousand words, even it is made with my rudimentary paint skills.

I haven’t actually said why being a regular pair is a good thing to be, however, or what this really means. The idea is that a regular pair behaves like a bipartite graph where the edges are assigned randomly with some fixed probability: in other words, if you zoom in on any two bits of either chunk then they look like what you started with. Local behaviour resembles global behaviour with $\epsilon$-precision.

To see why this is a good thing to have, let’s look at the example hinted at earlier: finding triangles inside our graph. Suppose we have a triangle of regular pairs, a triangle in the collapsed graph mentioned above. Also suppose that our graph has quite a few edges to begin with, so that the edge density between these regular pairs is fairly high. It follows that there must be lots of B-C edges which share a vertex with some A-B edge (if our chunks are labelled A, B and C). Now consider the set of A vertices which are on some A-B edge, and the set of C vertices which are on one of the B-C edges connected with an A-B edge. Both of these sets are large, and by regularity, there must be lots of edges going between these two sets. In particular, there must be some edge which joins up a pair of edges to form a triangle.

That makes a lot less sense than I intended it to do. Perhaps it’s better digested in the form of a more formal proof, which will be coming soon, when I describe one of the most well-known uses of the SRL in the Triangle Lemma.

A final remark: it is crucial that the parameter $K$, which controls how many chunks are in this regular decomposition, depends only on $\epsilon$ and not on the size of the graph $G$. Otherwise, we could decompose the graph $G$ into chunks which each only contain a single vertex, and these pairs are all regular since there is nothing local to zoom in on. The more chunks we have, the worse our arguments become, so we don’t want too many.

Here’s the kicker: the proof of the SRL gives an upper bound for $K$ which is of tower height. In other words, a tower of exponents $2^{2^{2^{\cdots}}}$ which is of height roughly $1/\epsilon$. Even for fairly large $\epsilon$, this is massive, and so any results which use the SRL have terrible quantitative bounds, such as the ones Szemerèdi’s theorem had before Gowers found an alternative proof.

It is often possible to replace the use of the SRL with different arguments, and desirable for precisely this reason. One might hope that maybe we can improve the SRL directly, however, and get good (say, exponential) bounds instead. Gowers has shown that this is impossible, and constructed a graph that needs such a ridiculous number of chunks in any regular decomposition. Again, I hope to give this construction in a future blog post, since it really is amazing.

Enough for now. Hopefully these vague comments will make more sense when we actually start talking about such things formally.

Posted in Uncategorized | 2 Comments

## Rudin’s inequality

After that whirlwind tour through Khintchine’s inequality, let’s take a look at the less well-known Rudin’s inequality. This says (at least in one form) that, given a finite abelian group $G$ and a function $f:G\to\mathbb{C}$ if the support of the Fourier transform of $f$ is entirely contained inside a dissociated set (we’ll come to what that means later) then for any $2\leq p<\infty$ we have

$\| f \|_p\ll \sqrt{p}\| f\|_2.$

Oh ho ho, this looks suspiciously similar to our form of Khintchine’s inequality from the last post:

$\| \sum\epsilon_na_n\|_p\ll \sqrt{p}\| \sum\epsilon_na_n\|_2.$

What are the differences? Well, in Rudin’s inequality we deal with functions with Fourier support contained inside a dissociated set, and we’re taking $L^p$ norms over the group $G$. In Khintchine’s inequality, however, the $L^p$ norms are now being taken over the probability space, and our functions take in something from our probability space and spit out a distribution of signs.

Using the Fourier inversion formula, we can write out Rudin’s inequality in a way which highlights the similarity even more. Let $S$ denote the support of the Fourier transform of $f$. The left hand side of Rudin’s inequality becomes

$\left(\mathbb{E}_{x\in G}\left\lvert \sum_{\gamma\in S}\widehat{f}(\gamma)\gamma(x)\right\rvert^p\right)^{1/p},$

while the left hand side of Khintchine’s inequality is, if we denote our coefficients $a_n$ as $\widehat{f}(\gamma)$ instead, indexed over  $S$ (all we’re doing here is changing our notation),

$\left(\mathbb{E}\left\lvert \sum_{\gamma\in S}\widehat{f}(\gamma)\epsilon_\gamma\right\rvert^p\right)^{1/p}.$

The difference is now staring us in the face. All we’ve done in moving from Khintchine’s inequality to Rudin’s inequality is swapped our random variable $(\epsilon_\gamma)_{\gamma\in S}\in \{-1,1\}^{\lvert S\rvert}$ to a function $(\gamma(x))_{\gamma \in S}\in \mathbb{C}^{\lvert S\rvert}$.

In other words, the characters from a dissociated set behave like independent random variables taking values in $\{-1,1\}$. We will now try to make this heuristic precise, and deduce Rudin’s inequality as a corollary of Khintchine’s inequality. The following follows the proof of Rudin’s inequality given in [Gr1].

We first randomise the sum $f(x)= \sum_{\gamma\in S}\widehat{f}(\gamma)\gamma(x)$ by introducing a random assignment of signs to get $f_\epsilon(x)= \sum_{\gamma\in S}\epsilon(\gamma)\widehat{f}(\gamma)\gamma(x)$, where $\epsilon(\gamma)$ are independent random variables taking the values $\pm1$. Khintchine’s inequality then gives, for any fixed $x$,

$\mathbb{E}\lvert f_\epsilon(x)\rvert^p\leq (Cp)^{p/2}\| f\|_2^p.$

Taking the expectation over all $x\in G$ we get (taking $L^p$ norms over $G$ now)

$\mathbb{E}\| f_\epsilon\|_p^p\leq (Cp)^{p/2}\| f\|_2^p.$

By the pigeonhole principle there is some $\epsilon$ such that

$\| f_\epsilon \|_p\leq C\sqrt{p}\| f\|_2.$

The left hand side is nearly what we want; if we could replace the $f_\epsilon$ by $f$ we would have Rudin’s inequality. To do this, I’ll borrow a trick I found in [Gr1], but first I should say what dissociated means: the set $S$ is dissociated if and only if there are no non-trivial equations of the form $\gamma_1+\cdots+\gamma_k-\gamma_{k+1}-\cdots-\gamma_{l}=0$. This is useful because it allows us to write

$f(x)=2f_\epsilon(x)\ast\left(\prod_{\gamma\in S}\left(1+\frac{\epsilon_\gamma}{2}(\gamma(x)+\gamma(-x))\right)\right)$

for any distribution of signs $\epsilon$, as can be checked by expanding out the product and convolution. Furthermore, by the same argument we check that the product on the right has $L^1$ norm equal to 1. It follows that we can bound $\| f\|_p\leq 2\| f_\epsilon \|_p$ for any $p\geq2$ and we have proven Rudin’s inequality. Notice that here was where we really needed $p\geq 2$, to pass from a randomised version of $f$ to $f$ itself.

The ‘randomisation’ technique we used here was to use probabilistic arguments and the pigeonhole principle to show that the desired result was true for some random twist of the original function $f$, and then to argue that the original function is suitably controlled by every random twist of $f$. This latter condition is where we needed to invoke dissociativity.

To recap: Rudin’s inequality can be proved using randomisation plus Khintchine’s inequality, and dissociativity is invoked the control the randomisation part. We will soon see how an analogous sort of argument in physical space gives the powerful new method of Croot and Sisask.

## Khintchine’s inequality

It was recently suggested to me that the proof of Khintchine’s inequality is one which I should think about deeply, and I would especially like to explore the relationship between Khintchine’s and Rudin’s inequality. This will be the first of a series of short posts about these inequalities.

We begin with Khintchine’s inequality. We have a set of real numbers $a_n$ and a corresponding set of independent random variables $\epsilon_n$ taking the values $\pm1$ with equal probability. Khintchine’s inequality shows that we can control the sum $\sum \epsilon_na_n$ in any $L^p$ norm by the $L^2$ norm of the coefficients $a_n$. The slogan here is “changing the signs of a sum is well-behaved on average”.

More precisely, for any $0, we have the following bound

$\mathbb{E}\lvert\sum\epsilon_na_n\rvert^p\leq C_p\left(\sum a_n^2\right)^{p/2}.$

The constant $C_p$ depends only on $p$, and we shall obtain an explicit value below. The inequality is perhaps more suggestive if we observe that $\sum a_n^2=\| \sum\epsilon_na_n\|_2^2$, where the $L^2$ norm is taken over the probability space, so that we can write the inequality as

$\| \sum\epsilon_na_n\|_p\ll_p \|\sum\epsilon_na_n\|_2.$

Hence for random variables of this shape we have very good control over all $L^p$ norms.

We now give the proof, which is a surprisingly straightforward combination of exponential means and Markov’s inequality, polished off with a simple integral. First note that the independence of the $\epsilon_n$ combined with the elementary inequality $e^x+e^{-x}\leq 2e^{x^2/2}$ gives

$\mathbb{E}\left( e^{t\sum\epsilon_na_n}\right)\leq e^{\frac{t^2}{2}\sum a_n^2}.$

For any $\lambda>0$ applying Markov’s inequality and setting $t=\lambda/\sum a_n^2$ gives the inequality

$\mathbb{P}\left(\lvert \sum\epsilon_na_n\rvert\geq \lambda\right)\leq 2e^{-\lambda^2/2\sum a_n^2}.$

Recalling the distributional definition of $L^p$ norms, that is,

$\mathbb{E}\lvert f\rvert^p=p\int \lambda^{p-1}\mathbb{P}(\lvert f\rvert\geq \lambda)\,\textrm{d}\lambda,$

and making the substitution $\lambda=\left(2\sum a_n^2x\right)^{1/2}$ to do the integration, we compute that

$\mathbb{E}\left(\lvert \sum \epsilon_na_n\rvert^p\right)\leq 2^{p/2}p\Gamma(\frac{p}{2})\left(\sum a_n^2\right)^{p/2}$

and the proof is complete. Let me just record for future use the following explicit form which follows from the proof above together with Stirling’s formula:

$\| \sum\epsilon_na_n\|_p\leq C\sqrt{p}\|\sum\epsilon_na_n\|_2$

where $C$ is some absolute constant.

The proof above is taken from Thomas Wolff’s excellent lecture notes on harmonic analysis. Of course, the proof also holds for complex $a_n$ with appropriate modulus signs scattered about. Using duality and the fact that equality holds for $p=2$ we also get a similar lower bound.

Posted in Uncategorized | Tagged | 1 Comment

## What does additive structure mean?

Suppose we have a finite subset $A$ of an abelian group, and I told you it was additively structured. There are lots of different things this vague term could mean:

1. The set $A+A=\{ a+a' : a,a'\in A\}$ is not much bigger than $A$.
2. There are lots of quadruples $(a,b,c,d)\in A$ such that $a+b=c+d$ (the number of such quadruplets is called the additive energy, and I denote it by $E(A)$).
3. The Fourier transform of the characteristic function of the set, $\widehat{A}$, is lumpy and takes on large values quite often.
4. $A$ behaves like an additive substructure, such as a subgroup or an arithmetic progression.
5. $A$ contains lots of little structured bits, like arithmetic progressions or subgroups,

and more exotic things. One of the main themes of additive combinatorics, which could perhaps be taken as a definition of the area, is that these different definitions are all roughly equivalent to each other, and the most important theorems of the area can be roughly divided into two camps: those which provide such equivalences, and those which show that a given set is structured in one of the senses above.

The list above is still very vague. To make things more precise, we note that all of the different definitions are true in some trivial way. For example, we can trivially bound $\lvert A+A\rvert\leq \lvert A\rvert^2$, or $E(A)\leq \lvert A\rvert^3$, and so on. In general, we can measure how much better we can do than these trivial bounds, which gives us some parameter which measures how structured our set is. By showing rough equivalence, I mean that if we know that one type of structure is true for some parameter $K$ then all the others are with some parameters depending only on $K$.

Let’s explain one of the easiest examples to give you a flavour: showing that 1. implies 2. Suppose we know that $\lvert A+A\rvert= K\lvert A\rvert$ (we say that $K$ is the doubling constant for $A$), then, and our task is to give a lower bound for $E(A)$.

For any $x\in A+A$ let $r(x)$ denote the number of pairs $(a,a')\in A\times A$ such that $x=a+a'$. It is easy to see that

$\sum_{x\in A+A}r(x)=\lvert A\rvert^2,$

and so by Cauchy-Schwarz we get that

$\left(\sum_{x\in A+A}r(x)^2\right)\lvert A+A\rvert\geq \lvert A\rvert^4.$

But by expanding out the sum on the left hand side, we see that it is exactly equal to $E(A)$. Hence we get that

$E(A)\geq K^{-1}\lvert A\rvert^3$

and we are done. Notice that if $K=1$, which means that $A+A=A$ then we get the best possible $E(A)\geq \lvert A\rvert^3$ and if $K=\lvert A\rvert$ then we get the trivial $E(A)\geq \lvert A\rvert^2$. We leave it as an exercise for the reader to interpret these facts.

Here we have an easy and powerful equivalence: if 1. is true with parameter $K$ then 2. is true with parameter $K^{-1}$. In general, we can’t hope for equivalences this strong, and for some any kind of polynomial dependence on $K$ is impossible. We will return to showing other kinds of equivalences in future posts.