PhD course, Spring 2014
Random Graphs and Complex Networks
Federico Bassetti and Francesco Caravenna
Presentation
This course is an introduction to the mathematical modeling of random graphs.
Only a basic probabilistic background is required (finite and countable probability, discrete random variables), making the course accessible to every mathematical student. One of our aims is to illustrate some interesting ideas and techniques of modern probability theory in an accessible and relatively non-technical framework.
Here are some slides presenting the course.
Summary
In recent years there has been an increasing interest for real-world networks, such as the internet, collaboration and citation networks, social relations, biological and ecological networks. A remarkable fact is that such different examples show surprisingly similar large-scale features. For instance, these networks are "small worlds", meaning that typical distances among nodes grow slowly with the size of the network, and many of them are "scale free", meaning that the empirical distribution of the degrees (number of nodes connected to a given node) is almost independent of the size of the network, and shows a power-law decay. In order to explain such features, many efforts have been devoted to the study of mathematical models that go under the name of "random graphs".
One of the simplest way to produce a random graph, having {1, 2, ..., n} as the sets of nodes, is to toss a coin for each couple (i,j) of nodes, putting a connection between them if a head comes up (the coins are assumed to be independent, and to give head with the same probability p). This model was investigated already in the late 1950s by Erdos and Renyi, who showed that an interesting "phase transition" occurs: depending on how the probability p scales with n, the graph either presents a "giant connected component", of size proportional to n, or it consists of many "small disconnected ilands", of size at most log(n). Despite its apparent simplicity, the fine properties of this model are far from trivial and are amenable to a thorough mathematical analysis. This will be the main object of the first part of the course.
The Erdos-Renyi random graph is not a good model for many real-world networks, because it is not scale free. For this reason, in the last decades several alternative models have been proposed and investigated. We will focus on two of the most popular ones: the "configuration model", a random graph model in which the degrees are fixed beforehand and the nodes are connected in the most random way, and the "preferential attachment model", where the graph is built dynamically, adding nodes to the network in a sequential fashion, favoring connections among nodes with large degrees. Both classes of models lead to scale free and small world networks and one can prove precise asymptotic results for the typical distances and degree distributions. This will be the object of the second part of the course.
References
- The course will be mainly based on the lecture notes Random Graphs and Complex Networks by Remco van der Hofstad, available on the author webpage.
- Another relevant reference is the book Random Graph Dynamics by Rick Durrett, Cambridge Unversity Press (2006) (see here for more information).
Tentative Program
- Probabilistic Tools and Techniques: random variables, coupling, inequalities.
- Branching Processes as Random Trees: asymptotic properties.
- The Erdos-Renyi Random Graph: phase transition and giant component.
- Configuration Model: asymptotic distances and small world phenomenon.
- Preferential Attachment: emergence of a scale free network.
Lectures Log
References:
- [vdH] R. van der Hostad Random graphs and complex networks, lecture notes available on the author webpage
- [LP] R. Lyons, Y. Peres. Probability on Trees and Networks, available here
- [CDP] Caravenna, Dai Pra. Probabilità: Un'introduzione attraverso modelli e applicazioni. Springer.
Lecture 10 (27th May 2014)
- Preferential attachment model.
- Reminders of definition and Markov property of degree (Di(t))t ∈ N.
- Remark: if E[Yt | Ft-1] = (ηt-1) Yt-1 and Y0 = y0, for some constants ηt > 1 and y0 > 0, the process Mt := (∏ s ≤ t-1 1/ηs) Yt / y0 is a martingale. Corollary: Di(t) / t1/(2+δ) is bounded in L2 [exercise: considering Yt := (Di(t) + δ) (Di(t) + δ + 1)].
- Azuma-Hoeffding inequality for martingales with bounded increments: discussion and proof (Thm 2.24 [vdH]).
- Asymptotic properties of degrees in the preferential attachment model.
- Concentration of Nk(t) (number of vertexes of degree k) around its expected value, using the Azuma-Hoeffding inequality (Prop 8.3 [vdH]).
- Detailed proof that (Mn := E[Nk(t) | Fn])n ∈ N is a martingale with bounded increments (|Mn - Mn-1| ≤ 2) using the Markov property of (Di(t))t ∈ N.
- Asymptotic behavior of E[Nk(t)] ∼ t pk with pk ∼ cδ / k3+δ using recursive equations (Prop. 8.4 [vdH], sketch of the proof).
Lecture 9 (20th May 2014)
- Preferential attachment model.
- Motivation and introduction to the model.
- PA(t) = PA(1,δ)(t) defined as a function of the sequence of random variables Zi ∈ {1, ..., i}, for i=1, ..., t, where Zi is the vertex to which the vertex "i" is attached. Distribution of the random variables Zi's.
- σ-algebra Ft and its interpretation. The process (Di(t))t ∈ N, giving the degree of a vertex "i" as a function of t, is a Markov chain.
- Extension of the model: PA(m,δ)(t) for m ≥ 2 in terms of PA(1,δ/m)(mt).
- Asymptotic properties of degrees in the preferential attachment model.
- Empirical degree distribution in PA(t) as a random probability on N, which converges to a heavy-tailed distribution (Thm 8.2 [vdH], proof next time).
- Reminders on martingales, conditions for a.s. and Lp convergence.
- Reminders on Euler's Gamma function Γ(t).
- Degrees of fixed vertices: expected value and asymptotic behavior of Di(t) as t → ∞ (Thm. 8.1 [vdH]).
- Heuristic derivation of heavy tails for the empirical degree distribution from the knowledge E[Di(t)] ∼ c (t/i)1/(τ-1) (with τ = 3+δ).
Lecture 8 (13th May 2014)
- Size biasing: discussion and examples.
- Phase transition in the configuration model, revisited.
- Emergence of a giant component according to whether ν > 1 or ν ≤ 1, where ν := E[D(D-1)]/E[D] (Thm 10.1 [vdH], no proof).
- Heuristics: two-stage branching process with size-biased distribution.
- Polynomial τ-condition (with τ ∈ (2,3)) for a sequence d(n) = (d1(n), ..., dn(n)).
- Small world phenomenon in the configuration model:
- log log n asymptotic behavior for the diameter of the configuration model with heavy tails (polynomial τ-condition): (Theorem 10.33 [VdH]; partial proof of the upper bound, see below).
- Proposition 1: log log n upper bound on the diameter of the Core (Theorem 10.28 [VdH], no proof).
- Proposition 2: log log n upper bound on the distance from the Core (proof as a consequence of Propositions 3 and 4 below).
- Proposition 3: estimate on the connection probability of two subsets of vertices (enhanced version of Lemma 10.30 [VdH], with proof).
- Proposition 4: estimate on the "effective degree" (number of half-edges) of the neighborhood N≤k(v) of a vertex v (sketch of the proof).
Lecture 7 (6th May 2014)
- Proof of the convergence of the empirical degree distribution pkER,(n) of the erased configuration model CMnER(d) (Thm 7.6 [vdH]).
- Pointwise convergence pk(n) → pk for every fixed k is equivalent to ℓ1 convergence ∑k |pk(n) - pk| → 0.
- If αn → ∞, the fraction of vertexes v ∈ [n] with dv > αn vanishes.
- Estimate of ∑k |pkER,(n) - pk(n)| in terms of the fraction of vertexes v ∈ [n] having at least a self-loop or a multiple-edge.
- Expected values of the numbers of self-loops (Sv) and multiple edges (Mv) emanating from a vertex v ∈ [n].
- Determination of the truncation αn, assuming max{dv: v ∈ [n]} = o(n).
- Conclusion: proof that max{dv: v ∈ [n]} = o(n) if regularity conditions (a) and (b) are satisfied.
- Configuration model with i.i.d. degrees.
- An i.i.d. sequence D = (D1, D2, ...) satisfies a.s. the regularity condition (a), and also (b) and (c) if E[D1] < ∞ and E[D12] < ∞ respectively.
- Configuration model with i.i.d. degrees CMn(D) as a function Ψ(D, W), where W are i.i.d. U(0,1) which generate a uniform configuration.
- Phase transition in the configuration model:
- Emergence of a giant component according to whether ν > 1 or ν ≤ 1, where ν := E[D(D-1)]/E[D] (Thm 10.1 [vdH], no proof).
- Heuristics: two-stage branching process with size-biased distribution.
Lecture 6 (29th April 2014)
- Beyond Erdos-Renyi:
- Reminders: Poisson limit of the empirical degree distribution for Erdos-Renyi (Thm 5.9 [vdH], no proof).
- How to build a model with (asymptotic) polynomial degree distribution? Introduction to Configuration Model and Preferential Attachment.
- Introduction to the Configuration Model (§7.1 [vdH]):
- Not every d = (d1, ..., dn) is the degree sequence of a graph. Graphs, multigraphs and their matrix notation: spaces Gn and MGn.
- Intuitive definition of the Configuration Model CMn(d) via "uniform matching" of the half-edges attached to each vertex in [n].
- Notation ℓn := d1 + ... + dn. Space Conf(ℓn) of partitions of [ℓn] into pairs ("configurations"). Map φd: Conf(ℓn) → MGn between configurations and multigraphs.
- How to generate a r.v. U uniformly distributed on Conf(d) via successive uniform pairings. CMn(d) has the same law as φ(U).
- Basic properties of the Configuration Model:
- Distribution of CMn(d): probability of a given multigraph (Prop 7.4 [vdH]).
- Corollary: conditionally on the event {CMn(d) ∈ Gn}, the r.v. CMn(d) is uniformly distributed on Gn. How to sample a uniform simple graph with given degrees d by repeatedly sampling CMn(d).
- Regularity conditions (a), (b), (c) for a sequence d(n) = (d1(n), ..., dn(n)) with respect to a given probability (pk)k∈N0. Probabilistic reformulation.
- Probability of CMn(d) being a simple graph (Thm 7.8 [vdH], no proof).
- Erased configuration model CMnER(d). Convergence of the corresponding erased empirical degree distribution (Thm 7.6 [vdH], proof next time).
Lecture 5 (14th April 2014)
- Proof of the law of large numbers for giant component in the supercritical regime.
- Concentration of the number of vertices in large clusters (Prop 4.10, Coroll 4.11 [vdH]);
- The law of the random walk in the exploration process (Prop 4.6 [vdH]);
- No intermediate clusters (Prop 4.12, Coroll 4.13 [vdH]).
- More on supercritical regime.
- Duality principle for ER random graphs (Section 4.4.2 [vdH]);
- Connectivity threshold (Thm 5.5 [vdH], without proof).
- Connected components in the critical window of ER random graphs.
- Statement of the Aldous theorem (Theorem 5.4 [vdH]).
- Degree sequence.
- Poisson (uniform) limit of the empirical degree distribution (Thm 5.9 [vdH], without proof).
Lecture 4 (8th April 2014)
- Comparing the connected components of an Erdos-Renyi graph with binomial branching processes.
- Stochastic domination of the cluster size (thm 4.2 [vdH]);
- Lower bound on cluster tail (thm 4.3 [vdH]).
- Largest component in the subcritical regime.
- Upper and lower bounds on largest subcritical component and the law of large numbers (thm 4.4/4.5 [vdH]);
- Complete proofs of the two bounds (following Sections 4.3.1, 4.3.2, 4.3.3 in [vdH] with minor modifications).
- Supercritical regime.
- Statements of the law of large numbers and the central limit theorem for giant component (thm 4.8, thm 4.15 [vdH]);
- Statement of the law of large numbers for the second largest component (Sec. 4.4.2 [vdH]).
Lecture 3 (1st April 2014)
- Random-walk perspective to branching processes.
- Definition of conjugate pair of off-spring distributions and basic properties (ex. 3.16, 3.17 [vdH]);
- Duality principle for branching processes (thm 3.7 [vdH]);
- Cramer's upper bound, Chernoff bound (thm 2.17 [vdH]);
- Extinction probability with large total progeny (thm 3.8 [vdH]).
- Hitting-time theorem and the total progeny.
- Hitting-time theorem (thm. 3.14 [vdH]);
- Law of total progeny (thm 3.13 [vdH]);
- Individuals with an infinite line of descent (thm. 3.11 [vdH]).
- Binomial and Poisson branching processes.
- Poisson duality principle (thm 3.15 [vdH]);
- Total progeny distribution for the Poisson branching process (thm. 3.16 [vdH]);
- Total variation distance, Strassen theorem, Poisson-Binomial coupling (thm. 2.9, thm 2.10 [vdH], see also Capitolo 4 [CDP]);
- Poisson and binomial branching processes (thm. 3.16 [vdH]).
- Erdos-Renyi random graph.
- Exploration process for the connected component of a node (random walk perspective);
- The size of the connected component as the hitting time of S_t and the law of the X_t's (Sec. 4.1 [vdH]).
Lecture 2 (25th March 2014)
- Brancing process.
- Extinction vs survival (definition and basic results). Thm 3.1 [vdH].
- The intrinsic Martingale and its limit.
- Thm 3.3, Thm 3.9, result (3.4.1), Exer.3.19 in [vdH];
- Kesten-Stigum Theorem and Seneta Theorem (statements without proofs), Thm 3.10 [vdH].
- Definition of inherited property and Proposition 5.6 and Corollary 5.6 in [LP];
- Conditioning on extinction, conditioning on survival.
- Bernoulli percolation.
- Definition of general percolation graph, definition of critical probability.
- Binomial Galton Watson tree as the percolation graph of an n-ary tree.
- Erdos-Renyi random graph as the percolation graph of a complete graph.
- Critical probability in a Galton Watson tree, Prop 5.9 in [LP].
- Total progeny.
- Definition of Total progeny, thm 3.2 in [vdH].
- Random-walk perspective to branching processes.
- Definition of History. Total progeny as hitting time of a random walk. Sec. 3.3 [vdH].
Lecture 1 (18 March 2014)
- Graphs and trees.
- Basic definitions: edges, nodes, self-loop, directed and undirected graphs, tree, rooted tree. Degree of a node.
- Ulam-Harris tree, rooted labelled trees and locally finite trees.
- Random graphs and random trees.
- Definition of (finte) random graph and (finte) random tree.
- Definition of random tree as a random variable on the space of locally finite trees.
- Definition of Erdos-Renyi graph (density and different constructions).
- Binomial distribution of the degree of a node in the Erods-Renyi graph and Poisson limit.
- Galton-Watson tree process.
- The off-spring distribution and the definition of Galton-Watson tree.
- Basic properties of independence of the Galton-Watson tree.
- The number of individuals in the n-th generation (the brancing process Z_n).
- Real words networks and random graphs.
- The proportion of vertices of degree k in some empirical networks and the study of the degree distribution of a random graph.
- Power-law decay of the degree distribution, hub and small-words in real words networks.
Logistic Details
Due to room shortage, lectures in Milano-Bicocca will take place in different rooms located in different buildings:
- Building U5 is in Via Roberto Cozzi 55
- Building U9 is in Viale dell'Innovazione 10
- Building U7 is in Via Bicocca degli Arcimboldi 8
- Bicocca campus maps: here or here
Schedule for the second part of the course, held in Milano Bicocca:
- Tuesday 29 April 2014, 11:00-12:30 and 14:00-15:30, Aula U5-3014
- Tuesday 6 May 2014, 11:00-12:30 and 14:00-15:30, Aula U9-10
- Tuesday 13 May 2014, 11:00-12:30 and 14:00-15:30, Aula U7-6
- Tuesday 20 May 2014, 11:00-12:30 and 14:00-15:30 14:15-17:30 (with a short break), Aula U7-6
- Tuesday 27 May 2014, 11:00-12:30 and 14:00-15:30, Aula U5-3014
Schedule for the first part of the course, held in Pavia (Dipartimento di Matematica, via Ferrata 1):
- Tuesday 18 March 2014, 11:15-13:15, Aula Beltrami (Introductory Lecture)
- Tuesday 25 March 2014, 11:15-12:45 and 14:00-15:30, Aula C13
- Tuesday 1 April 2014, 11:15-12:45 and 14:00-15:30, Aula Beltrami
- Tuesday 8 April 2014, 11:15-12:45 and 14:00-15:30, Aula Beltrami
- Tuesday 15 April 2014, 11:15-12:45 and 14:00-15:30, Aula Beltrami
Interested students are invited to contact us via email.
What else?
For more information, do not hesitate to contact us via email.