This post is an annotated writeup of a final project for Information and Physical Computing (CSDS/PHYS). The central question: in the high-dimensional limit (features $p$, samples $n$, and latent dimension $d$ all large with fixed ratios), what is the exact generalization error of a linear student model trained on data generated by a nonlinear teacher?

The answer—derived via the replica method—reveals the celebrated double-descent phenomenon as a consequence of the structure of the loss landscape near the interpolation threshold $p/n \approx 1$.

Setup: Hidden Manifold Model

Data is generated by a two-step process:

  1. Draw a low-dimensional latent vector $\mathbf{c}^\mu \sim \mathcal{N}(0, I_d)$.
  2. Project and apply a nonlinearity to get observations: $\mathbf{x}^\mu = \sigma!\left(\frac{1}{\sqrt{d}} F^\top \mathbf{c}^\mu\right)$, where $F \in \mathbb{R}^{d \times p}$ is a random matrix.
  3. Generate labels via a teacher direction $\theta^0$: $y^\mu = f^0!\left(\frac{1}{d}\,\mathbf{c}^\mu \cdot \theta^0\right)$.

This “hides” the linear structure inside a random nonlinear projection. The student only sees $(\mathbf{x}^\mu, y^\mu)$ and must recover the direction $\theta^0$ in latent space.

Key Results

The generalization error in the thermodynamic limit depends only on three scalar overlap parameters ($m_s, q_s, q_w$) which satisfy self-consistent saddle-point equations. For regression:

\[\epsilon_g = \rho + Q^\star - 2M^\star\]

where $Q^\star = \kappa_1^2 q_s^\star + \kappa_\star^2 q_w^\star$ and $M^\star = \kappa_1 m_s^\star$. For classification, it reduces to the angle between the student and teacher directions:

\[\epsilon_g = \frac{1}{\pi}\cos^{-1}\!\left(\frac{M^\star}{\sqrt{Q^\star}}\right)\]

The double-descent spike at $p/n \approx 1$ is visible in the fixed-point solutions when regularization $\lambda$ is small.

Double Descent

Double descent curve
Double descent: test error peaks near the interpolation threshold p/n ≈ 1, then decreases as the model becomes overparameterized.

Interactive Notebook

The full derivation (Gibbs measure → replica trick → Gaussian equivalence → replica-symmetric saddle point → fixed-point iteration) is in the interactive notebook below. You can explore how regularization $\lambda$, the nonlinearity $\sigma$, and the noise level $\Delta$ affect the generalization error surface.

If the interactive notebook does not load, open it directly: full notebook.

Reference

Gerace, F., Loureiro, B., Krzakala, F., Mézard, M., & Zdeborová, L. (2021). Generalisation error in learning with random features and the hidden manifold model. ICML 2021.