MonoSoup Mathematics

Date: 2026-02-05

This article explains how MonoSoup.py constructs an edited model between a pretrained checkpoint and a fine-tuned checkpoint.

1) Problem Setup

For each trainable layer, define:

\[ \Delta W = W_1 - W_0, \]

where:

  • \(W_0\): pretrained weights,
  • \(W_1\): fine-tuned weights.

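MonoSoup.py first checks whether a parameter should be processed (should_process_param) using a relative update test, where \(\tau\) is the minimum relative update and \(\epsilon\) is a small constant that guards against division by zero: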

\[ \frac{\|\Delta W\|_F}{\|W_0\|_F + \epsilon} \ge \tau. \]

Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L350
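
The relative update test can be sketched in a few lines. This is a NumPy illustration of the inequality above, not the code of should_process_param; the function name passes_relative_update_test and the default values of tau and eps are assumptions made for the example.

```python
import numpy as np

def passes_relative_update_test(w0, w1, tau=1e-3, eps=1e-12):
    """Return True when ||W1 - W0||_F / (||W0||_F + eps) >= tau.

    Hypothetical helper; tau and eps defaults are illustrative only.
    """
    delta = w1 - w0
    # The 2-norm of the flattened tensor equals the Frobenius norm,
    # and works for tensors of any rank (linear or convolution weights).
    rel = np.linalg.norm(delta.ravel()) / (np.linalg.norm(w0.ravel()) + eps)
    return bool(rel >= tau)

w0 = np.ones((4, 3))
w1_small = w0 + 1e-6   # relative update around 1e-6: below tau
w1_large = w0 + 0.1    # relative update around 0.1: above tau

print(passes_relative_update_test(w0, w1_small))
print(passes_relative_update_test(w0, w1_large))
```

Layers that fail the test never reach the SVD step, which is where most of the cost lies.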

2) Matrix View for SVD

SVD is applied to a 2D view of each tensor:

  • linear: shape [out, in],
  • convolution: reshape to [out, in * k_h * k_w].
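
As a rough illustration of the two cases above, the 2D view can be obtained with a single reshape. The sketch uses NumPy arrays in place of framework tensors, and the helper name as_matrix is hypothetical; how the script treats tensors of other ranks is not covered here.

```python
import numpy as np

def as_matrix(weight):
    """Return a 2D view [out, in * k_h * k_w] of a layer weight.

    Hypothetical helper: only the linear (2D) and convolution (4D)
    cases described in the text are handled.
    """
    if weight.ndim == 2:
        return weight                      # linear: already [out, in]
    if weight.ndim == 4:
        out_channels = weight.shape[0]
        return weight.reshape(out_channels, -1)  # conv: flatten in, k_h, k_w
    raise ValueError(f"unsupported tensor rank: {weight.ndim}")

conv = np.zeros((8, 3, 5, 5))   # [out, in, k_h, k_w]
linear = np.zeros((10, 7))      # [out, in]
print(as_matrix(conv).shape)
print(as_matrix(linear).shape)
```

After the update is edited in matrix form, reshaping back to the original shape restores the layer layout.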

3) Spectral Decomposition

_monosoup_update_for_layer computes:

\[ \Delta W = U \Sigma V^\top. \]

Given \(k\), the update is split into:

\[ \Delta W_{\text{high}} = U_{1:k}\Sigma_{1:k}V_{1:k}^\top, \qquad \Delta W_{\text{low}} = \Delta W - \Delta W_{\text{high}}. \]

Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L470
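
The split can be reproduced with a compact SVD. The following NumPy sketch is an assumption-labeled illustration of the two formulas above, not the body of _monosoup_update_for_layer; split_update is a hypothetical name.

```python
import numpy as np

def split_update(delta, k):
    """Split delta into its rank-k leading part and the residual.

    Hypothetical helper illustrating the formulas in the text.
    """
    # Compact SVD: singular values come back sorted in descending order.
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Scale the first k left singular vectors by their singular values.
    high = (u[:, :k] * s[:k]) @ vt[:k, :]
    low = delta - high
    return high, low, s

rng = np.random.default_rng(0)
delta = rng.standard_normal((6, 4))
high, low, s = split_update(delta, k=2)

print(np.linalg.matrix_rank(high))      # rank of the leading part
print(np.allclose(high + low, delta))   # the two parts sum to delta
```

Because the low part is defined as a residual, the two parts always sum to the original update, and the squared Frobenius norm of the low part equals the energy of the discarded singular values.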

4) Choosing k: Two Modes

Variance Mode

Use the smallest \(k\) such that the cumulative squared singular-value energy reaches the threshold \(R\), where \(r\) is the number of singular values:

\[ \frac{\sum_{i=1}^k \sigma_i^2}{\sum_{i=1}^r \sigma_i^2} \ge R. \]

Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L413
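
A small worked example may help. The sketch below is plain Python; choose_k_variance is a hypothetical name, and the singular values are assumed to be sorted in descending order and not all zero.

```python
def choose_k_variance(singular_values, R):
    """Smallest k whose cumulative squared energy fraction reaches R.

    Hypothetical helper; assumes descending, not-all-zero singular values.
    """
    energy = [s * s for s in singular_values]
    total = sum(energy)
    running = 0.0
    for k, e in enumerate(energy, start=1):
        running += e
        if running / total >= R:
            return k
    return len(energy)  # fallback in case rounding keeps the ratio below R

# Energies are 9, 4, 1 (total 14): cumulative fractions 9/14, 13/14, 14/14.
sigma = [3.0, 2.0, 1.0]
print(choose_k_variance(sigma, R=0.6))  # 9/14 is about 0.64, so k = 1
print(choose_k_variance(sigma, R=0.8))  # 13/14 is about 0.93, so k = 2
```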

Freevariance Mode (Roy-Vetterli style effective rank)

Build normalized singular-value magnitudes:

\[ p_i = \frac{|\sigma_i|}{\sum_j |\sigma_j|}. \]

Compute entropy and effective rank:

\[ H = -\sum_i p_i \log(p_i + \epsilon), \qquad r_{\text{eff}} = e^H. \]

Set:

\[ k = \lceil r_{\text{eff}} \rceil. \]

Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L397
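
The entropy-based rule can be sketched with the standard library alone. The name effective_rank_k and the default eps are assumptions for the example; the formulas are the three given above.

```python
import math

def effective_rank_k(singular_values, eps=1e-12):
    """Return (r_eff, k) with r_eff = exp(H) and k = ceil(r_eff).

    Hypothetical helper; eps default is illustrative only.
    """
    mags = [abs(s) for s in singular_values]
    total = sum(mags)
    p = [m / total for m in mags]                      # normalized magnitudes
    entropy = -sum(pi * math.log(pi + eps) for pi in p)
    r_eff = math.exp(entropy)
    return r_eff, math.ceil(r_eff)

# A flat spectrum uses every direction; a single spike uses one.
_, k_flat = effective_rank_k([1.0, 1.0, 1.0, 1.0])
_, k_spike = effective_rank_k([1.0, 0.0, 0.0, 0.0])
print(k_flat, k_spike)
```

Unlike variance mode, this rule needs no threshold: the shape of the spectrum alone determines \(k\).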

5) MonoSoup Mixing Coefficients

After selecting \(k\), define:

\[ P_k = \frac{\sum_{i=1}^k \sigma_i^2}{\sum_{i=1}^r \sigma_i^2}, \qquad \rho = \left(\frac{\sigma_{k+1}}{\sigma_1 + \epsilon}\right)^2. \]

Then:

\[ \cos(\alpha) = \sqrt{1 - P_k}, \]
\[ \lambda_{\text{low}} = \rho + (1-\rho)\cos(\alpha), \qquad \lambda_{\text{high}} = 1 - \lambda_{\text{low}}. \]

The edited update:

\[ \Delta W_{\text{mono}} = \lambda_{\text{high}}\Delta W_{\text{high}} + \lambda_{\text{low}}\Delta W_{\text{low}}. \]

Final layer:

\[ W_{\text{mono}} = W_0 + \Delta W_{\text{mono}}. \]

Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L476
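
Putting the coefficient formulas together for a single layer gives roughly the following. This is a NumPy sketch under stated assumptions, not the repository's implementation: the names monosoup_coefficients and monosoup_layer are hypothetical, and treating \(\sigma_{k+1}\) as 0 when \(k = r\) is an assumption of this example, since the text does not say how that edge case is handled.

```python
import numpy as np

def monosoup_coefficients(s, k, eps=1e-12):
    """Return (lambda_high, lambda_low) from singular values s and rank k."""
    energy = s ** 2
    p_k = energy[:k].sum() / energy.sum()
    # Assumption: sigma_{k+1} is taken as 0 when k equals the number of
    # singular values; the source text does not specify this case.
    sigma_next = s[k] if k < len(s) else 0.0
    rho = (sigma_next / (s[0] + eps)) ** 2
    cos_alpha = np.sqrt(max(0.0, 1.0 - p_k))  # clamp guards against rounding
    lam_low = rho + (1.0 - rho) * cos_alpha
    return 1.0 - lam_low, lam_low

def monosoup_layer(w0, w1, k):
    """Return W0 + lambda_high * dW_high + lambda_low * dW_low (2D inputs)."""
    delta = w1 - w0
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    high = (u[:, :k] * s[:k]) @ vt[:k, :]
    low = delta - high
    lam_high, lam_low = monosoup_coefficients(s, k)
    return w0 + lam_high * high + lam_low * low

lam_high, lam_low = monosoup_coefficients(np.array([3.0, 2.0, 1.0]), k=1)
print(lam_high, lam_low)
```

Under the edge-case assumption above, choosing \(k = r\) gives \(P_k = 1\) and \(\rho = 0\), so \(\lambda_{\text{low}} = 0\) and the edited layer coincides with the fine-tuned weights.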

6) Full-Model Pass

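apply_monosoup iterates over matched parameters and applies the layer update where valid. A parameter is skipped when:

  • its key is not found in both checkpoints,
  • the pretrained and fine-tuned shapes differ,
  • the relative update is too small to pass the test from section 1.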

A summary of processed and skipped layers is logged.

Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L562
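
The control flow of such a pass can be sketched over two dictionaries of arrays. This is an illustration only: edit_matching_params, the stats keys, and the placeholder layer function are all hypothetical, and the sketch returns only the edited tensors because the text does not say what apply_monosoup stores for skipped layers.

```python
import numpy as np

def edit_matching_params(pretrained, finetuned, layer_fn, tau=1e-3, eps=1e-12):
    """Apply layer_fn(w0, w1) to every parameter that passes the checks.

    Hypothetical helper; returns (edited tensors, skip/processed counts).
    """
    edited = {}
    stats = {"processed": 0, "missing": 0, "shape_mismatch": 0, "small_update": 0}
    for name, w0 in pretrained.items():
        w1 = finetuned.get(name)
        if w1 is None:                       # key not found
            stats["missing"] += 1
            continue
        if w0.shape != w1.shape:             # shape mismatch
            stats["shape_mismatch"] += 1
            continue
        rel = np.linalg.norm((w1 - w0).ravel()) / (np.linalg.norm(w0.ravel()) + eps)
        if rel < tau:                        # very small update
            stats["small_update"] += 1
            continue
        edited[name] = layer_fn(w0, w1)
        stats["processed"] += 1
    return edited, stats

pretrained = {"a": np.ones((2, 2)), "b": np.ones((2, 2)),
              "c": np.ones((2, 3)), "d": np.ones((2, 2))}
finetuned = {"a": np.ones((2, 2)) + 0.5, "b": np.ones((2, 2)),
             "c": np.ones((3, 2))}

# Placeholder layer function (simple halfway interpolation), standing in
# for the per-layer MonoSoup update.
edited, stats = edit_matching_params(
    pretrained, finetuned, lambda w0, w1: w0 + 0.5 * (w1 - w0))
print(stats)
```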

7) Practical Complexity

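The dominant cost per processed layer is the compact SVD of the flattened update matrix. For an \(m \times n\) matrix this takes \(O(mn\min(m,n))\) time, and the factors \(U\) and \(V\) add \(O((m+n)\min(m,n))\) memory, so large layers are expensive in both. In practice: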

  • min_rel_update reduces unnecessary decomposition work,
  • verbose_layers helps inspect spectral behavior,
  • float16/bfloat16 tensors are cast to float32 before SVD for stability.

Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L449
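
The precision handling in the last bullet can be illustrated as follows. The sketch uses NumPy, which has no bfloat16 type, so only the float16 case is shown; svd_float32 is a hypothetical name and this is not the script's own casting code.

```python
import numpy as np

def svd_float32(delta):
    """Compact SVD computed in float32 when the input is half precision.

    Hypothetical helper; other dtypes are passed through unchanged.
    """
    work = delta.astype(np.float32) if delta.dtype == np.float16 else delta
    return np.linalg.svd(work, full_matrices=False)

rng = np.random.default_rng(0)
delta_half = rng.standard_normal((6, 4)).astype(np.float16)
u, s, vt = svd_float32(delta_half)

print(s.dtype)  # singular values are computed in float32
# Reconstruction matches the upcast input to float32 accuracy.
print(np.allclose((u * s) @ vt, delta_half.astype(np.float32), atol=1e-4))
```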

8) Minimal Reproducible Command

python MonoSoup.py \
  --pretrained-checkpoint /path/to/model_0.pt \
  --finetuned-checkpoint /path/to/model_31.pt \
  --data-location /path/to/data_root \
  --model-type 32 \
  --version freevariance \
  --R 0.8 \
  --output-json results/monosoup_clip.json