MonoSoup Mathematics
Date: 2026-02-05
This article explains how MonoSoup.py constructs an edited model between a pretrained checkpoint and a fine-tuned checkpoint.
Key implementation entry points:
- apply_monosoup: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L507
- _monosoup_update_for_layer: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L424
- _choose_k_and_pk: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L379
1) Problem Setup
For each trainable layer, define the weight update:
\[\Delta W = W_1 - W_0\]
where:
- \(W_0\): pretrained weights,
- \(W_1\): fine-tuned weights.
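MonoSoup.py first checks whether a parameter should be processed (should_process_param) using a relative update test: layers whose update is very small relative to the pretrained weights are skipped, with the cutoff set by min_rel_update.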
Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L350
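A minimal sketch of such a relative update test, assuming the Frobenius norm and a small epsilon in the denominator. The norm, the epsilon, the default threshold, and both function names are illustrative assumptions, not the code of should_process_param.

```python
import numpy as np


def relative_update(w0, w1, eps=1e-12):
    """Size of (w1 - w0) relative to w0, measured with the Frobenius norm (assumed)."""
    w0 = np.asarray(w0, dtype=np.float64)
    w1 = np.asarray(w1, dtype=np.float64)
    # With no ord/axis, np.linalg.norm returns the 2-norm of the flattened
    # array, which equals the Frobenius norm for tensors of any rank.
    return float(np.linalg.norm(w1 - w0) / (np.linalg.norm(w0) + eps))


def passes_relative_test(w0, w1, min_rel_update=1e-6):
    """Hypothetical stand-in for the check: True if the layer is worth decomposing."""
    return relative_update(w0, w1) >= min_rel_update
```

A layer whose fine-tuned weights equal the pretrained ones has a relative update of zero and would be skipped under this sketch.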
2) Matrix View for SVD
SVD is applied to a 2D view of each tensor:
- linear: shape [out, in],
- convolution: reshape to [out, in * k_h * k_w].
Code path:
- reshape in: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L331
- reshape out: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L346
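The two reshapes can be sketched as a round trip. This is an illustration written with NumPy rather than the script's own tensor library; the helper names to_matrix and from_matrix are hypothetical.

```python
import numpy as np


def to_matrix(tensor):
    """Return a 2D view of a weight tensor plus the original shape."""
    tensor = np.asarray(tensor)
    if tensor.ndim == 2:
        # Linear layer: already [out, in].
        return tensor, tensor.shape
    if tensor.ndim == 4:
        # Convolution: [out, in, k_h, k_w] -> [out, in * k_h * k_w].
        return tensor.reshape(tensor.shape[0], -1), tensor.shape
    raise ValueError(f"unsupported tensor rank: {tensor.ndim}")


def from_matrix(matrix, original_shape):
    """Undo to_matrix so the edited update fits the layer again."""
    return matrix.reshape(original_shape)
```

Because the reshape only regroups the trailing axes, mapping the edited matrix back to the original shape loses nothing.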
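3) Spectral Decomposition
_monosoup_update_for_layer computes the compact SVD of the matrix view of the update \(\Delta W = W_1 - W_0\), with singular values in descending order:
\[\Delta W = U \Sigma V^\top = \sum_{i=1}^{r} \sigma_i u_i v_i^\top, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge 0\]
Given \(k\), the update is split into a top-\(k\) part and a residual:
\[\Delta W_{\text{top}} = \sum_{i=1}^{k} \sigma_i u_i v_i^\top, \qquad \Delta W_{\text{res}} = \Delta W - \Delta W_{\text{top}} = \sum_{i=k+1}^{r} \sigma_i u_i v_i^\top\]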
Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L470
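As an illustration of the split, here is a short NumPy sketch. It is not the script's code: the function name split_update and its return values are assumptions, and the float32 cast mirrors the stability note in the complexity section.

```python
import numpy as np


def split_update(delta, k):
    """Split a 2D update into its top-k singular part and the remainder."""
    # Cast up front: NumPy's SVD does not accept float16 input.
    delta32 = np.asarray(delta, dtype=np.float32)
    # Compact SVD: u is [m, r], s is [r], vt is [r, n], with r = min(m, n).
    # Singular values come back sorted in descending order.
    u, s, vt = np.linalg.svd(delta32, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors.
    top = (u[:, :k] * s[:k]) @ vt[:k, :]
    rest = delta32 - top
    return top, rest, s
```

The two parts sum back to the update, and the Frobenius norm of the remainder equals the root of the summed squares of the discarded singular values, up to float32 rounding.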
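4) Choosing k: Two Modes
Variance Mode
Use the smallest \(k\) such that cumulative squared singular-value energy reaches threshold \(R\), where \(\sigma_i\) are the singular values of the update:
\[k = \min\left\{ j : \frac{\sum_{i=1}^{j} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2} \ge R \right\}\]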
Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L413
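The variance rule can be sketched in plain Python. The function name, the returned energy fraction, and the handling of an all-zero spectrum are assumptions made for this example; the input is expected in descending order, as an SVD returns it.

```python
def choose_k_variance(singular_values, R):
    """Smallest k whose cumulative squared-singular-value share reaches R.

    Returns (k, fraction), where fraction is the energy share captured
    by the first k singular values.
    """
    energies = [float(s) ** 2 for s in singular_values]
    total = sum(energies)
    if total == 0.0:
        # Assumed convention: a zero update has nothing to keep.
        return 0, 0.0
    running = 0.0
    for k, energy in enumerate(energies, start=1):
        running += energy
        if running / total >= R:
            return k, running / total
    # Guards against rounding when R is 1.0 or slightly above.
    return len(energies), 1.0
```

For singular values 3, 2, 1 the squared energies are 9, 4, 1, so the cumulative shares are 9/14, 13/14, and 1; with R = 0.8 the rule picks k = 2.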
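Freevariance Mode (Roy-Vetterli style effective rank)
Build normalized singular-value magnitudes:
\[p_i = \frac{\sigma_i}{\sum_{j} \sigma_j}\]
Compute entropy and effective rank:
\[H = -\sum_{i} p_i \log p_i, \qquad \operatorname{erank} = \exp(H)\]
Set \(k\) to an integer derived from \(\operatorname{erank}\).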
Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L397
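A small sketch of the Roy-Vetterli effective rank in plain Python. The function name and the zero-spectrum convention are assumptions, and how the script turns the real-valued result into an integer \(k\) is not reproduced here.

```python
import math


def effective_rank(singular_values):
    """exp of the Shannon entropy of the L1-normalized singular values."""
    magnitudes = [abs(float(s)) for s in singular_values]
    total = sum(magnitudes)
    if total == 0.0:
        # Assumed convention for an all-zero spectrum.
        return 0.0
    # Zero entries contribute nothing to the entropy (0 * log 0 is taken as 0).
    probs = [m / total for m in magnitudes if m > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

A flat spectrum of n equal singular values gives an effective rank of n, while a spectrum with a single nonzero value gives 1, so the quantity behaves like a soft count of the directions that matter.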
5) MonoSoup Mixing Coefficients
After selecting \(k\), define:
Then:
The edited update:
Final layer: the edited update is added back to the pretrained weights \(W_0\).
Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L476
6) Full-Model Pass
apply_monosoup iterates over matched parameters and applies the layer update where valid:
- skips not-found keys,
- skips shape mismatch,
- skips very small updates.
A summary of processed and skipped layers is logged.
Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L562
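The control flow of such a pass might look like the sketch below. It is written under several assumptions: checkpoints are plain dicts of NumPy arrays, the per-layer edit is passed in as a callable (edit_layer is a hypothetical parameter), skipped layers keep their fine-tuned weights, and the skip test uses the Frobenius norm. None of these details are confirmed by the article.

```python
import numpy as np


def edit_state_dict(pretrained, finetuned, edit_layer, min_rel_update=1e-6):
    """Apply edit_layer to each eligible update and count skipped layers."""
    stats = {"processed": 0, "missing": 0, "shape_mismatch": 0, "small_update": 0}
    edited = {}
    for name, w1 in finetuned.items():
        edited[name] = w1  # assumed default: leave skipped layers untouched
        w0 = pretrained.get(name)
        if w0 is None:
            stats["missing"] += 1
            continue
        if w0.shape != w1.shape:
            stats["shape_mismatch"] += 1
            continue
        delta = w1 - w0
        rel = np.linalg.norm(delta) / (np.linalg.norm(w0) + 1e-12)
        if rel < min_rel_update:
            stats["small_update"] += 1
            continue
        # The edited update is added back onto the pretrained weights.
        edited[name] = w0 + edit_layer(delta)
        stats["processed"] += 1
    return edited, stats
```

Returning the counters alongside the edited weights gives the caller what it needs to log a summary of processed and skipped layers.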
7) Practical Complexity
The dominant cost per processed layer is compact SVD of the flattened update matrix. For large layers, this can be expensive in both memory and time. In practice:
- min_rel_update reduces unnecessary decomposition work,
- verbose_layers helps inspect spectral behavior,
- float16/bfloat16 tensors are cast to float32 before SVD for stability.
Code: https://github.com/alirezaabdollahpour/MonoSoup/blob/main/MonoSoup.py#L449
8) Minimal Reproducible Command
python MonoSoup.py \
--pretrained-checkpoint /path/to/model_0.pt \
--finetuned-checkpoint /path/to/model_31.pt \
--data-location /path/to/data_root \
--model-type 32 \
--version freevariance \
--R 0.8 \
--output-json results/monosoup_clip.json