An Exploration of the Primes, Modular Arithmetic & Factoring
*Author Note: This is some cliff notes of a conversation I had with Claude about several vairous topics ranging from modular arithmetic to Chinese remainder theorom. When I was younger I always have this idea about “unzipping” co-prime numbers like those found in RSA by starting at the last digit and making education decisions on possible factors. This page will somewhat serve as a living document of different ideas an explorations I’ve had playing around with the primes and co.
1. Invariants
An invariant is a property that remains unchanged under some transformation. In a loop, a loop invariant holds true before and after every iteration. In geometry, distance is invariant under rotation. The concept is useful for reasoning about correctness — if something must be preserved, you can rely on it.
2. 1/89 and the Fibonacci sequence
The decimal expansion of 1/89 encodes the Fibonacci sequence digit by digit:
0.01
0.001
0.0002
0.00003
...
= 0.01123595... = 1/89 This works because the Fibonacci recurrence F(n) = F(n-1) + F(n-2) has a rational generating function:
F(x) = x / (1 - x - x²) Plugging in x = 1/10 gives 10/89, and dividing by 10 gives 1/89. Notably, 89 is itself a Fibonacci number.
This generalizes to any linear recurrence — any sequence defined by a(n) = c₁·a(n-1) + c₂·a(n-2) + ... has a rational generating function, so plugging in x = 1/10 always yields a fraction whose decimal expansion encodes the sequence. The constraint is that terms can’t grow faster than 10^n, otherwise the sum diverges.
3. Why primes don’t fit
Primes are a sequence defined by a property, not a recurrence. No formula exists of the form p(n) = c₁·p(n-1) + c₂·p(n-2) + ...
This isn’t just unknown — it’s provably impossible. Any linear recurrence produces a sequence that is eventually periodic mod m (for any integer m). But by Dirichlet’s theorem, primes are distributed across all valid residue classes infinitely and never lock into a repeating pattern. A linear recurrence generating primes would produce a contradiction.
4. Primes mod m
Taking primes modulo m and looking at the distribution of remainders reveals structure:
- mod 2 — after p=2, all primes land on remainder 1 (all primes are odd)
- mod 3 — only remainders 1 and 2 appear (after p=3), roughly balanced
- mod 4 — only remainders 1 and 3 appear (the famous 4k+1 and 4k+3 primes)
- 4k+1 primes can always be written as a sum of two squares
- 4k+3 primes never can
The “banned” remainders are those that share a factor with m. Primes must live in coprime residue classes only. The count of valid classes is exactly Euler’s totient function φ(m).
Nested mod — (p % m1) % m2
Taking remainders at two levels:
- When m2 divides m1 (e.g. m1=15, m2=3): the second level collapses to
p % m2directly — no new information - When gcd(m1, m2) = 1 (coprime): the distribution across m2’s residues is governed by how φ(m1) residues map into m2 buckets
- When m1 is prime and gcd(m1, m2) = 1: the φ(m1) = m1-1 residues spread nearly uniformly across m2’s classes, with slight imbalance when (m1-1) doesn’t divide evenly by m2
5. Chinese Remainder Theorem (CRT)
CRT is the inverse of collapsing remainders. Given a number’s remainders under several coprime moduli, you can uniquely reconstruct the original number.
Example
p % 3 = 2
p % 5 = 1
p % 7 = 6 Step 1 — N = 3 × 5 × 7 = 105
Step 2 — partial products: N1=35, N2=21, N3=15
Step 3 — modular inverses (x such that Ni·x ≡ 1 mod mi):
35·x ≡ 1 (mod 3) → x = 2
21·x ≡ 1 (mod 5) → x = 1
15·x ≡ 1 (mod 7) → x = 1 Step 4 — combine: (2·35·2) + (1·21·1) + (6·15·1) = 251
Step 5 — reduce: 251 % 105 = 41
Verify: 41 % 3 = 2 ✓ 41 % 5 = 1 ✓ 41 % 7 = 6 ✓
CRT is used in RSA (key operations decompose for speed), fast polynomial multiplication (NTT), and secret sharing schemes.
6. Modular inverse and the Euclidean algorithm
The modular inverse of a mod b is the value x such that a·x ≡ 1 (mod b). It’s the “division equivalent” in a world of only integers — instead of 1/35, you find a whole number that behaves like 1/35 under mod 3.
The Euclidean algorithm finds gcd efficiently using the identity:
gcd(a, b) = gcd(b, a % b) Example:
gcd(35, 3) = gcd(3, 2) = gcd(2, 1) = gcd(1, 0) = 1 The extended Euclidean algorithm tracks extra variables during the same process to also produce x, y satisfying a·x + b·y = gcd(a, b). When gcd = 1, this gives a·x ≡ 1 (mod b) — the modular inverse — in O(log n) steps regardless of the size of the numbers.
brute force: O(n) breaks down fast
Euclidean gcd: O(log n) always fast
extended version: O(log n) also yields the modular inverse for free 7. Digit-unzipping as a factoring approach
The idea: given a semiprime n = p × q, recover p and q digit by digit from the least significant end.
Core insight — the last digit of n constrains the last digits of p and q. For example if n ends in 3, the only valid last-digit pairs (both odd) are those where dp × dq ≡ 3 (mod 10). Then at each subsequent digit position, the partial product must still match n from the right.
Digit unzipping Example
n = 10403 = 101 × 103
n ends in 3. Valid last-digit pairs where dp × dq ≡ 3 (mod 10), both odd:
(1,3), (3,1), (7,9), (9,7)
Depth 1 — last digit
| p so far | q so far | p×q mod 10 | n mod 10 | match? |
|---|---|---|---|---|
| 1 | 3 | 3 | 3 | pass |
| 3 | 1 | 3 | 3 | pass |
| 7 | 9 | 3 | 3 | pass |
| 9 | 7 | 3 | 3 | pass |
All four branches survive. We pursue each.
Depth 2 — tens digit (branch from p=1, q=3)
| p so far | q so far | p×q mod 100 | n mod 100 | match? |
|---|---|---|---|---|
| 01 | 03 | 3 | 3 | pass |
| 01 | 13 | 13 | 3 | fail |
| 01 | 23 | 23 | 3 | fail |
| 11 | 03 | 33 | 3 | fail |
| … | … | … | … | most fail |
The digit match check kills most candidates immediately.
Depth 3 — hundreds digit (branch from p=101, q=_03)
| p so far | q so far | p×q mod 1000 | n mod 1000 | match? |
|---|---|---|---|---|
| 101 | 003 | 303 | 403 | fail |
| 101 | 103 | 403 | 403 | pass |
| 101 | 203 | 503 | 403 | fail |
Only (101, 103) survives.
Depth 4 — thousands digit
| p so far | q so far | p×q mod 10000 | n mod 10000 | match? |
|---|---|---|---|---|
| 0101 | 0103 | 10403 mod 10000 = 403 | 403 | pass |
| 1101 | 0103 | 113403 mod 10000 = 3403 | 403 | fail |
Depth 5 — final check
| p | q | p × q | = n? | result |
|---|---|---|---|---|
| 101 | 103 | 10403 | 10403 | found |
Why 10403 is a clean example
101 and 103 are nearly equal (both close to √10403 ≈ 102), so the search finds them very early in the (1,3) branch without backtracking. In the worst case — factors far apart in size — far more branches survive each depth before the product constraint kills them.
Pruning layers:
(p_partial × q_partial) % 10^depth == n % 10^depth— digit match from the rightp_partial × q_partial <= n— can’t exceed n- Digit sum mod 9 consistency — since
p×q ≡ n (mod 9)always
Parallelism — each branch of the search tree is completely independent, making this embarrassingly parallel. However, the tree grows exponentially while parallelism only scales linearly, so parallelism can’t overcome the fundamental exponential growth.
Runtime complexity:
worst case: O(10^d) where d = digit count ≈ O(n)
with pruning: depends entirely on branching factor × survival rate per level This is comparable to brute force trial division in the worst case. To beat GNFS (the best classical algorithm), pruning would need to achieve sub-exponential survival rate — nobody has found a digit-based strategy that achieves this.
Connection to real attacks:
- The sequential digit approach is related to p-adic lifting / Hensel’s lemma
- Coppersmith’s algorithm (1996) uses lattice methods (LLL) to recover a full factor given roughly half its digits — formalizing the “partial information constrains the search” intuition
- Lattice-based factoring encodes partial factor knowledge as a geometric object and finds the shortest vector to recover the answer
Sample C Code
Below is a sample program written in C which provides an implementation of this “unzipping” method for small n.
// @file unzip-coprimes.c
// @description sample C code for "unzipping" coprime numbers.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
typedef unsigned long long u64;
static u64 nodes_explored = 0;
static int found = 0;
// digit sum mod 9 for cheap consistency check
static int digit_sum_mod9(u64 n) {
int s = 0;
while (n > 0) { s += n % 10; n /= 10; }
return s % 9;
}
void search(u64 n, u64 p, u64 q, int depth, u64 place, int n_digits, int target_mod9) {
nodes_explored++;
// base case: all digit positions filled
if (depth == n_digits) {
if (p > 1 && q > 1 && p * q == n) {
printf(" found: %llu x %llu\n", p, q);
found = 1;
}
return;
}
u64 next_place = place * 10;
for (int dp = 0; dp <= 9; dp++) {
for (int dq = 0; dq <= 9; dq++) {
u64 p_new = p + (u64)dp * place;
u64 q_new = q + (u64)dq * place;
// prune 1: enforce p <= q to avoid duplicate pairs
if (p_new > q_new) continue;
// prune 2: partial product must match n digit-by-digit from the right
if ((p_new * q_new) % next_place != n % next_place) continue;
// prune 3: partial product already exceeds n
if (p_new > 1 && q_new > 1 && p_new * q_new > n) continue;
// prune 4: at final depth, digit sum mod 9 must match
if (depth == n_digits - 1 &&
(digit_sum_mod9(p_new) * digit_sum_mod9(q_new)) % 9 != target_mod9) continue;
search(n, p_new, q_new, depth + 1, next_place, n_digits, target_mod9);
}
}
}
int count_digits(u64 n) {
int d = 0;
while (n > 0) { d++; n /= 10; }
return d;
}
void factor(u64 n) {
printf("factoring: %llu\n", n);
nodes_explored = 0;
found = 0;
if (n < 4) { printf(" too small\n\n"); return; }
// handle even numbers
if (n % 2 == 0) {
printf(" found: 2 x %llu\n\n", n / 2);
return;
}
// handle multiples of 5
if (n % 5 == 0) {
printf(" found: 5 x %llu\n\n", n / 5);
return;
}
int d = count_digits(n);
int target_mod9 = digit_sum_mod9(n);
// build valid last-digit starting pairs based on n % 10
// both digits must be odd, product must match last digit of n
// enforce dp <= dq to avoid duplicates from the start
int pairs[8][2];
int npairs = 0;
int last = n % 10;
for (int dp = 1; dp <= 9; dp += 2) {
for (int dq = dp; dq <= 9; dq += 2) {
if ((dp * dq) % 10 == last) {
pairs[npairs][0] = dp;
pairs[npairs][1] = dq;
npairs++;
}
}
}
for (int i = 0; i < npairs; i++) {
search(n, pairs[i][0], pairs[i][1], 1, 10, d, target_mod9);
}
if (!found) printf(" prime (no factors found)\n");
printf(" nodes: %llu\n\n", nodes_explored);
}
int main(int argc, char *argv[]) {
if (argc > 1) {
for (int i = 1; i < argc; i++) {
factor(strtoull(argv[i], NULL, 10));
}
} else {
u64 tests[] = {
21, // 3 x 7
10403, // 101 x 103
1189, // 29 x 41
8051, // 83 x 97
1046527, // 953 x 1099
104729, // prime
9999991, // prime
};
int n = sizeof(tests) / sizeof(tests[0]);
for (int i = 0; i < n; i++) factor(tests[i]);
}
return 0;
} To run from the CLI use the following:
# compile source code:
gcc unzip-coprimes.c -o unzip-coprimes
# run program:
./unzip-coprimes 8. Empirical benchmark
Running the digit-unzipping algorithm on random semiprimes of increasing digit count showed:
| digits | nodes explored | time |
|---|---|---|
6 | ~500 | <5ms |
8 | ~70 | <1ms |
10 | ~33,000 | ~130ms |
12 | ~70,000 | ~260ms |
14 | >1,000,000 | >3s |
16 | >1,000,000 | >3s |
On a log scale, nodes grow roughly linearly with digit count — confirming exponential growth. The algorithm breaks down around 13-14 digits without better pruning. For comparison, RSA-250 (250 digits) required 2700 CPU core-years using GNFS across tens of thousands of machines over several months.
9. RSA challenge context
| challenge | digits | status | compute |
|---|---|---|---|
| RSA-100 | 100 | solved 1991 | days on a supercomputer |
| RSA-129 | 129 | solved 1994 | 17 years after being posed |
| RSA-250 | 250 | solved 2020 | 2700 CPU core-years, few months wall time |
| RSA-260 | 260 | unsolved | estimated 10-30x harder than RSA-250 |
| RSA-2048 | 617 | in active use | effectively unbreakable classically |
The difficulty gap between each solved number grows faster than hardware improves. RSA-260, posted in 1991, remains unsolved after 35 years despite RSA-250 falling in 2020.
Quantum computers running Shor’s algorithm would break RSA in polynomial time. NIST standardized the first post-quantum cryptographic algorithms in 2024, which are ironically based on lattice problems — the same mathematical structure underlying some of the best classical attacks on RSA.