Primes, modular arithmetic & factoring

Primes, modular arithmetic & factoring


An Exploration of the Primes, Modular Arithmetic & Factoring

*Author Note: This is some cliff notes of a conversation I had with Claude about several vairous topics ranging from modular arithmetic to Chinese remainder theorom. When I was younger I always have this idea about “unzipping” co-prime numbers like those found in RSA by starting at the last digit and making education decisions on possible factors. This page will somewhat serve as a living document of different ideas an explorations I’ve had playing around with the primes and co.

1. Invariants

An invariant is a property that remains unchanged under some transformation. In a loop, a loop invariant holds true before and after every iteration. In geometry, distance is invariant under rotation. The concept is useful for reasoning about correctness — if something must be preserved, you can rely on it.


2. 1/89 and the Fibonacci sequence

The decimal expansion of 1/89 encodes the Fibonacci sequence digit by digit:

0.01
0.001
0.0002
0.00003
...
= 0.01123595... = 1/89

This works because the Fibonacci recurrence F(n) = F(n-1) + F(n-2) has a rational generating function:

F(x) = x / (1 - x - x²)

Plugging in x = 1/10 gives 10/89, and dividing by 10 gives 1/89. Notably, 89 is itself a Fibonacci number.

This generalizes to any linear recurrence — any sequence defined by a(n) = c₁·a(n-1) + c₂·a(n-2) + ... has a rational generating function, so plugging in x = 1/10 always yields a fraction whose decimal expansion encodes the sequence. The constraint is that terms can’t grow faster than 10^n, otherwise the sum diverges.


3. Why primes don’t fit

Primes are a sequence defined by a property, not a recurrence. No formula exists of the form p(n) = c₁·p(n-1) + c₂·p(n-2) + ...

This isn’t just unknown — it’s provably impossible. Any linear recurrence produces a sequence that is eventually periodic mod m (for any integer m). But by Dirichlet’s theorem, primes are distributed across all valid residue classes infinitely and never lock into a repeating pattern. A linear recurrence generating primes would produce a contradiction.


4. Primes mod m

Taking primes modulo m and looking at the distribution of remainders reveals structure:

  • mod 2 — after p=2, all primes land on remainder 1 (all primes are odd)
  • mod 3 — only remainders 1 and 2 appear (after p=3), roughly balanced
  • mod 4 — only remainders 1 and 3 appear (the famous 4k+1 and 4k+3 primes)
    • 4k+1 primes can always be written as a sum of two squares
    • 4k+3 primes never can

The “banned” remainders are those that share a factor with m. Primes must live in coprime residue classes only. The count of valid classes is exactly Euler’s totient function φ(m).

Nested mod — (p % m1) % m2

Taking remainders at two levels:

  • When m2 divides m1 (e.g. m1=15, m2=3): the second level collapses to p % m2 directly — no new information
  • When gcd(m1, m2) = 1 (coprime): the distribution across m2’s residues is governed by how φ(m1) residues map into m2 buckets
  • When m1 is prime and gcd(m1, m2) = 1: the φ(m1) = m1-1 residues spread nearly uniformly across m2’s classes, with slight imbalance when (m1-1) doesn’t divide evenly by m2

5. Chinese Remainder Theorem (CRT)

CRT is the inverse of collapsing remainders. Given a number’s remainders under several coprime moduli, you can uniquely reconstruct the original number.

Example

p % 3 = 2
p % 5 = 1
p % 7 = 6

Step 1 — N = 3 × 5 × 7 = 105

Step 2 — partial products: N1=35, N2=21, N3=15

Step 3 — modular inverses (x such that Ni·x ≡ 1 mod mi):

35·x ≡ 1 (mod 3) → x = 2
21·x ≡ 1 (mod 5) → x = 1
15·x ≡ 1 (mod 7) → x = 1

Step 4 — combine: (2·35·2) + (1·21·1) + (6·15·1) = 251

Step 5 — reduce: 251 % 105 = 41

Verify: 41 % 3 = 2 ✓ 41 % 5 = 1 ✓ 41 % 7 = 6 ✓

CRT is used in RSA (key operations decompose for speed), fast polynomial multiplication (NTT), and secret sharing schemes.


6. Modular inverse and the Euclidean algorithm

The modular inverse of a mod b is the value x such that a·x ≡ 1 (mod b). It’s the “division equivalent” in a world of only integers — instead of 1/35, you find a whole number that behaves like 1/35 under mod 3.

The Euclidean algorithm finds gcd efficiently using the identity:

gcd(a, b) = gcd(b, a % b)

Example:

gcd(35, 3) = gcd(3, 2) = gcd(2, 1) = gcd(1, 0) = 1

The extended Euclidean algorithm tracks extra variables during the same process to also produce x, y satisfying a·x + b·y = gcd(a, b). When gcd = 1, this gives a·x ≡ 1 (mod b) — the modular inverse — in O(log n) steps regardless of the size of the numbers.

brute force:       O(n)       breaks down fast
Euclidean gcd:     O(log n)   always fast
extended version:  O(log n)   also yields the modular inverse for free

7. Digit-unzipping as a factoring approach

The idea: given a semiprime n = p × q, recover p and q digit by digit from the least significant end.

Core insight — the last digit of n constrains the last digits of p and q. For example if n ends in 3, the only valid last-digit pairs (both odd) are those where dp × dq ≡ 3 (mod 10). Then at each subsequent digit position, the partial product must still match n from the right.

Digit unzipping Example

n = 10403 = 101 × 103

n ends in 3. Valid last-digit pairs where dp × dq ≡ 3 (mod 10), both odd: (1,3), (3,1), (7,9), (9,7)

Depth 1 — last digit

p so farq so farp×q mod 10n mod 10match?
1333pass
3133pass
7933pass
9733pass

All four branches survive. We pursue each.

Depth 2 — tens digit (branch from p=1, q=3)

p so farq so farp×q mod 100n mod 100match?
010333pass
0113133fail
0123233fail
1103333fail
most fail

The digit match check kills most candidates immediately.

Depth 3 — hundreds digit (branch from p=101, q=_03)

p so farq so farp×q mod 1000n mod 1000match?
101003303403fail
101103403403pass
101203503403fail

Only (101, 103) survives.

Depth 4 — thousands digit

p so farq so farp×q mod 10000n mod 10000match?
0101010310403 mod 10000 = 403403pass
11010103113403 mod 10000 = 3403403fail

Depth 5 — final check

pqp × q= n?result
1011031040310403found

Why 10403 is a clean example

101 and 103 are nearly equal (both close to √10403 ≈ 102), so the search finds them very early in the (1,3) branch without backtracking. In the worst case — factors far apart in size — far more branches survive each depth before the product constraint kills them.

Pruning layers:

  1. (p_partial × q_partial) % 10^depth == n % 10^depth — digit match from the right
  2. p_partial × q_partial <= n — can’t exceed n
  3. Digit sum mod 9 consistency — since p×q ≡ n (mod 9) always

Parallelism — each branch of the search tree is completely independent, making this embarrassingly parallel. However, the tree grows exponentially while parallelism only scales linearly, so parallelism can’t overcome the fundamental exponential growth.

Runtime complexity:

worst case:        O(10^d) where d = digit count ≈ O(n)
with pruning:      depends entirely on branching factor × survival rate per level

This is comparable to brute force trial division in the worst case. To beat GNFS (the best classical algorithm), pruning would need to achieve sub-exponential survival rate — nobody has found a digit-based strategy that achieves this.

Connection to real attacks:

  • The sequential digit approach is related to p-adic lifting / Hensel’s lemma
  • Coppersmith’s algorithm (1996) uses lattice methods (LLL) to recover a full factor given roughly half its digits — formalizing the “partial information constrains the search” intuition
  • Lattice-based factoring encodes partial factor knowledge as a geometric object and finds the shortest vector to recover the answer

Sample C Code

Below is a sample program written in C which provides an implementation of this “unzipping” method for small n.

// @file unzip-coprimes.c
// @description sample C code for "unzipping" coprime numbers.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

typedef unsigned long long u64;

static u64 nodes_explored = 0;
static int found = 0;

// digit sum mod 9 for cheap consistency check
static int digit_sum_mod9(u64 n) {
    int s = 0;
    while (n > 0) { s += n % 10; n /= 10; }
    return s % 9;
}

void search(u64 n, u64 p, u64 q, int depth, u64 place, int n_digits, int target_mod9) {
    nodes_explored++;

    // base case: all digit positions filled
    if (depth == n_digits) {
        if (p > 1 && q > 1 && p * q == n) {
            printf("  found:  %llu x %llu\n", p, q);
            found = 1;
        }
        return;
    }

    u64 next_place = place * 10;

    for (int dp = 0; dp <= 9; dp++) {
        for (int dq = 0; dq <= 9; dq++) {
            u64 p_new = p + (u64)dp * place;
            u64 q_new = q + (u64)dq * place;

            // prune 1: enforce p <= q to avoid duplicate pairs
            if (p_new > q_new) continue;

            // prune 2: partial product must match n digit-by-digit from the right
            if ((p_new * q_new) % next_place != n % next_place) continue;

            // prune 3: partial product already exceeds n
            if (p_new > 1 && q_new > 1 && p_new * q_new > n) continue;

            // prune 4: at final depth, digit sum mod 9 must match
            if (depth == n_digits - 1 &&
                (digit_sum_mod9(p_new) * digit_sum_mod9(q_new)) % 9 != target_mod9) continue;

            search(n, p_new, q_new, depth + 1, next_place, n_digits, target_mod9);
        }
    }
}

int count_digits(u64 n) {
    int d = 0;
    while (n > 0) { d++; n /= 10; }
    return d;
}

void factor(u64 n) {
    printf("factoring: %llu\n", n);
    nodes_explored = 0;
    found = 0;

    if (n < 4) { printf("  too small\n\n"); return; }

    // handle even numbers
    if (n % 2 == 0) {
        printf("  found:  2 x %llu\n\n", n / 2);
        return;
    }

    // handle multiples of 5
    if (n % 5 == 0) {
        printf("  found:  5 x %llu\n\n", n / 5);
        return;
    }

    int d = count_digits(n);
    int target_mod9 = digit_sum_mod9(n);

    // build valid last-digit starting pairs based on n % 10
    // both digits must be odd, product must match last digit of n
    // enforce dp <= dq to avoid duplicates from the start
    int pairs[8][2];
    int npairs = 0;
    int last = n % 10;
    for (int dp = 1; dp <= 9; dp += 2) {
        for (int dq = dp; dq <= 9; dq += 2) {
            if ((dp * dq) % 10 == last) {
                pairs[npairs][0] = dp;
                pairs[npairs][1] = dq;
                npairs++;
            }
        }
    }

    for (int i = 0; i < npairs; i++) {
        search(n, pairs[i][0], pairs[i][1], 1, 10, d, target_mod9);
    }

    if (!found) printf("  prime (no factors found)\n");
    printf("  nodes:  %llu\n\n", nodes_explored);
}

int main(int argc, char *argv[]) {
    if (argc > 1) {
        for (int i = 1; i < argc; i++) {
            factor(strtoull(argv[i], NULL, 10));
        }
    } else {
        u64 tests[] = {
            21,         // 3 x 7
            10403,      // 101 x 103
            1189,       // 29 x 41
            8051,       // 83 x 97
            1046527,    // 953 x 1099
            104729,     // prime
            9999991,    // prime
        };
        int n = sizeof(tests) / sizeof(tests[0]);
        for (int i = 0; i < n; i++) factor(tests[i]);
    }
    return 0;
}

To run from the CLI use the following:

# compile source code:
gcc unzip-coprimes.c -o unzip-coprimes

# run program:
./unzip-coprimes

8. Empirical benchmark

Running the digit-unzipping algorithm on random semiprimes of increasing digit count showed:

digitsnodes exploredtime
6~500<5ms
8~70<1ms
10~33,000~130ms
12~70,000~260ms
14>1,000,000>3s
16>1,000,000>3s

On a log scale, nodes grow roughly linearly with digit count — confirming exponential growth. The algorithm breaks down around 13-14 digits without better pruning. For comparison, RSA-250 (250 digits) required 2700 CPU core-years using GNFS across tens of thousands of machines over several months.


9. RSA challenge context

challengedigitsstatuscompute
RSA-100100solved 1991days on a supercomputer
RSA-129129solved 199417 years after being posed
RSA-250250solved 20202700 CPU core-years, few months wall time
RSA-260260unsolvedestimated 10-30x harder than RSA-250
RSA-2048617in active useeffectively unbreakable classically

The difficulty gap between each solved number grows faster than hardware improves. RSA-260, posted in 1991, remains unsolved after 35 years despite RSA-250 falling in 2020.

Quantum computers running Shor’s algorithm would break RSA in polynomial time. NIST standardized the first post-quantum cryptographic algorithms in 2024, which are ironically based on lattice problems — the same mathematical structure underlying some of the best classical attacks on RSA.