FHE16: End-to-End Homomorphic Encryption That Runs Everywhere
Description: A deterministic 16-bit integer-NTT FHE stack designed for portability, from Arduino-class MCUs to servers and accelerators, with the same parameters and byte-identical ciphertext outputs on every device.
Why it matters: FHE lets blockchains program when (or whether) data is revealed and, without relying on a TEE, enables decentralized, fully end-to-end encrypted compute and verification.
Core design: 16-bit arithmetic plus integer-only NTT gives deterministic, portable execution, with the same parameters and the same ciphertext outputs on MCU → laptop → server → accelerator.
Blockchain-friendly: Re-execution gives simple verification/consensus; in the long term we plan to add verifiable-computation (VC) proofs to make verification extremely cheap.
Why Homomorphic Encryption?
Control the blockchain “open time”
In many on-chain apps (auctions, orderflow, private analytics), when data is revealed has economic and security consequences. Commit-reveal and time-lock are indirect fixes. FHE goes further: it lets you compute without opening plaintext.
Outcomes
Run matching/price logic on ciphertext, and program if/when to reveal.
Private settlement: accounting and risk checks proceed while plaintext disclosure is delayed, or skipped entirely, per policy.
Schedulable disclosure: threshold decryption or escrowed-key policies control reveal timing, which helps bring off-chain financial workflows on-chain.
Note: FHE is not “delay the reveal.” It is compute without reveal, so you can precisely design what/when/how much to disclose.
The ultimate End-to-End
Messaging E2EE protects transit. FHE extends E2E to the compute phase: storage, transport, and computation all stay encrypted.
Decentralized compute & verification—unlike TEE
TEEs centralize trust in vendor hardware/attestation and are operationally concentrated.
With FHE:
Anyone (including small devices) can contribute encrypted computation.
Verification happens at the protocol level via re-execution or verifiable computation (VC; zk proofs, etc.), with no vendor root of trust.
FHE16: Vision & Architecture of LatticA
Vision: Build an FHE stack that is fast, deterministic, and portable enough to run even on Arduino-class devices, while staying compatible across all machines.
16-bit arithmetic for low-end environments
Why 16-bit? MCUs have narrow registers and ALUs and very little memory.
Approach: compose multiple 16-bit prime moduli via RNS (residue number system) to reach the target security and precision; keep the bootstrapping and key-switching paths 16-bit friendly.
Tip: RNS & NTT
Split a big modulus into several 16-bit primes, accelerate polynomial ops with a number-theoretic transform (NTT) per prime, then recombine with the Chinese Remainder Theorem (CRT). You get high effective precision with tiny word sizes.
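The split-and-recombine step can be sketched as follows. This is a minimal illustration, not FHE16's implementation: the three primes and the helper names (`to_rns`, `from_rns`, `rns_mul`) are assumptions chosen for the example.

```python
from math import prod

# Illustrative 16-bit primes (an assumption for this sketch, not FHE16's parameters).
MODULI = [7681, 12289, 40961]
M = prod(MODULI)  # effective "big" modulus represented by the residues


def to_rns(x):
    """Split an integer into one residue per 16-bit prime."""
    return [x % q for q in MODULI]


def rns_mul(a, b):
    """Multiply residue-wise; every intermediate product fits in 32 bits."""
    return [(x * y) % q for x, y, q in zip(a, b, MODULI)]


def from_rns(residues):
    """Recombine residues with the Chinese Remainder Theorem."""
    total = 0
    for r, q in zip(residues, MODULI):
        m_i = M // q                          # product of the other moduli
        total += r * m_i * pow(m_i, -1, q)    # pow(..., -1, q) is the modular inverse
    return total % M


a, b = 123_456, 7_890
product = from_rns(rns_mul(to_rns(a), to_rns(b)))
print(product == a * b)
```

The recombined value equals the ordinary product as long as the true result is smaller than `M`, which is why adding more 16-bit primes raises the usable precision.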
NTT instead of FFT: identical results on every device
Floating-point FFT results can diverge across platforms (AVX2 vs AVX-512 code paths, compiler and CPU differences).
FHE16 uses integer-only NTT, avoiding FP rounding. Same inputs + same parameters + same seeds → byte-identical ciphertexts.
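To make the integer-only idea concrete, here is a small sketch of polynomial multiplication through an NTT modulo a single 16-bit prime. The prime 12289, the length 8, the cyclic (rather than negacyclic) convolution, and all function names are assumptions for illustration; FHE16's actual kernels are not shown in this article.

```python
Q = 12289  # illustrative 16-bit prime; Q - 1 is divisible by 8
N = 8      # transform length (power of two)


def primitive_root_of_unity(n, q):
    """Find w with multiplicative order exactly n modulo the prime q."""
    for x in range(2, q):
        w = pow(x, (q - 1) // n, q)
        if pow(w, n // 2, q) != 1:   # order divides n but not n/2, so it is n
            return w
    raise ValueError("no root of unity found")


def ntt(a, w, q):
    """Recursive radix-2 NTT: evaluates polynomial a at powers of w (mod q)."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], w * w % q, q)
    odd = ntt(a[1::2], w * w % q, q)
    out = [0] * n
    t = 1
    for k in range(n // 2):
        v = t * odd[k] % q
        out[k] = (even[k] + v) % q
        out[k + n // 2] = (even[k] - v) % q
        t = t * w % q
    return out


def intt(a, w, q):
    """Inverse NTT: forward transform with w^-1, then scale by n^-1."""
    n_inv = pow(len(a), -1, q)
    return [x * n_inv % q for x in ntt(a, pow(w, -1, q), q)]


def cyclic_mul(a, b, q=Q):
    """Multiply two length-N polynomials modulo (x^N - 1, q)."""
    w = primitive_root_of_unity(len(a), q)
    fa, fb = ntt(a, w, q), ntt(b, w, q)
    return intt([x * y % q for x, y in zip(fa, fb)], w, q)


a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
c = cyclic_mul(a, b)
print(c)
```

Every step is integer arithmetic modulo `Q`, so the result does not depend on rounding mode, vector width, or compiler flags.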
Fig. 1. Ciphertext output after homomorphically adding two pre-stored ciphertexts (123, 456) on a Skylake server (AVX-512).
Fig. 2. Ciphertext output after homomorphically adding two pre-stored ciphertexts (123, 456) on a Haswell server (AVX2).
FP rounding differences cause ciphertext divergence. FHE16’s integer-NTT path eliminates this at the design level.
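A generic illustration of the underlying effect (not a reproduction of Figs. 1 and 2): floating-point addition is not associative, so a different reduction order, such as a different SIMD width, can change the low bits of a result, whereas modular integer addition gives the same answer in any order. The modulus and values below are arbitrary assumptions.

```python
# Floating point: the grouping of additions changes the result.
fp_left = (0.1 + 0.2) + 0.3
fp_right = 0.1 + (0.2 + 0.3)
print(fp_left == fp_right)  # the two groupings disagree in the last bit

# Integers modulo a 16-bit prime: grouping does not matter.
Q = 12289  # illustrative modulus (assumption)
x, y, z = 12000, 11000, 10000
int_left = (((x + y) % Q) + z) % Q
int_right = (x + ((y + z) % Q)) % Q
print(int_left == int_right)
```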
Fig. 3. In Zama's GPU backend, the underlying algorithm itself changes, which makes it completely incompatible.
(https://docs.zama.ai/tfhe-rs/1.0/configuration/run_on_gpu)
Notes: Values are taken from the provided screenshot of Table 3. Multiplication and Division/Modulo were marked “Under Investigation” and are excluded here.
Unit: milliseconds (avg latency).
Operation                      TFHE-rs (ms)   FHE16, ours (ms)   Speedup (TFHE-rs / ours)
Negation (–)                   148.57         66.46              2.23×
Add / Sub (+, –)               182.48         94.80              1.92×
ABS                            246.05         69.71              3.52×
Equal / Not Equal (eq, ne)     139.56         74.77              1.86×
Comparisons (ge, gt, le, lt)   180.29         88.92              2.02×
Max / Min (max, min)           256.00         101.90             2.51×
Bitwise (&, |, ^)              40.90          21.24              1.92×
Select                         64.41          30.63              2.10×
Table 1. Performance comparison on 64-bit integer ops
“Our FHE16 supports computation on inputs ranging from 1 bit to 256 bits.”
Summary: Across the eight ops with a baseline, the mean speedup is ≈ 2.26×, with ABS peaking at 3.52×.
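The speedup column and the summary figure can be recomputed directly from the latencies in Table 1. The short script below does that; the dictionary layout and variable names are just one convenient way to hold the table.

```python
# (TFHE-rs ms, FHE16 ms) per operation, copied from Table 1.
latency_ms = {
    "Negation": (148.57, 66.46),
    "Add / Sub": (182.48, 94.80),
    "ABS": (246.05, 69.71),
    "Equal / Not Equal": (139.56, 74.77),
    "Comparisons": (180.29, 88.92),
    "Max / Min": (256.00, 101.90),
    "Bitwise": (40.90, 21.24),
    "Select": (64.41, 30.63),
}

speedups = {op: base / ours for op, (base, ours) in latency_ms.items()}
mean_speedup = sum(speedups.values()) / len(speedups)
best_op = max(speedups, key=speedups.get)

for op, s in speedups.items():
    print(f"{op:20s} {s:.2f}x")
print(f"mean {mean_speedup:.2f}x, best: {best_op}")
```

The recomputed ratios agree with the table to within about 0.01, and ABS is the largest.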
One parameter set, same ciphertexts on GPU/FPGA/ASIC (AGIC)
Rule: a single parameter pack yields the same ciphertext outputs everywhere.
Ciphertext verification: re-execution now, VC later
Re-execution: same inputs, parameters, and seeds → bit-for-bit identical results; no extra consensus.
Long-term VC: proofs (e.g., ZK) replace heavy re-execution for cheap verification.
Policy: allow a re-execution window first; store proofs for long-term validation.
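Re-execution checking can be sketched as follows. Because the kernel is deterministic and serialization is fixed, a verifier only needs to rerun the computation and compare digests. `toy_kernel` is a stand-in (coefficient-wise addition mod an assumed 16-bit prime), not an FHE16 operation, and the serialization format is likewise an assumption.

```python
import hashlib

Q = 12289  # illustrative 16-bit prime modulus (assumption)


def toy_kernel(ct_a, ct_b):
    """Stand-in for a deterministic homomorphic op: add coefficients mod Q."""
    return [(x + y) % Q for x, y in zip(ct_a, ct_b)]


def serialize(ct):
    """Fixed serialization: each coefficient as 2 little-endian bytes."""
    return b"".join(c.to_bytes(2, "little") for c in ct)


def digest(ct):
    return hashlib.sha256(serialize(ct)).hexdigest()


def verify_by_reexecution(ct_a, ct_b, claimed_digest):
    """A verifier reruns the kernel and accepts only a byte-identical result."""
    return digest(toy_kernel(ct_a, ct_b)) == claimed_digest


ct_a = [123, 4567, 12288, 0]
ct_b = [456, 9000, 1, 12288]

worker_digest = digest(toy_kernel(ct_a, ct_b))     # published by the worker
print(verify_by_reexecution(ct_a, ct_b, worker_digest))

tampered = toy_kernel(ct_a, ct_b)
tampered[0] ^= 1                                   # flip one bit of the output
print(verify_by_reexecution(ct_a, ct_b, digest(tampered)))
```

A single flipped bit changes the digest, so any honest re-executor detects a wrong result without further coordination.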
Security note
Determinism relies on public seed derivation (e.g., block header/VRF/tx-hash) and integer-only kernels.
FHE16 pins its PRG policy (a fixed, specified generator) and bans floating-point math in core kernels.
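One way such a seed policy could look is sketched below: a seed is derived from public chain data with SHA-256 and expanded into values modulo a 16-bit prime using only integer operations. The domain-separation tag, the counter-mode expansion, the rejection-sampling rule, and the function names are all assumptions for illustration; the article does not specify FHE16's actual PRG.

```python
import hashlib

Q = 12289  # illustrative 16-bit prime modulus (assumption)


def derive_seed(block_header, tx_hash):
    """Derive a public seed from chain data (hypothetical tag and inputs)."""
    return hashlib.sha256(b"fhe16-demo-seed" + block_header + tx_hash).digest()


def expand_mod_q(seed, count, q=Q):
    """Expand a seed into `count` uniform values in [0, q), integers only."""
    limit = (1 << 16) // q * q   # reject words >= limit to avoid modulo bias
    out, counter = [], 0
    while len(out) < count:
        block = hashlib.sha256(seed + counter.to_bytes(4, "little")).digest()
        for i in range(0, len(block), 2):
            word = int.from_bytes(block[i:i + 2], "little")
            if word < limit and len(out) < count:
                out.append(word % q)
        counter += 1
    return out


seed = derive_seed(b"example-block-header", b"example-tx-hash")
values_node_a = expand_mod_q(seed, 16)
values_node_b = expand_mod_q(seed, 16)   # a second node repeats the derivation
print(values_node_a == values_node_b)
```

Both nodes obtain the same sequence because every step is a hash or an integer operation with a fixed byte order.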
Roadmap (Draft)
Phase   Focus                  Description
0       CPU Integer-NTT Core   16-bit RNS architecture, integer NTT kernels, fixed serialization. Benchmarked vs TFHE family with deterministic verification.