FHE16: End-to-End Homomorphic Encryption That Runs Everywhere
Description: A deterministic 16‑bit integer‑NTT FHE stack designed for portability—from Arduino‑class MCUs to servers and accelerators—while keeping the same parameters and byte‑identical ciphertext outputs across devices.
Why it matters: FHE lets blockchains program when (or whether) data is opened and, without relying on a TEE, enables decentralized, fully end-to-end encrypted compute and verification.
Core design: 16-bit arithmetic plus integer-only NTT = deterministic, portable execution with the same parameters and the same ciphertext outputs on MCU → laptop → server → accelerator.
Blockchain-friendly: Re-execution gives simple verification/consensus today; in the long term, we plan to add verifiable computation (VC) proofs to make verification much cheaper.
Why Homomorphic Encryption?
Control the blockchain “open time”
In many on-chain apps (auctions, orderflow, private analytics), the moment at which data is revealed has economic and security consequences. Commit-reveal and time-lock schemes are indirect fixes. FHE goes further: it lets you compute without opening plaintext.
Outcomes
Run matching/price logic on ciphertext, and program if/when to reveal.
Private settlement: accounting and risk calculations proceed while plaintext disclosure is delayed, or skipped, per policy.
Schedulable disclosure: supports onboarding off-chain financial use cases by integrating threshold decryption or escrowed-key policies for controlled reveal timing.
Note: FHE is not “delay the reveal.” It is compute without reveal, so you can precisely design what/when/how much to disclose.
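To make "compute without reveal" concrete, here is a minimal sketch of an additively homomorphic toy scheme in the LWE style. It is not FHE16 and not secure: the dimension, noise range, 16-bit modulus, and all function names are assumptions chosen only so the example stays small. It shows two encrypted values being added without either one being decrypted.

```python
import random

Q = 1 << 16        # toy 16-bit ciphertext modulus (assumption, not an FHE16 parameter)
T = 16             # plaintext modulus: messages live in [0, 16)
DELTA = Q // T     # scaling factor that lifts the message above the noise
N = 64             # toy LWE dimension, far too small for real security

def keygen(rng):
    # Binary secret key.
    return [rng.randrange(2) for _ in range(N)]

def encrypt(sk, m, rng):
    # Ciphertext (a, b) with b = <a, sk> + noise + DELTA * m  (mod Q).
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-4, 4)
    b = (sum(ai * si for ai, si in zip(a, sk)) + e + DELTA * (m % T)) % Q
    return a, b

def add(ct1, ct2):
    # Homomorphic addition: component-wise sum, no key needed.
    a1, b1 = ct1
    a2, b2 = ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q

def decrypt(sk, ct):
    # Remove the mask, then round to the nearest multiple of DELTA.
    a, b = ct
    phase = (b - sum(ai * si for ai, si in zip(a, sk))) % Q
    return ((phase + DELTA // 2) // DELTA) % T

rng = random.Random(2024)
sk = keygen(rng)
bid_a = encrypt(sk, 5, rng)
bid_b = encrypt(sk, 9, rng)
total = add(bid_a, bid_b)      # computed on ciphertexts only
print(decrypt(sk, total))      # 14
```

Only the key holder learns the sum; the party running `add` sees ciphertexts and nothing else. A full FHE scheme extends this idea to arbitrary circuits.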
The ultimate End-to-End
Messaging E2EE protects transit. FHE extends E2E to the compute phase: storage, transport, and computation all stay encrypted.
Decentralized compute & verification—unlike TEE
TEEs centralize trust in vendor hardware/attestation and are operationally concentrated.
With FHE:
Anyone (including small devices) can contribute encrypted computation.
Verification is protocol-level, via re-execution or VC (zk proofs, etc.), with no vendor root of trust.
FHE16: Vision & Architecture of LatticA
Vision: Build an FHE stack that is fast, deterministic, and portable enough to run even on Arduino-class devices, while staying compatible across all machines.
16-bit arithmetic for low-end environments
Why 16-bit? MCUs have narrow registers, limited memory, and small ALUs.
Approach: compose multiple 16-bit prime moduli via RNS to reach target security/precision; keep bootstrap/key-switch paths 16-bit friendly.
Tip (RNS & NTT): Split a big modulus into several 16-bit primes, accelerate polynomial ops with an NTT per prime, then recombine with CRT. You get high effective precision with tiny word sizes.
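The split-and-recombine idea can be sketched in a few lines. The three primes below are well-known NTT-friendly primes under 2^16, picked for illustration; they are an assumption and not FHE16's actual parameter set, and the helper names are hypothetical.

```python
from math import prod

# Illustrative primes below 2**16 (assumption, not FHE16's real moduli).
PRIMES = [7681, 12289, 40961]
M = prod(PRIMES)  # effective modulus, roughly 42 bits

def to_rns(x):
    # One small residue per prime; each fits in a 16-bit word.
    return [x % p for p in PRIMES]

def rns_mul(a, b):
    # Multiply independently per prime; intermediates fit in 32 bits.
    return [(x * y) % p for x, y, p in zip(a, b, PRIMES)]

def from_rns(residues):
    # CRT recombination: sum of r_i * M_i * (M_i^-1 mod p_i), reduced mod M.
    total = 0
    for r, p in zip(residues, PRIMES):
        m_i = M // p
        total += r * m_i * pow(m_i, -1, p)
    return total % M

x, y = 123_456, 654_321
z = from_rns(rns_mul(to_rns(x), to_rns(y)))
print(z == x * y)  # True, because x * y < M
```

The result is exact only while the true value stays below the product of the primes, which is why a real parameter set composes enough primes to cover its target precision.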
NTT instead of FFT: identical results on every device
Floating-point FFT can diverge (AVX2 vs AVX-512, compiler/CPU differences).
FHE16 uses integer-only NTT, avoiding FP rounding. Same inputs + same parameters + same seeds → byte-identical ciphertexts.
Test code: https://github.com/waLLLnut/CheckingConsistencyTFHECiphertext_AVX2_VS_AVX512
FP rounding differences cause ciphertext divergence. FHE16’s integer-NTT path eliminates this at the design level.
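As a rough illustration of why an integer-only path is reproducible, the sketch below multiplies two small polynomials with a radix-2 NTT modulo the 16-bit-sized prime 12289, then hashes the serialized result. The prime, the length 8, the little-endian 16-bit serialization, and the function names are all assumptions for the example, not FHE16's kernels or wire format.

```python
import hashlib
import struct

P = 12289  # prime with P - 1 = 3 * 2**12 (illustrative choice)
N = 8      # transform length, a power of two dividing P - 1

def find_root(n, p):
    # Search for an element of exact order n (n a power of two).
    for g in range(2, p):
        w = pow(g, (p - 1) // n, p)
        if pow(w, n // 2, p) == p - 1:  # w^(n/2) == -1 means order is exactly n
            return w
    raise ValueError("no root of unity found")

def ntt(a, w, p):
    # Recursive radix-2 Cooley-Tukey transform using integers only.
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], w * w % p, p)
    odd = ntt(a[1::2], w * w % p, p)
    out = [0] * n
    t = 1
    for k in range(n // 2):
        v = t * odd[k] % p
        out[k] = (even[k] + v) % p
        out[k + n // 2] = (even[k] - v) % p
        t = t * w % p
    return out

def intt(a, w, p):
    # Inverse transform: use w^-1, then scale by n^-1.
    n_inv = pow(len(a), -1, p)
    return [x * n_inv % p for x in ntt(a, pow(w, -1, p), p)]

def cyclic_mul(a, b):
    # Polynomial product modulo x^N - 1 via pointwise multiplication.
    w = find_root(N, P)
    fa, fb = ntt(a, w, P), ntt(b, w, P)
    return intt([x * y % P for x, y in zip(fa, fb)], w, P)

a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
c = cyclic_mul(a, b)
print(c)  # [5, 16, 34, 52, 45, 28, 0, 0]

# Fixed serialization (little-endian 16-bit words) followed by a digest.
digest = hashlib.sha256(struct.pack("<%dH" % len(c), *c)).hexdigest()
print(digest)
```

Every step is exact modular arithmetic, so the digest depends only on the inputs and parameters, not on vector width, compiler, or rounding mode.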

Server-class CPU performance (updated)
Environment: Intel(R) Xeon(R) Gold 6240R @ 2.40 GHz · 96 threads · AVX-512
Notes: Values are taken from the provided screenshot of Table 3. Multiplication and Division/Modulo were marked “Under Investigation” and are excluded here. Unit: milliseconds (avg latency).
| Operation | Baseline (ms) | FHE16 (ms) | Speedup |
|---|---|---|---|
| Negation (–) | 148.57 | 66.46 | 2.23× |
| Add / Sub (+, –) | 182.48 | 94.80 | 1.92× |
| ABS | 246.05 | 69.71 | 3.52× |
| Equal / Not Equal (eq, ne) | 139.56 | 74.77 | 1.86× |
| Comparisons (ge, gt, le, lt) | 180.29 | 88.92 | 2.02× |
| Max / Min (max, min) | 256.00 | 101.90 | 2.51× |
| Bitwise (&, \|, ^) | 40.90 | 21.24 | 1.92× |
| Select | 64.41 | 30.63 | 2.10× |

Table 1. Performance comparison on 64-bit integer ops
“Our FHE16 supports computation on inputs ranging from 1 bit to 256 bits.”
Summary: Across the eight ops with a baseline, the mean speedup is ≈ 2.26×, with ABS peaking at 3.52×.
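The summary figure can be checked directly from the eight rows. In the sketch below, each pair is read as (baseline ms, FHE16 ms), which is an interpretation of the listing based on the speedup direction; the variable names are illustrative.

```python
# (baseline ms, FHE16 ms, reported speedup) per operation, as listed above.
rows = {
    "Negation":       (148.57, 66.46, 2.23),
    "Add / Sub":      (182.48, 94.80, 1.92),
    "ABS":            (246.05, 69.71, 3.52),
    "Equal / NotEq":  (139.56, 74.77, 1.86),
    "Comparisons":    (180.29, 88.92, 2.02),
    "Max / Min":      (256.00, 101.90, 2.51),
    "Bitwise":        (40.90, 21.24, 1.92),
    "Select":         (64.41, 30.63, 2.10),
}

reported = [s for _, _, s in rows.values()]
mean_reported = sum(reported) / len(reported)

# Recompute each ratio from the raw latencies for comparison.
recomputed = [base / fhe16 for base, fhe16, _ in rows.values()]
mean_recomputed = sum(recomputed) / len(recomputed)

best = max(rows, key=lambda name: rows[name][2])

print(round(mean_reported, 2))    # 2.26
print(round(mean_recomputed, 4))
print(best)                       # ABS
```

The mean of the reported speedups is 2.26. Recomputing the ratios from the latencies gives a slightly higher mean, because the per-row speedups appear to be truncated rather than rounded to two decimals.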
One parameter set, same ciphertexts on GPU/FPGA/ASIC (AGIC)
Rule: a single parameter pack yields the same ciphertext outputs everywhere.
GPU: bandwidth-aware NTT/CRT kernels, stream parallelism.
FPGA: pipelined modular mult + butterflies, optimized for latency/power.
ASIC/AGIC: fixed-precision modular engines and on-chip memory tiers for minimal latency.
Result: identical ciphertexts → ideal for consensus/verification.
Open participation → truly decentralized compute
Reference kernels for light nodes, browsers (WASM), and mobile.
Operators submit ciphertext results; verifiers re-execute or check VC proofs; incentives reward both.
Ciphertext verification: re-execution now, VC later
Re-execution: same inputs, parameters, and seeds → bit-for-bit identical results; no extra consensus.
Long-term VC: proofs (e.g., ZK) replace heavy re-execution for cheap verification.
Policy: allow a re-execution window first; store proofs for long-term validation.
Security note: Determinism relies on public seed derivation (e.g., block header/VRF/tx-hash) and integer-only kernels. FHE16 fixes PRG policies and bans FP math in core kernels.
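The re-execution flow could look roughly like the sketch below. The seed-derivation label, the SHAKE-256 expansion, the stand-in integer kernel, and the serialization are all assumptions made for illustration; FHE16's actual PRG policy and ciphertext format are not specified here.

```python
import hashlib
import struct

P = 12289  # illustrative 16-bit prime modulus (assumption)

def derive_seed(block_hash, tx_hash):
    # Public, reproducible seed from on-chain data (hypothetical domain label).
    return hashlib.sha256(b"demo-seed|" + block_hash + tx_hash).digest()

def prg_words(seed, count):
    # Expand the seed with SHAKE-256 into 16-bit words, reduced mod P.
    # The plain reduction has a small bias; acceptable for a sketch only.
    stream = hashlib.shake_256(seed).digest(2 * count)
    return [int.from_bytes(stream[2 * i:2 * i + 2], "little") % P
            for i in range(count)]

def evaluate(inputs, seed):
    # Stand-in for an integer-only kernel: same inputs and seed, same bytes.
    mask = prg_words(seed, len(inputs))
    out = [(3 * x + m) % P for x, m in zip(inputs, mask)]
    return struct.pack("<%dH" % len(out), *out)

def verify(inputs, seed, claimed):
    # A verifier re-executes and compares byte-for-byte.
    return evaluate(inputs, seed) == claimed

seed = derive_seed(bytes(32), bytes([0x11]) * 32)
result = evaluate([1, 2, 3, 4], seed)
tampered = bytes([result[0] ^ 1]) + result[1:]

print(verify([1, 2, 3, 4], seed, result))    # True
print(verify([1, 2, 3, 4], seed, tampered))  # False
```

Because the check is a plain byte comparison, any two honest nodes agree on the outcome without an extra agreement round, which is the property the re-execution policy relies on.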
Roadmap (Draft)
0. CPU Integer-NTT Core: 16-bit RNS architecture, integer NTT kernels, fixed serialization. Benchmarked vs TFHE family with deterministic verification.
1. Embedded / Mobile: Arduino & MCU reference builds, ARM-NEON optimization, WASM kernel.
2. GPU / FPGA: Stream-parallel NTT, on-chip pipeline optimization, memory tier tuning.
3. VC (Verifiable Computation): Introduce re-execution verification windows → VC proof migration. Proof-friendly circuit definitions.
4. ASIC / AGIC: Low-power modular engines, cache-optimized pipelines, full timing closure.
Public & Private Research Outputs: Hashed Keyword Registry (SHA-256)
Keyword: 0c2dec8c7e6208a58dc99e6f5155a88bd9ee1061c30f3b8b854bc7027ad5c278
Keyword Open: FHE16
Summary: Uses 16-bit primes to accelerate homomorphic computation on both low-end devices and servers.
Status: Open
Keyword: 37be6a2215921f8e418f27caf7fcdb195d304c58ddc8119b75a63bee358d94f7
Keyword Open: Actively secure one-bit sampling over secret sharing with a composite modulus
Summary: Enables efficient Multiparty FHE in preprocessing settings for FHE16, CKKS, and BGV.
Status: Open
Keyword: f321ce2f5032c6d408f553606755b51378366c99adfa37337c95c1a330577139
Keyword Open: -
Summary: Current benchmarks show FHE16 ≥ 2× ZAMA on integer operations; public release planned in November.
Keyword: e4462274db0498727f14b6ae254c32bbe0b1eda2f61c192ef71c896c88e7b7f5
Keyword Open: -
Summary: Core FHE16 technique expected to reduce error probability to 2^-128 while improving speed; paper to be posted on arXiv first.
Keyword: ffc2d4c64b8a683cc44cc98f910d2d9f7d7d873668f10b0bae14d025421016d8
Keyword Open: -
Summary: FHE-based MPC expected to significantly improve verification asymptotic complexity.
Keyword: 6c0214904ca1f25da77c4db533603d793f3c1cb335086e84f918ea10677c45d5
Keyword Open: -
Summary: FHE-based MPC expected to substantially improve blockchain performance.
Keyword: d5ecfb2451e705ba59754f815b2084693aa251dda15a650785bb2eddf97f5de2
Keyword Open: -
Summary: FHE-based MPC expected to dramatically reduce on-chain latency.
Keyword: 8ab272949f6b7bd5dc48830d631a234bc1dd1bf3e82f8298c944f9d65a56a9a0
Keyword Open: -
Summary: Newly proposed bootstrapping expected to slightly reduce error versus prior work.
Keyword: fed0b54c1b6928c74384a5998b61465adfb72bb5bb6d176a96f57fb5a694ed17
Keyword Open: -
Summary: Transciphering improvements.
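A registry like this works as a hash commitment: the digest is published first, and the keyword is opened later. The sketch below shows how a reader might check an opening. It assumes the digest is SHA-256 over the UTF-8 keyword with no salt or extra framing, which the registry does not confirm, so a mismatch against a real entry may only mean the preimage format differs. The function names and sample keyword are hypothetical.

```python
import hashlib
import hmac

def commit(keyword):
    # Assumed commitment: SHA-256 of the UTF-8 keyword, hex-encoded.
    return hashlib.sha256(keyword.encode("utf-8")).hexdigest()

def verify_opening(published_hex, keyword):
    # Constant-time comparison of the recomputed digest with the published one.
    return hmac.compare_digest(commit(keyword), published_hex.strip().lower())

# Self-contained demonstration with a made-up keyword.
published = commit("example keyword")
print(verify_opening(published, "example keyword"))  # True
print(verify_opening(published, "another keyword"))  # False
```

An unsalted commitment can be brute-forced when the keyword is short or guessable, so a production registry would typically commit to the keyword together with a random nonce.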