FHE16: End-to-End Homomorphic Encryption That Runs Everywhere

Description: A deterministic 16‑bit integer‑NTT FHE stack designed for portability—from Arduino‑class MCUs to servers and accelerators—while keeping the same parameters and byte‑identical ciphertext outputs across devices.

Why it matters: FHE lets blockchains program when (or whether) data is opened and, without a TEE, enables decentralized, fully end-to-end encrypted compute and verification.

Core design: 16-bit arithmetic plus integer-only NTT = deterministic, portable execution with the same parameters and the same ciphertext outputs on MCU → laptop → server → accelerator.

Blockchain-friendly: Re-execution gives simple verification/consensus; in the long term, we plan to add verifiable computation (VC) proofs to make verification much cheaper than re-execution.

Why Homomorphic Encryption?

Control the blockchain “open time”

In many on-chain apps (auctions, orderflow, private analytics), when data is revealed has economic and security consequences. Commit-reveal and time-lock are indirect fixes. FHE goes further: it lets you compute without opening plaintext.

Outcomes
- Run matching/price logic on ciphertext, and program if/when to reveal.
- Private settlement: accounting and risk calculations proceed while plaintext disclosure is delayed, or skipped entirely, per policy.
- Schedulable disclosure: integrate threshold decryption or escrowed-key policies to control reveal timing, which makes it practical to bring off-chain financial use cases on-chain.

Note: FHE is not “delay the reveal.” It is compute without reveal, so you can precisely design what/when/how much to disclose.
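To make "compute without reveal" concrete, here is a minimal sketch of adding two encrypted values without decrypting either one. It is a toy LWE-style scheme written for illustration only: the parameters (`Q`, `T`, `N`), the noise range, and all function names are assumptions, it is not the FHE16 construction, and it is nowhere near secure at this size.

```python
import random

Q = 12289        # ciphertext modulus (a 16-bit prime, chosen for illustration)
T = 16           # plaintext modulus: messages are integers in [0, 16)
N = 32           # toy secret-key length, far too small for real security
DELTA = Q // T   # scaling factor that lifts the message above the noise

def keygen(rng):
    # Binary secret key.
    return [rng.randrange(2) for _ in range(N)]

def encrypt(m, key, rng):
    # Ciphertext (a, b) with b = <a, key> + DELTA*m + small noise (mod Q).
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-2, 2)
    b = (sum(ai * si for ai, si in zip(a, key)) + DELTA * m + e) % Q
    return a, b

def add(ct1, ct2):
    # Homomorphic addition: component-wise sum, no key required.
    a = [(x + y) % Q for x, y in zip(ct1[0], ct2[0])]
    return a, (ct1[1] + ct2[1]) % Q

def decrypt(ct, key):
    a, b = ct
    noisy = (b - sum(ai * si for ai, si in zip(a, key))) % Q
    # Round noisy * T / Q to the nearest integer, then reduce mod T.
    return ((noisy * T + Q // 2) // Q) % T

rng = random.Random(0)
key = keygen(rng)
ct_sum = add(encrypt(5, key, rng), encrypt(9, key, rng))
print(decrypt(ct_sum, key))  # 14
```

The party calling `add` never sees 5, 9, or 14; only the key holder can open the result, which is what lets a policy decide if and when that happens.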

The ultimate End-to-End

Messaging E2EE protects transit. FHE extends E2E to the compute phase: storage, transport, and computation all stay encrypted.

Decentralized compute & verification—unlike TEE

TEEs centralize trust in vendor hardware/attestation and are operationally concentrated.
With FHE:
- Anyone (including small devices) can contribute encrypted computation.
- Verification happens at the protocol level, via re-execution or VC (verifiable computation, e.g. zk proofs), with no vendor root of trust.

FHE16: Vision & Architecture of LatticA

Vision: Build an FHE stack that is fast, deterministic, and portable enough to run even on Arduino-class devices, while staying compatible across all machines.

16-bit arithmetic for low-end environments

Why 16-bit? MCUs have narrow registers and ALUs and very little memory.
Approach: compose multiple 16-bit prime moduli via RNS (residue number system) to reach the target security/precision; keep bootstrap/key-switch paths 16-bit friendly.

Tip (RNS & NTT): Split a big modulus into several 16-bit primes, accelerate polynomial ops with an NTT per prime, then recombine with the CRT. You get high effective precision with tiny word sizes.
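The RNS/CRT idea can be sketched in a few lines. The three primes below are an assumption picked for illustration (each is below 2^16 and of the form k·2^m + 1, which suits NTTs); they are not FHE16's actual parameter set, and the helper names are hypothetical.

```python
from math import prod

# Hypothetical 16-bit, NTT-friendly primes (not the real FHE16 parameters).
PRIMES = [12289, 40961, 61441]
Q = prod(PRIMES)  # effective modulus, roughly 2^44.8

def to_rns(x):
    # One small residue per prime; each fits in a 16-bit word.
    return [x % p for p in PRIMES]

def rns_mul(a, b):
    # Multiply channel by channel; no channel ever needs the big modulus.
    return [(x * y) % p for x, y, p in zip(a, b, PRIMES)]

def from_rns(residues):
    # CRT recombination: sum of r_i * M_i * (M_i^-1 mod p_i), reduced mod Q.
    total = 0
    for r, p in zip(residues, PRIMES):
        m = Q // p
        total += r * m * pow(m, -1, p)
    return total % Q

a, b = 123456789, 987654
product = from_rns(rns_mul(to_rns(a), to_rns(b)))
print(product == (a * b) % Q)  # True
```

Each channel's residues fit in 16 bits, and their products fit in 32 bits before reduction, which is the property that makes this layout attractive on small MCUs. `pow(m, -1, p)` requires Python 3.8 or later.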

NTT instead of FFT: identical results on every device

Floating-point FFT results can differ across platforms (e.g., AVX2 vs AVX-512 code paths, compiler and CPU differences) because rounding and operation order vary.
FHE16 uses integer-only NTT, avoiding FP rounding. Same inputs + same parameters + same seeds → byte-identical ciphertexts.
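As a rough illustration of an integer-only NTT, the sketch below multiplies two polynomials modulo a single 16-bit prime using nothing but integer arithmetic. The prime 12289, the length 8, the cyclic (rather than negacyclic) convolution, and the recursive structure are all simplifying assumptions; production kernels are iterative and tuned per target.

```python
P = 12289  # hypothetical 16-bit prime; P - 1 = 2^12 * 3
N = 8      # transform length, must be a power of two dividing P - 1

def find_generator(p, prime_factors):
    # g generates Z_p^* iff g^((p-1)/f) != 1 for every prime factor f of p-1.
    for g in range(2, p):
        if all(pow(g, (p - 1) // f, p) != 1 for f in prime_factors):
            return g

def ntt(a, w):
    # Recursive radix-2 transform; w must be a primitive len(a)-th root of unity.
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], w * w % P)
    odd = ntt(a[1::2], w * w % P)
    out, t = [0] * n, 1
    for k in range(n // 2):
        v = t * odd[k] % P
        out[k] = (even[k] + v) % P
        out[k + n // 2] = (even[k] - v) % P
        t = t * w % P
    return out

def intt(a, w):
    # Inverse transform: use w^-1, then scale by n^-1.
    n_inv = pow(len(a), -1, P)
    return [x * n_inv % P for x in ntt(a, pow(w, -1, P))]

def cyclic_mul(a, b):
    # Schoolbook cyclic convolution, used only as a reference.
    out = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % N] = (out[(i + j) % N] + x * y) % P
    return out

w = pow(find_generator(P, [2, 3]), (P - 1) // N, P)
a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
fast = intt([x * y % P for x, y in zip(ntt(a, w), ntt(b, w))], w)
print(fast)  # [5, 16, 34, 52, 45, 28, 0, 0]
```

Every intermediate value is an exact integer below `P`, so the result does not depend on instruction set, compiler flags, or evaluation order.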

Test code: https://github.com/waLLLnut/CheckingConsistencyTFHECiphertext_AVX2_VS_AVX512

FP rounding differences cause ciphertext divergence. FHE16’s integer-NTT path eliminates this at the design level.
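The underlying effect is easy to reproduce without any FHE library. The short sketch below (values chosen arbitrarily for illustration) shows that floating-point addition depends on evaluation order, which is exactly what differs between vectorized code paths, while modular integer addition does not.

```python
# Floating point: regrouping the same three terms changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)        # False
print(left, right)          # 0.6000000000000001 0.6

# Modular integers: any grouping gives the same residue.
P = 12289                   # hypothetical 16-bit prime
a, b, c = 7000, 9000, 11000
int_left = ((a + b) % P + c) % P
int_right = (a + (b + c) % P) % P
print(int_left == int_right)  # True
```

A wider SIMD unit typically sums in a different order than a narrower one, so a floating-point FFT can legitimately return slightly different values on two machines; an integer NTT has no such freedom.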

Server-class CPU performance (updated)

Environment: Intel(R) Xeon(R) Gold 6240R @ 2.40 GHz · 96 threads · AVX-512

Notes: Values are taken from the provided screenshot of Table 3. Multiplication and Division/Modulo were marked “Under Investigation” and are excluded here. Unit: milliseconds (avg latency).

| Operation | TFHE-rs | FHE16 (ours) | Speedup (tfhe-rs / ours) |
|---|---|---|---|
| Negation (–) | 148.57 ms | 66.46 ms | 2.23× |
| Add / Sub (+, –) | 182.48 ms | 94.80 ms | 1.92× |
| ABS | 246.05 ms | 69.71 ms | 3.52× |
| Equal / Not Equal (eq, ne) | 139.56 ms | 74.77 ms | 1.86× |
| Comparisons (ge, gt, le, lt) | 180.29 ms | 88.92 ms | 2.02× |
| Max / Min (max, min) | 256.00 ms | 101.90 ms | 2.51× |
| Bitwise (&, \|, ^) | 40.90 ms | 21.24 ms | 1.92× |
| Select | 64.41 ms | 30.63 ms | 2.10× |

Table 1. Performance comparison on 64-bit integer ops

“Our FHE16 supports computation on inputs ranging from 1 bit to 256 bits.”

Summary: Across the eight ops with a baseline, the mean speedup is ≈ 2.26×, with ABS peaking at 3.52×.
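The summary figures can be re-derived from the latencies in Table 1. The sketch below only reuses the numbers already listed; the dictionary layout and variable names are illustrative. Recomputing the ratios suggests the table's speedups are truncated, not rounded, to two decimals, so the script truncates as well.

```python
# (TFHE-rs latency, FHE16 latency) in milliseconds, copied from Table 1.
LATENCY_MS = {
    "Negation": (148.57, 66.46),
    "Add / Sub": (182.48, 94.80),
    "ABS": (246.05, 69.71),
    "Equal / Not Equal": (139.56, 74.77),
    "Comparisons": (180.29, 88.92),
    "Max / Min": (256.00, 101.90),
    "Bitwise": (40.90, 21.24),
    "Select": (64.41, 30.63),
}

# Speedup = baseline latency / our latency.
exact = {op: base / ours for op, (base, ours) in LATENCY_MS.items()}
# Truncate to two decimals to match how the table reports the values.
truncated = {op: int(r * 100) / 100 for op, r in exact.items()}

mean_truncated = sum(truncated.values()) / len(truncated)
mean_exact = sum(exact.values()) / len(exact)
best = max(truncated, key=truncated.get)

print(f"mean of listed speedups: {mean_truncated:.2f}x")
print(f"mean of unrounded ratios: {mean_exact:.4f}x")
print(f"largest speedup: {best} at {truncated[best]:.2f}x")
```

The mean of the listed speedups comes out at 2.26×, and the mean of the unrounded ratios is marginally higher, so the reported summary is consistent with the table.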

One parameter set, same ciphertexts on GPU/FPGA/ASIC (AGIC)

Rule: a single parameter pack yields the same ciphertext outputs everywhere.
GPU: bandwidth-aware NTT/CRT kernels, stream parallelism.
FPGA: pipelined modular mult + butterflies, optimized for latency/power.
ASIC/AGIC: fixed-precision modular engines and on-chip memory tiers for minimal latency.
Result: identical ciphertexts → ideal for consensus/verification.

Open participation → truly decentralized compute

Reference kernels for light nodes, browsers (WASM), and mobile.
Operators submit ciphertext results; verifiers re-execute or check VC proofs; incentives reward both.

Ciphertext verification: re-execution now, VC later

Re-execution: same inputs, parameters, and seeds → bit-for-bit identical results; no extra consensus.
Long-term VC: proofs (e.g., ZK) replace heavy re-execution for cheap verification.
Policy: allow a re-execution window first; store proofs for long-term validation.

Security note: Determinism relies on public seed derivation (e.g., from a block header, VRF output, or tx hash) and integer-only kernels. FHE16 fixes the PRG policy and bans floating-point math in core kernels.
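The re-execution flow can be sketched as follows, under loudly stated assumptions: the domain-separation tag, the SHA-256/SHAKE-128 choices, the 2-byte little-endian serialization, and every function name are hypothetical, and `evaluate` is a stand-in for a real homomorphic kernel. Only the shape of the protocol is the point: public seed, deterministic integer computation, fixed serialization, digest comparison.

```python
import hashlib

P = 12289  # hypothetical 16-bit prime modulus

def derive_seed(block_hash, tx_hash):
    # Public, reproducible seed: anyone with the chain data gets the same bytes.
    return hashlib.sha256(b"fhe16-demo/seed" + block_hash + tx_hash).digest()

def expand(seed, n):
    # Fixed PRG policy: SHAKE-128 stream, two bytes per value, reduced mod P.
    # (A real sampler would use rejection sampling to avoid modulo bias.)
    stream = hashlib.shake_128(seed).digest(2 * n)
    return [int.from_bytes(stream[2 * i:2 * i + 2], "little") % P for i in range(n)]

def evaluate(ct_in, seed):
    # Stand-in for an integer-only homomorphic kernel, with fixed serialization.
    mask = expand(seed, len(ct_in))
    out = [(c + m) % P for c, m in zip(ct_in, mask)]
    return b"".join(x.to_bytes(2, "little") for x in out)

def verify(claimed_digest, ct_in, block_hash, tx_hash):
    # Verifier re-executes and compares digests byte for byte.
    recomputed = evaluate(ct_in, derive_seed(block_hash, tx_hash))
    return hashlib.sha256(recomputed).digest() == claimed_digest

block_hash, tx_hash = bytes(32), bytes([1]) * 32
ct_in = [1, 2, 3, 4]
claimed = hashlib.sha256(evaluate(ct_in, derive_seed(block_hash, tx_hash))).digest()
print(verify(claimed, ct_in, block_hash, tx_hash))         # True
print(verify(claimed, [1, 2, 3, 5], block_hash, tx_hash))  # False
```

Because every step is integer-only and the seed is derived from public data, two honest nodes produce the same bytes, so agreement reduces to an equality check on digests.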

Roadmap (Draft)

| Phase | Focus | Description |
|---|---|---|
|  | CPU Integer-NTT Core | 16-bit RNS architecture, integer NTT kernels, fixed serialization. Benchmarked vs TFHE family with deterministic verification. |
|  | Embedded / Mobile | Arduino & MCU reference builds, ARM-NEON optimization, WASM kernel. |
|  | GPU / FPGA | Stream-parallel NTT, on-chip pipeline optimization, memory tier tuning. |
|  | VC (Verifiable Computation) | Introduce re-execution verification windows → VC proof migration. Proof-friendly circuit definitions. |
|  | ASIC / AGIC | Low-power modular engines, cache-optimized pipelines, full timing closure. |

Public & Private Research Outputs — Hashed Keyword Registry (SHA-256)

Keyword:0c2dec8c7e6208a58dc99e6f5155a88bd9ee1061c30f3b8b854bc7027ad5c278
- Keyword Open: FHE16
- Summary: Uses 16-bit primes to accelerate homomorphic computation on both low-end devices and servers.
- Status: Open
Keyword:37be6a2215921f8e418f27caf7fcdb195d304c58ddc8119b75a63bee358d94f7
- Keyword Open: Actively secure one-bit sampling over secret sharing with a composite modulus
- Summary: Enables efficient Multiparty FHE in preprocessing settings for FHE16, CKKS, and BGV.
- Status: Open
Keyword:f321ce2f5032c6d408f553606755b51378366c99adfa37337c95c1a330577139
- Keyword Open: —
- Summary: Current benchmarks show FHE16 ≥ 2× ZAMA on integer operations; public release planned in November.
Keyword:e4462274db0498727f14b6ae254c32bbe0b1eda2f61c192ef71c896c88e7b7f5
- Keyword Open: —
- Summary: Core FHE16 technique expected to reduce error probability to 2^-128 while improving speed; paper to be posted on arXiv first.
Keyword:ffc2d4c64b8a683cc44cc98f910d2d9f7d7d873668f10b0bae14d025421016d8
- Keyword Open: —
- Summary: FHE-based MPC expected to significantly improve verification asymptotic complexity.
Keyword:6c0214904ca1f25da77c4db533603d793f3c1cb335086e84f918ea10677c45d5
- Keyword Open: —
- Summary: FHE-based MPC expected to substantially improve blockchain performance.
Keyword:d5ecfb2451e705ba59754f815b2084693aa251dda15a650785bb2eddf97f5de2
- Keyword Open: —
- Summary: FHE-based MPC expected to dramatically reduce on-chain latency.
Keyword:8ab272949f6b7bd5dc48830d631a234bc1dd1bf3e82f8298c944f9d65a56a9a0
- Keyword Open: —
- Summary: Newly proposed bootstrapping expected to slightly reduce error versus prior work.
Keyword:fed0b54c1b6928c74384a5998b61465adfb72bb5bb6d176a96f57fb5a694ed17
- Keyword Open: —
- Summary: Transciphering improvements.
