FHE16: End-to-End Homomorphic Encryption That Runs Everywhere

Description: A deterministic 16‑bit integer‑NTT FHE stack designed for portability—from Arduino‑class MCUs to servers and accelerators—while keeping the same parameters and byte‑identical ciphertext outputs across devices.

Why it matters: FHE lets blockchain applications program when (or whether) data is opened and, without relying on a TEE, enables decentralized, fully end-to-end encrypted compute and verification.

Core design: 16-bit arithmetic plus integer-only NTT = deterministic, portable execution with the same parameters and the same ciphertext outputs on MCU → laptop → server → accelerator.

Blockchain-friendly: Re-execution gives simple verification and consensus today; in the long term, we plan to add verifiable-computation (VC) proofs to make verification much cheaper.


Why Homomorphic Encryption?

Control the blockchain “open time”

In many on-chain apps (auctions, orderflow, private analytics), the moment at which data is revealed has economic and security consequences. Commit-reveal and time-lock schemes are indirect fixes: they only postpone disclosure. FHE goes further: it lets you compute without opening plaintext.

  • Outcomes

    • Run matching/price logic on ciphertext, and program if/when to reveal.

    • Private settlement: accounting and risk checks proceed on ciphertext while plaintext disclosure is delayed, or skipped, per policy.

    • Schedulable disclosure: FHE integrates with threshold decryption or escrowed-key policies to control reveal timing, which helps onboard off-chain financial use cases.

Note: FHE is not “delay the reveal.” It is compute without reveal, so you can precisely design what/when/how much to disclose.
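
To make "compute without reveal" concrete, here is a minimal sketch of an additively homomorphic, LWE-style toy scheme. It is not FHE16 and it is not secure: the modulus, dimension, noise range, and every function name are illustrative assumptions, chosen only so that the arithmetic fits in 16-bit words.

```python
import random

Q = 1 << 16        # toy ciphertext modulus (assumption: fits 16-bit words)
T = 1 << 10        # toy plaintext modulus: messages live in [0, 1024)
DELTA = Q // T     # scaling factor that lifts a message above the noise
N = 16             # toy secret dimension (far too small to be secure)


def keygen(rng):
    """Sample a binary secret key."""
    return [rng.randrange(2) for _ in range(N)]


def encrypt(m, key, rng):
    """Return (a, b) with b = <a, key> + DELTA*m + e (mod Q)."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-2, 2)                      # small noise term
    b = (sum(x * s for x, s in zip(a, key)) + DELTA * m + e) % Q
    return a, b


def add(c1, c2):
    """Homomorphic addition: add ciphertexts component-wise mod Q."""
    a = [(x + y) % Q for x, y in zip(c1[0], c2[0])]
    return a, (c1[1] + c2[1]) % Q


def decrypt(c, key):
    """Remove the mask <a, key>, then round away the noise."""
    a, b = c
    phase = (b - sum(x * s for x, s in zip(a, key))) % Q
    return ((phase + DELTA // 2) // DELTA) % T


rng = random.Random(2024)          # fixed seed so the run is repeatable
key = keygen(rng)
c_sum = add(encrypt(123, key, rng), encrypt(456, key, rng))
result = decrypt(c_sum, key)       # add() never used the key or the plaintexts
print(result)
```

The party calling `add` only handles ciphertexts; the key holder decides if and when to call `decrypt`, which here recovers 123 + 456 = 579.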

The ultimate End-to-End

E2EE messaging protects data in transit. FHE extends E2E to the compute phase: storage, transport, and computation all stay encrypted.

Decentralized compute & verification—unlike TEE

  • TEEs centralize trust in vendor hardware/attestation and are operationally concentrated.

  • With FHE:

    • Anyone (including small devices) can contribute encrypted computation.

    • Verification happens at the protocol level, via re-execution or VC (ZK proofs, etc.), with no vendor root of trust.


FHE16: Vision & Architecture of LatticA

Vision: Build an FHE stack that is fast, deterministic, and portable enough to run even on Arduino-class devices, while staying compatible across all machines.

16-bit arithmetic for low-end environments

  • Why 16-bit? MCUs have narrow registers and ALUs and very little memory.

  • Approach: compose multiple 16-bit prime moduli via RNS to reach target security/precision; keep bootstrap/key-switch paths 16-bit friendly.

Tip (RNS & NTT): Split a big modulus into several 16-bit primes, accelerate polynomial ops with a per-prime NTT, then recombine with the CRT. You get high effective precision with tiny word sizes.
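
The residue-number-system idea can be sketched in a few lines. The three primes below are an illustrative choice of primes under 2^16, not FHE16's actual parameter set, and the function names are hypothetical.

```python
from math import prod

# Three primes below 2**16 (illustrative, not FHE16's real parameters).
MODULI = [12289, 40961, 61441]
Q = prod(MODULI)   # effective modulus, roughly 45 bits


def to_rns(x):
    """Split an integer into one small residue per prime."""
    return [x % p for p in MODULI]


def rns_mul(xs, ys):
    """Multiply residue-wise; operands are 16-bit, products fit in 32 bits."""
    return [(a * b) % p for a, b, p in zip(xs, ys, MODULI)]


def from_rns(residues):
    """Recombine residues with the Chinese Remainder Theorem."""
    total = 0
    for r, p in zip(residues, MODULI):
        q_i = Q // p                         # product of the other primes
        total += r * q_i * pow(q_i, -1, p)   # pow(x, -1, p): inverse mod p
    return total % Q


x, y = 123_456_789, 987_654_321
product = from_rns(rns_mul(to_rns(x), to_rns(y)))
print(product == (x * y) % Q)
```

Each lane works independently on small words, which is what makes the layout attractive for narrow ALUs; only the final CRT step touches wide integers.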

NTT instead of FFT: identical results on every device

  • Floating-point FFT results can differ across platforms (AVX2 vs AVX-512, compiler and CPU differences), because rounding depends on operation order and instruction selection.

  • FHE16 uses integer-only NTT, avoiding FP rounding. Same inputs + same parameters + same seeds → byte-identical ciphertexts.
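
As a minimal sketch of an integer-only transform, the following multiplies two small polynomials through an NTT and checks the result against a schoolbook product. It assumes the prime 12289 and length 8 purely for illustration, uses a naive O(n^2) transform for clarity, and works in the cyclic ring x^n - 1; production schemes typically use a negacyclic variant with an O(n log n) butterfly network.

```python
Q = 12289   # NTT-friendly prime below 2**16: Q - 1 = 2**12 * 3 (illustrative)
N = 8       # transform length, a power of two dividing Q - 1


def find_root(n, q):
    """Find a primitive n-th root of unity mod q (n a power of two)."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if pow(w, n // 2, q) == q - 1:   # w^(n/2) = -1, so the order is n
            return w
    raise ValueError("no primitive root found")


def ntt(values, w, q):
    """Naive O(n^2) number-theoretic transform; integer-only throughout."""
    n = len(values)
    return [sum(v * pow(w, i * j, q) for j, v in enumerate(values)) % q
            for i in range(n)]


def intt(values, w, q):
    """Inverse transform: use w^-1 and scale by n^-1 mod q."""
    n_inv = pow(len(values), -1, q)
    return [(x * n_inv) % q for x in ntt(values, pow(w, -1, q), q)]


def cyclic_mul(a, b, q):
    """Schoolbook product in Z_q[x]/(x^n - 1), used as a reference."""
    n = len(a)
    out = [0] * n
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % n] = (out[(i + j) % n] + x * y) % q
    return out


W = find_root(N, Q)
a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 8, 0, 0, 0, 0]
pointwise = [(x * y) % Q for x, y in zip(ntt(a, W, Q), ntt(b, W, Q))]
via_ntt = intt(pointwise, W, Q)
print(via_ntt == cyclic_mul(a, b, Q))
```

Every intermediate value is an exact integer mod Q, so there is no rounding step that could differ between machines.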

Fig. 1. Ciphertext output after homomorphically adding two pre-stored ciphertexts (123, 456) on a Skylake server (AVX-512).
Fig. 2. Ciphertext output after homomorphically adding two pre-stored ciphertexts (123, 456) on a Haswell server (AVX2).

Test code: https://github.com/waLLLnut/CheckingConsistencyTFHECiphertext_AVX2_VS_AVX512

FP rounding differences cause ciphertext divergence. FHE16’s integer-NTT path eliminates this at the design level.
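
The mechanism can be illustrated without any FHE library. The sketch below is not a reproduction of the linked test; it only mimics a reduction that uses a different number of partial accumulators, as a wider SIMD unit would, and compares floating-point addition with modular integer addition. The helper names and the modulus are assumptions.

```python
Q = 12289  # illustrative prime modulus below 2**16


def lane_sum(values, lanes, add):
    """Sum with `lanes` partial accumulators, mimicking a SIMD reduction."""
    partial = [values[i] for i in range(lanes)]
    for i in range(lanes, len(values)):
        partial[i % lanes] = add(partial[i % lanes], values[i])
    total = partial[0]
    for p in partial[1:]:
        total = add(total, p)
    return total


def float_add(x, y):
    return x + y


def mod_add(x, y):
    return (x + y) % Q


floats = [1e16, 1.0, -1e16, 1.0]
ints = [int(v) % Q for v in floats]

float_narrow = lane_sum(floats, 1, float_add)   # sequential order
float_wide = lane_sum(floats, 2, float_add)     # two interleaved lanes
int_narrow = lane_sum(ints, 1, mod_add)
int_wide = lane_sum(ints, 2, mod_add)

print(float_narrow, float_wide)   # the two float results disagree
print(int_narrow, int_wide)       # the two modular results agree
```

Floating-point addition is not associative, so regrouping the same inputs changes the answer; modular integer addition is associative, so any grouping gives the same residue.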

Fig. 3. In the GPU backend developed by ZAMA, the underlying algorithm itself changes, so GPU and CPU ciphertexts are not interchangeable. (https://docs.zama.ai/tfhe-rs/1.0/configuration/run_on_gpu)

Server-class CPU performance (updated)

Environment: Intel(R) Xeon(R) Gold 6240R @ 2.40 GHz · 96 threads · AVX-512

Notes: Values are taken from the provided screenshot of Table 3. Multiplication and Division/Modulo were marked “Under Investigation” and are excluded here. Unit: milliseconds (avg latency).

Operation                    | TFHE-rs   | FHE16 (ours) | Speedup (TFHE-rs / ours)
Negation (–)                 | 148.57 ms | 66.46 ms     | 2.23×
Add / Sub (+, –)             | 182.48 ms | 94.80 ms     | 1.92×
ABS                          | 246.05 ms | 69.71 ms     | 3.52×
Equal / Not Equal (eq, ne)   | 139.56 ms | 74.77 ms     | 1.86×
Comparisons (ge, gt, le, lt) | 180.29 ms | 88.92 ms     | 2.02×
Max / Min (max, min)         | 256.00 ms | 101.90 ms    | 2.51×
Bitwise (&, |, ^)            | 40.90 ms  | 21.24 ms     | 1.92×
Select                       | 64.41 ms  | 30.63 ms     | 2.10×

Table 1. Performance comparison on 64-bit integer ops

Note: FHE16 supports computation on inputs ranging from 1 bit to 256 bits.

Summary: Across the eight ops with a baseline, the mean speedup is ≈ 2.26×, with ABS peaking at 3.52×.
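
The summary figures can be rechecked from the latencies in Table 1. The short script below copies those eight rows and recomputes each ratio, the mean, and the best case; the dictionary keys are shortened labels introduced here for convenience.

```python
# (TFHE-rs ms, FHE16 ms) per operation, copied from Table 1.
LATENCY_MS = {
    "Negation": (148.57, 66.46),
    "Add/Sub": (182.48, 94.80),
    "ABS": (246.05, 69.71),
    "Eq/Ne": (139.56, 74.77),
    "Comparisons": (180.29, 88.92),
    "Max/Min": (256.00, 101.90),
    "Bitwise": (40.90, 21.24),
    "Select": (64.41, 30.63),
}

# Speedup is baseline latency divided by FHE16 latency.
speedups = {op: base / ours for op, (base, ours) in LATENCY_MS.items()}
mean_speedup = sum(speedups.values()) / len(speedups)
best_op = max(speedups, key=speedups.get)

for op, s in speedups.items():
    print(f"{op:<12} {s:.3f}x")
print(f"mean {mean_speedup:.3f}x, best: {best_op}")
```

Ratios recomputed this way can differ from the table in the last printed digit, since the table's values appear to be truncated rather than rounded.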

One parameter set, same ciphertexts on GPU/FPGA/ASIC (AGIC)

  • Rule: a single parameter pack yields the same ciphertext outputs everywhere.

  • GPU: bandwidth-aware NTT/CRT kernels, stream parallelism.

  • FPGA: pipelined modular mult + butterflies, optimized for latency/power.

  • ASIC/AGIC: fixed-precision modular engines and on-chip memory tiers for minimal latency.

  • Result: identical ciphertexts → ideal for consensus/verification.
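
Byte-identical outputs also depend on a fixed wire encoding, since hosts differ in native byte order. FHE16's actual serialization format is not described here, so the layout below (unsigned 16-bit residues, little-endian, no header) is purely an assumption used to show the idea.

```python
import hashlib
import struct


def serialize(residues):
    """Encode 16-bit residues as little-endian bytes, independent of host CPU."""
    return struct.pack(f"<{len(residues)}H", *residues)


def deserialize(blob):
    """Decode the fixed little-endian layout back into integers."""
    count = len(blob) // 2
    return list(struct.unpack(f"<{count}H", blob))


def fingerprint(residues):
    """Hash of the canonical bytes; nodes can compare this instead of raw data."""
    return hashlib.sha256(serialize(residues)).hexdigest()


ciphertext = [1, 258, 65535]          # stand-in for ciphertext coefficients
blob = serialize(ciphertext)
print(blob.hex(), fingerprint(ciphertext)[:16])
```

Because the `<` prefix pins the byte order, an MCU, a laptop, and a server all produce the same bytes for the same residues, and therefore the same fingerprint.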

Open participation → truly decentralized compute

  • Reference kernels for light nodes, browsers (WASM), and mobile.

  • Operators submit ciphertext results; verifiers re-execute or check VC proofs; incentives reward both.

Ciphertext verification: re-execution now, VC later

  • Re-execution: same inputs, parameters, and seeds → bit-for-bit identical results, so verifiers can compare outputs directly with no extra consensus step.

  • Long-term VC: proofs (e.g., ZK) replace heavy re-execution for cheap verification.

  • Policy: allow a re-execution window first; store proofs for long-term validation.

Security note: Determinism relies on public seed derivation (e.g., block header/VRF/tx-hash) and integer-only kernels. FHE16 fixes PRG policies and bans FP math in core kernels.
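
A re-execution check along these lines can be sketched as follows. Everything specific here is an assumption: the domain label, the use of SHA-256 and SHAKE-128, the modulus, and the stand-in `evaluate` function, which simply adds a seed-derived mask instead of running a real encrypted computation. Which randomness FHE16 actually derives from public seeds is not specified in the text.

```python
import hashlib

Q = 12289                      # illustrative prime modulus below 2**16
LIMIT = Q * (65536 // Q)       # accept 16-bit samples below this bound


def derive_seed(block_hash, tx_hash):
    """Derive a public seed; the domain label is a hypothetical choice."""
    return hashlib.sha256(b"fhe16-demo-seed" + block_hash + tx_hash).digest()


def expand(seed, count):
    """Expand a seed into `count` integers in [0, Q) using SHAKE-128.

    Rejection sampling keeps the output uniform with integer-only steps.
    """
    nbytes = 4 * count
    while True:
        stream = hashlib.shake_128(seed).digest(nbytes)
        words = (int.from_bytes(stream[i:i + 2], "little")
                 for i in range(0, nbytes, 2))
        out = [w % Q for w in words if w < LIMIT]
        if len(out) >= count:
            return out[:count]
        nbytes *= 2            # XOF output is prefix-stable, so retrying is safe


def evaluate(inputs, seed):
    """Stand-in for an encrypted computation: add a seed-derived mask mod Q."""
    return [(x + r) % Q for x, r in zip(inputs, expand(seed, len(inputs)))]


def digest(values):
    """Hash a result vector in a fixed little-endian 16-bit encoding."""
    raw = b"".join(v.to_bytes(2, "little") for v in values)
    return hashlib.sha256(raw).hexdigest()


def verify(inputs, seed, claimed_digest):
    """Verifier re-executes the computation and compares digests."""
    return digest(evaluate(inputs, seed)) == claimed_digest


seed = derive_seed(b"\x11" * 32, b"\x22" * 32)
inputs = [123, 456, 789]
claimed = digest(evaluate(inputs, seed))                   # honest operator
tampered = digest([(v + 1) % Q for v in evaluate(inputs, seed)])
print(verify(inputs, seed, claimed), verify(inputs, seed, tampered))
```

The verifier needs only the public inputs and seed material; an honest submission matches on re-execution, while an altered one does not.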


Roadmap (Draft)

Phase | Focus                       | Description
0     | CPU Integer-NTT Core        | 16-bit RNS architecture, integer NTT kernels, fixed serialization. Benchmarked vs TFHE family with deterministic verification.
1     | Embedded / Mobile           | Arduino & MCU reference builds, ARM-NEON optimization, WASM kernel.
2     | GPU / FPGA                  | Stream-parallel NTT, on-chip pipeline optimization, memory tier tuning.
3     | VC (Verifiable Computation) | Introduce re-execution verification windows → VC proof migration. Proof-friendly circuit definitions.
4     | ASIC / AGIC                 | Low-power modular engines, cache-optimized pipelines, full timing closure.


Public & Private Research Outputs — Hashed Keyword Registry (SHA-256)

  1. Keyword:0c2dec8c7e6208a58dc99e6f5155a88bd9ee1061c30f3b8b854bc7027ad5c278

    • Keyword Open: FHE16

    • Summary: Uses 16-bit primes to accelerate homomorphic computation on both low-end devices and servers.

    • Status: Open

  2. Keyword:37be6a2215921f8e418f27caf7fcdb195d304c58ddc8119b75a63bee358d94f7

    • Keyword Open: Actively secure one-bit sampling over secret sharing with a composite modulus

    • Summary: Enables efficient Multiparty FHE in preprocessing settings for FHE16, CKKS, and BGV.

    • Status: Open

  3. Keyword:f321ce2f5032c6d408f553606755b51378366c99adfa37337c95c1a330577139

    • Keyword Open:

    • Summary: Current benchmarks show FHE16 ≥ 2× ZAMA on integer operations; public release planned in November.

  4. Keyword:e4462274db0498727f14b6ae254c32bbe0b1eda2f61c192ef71c896c88e7b7f5

    • Keyword Open:

    • Summary: Core FHE16 technique expected to reduce error probability to 2^-128 while improving speed; paper to be posted on arXiv first.

  5. Keyword:ffc2d4c64b8a683cc44cc98f910d2d9f7d7d873668f10b0bae14d025421016d8

    • Keyword Open:

    • Summary: FHE-based MPC expected to significantly improve verification asymptotic complexity.

  6. Keyword:6c0214904ca1f25da77c4db533603d793f3c1cb335086e84f918ea10677c45d5

    • Keyword Open:

    • Summary: FHE-based MPC expected to substantially improve blockchain performance.

  7. Keyword:d5ecfb2451e705ba59754f815b2084693aa251dda15a650785bb2eddf97f5de2

    • Keyword Open:

    • Summary: FHE-based MPC expected to dramatically reduce on-chain latency.

  8. Keyword:8ab272949f6b7bd5dc48830d631a234bc1dd1bf3e82f8298c944f9d65a56a9a0

    • Keyword Open:

    • Summary: Newly proposed bootstrapping expected to slightly reduce error versus prior work.

  9. Keyword:fed0b54c1b6928c74384a5998b61465adfb72bb5bb6d176a96f57fb5a694ed17

    • Keyword Open:

    • Summary: Transciphering improvements.
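
A registry like this works as a hash commitment: the digest is published first, and the keyword is checked against it once opened. The sketch below shows that check under the assumption that the preimage is the plain UTF-8 keyword with no salt or padding. The registry does not state its exact encoding, so the published digests may not match this convention.

```python
import hashlib


def commit(keyword):
    """Return the SHA-256 hex digest of a keyword (assumed UTF-8, unsalted)."""
    return hashlib.sha256(keyword.encode("utf-8")).hexdigest()


def matches(commitment_hex, revealed):
    """Check a revealed keyword against a previously published digest."""
    return commit(revealed) == commitment_hex.strip().lower()


# Self-contained demonstration with a made-up keyword.
published = commit("example research keyword")
print(matches(published, "example research keyword"))   # opened correctly
print(matches(published, "a different keyword"))        # does not match
```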
