GPU-accelerated Equihash 192,7 miner in Rust with three solver backends:
- CPU: Wagner's algorithm, AVX2 packed slots (xenoncat-style)
- OpenCL: full on-GPU solve (kernels/equihash.cl); runs on NVIDIA and AMD
- CUDA: driver-API replay of miniZ's extracted fatbin (src/miniz/)

Also includes a default-off pearlhash backend (src/pearl/, native CPU core + NVRTC int8-GEMM GPU kernels) and a WIP Ethash CUDA backend (src/ethash/).

Reverse-engineering scratch (alpha-miner, pearl-dump/) and the active runtime config (mine.toml) are gitignored; mine.example.toml is the template.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
miniz-solver-rs
Basic Rust program that uses the extracted miniZ Equihash 192,7 GPU solver.
It loads the captured CUDA fatbin (../miniz-dump/solver_192_7/equihash192_7.fatbin)
through the CUDA Driver API (raw FFI to libcuda, no external crates) and drives
its kernels on the GPU.
Build & run
cargo build --release
./target/release/miniz-solver # load + enumerate all 57 kernels
./target/release/miniz-solver --launch # also execute a real solver kernel
./target/release/miniz-solver --round0 # replay round 0 (digit_f) with a captured midstate
./target/release/miniz-solver /path/to.fatbin # use a different fatbin
Requires an NVIDIA GPU + driver (/usr/lib/libcuda.so). The fatbin contains
sm_80/sm_86/sm_120 cubins; the driver auto-picks the one for your GPU.
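Before handing a file to cuModuleLoadData it can help to confirm it really is a fatbin. The sketch below checks the leading magic word 0xBA55ED50 that this README reports for the raw fatbin. It assumes the magic is stored as a little-endian u32 at offset 0; the helper name looks_like_fatbin is hypothetical and is not taken from src/.

```rust
/// Magic word this README reports for the raw fatbin.
const FATBIN_MAGIC: u32 = 0xBA55_ED50;

/// Hypothetical helper: true if `bytes` starts with the fatbin magic.
/// Assumption: the magic is a little-endian u32 at offset 0.
fn looks_like_fatbin(bytes: &[u8]) -> bool {
    match bytes.get(..4) {
        Some(head) => {
            u32::from_le_bytes([head[0], head[1], head[2], head[3]]) == FATBIN_MAGIC
        }
        // Fewer than four bytes cannot hold the magic.
        None => false,
    }
}

fn main() {
    // With a path argument, check that file; otherwise check an in-memory sample.
    let bytes = match std::env::args().nth(1) {
        Some(path) => std::fs::read(&path).expect("failed to read file"),
        None => vec![0x50, 0xED, 0x55, 0xBA],
    };
    println!("fatbin magic present: {}", looks_like_fatbin(&bytes));
}
```

A check like this only rejects obviously wrong inputs (for example an ELF cubin or a truncated file); the driver remains the authority on whether the image actually loads.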
What it does
- cuInit → context on GPU#0
- cuModuleLoadData on the raw fatbin (magic 0xBA55ED50)
- cuModuleEnumerateFunctions + cuFuncGetName + cuFuncGetAttribute: lists every kernel with regs / shared / local / max-threads and labels the Wagner n=192, k=7 pipeline: digit_f (round 0: BLAKE2b + bucketing) → digit_1..3, digit_4w/5w/6w (rounds 1–6) → digit_l (round 7: solution recovery) → sort_and_compress.
- with --launch: allocates a device buffer and launches the real cleanup<64>(void*, uint) kernel, then cuCtxSynchronize.
- with --round0: drives the real round 0 (digit_f). Allocates the four buffers at their template sizes, launches the exact runtime variant (grid=65536, block=256) with a BLAKE2b midstate captured from a live job, and reads back the bucket counters. Verified output: 33,554,432 = 2^25 entries bucketed into 12288 buckets (the correct 192,7 initial-entry count).
- with --replay [rec.log]: runs the entire solver. Parses a recorded pass (recording.log), allocates one arena, rebases every device pointer, and executes all 10 kernels (cleanup → digit_f → digit_1..6 → digit_l → sort_and_compress). All kernels complete; extracts a 128-index candidate.
- with --header <hex>: computes a BLAKE2b(192,7) midstate from a 140-byte header, injects it, and runs the full pipeline (mint a new job).
- with --selftest: BLAKE2b-512 known-answer test (RFC 7693). Result: PASS.
- with --verify-share: verifies a real pool-accepted share (BLAKE2b + Wagner). Result: VALID.
- with --solve: the complete solver. Injects a known header's midstate+tail, runs the GPU pipeline, and harvests a solution from the container that the verifier accepts. Reproducibly prints SOLUTION HARVESTED FROM GPU — VALID ✓.
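The counts quoted above follow from the Equihash parameters. A minimal arithmetic sketch, using the standard relations (n/(k+1) collision bits per round, 2^(n/(k+1)+1) initial entries, 2^k indices per solution) together with the grid, block and bucket numbers from the --round0 description; the function names are illustrative only:

```rust
/// Equihash parameters used throughout this README.
const N: u32 = 192;
const K: u32 = 7;

/// Bits cancelled per Wagner round: n / (k + 1).
fn collision_bits(n: u32, k: u32) -> u32 {
    n / (k + 1)
}

/// Size of the initial list: 2^(n/(k+1) + 1).
fn initial_entries(n: u32, k: u32) -> u64 {
    1u64 << (collision_bits(n, k) + 1)
}

/// Number of indices in one solution: 2^k.
fn solution_indices(k: u32) -> usize {
    1usize << k
}

fn main() {
    let entries = initial_entries(N, K);
    println!("collision bits per round: {}", collision_bits(N, K)); // 24
    println!("initial entries: {}", entries); // 33554432
    println!("indices per solution: {}", solution_indices(K)); // 128

    // Launch geometry quoted for --round0: grid=65536, block=256.
    let threads: u64 = 65_536 * 256;
    println!("entries per thread: {}", entries / threads); // 2

    // Mean occupancy across the 12288 buckets reported for round 0.
    println!("mean entries per bucket: {:.1}", entries as f64 / 12_288.0); // 2730.7
}
```

Two entries per thread is consistent with each thread computing one 48-byte BLAKE2b digest and splitting it into two 24-byte entries, though that reading of the kernel is an inference from the numbers, not something the launch geometry proves.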
See ../miniz-dump/solver_192_7/ORCHESTRATION.md for the full pipeline + recovery.
Status (honest)
- Pipeline: complete. All 10 kernels run standalone; round 0 verified bit-exact (2^25 entries). Faithful end-to-end replay of miniZ's 192,7 solver.
- Hash model + verification: SOLVED. Captured live stratum (plaintext) via a logging relay; a real pool-accepted share verifies exactly under
  hash(i) = BLAKE2b(header‖LE32(i/2), person="ZcashPoW"+LE32(192)+LE32(7), digest=48)[(i%2)*24..].
  --verify-share reproduces VALID ✓ (192/192 zero bits, all 7 Wagner levels) in Rust. So --selftest, blake2b.rs, verify.rs and the solution decoder are all proven against ground truth.
- Complete (--solve). Container = 128 consecutive u32 indices at offset 0; the midstate is textbook BLAKE2b-after-128B and the digit_f uint is the 4 varying header-tail bytes (nonce[28..31]; nonce[20..27] are constant 0). So: header → midstate+tail → GPU pipeline → container[0..128] → VALID solution, reproducibly. The miniZ Equihash 192,7 solver is fully reverse-engineered.
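The hash model above can be made concrete without a BLAKE2b implementation by building only its inputs. This is a sketch under stated assumptions, not the code in blake2b.rs or verify.rs: all function names are hypothetical, each entry is taken to be 24 bytes (192 bits), and the header is an all-zero placeholder.

```rust
use std::ops::Range;

/// Personalization from the hash model: "ZcashPoW" || LE32(n) || LE32(k).
fn personalization(n: u32, k: u32) -> [u8; 16] {
    let mut p = [0u8; 16];
    p[..8].copy_from_slice(b"ZcashPoW");
    p[8..12].copy_from_slice(&n.to_le_bytes());
    p[12..16].copy_from_slice(&k.to_le_bytes());
    p
}

/// For entry index i: the BLAKE2b counter (i/2) and the byte range of the
/// 48-byte digest that holds the entry. Assumes 24-byte entries.
fn entry_location(i: u32) -> (u32, Range<usize>) {
    let start = (i % 2) as usize * 24;
    (i / 2, start..start + 24)
}

/// Split a 140-byte header into the first 128-byte BLAKE2b block (absorbed
/// into the midstate) and the remaining 12-byte tail.
fn split_header(header: &[u8; 140]) -> (&[u8], &[u8]) {
    header.split_at(128)
}

/// Full BLAKE2b message for one counter: header || LE32(counter).
/// The last block is therefore 12 tail bytes plus 4 counter bytes.
fn hash_input(header: &[u8; 140], counter: u32) -> Vec<u8> {
    let mut msg = Vec::with_capacity(144);
    msg.extend_from_slice(header);
    msg.extend_from_slice(&counter.to_le_bytes());
    msg
}

fn main() {
    let header = [0u8; 140]; // placeholder, not real chain data
    let (first_block, tail) = split_header(&header);
    let (counter, range) = entry_location(5);
    let msg = hash_input(&header, counter);
    println!("person = {:02x?}", personalization(192, 7));
    println!("midstate block = {} bytes, tail = {} bytes", first_block.len(), tail.len());
    println!("entry 5 -> counter {}, digest bytes {:?}, message {} bytes", counter, range, msg.len());
}
```

If the 32-byte nonce occupies the last 32 bytes of the header (an assumption about the header layout), the 12-byte tail is exactly nonce[20..31], which matches the note that only the final 4 bytes vary.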
What it does NOT do (scope)
It does not mine or produce valid Equihash solutions. A working solver also needs miniZ's host orchestration, which is not part of the extracted kernels:
- exact device-buffer sizing per round (the kernels' template/array dims give the bucket geometry, e.g. uint4[180][6656][32], but the host owns allocation)
- the precise digit_f → digit_1..6 → digit_l → sort_and_compress launch sequence with the correct grid/block dims and shared-mem config per round
- BLAKE2b midstate setup from the block header + nonce, and the equi<...>/ScontainerReal192 struct layouts passed between kernels
That host logic lives in miniZ's encrypted blob. Reconstructing it (from the SASS
in ../miniz-dump/solver_192_7/equihash192_7.sm_120.sass plus the kernel
signatures in kernels_demangled.txt) is the next step toward a standalone miner.
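As a rough guide to what the array dimensions imply for buffer sizing, the sketch below computes the footprint of the uint4[180][6656][32] example. It assumes a CUDA uint4 is 16 bytes and that the array is densely packed; it says nothing about any padding or alignment the host applies, and array_bytes is an illustrative name.

```rust
/// Size of a CUDA uint4 (four 32-bit words).
const UINT4_BYTES: usize = 16;

/// Byte size of a dense multi-dimensional array, or None on overflow.
fn array_bytes(dims: &[usize], elem_bytes: usize) -> Option<usize> {
    dims.iter()
        .try_fold(elem_bytes, |acc, &d| acc.checked_mul(d))
}

fn main() {
    // Dimensions taken from the kernel template example quoted above.
    let bytes = array_bytes(&[180, 6656, 32], UINT4_BYTES).expect("size overflow");
    println!("{} bytes = {} MiB", bytes, bytes / (1 << 20));
    // prints "613416960 bytes = 585 MiB"
}
```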
Files
- src/cuda.rs: minimal CUDA Driver API FFI bindings
- src/main.rs: loader / enumerator / launch demo
- build.rs: links libcuda
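For readers unfamiliar with how a build script links a system library, here is a minimal sketch of what a build.rs that links libcuda could look like. It is an assumption about the approach, not a copy of this repository's build.rs; the /usr/lib search path comes from the requirement stated earlier, and link_directives is a hypothetical helper.

```rust
/// Cargo directives that add a library search path and link libcuda dynamically.
fn link_directives(lib_dir: &str) -> Vec<String> {
    vec![
        format!("cargo:rustc-link-search=native={lib_dir}"),
        "cargo:rustc-link-lib=dylib=cuda".to_string(),
    ]
}

fn main() {
    // Cargo reads these lines from the build script's stdout.
    for line in link_directives("/usr/lib") {
        println!("{line}");
    }
}
```

Emitting the directives from a build script keeps the dependency list free of external crates, which fits the raw-FFI approach described at the top of this README.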