jackpot-miner/collab/jmprcx-solver/README.md
jackpotincorporated e2fab622b5 Initial commit: jackpotminer Equihash 192,7 miner
GPU-accelerated Equihash 192,7 miner in Rust with three solver backends:
- CPU: Wagner's algorithm, AVX2 packed slots (xenoncat-style)
- OpenCL: full on-GPU solve (kernels/equihash.cl); runs on NVIDIA and AMD
- CUDA: driver-API replay of miniZ's extracted fatbin (src/miniz/)

Also includes a default-off pearlhash backend (src/pearl/, native CPU core +
NVRTC int8-GEMM GPU kernels) and a WIP Ethash CUDA backend (src/ethash/).

Reverse-engineering scratch (alpha-miner, pearl-dump/) and the active runtime
config (mine.toml) are gitignored; mine.example.toml is the template.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 23:08:20 -04:00

miniz-solver-rs

Basic Rust program that uses the extracted miniZ Equihash 192,7 GPU solver. It loads the captured CUDA fatbin (../miniz-dump/solver_192_7/equihash192_7.fatbin) through the CUDA Driver API (raw FFI to libcuda, no external crates) and drives its kernels on the GPU.

Build & run

cargo build --release
./target/release/miniz-solver                 # load + enumerate all 57 kernels
./target/release/miniz-solver --launch        # also execute a real solver kernel
./target/release/miniz-solver --round0        # replay round 0 (digit_f) with a captured midstate
./target/release/miniz-solver /path/to.fatbin # use a different fatbin

Requires an NVIDIA GPU + driver (/usr/lib/libcuda.so). The fatbin contains sm_80/sm_86/sm_120 cubins; the driver auto-picks the one for your GPU.
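
Because a different fatbin path can be passed on the command line, a cheap sanity check before handing the file to the driver is to compare its first four bytes with the fatbin magic 0xBA55ED50. The following is a minimal sketch, assuming the magic is stored little-endian at offset 0; looks_like_fatbin is a hypothetical helper, not a function from src/main.rs:

```rust
/// Magic word at the start of a CUDA fat binary, as quoted in this README.
const FATBIN_MAGIC: u32 = 0xBA55_ED50;

/// Returns true if `bytes` begins with the fatbin magic.
/// Assumption: the magic is a little-endian u32 at offset 0.
/// `looks_like_fatbin` is an illustrative name, not part of the real loader.
fn looks_like_fatbin(bytes: &[u8]) -> bool {
    match bytes.get(..4) {
        Some(head) => {
            u32::from_le_bytes([head[0], head[1], head[2], head[3]]) == FATBIN_MAGIC
        }
        // Fewer than four bytes: cannot be a fatbin.
        None => false,
    }
}
```

A file that fails this check is not worth passing to cuModuleLoadData; a file that passes may still be rejected by the driver if it has no cubin for the installed GPU.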

What it does

  • cuInit → context on GPU#0
  • cuModuleLoadData on the raw fatbin (magic 0xBA55ED50)
  • cuModuleEnumerateFunctions + cuFuncGetName + cuFuncGetAttribute: lists every kernel with regs / shared / local / max-threads and labels the Wagner n=192,k=7 pipeline: digit_f (round 0: BLAKE2b + bucketing) → digit_1..3, digit_4w/5w/6w (rounds 1-6) → digit_l (round 7: solution recovery) → sort_and_compress.
  • with --launch: allocates a device buffer and launches the real cleanup<64>(void*, uint) kernel, then cuCtxSynchronize.
  • with --round0: drives the real round 0 (digit_f) — allocates the four buffers at their template sizes, launches the exact runtime variant (grid=65536, block=256) with a BLAKE2b midstate captured from a live job, and reads back the bucket counters. Verified output: 33,554,432 = 2^25 entries bucketed into 12288 buckets (the correct 192,7 initial-entry count).
  • with --replay [rec.log]: runs the entire solver — parses a recorded pass (recording.log), allocates one arena, rebases every device pointer, and executes all 10 kernels (cleanup → digit_f → digit_1..6 → digit_l → sort_and_compress). All kernels complete; extracts a 128-index candidate.
  • with --header <hex>: computes a BLAKE2b(192,7) midstate from a 140-byte header, injects it, and runs the full pipeline (mint a new job).
  • with --selftest: BLAKE2b-512 known-answer test (RFC 7693) — PASS.
  • with --verify-share: verify a real pool-accepted share (BLAKE2b + Wagner) — VALID.
  • with --solve: the complete solver — inject a known header's midstate+tail, run the GPU pipeline, and harvest a solution from the container that the verifier accepts. Reproducibly prints SOLUTION HARVESTED FROM GPU — VALID ✓.
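
The round-0 figures in the list above follow from the Equihash parameters. Below is a small sketch of the arithmetic, assuming the usual Wagner sizing (digit width n/(k+1), initial list of 2^(digit+1) entries) and floor(512/n) hashes per BLAKE2b call; the function names are illustrative only:

```rust
const N: u32 = 192;
const K: u32 = 7;

/// Bits cleared per Wagner round: n / (k + 1).
const fn digit_bits() -> u32 {
    N / (K + 1)
}

/// Size of the initial list: 2^(digit_bits + 1).
const fn initial_entries() -> u64 {
    1u64 << (digit_bits() + 1)
}

/// How many n-bit hashes fit in one BLAKE2b output of at most 512 bits.
const fn hashes_per_digest() -> u64 {
    (512 / N) as u64
}

/// BLAKE2b calls needed to fill the initial list.
const fn blake2b_calls() -> u64 {
    initial_entries() / hashes_per_digest()
}

/// Total threads in a 1-D launch.
fn round0_threads(grid: u64, block: u64) -> u64 {
    grid * block
}
```

With grid=65536 and block=256 the launch has 2^24 threads, the same as the number of BLAKE2b calls needed for 2^25 entries. That is consistent with one thread per digest, although the thread-to-hash mapping is an inference here and not something read out of the kernel.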

See ../miniz-dump/solver_192_7/ORCHESTRATION.md for the full pipeline + recovery.
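
The --replay step "allocates one arena, rebases every device pointer". One way to picture that is as an offset plan followed by a pointer translation. This is a sketch under assumptions: RecordedBuf, plan_arena, and rebase are hypothetical names, and the actual recording.log format and alignment rules are not shown in this README:

```rust
/// One device allocation seen in the recording (hypothetical shape).
struct RecordedBuf {
    base: u64, // device address at capture time
    len: u64,  // size in bytes
}

/// Lay the recorded buffers out back to back in a single arena,
/// each starting on an `align`-byte boundary (`align` must be non-zero).
/// Returns the per-buffer arena offsets and the total arena size.
fn plan_arena(bufs: &[RecordedBuf], align: u64) -> (Vec<u64>, u64) {
    let mut offsets = Vec::with_capacity(bufs.len());
    let mut cursor = 0u64;
    for b in bufs {
        cursor = (cursor + align - 1) / align * align; // round up
        offsets.push(cursor);
        cursor += b.len;
    }
    (offsets, cursor)
}

/// Translate a pointer from the recording into the new arena.
/// Returns None if the pointer falls outside every recorded buffer.
fn rebase(ptr: u64, bufs: &[RecordedBuf], offsets: &[u64], arena_base: u64) -> Option<u64> {
    bufs.iter().zip(offsets).find_map(|(b, &off)| {
        if ptr >= b.base && ptr - b.base < b.len {
            Some(arena_base + off + (ptr - b.base))
        } else {
            None
        }
    })
}
```

Keeping interior offsets intact matters because kernel arguments may point into the middle of a buffer, not only at its base.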

Status (honest)

  • Pipeline: complete. All 10 kernels run standalone; round 0 verified bit-exact (2^25 entries). Faithful end-to-end replay of miniZ's 192,7 solver.
  • Hash model + verification: SOLVED. Captured live stratum traffic (plaintext) via a logging relay; a real pool-accepted share verifies exactly under hash(i) = BLAKE2b(header‖LE32(i/2), person="ZcashPoW"+LE32(192)+LE32(7), digest=48)[(i%2)*24 .. (i%2)*24+24], i.e. each 48-byte digest yields two 24-byte (192-bit) hashes. --verify-share reproduces VALID ✓ (192/192 zero bits, all 7 Wagner levels) in Rust. So --selftest, blake2b.rs, verify.rs and the solution decoder are all proven against ground truth.
  • Complete (--solve). Container = 128 consecutive u32 indices at offset 0; the midstate is textbook BLAKE2b-after-128B and the digit_f uint is the 4 varying header-tail bytes (nonce[28..31]; nonce[20..27] are constant 0). So: header → midstate+tail → GPU pipeline → container[0..128] → VALID solution, reproducibly. The miniZ Equihash 192,7 solver is fully reverse-engineered.
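
The hash model in the status list can be spelled out as byte-level helpers. The sketch below only builds the BLAKE2b inputs (personalization, message, and output slice); it does not hash anything, and the function names are illustrative, not the ones in blake2b.rs or verify.rs:

```rust
const N: u32 = 192;
const K: u32 = 7;
const HASH_BYTES: usize = (N / 8) as usize; // 24 bytes per index
const DIGEST_BYTES: usize = 2 * HASH_BYTES; // 48-byte BLAKE2b digest

/// 16-byte personalization: "ZcashPoW" ‖ LE32(n) ‖ LE32(k).
fn personalization() -> [u8; 16] {
    let mut p = [0u8; 16];
    p[..8].copy_from_slice(b"ZcashPoW");
    p[8..12].copy_from_slice(&N.to_le_bytes());
    p[12..16].copy_from_slice(&K.to_le_bytes());
    p
}

/// Message hashed for index `i`: header ‖ LE32(i / 2).
fn blake_input(header: &[u8], i: u32) -> Vec<u8> {
    let mut m = Vec::with_capacity(header.len() + 4);
    m.extend_from_slice(header);
    m.extend_from_slice(&(i / 2).to_le_bytes());
    m
}

/// Which 24 bytes of the 48-byte digest belong to index `i`.
fn digest_slice(i: u32) -> std::ops::Range<usize> {
    let start = (i % 2) as usize * HASH_BYTES;
    start..start + HASH_BYTES
}
```

Indices 2j and 2j+1 share one BLAKE2b call and take the low and high halves of its digest respectively.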

What it does NOT do (scope)

It does not mine or produce valid Equihash solutions. A working solver also needs miniZ's host orchestration, which is not part of the extracted kernels:

  • exact device-buffer sizing per round (the kernels' template/array dims give the bucket geometry, e.g. uint4[180][6656][32], but the host owns allocation)
  • the precise digit_f → digit_1..6 → digit_l → sort_and_compress launch sequence with the correct grid/block dims and shared-mem config per round
  • BLAKE2b midstate setup from the block header + nonce, and the equi<...> / ScontainerReal192 struct layouts passed between kernels

That host logic lives in miniZ's encrypted blob. Reconstructing it (from the SASS in ../miniz-dump/solver_192_7/equihash192_7.sm_120.sass plus the kernel signatures in kernels_demangled.txt) is the next step toward a standalone miner.
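
For the midstate item, the Status section describes the split as "BLAKE2b-after-128B" plus a 4-byte varying tail. A minimal sketch of that split follows, assuming the 32-byte nonce occupies the last 32 bytes of the 140-byte header (the standard Zcash layout, which this README does not state explicitly); SplitHeader and split_header are hypothetical names:

```rust
const HEADER_LEN: usize = 140;
const BLAKE2B_BLOCK: usize = 128;

/// The two parts of a header as seen by a midstate-based solver.
struct SplitHeader<'a> {
    absorbed: &'a [u8], // first 128 bytes, compressed once into the midstate
    tail: &'a [u8],     // remaining 12 bytes, hashed together with LE32(i / 2)
    varying: [u8; 4],   // last 4 tail bytes; nonce[28..32] under the assumed layout
}

fn split_header(header: &[u8; HEADER_LEN]) -> SplitHeader<'_> {
    let (absorbed, tail) = header.split_at(BLAKE2B_BLOCK);
    let mut varying = [0u8; 4];
    // tail[0..8] is nonce[20..28] (constant 0 per the Status section);
    // tail[8..12] is nonce[28..32], the part that changes per attempt.
    varying.copy_from_slice(&tail[8..12]);
    SplitHeader { absorbed, tail, varying }
}
```

Under this layout the final BLAKE2b block is only 16 bytes long (12 tail bytes plus the 4-byte index word), so one midstate can be reused for every index of a given header.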

Files

  • src/cuda.rs — minimal CUDA Driver API FFI bindings
  • src/main.rs — loader / enumerator / launch demo
  • build.rs — links libcuda