RMT Research Hypotheses

Status: Pending review · 110 experiments · 4 phases · 132K real agents
Current sybil detection: 93.7% → target: 99%+
Composite score: 0.9397 → target: 0.97+

Baseline Metrics

  • Composite score: 93.97%
  • Sybil detection: 93.7%
  • Alpha (α): 0.85
  • Graph: 132K agents (Helixa + ERC-8004, Base L2)

Experiment Budget (110 total)

H1: Calibration — 10
H2: Benchmark — 20
H3: Adversarial — 40
H4: Sensitivity — 30
H5: Cold Start — 10

Execution Dependency Chain

Phase 1 (parallel): H1 + H5 · 20 experiments
Phase 2: H2 · 20 experiments
Phase 3: H3 · 40 experiments
Phase 4: H4 · 30 experiments

Algorithms Under Test

  • 📊 PageRank: current production
  • 🎯 PPR: Personalized PageRank
  • 🔗 EigenTrust: fast convergence
  • HITS: hub/authority

Decision Summary

  • H1: Real-Data Calibration (Pending)
  • H2: Algorithm Benchmark (Pending)
  • H3: Adversarial Stress (Pending)
  • H4: Param Sensitivity (Pending)
  • H5: Cold Start (Pending)

Budget Summary

Hypothesis | Priority | Phase | Experiments | Key Metric | Status
H1: Real-Data Calibration | P0 | 1 | 10 | Spearman ≥0.7 | ⏳ Pending
H2: Algorithm Benchmark | P0 | 2 | 20 | PPR +3-5% det. | ⏳ Pending
H3: Adversarial Stress Test | P1 | 3 | 40 | >95% on 4/8 attacks | ⏳ Pending
H4: Parameter Sensitivity | P1 | 4 | 30 | ≥97% det. / ≤2% FP | ⏳ Pending
H5: Cold Start Convergence | P0 | 1 | 10 | PPR 2.5× faster | ⏳ Pending
Total | | | 110 | |
H1: Real-Data Calibration
Validate current parameters against 132K real agents from Helixa API
P0 — Critical · 10 experiments
What & Why

Test current production parameters (alpha=0.85, reciprocalPenalty=0.95) against 132K real agents sourced from Helixa API + ERC-8004 on Base. Compare PageRank rankings vs Helixa's 13-factor trust scores to establish ground truth correlation.
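The two production parameters can be exercised with a minimal scoring sketch. This is an assumed stand-in for the production scorer: only alpha=0.85 and the 0.95 penalty value come from the plan; the exact place where reciprocalPenalty applies (here, down-weighting mutual endorsement edges before normalization) is a guess for illustration.

```python
# Toy PageRank over an endorsement edge list, with mutual endorsements
# (A->B and B->A both present) down-weighted by reciprocal_penalty.
def pagerank(edges, alpha=0.85, reciprocal_penalty=0.95, iters=100, tol=1e-10):
    """edges: list of (src, dst) endorsements. Returns {node: score}."""
    edge_set = set(edges)
    weights = {}
    for (u, v) in edges:
        # Penalize the edge if the reverse endorsement also exists.
        weights[(u, v)] = reciprocal_penalty if (v, u) in edge_set else 1.0
    nodes = sorted({n for e in edges for n in e})
    out_w = {n: 0.0 for n in nodes}
    for (u, v), w in weights.items():
        out_w[u] += w
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - alpha) / len(nodes) for n in nodes}
        for (u, v), w in weights.items():
            nxt[v] += alpha * score[u] * w / out_w[u]
        # Redistribute mass from dangling nodes (no outgoing endorsements).
        dangling = sum(score[n] for n in nodes if out_w[n] == 0.0)
        for n in nodes:
            nxt[n] += alpha * dangling / len(nodes)
        if sum(abs(nxt[n] - score[n]) for n in nodes) < tol:
            return nxt
        score = nxt
    return score
```

Note that with a single out-edge the penalty cancels under normalization; it only bites when a node mixes mutual and one-way endorsements.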

Experiment Design
  • 5 runs varying trust initialization strategies (uniform, degree-based, KYC-weighted, random, Helixa-seeded)
  • 5 runs varying edge-weight schemes (binary, transaction-volume, recency-weighted, Helixa-endorsed, composite)
Parameters
Parameter | Value | Notes
alpha (damping) | 0.85 | Production default
reciprocalPenalty | 0.95 | Reduces mutual-endorsement weight
Graph size | 132K nodes | Full Helixa dataset
Ground truth | Helixa 13-factor | External validation source
Evaluation Dimensions
  • 📈 Spearman correlation
  • 🎯 Rank agreement (top-100)
  • Convergence time
  • 📊 Score distribution
Expected Outcome

Spearman correlation ≥0.7 for honest agents between PageRank scores and Helixa trust ratings. This establishes that our algorithm is directionally aligned with real-world trust signals.
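The ≥0.7 check itself is a Spearman rank correlation between the two score vectors. A dependency-free sketch (ties get average ranks, as in the standard definition; the real pipeline would presumably use a library routine such as scipy's):

```python
# Spearman rank correlation: Pearson correlation computed on ranks.
def _ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank across the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """a, b: equal-length score lists (e.g. PageRank vs Helixa trust)."""
    ra, rb = _ranks(a), _ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)
```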

⚠ Risk

If correlation is <0.5, the current parameter regime may be fundamentally misaligned with real trust patterns, requiring a larger parameter sweep before proceeding to H2.

H2: Algorithm Benchmark
Head-to-head comparison of all 4 algorithms on the same real graph
P0 — Critical · 20 experiments
What & Why

Run PageRank, PPR, EigenTrust, and HITS on the same real 132K-node graph at four sybil injection ratios (5%, 10%, 15%, 20%). This determines which algorithm family provides the best sybil detection for our specific topology.

Experiment Matrix
Algorithm | 5% Sybil | 10% Sybil | 15% Sybil | 20% Sybil
PageRank | ⏳ | ⏳ | ⏳ | ⏳
PPR | ⏳ | ⏳ | ⏳ | ⏳
EigenTrust | ⏳ | ⏳ | ⏳ | ⏳
HITS | ⏳ | ⏳ | ⏳ | ⏳
4 algorithms × 4 ratios = 16 runs + 4 baselines (no injection) = 20 total
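Each matrix cell needs a graph injected at a target sybil ratio. A sketch of that injection step, under simplifying assumptions (a single trust ring of sybils and one attack edge bridging into the honest graph; the real harness presumably varies both):

```python
# Add a sybil ring sized so that sybils make up `ratio` of the final graph.
def inject_sybil_ring(honest_edges, ratio, attack_target):
    """Returns (edge list with sybils added, set of sybil node ids)."""
    honest = {n for e in honest_edges for n in e}
    # ratio = sybils / (sybils + honest)  =>  sybils = ratio*honest/(1-ratio)
    n_sybil = round(ratio * len(honest) / (1 - ratio))
    sybils = [f"sybil_{i}" for i in range(n_sybil)]
    edges = list(honest_edges)
    for i, s in enumerate(sybils):
        edges.append((s, sybils[(i + 1) % n_sybil]))  # close the trust ring
    if sybils:
        edges.append((sybils[0], attack_target))      # bridge into honest graph
    return edges, set(sybils)
```

Detection rate for a cell is then |flagged ∩ sybils| / |sybils| under whichever algorithm scored the graph.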
Evaluation Dimensions
  • 🎯 Sybil detection rate
  • 🚨 False positive rate
  • Convergence time
  • 📐 Rank stability
  • 🔍 Hub/authority structure
  • 💾 Memory footprint
Expected Outcome

PPR achieves +3-5% sybil detection over PageRank but with higher false positives. EigenTrust shows best convergence time. HITS reveals hub/authority structure useful for Trust Anchor validation.

⚠ Risk

If all algorithms converge to similar performance (<1% delta), the bottleneck is likely in graph construction or feature engineering rather than algorithm choice — redirecting effort to H4 may be more productive.

H3: Adversarial Stress Test
8 attack types via MiroShark overlaid on real topology
P1 — Important · 40 experiments
Attack Type Matrix

Each attack is tested with 5 runs at varying injection scales; the expected detection rate is listed per attack.

Attack | Description | Expected detection
#1 Sybil Ring | 50 bot accounts in a trust ring | >95%
#2 Carousel | 10 rotating identities cycling trust | >95%
#3 Anchor Compromise | 1-3 Trust Anchors flipped adversarial | <80%
#4 Eclipse Attack | 200 sybils isolating 1 honest node | <80%
#5 TTL Farming | 30 nodes farming trust-through-length | >95%
#6 Star Topology | 1 hub, 100 spoke accounts | >95%
#7 Citation Laundering | 20 nodes laundering trust via honest intermediaries | <80%
#8 Mass Registration | 500 accounts registered simultaneously | 80-95%
Detection rate legend: >95% high confidence · 80-95% moderate · <80% weak / needs mitigation
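For illustration, a generator for one of the structurally obvious shapes (attack #6, star topology) plus the legend's banding. MiroShark's actual generator API is not shown in this plan, so both helpers are assumed stand-ins:

```python
# Attack #6: one hub endorsed by n_spokes fresh accounts.
def star_attack(n_spokes=100, hub="star_hub"):
    """Returns (attack edges, set of all attacker account ids)."""
    spokes = [f"spoke_{i}" for i in range(n_spokes)]
    edges = [(s, hub) for s in spokes]  # every spoke endorses the hub
    return edges, {hub, *spokes}

def detection_band(rate):
    """Map a measured detection rate onto the legend's three bands."""
    if rate > 0.95:
        return "high"
    if rate >= 0.80:
        return "moderate"
    return "weak"
```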
Expected Outcome

Strong detection (>95%) on structurally obvious attacks (ring, carousel, TTL, star). Weak (<80%) on sophisticated attacks (anchor compromise, eclipse, citation laundering) — these identify where additional defenses are needed.

⚠ Risk

Attacks #3 (Anchor Compromise) and #4 (Eclipse) may reveal fundamental architectural weaknesses that require protocol-level changes, not just parameter tuning. Budget additional design work if detection rates are below 60%.

H4: Parameter Sensitivity Analysis
Sweep 5 key parameters using Latin hypercube sampling on real data
P1 — Important · 30 experiments
What & Why

Systematically explore the 5-dimensional parameter space to find configurations that achieve ≥97% sybil detection with ≤2% false positives. Uses Latin hypercube sampling for efficient coverage, followed by validation runs on the top Pareto-optimal configurations.

Parameter Space
Parameter | Range | Current | Step/Method
alpha (damping factor) | 0.60 – 0.95 | 0.85 | Step 0.05
reciprocalPenalty | 0.70 – 1.00 | 0.95 | Continuous
clusterDensityThreshold | 0.50 – 0.95 | | Continuous
carouselPenalty | 0.10 – 0.50 | | Continuous
starInDegreeThreshold | 5 – 25 | | Integer
Experiment Breakdown
  • 25 runs: Latin hypercube sampling across 5D space
  • 5 runs: Validation on top-3 Pareto-optimal configurations
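The 25 LHS runs can be drawn with a dependency-free sampler over the table's ranges. The stratified-shuffle construction below is the standard Latin hypercube scheme, not necessarily the library the experiments will actually use:

```python
# Latin hypercube sampling: split each parameter's range into n equal strata,
# take one sample per stratum, and shuffle strata independently per dimension.
import random

SPACE = {
    "alpha": (0.60, 0.95),
    "reciprocalPenalty": (0.70, 1.00),
    "clusterDensityThreshold": (0.50, 0.95),
    "carouselPenalty": (0.10, 0.50),
    "starInDegreeThreshold": (5, 25),  # integer-valued parameter
}

def latin_hypercube(n, space=SPACE, seed=0):
    """Returns n configs, each a dict over the 5 parameters."""
    rng = random.Random(seed)
    cols = {}
    for name, (lo, hi) in space.items():
        strata = list(range(n))
        rng.shuffle(strata)
        vals = [lo + (s + rng.random()) * (hi - lo) / n for s in strata]
        if name == "starInDegreeThreshold":
            vals = [round(v) for v in vals]
        cols[name] = vals
    return [{k: cols[k][i] for k in space} for i in range(n)]
```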
Evaluation Dimensions
  • 🎯 Sybil detection rate
  • 🚨 False positive rate
  • 📐 Pareto frontier
  • 📊 Sensitivity gradient
  • 🔄 Cross-validation stability
  • ⚖️ Detection/FP tradeoff
Target Outcome

Identify parameter configurations achieving ≥97% sybil detection with ≤2% false positives. Produce a Pareto frontier showing the detection/false-positive tradeoff surface.
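Extracting the frontier from the 30 runs is mechanical. A sketch, assuming each run reduces to a (detection rate, false-positive rate, config) triple:

```python
# Pareto frontier for the detection/FP tradeoff: a config survives unless some
# other config is at least as good on both axes and strictly better on one.
def pareto_frontier(results):
    """results: list of (detection_rate, fp_rate, config).
    Higher detection and lower FP are better."""
    front = []
    for det, fp, cfg in results:
        dominated = any(
            (d2 >= det and f2 <= fp) and (d2 > det or f2 < fp)
            for d2, f2, _ in results
        )
        if not dominated:
            front.append((det, fp, cfg))
    return front
```

The O(n²) scan is fine at n=30; the surviving triples trace the tradeoff surface the target outcome calls for.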

⚠ Risk

If no configuration in the searched space meets the 97%/2% target, the algorithm itself may need augmentation (e.g., ensemble methods, additional features). This would trigger a scope expansion for H2 follow-ups.

H5: Cold Start Convergence
How fast new honest agents reach Navigator tier across KYC levels
P0 — Critical · 10 experiments
What & Why

Measure how quickly a newly onboarded honest agent reaches Navigator tier under different KYC verification levels and algorithm choices. This directly impacts user experience and onboarding flow design.
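A toy version of this measurement, with loudly labeled assumptions: the Navigator threshold value, the one-endorsement-per-epoch dynamic, and the ring-shaped established graph are all made up for illustration; the plan specifies none of them.

```python
# Toy cold-start probe: each "epoch" the newcomer gains one endorsement from
# an established agent (cycling through them), scores are recomputed, and we
# count epochs until the newcomer crosses a made-up Navigator threshold.
def simple_pagerank(edges, alpha=0.85, iters=50):
    nodes = sorted({n for e in edges for n in e})
    out = {n: 0 for n in nodes}
    for u, _ in edges:
        out[u] += 1
    s = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - alpha) / len(nodes) for n in nodes}
        for u, v in edges:
            nxt[v] += alpha * s[u] / out[u]
        dangling = sum(s[n] for n in nodes if out[n] == 0)
        for n in nodes:
            nxt[n] += alpha * dangling / len(nodes)
        s = nxt
    return s

def epochs_to_navigator(n_established=50, threshold=0.015, max_epochs=100):
    # Established agents endorse each other in a ring.
    edges = [(f"a{i}", f"a{(i + 1) % n_established}") for i in range(n_established)]
    for epoch in range(1, max_epochs + 1):
        edges.append((f"a{epoch % n_established}", "newcomer"))
        if simple_pagerank(edges)["newcomer"] >= threshold:
            return epoch
    return None  # never converged within budget
```

KYC levels would enter as higher initial trust or heavier endorsement weights, shrinking the epoch count.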

Experiment Matrix
KYC Level | PageRank | PPR | Description
L0 (none) | ⏳ | ⏳ | No verification
L1 (email) | ⏳ | ⏳ | Email verified
L2 (ID) | ⏳ | ⏳ | Government ID
L3 (full) | ⏳ | ⏳ | Full KYC + biometric
4 KYC levels × 2 algorithms = 8 main runs + 2 adversarial = 10 total
Convergence Comparison
PageRank: ~40 epochs to Navigator
PPR (expected): ~15 epochs to Navigator (~2.5× expected speedup)
Adversarial Runs
  • Run 9: Sybil fast-tracking via compromised Trust Anchor (PageRank)
  • Run 10: Sybil fast-tracking via compromised Trust Anchor (PPR)
Evaluation Dimensions
  • Epochs to Navigator
  • 📈 Score trajectory
  • 🛡️ Sybil fast-track resistance
  • 🆔 KYC level impact
Expected Outcome

PPR converges ~2.5× faster than PageRank (~15 vs ~40 epochs). Higher KYC levels should accelerate convergence. Adversarial runs should show that compromised TAs can fast-track sybils under PageRank but not PPR.

⚠ Risk

If PPR doesn't significantly outperform PageRank on cold start, the complexity overhead of PPR may not be justified. Also, if sybils can fast-track under PPR too, Trust Anchor security becomes a blocking prerequisite.