RMT Research Hypotheses

Status: Pending review · 110 experiments · 4 phases · 132K real agents
Current sybil detection: 93.7% → target: 99%+
Composite score: 0.9397 → target: 0.97+

Baseline Metrics

  • Composite score: 93.97%
  • Sybil detection: 93.7%
  • Alpha (α): 0.85
  • Graph: 132K agents (Helixa + ERC-8004, Base L2)

Experiment Budget (110 total)

H1: Calibration — 10
H2: Benchmark — 20
H3: Adversarial — 40
H4: Sensitivity — 30
H5: Cold Start — 10

Execution Dependency Chain

Phase 1 (parallel): H1 + H5 · 20 experiments
Phase 2: H2 · 20 experiments
Phase 3: H3 · 40 experiments
Phase 4: H4 · 30 experiments

Algorithms Under Test

  • 📊 PageRank: current production
  • 🎯 PPR: Personalized PageRank
  • 🔗 EigenTrust: fast convergence
  • HITS: hub/authority

Decision Summary

  • H1: Real-Data Calibration (Pending)
  • H2: Algorithm Benchmark (Pending)
  • H3: Adversarial Stress (Pending)
  • H4: Param Sensitivity (Pending)
  • H5: Cold Start (Pending)

Budget Summary

Hypothesis | Priority | Phase | Experiments | Key Metric | Status
H1: Real-Data Calibration | P0 | 1 | 10 | Spearman ≥0.7 | ⏳ Pending
H2: Algorithm Benchmark | P0 | 2 | 20 | PPR +3-5% det. | ⏳ Pending
H3: Adversarial Stress Test | P1 | 3 | 40 | >95% on 4/8 attacks | ⏳ Pending
H4: Parameter Sensitivity | P1 | 4 | 30 | ≥97% det. / ≤2% FP | ⏳ Pending
H5: Cold Start Convergence | P0 | 1 | 10 | PPR 2.5× faster | ⏳ Pending
Total | | | 110 | |
H1: Real-Data Calibration
Validate current parameters against 132K real agents from Helixa API
P0 — Critical · 10 experiments
What & Why

Test current production parameters (alpha=0.85, reciprocalPenalty=0.95) against 132K real agents sourced from Helixa API + ERC-8004 on Base. Compare PageRank rankings vs Helixa's 13-factor trust scores to establish ground truth correlation.
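The two production parameters can be exercised with a minimal scoring sketch. This is an assumed stand-in for the production scorer: only alpha=0.85 and the 0.95 penalty value come from the plan; the exact place where reciprocalPenalty applies (here, down-weighting mutual endorsement edges before normalization) is a guess for illustration.

```python
# Toy PageRank over an endorsement edge list, with mutual endorsements
# (A->B and B->A both present) down-weighted by reciprocal_penalty.
def pagerank(edges, alpha=0.85, reciprocal_penalty=0.95, iters=100, tol=1e-10):
    """edges: list of (src, dst) endorsements. Returns {node: score}."""
    edge_set = set(edges)
    weights = {}
    for (u, v) in edges:
        # Penalize the edge if the reverse endorsement also exists.
        weights[(u, v)] = reciprocal_penalty if (v, u) in edge_set else 1.0
    nodes = sorted({n for e in edges for n in e})
    out_w = {n: 0.0 for n in nodes}
    for (u, v), w in weights.items():
        out_w[u] += w
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - alpha) / len(nodes) for n in nodes}
        for (u, v), w in weights.items():
            nxt[v] += alpha * score[u] * w / out_w[u]
        # Redistribute mass from dangling nodes (no outgoing endorsements).
        dangling = sum(score[n] for n in nodes if out_w[n] == 0.0)
        for n in nodes:
            nxt[n] += alpha * dangling / len(nodes)
        if sum(abs(nxt[n] - score[n]) for n in nodes) < tol:
            return nxt
        score = nxt
    return score
```

Note that with a single out-edge the penalty cancels under normalization; it only bites when a node mixes mutual and one-way endorsements.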

Experiment Design
  • 5 runs varying trust initialization strategies (uniform, degree-based, KYC-weighted, random, Helixa-seeded)
  • 5 runs varying edge-weight schemes (binary, transaction-volume, recency-weighted, Helixa-endorsed, composite)
Parameters
Parameter | Value | Notes
alpha (damping) | 0.85 | Production default
reciprocalPenalty | 0.95 | Reduces mutual-endorsement weight
Graph size | 132K nodes | Full Helixa dataset
Ground truth | Helixa 13-factor | External validation source
Evaluation Dimensions
  • 📈 Spearman correlation
  • 🎯 Rank agreement (top-100)
  • Convergence time
  • 📊 Score distribution
Expected Outcome

Spearman correlation ≥0.7 for honest agents between PageRank scores and Helixa trust ratings. This establishes that our algorithm is directionally aligned with real-world trust signals.
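The ≥0.7 check itself is a Spearman rank correlation between the two score vectors. A dependency-free sketch (ties get average ranks, as in the standard definition; the real pipeline would presumably use a library routine such as scipy's):

```python
# Spearman rank correlation: Pearson correlation computed on ranks.
def _ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank across the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """a, b: equal-length score lists (e.g. PageRank vs Helixa trust)."""
    ra, rb = _ranks(a), _ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)
```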

⚠ Risk

If correlation is <0.5, the current parameter regime may be fundamentally misaligned with real trust patterns, requiring a larger parameter sweep before proceeding to H2.

H2: Algorithm Benchmark
Head-to-head comparison of all 4 algorithms on the same real graph
P0 — Critical · 20 experiments
What & Why

Run PageRank, PPR, EigenTrust, and HITS on the same real 132K-node graph at four sybil injection ratios (5%, 10%, 15%, 20%). This determines which algorithm family provides the best sybil detection for our specific topology.

Experiment Matrix
Algorithm | 5% Sybil | 10% Sybil | 15% Sybil | 20% Sybil
PageRank | ⏳ | ⏳ | ⏳ | ⏳
PPR | ⏳ | ⏳ | ⏳ | ⏳
EigenTrust | ⏳ | ⏳ | ⏳ | ⏳
HITS | ⏳ | ⏳ | ⏳ | ⏳
4 algorithms × 4 ratios = 16 runs + 4 baselines (no injection) = 20 total
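Each matrix cell needs a graph injected at a target sybil ratio. A sketch of that injection step, under simplifying assumptions (a single trust ring of sybils and one attack edge bridging into the honest graph; the real harness presumably varies both):

```python
# Add a sybil ring sized so that sybils make up `ratio` of the final graph.
def inject_sybil_ring(honest_edges, ratio, attack_target):
    """Returns (edge list with sybils added, set of sybil node ids)."""
    honest = {n for e in honest_edges for n in e}
    # ratio = sybils / (sybils + honest)  =>  sybils = ratio*honest/(1-ratio)
    n_sybil = round(ratio * len(honest) / (1 - ratio))
    sybils = [f"sybil_{i}" for i in range(n_sybil)]
    edges = list(honest_edges)
    for i, s in enumerate(sybils):
        edges.append((s, sybils[(i + 1) % n_sybil]))  # close the trust ring
    if sybils:
        edges.append((sybils[0], attack_target))      # bridge into honest graph
    return edges, set(sybils)
```

Detection rate for a cell is then |flagged ∩ sybils| / |sybils| under whichever algorithm scored the graph.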
Evaluation Dimensions
  • 🎯 Sybil detection rate
  • 🚨 False positive rate
  • Convergence time
  • 📐 Rank stability
  • 🔍 Hub/authority structure
  • 💾 Memory footprint
Expected Outcome

PPR achieves +3-5% sybil detection over PageRank but with higher false positives. EigenTrust shows best convergence time. HITS reveals hub/authority structure useful for Trust Anchor validation.

⚠ Risk

If all algorithms converge to similar performance (<1% delta), the bottleneck is likely in graph construction or feature engineering rather than algorithm choice — redirecting effort to H4 may be more productive.

H3: Adversarial Stress Test
8 attack types via MiroShark overlaid on real topology
P1 — Important · 40 experiments
Attack Type Matrix

Each attack is tested with 5 runs at varying injection scales; the expected detection rate is listed per attack.

Attack | Description | Expected detection
#1 Sybil Ring | 50 bot accounts in a trust ring | >95%
#2 Carousel | 10 rotating identities cycling trust | >95%
#3 Anchor Compromise | 1-3 Trust Anchors flipped adversarial | <80%
#4 Eclipse Attack | 200 sybils isolating 1 honest node | <80%
#5 TTL Farming | 30 nodes farming trust-through-length | >95%
#6 Star Topology | 1 hub, 100 spoke accounts | >95%
#7 Citation Laundering | 20 nodes laundering trust via honest intermediaries | <80%
#8 Mass Registration | 500 accounts registered simultaneously | 80-95%
Detection rate legend: >95% high confidence · 80-95% moderate · <80% weak / needs mitigation
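For illustration, a generator for one of the structurally obvious shapes (attack #6, star topology) plus the legend's banding. MiroShark's actual generator API is not shown in this plan, so both helpers are assumed stand-ins:

```python
# Attack #6: one hub endorsed by n_spokes fresh accounts.
def star_attack(n_spokes=100, hub="star_hub"):
    """Returns (attack edges, set of all attacker account ids)."""
    spokes = [f"spoke_{i}" for i in range(n_spokes)]
    edges = [(s, hub) for s in spokes]  # every spoke endorses the hub
    return edges, {hub, *spokes}

def detection_band(rate):
    """Map a measured detection rate onto the legend's three bands."""
    if rate > 0.95:
        return "high"
    if rate >= 0.80:
        return "moderate"
    return "weak"
```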
Expected Outcome

Strong detection (>95%) on structurally obvious attacks (ring, carousel, TTL, star). Weak (<80%) on sophisticated attacks (anchor compromise, eclipse, citation laundering) — these identify where additional defenses are needed.

⚠ Risk

Attacks #3 (Anchor Compromise) and #4 (Eclipse) may reveal fundamental architectural weaknesses that require protocol-level changes, not just parameter tuning. Budget additional design work if detection rates are below 60%.

H4: Parameter Sensitivity Analysis
Sweep 5 key parameters using Latin hypercube sampling on real data
P1 — Important · 30 experiments
What & Why

Systematically explore the 5-dimensional parameter space to find configurations that achieve ≥97% sybil detection with ≤2% false positives. Uses Latin hypercube sampling for efficient coverage, followed by validation runs on the top Pareto-optimal configurations.

Parameter Space
Parameter | Range | Current | Step/Method
alpha (damping factor) | 0.60 – 0.95 | 0.85 | Step 0.05
reciprocalPenalty | 0.70 – 1.00 | 0.95 | Continuous
clusterDensityThreshold | 0.50 – 0.95 | | Continuous
carouselPenalty | 0.10 – 0.50 | | Continuous
starInDegreeThreshold | 5 – 25 | | Integer
Experiment Breakdown
  • 25 runs: Latin hypercube sampling across 5D space
  • 5 runs: Validation on top-3 Pareto-optimal configurations
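The 25 LHS runs can be drawn with a dependency-free sampler over the table's ranges. The stratified-shuffle construction below is the standard Latin hypercube scheme, not necessarily the library the experiments will actually use:

```python
# Latin hypercube sampling: split each parameter's range into n equal strata,
# take one sample per stratum, and shuffle strata independently per dimension.
import random

SPACE = {
    "alpha": (0.60, 0.95),
    "reciprocalPenalty": (0.70, 1.00),
    "clusterDensityThreshold": (0.50, 0.95),
    "carouselPenalty": (0.10, 0.50),
    "starInDegreeThreshold": (5, 25),  # integer-valued parameter
}

def latin_hypercube(n, space=SPACE, seed=0):
    """Returns n configs, each a dict over the 5 parameters."""
    rng = random.Random(seed)
    cols = {}
    for name, (lo, hi) in space.items():
        strata = list(range(n))
        rng.shuffle(strata)
        vals = [lo + (s + rng.random()) * (hi - lo) / n for s in strata]
        if name == "starInDegreeThreshold":
            vals = [round(v) for v in vals]
        cols[name] = vals
    return [{k: cols[k][i] for k in space} for i in range(n)]
```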
Evaluation Dimensions
  • 🎯 Sybil detection rate
  • 🚨 False positive rate
  • 📐 Pareto frontier
  • 📊 Sensitivity gradient
  • 🔄 Cross-validation stability
  • ⚖️ Detection/FP tradeoff
Target Outcome

Identify parameter configurations achieving ≥97% sybil detection with ≤2% false positives. Produce a Pareto frontier showing the detection/false-positive tradeoff surface.
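Extracting the frontier from the 30 runs is mechanical. A sketch, assuming each run reduces to a (detection rate, false-positive rate, config) triple:

```python
# Pareto frontier for the detection/FP tradeoff: a config survives unless some
# other config is at least as good on both axes and strictly better on one.
def pareto_frontier(results):
    """results: list of (detection_rate, fp_rate, config).
    Higher detection and lower FP are better."""
    front = []
    for det, fp, cfg in results:
        dominated = any(
            (d2 >= det and f2 <= fp) and (d2 > det or f2 < fp)
            for d2, f2, _ in results
        )
        if not dominated:
            front.append((det, fp, cfg))
    return front
```

The O(n²) scan is fine at n=30; the surviving triples trace the tradeoff surface the target outcome calls for.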

⚠ Risk

If no configuration in the searched space meets the 97%/2% target, the algorithm itself may need augmentation (e.g., ensemble methods, additional features). This would trigger a scope expansion for H2 follow-ups.

H5: Cold Start Convergence
How fast new honest agents reach Navigator tier across KYC levels
P0 — Critical · 10 experiments
What & Why

Measure how quickly a newly onboarded honest agent reaches Navigator tier under different KYC verification levels and algorithm choices. This directly impacts user experience and onboarding flow design.
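A toy version of this measurement, with loudly labeled assumptions: the Navigator threshold value, the one-endorsement-per-epoch dynamic, and the ring-shaped established graph are all made up for illustration; the plan specifies none of them.

```python
# Toy cold-start probe: each "epoch" the newcomer gains one endorsement from
# an established agent (cycling through them), scores are recomputed, and we
# count epochs until the newcomer crosses a made-up Navigator threshold.
def simple_pagerank(edges, alpha=0.85, iters=50):
    nodes = sorted({n for e in edges for n in e})
    out = {n: 0 for n in nodes}
    for u, _ in edges:
        out[u] += 1
    s = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - alpha) / len(nodes) for n in nodes}
        for u, v in edges:
            nxt[v] += alpha * s[u] / out[u]
        dangling = sum(s[n] for n in nodes if out[n] == 0)
        for n in nodes:
            nxt[n] += alpha * dangling / len(nodes)
        s = nxt
    return s

def epochs_to_navigator(n_established=50, threshold=0.015, max_epochs=100):
    # Established agents endorse each other in a ring.
    edges = [(f"a{i}", f"a{(i + 1) % n_established}") for i in range(n_established)]
    for epoch in range(1, max_epochs + 1):
        edges.append((f"a{epoch % n_established}", "newcomer"))
        if simple_pagerank(edges)["newcomer"] >= threshold:
            return epoch
    return None  # never converged within budget
```

KYC levels would enter as higher initial trust or heavier endorsement weights, shrinking the epoch count.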

Experiment Matrix
KYC Level | PageRank | PPR | Description
L0 (none) | ⏳ | ⏳ | No verification
L1 (email) | ⏳ | ⏳ | Email verified
L2 (ID) | ⏳ | ⏳ | Government ID
L3 (full) | ⏳ | ⏳ | Full KYC + biometric
4 KYC levels × 2 algorithms = 8 main runs + 2 adversarial = 10 total
Convergence Comparison
PageRank: ~40 epochs to Navigator
PPR (expected): ~15 epochs to Navigator (~2.5× expected speedup)
Adversarial Runs
  • Run 9: Sybil fast-tracking via compromised Trust Anchor (PageRank)
  • Run 10: Sybil fast-tracking via compromised Trust Anchor (PPR)
Evaluation Dimensions
  • Epochs to Navigator
  • 📈 Score trajectory
  • 🛡️ Sybil fast-track resistance
  • 🆔 KYC level impact
Expected Outcome

PPR converges ~2.5× faster than PageRank (~15 vs ~40 epochs). Higher KYC levels should accelerate convergence. Adversarial runs should show that compromised TAs can fast-track sybils under PageRank but not PPR.

⚠ Risk

If PPR doesn't significantly outperform PageRank on cold start, the complexity overhead of PPR may not be justified. Also, if sybils can fast-track under PPR too, Trust Anchor security becomes a blocking prerequisite.