RHPool - Rendezvous Hashing for fast DSR Load Balancing
June 12, 2025
Overview #
RHPool
implements a Rendezvous Hash Pool, which holds multiple possible destinations (or servers) at a given time.RHPool
is esentially a skiplist with thatuint64_t
as the key.- For DSR load balancing (the intended purpose), a MAC address is mapped to the first 6 bytes in memory order, the remaining 2 bytes allow to insert the same MAC address multiple times to allow weighting a destination.
- A
RHPool
entry contains as the sole value a time stamp, which is also anuint64_t
. RHPool
uses the tabulation hash functionf2568()
to determine the HRW (Highest Random Weight) result for a 64 bit hash of the desired connection tuple inside an accepted packet.- A combination of Linux/POSIX rwlocks and atomic
uint64_t
access allows a maximum of concurrency during regular operation. - Locking for write access is only needed when a new entry is inserted or when a cleanup sweep to remove outdated entries is performed.
The RHPool API #
RHPool* RHPoolCreate();
void RHPoolDestroy(RHPool* rhpool);
void RHPoolInsert(RHPool* rhpool, uint64_t destination);
uint64_t RHPoolSelect(RHPool* rhpool, uint64_t tuple_hash);
void RHPoolCleanup(RHPool* rhpool, uint32_t maxage);
RPCL Control Words #
rhpool.cleanup remove outdated nodes ( maxage - )
rhpool.dtest distribution test ( nservers lookups - )
rhpool.info print rhpool info ( - )
rhpool.insert insert node ( uint64_t - )
rhpool.select select node ( uint64_t - uint64_t )
Distribution Testing #
The rather small machine that these tests have been run on has 8 cores with the following data:
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
stepping : 9
microcode : 0x8e
cpu MHz : 800.000
cache size : 8192 KB
1 Entry #
ok timer-start 1 10000000 rhpool.dtest timer-show
0000000000000000 10000000 100.0000%
expected 10000000.0000
chi_squared 0.0000
327973 microseconds elapsed
ok ▂
- 30,490,314 selections/second
2 Entries #
ok timer-start 2 10000000 rhpool.dtest timer-show
0000000000000000 4997282 49.9728%
0000000000000001 5002718 50.0272%
expected 5000000.0000
chi_squared 2.9550
490679 microseconds elapsed
ok ▂
- 20,379,922 selections/second
10 Entries #
ok timer-start 10 10000000 rhpool.dtest timer-show
0000000000000000 1000089 10.0009%
0000000000000001 1001166 10.0117%
0000000000000002 997538 9.9754%
0000000000000003 999905 9.9990%
0000000000000004 1001008 10.0101%
0000000000000005 999991 9.9999%
0000000000000006 998782 9.9878%
0000000000000007 1000610 10.0061%
0000000000000008 1001388 10.0139%
0000000000000009 999523 9.9952%
expected 1000000.0000
chi_squared 12.4638
1534562 microseconds elapsed
ok ▂
- 6,516,517 selections/second
100 Entries #
ok timer-start 100 10000000 rhpool.dtest timer-show
0000000000000000 100191 1.0019%
0000000000000001 100101 1.0010%
... lines omitted ...
0000000000000062 100229 1.0023%
0000000000000063 100702 1.0070%
expected 100000.0000
chi_squared 92.4308
12174253 microseconds elapsed
ok ▂
- 821,405 selections/second
1000 Entries #
ok timer-start 100 10000000 rhpool.dtest timer-show
0000000000000000 10022 0.1002%
0000000000000001 9889 0.0989%
... lines omitted ...
00000000000003e6 10073 0.1007%
00000000000003e7 10030 0.1003%
expected 10000.0000
chi_squared 986.0008
122478626 microseconds elapsed
ok ▂
- 81,646 selections/second
Observations #
- The strong f2568() properties have been confirmed.
- The lookup time increases linear with the number of destinations (not surprising).
- The combination of POSIX rwlocks and atomic timestamp updates allow any parallel concurrency.
- The distribution is perfect according to the Chi-Squared results.
Conclusion #
- This will be the core distribution method for our new products and projects.
Resources #
- Rendezvous Hashing
- Skip Lists - A Probabilistic Alternative to Balanced Trees
- f2568() - A variable Length Tabulation Hash Function
- Chi-squared distribution
- Chi-square Distribution Table
- What is the 10GbE line rate?
Here’s a good explanation how to calculate the chi-squared value and to test a null hypothesis with a Chi-Squared distribution table: