RHPool - Rendezvous Hashing for fast DSR Load Balancing
July 23, 2025
Updated to reflectrhpool.test.load
which instantiates one testingRHPool
instance and makes the additional testing RPCL words available (operating on that instance).
Overview #
RHPool
implements a Rendezvous Hash Pool, which holds multiple possible destinations (or servers) at a given time.RHPool
is esentially a skiplist with thatuint64_t
as the key.- For DSR load balancing (the intended purpose), a MAC address is mapped to the first 6 bytes in memory order, the remaining 2 bytes allow to insert the same MAC address multiple times to allow weighting a destination.
- A
RHPool
entry contains as the sole value a time stamp, which is also anuint64_t
. RHPool
uses the tabulation hash functionf2568()
to determine the HRW (Highest Random Weight) result for a 64 bit hash of the desired connection tuple inside an accepted packet.- A combination of Linux/POSIX rwlocks and atomic
uint64_t
access allows a maximum of concurrency during regular operation. - Locking for write access is only needed when a new entry is inserted or when a cleanup sweep to remove outdated entries is performed.
The RHPool API #
RHPool* RHPoolCreate();
void RHPoolDestroy(RHPool* rhpool);
void RHPoolInsert(RHPool* rhpool, uint64_t destination);
uint64_t RHPoolSelect(RHPool* rhpool, uint64_t tuple_hash);
void RHPoolCleanup(RHPool* rhpool, uint32_t maxage);
RPCL Testing Control Words #
rhpool.test.load loads rhpool.test words ( - )
rhpool.test.load
instatiates one RHPool
for testing and loads
the following words operating on that:
rhpool.test.cleanup remove outdated nodes ( maxage - )
rhpool.test.dtest distribution test ( nservers lookups - )
rhpool.test.info print rhpool info ( - )
rhpool.test.insert insert node ( uint64_t - )
rhpool.test.load loads rhpool.test words ( - )
rhpool.test.select select node ( uint64_t - uint64_t )
Distribution Testing #
The rather small machine that these tests have been run on has 8 cores with the following data:
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
stepping : 9
microcode : 0x8e
cpu MHz : 800.000
cache size : 8192 KB
1 Entry #
ok timer-start 1 10000000 rhpool.test.dtest timer-show
0000000000000000 10000000 100.0000%
expected 10000000.0000
chi_squared 0.0000
327973 microseconds elapsed
ok ▂
- 30,490,314 selections/second
2 Entries #
ok timer-start 2 10000000 rhpool.test.dtest timer-show
0000000000000000 4997282 49.9728%
0000000000000001 5002718 50.0272%
expected 5000000.0000
chi_squared 2.9550
490679 microseconds elapsed
ok ▂
- 20,379,922 selections/second
10 Entries #
ok timer-start 10 10000000 rhpool.test.dtest timer-show
0000000000000000 1000089 10.0009%
0000000000000001 1001166 10.0117%
0000000000000002 997538 9.9754%
0000000000000003 999905 9.9990%
0000000000000004 1001008 10.0101%
0000000000000005 999991 9.9999%
0000000000000006 998782 9.9878%
0000000000000007 1000610 10.0061%
0000000000000008 1001388 10.0139%
0000000000000009 999523 9.9952%
expected 1000000.0000
chi_squared 12.4638
1534562 microseconds elapsed
ok ▂
- 6,516,517 selections/second
100 Entries #
ok timer-start 100 10000000 rhpool.test.dtest timer-show
0000000000000000 100191 1.0019%
0000000000000001 100101 1.0010%
... lines omitted ...
0000000000000062 100229 1.0023%
0000000000000063 100702 1.0070%
expected 100000.0000
chi_squared 92.4308
12174253 microseconds elapsed
ok ▂
- 821,405 selections/second
1000 Entries #
ok timer-start 100 10000000 rhpool.test.dtest timer-show
0000000000000000 10022 0.1002%
0000000000000001 9889 0.0989%
... lines omitted ...
00000000000003e6 10073 0.1007%
00000000000003e7 10030 0.1003%
expected 10000.0000
chi_squared 986.0008
122478626 microseconds elapsed
ok ▂
- 81,646 selections/second
Observations #
- The strong f2568() properties have been confirmed.
- The lookup time increases linear with the number of destinations (not surprising).
- The combination of POSIX rwlocks and atomic timestamp updates allow any parallel concurrency.
- The distribution is perfect according to the Chi-Squared results.
Conclusion #
- This principle will be the core distribution method for our new products and projects.
Resources #
- Rendezvous Hashing
- Skip Lists - A Probabilistic Alternative to Balanced Trees
- f2568() - A variable Length Tabulation Hash Function
- Chi-squared distribution
- Chi-square Distribution Table
- What is the 10GbE line rate?
Here’s a good explanation how to calculate the chi-squared value and to test a null hypothesis with a Chi-Squared distribution table: