Speed up DNSDist with AF_XDP

Preface

DNSDist is an excellent DNS load balancer, and AF_XDP is an emerging high-performance Linux asynchronous I/O interface made possible by eBPF.
It is a great honor for Y7n05h to take part, as a contributor, in adapting DNSDist to use AF_XDP.

Work on the UDP part of DNSDist was finished some time ago. A change meant to improve performance should not remain a claim on paper, though. Collecting benchmark data and analyzing it gives the most convincing report card.

So, let's get on with the fun part: benchmarking.

Test environment information

Laptop
OS: ArchLinux
Kernel Version: 5.19.1-arch2-1
CPU: AMD Ryzen 7 4800H with Radeon Graphics
MEM: DDR4 64G
NIC: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
DNSPerf Version: 2.9.0
GCC Version: 12.1.1
Libxdp Version: 1.2.5
Laptop's CPU, memory and NIC stayed lightly loaded throughout the test.

PC1
OS: ArchLinux
Kernel Version: 5.19.1-arch2-1
CPU: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz
MEM: DDR4 8G
NIC: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
During the test, PC1 ran only DNSDist apart from essential system processes.

PC2
OS: Ubuntu
Kernel Version: 5.15.0-46-generic
CPU: 12th Gen Intel(R) Core(TM) i7-12700KF
MEM: DDR4 64G
NIC: Broadcom Inc. and subsidiaries BCM4360 802.11ac Wireless Network Adapter (rev 03)
PC2's CPU, memory and NIC stayed lightly loaded throughout the test.

Thanks to @yegetables for lending PC2 to Y7n05h. It gave Y7n05h a stable SmartDNS service, which helped this test a great deal.

Source Code:
DNSDist with AF_XDP (AF_XDP version): https://github.com/Y7n05h/pdns/commit/d42e356a48a433a9f4efae9c3dd648101a37abdf
DNSDist without AF_XDP (Normal version): https://github.com/Y7n05h/pdns/commit/f5e76c2a6932ec4360d38219fb515d26d538b40d

In this test, Y7n05h uses Laptop to generate the test traffic, PC1 to run the DNSDist instance under test, and PC2 to run the SmartDNS service.
Laptop, PC1 and PC2 are all connected to the gateway (192.168.30.1) over Wi-Fi and reach each other over Wi-Fi. Unfortunately, many other devices were also using this network during the test, so Y7n05h cannot rule out experimental error caused by fluctuations in the network.

The benchmarking tool is DNSPerf. Each test resolves the A records of 12018 distinct domains, and each domain is queried only once per run.

For the tests, SmartDNS was used as the DNS server. There is no particular reason to use SmartDNS, except that Y7n05h is familiar with it and it is easy to build and deploy SmartDNS on Ubuntu.

During the test, Y7n05h used Laptop to send DNS queries to PC1, which runs DNSDist. DNSDist processes each query and, if necessary, forwards it to SmartDNS on PC2. SmartDNS then sends the query concurrently to the 4 DNS servers 1.1.1.1, 1.0.0.1, 8.8.8.8 and 119.29.29.29 (if necessary) and replies to DNSDist with the first response it receives.

The test required Y7n05h to send a large number of DNS queries to the 4 public DNS servers above. Y7n05h sincerely thanks Cloudflare, Google and DNSPod for providing these public DNS services. (Although many queries were made during the test, DNSDist and SmartDNS cache the answers, so not every query reaches those servers. Y7n05h therefore considers this within reasonable limits and fundamentally different from a DDoS.) In addition, to minimize interference from DNS caching, Y7n05h repeatedly used DNSPerf to send resolution requests to SmartDNS for every domain used in the test before the test began.

Y7n05h's subjective guess is that, in today's networks, DNS queries are still dominated by A and AAAA records. Because IPv6 support in Y7n05h's environment is poor, this test uses A-record resolution as the performance indicator. Y7n05h also cannot verify that the resolution results are correct, so correctness is not used as a metric in this analysis.

It should also be noted that the DNSDist configuration used in this test was heavily simplified for convenience and may differ significantly from a production DNSDist configuration. This test therefore does not fully reflect how AF_XDP's optimization of DNSDist would perform in production.

DNS resolution is recursive, so the response time of a query depends heavily on whether the queried domain hits a DNS server's cache.

In summary, this test may be biased and imprecise. What follows reflects only Y7n05h's own opinion.

Performance Tests

uniq.txt contains exactly 12018 domains, none of them repeated.
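A data file like uniq.txt can be produced from a raw domain list with standard tools. DNSPerf reads one query per line as "<name> <type>" (for example "example.com A"). The sketch below is one possible approach. It assumes a hypothetical input file, domains.txt, with one domain per line, and `build_query_file` is a hypothetical helper name. The article does not say how uniq.txt was actually built.

```shell
# Sketch (assumption): build a DNSPerf data file in which every domain appears once.
# build_query_file is a hypothetical helper; domains.txt is a hypothetical input.
build_query_file() {
    src=$1   # raw list, one domain per line (may contain duplicates and blank lines)
    dst=$2   # output data file, later passed to dnsperf -d
    # DNS names are case-insensitive, so lower-case them before de-duplicating.
    # Strip trailing whitespace, drop blank lines, sort uniquely, append the query type.
    tr '[:upper:]' '[:lower:]' < "$src" \
        | sed -e 's/[[:space:]]*$//' -e '/^$/d' \
        | sort -u \
        | awk '{ print $1 " A" }' > "$dst"
}

# Usage:
#   build_query_file domains.txt uniq.txt
#   wc -l < uniq.txt    # number of distinct domains in the data file
```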

In this article, the horizontal axis of every line chart is the run number, which increases by 1 with each DNSPerf run. Each curve shows the results of repeated DNSPerf runs against a single DNSDist process instance, with points ordered chronologically from left to right. No two DNSPerf runs overlap in time.
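One way to produce such a series is sketched below. It runs DNSPerf a fixed number of times, strictly one after another, and saves each report to its own file. Each run starts only after the previous one has exited, so runs never overlap, as described above. `run_series`, the output file names and the `DNSPERF` override are hypothetical conveniences, not part of DNSPerf or the author's scripts.

```shell
# Sketch (assumption): sequential, non-overlapping DNSPerf runs, one report per run.
# DNSPERF can be overridden (e.g. with a stub) for dry runs; it defaults to dnsperf.
DNSPERF=${DNSPERF:-dnsperf}

run_series() {
    label=$1; runs=$2; shift 2     # remaining arguments are passed to dnsperf as-is
    i=1
    while [ "$i" -le "$runs" ]; do
        # Run i becomes point i on this DNSDist instance's curve.
        "$DNSPERF" "$@" > "${label}-run${i}.txt" || return 1
        i=$((i + 1))
    done
}

# Usage, e.g. five consecutive runs against one DNSDist instance:
#   run_series afxdp 5 -s 192.168.30.170 -p 5300 -d uniq.txt
```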

To keep the text concise, Y7n05h adopts the following shorthand:

  • The DNSDist version that uses AF_XDP is abbreviated as the “AF_XDP version”.
  • The DNSDist version without AF_XDP is abbreviated as the “Normal version”.

Test 1

The following command was used in this test to run DNSPerf:

```shell
# -s: DNSDist instance on PC1, -p: DNSDist listening port, -d: query data file
dnsperf -s 192.168.30.170 -p 5300 -d uniq.txt
```

In Test 1, Y7n05h ran the AF_XDP version first, then the Normal version, and finally the AF_XDP version again.

Looking first at average latency, every curve shows a downward trend. This is most likely because repeated DNSPerf runs steadily raise the cache hit rates of SmartDNS and the upstream servers for the domains used in this test. This is supported by the average latency of the AF_XDP version when re-run after the Normal version, which is markedly lower than in its earlier runs. The data available so far cannot show how AF_XDP itself affects average latency.

The AF_XDP version loses significantly fewer queries than the Normal version, although its number of lost queries rises slowly as the runs go on.

In average queries per second, i.e. throughput, the AF_XDP version is significantly better than the Normal version. The blue curve's data was collected between the runs behind the green and red curves. Relative to the cache state during the blue run, caching boosted the green curve and held back the red curve. Taking the blue run's cache state as the baseline, the true throughput of the AF_XDP version therefore lies roughly between the red and green curves. Even so, the contrast with the blue curve (the Normal version) is strong enough to show AF_XDP's throughput advantage.
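The bracketing argument in the paragraph above can be made concrete. If AF_XDP's throughput under the Normal run's cache state lies between its two measured values, then dividing the smaller of the two by the Normal value gives a conservative (lower-bound) speed-up. The sketch below assumes that the three inputs are QPS values from comparable runs. `conservative_speedup` is a hypothetical helper, and no measured values are implied.

```shell
# Sketch (assumption): conservative throughput ratio from the bracketing argument.
# $1, $2: AF_XDP QPS from the runs before and after the Normal run; $3: Normal QPS.
conservative_speedup() {
    awk -v a="$1" -v b="$2" -v n="$3" 'BEGIN {
        if (n <= 0) exit 1          # Normal QPS must be positive
        lo = (a < b) ? a : b        # lower end of the bracket
        printf "%.2f\n", lo / n     # a value above 2.00 means "more than double"
    }'
}

# Usage:
#   conservative_speedup <AF_XDP first-run QPS> <AF_XDP re-run QPS> <Normal QPS>
```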

Run time is how long one complete DNSPerf run takes. Here, too, the AF_XDP version outperforms the Normal version. The reasoning mirrors the throughput analysis, so Y7n05h does not repeat it.
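Each point on these charts comes from one DNSPerf report. The sketch below pulls the four metrics discussed here (average latency, queries lost, queries per second, run time) out of a saved report and prints them as a single CSV row. It assumes the "Statistics:" labels printed by DNSPerf 2.x, such as "Average Latency (s):" and "Queries lost:". Adjust the patterns if your version formats them differently. `dnsperf_metrics` is a hypothetical helper.

```shell
# Sketch (assumption): extract latency, loss, QPS and run time from one dnsperf report.
# Output: avg_latency_s,queries_lost,queries_per_second,run_time_s
dnsperf_metrics() {
    awk -F':' '
        /Average Latency [(]s[)]:/ { split($2, f, " "); lat  = f[1] }  # drop "(min ..., max ...)"
        /Queries lost:/            { split($2, f, " "); lost = f[1] }  # drop the percentage
        /Queries per second:/      { split($2, f, " "); qps  = f[1] }
        /Run time [(]s[)]:/        { split($2, f, " "); rt   = f[1] }
        END { printf "%s,%s,%s,%s\n", lat, lost, qps, rt }
    ' "$1"
}

# Usage:
#   dnsperf_metrics afxdp-run1.txt
```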

Even allowing for the caching in DNSDist, SmartDNS and elsewhere that makes successive DNSPerf runs progressively faster, one conclusion is clear: AF_XDP significantly improves DNSDist's throughput in this scenario. Its effect on query latency still needs further testing.

Test 2

The following command was used in this test to run DNSPerf:

```shell
# As in Test 1, plus -c 500 (act as 500 clients) and -T 16 (run 16 threads)
dnsperf -s 192.168.30.170 -p 5300 -d uniq.txt -c 500 -T 16
```

In Test 2, the concurrency of the test was increased by adding command line arguments. Test 2 ran the Normal version first and the AF_XDP version second.

Note: Test 2 was not run immediately after Test 1, which may have affected the state of the DNS caches.

The average latency of the AF_XDP version, which ran second, still trends downward, and in the last two runs it is not significantly different from the Normal version. Y7n05h's personal guess is that, with more runs and a rising cache hit rate, the AF_XDP version's average latency may drop below the Normal version's. In the first 3 runs, the AF_XDP version's average latency was noticeably higher than the Normal version's. This may be because stopping the Normal version and starting the AF_XDP version cleared DNSDist's cache. The effect of AF_XDP on average latency still needs further testing.

Query loss for the Normal and AF_XDP versions follows the same pattern as in Test 1, and the number of lost queries is not noticeably different from Test 1.

In terms of throughput, the gap between the AF_XDP and Normal versions widens further under higher concurrency, and it tends to keep growing as the runs go on.
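Whether the gap grows can be checked point by point by pairing run i of one curve with run i of the other. The sketch below assumes two hypothetical text files holding one QPS value per line in run order. It prints the run number, the absolute gap and the ratio. `per_run_ratio` is a hypothetical helper, and rows where either value is missing are skipped.

```shell
# Sketch (assumption): per-run throughput gap and ratio between two curves.
# $1: AF_XDP QPS per run, $2: Normal QPS per run (one value per line, same order).
per_run_ratio() {
    paste -d ' ' "$1" "$2" | awk '
        NF == 2 && $2 > 0 { printf "%d %.0f %.2f\n", NR, $1 - $2, $1 / $2 }
    '
}

# Usage:
#   per_run_ratio afxdp_qps.txt normal_qps.txt   # prints "run gap ratio" per line
```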

A DNSPerf run takes significantly less time against the AF_XDP version than against the Normal version, which matches the finding in Test 1.

Summary

AF_XDP significantly improves DNSDist's throughput, but it may increase the average latency per request (this needs further verification).

On throughput alone, a conservative estimate is that AF_XDP more than doubles DNSDist's throughput.

Judging from these tests, AF_XDP has the potential to significantly improve the throughput of UDP-based network services.
