DNS data corruption

2024-10-20 09:06:17
Tags: #OneMesh #DNS
Model: RE655BE  
Hardware Version: V1
Firmware Version:

I got this extender a couple of days ago and am having serious problems getting it to pass traffic correctly.  I'm wondering whether anybody else has seen similar issues, or whether I somehow got badly defective hardware.

 

As soon as I started using the extender, DNS lookups from my Mac started either failing or returning 10.0.0.1, depending on the site.  This happened for both internal (intranet) and external (public) sites.

 

When I disable the range extender, everything works again.

 

According to tcpdump, the DNS requests do not reach my internal DNS server at all.

 

This failure occurs regardless of my Mac's DNS settings; DNS requests all get lost, even with 8.8.8.8.  I have verified that nothing in the Mac UI is lying to me by using dig on the command line, and the above observations are still 100% reproducible.
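
For anyone who wants to repeat this check without dig, here is a minimal Python sketch that hand-builds a DNS A query and sends it straight to a chosen server over UDP port 53, bypassing the OS resolver entirely.  The function names (build_query, raw_lookup) are made up for illustration and are not from any tool mentioned above; this is a rough diagnostic aid under those assumptions, not a full resolver.

```python
import random
import socket
import struct
from typing import Optional, Tuple


def build_query(hostname: str, query_id: Optional[int] = None) -> bytes:
    """Build a minimal DNS query for the A record of `hostname`."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def raw_lookup(hostname: str, server: str, timeout: float = 3.0) -> Tuple[bytes, tuple]:
    """Send the query to `server` on UDP port 53; return (reply_bytes, sender_address)."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        return sock.recvfrom(4096)
```

Calling raw_lookup("example.com", "8.8.8.8") once through the extender and once through the main router, then comparing reply.hex() from each, should show whether the bytes coming back differ between the two paths.  A socket timeout here corresponds to the "no response" case.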

 

When the extender is configured to use my internal DNS server (which I have verified is working correctly), it can't connect to the cloud server to check for firmware updates.

 

If I change the extender to use 8.8.8.8 as its DNS server, it is able to check for updates and says that none are available.  Even with that setting, I still get corrupted/no DNS responses on downstream clients.

 

 

Troubleshooting steps performed:

 

1.  Noticed that the DHCP server on the extender was set to auto, which could potentially compromise my wired network by having multiple DHCP servers.  Turned that off.

 

2.  Tried pointing the Mac client at a different DNS server.

 

3.  Noticed that the extender was forcibly inserting 192.168.1.1 as a secondary DNS server if I left it blank, so inserted my own DNS server in both slots to force it to provide the right one to clients, just in case the upstream router was providing bad DNS responses.  This appeared to fix the problem for about 24 hours, or maybe those sites just got cached on my laptop while it was connected to the other router, but the fix didn't hold.  (I later determined that this was a red herring, as 192.168.1.1 does not respond to DNS requests.)

 

4.  Added debugging to my DNS server to see what was actually coming through.

 

5.  Ran tcpdump when the DNS server's debugging didn't show even a single incoming request.

 

6.  Tried checking for firmware updates.

 

7.  Tried configuring the extender in standalone router mode to see if that fixed the inability to check for firmware updates.  (It did not.)  Note that I did not actually try connecting to it over Wi-Fi while it was in that mode; I operated the web interface via the wired Ethernet side while connected to the upstream router.

 

8.  Reconfigured the extender with 8.8.8.8 as its resolver.  This fixed the inability to check for updates, but none were available.

 

9.  Switched back to extender mode.

 

10.  Determined that having the extender set to use 8.8.8.8 as its resolver did not fix the downstream clients' inability to perform DNS lookups (as expected, but worth verifying).

 

11.  Checked for updates on the mesh controller Wi-Fi router.  (None were available.)

 

12.  Verified that the web UI has no port blocking support that could somehow be misconfigured.

 

13.  Turned off stateful packet inspection on the upstream firewall.

 

14.  Power-cycled the repeater again.

 

At that point, I gave up and unplugged it from the wall so that I actually have a usable network again.

 

It is worth noting that ONLY DNS traffic is affected, as far as I can tell.  I can ping other hosts on the network, and can SSH to internal sites by mDNS while connected through the extender.  I have experienced zero problems while doing so.  So the extender is passing traffic, but it is corrupting or dropping DNS packets, and only DNS packets.  (To be fair, I have not tried any other UDP traffic, so it could be corrupting or dropping all UDP traffic, and I would have no idea.)
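
One way to close that gap would be a simple UDP echo test.  The sketch below is an assumption-laden illustration, not something from the extender's tooling: run_echo_server and probe are hypothetical helper names, and it assumes a second machine on the wired side that can run the echo half.

```python
import socket


def run_echo_server(sock: socket.socket, count: int) -> None:
    """Echo `count` datagrams back to their senders, unchanged."""
    for _ in range(count):
        data, addr = sock.recvfrom(4096)
        sock.sendto(data, addr)


def probe(server: str, port: int, payload: bytes, timeout: float = 3.0) -> bool:
    """Send `payload` over UDP and report whether the identical bytes come back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (server, port))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return False  # nothing came back at all
        return data == payload  # False also covers a modified reply
```

On the wired host, bind a UDP socket to the chosen port and pass it to run_echo_server; on the Wi-Fi client behind the extender, call probe against that host.  Running it once on an arbitrary high port and once on port 53 (binding to 53 may need elevated privileges, and the port must not already be in use by a DNS server) would show whether the problem follows the port number or UDP in general.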

 

 

Configuration:

 

This is with EasyMesh, configured with an Ethernet backhaul.  The extender has a static IP address at the top of the 192.168.1.x range.  The internal DNS server also has a static IP address near the top of that range.  The main Wi-Fi access point, at 192.168.1.1, is an AX73/AX5400 Wi-Fi 6 router running firmware version 1.3.6 Build 20240325 rel.39241(5553).

 

The network switch providing the backhaul is a Cisco SG200-26.

 

Any thoughts before I box this thing up and ship it back to Best Buy as defective hardware?

 

Re:DNS data corruption
2024-10-21 05:18:57 - last edited 2024-10-21 05:24:03

One additional diagnostic. I just ran a UDP-based NDI HX stream over the connection, and it worked as expected. (I am certain that the NDI HX hardware in question uses UDP transmission exclusively, because at one point, I was using NDI over a defective NIC that kept getting its driver into a wedged state where packets could only flow in the incoming direction, and yet the NDI stream kept flowing, despite a lack of any ACKs from the other side.)

 

So this is not a general UDP corruption issue. The TP-Link extender is specifically targeting DNS packets and corrupting them somehow, which implies a software bug. If so, it's a showstopper, and I don't see how this product could possibly have shipped with a bug that severe.

 

Again, any thoughts would be appreciated. If I haven't heard back from TP-Link by EOD tomorrow, I'll give up and return it as defective.

 

Re:DNS data corruption
3 weeks ago

More diagnostics.

 

At one point, I thought that it might be leaking traffic from other VLANs, but I connected it directly to the Wi-Fi router without going through the Cisco switch where the other VLANs live, and the behavior continued.

 

I also tried resetting its settings and reconnecting.

 

Initially, it worked.

 

I changed the IP to a static IP, and it worked.

 

I power-cycled the router, and it went back to the previous behavior, just like it did the first time.

 

I don't know if this is a firmware bug or something wrong with the hardware that causes its settings to get corrupted in some way.  I sent the defective settings to someone on TP-Link's engineering team to look into further.  In the meantime, I'm going to return this hardware while I'm still within the store's narrow return window, and then buy a replacement to determine if it is a hardware issue or a software issue.

 

BTW, it turns out that whether I get 10.0.0.1 or a failure doesn't depend on the site.  It depends on the lookup tool.  With nslookup or the Mac's built-in resolver, every DNS lookup returns 10.0.0.1.  With dig, I get an error saying that the response could not be parsed.
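
Since different tools are clearly tolerating the same bad packet to different degrees, it can help to look at the raw reply bytes (for example, copied out of a tcpdump hex dump) and decode only the fixed 12-byte DNS header.  The following is a minimal sketch for that; summarize_reply is a hypothetical helper name, and it deliberately does not try to parse the question or answer sections, since those are the parts that appear to be malformed.

```python
import struct


def summarize_reply(reply: bytes, expected_id: int) -> dict:
    """Decode only the fixed 12-byte DNS header and report basic sanity checks."""
    if len(reply) < 12:
        return {"ok": False, "reason": f"only {len(reply)} bytes, shorter than a DNS header"}
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", reply[:12])
    return {
        "ok": True,
        "id_matches": ident == expected_id,   # reply ID should equal the query ID
        "is_response": bool(flags & 0x8000),  # QR bit
        "rcode": flags & 0x000F,              # 0 = no error, 3 = NXDOMAIN
        "questions": qd,
        "answers": an,
        "authority": ns,
        "additional": ar,
        "hex": reply.hex(),                   # full packet for manual comparison
    }
```

If the header looks sane but the section counts do not match what actually follows in the hex dump, that would be consistent with a strict parser like dig rejecting the packet while a more lenient one picks out an address anyway.  That is a guess about the mechanism, not something I have confirmed.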

 

This is very, very strange.  I don't understand why a mesh router would be doing any deep packet inspection.  It should be wrapping raw Ethernet packets and sending them over the Ethernet backhaul to the main router as-is, without modification.  Obviously this is not the case, because if it were, there would be no plausible way for 8.8.8.8 to be pingable while DNS packets sent to it on port 53 get answered by either the extender or the main router (no idea which) with 10.0.0.1 as the IP address.

 

This behavior absolutely does not make sense.  If it weren't straight out of a sealed factory box, I would suspect some sort of misconfigured spyware on a customer return.

 

Re:DNS data corruption
3 weeks ago

And here's where it gets more interesting.  I changed back to DHCP, and it started working.  Then I power-cycled it, and it stopped working again.  It seems to work exactly once per edit to settings, and then fails after any reboot of the extender.

 

Unclear if it is a hardware problem or a software problem.  Replacement is on order.

 
