DNS data corruption

2024-10-20 09:06:17
Tags: #OneMesh #DNS
Model: RE655BE  
Hardware Version: V1
Firmware Version:

I got this extender a couple of days ago, and I'm having serious problems getting it to pass traffic correctly. I'm wondering whether anybody else has seen similar issues, or whether I somehow got a badly defective unit.

As soon as I started using the extender, DNS lookups from my Mac started either failing or returning 10.0.0.1, depending on the site.  This happened for both internal (intranet) and external (public) sites.

When I disabled the range extender, everything worked again.

According to tcpdump, the DNS requests do not reach my internal DNS server at all.

This failure occurs regardless of my Mac's DNS settings; DNS requests all get lost, even with 8.8.8.8.  I have used dig on the command line to verify that nothing in the Mac UI is lying to me, and the above observations are still 100% reproducible.

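To see exactly what comes back when a query passes through the extender, one option besides dig is to send a hand-built query and dump the raw reply bytes. The Python 3.8+ sketch below is a hypothetical helper, not a TP-Link or dig tool: `build_query` and `send_query` are illustrative names, and it assumes the resolver under test answers on UDP port 53. It builds a minimal RFC 1035 A-record query and prints the reply's source address and hex bytes, so the output with and without the extender can be compared side by side.

```python
import random
import socket
import struct


def build_query(name: str, txid: int) -> bytes:
    """Build a minimal RFC 1035 query for the A record of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid label {label!r}")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def send_query(server: str, name: str, timeout: float = 3.0) -> bytes:
    """Send one UDP query to server:53, print the raw reply, and return it."""
    txid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, txid), (server, 53))
        data, addr = sock.recvfrom(4096)
    print(f"sent txid {txid:#06x}; got {len(data)} bytes from {addr[0]}:{addr[1]}")
    print(data.hex(" "))
    return data
```

For example, calling `send_query("8.8.8.8", "example.com")` once while connected through the extender and once through the main router gives two hex dumps that can be diffed. A `socket.timeout` would indicate the request or reply was dropped rather than altered.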

When the extender is configured to use my internal DNS server (which I have verified is working correctly), it can't connect to the cloud server to check for firmware updates.

If I change the extender to use 8.8.8.8 as its DNS server, it is able to check for updates and says that none are available.  Even with that setting, downstream clients still get corrupted DNS responses or none at all.

Troubleshooting steps performed:

1.  Noticed that the DHCP server on the extender was set to auto, which could potentially compromise my wired network by having multiple DHCP servers.  Turned that off.

2.  Tried pointing the Mac client at a different DNS server.

3.  Noticed that the extender was forcibly inserting 192.168.1.1 as a secondary DNS server if I left it blank, so I entered my own DNS server in both slots to force it to provide the right one to clients, just in case the upstream router was providing bad DNS responses.  This appeared to fix the problem for about 24 hours (or maybe those sites had just been cached on my laptop while it was connected to the other router), but the fix didn't hold.  (I later determined that this was a red herring, as 192.168.1.1 does not respond to DNS requests.)

4.  Added debugging to my DNS server to see what was actually coming through.

5.  Ran tcpdump when the DNS server's debugging didn't show even a single incoming request.

6.  Tried checking for firmware updates.

7.  Tried configuring the extender in standalone router mode to see if that fixed the inability to check for firmware updates.  (It did not.)  Note that I did not actually try connecting to it over Wi-Fi while it was in that mode; I operated the web interface via the wired Ethernet side while connected to the upstream router.

8.  Reconfigured the extender with 8.8.8.8 as its resolver.  This fixed the inability to check for updates, but none were available.

9.  Switched back to extender mode.

10.  Determined that having the extender set to use 8.8.8.8 as its resolver did not fix the downstream clients' inability to perform DNS lookups (as expected, but worth verifying).

11.  Checked for updates on the mesh controller Wi-Fi router.  (None were available.)

12.  Verified that the web UI has no port blocking support that could somehow be misconfigured.

13.  Turned off stateful packet inspection on the upstream firewall.

14.  Power-cycled the repeater again.

At that point, I gave up and unplugged it from the wall so that I actually have a usable network again.

It is worth noting that ONLY DNS traffic is affected, as far as I can tell.  I can ping other hosts on the network, and can SSH to internal sites by mDNS while connected through the extender.  I have experienced zero problems while doing so.  So the router is passing traffic, but it is corrupting or dropping DNS packets, and only DNS packets.  (To be fair, I have not tried any other UDP traffic, so it could be corrupting or dropping all UDP traffic, and I would have no idea.)

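One way to narrow down whether the problem is tied to DNS over UDP specifically is to send the same query over TCP port 53, which public resolvers such as 8.8.8.8 also accept (dig's `+tcp` option does the same thing from the command line). The Python sketch below is an illustrative assumption, not part of any TP-Link tooling: `frame_tcp`, `recv_exact`, `query_over_tcp`, and the hard-coded `EXAMPLE_QUERY` (an A-record lookup for example.com) are hypothetical names. DNS over TCP prefixes each message with a two-byte length (RFC 1035, section 4.2.2), which is the only framing difference from UDP.

```python
import socket
import struct

# Hypothetical pre-built query: ID 0x1234, recursion desired, A record for example.com
EXAMPLE_QUERY = bytes.fromhex(
    "123401000001000000000000"    # header: ID, flags, QD=1, AN=0, NS=0, AR=0
    "076578616d706c6503636f6d00"  # QNAME: 7 "example" 3 "com" 0
    "00010001"                    # QTYPE=A, QCLASS=IN
)


def frame_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its two-byte big-endian length for TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket or raise if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def query_over_tcp(server: str, query: bytes, timeout: float = 3.0) -> bytes:
    """Send a pre-built DNS query over TCP port 53 and return the raw reply."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_tcp(query))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

If `query_over_tcp("8.8.8.8", EXAMPLE_QUERY)` returns a sane reply through the extender while the equivalent UDP query does not, that would suggest the interference is tied to UDP port 53 rather than to DNS content in general; if both fail the same way, it points the other direction.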

Configuration:

This is with EasyMesh, configured with an Ethernet backhaul.  The extender has a static IP address at the top of the 192.168.1.x range.  The internal DNS server also has a static IP address near the top of that range.  The main Wi-Fi access point, at 192.168.1.1, is an AX73/AX5400 Wi-Fi 6 router running firmware version 1.3.6 Build 20240325 rel.39241(5553).

The network switch providing the backhaul is a Cisco SG200-26.

Any thoughts before I box this thing up and ship it back to Best Buy as defective hardware?

Re:DNS data corruption
2024-10-21 05:18:57 - last edited 2024-10-21 05:24:03

One additional diagnostic: I just ran a UDP-based NDI HX stream over the connection, and it worked as expected. (I am certain that the NDI HX hardware in question uses UDP transmission exclusively, because at one point I was using NDI over a defective NIC whose driver kept getting wedged in a state where packets could only flow in the incoming direction, and yet the NDI stream kept flowing despite a lack of any ACKs from the other side.)

So this is not a general UDP corruption issue. The TP-Link extender is specifically targeting DNS packets and corrupting them somehow, which implies a software bug. If so, it's a showstopper, and I don't see how this product could possibly have shipped with a bug that severe.

Again, any thoughts would be appreciated. If I haven't heard back from TP-Link by EOD tomorrow, I'll give up and return it as defective.

Re:DNS data corruption
2024-10-26 19:15:56

More diagnostics.

At one point, I thought that it might be leaking traffic from other VLANs, but I connected it directly to the Wi-Fi router without going through the Cisco switch where the other VLANs live, and the behavior continued.

I also tried resetting its settings and reconnecting.

Initially, it worked.

I changed the IP to a static IP, and it worked.

I power-cycled the router, and it went back to the previous behavior, just like it did the first time.

I don't know if this is a firmware bug or something wrong with the hardware that causes its settings to get corrupted in some way.  I sent the defective settings to someone on TP-Link's engineering team to look into further.  In the meantime, I'm going to return this hardware while I'm still within the store's narrow return window, and then buy a replacement to determine whether it is a hardware issue or a software issue.

BTW, it turns out that whether I get 10.0.0.1 or a failure doesn't depend on the site.  It depends on the lookup code.  With nslookup or the Mac's built-in resolver, every DNS lookup returns 10.0.0.1.  With dig, the lookup fails with a parse error while decoding the response.

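The split between lenient and strict clients suggests the replies may be malformed in a way that some parsers tolerate and others reject, though that is an inference rather than something confirmed here. The Python 3.9+ sketch below uses hypothetical helpers (`skip_name` and `parse_a_records`, not taken from dig or any TP-Link tool) to walk a raw reply the way a strict RFC 1035 parser would: it checks the transaction ID and QR bit, bounds-checks every name, record header, and RDATA field, and rejects leftover bytes, returning the answer-section A-record addresses only if everything lines up. Given the raw reply bytes from a query sent through the extender, along with that query's transaction ID, it would report which field fails.

```python
import struct


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        if offset >= len(msg):
            raise ValueError("name runs past end of message")
        length = msg[offset]
        if length == 0:
            return offset + 1                   # root label ends the name
        if length & 0xC0 == 0xC0:
            if offset + 2 > len(msg):
                raise ValueError("truncated compression pointer")
            return offset + 2                   # pointer ends the name
        if length & 0xC0:
            raise ValueError(f"reserved label type at offset {offset}")
        offset += 1 + length                    # ordinary label


def parse_a_records(msg: bytes, expected_txid: int) -> list[str]:
    """Strictly parse a DNS reply; return answer-section IPv4 addresses."""
    if len(msg) < 12:
        raise ValueError("reply shorter than the 12-byte header")
    txid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    if txid != expected_txid:
        raise ValueError(f"transaction ID {txid:#06x}, expected {expected_txid:#06x}")
    if not flags & 0x8000:
        raise ValueError("QR bit not set; not a response")
    offset = 12
    for _ in range(qd):
        offset = skip_name(msg, offset) + 4     # skip QTYPE and QCLASS
        if offset > len(msg):
            raise ValueError("truncated question section")
    addresses = []
    # Walk answer, authority, and additional records; collect A records from answers only
    for index in range(an + ns + ar):
        offset = skip_name(msg, offset)
        if offset + 10 > len(msg):
            raise ValueError("truncated resource record header")
        rtype, rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if offset + rdlength > len(msg):
            raise ValueError("RDATA runs past end of message")
        rdata = msg[offset:offset + rdlength]
        offset += rdlength
        if index < an and rtype == 1 and rclass == 1:
            if rdlength != 4:
                raise ValueError(f"A record with RDLENGTH {rdlength}")
            addresses.append(".".join(str(b) for b in rdata))
    if offset != len(msg):
        raise ValueError(f"{len(msg) - offset} unexpected trailing bytes")
    return addresses
```

A reply that parses cleanly here but still yields 10.0.0.1 would point to a well-formed but wrong answer, while a `ValueError` would name the field a strict client chokes on.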

This is very, very strange.  I don't understand why a mesh router would be doing any deep packet inspection.  It should be wrapping raw Ethernet packets and sending them over the Ethernet backhaul to the main router as-is, without modification.  Obviously this is not the case, because if the packets were passed through unmodified, there would be no plausible way for 8.8.8.8 to be pingable while DNS packets sent to its port 53 get answered with 10.0.0.1 as the IP address, by either the extender or the main router (no idea which).

This behavior absolutely does not make sense.  If it weren't straight out of a sealed factory box, I would suspect some sort of misconfigured spyware on a customer return.

Re:DNS data corruption
2024-10-26 20:18:54

And here's where it gets more interesting.  I changed back to DHCP, and it started working.  Then I power-cycled it, and it stopped working again.  It seems to work exactly once per edit to settings, and then fails after any reboot of the extender.

Unclear if it is a hardware problem or a software problem.  Replacement is on order.

Re:DNS data corruption
3 weeks ago

This nonsense just started for me yesterday. I have an RE900XD, firmware "1.0.10 Build 20240411 rel.70678" (which is "RE900XD(EU)_V1_240411" on the Australian TP-Link website). It has been set up with Ethernet backhaul to an "AX3000 4-Stream Wi-Fi 6 Router" for months, and it was working great.

Then yesterday my 2024 MacBook Air running macOS 15.1.1 started resolving everything to 10.0.0.1 while connected to the RE900XD. The DNS server is set to 1.1.1.1. When I finally thought to try turning the RE900XD off, the problem suddenly goes away on the MacBook, and DNS starts resolving to the correct IP addresses. Turn the RE900XD back on and wait for the MacBook Air to move over to it, and we're back to 10.0.0.1 for everything.

Why the hell is the range extender messing with the DNS results from Cloudflare DNS (1.1.1.1)!?

If I try "dig", I get the same "parse error".

I was unable to reproduce this DNS screwup on a Microsoft Surface Laptop 4, an iPhone 14, an iPhone 15 (both on the latest iOS), or a MacBook Pro running macOS 14.6.1.

Whatever is going on renders my RE900XD useless. It would be really nice to hear something about this from TP-Link...

Configuration:

EasyMesh with Ethernet backhaul between an "AX3000 4-Stream Wi-Fi 6 Router" and an "RE900XD"

Router has a static IP: 192.168.1.1, firmware: "1.3.3 Build 20240628 rel.37017(4555)"

Extender has a static IP: 192.168.1.2, firmware: "1.0.10 Build 20240411 rel.70678"

Devices get IPs in the range 192.168.1.0/24 from the router.

Router DHCP DNS is set to 1.1.1.1 primary and 8.8.8.8 secondary

The range extender extends both my 2.4 GHz and 5 GHz networks, and the same problem happened on both. The range extender's "Network Settings" are set to "Use the following IP address" with 192.168.1.2, subnet: 255.255.255.0, default gateway: 192.168.1.1, primary DNS: 1.1.1.1, secondary DNS: 8.8.8.8. The "DHCP server" is "Off". EasyMesh says "Backhaul Type: Ethernet Backhaul". TP-Link Cloud is not set up. Region is set to "Australia".

I don't have anything fancy network-wise. My NTD is connected to the AX3000, and I have switches and devices plugged into that. No VLANs or other complexity, just the two TP-Link devices.

Re:DNS data corruption
3 weeks ago

Sorry for not following up here.  After some back-and-forth, including sending all of my configuration files to one of their engineers, TP-Link was able to reproduce the problem and is working on a firmware fix.  The bug is configuration-specific, and it is possible to work around it until a fix is available.

To work around the bug, change the following settings:

  • DHCP server -> auto (not off)
  • IP address -> DHCP (not manual)

As long as the extender is getting its IP address by DHCP from the primary TP-Link router and the extender's DHCP server is set to auto, everything works.  It's when people who actually know what they are doing try to configure it in a way that's sane and reasonable in a complex network that it craps the bed.  😂

Hope that helps.

Re:DNS data corruption
3 weeks ago

@dgatwood Thank you for the tip! Will try that.
