Ridiculous Router Resets
As you might have already read around these parts, my home broadband is not a great service. Ever since the switch to Post Office ADSL, we’ve had fairly regular problems with PPPoE dropouts on the line. As with most things in the world of “user friendly” broadband providers, trying to get any answers over why that might be is a lesson in futility - but I learnt that this is likely due to a slight incompatibility between the modem and the ISP’s hardware in the exchange. Annoyingly, switching between the ISP provided modem and another ADSL modem made no difference.
One day, the issue got worse, and suddenly the whole DSL connection was unreliable. The ISP was called, and Openreach came (multiple times) and (with some difficulty) identified physical faults with the telephone wiring on the street. Thankfully, the issue was resolved, and for a day everything seemed okay… until the PPPoE dropouts returned.
Symptoms
The symptoms of the dropouts are quite peculiar. Some open connections continue to function, while others fail much sooner. For example, the first sign of a problem is often an SSH session hanging, Slack messages not sending, or Twitch streams freezing. At the same time, voice calls over Zoom or Discord and online gaming sessions continue to function. An interesting outcome of this is that I’m often able to tell Hazel that her voice chat with her friends is about to cut out before it actually happens - and she’s able to tell her friends that it’s about to happen! It’s quite fun to sound like a fortune teller.
I assume that the difference here is based on the choice of transport layer protocol [1]. Online games and video conferencing software are usually UDP based, whereas HTTP and SSH use TCP. When the connection starts to drop, somewhere on the ISP side TCP connections must be getting reset, while UDP traffic is still let through. Eventually, some internal process on the Zyxel must detect the degraded connection state, and will finally count the connection as down. In this state, the modem will then shut down the remaining working part of the connection, and any application that’s still working will finally die off.
Restoration
If you’re patient, after a short wait the modem will automatically reconnect.
If you’re impatient, you can log on to the web interface, and click the connect button to force the reconnection.
If you’re really impatient, you can log on to the web interface, forcibly disconnect the modem, and forcibly reconnect the modem - reducing the overall downtime if you can navigate the interface quickly (I’ve had a lot of practice).
This is annoying, but fine. The last thing I want to do is ring Post Office again, and organise more engineer visits for something that I don’t think Openreach will be equipped to repair, and only pops up about once a day. Our broadband contract will be over soon anyway!
Unfortunately, what makes the workaround worse is that the webserver on the Zyxel AMG1302-T11C is incredibly slow, sessions are short lived such that every disconnect requires a login, and the UI story for the one function I need is closer to a novel. To reconnect, I need to:
1. Load the web interface
2. Wait for the redirect to the login page
3. Provide credentials
4. Wait for the redirect through the auth endpoint to the (useless) main page
5. Click through to the detailed status page
6. Wait for an AJAX request to load in the content of the status page
7. Click the disconnect button
8. Wait to be redirected back to the (useless) main page
9. Repeat steps 5 and 6
10. Click the connect button
11. In a couple of seconds, the connection should be restored
It’s a painful process with such a slow webserver, but if you’re fast enough it’s possible to restore the connection before anything UDP-based gives up sending packets. You’ll be a bit behind in your dark age, but at least you can play on.
Not wanting to deal with more callouts to diagnose an issue that was seemingly unrelated to anything Openreach would be able to find, I decided to fix the issue on my own terms!
I am the Tech Support Now
I decided to write a little tool to perform the manual reconnect on my behalf, to make the process as quick as possible. I chose to use Go for the task - primarily in this case as it builds statically linked executables, and is thus easy to deploy wherever I want. I can run the tool on my Windows desktop, or I can deploy it to a Raspberry Pi if I want it ready to go at all hours.
That, and because I wanted to!
I decided to make use of Cobra, a popular Go library for the creation of CLI tools.
Using Cobra isn’t particularly necessary - the standard library’s flag package is full-featured enough (although annoyingly opinionated in not supporting GetOpt-style long and short options), but Cobra does provide a lot of nice-to-haves.
It also comes bundled with Viper, which has shortcuts for loading options from environment variables or config files.
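For comparison, here is a minimal sketch of what the standard library route might look like. The flag package has no built-in notion of a long/short pair, so the usual workaround is to bind two flag names to the same variable. The option names, defaults, and the "modemreset" program name below are hypothetical, and are not taken from the real tool.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds hypothetical settings for a modem reset tool.
type options struct {
	Hostname string
	Username string
}

// parseOptions registers a long and a short name for each setting by binding
// both flags to the same variable, then parses args.
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("modemreset", flag.ContinueOnError)

	fs.StringVar(&opts.Hostname, "hostname", "192.168.1.1", "modem address")
	fs.StringVar(&opts.Hostname, "H", "192.168.1.1", "modem address (shorthand)")
	fs.StringVar(&opts.Username, "username", "admin", "login user")
	fs.StringVar(&opts.Username, "u", "admin", "login user (shorthand)")

	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return opts, nil
}

func main() {
	// The flag package accepts both -name and --name for any flag.
	opts, err := parseOptions([]string{"-H", "10.0.0.1", "--username", "hazel"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts) // {Hostname:10.0.0.1 Username:hazel}
}
```

This works, but every option shows up twice in the generated help text, which is one of the rough edges a library like Cobra smooths over.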
Interacting with the AMG1302-T11C Automatically
I thought that automating the interaction with the modem’s web interface would be easy. At work, I have done a lot of web automation tasks, and I figured that the simplistic but clunky web interface would be a piece of cake to interact with in comparison to the cases I have dealt with in the past. What I didn’t quite predict is how clunky a web interface could be!
I dived in by first opening up dev tools, and inspecting what details were sent by the login form. On the upside, I didn’t see any kind of CSRF protection on the UI. Terrible from a security standpoint, but at least my automated script would be simpler. To my surprise, I saw a mess of inline JavaScript on the login page, primarily focused on implementing base64 encoding.
I was preparing to write some code to do a fairly standard HTTP POST, with some form encoded credentials.
To my horror, the login form instead produced the following request (with sensitive parts replaced with $VAR$ placeholders, of course):
POST /cgi-bin/index.asp?$BASE_64$ HTTP/1.1
Host: 192.168.1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-GB,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 51
Origin: http://192.168.1.1
DNT: 1
Connection: keep-alive
Referer: http://192.168.1.1/cgi-bin/login.html
Cookie: SESSIONID=$WOW_I_BET_THIS_APP_DOESNT_REUSE_SESSION_IDS$
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache

Loginuser=$MYUSER$&LoginPassword=&Prestige_Login=Login
So… authentication is a POST request, but the request body doesn’t actually contain the password.
Instead, the $BASE_64$ in the URL decodes to (again with substitutions):
$MYUSER$:$MYPASSWORD$
Oh dear.
In my state of shock, I also discovered that the login endpoint works just as well without the request body, and with a GET request - sorry RFC 7231.
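To make the encoding concrete, here is a small sketch of the round trip using Go's encoding/base64 package. The helper names are my own invention for illustration, and "user" and "pass" are obviously stand-in credentials. It's the same user:password shape that HTTP Basic authentication uses, except that here it ends up in the query string rather than in a header.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodeCredentials builds the raw query string that the login form sends.
func encodeCredentials(user, password string) string {
	return base64.StdEncoding.EncodeToString([]byte(user + ":" + password))
}

// decodeCredentials reverses encodeCredentials, splitting on the first ':'.
func decodeCredentials(query string) (string, string, error) {
	raw, err := base64.StdEncoding.DecodeString(query)
	if err != nil {
		return "", "", err
	}
	parts := strings.SplitN(string(raw), ":", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("no ':' separator in decoded credentials")
	}
	return parts[0], parts[1], nil
}

func main() {
	encoded := encodeCredentials("user", "pass")
	fmt.Println(encoded) // dXNlcjpwYXNz

	user, password, err := decodeCredentials(encoded)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(user, password) // user pass
}
```

Base64 is an encoding rather than encryption, so anyone who can see the URL can read the password straight back out.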
Questionable Implementations Are Easy to Interface With… Right?
So the implementation of authentication on the Zyxel router is awful, but at least it’s easy to implement. There’s no CSRF protection we need to circumvent, so the only hoop we need to jump through is a fairly straightforward string concatenation and base64 encode step to build the login URL.
// Join the credentials as "user:password" and base64 encode them.
credParam := fmt.Sprintf("%s:%s", c.Username, c.Password)
credParamEncoded := base64.StdEncoding.EncodeToString([]byte(credParam))
loginURL := url.URL{
	Scheme: "http",
	Host:   c.Hostname,
	Path:   "/cgi-bin/index.asp",
}
// The T11C doesn't pass this as a key/value pair, and doesn't escape any trailing '='!
loginURL.RawQuery = credParamEncoded
loginResp, err := c.client.Get(loginURL.String())
Easy stuff, albeit not very standard. I added a small bit of test logging, to check if the response from the router showed a successful login.
It did not!
Filling the Cookie Jar
Running in Delve, the Go debugger, I noticed that my http.CookieJar was never actually getting a session cookie.
After a bit of poking at my code, and then a confused return to the web browser, I realised the issue - the T11C only sends the Set-Cookie header in response to a request that redirects because of a lack of authentication!
When using a web browser and navigating directly to the login form, this issue doesn’t rear its head.
In these cases, we are saved by an unlikely hero:
GET /favicon.ico HTTP/1.1
Host: 192.168.1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0
Accept: image/webp,*/*
Accept-Language: en-GB,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

HTTP/1.0 302 Moved Temporarily
Date: Sun, 30 Aug 2020 22:53:27 GMT
Server: Boa/0.94.13
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: SESSIONID=$SOMESESSION$; path=/
Location: http://192.168.1.1:80/cgi-bin/login.asp
Thankfully, the Zyxel engineers tactically did not include a favicon, to fix their session handling.
To fix the issue on my end, I added an initial fetch to the root path, in order to trigger a redirect response with a Set-Cookie header.
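The behaviour is easy to reproduce without the modem. The sketch below is not the tool's real code: it stands up a fake server with net/http/httptest that only sets SESSIONID on the redirect (the handler paths mirror the captures above, everything else is made up), and shows that fetching the root path first is enough to fill the jar, because Go's http.Client stores cookies from every response in a redirect chain.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
	"net/url"
)

// newFakeModem imitates the quirk: the session cookie is only set on the
// redirect sent to unauthenticated requests, never by the login page itself.
func newFakeModem() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/cgi-bin/login.asp", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "login form")
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		http.SetCookie(w, &http.Cookie{Name: "SESSIONID", Value: "example", Path: "/"})
		http.Redirect(w, r, "/cgi-bin/login.asp", http.StatusFound)
	})
	return httptest.NewServer(mux)
}

// newClient returns an HTTP client backed by an empty cookie jar.
func newClient() (*http.Client, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	return &http.Client{Jar: jar}, nil
}

// sessionAfterGet fetches path and returns the SESSIONID value held in the
// client's jar afterwards, or "" if the jar has no session cookie.
func sessionAfterGet(client *http.Client, baseURL, path string) (string, error) {
	resp, err := client.Get(baseURL + path)
	if err != nil {
		return "", err
	}
	resp.Body.Close()

	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	for _, c := range client.Jar.Cookies(u) {
		if c.Name == "SESSIONID" {
			return c.Value, nil
		}
	}
	return "", nil
}

func main() {
	srv := newFakeModem()
	defer srv.Close()

	client, err := newClient()
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	direct, _ := sessionAfterGet(client, srv.URL, "/cgi-bin/login.asp")
	fmt.Printf("after login page: %q\n", direct) // after login page: ""

	primed, _ := sessionAfterGet(client, srv.URL, "/")
	fmt.Printf("after root path:  %q\n", primed) // after root path:  "example"
}
```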
Plain Sailing
After this hiccup, interacting with the T11C was fairly straightforward.
The POST request required to trigger the connect/disconnect of the modem worked as expected, although at least one of the parameters it uses seems to do nothing.
I decided to go a little further, and implement an extra method and command to query the current state of the modem as reported by the web interface.
I could use this to quickly see whether the router was reporting a working connection when I started to experience connectivity issues.
Implementing this was fairly simple, thanks to the Go html package.
I decided to have a bit of fun and implement the parse tree navigation myself, rather than pull in a dependency like Cascadia.
Adding a Watch Mode
I had a working command line tool to quickly reset the modem on the Zyxel. This was pretty good: I could now start a game of AoE2, and as soon as I noticed the connection drop, run the tool and bring the connection back up ASAP. The restore was fast enough to stay in the game without dropping, although it does still lead to some lost time.
I wanted to then add a command to my tool to automatically call the reconnect as soon as the connection drops.
I could have implemented this with some shell script trickery and the ping command, but I wanted to make it as easy to run as possible.
I decided to use the go-ping library to implement connectivity checking.
This worked well for the most part, barring a small issue with error reporting and a misunderstanding on my part in how percentages are represented.
I ran go-ping periodically using a time.Ticker, and added support for a simple context.Context for cancellation purposes, as well as structured and levelled logging with go-kit.
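The general shape of such a loop can be sketched with just the standard library. This is an illustration rather than the tool's actual code: the check and reconnect functions are hypothetical stand-ins for the go-ping probe and the modem reset, and the logging is reduced to a plain print.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// watch calls check on every tick and calls reconnect whenever check reports
// the connection as down. It returns the number of successful reconnects once
// ctx is cancelled.
func watch(ctx context.Context, interval time.Duration, check func() bool, reconnect func() error) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	reconnects := 0
	for {
		select {
		case <-ctx.Done():
			return reconnects
		case <-ticker.C:
			if check() {
				continue // connection looks healthy
			}
			if err := reconnect(); err != nil {
				fmt.Println("reconnect failed:", err)
				continue
			}
			reconnects++
		}
	}
}

// simulateDrop runs watch against a fake connection that fails on the third
// check only, and stops the watcher once the reconnect has been issued.
func simulateDrop() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	calls := 0
	check := func() bool {
		calls++
		return calls != 3
	}
	reconnect := func() error {
		cancel() // one reconnect is enough for the demonstration
		return nil
	}
	return watch(ctx, time.Millisecond, check, reconnect)
}

func main() {
	fmt.Println("reconnects:", simulateDrop()) // reconnects: 1
}
```

Passing the check and reconnect steps in as functions keeps the loop testable without a modem or a flaky line to hand.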
Conclusion
I now have a useful Go based tool to automatically monitor my connection and reset the modem when appropriate - it has reduced the interruption from a PPPoE drop from a good 30 seconds plus, down to about 5 seconds, depending on how responsive the server is. In the process, I’ve learnt some horrifying things about the Zyxel’s web interface - I’m not surprised so many home routers get compromised!
If you are also the proud owner of an AMG1302-T11C and also suffer from the same ISP woes, then check out the code.
I might run it as a Windows service the easy way using something like NSSM, or I might install it as a systemd service on my old Raspberry Pi.
Either way, I think it’s very useful as-is, and maybe someone else finds it useful too!
[1] I say assume because I haven’t actually watched the traffic in Wireshark when the connection drops. It’s far too unpredictable and infrequent to catch it without just running Wireshark 24/7.