Every network troubleshooting methodology I have ever seen starts the same way: check the hardware, run ping, review the logs. Those are fine steps. But the hard part is what happens in your head.
By Alan DeKok, CEO, InkBridge Networks
Getting to New Zealand takes 30 hours. It was worth it.
I was invited to speak at NZNOG 2026 (the New Zealand Network Operators Group conference) after years of conversations with operators across the APAC region. Nathan Ward, one of the organisers, and I had crossed paths in various places over the years, and met in person at IETF 119 in Brisbane. When he invited me to come to New Zealand, I said yes.
The goal was twofold: a half-day workshop on working with RADIUS, and a 30-minute talk on the upcoming RADIUS standards that the IETF is developing. The audience was exactly the right one - ISPs, managed service providers, telcos, and the engineers who run production networks day to day.
The workshop ended up being three hours and was oversubscribed. I may need to go back and do it again.
After the conference, I did a wine tour in the Waipara region, and discovered just how small the world is. I got on the bus to find that there was a couple from Ottawa on board, and I ended up seated next to a man from just outside Grenoble, where I used to live. He knew the part of the city where I had lived, because I described it as “just a block from the world’s best cheese store: Les Alpages”.
The world is very small, even from 30 hours away.
But I want to talk about what I taught in that room, because I think it matters to anyone who manages complex network infrastructure.
The first thing I said: there is no magic trick
You find this kind of thing everywhere – “for three easy payments of $9.95, here is the one easy product that solves all your problems!” Network software has its own version of this. People come to RADIUS with the expectation that there is a single correct configuration, a magic setting, something that just works once you find it.
There is not.
What I offered instead was a way of thinking: a framework for understanding what is happening when a network system does not do what you expect.
The feedback I got was that this was more useful than any “step by step” tutorial would have been.
Where network troubleshooting methodology goes wrong
You can find a lot of good checklists. Check the hardware. Run ping. Use ipconfig. Check DNS. Review the logs. These are all reasonable things to do, and the guides that present the network troubleshooting process as a sequence of actions are useful references.
But if you do not know why you are running those commands, or what you expect to see, or what the result tells you about the system, you are not troubleshooting.
You need the cognitive layer - what is happening in your head when you troubleshoot.
Debugging is a psychological process
Here is the definition of debugging I gave at NZNOG, and I think it applies to every piece of complex software you will ever work with:
Debugging is the process of making your mental model of a system match reality, or fixing the system so that it matches your mental model.
That’s it. Everything else follows from that.
When something is broken, there are only two possibilities:
- Either you believe the system does X, and it actually does Y - in which case, your mental model is wrong, and you need to update it.
- Or you believe the system should do X, it was doing X, and now it is doing Y - in which case, something has changed and you need to find what.
Until you are clear on which of those situations you are in, you are just guessing.
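The two cases above can be written down as a tiny decision function. This is only a sketch to make the distinction concrete: the names `Observation`, `Situation`, and `classify` are invented for illustration, and the strings stand in for whatever behaviour you are actually comparing.

```python
from dataclasses import dataclass
from enum import Enum


class Situation(Enum):
    NO_FAULT = "system matches the mental model"
    MODEL_IS_WRONG = "update the mental model"
    SOMETHING_CHANGED = "find what changed"


@dataclass
class Observation:
    expected: str        # what you believe the system does (X)
    observed: str        # what it actually does (Y)
    worked_before: bool  # was it really doing X at some earlier point?


def classify(obs: Observation) -> Situation:
    """Decide which of the two debugging situations you are in."""
    if obs.expected == obs.observed:
        return Situation.NO_FAULT
    if obs.worked_before:
        # It did X, now it does Y: something changed.
        return Situation.SOMETHING_CHANGED
    # It never did X: the belief was wrong from the start.
    return Situation.MODEL_IS_WRONG
```

For example, `classify(Observation("user lands in VLAN 10", "user lands in VLAN 1", worked_before=False))` returns `Situation.MODEL_IS_WRONG`. The hard part in practice is answering `worked_before` honestly, not writing the function.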
Most of the questions I see on the FreeRADIUS mailing list - and I have been answering questions on the list for 25 years - come from engineers who are operating with an incorrect mental model of how the system works. They’re educated and experienced, but for some reason that doesn’t help them solve the problem. They just have not been given the right framework. They don’t have the right mental model, and they don’t have the right procedures.
This makes it incredibly difficult for them to solve problems. They get frustrated that the answers they’re getting aren't helping. The people answering questions get frustrated because their help isn’t being respected.
Getting your mental model right: the RADIUS example
I asked the room at NZNOG: what is RADIUS?
Most people gave the expected answer. Authentication, Authorisation, and Accounting. AAA. The security layer that sits between users and the network.
Wrong.
RADIUS is the world's dumbest database query protocol. And I mean that as a precise technical description, not an insult.
A lot of engineers operate with the mental model that the RADIUS server is in charge.
It is not.
When a user tries to connect to an access point, the access point sends a query to the RADIUS server: should I let this user in? The RADIUS server responds. And then the access point decides what to do with that response.
The access point can ignore it entirely. It can put the user in a different VLAN than the one the server specified. It can grant access the server denied. There is nothing that the server can do.
I have fielded many variations of this question for 25 years: "the server is sending back the right attribute and the NAS is not doing it, what's wrong with the server?" The answer is: nothing. The problem is the client: the NAS (network access server), which in this example is the access point.
Once you have that mental model corrected - once you understand that RADIUS is a query protocol, and the NAS is the decision-maker - a significant category of troubleshooting problems becomes immediately clearer. You know where to look. You know what question to ask.
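A toy model can make the "query protocol" view concrete. The sketch below is not real RADIUS and does not use any FreeRADIUS API: `radius_server`, `Nas`, and the flags `honour_vlan` and `fail_open` are hypothetical names chosen for illustration. The server only answers the query; the NAS (the network access server, here the access point) applies its own local policy to the answer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reply:
    accept: bool
    attributes: dict = field(default_factory=dict)


def radius_server(username: str) -> Reply:
    """Hypothetical server: looks the user up and answers. Nothing more."""
    known_vlans = {"alice": 10}
    if username in known_vlans:
        return Reply(accept=True, attributes={"vlan": known_vlans[username]})
    return Reply(accept=False)


class Nas:
    """Hypothetical NAS: it asks the question, then decides for itself."""

    def __init__(self, honour_vlan: bool = True, fail_open: bool = False,
                 default_vlan: int = 1):
        self.honour_vlan = honour_vlan
        self.fail_open = fail_open
        self.default_vlan = default_vlan

    def connect(self, username: str) -> Optional[int]:
        """Return the VLAN the user ends up in, or None if access is refused."""
        reply = radius_server(username)
        if not reply.accept and not self.fail_open:
            return None
        if self.honour_vlan and "vlan" in reply.attributes:
            return reply.attributes["vlan"]
        # The NAS ignored (or never received) the VLAN the server chose.
        return self.default_vlan
```

With the defaults, `Nas().connect("alice")` returns `10`. With `Nas(honour_vlan=False)`, the same correct server reply puts the user in VLAN `1`, and no change to `radius_server` can alter that.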
For practical help with the error messages you encounter, the guides on common FreeRADIUS debug messages and the 8 most common RADIUS mistakes are good starting points - but read them with the mental model question in mind, not just as a lookup table.
I intend to post a video of this RADIUS workshop on the InkBridge Networks YouTube channel, so you can subscribe there to get notified when it comes out.
Why the twice-a-year problem makes this worse
Most engineers do not touch their RADIUS server - or their complex network infrastructure generally - very often. Maybe twice a year. Maybe less. And every time they do, they have largely forgotten what they knew. The configuration is opaque. The log output is cryptic. The documentation is dense.
This is a structural issue. It’s hard to maintain a mental model of complex software when you interact with it twice a year. The answer is to have a systematic approach to refreshing that model quickly as needed.
What I heard from the NZNOG operators
After the standards talk, I had hallway conversations with operators throughout the conference. Their feedback on the upcoming RADIUS standards - including RADIUS 1.1 and the other specifications being developed at the IETF - was broadly positive, with some useful technical input on implementation details.
What was more striking was the gap they described between the standards work and what ends up in field equipment. As I have written about in the context of IETF 124 in Montreal, the theory and the practice of network standards can diverge significantly.
The IETF concentrates on requirements. Operator conferences show you what people are running - and what has not changed in 15 years despite multiple standard updates.
That gap is part of why I think the conversation on network troubleshooting methodology matters. You can have perfectly designed standards and still have engineers in the field who are troubleshooting by trial and error at 2 a.m. The standards help. The mental models help even more.
Need more help?
If your team is wrestling with network configuration, a troubleshooting problem you cannot resolve, or a system that needs to be more resilient, we can help. InkBridge Networks has 25 years of expertise - we wrote the standards, maintain FreeRADIUS, and have seen every failure mode there is. Reach out to request a quote.
FAQ: common questions about network troubleshooting methodology
What is a network troubleshooting methodology?
A network troubleshooting methodology is a structured approach to diagnosing and resolving network problems. Rather than working through a fixed checklist, a good methodology starts with forming a clear hypothesis about what the system should be doing, comparing that to what it is actually doing, and working systematically to close that gap.
What is the first step in troubleshooting a network problem?
Before touching anything, get clear on what you expect the system to do and what it is actually doing instead. That gap is the problem. Everything else - logs, commands, configuration checks - is evidence you use to understand and close it.
What are the most common network troubleshooting mistakes?
The most common mistake is acting without a clear hypothesis. Engineers often start changing things - restarting services, modifying configuration - before they understand what the fault is. This can make the problem harder to diagnose, and can mask it without fixing it. For RADIUS-specific examples, a shared secret error or an unresponsive child message are classic cases where engineers blame the wrong thing because their mental model points them in the wrong direction.
How do you debug a RADIUS server?
Start by enabling debug mode and reading the output carefully. FreeRADIUS, in particular, produces detailed logs that tell you exactly what the server received, what it decided, and what it sent back. The key is knowing what to look for - which comes back to having a correct mental model of how RADIUS works before you start. If the server is making the right decision and the NAS is ignoring it, no amount of server-side debugging will help.
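The reasoning in this answer can be sketched as a small decision function. It is an illustration only: the three inputs are hypothetical facts that you establish yourself by reading the debug output and checking the NAS, not values that FreeRADIUS reports in this form.

```python
def where_to_look(request_logged: bool,
                  reply_as_intended: bool,
                  nas_applied_reply: bool) -> str:
    """Map three hand-gathered observations to the next place to investigate.

    request_logged:    the request appears in the server debug output
    reply_as_intended: the reply the server sent is the one you wanted
    nas_applied_reply: the NAS actually did what the reply said
    """
    if not request_logged:
        return "network path: the request never shows up in the debug output"
    if not reply_as_intended:
        return "server configuration: the server decided something you did not intend"
    if not nas_applied_reply:
        return "NAS: the server sent the right reply and the NAS did not act on it"
    return "nothing to fix: the behaviour matches your mental model"
```

The order of the checks is the point of the sketch: only once the first two are settled does it make sense to stop debugging the server and turn to the NAS.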
Why is my RADIUS server not responding?
"Not responding" covers several distinct failure modes. The server may not be receiving the packet (firewall or routing issue), it may be receiving it but rejecting it silently (shared secret mismatch or client not in the clients.conf), or it may be processing it and the NAS is not receiving the response (return routing issue). Each requires a different diagnosis. Start with the debug log to establish which stage the packet reaches.
Related Articles
Common FreeRADIUS debug messages
FreeRADIUS can feel overwhelming at first, but its debug output helps manage complexity. Once you understand how to read it, the system becomes surprisingly straightforward to troubleshoot.
8 Most common RADIUS mistakes
Many FreeRADIUS issues stem not from RADIUS itself but from infrastructure or poor network practices. These problems aren’t always obvious, so we highlight common mistakes and how to avoid them.