RC38. Building a DNS Resolver in C, Part 1 (of many)
Daily dispatches from my 12 weeks at the Recurse Center in Summer 2023
A few weeks back, a few recurser friends and I implemented a DNS resolver in Python using this excellent guide put together by a legendary RC alum. Knowing nothing about networking or how the internet works, I found this to be a wonderful and approachable introduction to the topic.
My appetite thusly whet (whetted?), I figured, hey, since I’ve also been focusing on C these past few months, why not try to re-implement this DNS resolver thing in C ? How hard could it be?
My initial thought was to take the approach of line-by-line translation, so I began dutifully building structs to handle the DNS header and DNS question as I had in the Python version. In that case, the DNS header class consisted of 6 integer attributes, and the DNS question class had 2 integer attributes and a string (the domain we want to resolve). The main thing that needed to happen with the values contained by these objects was to translate them into one long hex sequence, with each integer attribute represented as a two-byte hex value.
I was about to start implementing an integer-to-hex function so that I could encode my header and question structs to hexstrings, but then I stopped, since I realized that I’d better be damn sure that whatever C libraries I’ll be using are expecting the same information in the same format. My understanding is that these protocols are strictly defined, so I’m not expecting something wildly different. All the same, I do want to be sure that the C socket API wants, for instance, this kind of hexstring. That’s when I realized this translation would not be so straightforward after all. That’s also when I began consulting Beej’s Guide to Network Programming.
Along with a few friends also interested in this project, I made several small-step-but-giant-leaps forward today with Beej’s guide on our side. Our rough plan for a DNS resolver that attempts to circumvent higher-level abstractions like getaddreinfo()
looks like this (which basically mirrors the Python tutorial that is our model):
- open a socket with
socket()
- connect to it with
connect()
- send stuff with
sendto()
- get stuff with
recv()
- examine it and decide what to do next
- close socket with
close()
We sorted out how socket()
works. But our next obstacle came when we started thinking through how to send stuff with sendto()
.
In the Python tutorial, the high level process looks like this:
import socket
query = build_query("www.example.com", 1)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(query, ("8.8.8.8", 53))
response, _ = sock.recvfrom(1024)
For the moment let’s ignore the build_query()
part, which I’m anticipating will be the bulk of the work. Instead I’m focused on the sock.sendto()
line, in particular the part where we specify that we’d like to send our query to Google’s DNS server (“8.8.8.8”) on port 53.
When we use C’s version of sendto()
, we will be passing to it our socket descriptor (returned when we open our socket with socket()
) a buffer string (the content we’re sending – presumably something like the header and question combo alluded to above), the length of our query in bytes, and then a pointer to a sockaddr
struct (or its equivalent sockaddr_in
) and its size. This last part, we realized, is where we are specifying where and how we want to send our stuff – in this case, to “8.8.8.8” on port 53. Now we’re getting somewhere! And in that sense, this sockaddr_in
struct seems key to unlocking this small part of the puzzle.
Here’s what the struct looks like:
struct sockaddr_in {
short sin_family;
unsigned short sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
struct in_addr {
unsigned long s_addr;
};
This sockaddr_in
structure holds an address family (in our case AF_INET
), a port number, and an IPv4 address, AKA one of those things that looks like 142.250.176.14
, which is four one-byte integers strung together (i.e., each of the four component parts can range from 0-255). But what’s interesting is that this sin_addr
component is not some string of four integers or something; it’s a single unsigned long, which I believe is either a 4-byte or 8-byte integer. Either way it’s a big-ass number. That means we need to translate our address’s component parts to one-byte bytestrings, concatenate them together into a single 4-byte string, and then interpret that as a single integer.
For example, if we replace each one-byte integer component of
142.250.176.14
with binary, we’d get
10001110 11111010 10110000 00001110
and if we concatenate those together into a single 4-byte binary value:
10001110111110101011000000001110
which corresponds to the integer
2,398,793,742
Now, I believe there is a convenient way to tranlsate a human-readable IPv4 address to an integer and vice-versa with inet_pton()
, and we very well may avail ourselves of this convenience. But why do that when we could just . . . implement it ourselves?? In which case, ultimately what we’re going to need to do is take an IPv4 address (“8.8.8.8”), parse it into four bytestrings, concatenate the result into a single 4-byte bytestring, and return it as an integer. Then we’ll have that component part of our sockaddr_in
struct and, in turn, a component of what we’ll need to pass along to sendto()
.
And with that, I present to you a fun REPL implementation of btoi()
, our friendly bitstring-to-integer parser which will handle that last step of translating a 4-byte bytestring to an unsigned, 32-bit integer.
#include <stdio.h>
#include <stdint.h>
/* convert bitstring[32] to 32-bit uint */
uint32_t btoi(char *binstr, int len)
{
int i;
uint32_t result;
result = 0;
for (i=0; i<len; ++i)
if (binstr[i] == '1')
result += 1 << (len - 1 - i);
return result;
}
int main()
{
char c, binstr[32];
int i = 0;
while ((c = getchar()) != EOF)
{
if (c != '\n')
{
if (i < 32)
// add `c` to `binstr` and increment `i` while `i` < 32
binstr[i++] = c;
} else {
// print result
printf("-> %u\n", btoi(binstr, i));
// reset all `binstr` elements to null
for (; i>0; --i)
binstr[i] = '\0';
}
}
return 0;
}
You know what? I’m proud of this little function.
First because it handles strings less than and greater than 32 characters in length, no problem. Any characters in excess of 32 just never get added to the binstr
array in the first place. And for strings less than 32 characters in length, the len
parameter of btoi()
ensures that the most significant bit is at place len
(not at place 32!).
And second because I landed on what feels like a clever solution to resetting all the binstr
elements to null at the same time as I reset counter i
, which I accomplish by decrementing i
back to 0 and setting binstr[i]
to the null string at each step. That way, I ensure there’s no garbage left over in any of those memory locations.
Here it is in action:
> 10011
19
> 001
1
> 100
4
> 100110110100101010110101011110101001011011
2605364602
The next step is to do the first part of the parsing operation, namely walking through the IPv4 address, extracting each one-byte integer, translating it to a bytestring, and concatenating the results to pass to btoi()
. And at that point we will be well on our way to building a sockaddr_in
struct, which means we’ll be one step closer to making a connection.