22.08.2023

CVE-2023-31147: Insufficient randomness for DNS query identifiers in c-ares

Despite DNSSEC, DoH, and DoT being available to secure DNS lookups, many systems still rely on plain old DNS from 1983. Earlier this year, we were part of a larger team that audited c-ares v1.19.0. c-ares is an asynchronous DNS client library with support for a wide range of platforms. It has been around for quite some time, and a few of its more prominent users include libcurl, Node.js, and Wireshark.

In this post, we delve into one specific outcome of our work, namely the weak DNS query ID generation in c-ares identified as CVE-2023-31147.

DNS Query ID Generation in c-ares

Even though plain DNS does not include any cryptographic measures for authenticity, DNS queries rely on two properties to be more resilient against forged answers [1]:

  1. Source port randomization: DNS queries originate from a randomly chosen port, and the response must be directed back to the same port of the originating query.
  2. Random DNS query identifiers: DNS queries contain a randomized 16-bit ID, which the corresponding response must match to gain acceptance.

In the good old days, this was not the default. For instance, DNS resolvers used fixed source ports, allowing attackers to focus solely on correctly predicting the 16-bit query ID. The practical exploitation of this vulnerability was most prominently shown by Dan Kaminsky in 2008 through the publication of CVE-2008-1447. Thus, selecting the source port and query ID using a cryptographically secure random number generator (CSPRNG) is crucial: It makes it quite hard for attackers to construct and inject a malicious response before the real response arrives at the client.
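To put rough numbers on this, the following back-of-the-envelope sketch estimates an off-path attacker's chance of guessing the right values with a given number of forged responses. It assumes that the secret values are uniformly random and that all forged packets arrive before the real answer. The function name is ours, and treating the source port as a full 16 bits is a simplification, since real systems use a somewhat smaller ephemeral port range.

```c
#include <assert.h>

/* Probability that one of `forged` distinct guesses matches a uniformly
   random secret of `bits` bits (query ID alone, or query ID plus port). */
static double spoof_success_probability(unsigned bits, unsigned long long forged)
{
  double space;
  double p;

  assert(bits < 64);
  space = (double)(1ULL << bits);   /* size of the search space */
  p = (double)forged / space;
  return p > 1.0 ? 1.0 : p;
}
```

With a fixed source port, 65536 forged responses cover the entire 16-bit ID space. With a random 16-bit port on top, the same number of packets succeeds with a probability of roughly 0.0015 %.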

When auditing a DNS protocol implementation like c-ares, we always check if these mitigations are properly implemented. In the context of this blog post, we primarily focus on random DNS query identifiers, as source port randomization is nowadays the default and is managed by the OS [2].

The high-level design of DNS query ID generation in c-ares appears fairly simple to users: Upon initialization with ares_init(ares_channel *channelptr), c-ares collects random bytes from the OS's CSPRNG. These bytes serve as the seed for an internal CSPRNG. This internal CSPRNG is then used to generate the 16-bit DNS query ID for each individual query. The CSPRNG state is stored in the opaque type ares_channel and updated every time a DNS query ID is generated through ares__generate_new_id(...).

Hello 1987

Once we look at the code more closely, we realize that not everything is well designed. First, we notice that DNS query IDs are generated using a pseudorandom number generator (PRNG) based on RC4:

static void rc4(rc4_key* key, unsigned char *buffer_ptr, int buffer_len)
{
  unsigned char x;
  unsigned char y;
  unsigned char* state;
  unsigned char xorIndex;
  int counter;

  x = key->x;
  y = key->y;

  state = &key->state[0];
  for(counter = 0; counter < buffer_len; counter ++)
  {
    x = (unsigned char)((x + 1) % 256);
    y = (unsigned char)((state[x] + y) % 256);
    ARES_SWAP_BYTE(&state[x], &state[y]);

    xorIndex = (unsigned char)((state[x] + state[y]) % 256);

    buffer_ptr[counter] = (unsigned char)(buffer_ptr[counter]^state[xorIndex]);
  }
  key->x = x;
  key->y = y;
}

// ...

/* a unique query id is generated using a rc4 key. Since the id may already
   be used by a running query (as infrequent as it may be), a lookup is
   performed per id generation. In practice this search should happen only
   once per newly generated id
*/
static unsigned short generate_unique_id(ares_channel channel)
{
  unsigned short id;

  do {
    id = ares__generate_new_id(&channel->id_key);
  } while (find_query_by_id(channel, id));

  return (unsigned short)id;
}

unsigned short ares__generate_new_id(rc4_key* key)
{
  unsigned short r=0;
  rc4(key, (unsigned char *)&r, sizeof(r));
  return r;
}

In case you’re too young or it’s been a while: RC4 is a stream cipher designed in 1987 that gained widespread usage over the years. However, it has repeatedly been shown to be flawed and insecure. Its weaknesses are also a reason why the Wi-Fi protocols WEP and WPA-TKIP were both famously broken. So it is not a cipher you’d use in 2023.

Using RC4 as a cryptographically secure PRNG (CSPRNG) was also quite common some time ago. It was the core of arc4random(3), which originated in OpenBSD and is nowadays part of all BSD descendants, including Apple’s macOS and iOS. However, with the discovery of more and more ways in which the RC4 key stream is biased and not properly random, it became clear that RC4 was unsuitable for use as a CSPRNG. As a result, today’s arc4random(3) implementations use CSPRNGs based on ChaCha20 or AES. [3]

Shifting our focus back to c-ares, there is more to consider: While functions like arc4random(3) ensure that new entropy is added after a certain number of random bytes have been generated, c-ares takes a different approach. It simply seeds the PRNG once and uses it throughout the entire lifespan of the ares_channel. This may not be a concern for tools like adig (the dig version of c-ares), since the process never lives long enough. Services that run for a much longer time and perform a lot of DNS queries might be a different story, especially if such a service initializes a single ares_channel upon startup and uses it until it is stopped. While this is probably not really an issue with c-ares, it is common practice to reseed the PRNG after generating a certain number of bytes.
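The bookkeeping behind such a reseeding policy can be sketched in a few lines. Everything in the sketch is hypothetical and not taken from c-ares or from any arc4random(3) implementation: the type reseed_budget, the function budget_consume, and the RESEED_INTERVAL value are made up for illustration, and the actual reseeding from the OS is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define RESEED_INTERVAL 4096  /* hypothetical output budget per seed, in bytes */

typedef struct {
  size_t bytes_since_reseed;  /* output produced since the last (re)seed */
} reseed_budget;

/* Account for `len` bytes of PRNG output. Returns 1 if the caller has to
   reseed from the OS before generating them, 0 otherwise. */
static int budget_consume(reseed_budget *b, size_t len)
{
  int must_reseed;

  assert(b != NULL);
  must_reseed = (b->bytes_since_reseed + len > RESEED_INTERVAL);
  if (must_reseed)
    b->bytes_since_reseed = 0;  /* the budget starts over with the new seed */
  b->bytes_since_reseed += len;
  return must_reseed;
}
```

With two bytes per query ID and the budget assumed here, a long-running service would pull fresh entropy once every 2048 queries instead of never.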

Broken Twice

With our interest piqued, we can look more closely at how RC4 is seeded. This is done in the function init_id_key(rc4_key* key,int key_data_len), which is handed a pointer to the rc4_key stored in the ares_channel and a number dubbed key_data_len:

static int init_id_key(rc4_key* key,int key_data_len)
{
  unsigned char index1;
  unsigned char index2;
  unsigned char* state;
  short counter;
  unsigned char *key_data_ptr = 0;

  key_data_ptr = ares_malloc(key_data_len);
  if (!key_data_ptr)
    return ARES_ENOMEM;
  memset(key_data_ptr, 0, key_data_len);

  state = &key->state[0];
  for(counter = 0; counter < 256; counter++)
    /* unnecessary AND but it keeps some compilers happier */
    state[counter] = (unsigned char)(counter & 0xff);
  randomize_key(key->state,key_data_len);
  key->x = 0;
  key->y = 0;
  index1 = 0;
  index2 = 0;
  for(counter = 0; counter < 256; counter++)
  {
    index2 = (unsigned char)((key_data_ptr[index1] + state[counter] +
                              index2) % 256);
    ARES_SWAP_BYTE(&state[counter], &state[index2]);

    index1 = (unsigned char)((index1 + 1) % key_data_len);
  }
  ares_free(key_data_ptr);
  return ARES_SUCCESS;
}

We notice that the buffer key_data_ptr is actually useless: it is zeroed by memset(...) and never populated with anything else. We can therefore ignore it in the calculation of index2 in the second for loop.

Furthermore, the 256-byte buffer key->state[] is filled with numbers from 0 to 255, and then handed to a function randomize_key(...) which one would assume shuffles the contents of key->state[]. Also notice that we hand key_data_len to this function. Digging through the header files, we can find that its value is always ARES_ID_KEY_LEN, which is 31. So this is not the length of key->state[] which is 256 bytes.

This raises the question of where the true randomness is fetched from the OS. Looking at the code above, randomize_key(...) is the only sensible candidate. Before delving into that, let’s briefly compare how the RC4 key schedule is implemented in OpenSSL:

void RC4_set_key(RC4_KEY *key, int len, const unsigned char *data)
{
    register RC4_INT tmp;
    register int id1, id2;
    register RC4_INT *d;
    unsigned int i;

    d = &(key->data[0]);
    key->x = 0;
    key->y = 0;
    id1 = id2 = 0;

#define SK_LOOP(d,n) { \
                tmp=d[(n)]; \
                id2 = (data[id1] + tmp + id2) & 0xff; \
                if (++id1 == len) id1=0; \
                d[(n)]=d[id2]; \
                d[id2]=tmp; }

    for (i = 0; i < 256; i++)
        d[i] = i;
    for (i = 0; i < 256; i += 4) {
        SK_LOOP(d, i + 0);
        SK_LOOP(d, i + 1);
        SK_LOOP(d, i + 2);
        SK_LOOP(d, i + 3);
    }
}

We notice that init_id_key(...) in c-ares is slightly different and actually broken:

OpenSSL’s key schedule implementation receives the raw key via the data buffer, initializes the state buffer d and then shuffles d based on the contents of data.

In c-ares, key_data_ptr is OpenSSL’s data buffer and state is the equivalent of pointer d. Knowing this, we can see that c-ares confused key->state with key_data_ptr when calling randomize_key(...) which we assume retrieves a random key. Consequently, the 256 ARES_SWAP_BYTE(...) operations in c-ares’ key schedule incorrectly depend on key->state only, but not key_data_ptr.

Without going into all the math details here, this definitely looks worse than what the original RC4 key schedule does, as it likely results in fewer possible permutations of key->state[]. We can assume that this negatively affects the quality of the random numbers it generates.
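One way to see the effect is to model the c-ares key schedule and count how many distinct byte values end up in the state. The sketch below is a simplified re-implementation for illustration, not the c-ares source: model_init_id_key and distinct_values are hypothetical names, and the seed is passed in as a parameter instead of being fetched by randomize_key(...). Because the seed overwrites the first 31 entries of the identity permutation and the swaps that follow only reorder entries, any seed that is not itself a permutation of 0 to 30 leaves the state with duplicate values, something a standard RC4 key schedule never produces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEED_LEN 31  /* value of ARES_ID_KEY_LEN according to the text above */

/* Simplified model of init_id_key(): `seed` stands in for the bytes that
   randomize_key() writes over the start of the state array. */
static void model_init_id_key(unsigned char state[256],
                              const unsigned char seed[SEED_LEN])
{
  unsigned char index2 = 0;
  int i;

  assert(state != NULL && seed != NULL);
  for (i = 0; i < 256; i++)
    state[i] = (unsigned char)i;
  memcpy(state, seed, SEED_LEN);        /* overwrites state[0..30] */

  for (i = 0; i < 256; i++) {
    unsigned char tmp;
    /* key_data_ptr[] is all zeros, so it drops out of the sum */
    index2 = (unsigned char)(state[i] + index2);
    tmp = state[i];
    state[i] = state[index2];
    state[index2] = tmp;
  }
}

/* Number of different byte values present in the state array. */
static int distinct_values(const unsigned char state[256])
{
  unsigned char seen[256] = {0};
  int i, n = 0;

  for (i = 0; i < 256; i++) {
    if (!seen[state[i]]) {
      seen[state[i]] = 1;
      n++;
    }
  }
  return n;
}
```

For an all-zero seed, the state contains only 226 distinct values: the value 0 thirty-one times plus the untouched values 31 to 255. Only a seed that happens to be a permutation of 0 to 30 yields all 256 values.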

Predictable Randomness

Finally, let’s take a closer look at randomize_key(...):

/* initialize an rc4 key. If possible a cryptographically secure random key
   is generated using a suitable function otherwise the code defaults to
   cross-platform albeit less secure mechanism using rand
*/
static void randomize_key(unsigned char* key,int key_data_len)
{
  int randomized = 0;
  int counter=0;
#ifdef WIN32
  BOOLEAN res;

  res = RtlGenRandom(key, key_data_len);
  if (res)
    randomized = 1;

#else /* !WIN32 */
#  ifdef CARES_RANDOM_FILE
  FILE *f = fopen(CARES_RANDOM_FILE, "rb");
  if(f) {
    setvbuf(f, NULL, _IONBF, 0);
    counter = aresx_uztosi(fread(key, 1, key_data_len, f));
    fclose(f);
  }
#  endif
#endif /* WIN32 */

  if (!randomized) {
    for (;counter<key_data_len;counter++)
      key[counter]=(unsigned char)(rand() % 256);  /* LCOV_EXCL_LINE */
  }
}

At first glance, this appears okay: On WIN32 systems, RtlGenRandom() is used to query key_data_len bytes of randomness from the OS and place them into key which is key->state[] in the caller init_id_key(...).

On non-WIN32 targets, we either open the file path of CARES_RANDOM_FILE and read key_data_len bytes from there or fall back to using rand() to get the same amount of random numbers. We assume that CARES_RANDOM_FILE is set to /dev/urandom or /dev/random for now.

An initial observation here is that the fallback relies on rand(3), which is not designed for generating cryptographically secure random numbers. Since it is only a fallback for when nothing else works, it is better than return 4;, but it would be better to first try arc4random(3) on *BSD or getrandom(2) on Linux, just in case.

More concerning, though, is the absence of any srand(3) call in the whole source, which would seed the PRNG used by rand(3). Without it, rand(3) will output the same sequence of numbers every single time! This means that all our DNS query IDs will be fully predictable every time we end up in this fallback case.
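This behavior is mandated by the C standard: if rand() is called before any call to srand(), it must produce the same sequence as after srand(1). The minimal sketch below mimics the fallback loop under that rule. The function name fallback_key is ours, and the explicit srand(1) is only there to reproduce, within one process, what every freshly started process would see.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill key[] the way the c-ares fallback loop does. Calling srand(1) first
   recreates the state of a process that never called srand() at all. */
static void fallback_key(unsigned char *key, size_t len)
{
  size_t i;

  assert(key != NULL);
  srand(1);
  for (i = 0; i < len; i++)
    key[i] = (unsigned char)(rand() % 256);
}
```

Two calls to fallback_key() yield identical buffers, just as two separate runs of a program using the c-ares fallback would start from identical RC4 seeds.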

Looking at the CARES_RANDOM_FILE case, it becomes evident that any error in fopen(3) or fread(3) is silently ignored and again results in rand(3) being used. While c-ares does the best it can in this case, it probably shouldn’t fail silently. At least a few sysadmins would want to know that their DNS queries are all predictable due to some configuration issue.

Compiled Non-Randomness

There is one more thing in the code above: whenever CARES_RANDOM_FILE is not defined, c-ares automatically falls back to using rand(3) for seeding the RC4 PRNG. Why this can be a problem becomes apparent when we look at the Autotools configure.ac file:

dnl Check for user-specified random device
AC_ARG_WITH(random,
AS_HELP_STRING([--with-random=FILE],
               [read randomness from FILE (default=/dev/urandom)]),
    [ CARES_RANDOM_FILE="$withval" ],
    [
        dnl Check for random device.  If we're cross compiling, we can't
        dnl check, and it's better to assume it doesn't exist than it is
        dnl to fail on AC_CHECK_FILE or later.
        if test "$cross_compiling" = "no"; then
          AC_CHECK_FILE("/dev/urandom", [ CARES_RANDOM_FILE="/dev/urandom"] )
        else
          AC_MSG_WARN([cannot check for /dev/urandom while cross compiling; assuming none])
        fi

    ]
)

As we can see, the existence of /dev/urandom is determined at compile time, on the build host. When cross compiling, the check is skipped altogether, so CARES_RANDOM_FILE stays undefined unless --with-random=FILE is passed explicitly. We’ll then always use the fallback case with the RC4 PRNG seeded by rand(3)! Luckily enough, c-ares also ships a CMake build system, which is used for example by the Yocto meta-oe recipe for c-ares, so not all is lost.

Nevertheless, the check for /dev/urandom’s existence in c-ares should probably be done at runtime instead of at compile time.
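A runtime check that also reports failures could look roughly like the following sketch. It is not the fix that went into c-ares. The function name read_random_file and its return convention are assumptions made for illustration, and a real implementation would likely prefer getrandom(2) or arc4random(3) where available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Fill buf with len bytes read from `path` (for example "/dev/urandom").
   Returns 0 on success and -1 on any failure. It never falls back to
   rand(), so the caller has to handle the error explicitly. */
static int read_random_file(const char *path, unsigned char *buf, size_t len)
{
  FILE *f;
  size_t got;

  assert(path != NULL && buf != NULL);
  f = fopen(path, "rb");            /* existence is checked here, at runtime */
  if (!f)
    return -1;
  setvbuf(f, NULL, _IONBF, 0);      /* do not buffer more bytes than needed */
  got = fread(buf, 1, len, f);
  fclose(f);
  return got == len ? 0 : -1;       /* a short read counts as a failure */
}
```

A caller such as ares_init(...) could then propagate the error instead of silently continuing with predictable query IDs.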

Putting It All Together

Combining all these issues, we now know that:

  • c-ares employs an RC4-based PRNG, which is no longer suitable for generating cryptographically secure random numbers.
  • Due to the bug in the key schedule, the resultant random numbers will likely be of worse quality than those of a standard RC4-based PRNG.
  • The fallback case will always generate the same sequence of DNS query IDs.
  • Cross-building c-ares with Autotools and without an explicit --with-random=FILE will result in the fallback case being used, even if /dev/urandom would be available on the target.

This means that DNS query IDs generated by c-ares are not fully random, and in the fallback case they are completely predictable. Consequently, an attacker’s search space for the tuple of source port and DNS query ID is smaller, and an attack is more likely to succeed. It basically brings us closer to the good old days of CVE-2008-1447 where source port randomization was not used by default. :-)

After reporting this issue, the c-ares maintainers published v1.19.1, which rectifies this problem (fix commit) and other recent vulnerabilities. So, better be sure to update c-ares to the latest version!


  1. RFC 5452 lists all measures DNS implementations should take to be more resilient against response forgery. ↩︎

  2. Of course we also checked if the c-ares code base does anything special with respect to selecting the source port. ↩︎

  3. Shout out to DragonFlyBSD who also managed this just recently in 2023! ;-) ↩︎