David

Reasoning about null-terminated strings in C/C++

I described above how eCv supports reasoning about array access, by allowing you to refer to the bounds of an array in a specification. If the code itself needs to know the size of an array, then the size is provided by other means, for example by passing it as an extra parameter. However, when using arrays of characters, standard practice in C is not to pass the number of elements, but to use a null character to indicate the end. Indeed, C and C++ string literals are compiled to static arrays of characters with a terminating null; and many standard C library functions assume that their string arguments are null-terminated.

There are dangers here. Consider the following code:

#include "string.h"
void foo(const char* msg)
{ char buf[20];
     strncpy(buf, msg, 25);
     ...
     ... strlen(buf)...
}

The programmer thinks he has protected against overflowing buf by using strncpy instead of strcpy. However, a buffer size of 25 has been passed to strncpy, when the buffer is only 20 characters long. Maybe the buffer used to be 25 characters but was subsequently changed to 20. Using “magic numbers” like 20 and 25 in code is bad practice. We should instead declare const int fooBuflen = 20; and then use fooBuflen wherever we need the length of the buffer, thereby avoiding this kind of error.

Even when we do correct this error, there is another problem. strncpy doesn’t guarantee to include a null terminator when writing to the destination. So the subsequent strlen call can end up reading well beyond the end of buf, resulting in undefined behaviour.

Here’s the same example annotated for eCv:

#include "ecv.h"
#include "ecv_string.h"
#include "string.h"
void foo(const char* array msg)
pre(isNullTerminated(msg))
{ char buf[20];
     strncpy(buf, msg, 25);
     ...
     ... strlen(buf)...
}

As well as specifying that msg is an array, I’ve included an extra header file ecv_string.h and added precondition isNullTerminated(msg). Here’s part of file ecv_string.h:

ghost bool isNullTerminated(const char* array src)
post(result == (exists i in 0..src.upb :- src[i] == (char)0));

We’ve provided the function isNullTerminated, so that in specifications we can tell whether a char* array pointer addresses a null-terminated string. It’s declared ghost to indicate that it has no code, so it can only be called from specifications. Its postcondition states that it returns true if and only if there is some index in the range 0 to the upper bound of src such that the corresponding element of src is a null character. That’s what the exists expression means (the “:-” symbol can be read as “such that”).

Here’s another extract from ecv_string.h:

spec size_t strlen(const char* array src)
pre(isNullTerminated(src))
post(result < src.lim)
post(src[result] == (char)0)
post(forall i in 0..(result - 1) :- src[i] != (char)0);

spec char* array strncpy(char* array dst, const char* array src, size_t len)
pre(isNullTerminated(src))
pre(len <= dst.lim)
post(forall i in 0..(len - 1)
       :- dst[i] == (i <= strlen(src) ? src[i] : old dst[i]))
post(result == dst);

These provide eCv with specifications of the standard library functions strlen and strncpy. The spec keyword tells eCv that this is the specification for a function declared elsewhere in un-annotated C/C++, so that it doesn’t complain about conflicting declarations. We’ve specified in preconditions that both functions require null-terminated source parameters, and that the destination parameter of strncpy must be an array of at least len bytes. We’ve also provided postconditions to described the effect of calling these functions, so that eCv can reason about the program state after they are called. The forall expression is similar to the exists expression described earlier, except that it returns true if the condition after the “:-” symbol is true for all the values in the specified range.

When we run eCv over the annotated example, it reports two verification problems as expected:

c:\escher\ecvtest\ecvtest6.c (15,5): Warning! Unable to prove:
     Precondition of ‘strncpy’ satisfied
        (defined at c:\escher\ecvtest\ecv_string.h (16,14))
        (see c:\escher\ecvtest\ecvtest6_unproven.htm#17),
	    cannot prove: 25 <= char* array {buf$uninitialised$}.lim().

c:\escher\ecvtest\ecvtest6.c (16,18): Warning! Unable to prove:
     Precondition of ‘strlen’ satisfied
       (defined at c:\escher\ecvtest\ecv_string.h (8,10))
       (see c:\escher\ecvtest\ecvtest6_unproven.htm#18),
	   cannot prove: isNullTerminated(char const* array {buf$15,5$}).

If the expressions in the warning messages look strange, that’s because when I wrote this in February 2010 Escher C Verifier was in a pre-release stage, and the expressions were being produced out in a mixture of C and Perfect syntax. We did tidy these up before we released eCv in 2011!

ARTICLES   List of Articles     TOPTOP