Escher Technologies Articles on Formal Verification>

Verifying pointer arithmetic

Today I’ll look at whether code that uses pointer arithmetic is any harder to verify than equivalent code that does not use pointer arithmetic.

Consider this function for copying an array (or part of an array) into another array (or part of another array):

void arrayCopy(const int* src, int* dst, size_t num) {
  size_t i;
  for (i = 0; i < num; ++i) {
    dst[i] = src[i];
  }
}

To verify it, we first need to specify what it is supposed to achieve in a postcondition and add any necessary preconditions. Then we need to add a loop invariant so that eCv can verify that the loop achieves the desired state when it terminates, and a loop variant so that eCv can prove that the loop terminates:

void arrayCopy(const int* array src, int* array dst, size_t num)
writes(dst.all)
pre(src.lim >= num; dst.lim >= num)
pre(disjoint(src.all, dst.all))
post(forall i in 0..(num - 1) :- dst[i] == src[i])
{ size_t i;
  for (i = 0; i < num; ++i)
  keep(i <= num)
  keep(forall j in 0..(i - 1) :- dst[j] == src[j])
  decrease(num - i)
  { dst[i] = src[i];
  }
}

This is fairly straightforward. As usual, the main loop invariant (the keep(forall…) clause) is a generalization of the desired postcondition, and we need another invariant (the keep(i <= num) clause) to ensure that the main invariant and the body don’t violate the preconditions of the indexing operator. These invariants and the loop variant (the decrease(…) clause) are sufficient for eCv to verify this function.

Many C programmers would write the body as *dst++ = *src++; instead of explicitly indexing dst and src. If we make this change, then because we are changing src and dst with in the loop, we need to describe their values in the invariant. Here is how we can do that:

void arrayCopy(const int* array src, int* array dst, size_t num)
writes(dst.all)
pre(src.lim >= num; dst.lim >= num)
pre(disjoint(src.all, dst.all))
post(forall i in 0..(num - 1) :- dst[i] == src[i])
{ size_t i;
  for (i = 0; i < num; ++i)
  keep(i <= num)
  keep(forall j in 0..(i - 1) :- (old dst)[j] == (old src)[j])
  keep(src == (old src) + i; dst == (old dst) + i)
  decrease(num - i)
  { *dst++ = *src++;
  }
}

Applying the keyword old to an expression in a loop invariant or loop variant yields the value of that expression just before the first iteration of the loop. I’ve had to replace src in the second loop invariant by old src and similarly for dst, because in that invariant I am referring to array elements computed relative to the original values of src and dst. I’ve described the modification to src and dst by adding a third invariant, that says that at the start and end of any iteration, src has its original value plus i, and similarly for dst. The post-increment operators applied to src and dst in the body ensure that these relationships are maintained. Once again, the function is verifiable by eCv.

What if we go further and get rid of the loop counter i as well? We can do that if we precompute the end pointer src + num at the start, and then iterate until src reaches this point. We still need to describe the values of src and dst in the invariant, but we must do so without the benefit of i. Here’s one way of doing it:

void arrayCopy(const int* array src, int* array dst, size_t num)
writes(dst.all)
pre(src.lim >= num; dst.lim >= num)
pre(disjoint(src.all, dst.all))
post(forall i in 0..(num - 1) :- dst[i] == src[i])
{ const int* array const srcEnd = src + num;
  while(src != srcEnd)
  keep(src.base == old(src.base))
  keep(0 <= src - (old src))
  keep(src - (old src) <= n) 
  keep(forall j in 0..((src - (old src)) - 1) :- (old dst)[j] == (old src)[j]) 
  keep(dst == (old dst) + (src - (old src))) 
  decrease(srcEnd - src)
  { *dst++ = *src++;
  }
}

Here’s how I arrived at the new loop annotations:

I’ve taken the three keep-clauses from the previous version and substituted (src – (old src)) for the loop counter i. That’s how I got the 3rd, 4th and 5th keep-clauses. For the last of the three, I also removed the resulting tautology src == (old src) + (src – (old src)), leaving just the component that talks about dst.
The original first invariant keep(i <= n) placed an upper bound on i, thereby ensuring that src[i] and dst[i] in the original code or *src and *dst in the modified code were in bounds. I didn’t need to put a lower bound on i because its type was size_t which is an unsigned type, so its lower bound was zero implicitly. But I’ve now replaced i by (src – (old src)), which has no such constraint. So I need to put a zero lower bound on this expression, which is what the second keep-clause is for.
Unfortunately, this isn’t quite enough. The expression src – old src uses the pointer-difference operator, which has the precondition that the operands are pointers into the same array. In general, when we re-assign an array pointer such as src, there is no requirement that it points into the same array as before. In this case, all we ever do to src is increment it; so it obviously continues to point into the same array as long as it is not over-incremented (which eCv verifies). However, eCv isn’t yet clever enough to spot this and generate an implicit loop invariant. So I’ve added the first invariant src.base == old(src.base) right at the beginning of the loop. In the expression src.base, base is a ghost field of src, just like lim and lwb. It subtracts src.lwb from src, yielding a pointer to the very start of the array that src points into. So to state that two array pointers point into the same array, I just need to say that their bases are equal.
Rather than just substitute for i in the loop variant, I’ve changed it to say that I expect the difference between srcEnd and src to decrease on each iteration.

I hope this example demonstrates to you that verification of C code that uses pointer arithmetic can be done, although it is likely to require more loop invariants than equivalent code that does not use pointer arithmetic, and getting them right may be a little harder. The number of verification conditions to be proved may be higher too – there were 24 from the original code, 32 from the second version, and 35 from the last version. So indexing should generally be preferred to explicit pointer arithmetic – as recommended by MISRA-C 2004 rule 17.4 – when writing verifiable code.

Finally, we could implement this function using memcpy:

void arrayCopy(const int* array src, int* array dst, size_t num)
writes(dst.all)
pre(src.lim >= num; dst.lim >= num)
pre(disjoint(src.all, dst.all))
pre(num * sizeof(int) <= maxof(size_t))
post(forall i in 0..(num - 1) :- dst[i] == src[i])
{ memcpy(dst, src, num * sizeof(int));
}

I’ve had to add another precondition to ensure that the computation of the number of bytes num * sizeof(int) doesn’t overflow a size_t. If eCv were to assume that the number of bytes in any array cannot exceed maxof(size_t) – as is the case for many implementations – then it could infer that dst.lim <= maxof(size_t)/sizeof(int) and we would not need the precondition. Unfortunately, the C standard makes no such promise.

The call to memcpy involves implicit conversions of its first two parameters from int* to void* and from const int* to const void*. In general, pointer conversions should be avoided when writing for eCv, because such conversions break eCv’s strong type model. However, eCv knows enough about the semantics of memcpy to prove the 10 verification conditions generated by this version too.

List of Articles

TOP