Binary Search#
Binary search finds a target value in a sorted array by repeatedly halving the search range. It’s one of the first algorithms where the loop invariant does real work — without it, there’s no way to see why the algorithm is correct.
2.1 A First Attempt (No Spec)#
Here’s the basic algorithm without any specification:
method BinarySearch(a: array<int>, target: int) returns (index: int)
{
var lo := 0;
var hi := a.Length - 1;
while lo <= hi
{
var mid := (lo + hi) / 2;
if a[mid] == target {
return mid;
} else if a[mid] < target {
lo := mid + 1;
} else {
hi := mid - 1;
}
}
return -1;
}
This compiles and runs fine. But Dafny can’t verify anything about it — we haven’t told it what the method is supposed to do, and we haven’t told it why the loop makes progress toward that goal. Right now this is just code with no guarantees.
2.2 What Should the Spec Say?#
Before writing invariants, we need to decide what BinarySearch promises to its caller. There are two cases:
- Found: If we return a valid index, then
a[index] == target. - Not found: If we return
-1, then the target doesn’t appear anywhere in the array.
In Dafny:
method BinarySearch(a: array<int>, target: int) returns (index: int)
ensures 0 <= index < a.Length ==> a[index] == target
ensures index == -1 ==> forall k :: 0 <= k < a.Length ==> a[k] != target
But there’s something missing. Try to verify this — Dafny will reject it, even if we add the right invariants. The reason: the algorithm assumes the array is sorted. When a[mid] < target, we discard everything to the left of mid — but that’s only valid if we know those elements are also smaller than target. Without sortedness, that reasoning is wrong.
So we need a precondition:
requires forall i, j :: 0 <= i < j < a.Length ==> a[i] <= a[j]
This says: for every pair of indices \(i < j\), we have \(a[i] \leq a[j]\). That’s the definition of a sorted (non-decreasing) array.
2.3 The Loop Invariant#
Now the hard part. We need a loop invariant that:
- Is true when the loop starts.
- Is preserved by each iteration.
- Implies the postconditions when the loop ends.
The key insight is: at every point in the algorithm, everything outside the range lo..hi has been ruled out. We’ve already checked those elements (directly or via sortedness) and they can’t be the target.
We can state this as two invariants:
invariant forall k :: 0 <= k < lo ==> a[k] != target
invariant forall k :: hi < k < a.Length ==> a[k] != target
The first says: everything to the left of lo isn’t the target. The second says: everything to the right of hi isn’t the target.
Let’s check the three requirements:
True at the start. Initially lo == 0 and hi == a.Length - 1. The range 0 <= k < 0 is empty, and the range a.Length - 1 < k < a.Length is also empty. Both invariants hold vacuously.
Preserved by each iteration. Consider the case where a[mid] < target. Because the array is sorted and a[mid] < target, every element at index <= mid is also <= a[mid] < target. So none of them are the target. Setting lo := mid + 1 is safe — the first invariant still holds with the new, larger lo. The case where a[mid] > target is symmetric: everything at index >= mid is too large, so setting hi := mid - 1 preserves the second invariant.
Implies the postconditions. When the loop exits, lo > hi. The first invariant covers indices 0 through lo - 1. The second covers indices hi + 1 through a.Length - 1. Since lo > hi, these two ranges cover every index. So the target isn’t anywhere in the array, and returning -1 is correct.
We also need bounds invariants to keep Dafny happy:
invariant 0 <= lo <= a.Length
invariant -1 <= hi < a.Length
These ensure lo and hi stay in range so the forall invariants make sense.
2.4 The Complete Verified Method#
method BinarySearch(a: array<int>, target: int) returns (index: int)
requires forall i, j :: 0 <= i < j < a.Length ==> a[i] <= a[j]
ensures 0 <= index < a.Length ==> a[index] == target
ensures index == -1 ==> forall k :: 0 <= k < a.Length ==> a[k] != target
{
var lo := 0;
var hi := a.Length - 1;
while lo <= hi
invariant 0 <= lo <= a.Length
invariant -1 <= hi < a.Length
invariant forall k :: 0 <= k < lo ==> a[k] != target
invariant forall k :: hi < k < a.Length ==> a[k] != target
{
var mid := (lo + hi) / 2;
if a[mid] == target {
return mid;
} else if a[mid] < target {
lo := mid + 1;
} else {
hi := mid - 1;
}
}
return -1;
}
Notice the structure: the invariants are the postconditions, restricted to the current search range. This is the same pattern we saw with Max in Chapter 1 — the invariant says “we’ve established the postcondition for the portion of the array we’ve examined so far.” The loop widens that portion until it covers everything.
2.5 Exercises#
Exercise 1
What goes wrong without the sortedness precondition?
Consider the array [5, 1, 3, 7, 2] and target = 1. The middle element is a[2] = 3 > 1, so binary search discards the right half and searches [5, 1]. Then the middle element is a[0] = 5 > 1, so it discards the right half again and searches [5]. It returns -1 — but 1 is in the array at index 1.
The invariant forall k :: hi < k < a.Length ==> a[k] != target breaks on the very first iteration: we set hi := 1, but a[4] = 2 != target only by coincidence — the reasoning “everything right of mid is >= a[mid] > target” doesn’t hold without sortedness.
Exercise 2
Modify BinarySearch to return the leftmost occurrence of target (i.e., the smallest index i such that a[i] == target). How do the invariants change?
Hint: When a[mid] == target, don’t return immediately. Instead, set hi := mid - 1 to keep searching the left half. You’ll need a variable to track the best candidate found so far.
method BinarySearchLeft(a: array<int>, target: int) returns (index: int)
requires forall i, j :: 0 <= i < j < a.Length ==> a[i] <= a[j]
ensures 0 <= index < a.Length ==>
a[index] == target && forall k :: 0 <= k < index ==> a[k] != target
ensures index == -1 ==> forall k :: 0 <= k < a.Length ==> a[k] != target
{
var lo := 0;
var hi := a.Length - 1;
var best := -1;
while lo <= hi
invariant 0 <= lo <= a.Length
invariant -1 <= hi < a.Length
invariant forall k :: 0 <= k < lo ==> a[k] != target
invariant forall k :: hi < k < a.Length ==> a[k] != target
invariant best == -1 ==> forall k :: 0 <= k < lo ==> a[k] != target
invariant 0 <= best < a.Length ==> a[best] == target
invariant 0 <= best < a.Length ==> forall k :: 0 <= k < lo ==> a[k] != target
{
var mid := (lo + hi) / 2;
if a[mid] == target {
best := mid;
hi := mid - 1; // keep searching left for an earlier occurrence
} else if a[mid] < target {
lo := mid + 1;
} else {
hi := mid - 1;
}
}
index := best;
}
The key change: when a[mid] == target, we record mid in best but set hi := mid - 1 to continue searching the left half. The invariant forall k :: 0 <= k < lo ==> a[k] != target still holds — we haven’t changed lo. And we’ve saved the found index in best, so we won’t lose it.
When the loop ends, lo > hi, so the invariant on lo covers everything up to best (since lo will have advanced past all indices left of the leftmost occurrence). This guarantees best is the leftmost index where a[best] == target.
Exercise 3
Why do we need two ensures clauses instead of one like ensures index != -1 <==> target in a[..]?
Try writing the spec this way. It’s actually a valid spec — but it’s weaker. It tells you whether the target is present, but in the found case, it doesn’t tell you where. The caller gets back an index but has no guarantee that a[index] == target. Our two-clause spec gives strictly more information.
Exercise 4
Binary search is \(O(\log n)\). Can you express this as a decreases clause in Dafny?
Add decreases hi - lo to the loop:
while lo <= hi
invariant 0 <= lo <= a.Length
invariant -1 <= hi < a.Length
invariant forall k :: 0 <= k < lo ==> a[k] != target
invariant forall k :: hi < k < a.Length ==> a[k] != target
decreases hi - lo
{
var mid := (lo + hi) / 2;
if a[mid] == target {
return mid;
} else if a[mid] < target {
lo := mid + 1;
} else {
hi := mid - 1;
}
}
Why does hi - lo decrease in every branch? In the a[mid] < target branch, lo increases to mid + 1, so hi - lo decreases. In the a[mid] > target branch, hi decreases to mid - 1, so hi - lo decreases. In the a[mid] == target branch, we return immediately, so the loop doesn’t continue.
Note that decreases doesn’t directly express \(O(\log n)\) — it only proves termination. But the connection is there: hi - lo starts at \(n - 1\) and roughly halves each iteration (since mid is the midpoint), so it reaches 0 after about \(\log_2 n\) steps. Dafny doesn’t reason about runtime complexity, but the decreases clause captures the same quantity that the complexity analysis uses.