Binary Search on Sorted Arrays¶

Chris Tralie¶

N objects in the data structure

Unordered array¶

Add: Make a new array (N+1), Copy over old elements and put new one in (N+1) Overall: ~2N + 2 steps

Remove: Find the element (N), Make a new array with (N-1), copy over everything except the one we're trying to remove (N) ~ 3N

Contains: Search through everything (N)

Ordered array¶

Add: Make a new array (N+1), Copy over old elements and put new one in (N+1) Overall: ~2N + 2 steps

Remove: $\log_2(N)$, Make a new array with (N-1), copy over everything except the one we're trying to remove (N) ~ 2N + $\log_2(N)$

Contains: Use binary search $\log_2(N)$

How do we actually do this in code? The code below shows one incorrect implementation that gives an infinite loop, followed by a correct implementation

In [1]:
import numpy as np

Bad Implementation with Infinite Loop!¶

In [2]:
def binarysearchbad(arr, target):
    """
    There's a problem in this version!!
    
    arr: ndarray(N)
        An array of elements that are sorted
    target: float
        Element I'm looking for

    Returns
    -------
    idx: int
        Index where we found the element, or -1 if the
        element isn't there
    """
    lo = 0
    hi = len(arr)-1
    count = 0
    while lo != hi and count < 10: # Keep going until we converge to 1 element
        mid = (lo+hi)//2 # Round to the nearest integer below their average
        ## Problem: hi = lo+1 and arr[lo] = target
        ## mid = (lo + lo + 1)//2 = (2lo + 1)//2 = lo
        ## So we get stuck in an infinite loop!
        ## A good way to debug this is to restrict the number of loops
        ## (using this count variable) and print out the indices where we get stuck
        print(lo, hi, mid)
        if target < arr[mid]:
            hi = mid-1
        else:
            lo = mid
        count += 1
    idx = -1
    if arr[lo] == target:
        idx = lo
    return idx

Good Implementation!¶

In [3]:
def binarysearch(arr, target):
    """
    arr: ndarray(N)
        An array of elements that are sorted
    target: float
        Element I'm looking for

    Returns
    -------
    idx: int
        Index where we found the element, or -1 if the
        element isn't there
        NOTE: This version will always return the leftmost index
        of a contiguous chunk where target occurs multiple times
    """
    lo = 0
    hi = len(arr)-1
    ## Proof of correctness: I'm always getting a smaller interval at every iteration
    ## until we get to the "base case" where lo=hi
    while lo != hi: # Keep going until we converge to 1 element
        mid = lo + (hi-lo)//2 # Round to the nearest integer below their average
        if target > arr[mid]:
            lo = mid + 1
        else:
            hi = mid
    idx = -1
    if arr[lo] == target:
        idx = lo
    return idx

Let's Test It!¶

In [4]:
np.random.seed(0)
arr = np.sort(np.random.randint(0, 100, 70))
print(arr)
for target in range(100):
    idx = binarysearch(arr, target)
    if idx > -1:
        print(target, idx, arr[idx])
    else:
        print(target, idx)
[ 0  0  1  4  5  9  9  9 12 14 17 19 19 20 21 23 25 28 29 31 31 32 32 34
 35 36 36 37 38 39 39 41 42 44 46 47 47 49 53 55 57 57 58 58 64 64 65 65
 65 67 67 69 70 72 74 75 77 79 79 80 81 82 83 87 87 88 88 88 88 99]
0 0 0
1 2 1
2 -1
3 -1
4 3 4
5 4 5
6 -1
7 -1
8 -1
9 5 9
10 -1
11 -1
12 8 12
13 -1
14 9 14
15 -1
16 -1
17 10 17
18 -1
19 11 19
20 13 20
21 14 21
22 -1
23 15 23
24 -1
25 16 25
26 -1
27 -1
28 17 28
29 18 29
30 -1
31 19 31
32 21 32
33 -1
34 23 34
35 24 35
36 25 36
37 27 37
38 28 38
39 29 39
40 -1
41 31 41
42 32 42
43 -1
44 33 44
45 -1
46 34 46
47 35 47
48 -1
49 37 49
50 -1
51 -1
52 -1
53 38 53
54 -1
55 39 55
56 -1
57 40 57
58 42 58
59 -1
60 -1
61 -1
62 -1
63 -1
64 44 64
65 46 65
66 -1
67 49 67
68 -1
69 51 69
70 52 70
71 -1
72 53 72
73 -1
74 54 74
75 55 75
76 -1
77 56 77
78 -1
79 57 79
80 59 80
81 60 81
82 61 82
83 62 83
84 -1
85 -1
86 -1
87 63 87
88 65 88
89 -1
90 -1
91 -1
92 -1
93 -1
94 -1
95 -1
96 -1
97 -1
98 -1
99 69 99
In [ ]: