Binary Search on Sorted Arrays¶
Chris Tralie¶
N objects in the data structure
Unordered array¶
Add: Make a new array (N+1), Copy over old elements and put new one in (N+1) Overall: ~2N + 2 steps
Remove: Find the element (N), Make a new array with (N-1), copy over everything except the one we're trying to remove (N) ~ 3N
Contains: Search through everything (N)
Ordered array¶
Add: Make a new array (N+1), Copy over old elements and put new one in (N+1) Overall: ~2N + 2 steps
Remove: $\log_2(N)$, Make a new array with (N-1), copy over everything except the one we're trying to remove (N) ~ 2N + $\log_2(N)$
Contains: Use binary search $\log_2(N)$
How do we actually do this in code? The code below shows one incorrect implementation that gives an infinite loop, followed by a correct implementation
In [1]:
import numpy as np
Bad Implementation with Infinite Loop!¶
In [2]:
def binarysearchbad(arr, target):
"""
There's a problem in this version!!
arr: ndarray(N)
An array of elements that are sorted
target: float
Element I'm looking for
Returns
-------
idx: int
Index where we found the element, or -1 if the
element isn't there
"""
lo = 0
hi = len(arr)-1
count = 0
while lo != hi and count < 10: # Keep going until we converge to 1 element
mid = (lo+hi)//2 # Round to the nearest integer below their average
## Problem: hi = lo+1 and arr[lo] = target
## mid = (lo + lo + 1)//2 = (2lo + 1)//2 = lo
## So we get stuck in an infinite loop!
## A good way to debug this is to restrict the number of loops
## (using this count variable) and print out the indices where we get stuck
print(lo, hi, mid)
if target < arr[mid]:
hi = mid-1
else:
lo = mid
count += 1
idx = -1
if arr[lo] == target:
idx = lo
return idx
Good Implementation!¶
In [3]:
def binarysearch(arr, target):
"""
arr: ndarray(N)
An array of elements that are sorted
target: float
Element I'm looking for
Returns
-------
idx: int
Index where we found the element, or -1 if the
element isn't there
NOTE: This version will always return the leftmost index
of a contiguous chunk where target occurs multiple times
"""
lo = 0
hi = len(arr)-1
## Proof of correctness: I'm always getting a smaller interval at every iteration
## until we get to the "base case" where lo=hi
while lo != hi: # Keep going until we converge to 1 element
mid = lo + (hi-lo)//2 # Round to the nearest integer below their average
if target > arr[mid]:
lo = mid + 1
else:
hi = mid
idx = -1
if arr[lo] == target:
idx = lo
return idx
Let's Test It!¶
In [4]:
np.random.seed(0)
arr = np.sort(np.random.randint(0, 100, 70))
print(arr)
for target in range(100):
idx = binarysearch(arr, target)
if idx > -1:
print(target, idx, arr[idx])
else:
print(target, idx)
[ 0 0 1 4 5 9 9 9 12 14 17 19 19 20 21 23 25 28 29 31 31 32 32 34 35 36 36 37 38 39 39 41 42 44 46 47 47 49 53 55 57 57 58 58 64 64 65 65 65 67 67 69 70 72 74 75 77 79 79 80 81 82 83 87 87 88 88 88 88 99] 0 0 0 1 2 1 2 -1 3 -1 4 3 4 5 4 5 6 -1 7 -1 8 -1 9 5 9 10 -1 11 -1 12 8 12 13 -1 14 9 14 15 -1 16 -1 17 10 17 18 -1 19 11 19 20 13 20 21 14 21 22 -1 23 15 23 24 -1 25 16 25 26 -1 27 -1 28 17 28 29 18 29 30 -1 31 19 31 32 21 32 33 -1 34 23 34 35 24 35 36 25 36 37 27 37 38 28 38 39 29 39 40 -1 41 31 41 42 32 42 43 -1 44 33 44 45 -1 46 34 46 47 35 47 48 -1 49 37 49 50 -1 51 -1 52 -1 53 38 53 54 -1 55 39 55 56 -1 57 40 57 58 42 58 59 -1 60 -1 61 -1 62 -1 63 -1 64 44 64 65 46 65 66 -1 67 49 67 68 -1 69 51 69 70 52 70 71 -1 72 53 72 73 -1 74 54 74 75 55 75 76 -1 77 56 77 78 -1 79 57 79 80 59 80 81 60 81 82 61 82 83 62 83 84 -1 85 -1 86 -1 87 63 87 88 65 88 89 -1 90 -1 91 -1 92 -1 93 -1 94 -1 95 -1 96 -1 97 -1 98 -1 99 69 99
In [ ]: