
Sorting and Searching

7.2 Complexity Analysis

To evaluate an algorithm, a common approach is to analyze its complexity. That is, we essentially count the number of steps involved in executing the algorithm.

An intuitive explanation of complexity analysis is the following. We caution you that our explanation is clearly an oversimplification, but it suffices for our purposes. Given a certain input size, assuming that to process a single element takes one unit of time, how many units of time are involved in processing n elements of input? This is essentially what complexity analysis attempts to answer. As can be seen, it is unnecessary to determine exactly the computer time involved in each step; instead, we simply determine the number of logical steps that occur in a given algorithm. In reality, we can have families of steps (say, one family is addition and subtraction, the other multiplication and division). We then count how many steps of each family are required.

A simple example is to evaluate the complexity of computing a² + ab + b² for a large number, say n, of pairs of a and b. Computing directly, we need 3n multiplications and 2n additions. However, we can compute the same expression as (a + b)² − ab, which can be done in 2n multiplications and 2n additions (note that we consider addition and subtraction to be in the same family of steps). Thus, the second expression is better than the original. For very large values of n, this may make a significant difference in computation time. This is a very simple example, but it provides a background for our discussions of the complexity of sorting algorithms.
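
To make the operation counting concrete, the following short Ruby sketch (ours, not one of the chapter’s listings; the method names are purely illustrative) computes the expression both ways:

    # Direct form: three multiplications and two additions per pair.
    def direct(a, b)
      a * a + a * b + b * b
    end

    # Rewritten form (a + b)^2 - ab: two multiplications and two
    # addition-family operations (one addition, one subtraction) per pair.
    def rewritten(a, b)
      s = a + b
      s * s - a * b
    end

    direct(3, 4) == rewritten(3, 4)   # => true; both yield 37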

In complexity analysis, we disregard constants; thus, the distinction between n and n − 1 in terms of complexity is nonexistent. Moreover, we often assume that all computations belong to the same family of operations. In terms of our complexity analysis, it also does not matter whether the list shrinks or grows with each pass; for simplicity, assume it shrinks.

Now consider the three comparison-based sorting algorithms presented. For all of them, the outer loop makes n passes, and the portion of the list the inner loop processes shrinks by one with each pass. So the first pass takes n steps, the next n − 1, the next n − 2, and so on. Thus, the number of steps is:

n + (n − 1) + (n − 2) + ... + 3 + 2 = 2 + 3 + ... + (n − 2) + (n − 1) + n

If you add 1 to the rewritten sum, it becomes the well-known arithmetic series 1 + 2 + 3 + ... + n, whose total is n(n + 1)/2. So the total number of steps for these sorts is n(n + 1)/2 − 1.
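
A few lines of Ruby (our own sanity check, not a chapter listing) confirm the closed form for a sample n:

    n = 10
    steps = (2..n).sum                  # n + (n - 1) + ... + 2, summed low to high
    closed_form = n * (n + 1) / 2 - 1
    steps == closed_form                # => true (both are 54)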

Clearly, for any n > 0, n(n + 1)/2 − 1 is less than n²; however, the complexity is still considered roughly n². The official notation is O(n²) and is pronounced “on the order of n squared” or “big oh of n squared.” The reason the complexity is O(n²) is that complexity is only an approximation, and the dominant portion of n(n + 1)/2 − 1 grows as n². For our purposes, if you grasp the concept of using the dominant portion to determine complexity, you are ahead of the game.

In complexity analysis, cn², where c is a constant, is considered O(n²) for any finite c. For actual computations, however, the value of c may be important. Again, refer to a book on complexity theory to understand this important concept in detail. For a reading list on algorithms and complexity, see Appendix A.

As an aside, order computation typically involves best-, average-, and worst-case analyses. For the selection and bubble sort algorithms presented, the best-, average-, and worst-case analyses are the same, since regardless of the initial ordering of the list, the processing is identical. As previously mentioned, there is an implementation of bubble sort that checks whether any swaps occurred during a pass and terminates early if none did. In such a case, the best-case analysis, which occurs when the initial list is already sorted, is O(n).
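
One possible Ruby sketch of such a bubble sort (a simplified stand-in for the implementation mentioned, not the chapter’s listing) follows:

    # Bubble sort with early termination: if a full pass performs no
    # swaps, the list is already sorted and we stop.
    def bubble_sort(list)
      arr = list.dup
      (arr.length - 1).downto(1) do |last|
        swapped = false
        (0...last).each do |i|
          if arr[i] > arr[i + 1]
            arr[i], arr[i + 1] = arr[i + 1], arr[i]
            swapped = true
          end
        end
        break unless swapped   # sorted input: one O(n) pass and we are done
      end
      arr
    end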

Now let’s turn to insertion sort, which is somewhat trickier to analyze. Here we search for the rightmost position at which to insert the value. For an already sorted list, that position is found immediately, and we end up with only n steps! Thus, the best-case analysis for insertion sort is O(n). However, if the list is precisely the opposite of sorted, namely in descending order, we must scan to the end of the sorted portion for each step. Thus, once again, we end up with n(n + 1)/2 − 1 steps, and the worst-case analysis for insertion sort is O(n²). It turns out that the average-case analysis is likewise O(n²).
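
A common formulation of insertion sort in Ruby (again a sketch, not the chapter’s listing) makes both cases visible:

    # Insertion sort: insert each value into the sorted portion to its left.
    def insertion_sort(list)
      arr = list.dup
      (1...arr.length).each do |j|
        value = arr[j]
        i = j - 1
        # On a sorted list this loop exits immediately (O(n) best case);
        # on a descending list it scans the entire sorted portion each
        # time (O(n^2) worst case).
        while i >= 0 && arr[i] > value
          arr[i + 1] = arr[i]
          i -= 1
        end
        arr[i + 1] = value
      end
      arr
    end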

The radix sort works in O(dn), where d is the number of digits that must be processed and n is the number of entries to be sorted. Hence, it should run much faster than the other examples. It should be noted that other algorithms (quicksort, mergesort, and heapsort) all run in O(n log n) time. The radix sort might at first appear to be faster than these, but it depends on how many digits are processed. A 64-bit integer might require processing each bit as a digit; hence, the runtime with d = 64 is O(64n). This might sound good, but n log n time is faster when n < 2⁶⁴. So, comparing the three sorts as presented, the average- and worst-case analyses for each of the comparison-based sorts are O(n²), while the radix sort can vary from linear time (sorting values a single bit long) to an arbitrarily long time, since the number of digits is theoretically unconstrained.
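
To see where the d passes come from, here is a least-significant-digit radix sort sketch in Ruby for nonnegative integers (a simplified stand-in for Example 7-4, not a reproduction of it):

    # LSD radix sort: one O(n) bucket pass per digit, d passes in all,
    # hence O(dn) overall.
    def radix_sort(list, base: 10)
      arr = list.dup
      return arr if arr.empty?
      max, exp = arr.max, 1
      while max / exp > 0
        buckets = Array.new(base) { [] }
        arr.each { |v| buckets[(v / exp) % base] << v }
        arr = buckets.flatten   # stable: preserves order from earlier passes
        exp *= base
      end
      arr
    end

    radix_sort([170, 45, 75, 90, 802, 24, 2, 66])
    # => [2, 24, 45, 66, 75, 90, 170, 802]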

7.3 Searching

Searching a list of names or numbers is another very common computer science task. There are many search algorithms, and the key in developing one is to determine which type of search process matches the particular need. The following two questions (parameters) affect our search algorithm selection:

  • Is the list we are searching sorted or unsorted?

  • Are the searched list elements unique, or are there duplicate values within the list?

For simplicity, we illustrate the search process using only a list of unique elements. That is, our implementation assumes that there are no duplicate values. We then discuss what needs to be modified in the algorithm and the corresponding Ruby implementation to support duplicate values. Given the level of programming sophistication you now possess, we forgo presenting the only slightly modified implementation that supports duplicate values and leave it as an exercise for you. Once again, we revisit the final exam grade example we used in the sections on sorting.

We now discuss two types of searches. The first, used for an unsorted list, is called a linear search; the second, which requires an ordered or sorted list, is called a binary search.
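
As a preview, here are minimal Ruby sketches of both (ours; the implementations discussed in the chapter may differ). Each assumes a list of unique values and returns the index of the match, or nil if the value is missing:

    # Linear search: examine each element in turn; works on any list.
    def linear_search(list, target)
      list.each_with_index do |value, index|
        return index if value == target
      end
      nil
    end

    # Binary search: requires a sorted list; halves the range each step.
    def binary_search(sorted, target)
      low, high = 0, sorted.length - 1
      while low <= high
        mid = (low + high) / 2
        return mid if sorted[mid] == target
        if sorted[mid] < target
          low = mid + 1
        else
          high = mid - 1
        end
      end
      nil
    end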

7.4 Summary

We discussed the various sorting schemes, both comparison-based and non-comparison-based, and the strengths of each. We introduced the field of complexity analysis, a tool used to quantify the performance of an algorithm. Finally, we discussed searching and provided a real-world example of searching techniques.

Specifically, we described four elementary sorting algorithms. The first three presented (selection, insertion, and bubble sort) are comparison based. That is, elements are compared against one another to determine a sorted ordering. The fourth algorithm, radix sort, differs substantially. Instead of comparing elements, radix sort orders the elements according to their representation, starting from the rightmost digit. Once all digits of the element representation are processed, the list is sorted. The complexity of these sorting algorithms is presented in Table 7-1, where n represents the number of elements in the list and k represents the number of digits needed to represent the largest element. The best-case scenario occurs when the original list is already sorted; the worst-case scenario occurs when the original list is in reverse order; and the average-case scenario represents an original random ordering.

Table 7-1. Sort algorithm complexity summary

                   Best case    Average case    Worst case
  Selection sort   O(n²)        O(n²)           O(n²)
  Insertion sort   O(n)         O(n²)           O(n²)
  Bubble sort      O(n)         O(n²)           O(n²)
  Radix sort       O(kn)        O(kn)           O(kn)

Likewise, we presented two searching algorithms: linear and binary search. Linear search can be used to search any list, whereas binary search requires a sorted list. The complexity of these searching algorithms is presented in Table 7-2, where n represents the number of elements in the list searched. The best-case scenario occurs when the first element examined is the element sought; the worst-case scenario occurs when the sought-after element is missing; and the average-case scenario represents a random list.

Table 7-2. Search algorithm complexity summary

                  Best case    Average case    Worst case
  Linear search   O(1)         O(n)            O(n)
  Binary search   O(1)         O(log n)        O(log n)

7.4.1 Key Concepts

  • Sorting is a common problem that occurs in many places in computer science. We focus primarily on comparison-based sorting, where we simply compare the items to determine the order. Radix sort sorts numbers without directly comparing them.

  • Searching can be done naively by linearly searching through a list, but if the list is sorted, we can take advantage of binary search to improve performance.

  • When discussing algorithm performance, computer scientists use complexity analysis.

7.4.2 Key Definitions

  • Comparison-based sort: A sorting method that relies on directly comparing the elements.

  • Complexity analysis: A mathematical way of analyzing an algorithm’s performance.

7.5 Exercises

  1. Radix sort, as presented, works for integers. Modify the algorithm in Example 7-4 to sort English names.

  2. For each input sequence provided in the following list, state which presented comparison-based sort or sorts would require the fewest steps. Explain why.

    1. 5 2 4 3 1

    2. 1 2 3 4 5

    3. 5 4 3 2 1

  3. You are provided a lengthy unsorted list and told to search it.

    1. Which search algorithm would you use?

    2. If you were told that you will need to search the list many times, would your search strategy change? If so, how?

    3. At which point would you change your approach if you were to change it?

  4. The complexity of the comparison-based sorting algorithms presented, in the average case, is O(n²). Design a comparison-based sorting algorithm with a lower complexity. What is the underlying premise that lowers its complexity?

  5. Generate a 100-element list containing integers in the range 0–10. Sort the list with selection sort and with radix sort.

    1. Which is faster? Why?

    2. Try this again, but with 10,000 elements. Note the relative difference. Why does it exist?
