Tuesday, 1 April 2014

Week 11: Sorting and Efficiency

In this final week, we will review the concepts of efficiency that were introduced in the previous post, Preliminary Algorithm Analysis, and apply them to sorting.  Assume that we have a list of prime numbers in no particular order:

[3, 5, 11, 7, 13, 2, 17]

Our goal is to rearrange the numbers in ascending order.  For us humans, the list above takes little time to sort.  However, what if we have to sort an unsorted list of a million numbers?  We can have a computer do it for us with one of the sorting algorithms: bubble, selection, insertion, merge and quick sort.  Perhaps you are wondering why we have so many sorting algorithms.  The answer is that every one of them has a unique sorting strategy.  However, in the world of efficiency, not every sorting algorithm is created equal.  We can compare the sorting algorithms to see their efficiencies, but first, some review.

In the previous post, we discussed how an algorithm's efficiency is determined by its memory usage and its time complexity.  Time complexity describes how the number of steps an algorithm takes grows as the size of its input grows.  We also became acquainted with Big-O, the notation that represents time complexity.  For example, an algorithm whose number of steps grows proportionally to n log n, where n is the size of the input, has a time complexity of O(n log n).  Additionally, we deliberated on how algorithms vary in efficiency depending on the input and how this leads to an algorithm's best, average and worst-case complexities, which correspond to its maximum, average and minimum efficiencies.
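
To get a feel for what these growth rates mean in practice, here is a quick back-of-the-envelope sketch in Python.  The step counts are rough models of the growth rates themselves, not measurements of any particular implementation:

import math

# Rough, order-of-magnitude step counts for sorting a million numbers.
# These model only the growth rates; real constants vary by algorithm.
n = 10 ** 6
print("O(n):       about", n, "steps")
print("O(n log n): about", round(n * math.log(n, 2)), "steps")  # roughly 20 million
print("O(n ** 2):  about", n ** 2, "steps")                     # a trillion!

With these tools, we can now compare the algorithms: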

Bubble, Selection and Insertion Sort

Similarities in Characteristics

Brute Force Algorithms:  Let n be the number of elements in the list.  Bubble, selection and insertion sort tackle the problem of unsorted lists head-on by performing up to n outer iterations over the list and fewer than n inner iterations for every outer iteration.  (A minimal Python sketch of all three algorithms follows this table.)

Average and Worst-Case Complexities:  Bubble, insertion and selection sort perform at average efficiency if the input is a randomized list; they perform at minimum efficiency if the input is a list sorted in reverse.  In both cases, they perform outer iterations over the list and inner iterations for every outer iteration, which gives them average and worst-case complexities of O(n ** 2).

Best-Case Complexities:  Bubble and insertion sort perform at maximum efficiency if the input is an already sorted list.  A single pass with no swaps or shifts suffices, so their best-case complexity is O(n).

Differences in Characteristics

Bubble Sort

Outer Iterations:  Unlike selection and insertion sort, bubble sort does not traverse the list in its outer iterations.  Instead, the outer loop acts as a brake: it repeats until a full pass makes no swaps, which means the list is sorted.

Inner Iterations:  It is here that bubble sort traverses the list, swapping the current element with the next element whenever the two are out of order.

Too Many Swaps:  A single swap rarely moves an element into its final position, so bubble sort usually has to make many additional swaps.

Selection Sort

Outer Iterations:  Selection sort swaps the current element with the minimum element of the unsorted remainder of the list.

Inner Iterations:  It is here that selection sort finds that minimum element, by iterating through the sublist to the right of the current element.

Fewer Swaps:  Every swap of the current and minimum elements guarantees that the minimum element lands in its correct, final position.

Best-Case Complexity:  Selection sort performs at maximum efficiency if the input is an already sorted list.  However, the design of the algorithm prevents it from terminating early even when the list is sorted, so its best-case complexity is still O(n ** 2).

Insertion Sort

Outer Iterations:  Insertion sort checks whether the current element is greater than the previous element.  If it is not, insertion sort has to determine the current element's correct position.

Inner Iterations:  It is here that insertion sort determines the correct position of the current element, by iterating through the sorted sublist to the left of it.

Inserts:  Unlike bubble and selection sort, insertion sort does not swap the positions of two elements.  Instead, it inserts the current element into the correct position found by the inner iterations.
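
To make the comparison concrete, here is the minimal Python sketch promised in the table.  These are my own illustrative versions, written to match the descriptions above rather than taken from any official source; each one sorts the list in place:

def bubble_sort(lst):
    """Swap adjacent out-of-order elements until a pass makes no swaps."""
    swapped = True
    while swapped:                          # outer loop: the "brake" that stops once sorted
        swapped = False
        for i in range(len(lst) - 1):       # inner loop: traverse the list
            if lst[i] > lst[i + 1]:
                lst[i], lst[i + 1] = lst[i + 1], lst[i]
                swapped = True

def selection_sort(lst):
    """Swap each position with the minimum of the sublist to its right."""
    for i in range(len(lst)):               # outer loop: traverse the list
        min_index = i
        for j in range(i + 1, len(lst)):    # inner loop: find the minimum
            if lst[j] < lst[min_index]:
                min_index = j
        lst[i], lst[min_index] = lst[min_index], lst[i]

def insertion_sort(lst):
    """Insert each element into the sorted sublist to its left."""
    for i in range(1, len(lst)):            # outer loop: traverse the list
        current = lst[i]
        j = i - 1
        while j >= 0 and lst[j] > current:  # inner loop: find the correct position
            lst[j + 1] = lst[j]             # shift the larger element right
            j -= 1
        lst[j + 1] = current

Any of the three, applied to our list of primes [3, 5, 11, 7, 13, 2, 17], rearranges it into [2, 3, 5, 7, 11, 13, 17].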


You are probably wondering why I left out merge and quick sort.  Merge and quick sort are too dissimilar to bubble, selection and insertion sort to share a table.  It becomes clear why when we compare merge and quick sort in a new table:

Merge and Quick Sort

Similarities in Characteristics

Divide and Conquer Algorithms:  Merge and quick sort tackle the problem of unsorted lists by recursively partitioning the list until the sublists have length one.  (Python sketches of both algorithms follow this table.)

Best and Average-Case Complexities:  Let n be the number of elements in the list.  The number of levels of partitioning is usually log n, or close to it.  At every level, they traverse the list to compare the elements and rearrange them.  This gives them best and average-case complexities of O(n log n).

Differences in Characteristics

Merge Sort

Partitioning of the List:  Merge sort partitions the list down the middle into equal or nearly equal halves.

Rearranging of Elements:  Unlike quick sort, merge sort rearranges the elements after the partitioning is complete.  It works by comparing the elements of one sorted sublist with the elements of the other sorted sublist and combining the two into a single sorted list.

Worst-Case Complexity:  Merge sort performs at minimum efficiency if the input forces it to make the maximum number of comparisons.  However, the number of levels of partitioning is still at or close to log n, so its worst-case complexity is still O(n log n).

Quick Sort

Partitioning of the List:  Quick sort partitions the list by placing every element that is less than the first element, called the pivot, into a left sublist and every element that is greater than the pivot into a right sublist.  The left and right sublists are usually of unequal lengths, but an in-place quick sort requires less memory than merge sort, so quick sort is often faster in practice on large inputs.

Rearranging of Elements:  Unlike merge sort, quick sort rearranges the elements with every partition, so that by the time it has partitioned the list into one-element sublists, the concatenation of the sublists is the sorted list.

Worst-Case Complexity:  Quick sort performs at minimum efficiency if the input is a sorted list or a list sorted in reverse.  Either input causes every partition to place all of the remaining elements into one sublist, leaving the other empty, so quick sort performs n levels of partitioning.  Since it compares up to n elements at every level, its worst-case complexity is O(n ** 2).
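
And here are the Python sketches of the two divide and conquer algorithms promised in the table, again illustrative versions of my own.  Each returns a new sorted list rather than sorting in place; note that this quick sort copies its sublists for clarity, whereas the in-place variant is what gives quick sort its memory advantage over merge sort:

def merge_sort(lst):
    """Split the list down the middle, sort each half, then merge the halves."""
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])
    right = merge_sort(lst[mid:])
    # Merge: repeatedly take the smaller front element of the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])     # one of these two extends is empty;
    merged.extend(right[j:])    # the other holds the sorted remainder
    return merged

def quick_sort(lst):
    """Partition around the first element (the pivot), then sort each side."""
    if len(lst) <= 1:
        return lst
    pivot = lst[0]
    left = [x for x in lst[1:] if x < pivot]    # everything less than the pivot
    right = [x for x in lst[1:] if x >= pivot]  # everything else
    return quick_sort(left) + [pivot] + quick_sort(right)

Calling merge_sort([3, 5, 11, 7, 13, 2, 17]) or quick_sort([3, 5, 11, 7, 13, 2, 17]) returns the sorted list [2, 3, 5, 7, 11, 13, 17].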

So those are the five sorting algorithms.  Bubble, selection and insertion sort are brute force sorting algorithms; they attack the problem of unsorted lists head-on with multiple traversals, without any thought about how inefficient they can be.  In contrast, merge and quick sort are divide and conquer sorting algorithms; they divide the problem of unsorted lists into smaller unsorted lists before conquering them.  To help with my general understanding of the sorting algorithms, I like to use an analogy from Roman times.  You should try this too.  When I think of bubble, selection and insertion sort iterating through lists, I think of the barbarian tribes attacking the ailing Roman Empire head-on without any thought about the destruction they may cause.  When I think of merge and quick sort partitioning lists, I think of the Roman general and politician Julius Caesar dividing the Gallic tribes before conquering them one by one in the Gallic Wars.  Perhaps you find this strange.  If so, then how do you represent the sorting algorithms in your mind?

Barbarians After Sacking Rome (410 CE)


Julius Caesar, the Divider and Conqueror of the Gauls (100 - 44 BCE)

7 comments:

  1. This comment has been removed by the author.

  2. In my opinion, because bubble sort, insertion sort and selection sort all have the same worst-case complexity of O(n^2), they have the same efficiency. I think efficiency is essentially the same as time complexity. What do you think?

  3. I am afraid I disagree, Lucas. I believe that efficiency is determined by both memory usage and time complexity, so time complexity is not equivalent to efficiency. Additionally, although bubble, insertion and selection sort have the same worst-case time complexity, bubble sort still tops the three as the least efficient. I believe that the number of iterations bubble sort performs is still greater, even though the tightest bound is still O(n^2).

  4. See Lucas' comments section in the post Sorting and Efficiency for the continuation of this argument:

    http://tianxia2014.blogspot.ca/2014/04/week-10-sorting-and-efficiency.html

  5. Time complexity definitely varies for each algorithm, but that depends mostly on how sorted the numbers already are and how big the input is.
    I think that in order to fully appreciate the efficiency of any of the above algorithms, we have to make a lot of assumptions about the properties of the input data - which are not always representative of reality.

  6. Hmm, you are right. The best and worst cases are very rare... Well, for academic purposes, I think that these assumptions are the only way to analyze these algorithms under extreme inputs (which, as you say, are not representative of reality).

  7. Whoa! I just came across this slog looking for something interesting and different... and that is some amazing chart you have there. I agree with the above posts... just like with many other disciplines of science (e.g. physics), many assumptions have to be made about the "world" in order to learn anything about the behaviour of things.
