Friday, April 4, 2014

Sort and Efficiency

      At the end of this semester, we focused on several sorting methods, such as quicksort, merge sort, and selection sort, and on their efficiency in different kinds of cases, which is usually described in Big-O notation.

      So what is Big-O notation? Big-O notation is used to describe the limiting behavior of a function, or in other words, it gives an upper bound on a function's growth. For two real-valued functions f(x) and g(x), f(x) = O(g(x)) as x goes to infinity if and only if there is a positive real number M such that f(x) <= M*g(x) for all sufficiently large x. For example, f(x) = 4x^2 + 6x + 7 belongs to O(g(x)) where g(x) = x^2, since f(x) <= 5x^2 once x is large enough.
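
      To see that the constant M = 5 really works here, we can check the inequality directly (the cutoff x >= 7 below is just one convenient witness, not the smallest possible):

    4x^2 + 6x + 7 <= 5x^2
        <=>  x^2 - 6x - 7 >= 0
        <=>  (x - 7)(x + 1) >= 0

      The last inequality holds for every x >= 7, so taking M = 5 and requiring the bound only for sufficiently large x satisfies the definition.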

      Why is studying the efficiency of sorting methods, and especially their worst case, important? Because in the real world, an inefficient sorting method may take so long on a large input that the result is useless by the time it arrives. Hence, analyzing the efficiency of a program before applying it is crucial.
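
      As a rough illustration, here is a minimal Python sketch (my own toy comparison, not a benchmark; exact timings depend on the machine) that pits a quadratic selection sort against Python's built-in sorted(), which is Timsort and runs in O(n log n):

    import random
    import time

    def selection_sort(lst):
        """Selection sort: O(n^2) comparisons in every case."""
        lst = lst[:]  # sort a copy so the input is untouched
        for i in range(len(lst)):
            m = min(range(i, len(lst)), key=lst.__getitem__)
            lst[i], lst[m] = lst[m], lst[i]
        return lst

    for n in (1000, 2000, 4000):
        data = [random.random() for _ in range(n)]
        t0 = time.perf_counter()
        selection_sort(data)
        t1 = time.perf_counter()
        sorted(data)  # built-in Timsort: O(n log n) worst case
        t2 = time.perf_counter()
        print(n, 'selection: %.3fs' % (t1 - t0), 'timsort: %.5fs' % (t2 - t1))

      Doubling n roughly quadruples the selection sort's running time, while the n log n sort barely slows down; that gap is exactly what the Big-O entries in the table below summarize.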

      Here is a table, adapted from Wikipedia, summarizing the Big-O complexity of various sorting methods:
     
Name | Best | Average | Worst | Memory | Stable | Method | Other notes
---- | ---- | ------- | ----- | ------ | ------ | ------ | -----------
Quicksort | n log n | n log n | n^2 | log n on average, n worst case; Sedgewick variation is log n worst case | Typical in-place version is not stable; stable versions exist | Partitioning | Usually done in place with O(log n) stack space. Most implementations are unstable, as stable in-place partitioning is more complex; naive variants use an O(n) extra array to store the partition. The three-way ("fat") partitioning variant takes O(n) comparisons when sorting an array of equal keys.
Merge sort | n log n | n log n | n log n | n worst case | Yes | Merging | Highly parallelizable (up to O(log n) using the Three Hungarians' Algorithm or, more practically, Cole's parallel merge sort) for processing large amounts of data.
In-place merge sort | - | - | n log^2 n | 1 | Yes | Merging | Can be implemented as a stable sort based on stable in-place merging.
Heapsort | n log n | n log n | n log n | 1 | No | Selection | -
Insertion sort | n | n^2 | n^2 | 1 | Yes | Insertion | O(n + d), where d is the number of inversions.
Introsort | n log n | n log n | n log n | log n | No | Partitioning & Selection | Used in several STL implementations.
Selection sort | n^2 | n^2 | n^2 | 1 | No | Selection | Stable with O(n) extra space, for example using lists.
Timsort | n | n log n | n log n | n | Yes | Insertion & Merging | Makes n comparisons when the data is already sorted or reverse sorted.
Shell sort | n | n log^2 n or n^(3/2) | Depends on gap sequence; best known is n log^2 n | 1 | No | Insertion | Small code size, no use of call stack, reasonably fast; useful where memory is at a premium, such as embedded and older mainframe applications.
Bubble sort | n | n^2 | n^2 | 1 | Yes | Exchanging | Tiny code size.
Binary tree sort | n | n log n | n log n (balanced) | n | Yes | Insertion | When using a self-balancing binary search tree.
Cycle sort | - | n^2 | n^2 | 1 | No | Insertion | In place, with a theoretically optimal number of writes.
Library sort | - | n log n | n^2 | n | Yes | Insertion | -
Patience sorting | - | - | n log n | n | No | Insertion & Selection | Finds all the longest increasing subsequences in O(n log n).
Smoothsort | n | n log n | n log n | 1 | No | Selection | An adaptive sort: n comparisons when the data is already sorted, and 0 swaps.
Strand sort | n | n^2 | n^2 | n | Yes | Selection | -
Tournament sort | - | n log n | n log n | n | ? | Selection | -
Cocktail sort | n | n^2 | n^2 | 1 | Yes | Exchanging | -
Comb sort | n | n log n | n^2 | 1 | No | Exchanging | Small code size.
Gnome sort | n | n^2 | n^2 | 1 | Yes | Exchanging | Tiny code size.
UnShuffle sort | kN | kN | kN | In place for linked lists; N * sizeof(link) for arrays | Can be made stable by appending the input order to the key | Distribution & Merge | No exchanges are performed. Performance is independent of data size; the constant k is proportional to the entropy of the input, with k = 1 for ordered or reverse-ordered input, so the runtime is equivalent to checking the order, O(N).
Franceschini's method | - | n log n | n log n | 1 | Yes | ? | -
Block sort | n | n log n | n log n | 1 | Yes | Insertion & Merging | Combines a block-based O(n) in-place merge algorithm with a bottom-up merge sort; turns into a full-speed merge sort if additional memory is optionally provided to it.
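
      To connect the first two rows of the table to actual code, here are minimal Python sketches of quicksort and merge sort (simplified for readability, not tuned implementations; this quicksort deliberately uses O(n) extra space, like the "naive variants" mentioned in the table's notes):

    def quicksort(lst):
        """Average O(n log n); degrades to O(n^2) when pivots split badly."""
        if len(lst) <= 1:
            return lst
        pivot = lst[len(lst) // 2]
        less = [x for x in lst if x < pivot]
        equal = [x for x in lst if x == pivot]
        greater = [x for x in lst if x > pivot]
        return quicksort(less) + equal + quicksort(greater)

    def merge_sort(lst):
        """O(n log n) in every case: split in half, sort each half, merge."""
        if len(lst) <= 1:
            return lst
        mid = len(lst) // 2
        left, right = merge_sort(lst[:mid]), merge_sort(lst[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]

      Since this quicksort collects the elements equal to the pivot separately, it is effectively doing the three-way (fat) partitioning from the table's quicksort note, which is why it needs only O(n) comparisons on an array of all-equal keys; merge sort, by contrast, makes n log n comparisons in the best, average, and worst case alike.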