Wednesday, April 6, 2011

Why Tailors Use Insertion Sort

Insertion sort is a trivial, although potentially inefficient, algorithm for sorting. Given an array of numbers, the algorithm iteratively builds an increasingly longer sorted prefix within the array, until the entire array is sorted.

At the i-th iteration of the algorithm, the first i numbers in the array are in sorted order and the remainder might not be. The algorithm then: takes the i+1 number in the array, finds the correct location to insert it in the sorted prefix, and inserts it (shifting the following numbers down). Thus, after this step, the first i+1 numbers in the array are sorted.

"Insertion sort!" scoffed Peter. After only three months into his apprenticeship at the library of Alexandria, Peter was already arrogant from the computational knowledge that he now wielded.

"Who would ever use insertion sort?" Peter continued, working himself into a full rant. "It is an O(N^2) algorithm! You need to learn the wonders of merge sort. I have the scroll right here."

"Oh," remarked the shocked patron, hesitantly taking the scroll. "Thank you?"

"What is going on?" asked the head librarian as he returned from the stacks.

"Your young apprentice was 'educating' me on the correct sorting algorithm to use in my shop," explained the patron.

"He is using Insertion Sort," added Peter mockingly. The librarian did not burst into laughter as Peter expected. Confused, Peter looked back and forth between the librarian and the patron waiting for some reaction.

"I thought you came for a scroll about removing blackberry stains. It took me a while to find that one." said the librarian. Blackberries were out of season and the librarian had had to go deep into the stacks to find the relevant scroll.

The patron nodded. "While I was waiting, I was just explaining to young Peter here how my tailoring shop works. He had some interesting questions about how I sort the jackets."

"Ah," said the librarian. He knew the tailor well. "I think insertion sort makes sense in your case."

Peter was shocked. "What?" he screamed. He felt the blood rise to his face. How could anyone disagree with such a clear difference in computational complexity? That was almost as bad as cheering against the East Alexandria soccer team.

The librarian looked sternly at Peter. Embarrassed, Peter looked down at the floor.

"I do not understand," Peter added quietly.

"Perhaps you should explain how your shop actually works," suggested the librarian.

"It is simple," explained the tailor. "The shop is small, so I only have a single rack for all my jackets. I like to keep them sorted by size, because it is easier when patrons come in. I use binary search to quickly find a jacket in any size I need."

"That makes sense," agreed Peter. Binary search was another one of Peter's favorite algorithms for solving problems. At least the tailor used a reasonable search algorithm.

"Each monday, I get a new shipment of jackets, which I need to add to the rack. I start by putting them on the end of the rack, so that they won't sit on the floor and get wrinkled. The new fabrics are finicky this year. I wish that silk could have stayed in style a little longer. Anyway, after I put them on the rack, I have N sorted jackets at one end and then M new unsorted jackets after them. So I take each of the new jackets and insert it into the already sorted set of jackets. After inserting the first jacket, I have N+1 sorted jackets and M-1 unsorted jackets. I keep doing this until everything is inserted."

Before Insertion

During Insertion

After Insertion

"But…" started Peter. The librarian cut him off.

"I think what Peter wants to know is: 'Why you do not just take all the jackets off the rack and use something like MergeSort?'" explained the librarian. "MergeSort can be faster."

"Because jackets are heavy," explained the tailor. "It is a pain to take them off the rack. But they do slide easily down the rack."

Peter looked confused. Why did it matter how hard it was to take things off the rack? This was an argument about computational complexity.

"I think the factor that you are missing is: when most people use insertion sort, the insertions are expensive," the librarian explained to Peter. "Consider an accountant who is trying to sort a list of accounts in one of his books. Each line is one account -- like entries in a computer's array. You cannot just insert something and have the rest shift down automatically. That would take a most tedious form of magic. Every time the accountant wants to do an insertion, he has to manually shift down everything below that. A single insertion is an O(N) operation. That is a lot of erasing and rewriting."

"But, for a tailor, the insertion is an O(1) operation. You just push the coats down," added the librarian.

Peter nodded an acknowledgement. Inside he chaffed at the stupid technicality of the physical world impacting the cost of different operations. The theoretical world was so much cleaner. However, he did agree with the librarian and the tailor; in this case it actually seemed like insertion sort was reasonable. He wondered what other real world applications might challenge his computational complexity assumptions.

Later that day, Peter tried using insertion sort on several carts worth of books. Unfortunately, the books did not move easily between the carts’ shelves, and Peter found himself spending the entire night shifting books between shelves. It took him an extra three hours to finish the sorting. He left the library at 2am, angry at himself for not determining the cost of the insertion operation before he had started sorting.


Read more about sorting algorithms with Merge Sort and Lines of Kindergarteners or Bullies, Bubble Sort, and Soccer Tickets.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.