Thursday, July 14, 2011

The Hamming Distance Option: Part 3 of Peter and the Postal Service

Hamming distance is a measure of the difference between strings of two equal length. Formally, the hamming distance between two strings is a count of the number of positions in which those strings differ. It is computed by traversing both strings, character by character, and counting the differing characters. Two strings with a hamming distance of zero are perfect matches.

Peter was shocked that the mailman had refused to deliver the letter to "Petee Fencer" because it was not a perfect match with his name. The 'e' at the end of the name was obviously supposed to be an 'r'. Sowhat if Peter had been the one to insist on perfect string matching? The mailman should have been able to pick up this error easily.

"Okay. New plan." started Peter. "Let me tell you about hamming distance…"

The mailman continued to smile politely. He was clearly enjoying this new game.

"Hamming distance is a measure of the distance between two strings. You go through each position and count how many of the characters are different. For example, the distance between "Petee" and "Peter" is 1 -- the last letters do not match."

"Okay." Agreed the mailman. "And the distance between 'Peter' and 'Puget' is 3, right? And the distance between 'Peter' and 'Smart' is 5, right?"

"Uh... yeah." confirmed Peter. Obviously the mailman was well versed in hamming distance. Peter wondered how he had had those two examples so readily available.

"What if the lengths of the strings are not equal?" the mailman asked. "Doesn't hamming distance apply to equal length strings?"

"Well... Yes. Technically, they are supposed to be equal length in order to compute hamming distance. I guess that we should be using a slightly different metric in that case..." commented Peter. He thought over the situation for a moment. "Let's just pretend they are the same length. Just add spaces to the end of shorter one until they are the same length."

"Really? Isn't that cheating?" The mailman gave Peter a shocked look. It was one thing to insist on using a specific metric to match mail, but it was another matter entirely to cheat with that metric.

Peter dismissed this concern. "Filter out all letters with hamming distance greater than 2."

"Are you sure?" the mailman asked.

Peter nodded.

"So you do not want anything to 'Mr P. Fencer'?" confirmed the mailman.

"Wait! Of course I do. That is obviously me. P is just an abbreviation." yelled Peter.

"But the hamming distance is much more than 2. In fact, it is 5. "Mr P. Fencer" and "Peter Fencer" do not match until the space before your last name." argued the mailman innocently.

"Yes, but…" started Peter.

"Nope." interrupted the mailman. "You told me to filter all names with a hamming distance greater than 2."

"Wait." Peter argued. "How about string edit distance?" he tried.

The mailman looked at him for a full minute. "Do you want the letter I have for 'Apprentice P. Fencer'? Or do you want to keep trying to help me improve my job?"

Peter sighed heavily. He had underestimated the complexities of string comparisons for mail delivery. He should have listened to the mailman's warning.

"If I apologize, can you go back to the algorithm that you were using before?" Peter asked.

The mailman looked at Peter for a moment. "I suppose I could. You know that it is not 100% accurate though, right? It even uses a few heuristics."

Peter nodded humbly.

From that day on, Peter started receiving all of his mail. He learned not to argue string comparison algorithms with postal carriers.


See how it started with Part 1 and Part 2 of Peter and the Postal Service.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.