Thursday, April 28, 2011

How to Shuffle an Array or the Fisher-Yates Algorithm

Question: how can you shuffle an array in O(n) time and O(1) space? For example, if input is an array of (1, 2, 3, 4, 5), one of the output can be (5, 4, 3, 2, 1) or (4, 2, 3, 1, 5).

Solution: this problem can be solved using Knuth Shuffling Algorithm a.k.a Fisher-Yates Shuffling Algorithm. The core strategy behind the algorithm is to pick a random index between 0 and N, where N is the greatest index of the unshuffled array, and then move that element to the shuffled part of the array. In other words, the algorithm partitions the original array into two parts, the unshuffled and shuffled part. It randomly picks an element from the unshuffled part and puts it into the shuffled part. It does that until no elements left in the unshuffled part. Here is the implementation in C:

#include<stdio.h>
#include<stdlib.h>
#include<time.h>

void knuthShuffle(int orgArray[], int arraySize)
{
   if (arraySize == 0 || arraySize == 1)
      return;

   srand(time(NULL));

   int i;
   int index, temp;
   for (i = arraySize - 1; i > 0; i--)
   {
      index = rand() % (i+1);
      temp = orgArray[index];
      orgArray[index] = orgArray[i];
      orgArray[i] = temp;
   }
}

Explanation: in order to partition the array, we traverse the array from the end to start. The subarray from iterator index i to the last index of the array (arraySize - 1) is the shuffled part. And, the subarray from the first index (0) to the iterator index i is the unshuffled part. As, the index i moves up the array, we randomly generate a number between 0 and i + 1 using C-function rand(). We then swap the element at the index that equals the random number with the element at the last index (i) of the unshuffled part. As the result of decreasing i by 1 for every loop iteration, the partitions are automatically reserved. Here is an example with illustration that will hopefully make everything clearer.

Example: we'll shuffle this array(1, 2, 3, 4, 5). Notice that the value for index is randomly picked, so at run time the algorithm may not generate the values in the exact order that they are here. We just need to understand that, the random values are always greater than or equal 0 and less than i. Here is what the array starts out with:

  1. First iteration, i = arraySize - 1 = 4. Let's say that index is randomly generated as 3, so we switch array[3] with array[4]. Here is the picture:



  2. Second iteration, decrement i by 1, so i = 3 and index randomly generated as 0. We swap array[0] and array[3]:



  3. Third iteration, i = 2 and index = 1. Swap array[2] with array[1].



  4. Fourth iteration, i = 1 and index = 0. Swap array[0] and array[1].



  5. Fifth iteration: i = 0, so loop terminates. Nothing happens. Here is the final shuffled array:

If there is any bug or better solution, please let me know in the comment section below. Thanks for reading :)

2 comments:

  1. All the implementations of this algorithm that I find on the 'net have the same problem: modulo bias

    http://en.wikipedia.org/wiki/Fisher-Yates#Modulo_bias

    ReplyDelete
  2. Knuth shows how to do this without introducing modulo bias in his book. So why don't people just follow Knuth more closely?

    ReplyDelete