A permutation is a way of ordering of a set of objects or symbols where each object occurs exactly once: a permutation of {A, B, C, D, E}, for example, is (E, B, A, D, C). There are n! (that is, 1 × 2 × 3 × ... × n), distinct permutations of n objects.
It's often useful to be able to generate a random permutation of n objects on a computer, for the purpose of randomly ordering a list. We would like the permutation distribution to be unbiased - that each of the n! permutations be generated with equal probability. In order to generate the abstract mathematical object of a random permutation on a computer, it must first be represented numerically: we therefore generate instead a list of n distinct random numbers a1,...,an with 1 ≤ ai ≤ n. The random permutation can then be defined from this list by moving each object i to position ai.
The naive way of generating such a list:
- Generate a trial random number k from 1 to n.
- If k is already in the list, go back to step 1
- Otherwise, add k to the list
- If fewer than n numbers have been added to the list, go back to step 1
is hopelessly inefficient. The time required to check whether each trial number is already in the list is of order
n. Furthermore, when nearly
n numbers have been added, the number of trial random numbers required before one is found that is not already in the list, is also of order
n. This leads to an overall
O(N3) running time.
A more efficient algorithm generates a list of n (not necessarily distinct) random numbers, taken from a range much larger than 1 to n. The probability that the numbers are, in fact, distinct is therefore high. These numbers are then replaced with their rank relative to each other, producing the required list of n distinct random numbers from 1 to n. However, this algorithm is at least O(n log n) (the complexity of the fastest known ranking algorithms), and has the added complication of dealing with the case where the same number is by chance assigned to two different objects.
Fisher-Yates shuffle
The preferred way of generating a random permutation is the
Fisher-Yates shuffle. which achieves the optimal
O(n) complexity, and requires no storage in addition to the list of length
n since it operates by repeatedly swapping pairs of numbers. The algorithm is as follows:
- Start with a list of numbers 1, ..., n, and let m = n
- Pick a random number i between 1 and m inclusive.
- Swap the ith and mth numbers in the list.
- Decrease m by one.
- Go to step 2 until m = 1.
The algorithm first selects a random number to put in the
nth place in the array, then works backwards, selecting numbers randomly from those remaining to put in the
n-1 th place, the
n-2 th place, and so on, until every number has been assigned randomly. That the resulting permutation is unbiased can be shown by considering the sequences of random numbers chosen in the algorithm. There are
n! possible random number sequences that can be generated, with each corresponding to a unique resulting permutation: thus if the random number sequence is unbiased, the resulting permutation is also unbiased.
Cyclic permutations
Defining the function f(i) as returning the position that the ith object goes to under a given permutation (i.e. f(i) = ai), a cyclic permutation is one for which f(m)(i) = i is true for all i iff m is a multiple of n. In other words, a cyclic permutation is one in which the object at p1 is taken to position p2, the object at p2 is taken to position p3 and so on, with the object at pn being taken to the position p1. These permutations form a subset of size (n-1)!.
Sandra Sattolo (Information Processing Letters, 22, 1986, 315-317) showed that a modified Fisher-Yates shuffle, can also generate random unbiased cyclic permutations. In short, replacing the range 1,...,m by 1,...,m-1 in step 2 of the Fisher-Yates shuffle causes the number of cycles in the permutation to decrease by one every time a swap is made. Starting with the identity permutation with n cycles, after n - 1 swaps the remaining random permutation has one cycle (i.e. is cyclic).
Pseudo-random number generation
Generation of random permutations requires that a certain amount of care is taken in pseudo-random number generation. A pseudo-random number generator (PRNG) cannot generate more distinct sequences than the number of distinct seed values that it can be initialised with. To have a chance of generating all n! permutations, at least this many (and preferably orders of magnitude more) distinct seed values must be possible. For n = 100, n! ≈ 2525, which requires a internal state size (and seed size) of at least 525 bits: this is much larger than the 32-bit internal state of most general-purpose PRNGs.
No Comments