Wednesday, June 17, 2009

Markov Name Generator C# Class

A Markov Chain is a way of generating a result based on the statistical probabilities in a set of sample data. The MarkovNameGenerator class can generate random words (names) based on a set of sample words that it takes as input.

It works by building "chains" that count the number of occurrences of each letter in the alphabet after each group of "N" letters in the sample name. "N" is called the "Order" of the Markov Chain.

The class generates a new name by first selecting a random group of "N" letters to begin the name. It then goes forward letter by letter, examining the Chains and picking a new letter from the pool of letters that followed the previous "N" letter groups in the sample data.

For example, the words "Sally Sells Seashells" breaks down into the following 2-Order Chains:

SA-(L)
AL-(L)
LL-(Y)
LY-(End)
SE-(L,A)
EL-(L,L)
LL-(S,S)
LS-(End,End)
EA-(S)
AS-(H)
SH-(E)
HE-(L)

Note the Chains in yellow. These represent 2 letter pairs that occurred more than once in the sample words. The Chain contains all of the letters that follow the single pair of letters.

Now, when the generator makes a new random name it follows this sequence:

1 - Pick a starting pair of letters randomly:

AS

2 - Pick a random letter in the Chain following AS. In our example the only one is H.

AS->H

3 - Continue picking random letters in the chain until we get an end.

A(SH)->E
AS(HE)->L
ASH(EL)->L
ASHE(LL)->S

Result = Ashells

In this sample, we never encountered the SE pair, but if we had, we would have appended one of the two candidate letters in the chain randomly.

The fun comes when you feed the name generator with a large number of sample names. The generator, particularly when set to an Order of 3, can produce new names that bear a remarkable similarity in style to the sample names, while still being original.

Using the Class

The MarkovNameGenerator class constructor takes 3 arguments:

public MarkovNameGenerator(IEnumerable sampleNames, int order, int minLength)

The first parameter is the collection of sample names. This can be an array or list of strings. Each string can contain one sample name, or a number of samples separated by commas.

The second parameter determines the Order of the Markov Chains, and the last parameter specifies the minimum length of a generated name.

Accessing the NextName property returns a new random name based on the sample data. The class ensures that the same random name is not returned twice.

Call the Reset method to clear out the internal list of previously generated names.

Download

Click here to download the MarkovNameGenerator.cs C# file.

Silverlight Sample App

http://www.siliconcommandergames.com/markov/TestPage.html

No comments:

Post a Comment