Project to Generate Random Words

Since I write software for a living (and as a hobby), I thought it would be interesting to write a quick Python program to see how many randomly generated strings turn out to be valid words. In this post I’m not drawing any conclusions or extrapolating from the data; I’m just reporting the results. The key to making this work was finding a way to determine whether each generated word was valid, so I decided to check the words against the online Merriam-Webster dictionary. After looking at how the site constructs its URLs, I figured out how to do it. I have provided the source code, and you can download it here:
Random Word Generator code
Here is a view of the code where you can see it formatted with syntax highlighting and coloring:
[Image: formatted source code]
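
For readers who cannot see the image, here is a minimal sketch of the dictionary lookup idea. It is not the downloadable program itself. The URL pattern in `BASE_URL`, the rule that an HTTP 200 response means "word found", the User-Agent string, and the function names are all assumptions made for illustration, and the sketch does not attempt the abbreviation check.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumed URL pattern; the post does not spell out the exact format.
BASE_URL = "https://www.merriam-webster.com/dictionary/"


def build_lookup_url(word):
    """Return the (assumed) dictionary page URL for a candidate word."""
    return BASE_URL + urllib.parse.quote(word.lower())


def fetch_status(url):
    """Fetch a page and return its HTTP status code."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "random-word-sketch/0.1"}  # hypothetical name
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 404 arrive as exceptions; report their code.
        return err.code


def is_valid_word(word, fetch=fetch_status):
    """Treat a 200 response as 'word found' (an assumption, not a documented rule)."""
    return fetch(build_lookup_url(word)) == 200
```

Passing `fetch` in as a parameter means the decision logic can be tried out with a stand-in function, without touching the real site.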

The program essentially draws letters of the alphabet at random to construct candidate words 3 to 12 characters long. It immediately discards any candidate with 4 or more consonants in a row, as well as any candidate already in the list of words found to be valid. This saves the program from reading another web page when I can determine ahead of time that the lookup is unnecessary. The program also sleeps for a random interval of 3 to 8 seconds between calls to Merriam-Webster. That way I’m not hammering their server; instead, the program behaves more like a real user of the site, with “think time” built in.
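
The steps just described can be sketched roughly as follows. This is an illustration under stated assumptions rather than the original source: the function names are hypothetical, word lengths are drawn uniformly, Y is counted as a consonant, and the dictionary check is passed in as a `lookup` function so the sketch does not depend on any particular site.

```python
import random
import re
import string
import time

# Four or more consecutive consonants; treating Y as a consonant is an assumption.
CONSONANT_RUN = re.compile(r"[BCDFGHJKLMNPQRSTVWXYZ]{4,}")


def random_candidate(rng, min_len=3, max_len=12):
    """Build a random uppercase string of min_len to max_len letters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))


def worth_checking(word, already_valid):
    """Return False for candidates that can be discarded without a web request."""
    if CONSONANT_RUN.search(word):
        return False
    return word not in already_valid


def run(lookup, attempts, rng=None, sleep=time.sleep):
    """Generate candidates, look up the survivors, and pause after each lookup."""
    if rng is None:
        rng = random.Random()
    valid = set()
    for _ in range(attempts):
        word = random_candidate(rng)
        if not worth_checking(word, valid):
            continue  # discarded up front, so no request and no pause
        if lookup(word):
            valid.add(word)
        sleep(rng.uniform(3, 8))  # "think time" of 3 to 8 seconds
    return valid
```

Calling `run(lookup, attempts=100)` with a real lookup function would take several minutes because of the pauses. For a dry run, pass a stand-in such as `sleep=lambda seconds: None`.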

I ran the program 2 separate times for a total running time of 3.115 days (almost 75 hours, or exactly 4484.91 minutes). During these 2 runs, the program generated a total of 123,459 words. Of these, 640 (0.52%) were found to be valid words according to the Merriam-Webster dictionary. Of those 640 valid words, 535 had 3 characters, 97 had 4 characters, 7 had 5 characters, and 1 had 6 characters. The script is supposed to eliminate abbreviations, because there is a way to programmatically detect that Merriam-Webster is reporting a word as an abbreviation. However, as I look at the words determined to be valid, many of them appear to be abbreviations, acronyms, or otherwise unrecognizable. I will let you make your own decision. Here are the 226 “valid” words generated from run 1:

‘HLA’, ‘UPFOR’, ‘BUN’, ‘CHI’, ‘LUM’, ‘COW’, ‘BUM’, ‘SINE’, ‘ADE’, ‘TAI’, ‘TIS’, ‘CEE’, ‘SUE’, ‘PRE’, ‘SUR’, ‘PAY’, ‘FRO’, ‘APC’, ‘UGH’, ‘NOB’, ‘EOS’, ‘OEM’, ‘LETT’, ‘GAB’, ‘TAB’, ‘ZDV’, ‘BOY’, ‘CATT’, ‘DID’, ‘APL’, ‘GLOM’, ‘GON’, ‘MOON’, ‘ADO’, ‘LYO’, ‘GIA’, ‘HID’, ‘THE’, ‘WAY’, ‘FRA’, ‘OUD’, ‘JOB’, ‘LAO’, ‘IER’, ‘EAT’, ‘RIM’, ‘CORI’, ‘DRAYS’, ‘ZOO’, ‘KUN’, ‘AIX’, ‘LSD’, ‘HOD’, ‘EVE’, ‘BIZ’, ‘ELM’, ‘BUB’, ‘HSI’, ‘SLO’, ‘XTC’, ‘SANA’, ‘OHM’, ‘LAS’, ‘POOR’, ‘WAG’, ‘YON’, ‘VAV’, ‘HIT’, ‘RBI’, ‘ISM’, ‘PEU’, ‘XML’, ‘POI’, ‘SEW’, ‘ZUG’, ‘LOU’, ‘SIV’, ‘JET’, ‘AHI’, ‘GAT’, ‘RSS’, ‘PLY’, ‘RPV’, ‘COZ’, ‘MUD’, ‘DOW’, ‘SUI’, ‘WAC’, ‘MIA’, ‘YUK’, ‘SHM’, ‘HAWK’, ‘DAX’, ‘FIX’, ‘ACL’, ‘DIT’, ‘TEE’, ‘BEY’, ‘DANTE’, ‘WIG’, ‘SET’, ‘PAZ’, ‘VOW’, ‘TIC’, ‘MCO’, ‘GNAT’, ‘GLOW’, ‘VOG’, ‘MEGA’, ‘SOS’, ‘MAB’, ‘FTP’, ‘PALY’, ‘SICK’, ‘NING’, ‘YOD’, ‘ORR’, ‘IGA’, ‘GAN’, ‘ODE’, ‘BUG’, ‘OUR’, ‘JIG’, ‘RAN’, ‘RUG’, ‘YER’, ‘KANT’, ‘ROY’, ‘KAME’, ‘LOW’, ‘HET’, ‘DULL’, ‘LOSE’, ‘HOL’, ‘FEL’, ‘PAU’, ‘FIR’, ‘NIP’, ‘HIB’, ‘CEO’, ‘PPO’, ‘GOA’, ‘MUG’, ‘SAY’, ‘GOT’, ‘MOW’, ‘ATTU’, ‘GET’, ‘AUK’, ‘SEA’, ‘FOI’, ‘ECU’, ‘PUS’, ‘TRY’, ‘VCR’, ‘OOH’, ‘PRY’, ‘IOUS’, ‘AMI’, ‘HEED’, ‘ORB’, ‘TIP’, ‘TUP’, ‘CUI’, ‘ONO’, ‘WEN’, ‘HUM’, ‘PICA’, ‘ROW’, ‘EEK’, ‘KITH’, ‘ABY’, ‘IBN’, ‘PUT’, ‘HAE’, ‘HUN’, ‘DII’, ‘YIP’, ‘EAR’, ‘MHO’, ‘MUR’, ‘TRIX’, ‘FIRS’, ‘VEG’, ‘DUE’, ‘SHAD’, ‘PIS’, ‘ASH’, ‘KOO’, ‘USB’, ‘BAH’, ‘LOFT’, ‘YEA’, ‘ABLY’, ‘PDQ’, ‘BUY’, ‘AIR’, ‘ECK’, ‘IGG’, ‘FUN’, ‘HOST’, ‘UKE’, ‘JIH’, ‘END’, ‘LAG’, ‘PAD’, ‘TETH’, ‘ADZ’, ‘PAL’, ‘SIR’, ‘SAP’, ‘LELE’, ‘OAK’, ‘RETZ’, ‘CIAO’, ‘JUDE’, ‘PUL’, ‘TOUT’, ‘CUT’, ‘COWS’, ‘MIM’, ‘OVI’, ‘JIM’, ‘DDT’, ‘IUD’, ‘LAW’

In looking at this list, these 2 stuck out to me:

‘THE’, ‘WAY’

See Acts 9:2, 19:9, 19:23, 24:14, 24:22 for the biblical usage of this phrase…

and here are the 414 “valid” words generated from run 2:

‘HEE’, ‘SUR’, ‘OUR’, ‘DID’, ‘BRA’, ‘DRAB’, ‘PEE’, ‘KIN’, ‘CHU’, ‘RIB’, ‘MAT’, ‘AWE’, ‘RAJ’, ‘UVC’, ‘ASK’, ‘ALES’, ‘VRE’, ‘DPN’, ‘XML’, ‘HOY’, ‘TEE’, ‘TIU’, ‘PAW’, ‘DES’, ‘TUP’, ‘ROB’, ‘KYD’, ‘ABU’, ‘AVO’, ‘GOA’, ‘RUN’, ‘LOP’, ‘SUI’, ‘SEE’, ‘DEZ’, ‘KNUR’, ‘NIM’, ‘FEZ’, ‘BUN’, ‘MOJO’, ‘NOH’, ‘LYS’, ‘UNO’, ‘AGE’, ‘ELK’, ‘LAC’, ‘CHI’, ‘VISE’, ‘HIP’, ‘HUB’, ‘WEN’, ‘ZIG’, ‘WEI’, ‘MEW’, ‘ATE’, ‘END’, ‘LELY’, ‘TOW’, ‘GHAT’, ‘HAI’, ‘MEN’, ‘MUNRO’, ‘BAD’, ‘COX’, ‘RAT’, ‘ETH’, ‘ICS’, ‘HUE’, ‘OUD’, ‘PILED’, ‘PIG’, ‘PARD’, ‘DITZ’, ‘AIN’, ‘SALP’, ‘TSHI’, ‘FOY’, ‘SKI’, ‘PUT’, ‘IVY’, ‘ALE’, ‘HET’, ‘III’, ‘UAV’, ‘XTC’, ‘KUT’, ‘IUD’, ‘GNU’, ‘AWNS’, ‘WAX’, ‘QUA’, ‘ZOO’, ‘QOM’, ‘ULM’, ‘KEYS’, ‘WHY’, ‘JOW’, ‘WOP’, ‘LEE’, ‘CUP’, ‘ZITI’, ‘TEN’, ‘ZAP’, ‘CWM’, ‘YUK’, ‘RAG’, ‘BIO’, ‘TUX’, ‘MOP’, ‘FAN’, ‘HUG’, ‘GEL’, ‘FLU’, ‘DUNG’, ‘HIE’, ‘POI’, ‘SIC’, ‘OAK’, ‘VII’, ‘BOAS’, ‘POM’, ‘IFS’, ‘ONO’, ‘IGD’, ‘BHC’, ‘JIB’, ‘LUM’, ‘TIL’, ‘FUN’, ‘FAT’, ‘TIC’, ‘WET’, ‘SET’, ‘YID’, ‘DOL’, ‘TWA’, ‘IPO’, ‘DING’, ‘URO’, ‘OVI’, ‘SRI’, ‘KOCH’, ‘NAN’, ‘FOX’, ‘RAW’, ‘SOD’, ‘VOW’, ‘EAT’, ‘REM’, ‘RUT’, ‘LYE’, ‘ALLO’, ‘TAX’, ‘TOWS’, ‘TED’, ‘OFT’, ‘HMO’, ‘WOK’, ‘OCA’, ‘RRNA’, ‘FEE’, ‘PRE’, ‘UTE’, ‘NET’, ‘DDE’, ‘DDD’, ‘JAY’, ‘LID’, ‘ISH’, ‘FIRM’, ‘PED’, ‘FIX’, ‘LAN’, ‘PDQ’, ‘DIE’, ‘LOD’, ‘WAD’, ‘POST’, ‘WAY’, ‘WOE’, ‘ALOW’, ‘DEK’, ‘YAK’, ‘SNP’, ‘DHU’, ‘BASE’, ‘CRI’, ‘PAZ’, ‘SPY’, ‘ODE’, ‘CEE’, ‘MHO’, ‘GON’, ‘BALK’, ‘BOSC’, ‘INTI’, ‘OAF’, ‘DAX’, ‘FTP’, ‘ELL’, ‘NUT’, ‘TAU’, ‘HGE’, ‘NEO’, ‘USK’, ‘UFA’, ‘TOL’, ‘DIT’, ‘EOS’, ‘ATP’, ‘SUM’, ‘TWI’, ‘REX’, ‘UCL’, ‘SST’, ‘YALL’, ‘TAW’, ‘ABM’, ‘ANE’, ‘SIR’, ‘VERY’, ‘KAT’, ‘UPAS’, ‘PEW’, ‘HSU’, ‘GTP’, ‘TWO’, ‘MID’, ‘STY’, ‘JOY’, ‘DEE’, ‘YACK’, ‘HUM’, ‘RABI’, ‘GAUD’, ‘DUE’, ‘OAR’, ‘TAO’, ‘JAP’, ‘CRUD’, ‘YAP’, ‘KAY’, ‘EAR’, ‘YON’, ‘JAW’, ‘KOS’, ‘TOM’, ‘DUI’, ‘FOP’, ‘CHA’, ‘DUN’, ‘OUT’, ‘KOO’, ‘TAM’, ‘AWL’, ‘BHA’, ‘POX’, ‘LOW’, ‘GOT’, ‘LULL’, ‘TUB’, ‘MIM’, ‘HOW’, ‘FUG’, ‘KOI’, ‘HIB’, ‘SLO’, ‘PAPA’, ‘XER’, ‘FAG’, ‘PEA’, ‘IST’, ‘TAB’, ‘ROY’, ‘GHB’, ‘SON’, ‘LDL’, ‘JAB’, ‘LOB’, 
‘BENXI’, ‘RAE’, ‘NAB’, ‘TARE’, ‘YIP’, ‘GAY’, ‘OIL’, ‘PIE’, ‘WERT’, ‘FIS’, ‘LAK’, ‘ZUG’, ‘PAL’, ‘BOK’, ‘QAT’, ‘WARN’, ‘LICK’, ‘LANK’, ‘LIKENS’, ‘HORN’, ‘MARE’, ‘DAW’, ‘PYX’, ‘ECK’, ‘FIE’, ‘TUBS’, ‘DME’, ‘TOE’, ‘REB’, ‘NCO’, ‘NOG’, ‘THOR’, ‘AYR’, ‘THO’, ‘BVD’, ‘PAR’, ‘TEPA’, ‘THY’, ‘LAO’, ‘TOD’, ‘OKA’, ‘RYE’, ‘ASS’, ‘GUY’, ‘GEE’, ‘TRI’, ‘CTL’, ‘SAN’, ‘LAH’, ‘TOP’, ‘HEP’, ‘WAS’, ‘AGO’, ‘CIG’, ‘AZA’, ‘HEM’, ‘SOW’, ‘TNT’, ‘TIRL’, ‘GOY’, ‘HOF’, ‘RNA’, ‘GAP’, ‘GOUGH’, ‘ORR’, ‘NEW’, ‘SHY’, ‘EME’, ‘URI’, ‘SAL’, ‘FRA’, ‘HAN’, ‘PEP’, ‘UGLI’, ‘HIT’, ‘ATV’, ‘HAD’, ‘BMX’, ‘PYA’, ‘BARD’, ‘PIT’, ‘RAY’, ‘PISA’, ‘RBI’, ‘GUAN’, ‘AMI’, ‘DUMP’, ‘KOA’, ‘HEN’, ‘FID’, ‘WEE’, ‘HUN’, ‘CPU’, ‘BIS’, ‘CRED’, ‘PILL’, ‘TPN’, ‘GAG’, ‘LOT’, ‘AZT’, ‘PUMP’, ‘DFP’, ‘RAX’, ‘AIX’, ‘EER’, ‘PUN’, ‘PUL’, ‘POW’, ‘JUG’, ‘FIB’, ‘NIN’, ‘THUS’, ‘ZIP’, ‘GOO’, ‘WHIZ’, ‘TYR’, ‘CEL’, ‘MAX’, ‘ALI’, ‘RIG’, ‘KIP’, ‘ZED’, ‘BOB’, ‘NEVE’, ‘FEW’, ‘ROH’, ‘ODD’, ‘UTA’, ‘ADE’, ‘ROD’, ‘GYP’, ‘IVE’, ‘CORM’, ‘IBN’, ‘VID’
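
The summary figures quoted earlier in the post can be cross-checked with a few lines of Python. The numbers below are copied from the post; the variable names are only illustrative.

```python
# Figures as reported in the post.
total_minutes = 4484.91
total_generated = 123_459
valid_by_length = {3: 535, 4: 97, 5: 7, 6: 1}
valid_by_run = {1: 226, 2: 414}

# Derived values.
total_valid = sum(valid_by_length.values())
hours = total_minutes / 60
days = hours / 24
hit_rate = 100 * total_valid / total_generated

print(f"valid words: {total_valid} (by run: {sum(valid_by_run.values())})")
print(f"running time: {hours:.2f} hours, {days:.3f} days")
print(f"hit rate: {hit_rate:.2f}%")
```

The per-length and per-run counts both add up to 640, and the running time and hit rate agree with the 3.115 days and 0.52% reported above.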

I certainly welcome anyone to draw conclusions from this data.
