I noticed that one of the articles updated in the official FileMaker Knowledge Base on September 23, 2011 was explaining how to select a random set of records in a FileMaker database. I was wondering how fast the currently recommended technique is and whether I can make it faster with the help of FM Bench. I tried to apply it to a set of 50,000 records of randomly generated data. Then I made it much faster...
This is the script recommended by the knowledge base article:
- Show All Records
- Replace Field Contents [No Dialog, MyFile::RandomField, Random]
- Sort Records by RandomField [No Dialog]
- Go to Record/Request/Page [First]
- Omit Multiple Records [No Dialog, 10]
- Show Omitted Only
I created an example to try this script out. Actually, you can download the example below and try it out yourself. As you can see this script selects a random set of 10 records from the current table.
When I tried to apply the script to a table containing 50,000 records of generated test data, it took 23 seconds to select the random set. This was the first time I ran the script after opening the file. Every following run took nearly 8 seconds, most likely because of some index caching.
Actualy, I originally tried to apply the script to a much smaller set of data, about 5,000 record, and it was taking aroudn 1 second. That was pretty acceptable but I really wanted to optimize it, so I was looking for circumstances under which it would fail...
I discovered that the Replace Field Contents script step can be much faster than I expected. But however fast it is, it is the weakest point of this script. Note that regardless of how many random records we want to select, this script always goes through all available records and modifies them. Then it again has to go through all records to sort them, although it is done internally by FileMaker Pro by executing the Sort Records script step. My first thought was: Do I really ever have to touch other records than those I want to have in the final found set?
Probably the fastest way to select one random record is using the Go To Record/Request/Page script step, calculating a random record number to jump to. So I based my optimized script on this technique. I simply use this script step as many times as I need to select the requested number of records:
- Allow User Abort [ Off ]
- Show All Records
- Set Variable [ $targetfoundcount; Value:Get(FoundCount) - RandomRecords::SetSize ]
- Exit Loop If [ Get(FoundCount) ≤ $targetfoundcount ]
- Go to Record/Request/Page [ Int(Random * Get(FoundCount)) + 1 ] [ No dialog ]
- Set Field [ RandomRecords::RandomField; Random ]
- Omit Record
- End Loop
- Show Omitted Only
- Sort Records [ Specified Sort Order: RandomRecords::RandomField; ascending ] [ Restore; No dialog ]
- Go to Record/Request/Page [ First ]
This script takes less than 50 milliseconds to select 10 records out of 50,000. That's at least 158 times faster!
Looks like a magic, but it isn't. When you try to select larger set of random records in my example, you'll discover that somewhere between 1,500 and 2,500 my optimized script starts taking longer than the original script from the knowledge base. Why is that?
The original script's execution time depends on the total number of records in the table. So it takes the same amount of time regardless of how many records you want to select. My optimized script directly depends on the requested number of records to select, so it is extremely fast in comparison with the original script especially when selecting small number of records from a large number of records. But beacause it goes through the individual records manually, while the original script loops through them internally in FileMaker Pro (inside the Replace Field Contents and Sort Records script steps), my script spends significantly more time per record.
So, not only is this yet another marvelous optimization example, but it also proves that a solution that's slower in one situation, can be actually faster in another situation.
Try it yourself...