Creating and testing a cheating detector

The core components of EDGAR are “detectors”. Each detector is a set of rules for looking at a group of similar decisions or results for signs of cheating. These rules determine which deals to include and define each decision/result as successful or unsuccessful (we use the terms “Hit” and “Miss” – some detectors include “Neutral”).

One example of a “detector” comes courtesy of Kit Woolsey: opening leads from an unsupported King against a suit contract. The rules for Kit’s King-Underlead Detector:

Included deals: All non-trump opening leads from Kx+ vs a suit contract
Hit: Partner has the Ace, Queen, or shortness
Miss: Partner doesn’t have the Ace, Queen, or shortness
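
The rules above can be sketched in a few lines of Python. This is only an illustration, not EDGAR's actual code: the OpeningLead record, the function names, the reading of “unsupported” as holding neither the Ace nor the Queen, and the definition of shortness as a singleton or void are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpeningLead:
    """Hypothetical record of one opening lead (not EDGAR's real schema)."""
    trump: Optional[str]    # 'S', 'H', 'D', 'C', or None for a notrump contract
    led_suit: str           # suit of the card led
    leader_holding: str     # leader's cards in the led suit, e.g. "K84"
    partner_holding: str    # partner's cards in the led suit, e.g. "Q2"


# Assumption: "shortness" means a singleton or void in the led suit.
SHORTNESS_MAX_LEN = 1


def is_included(lead: OpeningLead) -> bool:
    """Non-trump opening lead from Kx+ against a suit contract."""
    if lead.trump is None or lead.led_suit == lead.trump:
        return False
    holding = lead.leader_holding
    # Assumption: "unsupported" King means no Ace and no Queen alongside it.
    return (
        "K" in holding
        and len(holding) >= 2
        and "A" not in holding
        and "Q" not in holding
    )


def rate(lead: OpeningLead) -> Optional[str]:
    """Return "Hit", "Miss", or None when the deal is not included."""
    if not is_included(lead):
        return None
    partner = lead.partner_holding
    if "A" in partner or "Q" in partner or len(partner) <= SHORTNESS_MAX_LEN:
        return "Hit"
    return "Miss"
```

For example, rate(OpeningLead("S", "H", "K84", "Q2")) is a Hit because partner holds the Queen, while the same lead against a notrump contract returns None because the deal is not included.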

If, over a large enough sample, partner almost always holds the Ace, Queen, or shortness, it makes bridge sense to treat this as evidence of cheating. In fact, players convicted of cheating by other means often have a remarkable hit rate on this measure, usually including some examples that are shocking.

Note: EDGAR’s opening lead detectors take a more nuanced approach, considering factors such as whether any player bid the suit and comparing the chosen lead to alternative options.

Depending on the detector, a Hit or Miss might be nothing more than a good or bad decision or guess. Our simple Hit/Miss rules are not perfect, so not every rating will be correct, but the errors go in both directions and are rare enough that the overall picture is accurate. One Hit or Miss only moves the needle a little, and results are aggregated across many plays and many detectors before we reach a conclusion with confidence. How many deals are needed depends on how blatant the cheating is.
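
To see why sample size matters, here is a minimal sketch of aggregating ratings into a hit rate and asking how surprising that rate would be for an honest pair. It is not EDGAR's actual scoring method: the function names, the independence assumption, and the baseline hit probability are hypothetical stand-ins chosen for illustration.

```python
from math import comb


def hit_rate(ratings):
    """Fraction of Hits among Hit/Miss ratings; Neutral ratings are ignored."""
    hits = ratings.count("Hit")
    misses = ratings.count("Miss")
    total = hits + misses
    return hits / total if total else 0.0


def binomial_tail(hits, deals, baseline):
    """Probability of at least `hits` Hits in `deals` deals.

    Assumption: each deal is an independent Hit with probability `baseline`,
    a hypothetical hit rate for honest pairs (not a figure from EDGAR).
    """
    return sum(
        comb(deals, k) * baseline**k * (1 - baseline) ** (deals - k)
        for k in range(hits, deals + 1)
    )
```

Under an assumed baseline of 0.5, binomial_tail(9, 10, 0.5) is about 1%, which an honest pair could reach by luck, whereas binomial_tail(90, 100, 0.5) is many orders of magnitude smaller. The same 90% hit rate is far stronger evidence over 100 deals than over 10, and a less extreme hit rate needs more deals to stand out.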

Once we create the detector, we run a simple test using a small collection of “known” cheaters (typically from ACBL’s MUD-list) vs an equal number of “known” innocent pairs. To pass this first test, the detector must easily distinguish between the two groups – most cheaters fall on one side of the line, and most controls fall on the other.
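
One way to picture this first test is as a check on how cleanly a single dividing line separates the two groups. The sketch below assumes each pair has already been reduced to one hit rate per detector; the function names, the threshold, and the 0.8 cutoff for “most” are hypothetical choices for the example, not EDGAR's actual criteria.

```python
def separation(cheater_rates, control_rates, threshold):
    """Return the fraction of cheaters above and of controls at or below the line."""
    above = sum(r > threshold for r in cheater_rates) / len(cheater_rates)
    below = sum(r <= threshold for r in control_rates) / len(control_rates)
    return above, below


def passes_first_test(cheater_rates, control_rates, threshold, most=0.8):
    """True when "most" of each group falls on its expected side of the line.

    Assumption: "most" is read as at least the fraction `most` of the group.
    """
    above, below = separation(cheater_rates, control_rates, threshold)
    return above >= most and below >= most
```

A detector whose hit rates for the two groups overlap heavily would fail this check at every threshold, which is the signal to rework its rules before testing further.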

We review a large sample of each detector’s ratings deal-by-deal to determine if it might capture something other than cheating. To pass this test, the detector must easily distinguish cheating from (1) an innocent player’s “style”, (2) expert play, and (3) novice play.

Once we are satisfied that the detector works as intended and the initial tests look promising, we run it on thousands of pairs to make sure there are no false positives. When we first created EDGAR, every pair flagged at this step was confirmed by human investigation. Now, when we create a new detector (or improve an existing one), we can compare its results against our existing detectors. Any outlier pairs are again investigated deal-by-deal to make sure they are not false positives.

Our overriding goal is to avoid false positives, so detectors are calibrated conservatively.