
Know your odds

The Pitfall behind Probability-Based Automatic Dialect Discrimination 

The human mind is amazing, but subject to bias. Assessing probabilities is an area where it is particularly worthwhile to do the arithmetic rather than trust a hunch. One such case is automatic dialect discrimination, which is used here to illustrate a frequently visited pitfall.

A decision to grant an applicant asylum is typically based on an assumption about the applicant’s geographical origin. With documents to support the stated origin, you have a sound basis for a decision readily available, but this is more the exception than the rule.

The burden of proof may reside with the applicant, but even in the absence of documentary evidence, you would still not want to reject the application out of hand. Not without offering alternative means of corroboration, if any are to be found. Particularly not if your experience suggests that, on average, around 80% of the claims are actually correct.

If there were an ironclad test that never failed to distinguish false claims from true ones, you would probably not hesitate to put it to routine use. Rarely are we so lucky, so you would be interested in how common the inevitable errors are. Without that number, you would not know what weight to assign to the test.

A test that separates A from B with a 50% error rate is equivalent to tossing a coin: you know nothing. The error rate has to be lower than that, and when it gets close enough to 0%, you will accept it. What “close enough” means to you depends on how you value the consequences of an erroneously granted and an erroneously rejected application.

Let us assume that you consider using a test with a 20% error rate. That could be tempting, since you would then know that 80% of the rejections were right and a mere 20% were mistakes. Or would you?

Time to think before you place your bet! Of the 80% of claims that happen to be genuine, 80% will be correctly classified as such, or in other words: 64% of the initial total. Of the 20% of claims that are false, 20% will be erroneously classified as genuine, making up 4% of the same initial total. So 68% of the claims will be classified as genuine and the remaining 32% as false. The false claims constitute only 20% of the total, but the test rejects 32%... a 60% increase!
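To make the arithmetic easy to check, here is a minimal sketch in Python. The 80% base rate and the symmetric 20% error rate are the illustrative figures from the example above, not empirical values.

```python
# Minimal sketch of the arithmetic above, using the illustrative figures:
# 80% of claims are genuine and the test has a symmetric 20% error rate.

base_rate_genuine = 0.80  # share of claims that are actually genuine
error_rate = 0.20         # share of claims the test misclassifies

# Split the initial total of claims by true status and test verdict.
genuine_passed   = base_rate_genuine * (1 - error_rate)       # 0.64: genuine, classified genuine
genuine_rejected = base_rate_genuine * error_rate              # 0.16: genuine, classified false
false_passed     = (1 - base_rate_genuine) * error_rate        # 0.04: false, classified genuine
false_rejected   = (1 - base_rate_genuine) * (1 - error_rate)  # 0.16: false, classified false

classified_genuine = genuine_passed + false_passed             # 0.68
classified_false   = genuine_rejected + false_rejected         # 0.32

# The test rejects 32% of claims although only 20% are false: a 60% increase.
increase = classified_false / (1 - base_rate_genuine) - 1      # 0.60

print(f"Classified genuine: {classified_genuine:.0%}")                         # 68%
print(f"Classified false:   {classified_false:.0%}")                           # 32%
print(f"Increase over the true 20% share of false claims: {increase:.0%}")     # 60%
```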

It is clear that a fatal illusion is on the loose. What, then, do you know about the accuracy of the rejections? Well, of the inflated 32% of rejections, 16% of the initial total stem from false claims recognized as such. The other 16% come from misclassified genuine claims (see table below). So, what do you know about whether a rejection is correct, stated in exact figures? You know that a rejection is correct with 50% likelihood and erroneous with 50% likelihood. Or, expressed in plain language: you know nothing!
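All figures in the table are shares of the initial total of claims.

                    Classified genuine   Classified false   Total
  Genuine claims           64%                 16%            80%
  False claims              4%                 16%            20%
  Total                    68%                 32%           100%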

Now, you may have identified the pitfall and cunningly avoided it. Do not let that make you think the pitfall is equally visible to everyone. Have some of your friends walk down that same garden path and see how they fare. You may even be able to make a fairly accurate assessment of the likelihood of their falling into the trap.
