Confidence and Confusion
The OCR Classifier tool determines not only the classification (character code/instance) of the run-time rectified image but also the score and the confidence of that classification. The score indicates how closely the image matches the training instances. The confidence is the difference between the score of the classification (the highest-scoring training instance) and the score of the next-highest classification (the highest-scoring training instance from a different class). The OCR Classifier tool's result also includes a status (READ, CONFUSED, or FAILED) that indicates the quality of the result.
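The score and confidence computation can be illustrated with a short Python sketch. It assumes a hypothetical input, a dictionary mapping each class (character code) to the scores of its training instances against the run-time image. It is not the tool's API. Each class is represented by its best instance, and the confidence is the gap between the top class and the best score from any other class.

```python
from typing import Dict, List, Tuple


def classify(scores: Dict[str, List[float]]) -> Tuple[str, float, float]:
    """Return (best_class, score, confidence) from per-instance scores.

    `scores` is a hypothetical mapping: class -> scores of that class's
    training instances. At least one class must have at least one score.
    """
    # A class scores as well as its best-matching training instance.
    best_per_class = {cls: max(vals) for cls, vals in scores.items() if vals}
    ranked = sorted(best_per_class.items(), key=lambda kv: kv[1], reverse=True)
    best_class, best_score = ranked[0]
    # Next-highest score from a *different* class (assumed 0.0 if none exists).
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    confidence = best_score - runner_up
    return best_class, best_score, confidence


best, score, conf = classify({"8": [0.91, 0.87], "B": [0.84], "3": [0.62, 0.70]})
print(best, score, round(conf, 2))
```

Note that the second "8" instance (0.87) does not lower the confidence, because only the best score from a different class counts.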
The OCR Classifier tool performs internal classification validation checks to verify that the highest-scoring candidate class is the correct classification. If this validation fails, the highest-scoring candidate is marked as CONFUSED and its confidence score is set to 0. For example, some validation checks rescore the candidates using different match metrics, on the premise that the correct match should score highest regardless of the match metric. The result of this validation does not affect the result score.
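A validation check of the "rescore with a different metric" kind might look like the following minimal sketch. The actual metrics the tool uses are not documented here. In this sketch, each hypothetical `rescored` table holds the candidates' scores under one alternative metric. If any table ranks a different class first, the confidence is forced to 0, and the score is left unchanged.

```python
from typing import Dict, List, Tuple


def validate_and_adjust(
    best_class: str,
    score: float,
    confidence: float,
    rescored: List[Dict[str, float]],
) -> Tuple[float, float, bool]:
    """Hypothetical validation step. Returns (score, confidence, passed).

    Each dict in `rescored` maps candidate class -> score under one
    alternative match metric (an assumption for illustration).
    """
    for table in rescored:
        # Winner under this metric (ties go to the first key encountered).
        winner = max(table, key=table.get)
        if winner != best_class:
            # Validation failed: confidence is zeroed, score is untouched.
            return score, 0.0, False
    return score, confidence, True


print(validate_and_adjust("8", 0.91, 0.07, [{"8": 0.60, "B": 0.75}]))
```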
In addition to the classification, the score, and the confidence score, the OCR Classifier tool reports a set of alternative classifications: all of the classes that produce sufficiently high scores. The confusion character is defined as the highest-scoring alternative character that is not a swap character of the highest-scoring character. (For the definition of swap characters, see section Swap Characters.) There is always at least one alternative character, and hence a confusion character, as long as the highest-scoring character meets the accept threshold and at least one other (non-swap) class has a non-zero score.
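A class's score is sufficiently high if it is at least [(the lowest score that is greater than or equal to the accept threshold) minus (the confidence threshold)]. In addition, one more different character that scores below this cutoff is included.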
The alternative characters are sorted in the order of decreasing score.
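The following Python sketch shows one possible reading of the alternative-character rule and the confusion-character rule. Several parts of it are assumptions. It interprets "the lowest score that is greater than or equal to the accept threshold" as the lowest candidate score that passes the accept threshold. It excludes the winning class and zero-score classes. It takes the extra below-cutoff character to be the first one that is not a swap of the winner, so that the "at least one confusion character" guarantee holds. Swap characters are modeled as a hypothetical set of unordered pairs.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def is_swap(a: str, b: str, swap_pairs: Set[FrozenSet[str]]) -> bool:
    """True if a and b are declared swap characters (hypothetical model)."""
    return frozenset((a, b)) in swap_pairs


def alternatives(
    best_per_class: Dict[str, float],
    best_class: str,
    accept_threshold: float,
    confidence_threshold: float,
    swap_pairs: Set[FrozenSet[str]],
) -> List[Tuple[str, float]]:
    """Alternative classes, sorted by decreasing score (one interpretation)."""
    passing = [s for s in best_per_class.values() if s >= accept_threshold]
    if not passing:
        return []  # even the winner missed the accept threshold
    cutoff = min(passing) - confidence_threshold
    # Other classes with non-zero scores, highest first.
    others = sorted(
        ((c, s) for c, s in best_per_class.items() if c != best_class and s > 0),
        key=lambda cs: cs[1],
        reverse=True,
    )
    alts = [(c, s) for c, s in others if s >= cutoff]
    # Plus one more (non-swap) class below the cutoff, if any.
    below = [(c, s) for c, s in others
             if s < cutoff and not is_swap(c, best_class, swap_pairs)]
    if below:
        alts.append(below[0])
    return alts


def confusion_character(
    alts: List[Tuple[str, float]],
    best_class: str,
    swap_pairs: Set[FrozenSet[str]],
) -> Optional[str]:
    """Highest-scoring alternative that is not a swap of the winner."""
    for cls, _ in alts:  # alts is already in decreasing score order
        if not is_swap(cls, best_class, swap_pairs):
            return cls
    return None


scores = {"8": 0.91, "B": 0.84, "3": 0.70, "0": 0.40}
swaps = {frozenset({"8", "B"})}
alts = alternatives(scores, "8", 0.80, 0.10, swaps)
print(alts, confusion_character(alts, "8", swaps))
```

In this example "B" is an alternative but is skipped as the confusion character because it is declared a swap of "8".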
The status is:
- READ if the score satisfies the accept threshold, the confidence score satisfies the confidence threshold, and all internal classification validation checks pass.
- CONFUSED if the score satisfies the accept threshold but either the confidence score does not satisfy the confidence threshold or an internal classification validation check does not pass.
- FAILED if the score does not satisfy the accept threshold.
When the result is CONFUSED, the confusionExplanation member of the classifier result indicates whether this status was caused by a confidence score below the confidence threshold or by a failed classification validation check.
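The status rules and the accompanying explanation can be summarized in a small Python function. This is a sketch. It assumes that "satisfies" means "greater than or equal to". The explanation strings are hypothetical placeholders, not the actual values of confusionExplanation. A failed validation check is tested before the confidence because validation failure already forces the confidence to 0.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    READ = "READ"
    CONFUSED = "CONFUSED"
    FAILED = "FAILED"


def classify_status(
    score: float,
    confidence: float,
    validation_passed: bool,
    accept_threshold: float,
    confidence_threshold: float,
) -> Tuple[Status, Optional[str]]:
    """Return (status, explanation); explanation is set only for CONFUSED."""
    if score < accept_threshold:
        return Status.FAILED, None
    if not validation_passed:
        return Status.CONFUSED, "validation check failed"  # placeholder text
    if confidence < confidence_threshold:
        return Status.CONFUSED, "confidence below threshold"  # placeholder text
    return Status.READ, None


print(classify_status(0.91, 0.07, True, 0.80, 0.10))
```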