If you’ve made it past the title of this post, congratulations. While copying over the title from my source material, I had to check to make sure it was correct a couple of times.
Essentially, a CAPTCHA is a challenge-response test that lets a computer system know that a human is at the other end of the connection. You can find an example of a CAPTCHA at the official website. You will usually see a CAPTCHA when you sign up for an account on a website: you fill in your information, then type two distorted words into a box. The point of the whole system is to stop automated programs and bots from filling out these web forms and registering accounts. Although my sources report that computer vision is catching up to human visual recognition, CAPTCHA is still a very effective tool.
Researchers at Carnegie Mellon University realized they could put the highly effective CAPTCHA system to productive use. When old reading materials like newspapers, historical documents and other valuable texts are digitized, optical character recognition (OCR) programs cannot read every scanned word successfully. The researchers took words scanned from these old printed materials and incorporated them into a new CAPTCHA system called reCAPTCHA.
The CAPTCHA system handles over 100 million CAPTCHAs every day, and reCAPTCHA is in use on over 40,000 websites. Here's a rundown of how the system decides when a reCAPTCHA word is ready to be submitted for final approval:
Each OCR program's reading of a word counts as 0.5 votes, and each human's interpretation counts as a full vote. Once a given reading reaches 2.5 votes, the word is considered solved. Words that human judges consistently identify the same way are recycled as control words.
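The voting rule above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not reCAPTCHA's actual code: the names `UnknownWord` and `submit` are invented, and it assumes (as in published descriptions of reCAPTCHA) that each challenge pairs a known control word with an unknown word, and that a user's answer to the unknown word only counts as a vote if they typed the control word correctly. Lowercasing and trimming answers before counting is also an assumption.

```python
from collections import defaultdict

# Vote weights from the description above: an OCR program's reading counts
# half a vote, a human's reading counts a full vote.
OCR_WEIGHT = 0.5
HUMAN_WEIGHT = 1.0
ACCEPT_THRESHOLD = 2.5  # a reading is accepted once it reaches 2.5 votes


def normalize(text):
    # Assumption: answers are compared case-insensitively, ignoring outer spaces.
    return text.strip().lower()


class UnknownWord:
    """Hypothetical tally of weighted votes for one scanned word."""

    def __init__(self, ocr_readings):
        self.votes = defaultdict(float)
        for reading in ocr_readings:
            self.votes[normalize(reading)] += OCR_WEIGHT

    def add_human_answer(self, answer):
        self.votes[normalize(answer)] += HUMAN_WEIGHT

    def accepted_reading(self):
        """Return the top reading if it has reached the threshold, else None."""
        best = max(self.votes, key=self.votes.get, default=None)
        if best is not None and self.votes[best] >= ACCEPT_THRESHOLD:
            return best
        return None


def submit(control_answer, control_truth, unknown_answer, word):
    """Hypothetical handler: pass/fail on the control word; only a passing
    user's answer is recorded as a vote on the unknown word."""
    passed = normalize(control_answer) == normalize(control_truth)
    if passed:
        word.add_human_answer(unknown_answer)
    return passed


# Usage: two OCR programs disagree, so each reading starts at 0.5 votes.
word = UnknownWord(["morning", "moming"])
submit("overlooks", "overlooks", "morning", word)  # "morning" now has 1.5 votes
submit("overlook", "overlooks", "moming", word)    # control word wrong: not counted
submit("overlooks", "overlooks", "morning", word)  # "morning" now has 2.5 votes
print(word.accepted_reading())  # morning
```

Giving OCR readings half weight means software alone can never settle a word; at least two humans have to agree with an OCR reading (or three humans with each other) before it crosses the 2.5 threshold.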
The researchers tested the system on a random sample of 250 New York Times articles from different eras, in which the identity of every word had been confirmed by two independent transcription experts. On their own, the OCR programs each managed about 84 percent accuracy, but when their results were combined with reCAPTCHA, overall accuracy rose to 99.1 percent. That is comparable to professional transcription services, which have two independent experts produce copies that a third party then checks. Most of the few remaining errors occurred when the OCR software missed word breaks.
The authors of the study report that the system has some limitations. Shorter words are less likely to be recognized correctly. Additionally, in countries where English is a second language and keyboards are designed for other languages, both accuracy and ease of use go down.
A very positive feature of reCAPTCHA is that everyday users feel good about helping a good cause. In my opinion, if I have to type a distorted word and spend a few brain cycles, the words might as well help preserve historical texts.