Monday, 8 September 2008
Harnessing the power of internet users to digitize books
Luis von Ahn, the Carnegie Mellon University (CMU) professor who co-developed CAPTCHA (the authentication method that asks you to type the distorted letters you see), has gone on to develop reCAPTCHA, a form of CAPTCHA that also helps digitize books, according to a CMU press release. In reCAPTCHA, the words displayed to users come directly from old books that are being digitized. They are words that optical character recognition (OCR) software could not identify, and they are sent to people across the Web to be read and typed in. Von Ahn said reCAPTCHA is being used to digitize books for the Internet Archive and newspapers for The New York Times. Digitization allows older works to be indexed, searched, reformatted and stored in the same way as today's online texts.