From http://www.stoa.org/?p=535
Coyle, Karen. “Mass Digitization of Books.” The Journal of Academic Librarianship 32(6) (November 2006): 641–645.
Excerpt:
Scoping the Mass Digitization Project
There are two assumptions that are often made about mass digitization. The first is that you can digitize everything, and the second is that you can save money by not digitizing the same item more than once. For the first assumption, libraries will find that some items are either too fragile to be put through the mass digitization process, or are too far from the norm to be suitable to that process. Some books will be too large or too small; others will have odd-sized plates or folded maps that will need special handling. So digitizing an entire library will require some mass digitization and some special digitization projects.
The other part of the “digitize everything” goal is the desire to create at least one digital copy of every book available in any library. Google and the OCA are beginning this process by focusing on some large libraries in the Western world with impressively broad collections. How much of the world's literature will be digitized in this way? A statistical study of the five original Google Book Search collections [27] shows that at the end of this project Google will have digitized 33 percent of the items in OCLC's WorldCat. The most important revelation from this study is that 40 percent of the items in WorldCat are held uniquely by only one institution. The long tail of the Google Book Search project will require involving many hundreds or thousands of libraries if they really intend to create an index to all of the books on library shelves today.
The second assumption is that time and money will be saved by keeping a registry of digitized books so that the work is not duplicated by other libraries. [28, 29] In the arena of mass digitization, this assumption is being challenged by some with the argument that it may be more economical to scan a full shelf of books than to determine if a true duplicate exists elsewhere. This is in part because of the difficulty of defining “same” in a world with many similar but not identical editions. It is also because the mass digitization process may not produce true duplicates due to the error rate of OCR programs, and because of differences in decisions made at the time of scanning.
Conclusion
Although a significant number of large research libraries are engaging in mass digitization projects, other than the Google Book Search, which is available today, we have little idea how the digitized books will be used. There are many questions that need to be answered, such as: who does this digitized library serve? How does it serve users? How will the system respond when there are ten million books in a database and a user enters the query “civil war”? (Note that Google has not yet determined how it will create an ordering principle for books.) Will some users read these books online in spite of the relative inconvenience of their formats and the computer screen's technology? Will it be possible to use the digitized pages to produce something more e-book-like?
Google has clearly stated that their book project is solely aimed at providing a searchable index to the books on library shelves. They are quite careful not to promise an online reading experience, which would increase the quality control effort of their project and possibly make rapid digitization of the libraries impossible. Library leaders are enticed by the speed of mass digitization, but seem unable to give up their desire to provide online access to the content of the books themselves. If mass digitization is the best way to bring all of the world's knowledge together in a single format, we are going to have to make some reconciliation between the economy of “mass” and the satisfaction of the needs of library users.
KlausGraf - on Thursday, 21 December 2006, 03:21 - Category: English Corner