Although the new project will not bring direct revenue, Yahoo is leading the alliance because it could give its search feature more visibility.
An unusual alliance of corporations, non-profit groups and universities plans to announce today an ambitious effort to digitise hundreds of thousands of books over the next several years and put them on the Internet, with the full text accessible to anyone.
The effort is being led by Yahoo, which appears to be taking direct aim at a similar project announced by its arch rival, Google, whose own programme to create searchable digital copies of entire collections at leading research libraries has run into a series of challenges since it was announced nine months ago. The new project, called the Open Content Alliance, has the wide-ranging goal of digitising historical works of fiction along with specialised technical papers.
In addition to Yahoo, its members include the Internet Archive, the University of California, the University of Toronto and the National Archives in England, among others. The digitisation of print material has been a continual effort on the part of various research libraries for the last several years. But the potential power of the new collaboration lies in the collective ability of many institutions to compare and cross-reference materials, said Daniel Greenstein, librarian for the California Digital Library at the University of California. “This is the kind of platform we’ve been looking for for a long time,” said Dr Greenstein. “Libraries digitise their stuff and put it up, but none of the libraries have comprehensive collections of everything.”
The Library of Congress, for instance, has one of the largest library collections in the world, but even that collection is incomplete. “It’s all about gap-filling and collection development,” said Dr Greenstein. Although the new project will not bring direct revenue to Yahoo, it could give the company’s search feature more visibility. The announcement also opens a new round in the battle between Yahoo and Google over index size, the number of documents that can be found in a search engine’s database.
Yet the new project’s approach differs from Google’s in several ways. Once a book has been digitised, Yahoo will integrate the content into its index and provide a search engine for the group’s website (opencontentalliance.org). “As soon as it’s made available on the OCA website, we’ll get a feed letting us know, so it can be indexed by us immediately,” said David Mandelbrot, a vice-president at Yahoo.
In a departure from Google’s approach, the Open Content Alliance will also make the books accessible to any search engine, including Google’s. (Under Google’s programme, a digitised book would show up only through a Google search.) And by focusing at first on works that are in the public domain, such as thousands of volumes of early US fiction, the group is sidestepping the tricky question of copyright violation, the Indian Express reported.