links for 2008-11-03
-
"[…]Last week, 22% of visits from Google Book Search went to an Education website, with Worldcat (a website that allows users to search library catalogs) the #3 downstream website overall and the #1 Education website.
Google Book Search receives most of its traffic from Google (82%) and last week was the #19 website visited after Google, accounting for 0.03% of all US Internet visits."
-
"[…]“As we understand it, the settlement contains too many potential limitations on access to and use of the books by members of the higher education community and by patrons of public libraries,” Darnton wrote.
“The settlement provides no assurance that the prices charged for access will be reasonable,” Darnton added, “especially since the subscription services will have no real competitors [and] the scope of access to the digitized books is in various ways both limited and uncertain.”
He also said that the quality of the books may be a cause for concern, as “in many cases will be missing photographs, illustrations and other pictorial works, which will reduce their utility for research and education.”
Harvard was not named in the lawsuit by the publishers because it has only allowed Google to digitize its uncopyrighted works[…]"
-
"[…]Under the agreement, 20% of any work not opting out will be available freely; full access can be purchased for a fee. That secures more access for this class of out-of-print but presumptively-under-copyright works than Google was initially proposing. And as this constitutes up to 75% of the books in the libraries to be scanned, that is hugely important and good. That's good news for Google, and the AAP/Authors Guild, and the public. (My favorable views about the AAP at least are not, of course, reciprocated.)
It is also good news that the settlement does not presume to answer the question about what "fair use" would have allowed. The AAP/AG are clear that they still don't agree with Google's views about "fair use." But this agreement gives the public (and authors) more than what "fair use" would have permitted. That leaves "fair use" as it is, and gives the spread of knowledge more than it would have had.[…]"
-
"[…]In the past, scanned documents were rarely included in search results as we couldn't be sure of their content. We had occasional clues from references to the document– so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe's PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words — words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world's information accessible and useful.[…]"
-
"[…]Using optical character recognition (OCR) technology, Google's search engine now can convert scanned PDF documents into text that can be searched and indexed, the company said. Thus, government reports, academic papers and other scanned documents can now show up in search results. Search engines generally interpret PDF documents as images of text rather than text.[…]"
-
"[…] This experiment is part of Google's broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience."
-
"Elsevier Labs is inviting creative individuals who have wanted the opportunity to view and work with journal article content on the web to enter the Elsevier Article 2.0 Contest. Each contestant will be provided online access to approximately 7,500 full-text XML articles from Elsevier journals, including the associated images, and the Elsevier Article 2.0 API to develop a unique yet useful web-based journal article rendering application. What if you were the publisher? Show us your preference![…]"
-
"[…]Institutional subscriptions to millions of additional books: Imagine never having to ask a patron to wait until a book is returned or arrives through inter-library loan. Beyond the free license, libraries will also be able to purchase an institutional subscription to millions of books covered by the settlement agreement. Once purchased, this subscription will allow a library to offer its patrons access to the incredible collections of Google’s library partners from any computer authorized by the library. In addition, for our Library Partners that contributed books to the project, Google will either pay for a discount to the subscription based upon the number of books they contribute, or provide a free subscription for their institution that contains the books scanned from their library that are included in the full institutional subscription offering.[…]"
-
""This historic settlement is a win for every one of us. From our point of view, the agreement creates an innovative framework for the use of copyrighted works in a rapidly expanding digital world. It also gives readers broader access to a wealth of rare and hard-to-find books and offers publishers an attractive business model, guaranteeing copyright holders unprecedented control and choice." -Richard Sarnoff, Chairman of the Association of American Publishers[…]"