The Library of Congress continues to archive all tweets from 2006-2012

Two years ago, the Library of Congress's director of communications announced a plan to archive all of Twitter going back to March 2006. Even at that time (March 2010) this was a very large amount of data: 55 million messages were being published on Twitter per day, and the total size of the database since the site's founding was measured in terabytes.

But that was only the beginning. By the summer of 2012, traffic on Twitter had grown to 400 million messages a day, and the Library of Congress had still not delivered the promised archive with full-text search. As a result, some people began to doubt that the task was within the librarians' power. Last week there were rumors that they had quietly abandoned the ambitious project. In fact, they have not.

Journalists from Nieman Journalism Lab interviewed Jennifer Gavin, who heads the Twitter archiving project at the Library of Congress. She says that the plans are still in effect; it is simply that "a good librarian is never in a hurry". In other words, the library does not intend to deliver its services at the same pace at which Twitter operates.

Of course, the task turned out to be much harder technically than it first seemed. "The process of developing technical specifications is still ongoing, but we are already much closer to its end," said Gavin. "I can't give a specific date when we'll be ready to announce it officially." The criteria for sorting the source data by keyword, time, and so on are now being defined. The developers still have not decided what the system's user interface should look like.

"Last year we started to receive part of the material from Twitter. Now we get it almost daily. These are very large volumes of data," said Gavin. For now, there is a six-month embargo on archiving fresh tweets. Under the terms of the agreement with the company, the database is to be available only for non-commercial internal library use and for preservation. The system will be accessible only to registered library patrons holding library cards.
Article based on information from habrahabr.ru
