Posts

Posts for November 2017

What part of the web is archived

The Internet Archive's Wayback Machine is the largest and best-known archive, preserving web pages since 1995. Besides it, there are a dozen other services that also archive the web: search engine indexes and specialized archives such as Archive-It, UK Web Archive, WebCite, ArchiefWeb, and Diigo. How many web pages end up in these archives, relative to the total number of documents on the Internet? It is known that as of 2011 the Internet Archive's database contained more than 2.7 billion URIs, many of them in multiple copies taken at different points in time. For example, the Habr home page has been "photographed" 518 times since July 3, 2006. It is also known that five years ago Google's link database passed the mark of one trillion unique URLs, although many of the documents were duplicates. Google cannot process all of these URLs, so the company decided to count the number of documents on the Internet a

Atlassian Summit '12 in San Francisco

Since 2009, Atlassian, the maker of the well-known JIRA issue tracker and the Confluence enterprise wiki, has held an annual event called Atlassian Summit. The event aims to bring users and partners the latest news about the company and its products, to offer useful talks, and, of course, to let people talk to one another. Traditionally, the Summit is held in San Francisco, where one of the company's offices is located, and this year I attended it for the first time. I want to share news from the world of Atlassian, along with my impressions. I got to the Summit, you could say, "for free". Last year, Atlassian held the Bamboo Task Master contest, in which participants had to write extensions for Bamboo, the continuous integration system. With invaluable help from colleagues, I won this contest with the Bamboo VMWare Plugin, which lets you start and stop virtual machines with test beds before and after As

Migrating from MySQL to PostgreSQL

Hello, dear community! At some point I needed to move the database of a Django app from MySQL to PostgreSQL. The first two attempts failed, but they helped me sort out data integrity issues and fix problems with manage.py syncdb and manage.py migrate. The first time, we tried to move the database by converting the SQL dumps into the PostgreSQL dialect. On the second attempt, we tried to migrate using ./manage.py dumpdata, but kept getting errors about keys and invalid data (our database had a lot of manual changes). A long time passed between the second and third attempts, and my latest round of googling led me to this article. Mentally, I was already prepared to parse, line by line, huge SQL/YAML dumps weighing nearly a gigabyte, and had even drafted tooling for it... but I decided to try the simple route. And off we went (everything was done in a virtualenv; in PostgreSQL, an empty databas
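Because the author's exact steps are cut off above, here is only a minimal sketch of one commonly used dumpdata/loaddata route, not necessarily the one the article describes. It assumes Django 1.8 or later (where `dumpdata --natural-foreign` and `--output` exist) and two database aliases in settings.py: `default` for the old MySQL database and a hypothetical `pg` alias for the new, empty PostgreSQL database. Skipping `contenttypes` and `auth.permission` and using natural foreign keys is a common way to avoid the key conflicts mentioned above, since `migrate` recreates those rows on the new database with different ids.

```python
import subprocess
import sys

# Hypothetical alias names: "default" is assumed to point at MySQL,
# "pg" at the new, empty PostgreSQL database (both defined in settings.py).
SOURCE_DB = "default"
TARGET_DB = "pg"
DUMP_FILE = "dump.json"


def migration_commands(source=SOURCE_DB, target=TARGET_DB, dump_file=DUMP_FILE):
    """Return the manage.py invocations, in order, as argv lists."""
    manage = [sys.executable, "manage.py"]
    return [
        # 1. Create the schema (and contenttypes/permissions) in PostgreSQL.
        manage + ["migrate", "--database", target],
        # 2. Dump data from MySQL. contenttypes and permissions are skipped
        #    because step 1 already created them; natural foreign keys make
        #    references to them independent of their numeric ids.
        manage + ["dumpdata", "--database", source,
                  "--natural-foreign",
                  "--exclude", "contenttypes",
                  "--exclude", "auth.permission",
                  "--output", dump_file],
        # 3. Load the dump into PostgreSQL.
        manage + ["loaddata", dump_file, "--database", target],
    ]


def run(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in migration_commands():
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default; pass --apply to actually execute the commands.
    run(dry_run="--apply" not in sys.argv)
```

Saved next to manage.py, the script only prints the three commands unless it is started with `--apply`, which makes it easy to review the plan before touching either database.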

Studying search query statistics makes it possible to detect previously unknown drug side effects

Using 2010 data from the Google, Bing, and Yahoo search engines, a group of researchers from Microsoft Research, Stanford, and Columbia University confirmed that drug side effects can be detected by analyzing the logs of general-purpose search engines. As a test case, they checked the finding that the combined use of two drugs, paroxetine and pravastatin, may lead to hyperglycemia. This became known only in 2011, so in 2010 there could not yet have been any information about it online. The researchers analyzed the frequency of search terms associated with symptoms of hyperglycemia among users who had previously searched online for these two drugs. It turned out that queries about hyperglycemia symptoms were much more frequent among those who had searched for both drugs than among those who had searched for only one of them. The diagrams show that the difference was noticeable throughout the period and is not some tempor
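The comparison described above comes down to a rate ratio between two cohorts of users. Below is a minimal Python sketch of that idea, assuming each user's history is available as a chronologically ordered list of queries. The symptom terms and the helper names are hypothetical illustrations, not the researchers' vocabulary or method; a symptom query is counted only if it comes after the user's first search for one of the drugs.

```python
from dataclasses import dataclass

# Hypothetical query terms, chosen for illustration only.
DRUGS = ("paroxetine", "pravastatin")
SYMPTOM_TERMS = {"thirst", "frequent urination", "blurred vision", "fatigue"}


@dataclass
class Cohort:
    users: int = 0
    with_symptoms: int = 0

    @property
    def rate(self) -> float:
        # Share of users in the cohort who searched for a symptom.
        return self.with_symptoms / self.users if self.users else 0.0


def user_profile(history):
    """Return (drugs searched, whether a symptom was searched after a drug)."""
    drugs_seen = set()
    symptom_after_drug = False
    for query in history:  # history is assumed to be in chronological order
        if query in DRUGS:
            drugs_seen.add(query)
        elif query in SYMPTOM_TERMS and drugs_seen:
            symptom_after_drug = True
    return drugs_seen, symptom_after_drug


def compare_cohorts(histories):
    """Split users into 'searched both drugs' and 'searched exactly one'."""
    both, one = Cohort(), Cohort()
    for history in histories:
        drugs_seen, symptom = user_profile(history)
        if len(drugs_seen) == 2:
            cohort = both
        elif len(drugs_seen) == 1:
            cohort = one
        else:
            continue  # searched for neither drug: not part of the comparison
        cohort.users += 1
        cohort.with_symptoms += int(symptom)
    return both, one


def rate_ratio(both: Cohort, one: Cohort) -> float:
    # How many times more often the 'both drugs' cohort searched for symptoms.
    return both.rate / one.rate if one.rate else float("inf")
```

A rate ratio well above 1 is only a signal: a careful analysis would also compare against background symptom-search rates and test whether the difference is statistically significant.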

What questions can be answered by analyzing 1,500,000 unique case histories?

Is there a link between asthma and schizophrenia? Can diabetes and bipolar disorder have anything in common? Can such non-trivial connections be identified by analyzing a database of 1,500,000 patients in the United States? Warning: lots of text under the cut. The article is based on the talk "Autism and Mendelian disease" by Andrey Rzhetsky at the First International Conference "Autism. Challenges and Solutions". More about him and the data analyzed: Andrey Rzhetsky is a professor of medicine and human genetics at the Institute for Genomics and Systems Biology, University of Chicago. He is also director of the Conte Center for genome bioinformatics of neuropsychiatric diseases. A. Rzhetsky graduated from Novosibirsk State University and defended his candidate dissertation at the Institute of Cytology and Genetics in Novosibirsk. In 1991, on a postdoctoral fellowship, he went to the United

Google: alternatives to the search giant

Google is constantly working on innovations and improvements to its services. In the last few weeks alone, the company has added automatic image enhancement for uploads to Google+, launched a network of balloons to provide Internet access in areas with poor infrastructure, opened a new music subscription service, and updated its Maps service. But while the company reports on its recent revenue and achievements, investors care about only one thing: the company's revenue from contextual advertising. Among all of Google's innovations, search remains one of the company's most reliable sources of income. In June, Google handled more than 90% of desktop searches and over 92% of mobile searches in the UK. Any company would envy these numbers. In the U.S. market, its share is lower, at about 78%. Google also faces problems in the markets of China, Russia, and South Korea.

Google refused to remove The Pirate Bay's home page

The Pirate Bay is one of the few torrent trackers whose main page contains no links to pirated material. However, this does not stop copyright holders from trying to remove it from the search index. To Google's credit, it strictly follows the letter of the DMCA and refused to comply. For example, BPI (British Recorded Music Industry) is one of the most active "partners" in removing pirated content from search results. To date it has sent 229,705 such requests, listing a total of more than 32 million pages for removal from the index. Last week, BPI sent another request with 2,000 URLs, which included the home page of The Pirate Bay. The page on the removal list was the home page of thepiratebay.sx, one of the many mirrors of the popular torrent tracker. According to Google's report, this page was the only one on which no action was taken (No actions taken), that is, the company refused to comply with BPI and requ

Science writer Clive Thompson believes that technology does not make us dumber

I liked this article primarily because it presents an alternative view. I don't know who is right on this matter, but it was interesting to see his point of view. (Translator's note.) Clive Thompson thinks that using, say, Google is unlikely to harm our memory as much as is commonly assumed today. Mr. Thompson is a science and technology writer and the author of "Smarter Than You Think: How Technology Is Changing Our Minds for the Better". He also writes occasionally for The New York Times. Drawing on scientific research, Thompson argues in the book that society's current digital transformation is making us smarter, not the other way around. Under the cut is an edited version of an interview with Clive. Q. Do you believe that technology makes us smarter? A. Yes. I think we need to think more socially. I'm talking about the ability to take our thoughts out of our heads and compare them with the thoughts of other people, and to do it publi

NewsModxBox: a ready-made build for a news portal

Two years ago, I wrote about a ready-made online store build on MODX Revolution. Since then, quite a few online stores have been built on ShopModxBox (most of them by third-party developers), and today the engine is installed 200-300 times per month. The project continues to evolve, with new useful functionality being added to the core. The main qualities we note in ShopModxBox are high performance, flexibility, and minimal code (ShopModxBox is a solution based on the MODX Revolution framework, and its own code is literally 3-5 thousand lines of PHP plus Smarty templates). So the other day we released a new build, NewsModxBox. It has the same foundation as ShopModxBox, but the logic is tuned for news portals and media. The build was made for a real, and far from small, news portal and largely reflects the actual business logic of online and mixed media. So what, for example, is in NewsModxBox? A custom editor for artic