What questions can be answered by analyzing 1,500,000 unique medical histories?

Is there a link between asthma and schizophrenia?
Diabetes and bipolar disorder: can they have anything in common?
Can such a non-trivial connection be identified by analyzing a database of 1,500,000 patients in the United States?

Warning: a lot of text below the cut.

This article is based on the talk "Autism and Mendelian diseases" given by Andrey Yuryevich Rzhetsky at the First International Conference "Autism. Challenges and Solutions". Below is more about him and about the data analysis.
Andrey Rzhetsky
Andrey Rzhetsky is a Professor of Medicine and Human Genetics at the Institute for Genomics and Systems Biology, University of Chicago. He is also Director of the Conte Center for genome bioinformatics of neuropsychiatric disorders. A. Rzhetsky graduated from Novosibirsk State University and defended his candidate dissertation at the Institute of Cytology and Genetics in Novosibirsk. In 1991 he moved to the United States as a postdoctoral research fellow.
Research interests:
1) bioinformatics and phylogenetics applied to the analysis of genes, proteins, and molecular pathways;
2) application of statistics to sequence analysis and to the analysis of molecular networks;
3) development of algorithms and software for analyzing and comparing metabolic pathways and sequences, and for phylogenetic reconstruction.
As a mathematician, biologist, and theorist, Andrey Rzhetsky is a leading expert in developing new bioinformatic approaches to the analysis of biological systems and diseases. He pioneered strategies for the bioinformatic mapping of diseases through integrated analysis of genetic data.
Andrey is so well known in the US that Google even suggests several search queries containing his name.


Autism
Autism is a developmental disorder of the nervous system characterized by difficulties with social interaction and communication and by restricted, repetitive behavior. Under the diagnostic criteria, symptoms must become apparent before the age of three. Autism affects information processing in the brain by altering how nerve cells and their synapses connect and organize. How this happens is not yet entirely clear.
(Rough translation from the English Wikipedia.)

Mendelian diseases
Mendelian diseases (traits): diseases or traits resulting from the expression of a single gene that has a large effect on the phenotype. They are inherited according to Mendel's laws. Examples of Mendelian diseases: cystic fibrosis, sickle cell disease, Huntington's disease, and hemophilia.
(From the Internet.)


Abstract


Biology has accumulated huge amounts of data that can be processed by computer. Andrey Rzhetsky's group set out to process data on mental health disorders. Rather than analyzing a single data set, whether genetic data, environmental factors, or clinical outcomes, they analyze all of the data together, which gives a more complete picture of the causes of disorders.
In 2004, A. Rzhetsky's group received a grant from Autism Speaks for a two-sided analysis of autism (as a biological process and as a developmental disorder), using the rich information accumulated in several related fields. The group collected information about molecular interactions in human neurons and, with the help of its own GeneWays system, examined a wide range of disorders that show a non-random association with autism (neurological, autoimmune, metabolic, and many other groups of disorders with a strong hereditary component).

The chart below shows correlations among some common diseases. Red lines indicate positive correlation, blue lines negative correlation. Line thickness reflects the strength of the correlation, and circle size corresponds to the number of patients (from 20 to 136 thousand).
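The talk does not say which correlation statistic the chart uses. As an assumption, here is a minimal sketch of one common choice for yes/no diagnoses, the phi coefficient, computed from the 2x2 table of patients who have both diseases, only one, or neither. The function name `phi_coefficient` and the toy data are hypothetical.

```python
from math import sqrt

def phi_coefficient(has_a, has_b):
    """Phi (Pearson) correlation between two binary disease indicators."""
    pairs = list(zip(has_a, has_b))
    n11 = sum(1 for a, b in pairs if a and b)          # both diseases
    n10 = sum(1 for a, b in pairs if a and not b)      # only disease A
    n01 = sum(1 for a, b in pairs if not a and b)      # only disease B
    n00 = sum(1 for a, b in pairs if not a and not b)  # neither
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

# Toy records for four hypothetical patients (1 = diagnosed, 0 = not).
print(phi_coefficient([1, 1, 0, 0], [1, 1, 0, 0]))  # always together
print(phi_coefficient([1, 1, 0, 0], [0, 0, 1, 1]))  # never together
print(phi_coefficient([1, 0, 1, 0], [1, 1, 0, 0]))  # unrelated
```

A positive value would correspond to a red line on such a chart and a negative one to a blue line.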


Autism and Mendelian diseases


Dr. Rzhetsky opened his presentation with a slide showing a familiar scene from the Russian film about Sherlock Holmes. This is no accident: Holmes succeeded as a detective by paying attention to details that most ordinary observers consider insignificant. This inspires Rzhetsky, who is likewise convinced that small details can decide a biological puzzle and help find the keys to it.
He uses a metaphor: the illness is the crime, the data are the evidence.
The purpose of the study is to build a model that yields a result: finding the "criminal", that is, the cause of the disease.

There are two symbolic figures: the Hedgehog and the Fox. The Fox knows many little tricks; the Hedgehog knows one reliable trick.
In the book “The Signal and the Noise”, Nate Silver analyzes many scientific predictions. If you look at which predictions work and which fail, the “foxes” predict better than the “hedgehogs”.

A difficulty in working with statistical data is that there are two approaches, whose camps are as divided as religions.
The Bayesian approach lets us express how much we can believe the results and state our assumptions in quantitative terms.
The difficulty in building a reliable model is that phenotype, genome, and environment data must be combined to get a model with useful predictions: for example, one that can estimate a child's predisposition to a specific disease.
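To make the Bayesian idea concrete, here is a minimal sketch (not the group's model) of updating a belief about a disease prevalence: a Beta prior combined with counts of cases gives a Beta posterior whose mean and spread say how much we can believe the estimate. The prior and the numbers (a flat Beta(1, 1) prior, 12 cases among 1,000 children) are purely illustrative assumptions.

```python
from math import sqrt

def beta_posterior(alpha, beta, cases, total):
    """Conjugate update: Beta(alpha, beta) prior + binomial counts -> Beta posterior."""
    return alpha + cases, beta + (total - cases)

def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    mean = alpha / s
    sd = sqrt(alpha * beta / (s * s * (s + 1)))
    return mean, sd

# Hypothetical data: flat prior, 12 cases observed among 1,000 children.
a, b = beta_posterior(1, 1, cases=12, total=1000)
mean, sd = beta_mean_sd(a, b)
print(f"posterior Beta({a}, {b}): mean {mean:.4f}, sd {sd:.4f}")
```

Collecting more data narrows the posterior: its standard deviation shrinks roughly in proportion to 1/sqrt(n).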

So we decided to analyze many diseases at once. Why? Because the classification of diseases is largely artificial. In fact, autism is certainly a “container of diseases” with different genetic causes.

A small digression: Churchill, Martin Luther King, General Sherman, Roosevelt, Kennedy, Gandhi.
What do they have in common (apart from being famous and dead)?
What they have in common is that they had bipolar disorder (manic depression). Churchill spoke of his states of apathy as “the black dog of depression”.
Affective disorders are common among successful politicians.

What is the phenotype of autism? Interestingly, Hans Asperger had already singled out the following groups of traits: “inability to develop social skills” and “absorption in small details”; he also drew attention to “awkward movements”. He called autistic children “little professors”. We still use these as criteria of autism today.
A little bit of autism is essential for success in science. We don't know exactly which scientists of the past had autism (I suspect Newton and Tesla), but many scientists had schizophrenia or bipolar disorder.

The book “The Invisible Plague” argues that over the past 260 years the incidence of neurological and mental illness has increased (drawing on a large body of direct and indirect data).
Whether the number of autism cases is really growing is a heated question: some believe it is, others say it is not.
The Centers for Disease Control provide the following autism statistics: 1 in 80 boys and 1 in 240 girls.
A Korean study attempted to cover a whole population: it screened almost all children in South Korea and found many more cases of autism, i.e. a higher prevalence. According to that study, autism affects 4% of boys and 1.5% of girls.
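The two sets of figures are easier to compare on one scale. A small sketch converting “1 in N” ratios to percentages and back, using only the numbers quoted above (the helper names are hypothetical):

```python
def one_in_n_to_percent(n):
    """'1 in n' -> percent of the population."""
    return 100.0 / n

def percent_to_one_in_n(percent):
    """Percent of the population -> n in '1 in n'."""
    return 100.0 / percent

cdc_boys, cdc_girls = one_in_n_to_percent(80), one_in_n_to_percent(240)
korea_boys, korea_girls = 4.0, 1.5
print(f"CDC:   boys {cdc_boys:.2f}%, girls {cdc_girls:.2f}%")
print(f"Korea: boys 1 in {percent_to_one_in_n(korea_boys):.0f}, "
      f"girls 1 in {percent_to_one_in_n(korea_girls):.0f}")
print(f"Korea / CDC, boys: {korea_boys / cdc_boys:.1f}x")
```

So the Korean figure for boys (1 in 25) is about 3.2 times the CDC figure (1 in 80), which shows how far apart the two estimates are.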

Why can there be such different points of view when it comes to statistics and analysis?
Reasons:
  1. changing diagnostic criteria;
  2. different doctors may diagnose differently.

However, according to Andrey Rzhetsky, diseases such as autism are still increasing in frequency.

What do we need to build a plausible model of autism? We model the environment and the genome as random variables. For example, an infection is an environmental random variable, and changes in the genome are genetic random variables. Take P1 and P2 as two phenotypes (e.g., autism and diabetes, or autism and schizophrenia) that are linked through “common factors”. We can then build many models in which P1 overlaps with or suppresses P2 through factors in the environment, in the genome, or in the phenotype.
The problem is that all existing genotype-phenotype models are currently very simple and unsuitable for describing such complex disorders as autism. And models that also include the environment essentially do not exist.
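A minimal simulation of this idea, under loudly hypothetical assumptions: a binary genetic variant G and a binary environmental exposure E are random variables; phenotype P1 depends on G and E, while phenotype P2 depends on G only. All probabilities are invented for illustration. Because P1 and P2 share G, they co-occur more often than independence would predict (a “lift” above 1; with these made-up rates the expected lift is about 2.2). Removing G's effect on P2 brings the lift back to about 1.

```python
import random

def simulate(n, g_effect_on_p2, seed=42):
    """Return (P1, P2) pairs for n simulated people (all rates hypothetical)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        g = rng.random() < 0.10  # genome: carries a hypothetical risk variant
        e = rng.random() < 0.20  # environment: hypothetical exposure, e.g. infection
        p1 = rng.random() < 0.02 + 0.15 * g + 0.05 * e  # P1 depends on G and E
        p2 = rng.random() < 0.03 + g_effect_on_p2 * g   # P2 depends on G only
        records.append((p1, p2))
    return records

def lift(records):
    """P(P1 and P2) / (P(P1) * P(P2)); about 1.0 when P1 and P2 are independent."""
    n = len(records)
    p1 = sum(a for a, _ in records) / n
    p2 = sum(b for _, b in records) / n
    both = sum(a and b for a, b in records) / n
    return both / (p1 * p2)

shared = lift(simulate(200_000, g_effect_on_p2=0.20))
independent = lift(simulate(200_000, g_effect_on_p2=0.0))
print(f"lift with shared genetic factor: {shared:.2f}")
print(f"lift without it:                 {independent:.2f}")
```

Real models would need many more variables, including the “unknown unknowns” discussed below.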

In addition, we don't know HOW to model, and we don't know WHAT should be included in the model.
Donald Rumsfeld (US Secretary of Defense) said: "There are things we know we know. There are things we know we do not know. But there are also things we don't know we don't know."
We likewise distinguish three types of factors: “known knowns”, well-studied factors that are always taken into account; “known unknowns”, insufficiently studied factors that are suspected of influencing the outcome; and “unknown unknowns”, factors that affect the process we study but whose existence we do not know about or even suspect.
An example of the genotype-phenotype-environment relationship:
Genotype: a recessive mutation on the X chromosome.
Phenotype: deficiency of the coagulation protein factor VIII (hemophilia A).
Environment: treatment uses blood pooled from hundreds of thousands of donors.
Result: more than 80% of hemophiliacs in the United States suffer from AIDS and hepatitis (because donors were once not tested for these diseases).

When environmental factors are obvious:
Obesity in the US: the number of overweight people has grown too quickly to be explained by the genome, because the increase happened within one or two generations.

What is the impact of the environment on autism? There is not enough data yet.
To add “known unknowns” to the model, we interviewed many parents.
These are not necessarily causes of autism, but they are factors that must be considered. For example: the mother lived at the edge of a corn field that was treated with pesticides, and that could have had an effect. Or another factor: an infectious disease with a high fever, followed by regression (the child loses speech and motor coordination). All such factors must be considered in modeling; you cannot dismiss them.
Vaccination is a battlefield in the debate over what causes autism. The hypothesis that vaccination alone causes autism was tested and rejected (although the study raises many questions). But combinations of factors, such as genome plus vaccination, remain unexplored, and such a theory could turn out to be right.
Together with James Evans (James A. Evans), we investigated which factors should be included in a genetic model of autism. We interviewed a number of scientists working on autism. We expected to find many points of agreement and some areas of disagreement, but instead found an ocean of disagreement with small islands of consensus.
Therefore, the model included all possible factors.

How is genetic analysis actually performed?
The task is simple when only one chromosome needs to be compared: it is easy to find the matching damaged segment that leads to the disease. But when there is more than one such site, spread across several chromosomes, the task becomes much more complicated. Humans have about 20,000 genes. If you simply look for changes associated with autism in every possible combination of genes, the number of combinations is:
for 3 genes: about 10^12;
for 10 genes: about 10^37, i.e. the entire population is not large enough to collect data for the analysis.
As you can see, what worked for one gene does not work for many.
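The arithmetic behind these figures is the binomial coefficient C(N, k), the number of ways to choose k genes out of N. A quick check with Python's `math.comb`, taking N = 20,000 as in the text: three genes give about 1.3 x 10^12 combinations, and ten genes give about 2.8 x 10^36, the same ballpark as the 10^37 quoted in the talk (the exact power depends on the assumed gene count).

```python
from math import comb

N_GENES = 20_000  # approximate number of human genes quoted in the talk

# Number of distinct gene sets of size k that would each need to be tested.
combinations = {k: comb(N_GENES, k) for k in (1, 2, 3, 10)}
for k, count in combinations.items():
    print(f"{k:>2} genes: {count:.2e} combinations")
```

Either way, the count exceeds the number of people on Earth by many orders of magnitude, which is why exhaustive search over combinations is hopeless.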

The way out is to build a map of functional relationships between genes and proteins. Where can such a map be obtained? Andrey Rzhetsky's laboratory analyzed tens of thousands of articles in scientific journals to extract these relationships.

Fortunately, the genes we are looking for should lie close to each other in functional space, a well-analyzed and reliable pattern. So we did not search through everything, only the regions where the correlation between genome and phenotype is greatest.
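The idea of searching only in a functional neighborhood can be sketched as a breadth-first search over a gene interaction network. This is a generic illustration, not the GeneWays pipeline: the network, the gene names GENE_A through GENE_E, and the `max_dist` cutoff are all hypothetical. Instead of testing every combination of 20,000 genes, one would test only candidates within a few interaction steps of genes already implicated in a disease.

```python
from collections import deque

def neighborhood(graph, seeds, max_dist):
    """Map each gene within max_dist interaction steps of any seed to its distance (BFS)."""
    dist = {gene: 0 for gene in seeds}
    queue = deque(seeds)
    while queue:
        gene = queue.popleft()
        if dist[gene] == max_dist:
            continue  # do not expand beyond the cutoff
        for partner in graph.get(gene, ()):
            if partner not in dist:
                dist[partner] = dist[gene] + 1
                queue.append(partner)
    return dist

# Hypothetical interaction network (undirected edges listed in both directions).
network = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A", "GENE_D"],
    "GENE_C": ["GENE_A"],
    "GENE_D": ["GENE_B", "GENE_E"],
    "GENE_E": ["GENE_D"],
}

candidates = neighborhood(network, ["GENE_A"], max_dist=2)
print(sorted(candidates.items(), key=lambda kv: kv[1]))
```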
Why were Mendelian diseases chosen for the analysis? They are well studied, and the specific locations in the genome responsible for them are known.
Color coding of Mendelian diseases in the visualizations that follow


When we carried out the analysis for several diseases, we found that the same molecular networks overlap across multiple illnesses.
An example of a hidden connection:

Jodie Foster and Ronald Reagan: what do they have in common?
John Hinckley, trying to impress Jodie Foster, attempted to assassinate Ronald Reagan.

Phenotypes can be compared to the well-known personalities, and the genotype to the hidden connections between them. If we observe a sequence of phenotypes, can we draw conclusions about genetics? Yes, through simulation it can be done.

Data:


1,500,000 unique patient records, with diseases coded according to ICD-9 over each patient's entire life. Because these data are used to determine insurance reimbursement in the United States, they are imperfect. But given their enormous volume, it would be criminal not to analyze them.
Using a threshold model to describe how genetic predispositions manifest in the phenotype, it is possible to estimate genetic relationships between complex disease phenotypes (such as autism). Red indicates a very strong connection. Prediction: autism shares genetics with a number of related diseases. The analysis shows clear, significant associations of autism with infectious diseases and with many diseases of the nervous system.
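The talk does not spell out the model's details, so here is only a generic, textbook-style sketch of a liability-threshold model under stated assumptions: each disease has a normally distributed underlying liability, a person is affected when the liability exceeds a threshold set by the disease's prevalence, and the two liabilities are correlated with coefficient rho. Simulating the joint prevalence for different rho and picking the value that best matches the observed co-occurrence gives a crude estimate of rho. The prevalences, grid, and sample size are hypothetical, and rho here is the total liability correlation, which a real analysis would still have to split into genetic and environmental parts.

```python
import random
from math import sqrt
from statistics import NormalDist

def joint_prevalence(rho, prev1, prev2, n=50_000, seed=1):
    """Monte Carlo estimate of P(both diseases) under a liability-threshold model."""
    t1 = NormalDist().inv_cdf(1 - prev1)  # liability threshold for disease 1
    t2 = NormalDist().inv_cdf(1 - prev2)  # liability threshold for disease 2
    rng = random.Random(seed)             # fixed seed: same draws for every rho
    both = 0
    for _ in range(n):
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)
        liability1 = x
        liability2 = rho * x + sqrt(1 - rho * rho) * y  # corr(liability1, liability2) = rho
        both += liability1 > t1 and liability2 > t2
    return both / n

def estimate_rho(observed_joint, prev1, prev2, grid=None):
    """Grid search for the rho whose simulated joint prevalence is closest to the observed one."""
    if grid is None:
        grid = [i / 10 for i in range(10)]  # 0.0, 0.1, ..., 0.9
    return min(grid, key=lambda r: abs(joint_prevalence(r, prev1, prev2) - observed_joint))

# Hypothetical example: two diseases, each with 5% prevalence.
p_indep = joint_prevalence(0.0, 0.05, 0.05)
p_linked = joint_prevalence(0.5, 0.05, 0.05)
print(f"P(both) if rho = 0.0: {p_indep:.4f}  (independence would give 0.0025)")
print(f"P(both) if rho = 0.5: {p_linked:.4f}")
print("estimated rho:", estimate_rho(p_linked, 0.05, 0.05))
```

The higher the liability correlation, the more often the two diagnoses appear in the same patient record; reading that relationship backwards is what lets co-occurrence counts hint at shared causes.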
Correlations between Mendelian diseases, autism, bipolar disorder, and schizophrenia


Finally, the graph below shows correlations among some of the most common diseases in the database of 1,500,000 patients, with the same conventions as the chart above: red lines for positive correlation, blue for negative, line thickness for strength, and circle size for the number of patients (from 20 to 136 thousand).


During the lecture, the professor also showed a table of correlations between complex diseases and Mendelian diseases from unpublished work that analyzes as many as 10,000,000 (yes, 10 million) unique patient records:


Conclusions


Overlapping fragments of the genome have been demonstrated for various diseases.
Every complex disease has a genetically related set of Mendelian diseases.
I hope you are not all asleep :)


Attention


If you have interesting developments in link discovery, if you compare large data sets, or if you do genetic research, Andrey Rzhetsky's laboratory is interested in broad and mutually beneficial cooperation.
Contact them! (Links are at the bottom of the post.)


Acknowledgements:
Thanks to the company "Itek", where I work, and to my managers Yuri Balytsky and Roman Kalashnikov for giving our technical support team three days "off" during the busy season.
Thanks to the professional community of practice "Preventive Medicine" for the First International Conference on Autism, where we heard A. Yu. Rzhetsky's wonderful talk.
I express my sincere and deep gratitude to the Foundation "child with future z pay", personally to Inna Sergienko and Larisa Rybchenko, and to Eugene Panichevskaya, head of the charitable foundation Association of Parents of Children with Autism. Thank you for your trust and for the opportunity to meet all of you at the 1st Moscow International Conference "Autism: Challenges and Solutions".
I express my gratitude to Evgenia Mishina, Director of the "Exit" Foundation, for invaluable material and moral help, and to you, my wonderful Svetlana Moiseeva and Ala Yanushevich, thanks to whom I did not have to spend the night at the station. And of course thanks to everyone who organized the conference and to the volunteers: Catherine Men, Jan Zolotovitskii from the Center for Autism, and everyone else.

Selected publications by A. Rzhetsky:

  • Iossifov I, Zheng T, Baron M, Gilliam TC, Rzhetsky A. (2008) Genetic-linkage mapping of complex hereditary disorders to a whole-genome molecular-interaction network. Genome Res. June 3.
  • Feldman I, Rzhetsky A, Vitkup D. (2008) Network properties of genes harboring inherited disease mutations. Proc Natl Acad Sci U S A. 105, 4323-4328.
  • Rodriguez-Esteban R, Rzhetsky A. (2008) Six senses in the literature. The bleak sensory landscape of biomedical texts. EMBO Rep. 9, 212-215.
  • Yao L, Rzhetsky A. (2008) Quantitative systems-level determinants of human genes targeted by successful drugs. Genome Res. 18, 206-213.
  • Rzhetsky A, Wajngurt D, Park N, Zheng T. (2007) Probing genetic overlap among complex human phenotypes. Proc Natl Acad Sci U S A. 104, 11694-11699.
  • Cokol M, Rodriguez-Esteban R, Rzhetsky A. (2007) A recipe for high impact. Genome Biol. 8, 406.
  • Cokol M, Iossifov I, Rodriguez-Esteban R, Rzhetsky A. (2007) How many scientific papers should be retracted? EMBO Rep. 8, 422-423.


Links:


One of the books written by A. Yu. Rzhetsky together with A. A. Zharkikh in the Soviet era, "A new approach to the reconstruction of phylogeny based on the analysis of many gene families": books.google.com.ua/books/about/%D0%9D%D0%BE%D0%B2%D1%8B%D0%B9_%D0%BF%D0%BE%D0%B4%D1%85%D0%BE%D0%B4_%D0%BA_%D1%80%D0%B5%D0%BA%D0%BE%D0%BD%D1%81.html?id=RTPGHAAACAAJ&redir_esc=y
Andrey Rzhetsky's website: www.ci.uchicago.edu/research/rzhetsky
Andrey Rzhetsky in the "Biomedexperts" directory: www.biomedexperts.com/Profile.bme/1652205/Andrey_Rzhetsky
Articles on the research results:
Network properties of genes harboring inherited disease mutations www.pnas.org/content/105/11/4323.full
Probing genetic overlap among complex human phenotypes www.pnas.org/content/104/28/11694.full
This article is based on material from habrahabr.ru.
