What is DNA?
Deoxyribonucleic acid (DNA) is a nucleic acid that carries the genetic information of all organisms and many viruses. DNA can preserve this information for a long time. It can be likened to a recipe containing the information needed to produce other components of the cell, such as proteins and RNA. A gene is a DNA fragment that contains genetic information.
One gram of DNA can theoretically store approximately 450 exabytes of data (1 exabyte = 1 billion GB). When the human DNA sequence is transferred to a computer, it becomes a huge dataset. This dataset describes the biological characteristics of a human: eye color, hair color, neck length and hereditary diseases are all included in this information.
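The storage figures above can be checked with some rough arithmetic. The sketch below is only an illustration: it assumes decimal (SI) units, a simple encoding of 2 bits per base (enough to tell four bases apart), and a human genome length of roughly 3.1 billion bases. The function names are made up for this example.

```python
# Rough storage arithmetic; every figure here is an approximation.
BYTES_PER_GB = 10**9   # decimal gigabyte
BYTES_PER_EB = 10**18  # decimal exabyte


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes using decimal (SI) units."""
    return exabytes * BYTES_PER_EB // BYTES_PER_GB


def packed_size_bytes(num_bases, bits_per_base=2):
    """Bytes needed to store num_bases at a fixed number of bits per base."""
    total_bits = num_bases * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumption: one copy of the human genome is about 3.1 billion bases long.
HUMAN_GENOME_BASES = 3_100_000_000

gb_per_gram = exabytes_to_gigabytes(450)
genome_bytes = packed_size_bytes(HUMAN_GENOME_BASES)

print(gb_per_gram, "GB in 450 exabytes")
print(genome_bytes / BYTES_PER_GB, "GB for one genome at 2 bits per base")
```

Under these assumptions a single genome fits in well under one gigabyte, so the "huge dataset" in practice comes from storing many genomes together with the raw output of the sequencing machines.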
DNA is found inside chromosomes. The complete set of chromosomes in a cell is called the genome. Most DNA is located in the cell nucleus, with a small amount in the mitochondria. The information encoded by the genome is contained in the genes, and this information is called the genotype. A gene is inherited and is defined by a DNA sequence that determines characteristics of the organism.
In most species, only a small part of the genome sequence codes for proteins. Only about 1% of the human genome codes for proteins. About 50% of human DNA is non-coding and consists of repetitive sequences. The reason for the presence of so much non-coding DNA in eukaryotic genomes, and for the differences in genome size, is not yet understood. Some non-coding DNA sequences encode RNA molecules. For this reason, DNA sequences that do not encode proteins can still have an indirect effect on gene expression.
DNA Sequence
DNA sequencing is the process of determining the order of the nucleotide bases (adenine, guanine, cytosine and thymine) in a DNA molecule.
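Once sequenced, a DNA molecule is usually handled on a computer as a string over the letters A, G, C and T. As a minimal sketch of that representation, the hypothetical helper below validates such a string and counts each base. The example sequence is invented for illustration.

```python
from collections import Counter

VALID_BASES = "ACGT"  # adenine, cytosine, guanine, thymine


def base_counts(sequence):
    """Count how often each base occurs in a DNA string."""
    seq = sequence.upper()
    invalid = set(seq) - set(VALID_BASES)
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    counts = Counter(seq)
    # Always report all four bases, even those that never occur.
    return {base: counts[base] for base in VALID_BASES}


example = "ATGCGATACG"  # made-up sequence
print(base_counts(example))
```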
Analyzing DNA sequences has become very popular in many fields, such as basic biology, biotechnology, forensic science and medical diagnosis. DNA sequencing has accelerated biological research and discovery. Thanks to the rapid sequencing made possible by modern DNA sequencing technologies, the human genome was sequenced in the Human Genome Project. Similar projects have produced complete sequences for a large number of animal, plant and microbial genomes.
The Human Genome Project sequenced human DNA and established a common reference sequence. Comparing a test person's DNA with this reference shows how much that person differs from it. These differences are examined together with the family history of illness to see whether there is any inherited disease.
Disease-causing genes have been found by comparing the DNA of many subjects with the disease histories of those subjects. Such a gene may have been altered by a new mutation, or the altered form may have been inherited from the family.
The genes in a DNA sequence are like a recipe. If a gene in the sequence differs from the recipe, the difference is regarded as a mutation. However, it should not be forgotten that not every difference is a cause of illness.
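The comparison against a reference can be sketched in a few lines. This is a deliberately simplified illustration: it assumes the two sequences are already aligned and of equal length, and it only detects single-base substitutions, not insertions or deletions. The function name and both sequences are hypothetical.

```python
def find_differences(reference, sample):
    """List (position, reference_base, sample_base) wherever the two differ.

    Assumes both strings are aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference = "ATGCGATACG"  # made-up reference fragment
sample = "ATGCGTTACG"     # made-up fragment from a test person

for position, ref_base, sample_base in find_differences(reference, sample):
    print(f"position {position}: {ref_base} -> {sample_base}")
```

A list like this only says where the sample differs from the reference. Deciding whether a given difference matters for a disease still requires the comparison across many subjects and their disease histories.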
Linear discriminant analysis or principal component analysis is generally used when DNA sequences are processed mathematically. These algorithms are statistical methods developed to reduce the dimensionality of large data sets. Reducing the redundancy in DNA sequence data in this way makes it easier to identify the genes that differ.
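To give a feel for this kind of dimension reduction, here is a small pure-Python sketch of principal component analysis. It rests on several assumptions that are not from the text: each sequence is turned into four numbers (the fraction of A, C, G and T), only the first principal component is computed, and it is found by power iteration rather than a full eigendecomposition. The sequences and function names are invented, and real analyses would normally use a numerical library and far richer features.

```python
import math


def base_frequencies(sequence):
    """Turn a DNA string into 4 numbers: the fractions of A, C, G and T."""
    return [sequence.count(base) / len(sequence) for base in "ACGT"]


def center(rows):
    """Subtract the column mean from every value."""
    means = [sum(column) / len(rows) for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    vector = [1.0 / (k + 1) for k in range(len(cov))]  # arbitrary start
    for _ in range(iterations):
        product = [sum(c * v for c, v in zip(row, vector)) for row in cov]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:  # no variance at all, nothing to find
            return vector
        vector = [x / norm for x in product]
    return vector


def project(centered, component):
    """Reduce each row to one number: its coordinate along the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]


# Made-up sequences: two rich in A/T, two rich in G/C.
sequences = ["AATTATAT", "ATATTAAA", "GCGCCGGC", "CCGGGCGC"]
centered = center([base_frequencies(s) for s in sequences])
component = first_component(covariance(centered))
scores = project(centered, component)

for sequence, score in zip(sequences, scores):
    print(sequence, round(score, 3))
```

Each sequence is reduced from four numbers to a single score. In this toy data the A/T-rich sequences end up on one side of zero and the G/C-rich ones on the other, which is the kind of separation that dimension reduction is meant to expose.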