
College Board

AP Central


DNA and Computers -- A Marriage Made in Heaven

by Eli Hatchwell
Cold Spring Harbor Laboratory
Cold Spring Harbor, New York

Dramatic Progress in Molecular Genetics
To the customers quietly eating their lunch and drinking their beer at the Eagle -- a pub in the center of Cambridge, England -- the announcement that the secret of life had just been discovered must have come as something of a surprise. Two young scientists, James Watson and Francis Crick, had worked out what they believed was the structure of the most important molecule in nature, deoxyribonucleic acid (DNA). Much was already known about the chemical composition of this molecule, and many structures had been proposed by other workers, notably Linus Pauling's suggestion of a triple helix. The structure that Watson and Crick arrived at, however, not only worked chemically but also made perfect biological sense. Their discovery was published in the April 25, 1953, issue of Nature¹ in a landmark paper that explained the overall structure of the molecule and, in particular, how the bases of DNA paired up. The paper contained one of the greatest understatements in modern science: "It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material." Subsequent publications² discussed the genetic implications of the structure of DNA. The 1962 Nobel Prize in Physiology or Medicine was awarded jointly to Watson, Crick, and Maurice Wilkins for a discovery that paved the way for all of modern molecular genetics.

The last 50 years have witnessed dramatic progress in molecular genetics, and it is fitting that the "final" draft of the human genome sequence was announced in April 2003 to coincide with the 50th anniversary of that first publication of the structure of DNA.

The three recipients of the Nobel Prize are still alive and working, albeit in different areas. James Watson is president of the Cold Spring Harbor Laboratory in New York and is intimately involved with modern molecular biology and genetics research; he had a great influence on the implementation of the human genome-sequencing project. Francis Crick switched to the study of consciousness and works at the Salk Institute for Biological Studies in San Diego, California. Maurice Wilkins, now 86, still teaches at King's College in London and is working on his autobiography. The fourth person whose work is popularly accepted as having been vital to the eventual discovery for which Watson, Crick, and Wilkins won their Nobel Prize was Rosalind Franklin, who tragically died in 1958 of ovarian cancer at the impossibly young age of 37. (Her cancer was most probably induced by the high doses of X-rays she was exposed to during the very work that paved the way for Watson and Crick's insight.) Nobel Prizes are not awarded posthumously, nor are they ever shared among more than three people, so it is uncertain how credit for the discovery would have been apportioned had she lived.

A Third Way of Doing Science
The year 1953 was important in the field of computer science as well. In that year the idea of a "computer experiment" was born in the minds of three physicists. Enrico Fermi and his two Los Alamos Scientific Laboratory colleagues, John Pasta and Stanislaw Ulam, realized that for some seemingly intractable problems, pencil and paper would never work (even for the brightest mathematical minds) and that a useful approach might be to "allow" a computer to solve the problem. This approach represented a third way of doing science -- it lay somewhere between the real experiments of biology and the purely theoretical "experiments" of mathematicians and theoretical physicists. For their simulation, which was an initial attempt to gain an improved understanding of entropy (the tendency of all systems to become more disordered), they used the most powerful computer then available, known as MANIAC. They were initially interested in the way a group of 64 atoms would behave once the system was perturbed. In their simulation, atoms were connected to each other by chemical bonds that behaved in a nonlinear way, as occurs in nature. (Problems in linear dynamics, where a proportional relationship exists between cause and effect, have been tractable for centuries, but our mathematical framework does not allow for easy solutions of nonlinear systems.) Fermi and his colleagues expected that the simulation they had set up would result in a "white noise" of atomic movement, with the initial disturbance spreading evenly through the system. To their surprise, however, the system displayed an order that seemingly had come from nowhere. This observation should be viewed as something of a milestone, in that it taught us two things:
  1. Some problems will likely never have clear-cut, explicit solutions, and insight about such problems will only be available through the use of simulations.
  2. The origins of order in the universe lie at a much deeper level than we can appreciate.
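
The kind of computer experiment described above can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of the original program: it assumes a chain of 64 unit masses with fixed ends, bonds with a small quadratic nonlinearity of strength `alpha`, and a velocity Verlet integrator. The function names, step size, and parameter values are all illustrative choices that do not come from the article.

```python
import math

N_MASSES = 64  # chain size mentioned in the text


def accelerations(x, alpha):
    """Force on each unit mass; alpha sets the strength of the nonlinear term."""
    n = len(x)
    acc = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0        # fixed wall on the left
        right = x[i + 1] if i < n - 1 else 0.0   # fixed wall on the right
        dr, dl = right - x[i], x[i] - left
        acc.append((dr - dl) + alpha * (dr * dr - dl * dl))
    return acc


def total_energy(x, v, alpha):
    """Kinetic energy plus the bond energy d^2/2 + alpha*d^3/3 summed over bonds."""
    padded = [0.0] + list(x) + [0.0]
    kinetic = 0.5 * sum(vi * vi for vi in v)
    potential = 0.0
    for a, b in zip(padded, padded[1:]):
        d = b - a
        potential += 0.5 * d * d + alpha * d ** 3 / 3.0
    return kinetic + potential


def mode_energies(x, v, n_modes):
    """Energy held in each of the lowest n_modes normal modes of the linear chain."""
    n = len(x)
    norm = math.sqrt(2.0 / (n + 1))
    energies = []
    for k in range(1, n_modes + 1):
        shape = [math.sin(math.pi * k * (i + 1) / (n + 1)) for i in range(n)]
        a = norm * sum(s * xi for s, xi in zip(shape, x))
        a_dot = norm * sum(s * vi for s, vi in zip(shape, v))
        omega = 2.0 * math.sin(math.pi * k / (2 * (n + 1)))
        energies.append(0.5 * (a_dot ** 2 + (omega * a) ** 2))
    return energies


def simulate(alpha, steps=2000, dt=0.05, amplitude=1.0):
    """Start with all the energy in the lowest mode, then integrate forward."""
    n = N_MASSES
    x = [amplitude * math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n)]
    v = [0.0] * n
    acc = accelerations(x, alpha)
    for _ in range(steps):
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, acc)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]          # drift
        acc = accelerations(x, alpha)
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, acc)]  # half kick
    return x, v


if __name__ == "__main__":
    x, v = simulate(alpha=0.25)
    print(mode_energies(x, v, 4))
```

With `alpha=0.0` the chain is linear and the energy stays in the mode where it started; with a nonzero `alpha` it begins to leak into other modes. Tracking `mode_energies` over a much longer run is one way to look for the unexpected order described above.
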
Why then this aside on the topic of nonlinear dynamics when we are supposed to be celebrating DNA? Because the two fields are related by more than a shared milestone in the year 1953. The availability of the human genome sequence has allowed for dramatic advances in our understanding of the genetic basis of a range of disorders, from Huntington's disease to many types of cancer. In many cases, we have been able to find strong correlations between changes in the DNA sequence and the presence of an abnormal phenotype, whether at the level of the whole individual or at the level of the cell. Such knowledge, however, should only be considered very basic in the context of our overall goal of understanding how cells work and, eventually, being able to intervene in a predictable and robust way for the purpose of therapeutics. After all, DNA, while of prime importance in the transfer of information both vertically (generation to generation) and horizontally (cell to daughter cell), is actually a passive player. It does nothing apart from being copied. It is the elements encoded in the sequence of bases that actually perform the "work" in the cell (these are mostly proteins, although ribonucleic acids -- RNAs -- are increasingly recognized as having roles beyond mediating between DNA and protein synthesis). It is the functioning of tens of thousands of individual protein and RNA molecules in the cell that is our nonlinear connection.

No simple mathematical approach allows us to analyze even the behavior of a single protein molecule, let alone many thousands of distinct such elements. It is almost certain, moreover, that explicit mathematics will never be able to explain how a cell works, even in principle. The only viable approach will be to attempt simulations in silico, using the methods that have been developed for the analysis of nonlinear dynamic systems. In fact, some now believe that such approaches may be the only way to look at the universe in general. In his recently published book A New Kind of Science, Stephen Wolfram³ proposes that, contrary to the way science has been conducted for centuries, the only hope for increased understanding of a range of natural phenomena, particularly biological phenomena, is to use rule-based algorithms. Such algorithms, he says, demonstrate emergent properties of order and may give insights, even though they are not tractable in the formal mathematical sense. These algorithms represent a "black box," in that little of meaning can be said about the internal workings of the computation. Wolfram likens the universe to a large computer rapidly performing calculations in order to evolve.
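
To give a feel for what a rule-based algorithm looks like, here is a minimal Python sketch of an elementary cellular automaton, the kind of system Wolfram's book examines at length. The article does not name a specific algorithm, so the choice of Rule 30, the wrap-around edges, and the function names `step` and `run` are assumptions made for illustration.

```python
def step(cells, rule):
    """Apply an elementary cellular automaton rule (0-255) to one row of 0/1 cells."""
    n = len(cells)
    out = []
    for i in range(n):
        left = cells[(i - 1) % n]    # edges wrap around (an illustrative choice)
        centre = cells[i]
        right = cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right  # neighbourhood as a 3-bit number
        out.append((rule >> index) & 1)              # look up that bit of the rule
    return out


def run(rule, width=31, generations=15):
    """Start from a single live cell in the middle and return every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(generations):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    for row in run(30):
        print("".join("#" if c else "." for c in row))
```

The entire "program" is the single number passed as `rule`, yet the printed pattern quickly becomes intricate. That gap between a trivial rule and behavior that is hard to predict without running it is the point of the passage above.
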

Using DNA as a Computer
One other connection between modern molecular biology/genetics and computing needs to be noted. While we understand the structure of DNA and its component bases in general, and we can synthesize DNA in the laboratory from basic chemicals, the complexity of the molecule is, in one sense, infinite. While only four bases are generally found in DNA (A, C, G, and T), it is the order of these bases that is critical and that endows the molecule with its enormous complexity and specificity. Consider a DNA fragment of very modest size, say 150 bases (the human genome has 3 x 10^9 bases, a bacterial genome approximately 4 x 10^6). The potential number of different molecules of this length is 4^150, or about 2 x 10^90. The number of particles in the observable universe is "only" of the order of 10^80, some 20 billion times smaller than this number. The possible number of sequences of length 3 x 10^9 can be considered infinite for all intents and purposes. In terms of information content, a solution of DNA molecules far exceeds the most densely packed computer chip available (or likely to be available in the future). It is for this reason, and because of both the rapidity and parallel nature of chemical reactions, that scientists have attempted to use DNA as a computer. The first success in this area was achieved by Leonard M. Adleman in 1994.⁴ A mathematician by background, Adleman first realized the potential of DNA for this purpose when he spent a summer doing molecular biology. His initial demonstration used DNA in solution to solve a classical mathematics problem in graph theory (see Figure 1).
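
The arithmetic in this paragraph is easy to check with Python's arbitrary-precision integers. The sketch below is only a verification aid; the function names and the constant `PARTICLES_IN_UNIVERSE` (the article's order-of-magnitude figure of 10^80) are illustrative.

```python
from math import log10

BASES = 4                          # A, C, G, and T
PARTICLES_IN_UNIVERSE = 10 ** 80   # order-of-magnitude figure quoted in the text


def count_sequences(length):
    """Exact number of distinct DNA sequences of the given length."""
    return BASES ** length


def power_of_ten(length):
    """Approximate exponent e such that BASES**length is about 10**e."""
    return length * log10(BASES)


if __name__ == "__main__":
    n_150 = count_sequences(150)
    print(f"4^150 is about 10^{power_of_ten(150):.2f}")
    print(f"ratio to 10^80 is about {n_150 // PARTICLES_IN_UNIVERSE:.2e}")
    # The genome-length count is far too large to print in full,
    # so only its order of magnitude is reported.
    print(f"4^(3 x 10^9) is about 10^{power_of_ten(3 * 10 ** 9):.3e}")
```

For genome-length sequences the exponent itself runs to nearly two billion, which is why the text treats the count as effectively infinite.
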

It is likely that the fledgling marriage between DNA and computing will be cemented in the future, as both DNA and computing come into their own, and computational approaches shed light on the complex workings of the cell.

Figure 1. The traveling salesman problem, or TSP for short, is as follows: Given a finite number of "cities" along with the cost of travel between each pair, find the cheapest way of visiting all the cities and returning to your starting point. For small numbers of cities, the TSP is trivial to solve. However, with increasing numbers of cities, the amount of computational time required increases dramatically, making all but the "easiest" cases intractable, even with all the computing resources of the world for long periods of time. It is for this class of problem that Adleman provided a proof of principle when he performed the first DNA computation in 1994; his experiment solved a closely related task, finding a Hamiltonian path through a seven-vertex directed graph.
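
The problem statement in Figure 1 can be made concrete with a brute-force solver. The Python sketch below simply tries every ordering of the cities; the four-city cost table `EXAMPLE_COSTS` is invented for illustration, and the approach shown is exhaustive search on a conventional computer, not the DNA-based method.

```python
from itertools import permutations
from math import factorial

# Hypothetical travel costs between four cities (symmetric, made up for this example).
EXAMPLE_COSTS = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]


def tour_cost(tour, costs):
    """Total cost of travelling through the cities in the order given."""
    return sum(costs[a][b] for a, b in zip(tour, tour[1:]))


def cheapest_tour(costs, start=0):
    """Try every ordering of the remaining cities and keep the cheapest round trip."""
    others = [c for c in range(len(costs)) if c != start]
    best_tour, best_cost = None, float("inf")
    for order in permutations(others):
        tour = (start, *order, start)
        cost = tour_cost(tour, costs)
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost


def tours_to_check(n_cities):
    """Number of orderings the brute-force search must examine from a fixed start."""
    return factorial(n_cities - 1)


if __name__ == "__main__":
    print(cheapest_tour(EXAMPLE_COSTS))
    for n in (4, 10, 20):
        print(n, "cities:", tours_to_check(n), "tours")
```

Four cities need only six orderings, but the count grows factorially, which is the dramatic increase in computing time that the caption describes.
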
Eli Hatchwell is currently an investigator at Cold Spring Harbor Laboratory in New York. After attending medical school at Cambridge University and training in internal medicine and pediatrics, Dr. Hatchwell embarked on a Ph.D. in molecular genetics at Oxford University. A period specializing in medical genetics convinced him that better approaches were required for the genetic diagnosis of patients with complex syndromes, and he crossed the Atlantic with a view to developing technologies that ultimately might be applied to patients in the clinic. His work focuses on the detection of genomic abnormalities in individuals with complex, sporadic genetic disorders.

1 J. D. Watson and F. H. C. Crick, "Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid," Nature 171 (April 25, 1953): 737-738.
2 J. D. Watson and F. H. C. Crick, "Genetical Implications of the Structure of Deoxyribonucleic Acid," Nature 171 (May 30, 1953): 964-967.
3 Stephen Wolfram, A New Kind of Science, Wolfram Media, Inc., 2002.
4 Leonard M. Adleman, "Molecular Computation of Solutions to Combinatorial Problems," Science 266 (November 11, 1994): 1021-1024.
