Combinatorial Optimization in Biology




Quantitative technologies have, in recent years, taken on enormous importance for molecular biology. Some of the most fundamental problems facing biologists today, such as sequencing the Human Genome, rely extensively on the use of bioinformatics and computational techniques. Large gaps remain to be filled, however. Little attempt has been made to address some important optimization problems arising in a biologically-motivated setting, often because these problems have been thought to be computationally hard.

Surprisingly, many of them are not. Particularly for problems relying on a geometric formulation, the theoretical machinery often exists for finding algorithms that solve them in polynomial time. Our goal is, based on these theoretical insights, to develop such algorithms so that they can readily be implemented.

In the case of the Human Genome Project, we have already found that simple models of DNA sequencing procedures lead to polynomial-time algorithms that can improve sequencing efficiency substantially. We are now extending these algorithms to be able to accommodate additional experimental desiderata. This will enable full-scale automation of DNA sequencing and the elimination of a labor-intensive bottleneck that currently retards the process. Another optimization problem we are addressing in this context is that of pooling, or group testing. Pooling is the technique of choice for identifying a few ``positive'' clones from a large collection consisting mainly of ``negatives''. While designing efficient yet accurate pools appears at first sight to be computationally hard, experience with sequencing algorithms suggests to us that useful variants on the problem may in fact be solvable in polynomial time. We are working towards developing algorithms for constructing implementable pooling designs that will robustly yield the desired identifications, using a minimum number of pools.

LA-UR 00-14


  Last modified: 21 April 2001