Biological Sequence Alignment Using Graphics Processing Units

My thesis subject was the implementation and evaluation of a GPU accelerated protein database search tool, using the Smith-Waterman optimal sequence alignment algorithm.

Update: a paper has been published on my thesis material.

Thesis abstract:

Alignment algorithms are used to find similarity between biological sequences, such as DNA and proteins. By aligning a sequence with a database, similar sequences can be found. These can be used to identify the source of a query sequence, to find commonalities between organisms, or to infer an ancestral relation. Various methods of performing biological sequence alignment exist, including dynamic programming and heuristic methods. Dynamic programming methods are guaranteed to find all optimal alignments, but are relatively slow; heuristic methods are faster but less precise. This thesis investigates the acceleration of one such optimal algorithm, the Smith-Waterman local sequence alignment algorithm, by using graphics processing units (GPUs). A fully functioning GPU-based protein database search tool was designed, implemented and optimized. The optimizations mostly concern the elimination of memory bottlenecks and the conversion of the database to a format well suited for GPU use. The final implementation offers the same features its CPU-based counterparts do, such as user configurable scoring and substitution matrix settings, and includes a web interface for convenient and remote usage. The performance of the GPU accelerated implementation was evaluated and compared to other solutions. It was found to attain a performance of more than 21 GCUPS (giga cell-updates of the Smith-Waterman score matrix per second) when searching the October 2010 release of Swiss-Prot on an NVIDIA Geforce GTX 275 GPU. With this performance, it is the fastest known GPU implementation on comparable hardware. It is also faster than the BLAST heuristic. However, the cost of purchasing a GPU, its power consumption, and the relative difficulty of maintaining a GPU software product are disadvantages of GPU acceleration.

The implementation has been named GASW, for 'GPU Accelerated Smith-Waterman'. Features:


Thesis - Introduction to biological sequence alignment, GPU acceleration thereof, the implementation and how it was optimized, benchmark results.
User's guide - Explains how to use the tool, limitations, setting up the web interface, and building everything from source.
Program distribution - Binaries and source code (GPLv3). Includes sample data files.

Benchmark results on a machine running a 2.4 GHz Intel Q6600 CPU, 4 GB RAM, NVIDIA Geforce GTX 275 with 896 MB of memory and Windows 7.
The command-line tool.
web interface
Web interface.
Created: Feb 23 2010
Modified: Sep 23 2011