The Needleman-Wunsch Algorithm
Summary
The Needleman-Wunsch algorithm is a dynamic programming algorithm used for global alignment of two sequences, finding the optimal alignment by maximizing the number of matches and minimizing the number of gaps
Use Case
The Needleman-Wunsch algorithm can be used in bioinformatics to align two DNA or protein sequences, helping to identify similarities and differences between them
Steps
- Initialize a 2D matrix to store the scores of the alignments
- Fill in the matrix by comparing the characters of the two sequences and calculating the scores based on matches, mismatches, and gaps
- Trace back the matrix to find the optimal alignment
Complexity
The time complexity of the Needleman-Wunsch algorithm is O(n*m) and the space complexity is O(n*m), where n and m are the lengths of the two sequences
Code Example
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
m, n = len(seq1), len(seq2)
score_matrix = [[0 for _ in range(n+1)] for _ in range(m+1)]
for i in range(m+1):
score_matrix[i][0] = gap * i
for j in range(n+1):
score_matrix[0][j] = gap * j
for i in range(1, m+1):
for j in range(1, n+1):
match_score = score_matrix[i-1][j-1] + (match if seq1[i-1] == seq2[j-1] else mismatch)
delete_score = score_matrix[i-1][j] + gap
insert_score = score_matrix[i][j-1] + gap
score_matrix[i][j] = max(match_score, delete_score, insert_score)
alignment = [''] * (m + n)
i, j = m, n
while i > 0 and j > 0:
score = score_matrix[i][j]
score_diagonal = score_matrix[i-1][j-1]
score_up = score_matrix[i-1][j]
score_left = score_matrix[i][j-1]
if score == score_diagonal + (match if seq1[i-1] == seq2[j-1] else mismatch):
alignment[i+j-1] = seq1[i-1]
alignment[i+j] = seq2[j-1]
i -= 1
j -= 1
elif score == score_up + gap:
alignment[i+j-1] = seq1[i-1]
alignment[i+j] = '-'
i -= 1
else:
alignment[i+j-1] = '-'
alignment[i+j] = seq2[j-1]
j -= 1
while i > 0:
alignment[i] = seq1[i-1]
i -= 1
while j > 0:
alignment[j] = seq2[j-1]
j -= 1
return ''.join(alignment[m:]), ''.join(alignment[:m])