Code
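Minimum Edit Distance
Given two strings, how do we measure how "similar" they are? For example, which of naxt and exnt is closer to next? Answering that requires a common measure.
To provide such a measure, the concept of edit distance (Edit distance or Levenshtein distance) was proposed.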
In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.
Different definitions of an edit distance use different sets of string operations. The Levenshtein distance operations are the removal, insertion, or substitution of a character in the string. Being the most common metric, the Levenshtein distance is usually what is meant by “edit distance”.
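As a rough illustration of the Levenshtein operations described above, here is a minimal Python sketch using the standard dynamic-programming approach. The function name `levenshtein` and the row-by-row layout are choices made for this example, not something taken from the original post, and the sketch assumes each removal, insertion, or substitution costs 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character removals, insertions,
    or substitutions needed to turn string a into string b.
    Assumes every operation has a cost of 1."""
    # prev[j] = distance between the processed prefix of a and b[:j].
    # Initially the prefix of a is empty, so the distance is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i removals.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # remove ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (free on a match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("naxt", "next"))  # 1
print(levenshtein("exnt", "next"))  # 2
```

Under this definition, naxt is one substitution away from next, while exnt needs two operations, so naxt would be considered the closer of the two. Keeping only two rows at a time keeps memory proportional to the length of the second string; a full table would be needed if the actual sequence of edits had to be recovered.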