Computes string similarity, but allows you to assign weights to specific tokens. This is useful when, for example, the strings contain a frequently occurring token that carries no useful information. See the examples below.
lev_weighted_token_ratio(a, b, weights = list(), ...)
a, b: The input strings.
weights: List of token weights. For example, weights = list(foo = 0.9, bar = 0.1). Any tokens omitted from weights will be given a weight of 1.
...: Additional arguments to be passed to stringdist::stringdistmatrix() or stringdist::stringsimmatrix().
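For example, options recognised by those stringdist functions, such as method, can be passed through .... The call below is purely illustrative; the set of supported options is determined by stringdist, not by this function:

# Compute token distances with Damerau-Levenshtein instead of the default
lev_weighted_token_ratio("jim ltd", "tim ltd", method = "dl")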
Returns a float.
The algorithm used here is as follows (a worked sketch appears after the list):

1. Tokenise the input strings.
2. Compute the edit distance between each pair of tokens.
3. Compute the maximum edit distance between each pair of tokens, i.e. the distance if the two tokens had no characters in common.
4. Apply any weights from the weights argument.
5. Return 1 - (sum(weighted_edit_distances) / sum(weighted_max_edit_distances)).
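The following is a minimal sketch of these steps in plain R, not the package source. It assumes whitespace tokenisation, positional pairing of tokens, stringdist's default method, and that each pair's weight is looked up by the first string's token (defaulting to 1). It reproduces the documented examples:

# Hedged sketch of the algorithm above; assumptions noted in the text
weighted_token_ratio_sketch <- function(a, b, weights = list()) {
  tokens_a <- strsplit(a, "\\s+")[[1]]
  tokens_b <- strsplit(b, "\\s+")[[1]]
  # Step 2: edit distance for each pair of tokens
  dists <- stringdist::stringdist(tokens_a, tokens_b)
  # Step 3: maximum possible edit distance for each pair
  max_dists <- pmax(nchar(tokens_a), nchar(tokens_b))
  # Step 4: weight each pair; tokens omitted from weights get weight 1
  w <- vapply(tokens_a, function(tok) {
    if (is.null(weights[[tok]])) 1 else weights[[tok]]
  }, numeric(1))
  # Step 5: weighted ratio
  1 - sum(w * dists) / sum(w * max_dists)
}
weighted_token_ratio_sketch("jim ltd", "tim ltd")
#> [1] 0.8333333
weighted_token_ratio_sketch("tim ltd", "jim ltd", weights = list(ltd = 0.1))
#> [1] 0.6969697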
Other weighted token functions: lev_weighted_token_set_ratio(), lev_weighted_token_sort_ratio()
lev_weighted_token_ratio("jim ltd", "tim ltd")
#> [1] 0.8333333
lev_weighted_token_ratio("tim ltd", "jim ltd", weights = list(ltd = 0.1))
#> [1] 0.6969697
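Note that down-weighting "ltd" lowers the score: the exact match on "ltd" no longer inflates the similarity, so the comparison is dominated by the genuinely informative tokens ("tim" vs "jim").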