Tversky index

The Tversky index, named after Amos Tversky,[1] is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Sørensen–Dice coefficient and the Jaccard index.

For sets X and Y the Tversky index is a number between 0 and 1 given by

$$S(X,Y)=\frac{|X\cap Y|}{|X\cap Y|+\alpha |X\setminus Y|+\beta |Y\setminus X|}$$

Here, $X\setminus Y$ denotes the relative complement of Y in X.

Further, $\alpha ,\beta \geq 0$ are parameters of the Tversky index. Setting $\alpha =\beta =1$ produces the Jaccard index; setting $\alpha =\beta =0.5$ produces the Sørensen–Dice coefficient.
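The definition and its two special cases can be checked with a short Python sketch. The function name `tversky_index` and the example sets are illustrative choices, not part of the original formulation, and returning 0.0 when the denominator vanishes is an assumed convention, since the formula leaves that case undefined.

```python
def tversky_index(x, y, alpha, beta):
    """Tversky index of two finite sets for weights alpha, beta >= 0."""
    x, y = set(x), set(y)
    common = len(x & y)   # |X ∩ Y|
    only_x = len(x - y)   # |X \ Y|
    only_y = len(y - x)   # |Y \ X|
    denominator = common + alpha * only_x + beta * only_y
    if denominator == 0:
        # Assumed convention: the formula is undefined here (0/0),
        # so this sketch reports no similarity.
        return 0.0
    return common / denominator


x = {"a", "b", "c", "d"}
y = {"c", "d", "e"}

# alpha = beta = 1 gives the Jaccard index: 2 / (2 + 2 + 1) = 0.4
jaccard = tversky_index(x, y, alpha=1.0, beta=1.0)

# alpha = beta = 0.5 gives the Sørensen–Dice coefficient: 2 / (2 + 1 + 0.5) = 4/7
dice = tversky_index(x, y, alpha=0.5, beta=0.5)
```

For these sets the Jaccard value agrees with $|X\cap Y|/|X\cup Y| = 2/5$ and the Dice value with $2|X\cap Y|/(|X|+|Y|) = 4/7$.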

If we consider X to be the prototype and Y to be the variant, then $\alpha$ corresponds to the weight of the prototype (it scales $|X\setminus Y|$, the elements found only in the prototype) and $\beta$ corresponds to the weight of the variant (it scales $|Y\setminus X|$, the elements found only in the variant). Tversky measures with $\alpha +\beta =1$ are of special interest.[2]
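As a worked illustration of the weights, take the extreme case $\alpha =1$, $\beta =0$, which satisfies $\alpha +\beta =1$. This choice is only an example; the source does not single it out. Substituting into the definition, and assuming X and Y are non-empty, gives

$$S(X,Y)=\frac{|X\cap Y|}{|X\cap Y|+|X\setminus Y|}=\frac{|X\cap Y|}{|X|},\qquad S(Y,X)=\frac{|X\cap Y|}{|X\cap Y|+|Y\setminus X|}=\frac{|X\cap Y|}{|Y|}.$$

The first value is the fraction of the prototype's elements that also occur in the variant. The two values differ whenever $|X|\neq |Y|$ and the intersection is non-empty, so swapping the arguments generally changes the result.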

Because of this inherent asymmetry, the Tversky index does not meet the criteria for a similarity metric. If symmetry is needed, however, a variant of the original formulation based on the max and min functions has been proposed:[3]

$$S(X,Y)=\frac{|X\cap Y|}{|X\cap Y|+\beta \left(\alpha a+(1-\alpha )b\right)}$$

where

$$a=\min \left(|X\setminus Y|,|Y\setminus X|\right),$$

$$b=\max \left(|X\setminus Y|,|Y\setminus X|\right).$$

This formulation also re-arranges the parameters $\alpha$ and $\beta$. Thus, $\alpha$ controls the balance between $|X\setminus Y|$ and $|Y\setminus X|$ in the denominator. Similarly, $\beta$ controls the effect of the symmetric difference $|X\,\triangle \,Y|$ versus $|X\cap Y|$ in the denominator.
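A minimal Python sketch of the symmetric variant follows. The name `symmetric_tversky`, the example sets, and the 0.0 return value for a vanishing denominator are assumptions made for illustration. Because a and b are the smaller and larger of the two set differences, the result does not depend on argument order.

```python
def symmetric_tversky(x, y, alpha, beta):
    """Symmetric Tversky variant built from min and max of the set differences."""
    x, y = set(x), set(y)
    common = len(x & y)        # |X ∩ Y|
    only_x = len(x - y)        # |X \ Y|
    only_y = len(y - x)        # |Y \ X|
    a = min(only_x, only_y)    # smaller of the two differences
    b = max(only_x, only_y)    # larger of the two differences
    denominator = common + beta * (alpha * a + (1 - alpha) * b)
    if denominator == 0:
        # Assumed convention: the formula is undefined here (0/0).
        return 0.0
    return common / denominator


x = {"a", "b", "c", "d"}
y = {"c", "d", "e"}

# Swapping the arguments leaves a and b unchanged, so both calls agree.
forward = symmetric_tversky(x, y, alpha=0.3, beta=1.0)
backward = symmetric_tversky(y, x, alpha=0.3, beta=1.0)

# With alpha = 0.5 the bracket equals half the symmetric difference, so
# beta = 1 reproduces the Sørensen–Dice value and beta = 2 the Jaccard value.
dice_like = symmetric_tversky(x, y, alpha=0.5, beta=1.0)
jaccard_like = symmetric_tversky(x, y, alpha=0.5, beta=2.0)
```

For these sets a = 1 and b = 2, so the first two calls both evaluate 2 / (2 + 0.3·1 + 0.7·2) = 2 / 3.7.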

Notes

  1. ^ Tversky, Amos (1977). "Features of Similarity" (PDF). Psychological Review. 84 (4): 327–352. doi:10.1037/0033-295x.84.4.327.
  2. ^ "Daylight Theory: Fingerprints".
  3. ^ Jimenez, S.; Becerra, C.; Gelbukh, A. (2013). "SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity". Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity. Atlanta, Georgia, USA, June 7–8, 2013. pp. 194–201.