PRI: We introduce a probabilistic version of the well-known Rand Index (RI) for measuring the similarity between two partitions, called the Probabilistic Rand Index (PRI), in which agreements and disagreements at the object-pair level are weighted according to the probability of their occurring by chance. We then cast consensus clustering as the problem of optimizing the PRI value between a target partition and a set of given partitions, and we experiment with a simple and very efficient stochastic optimization algorithm. Remarkable performance gains over the input partitions, as well as over existing related methods, are demonstrated through a range of applications, including a new use of consensus clustering to improve subtopic retrieval.

Definition
Given a set of $n$ elements $S = \{o_1, \ldots, o_n\}$ and two partitions of $S$ to compare, $X = \{X_1, \ldots, X_r\}$, a partition of $S$ into $r$ subsets, and $Y = \{Y_1, \ldots, Y_s\}$, a partition of $S$ into $s$ subsets, define the following:
* $a$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in the same subset in $Y$
* $b$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in different subsets in $Y$
* $c$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in different subsets in $Y$
* $d$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in the same subset in $Y$

The Rand index, $R$, is:

$$R = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$$
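To make the pair counts $a$, $b$, $c$, $d$ concrete, here is a minimal Python sketch that enumerates all $\binom{n}{2}$ unordered pairs and classifies each one. It assumes each partition is encoded as a label sequence, where `x[k]` names the subset of element $o_k$ in $X$ (and likewise `y[k]` for $Y$); the function names `pair_counts` and `rand_index` are hypothetical, and returning 1.0 when there are no pairs is an assumed convention.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence
from itertools import combinations


def pair_counts(x: Sequence[Hashable], y: Sequence[Hashable]) -> tuple[int, int, int, int]:
    """Return (a, b, c, d) for two partitions given as label sequences.

    Hypothetical encoding: x[k] and y[k] are the subset labels of element k.
    """
    if len(x) != len(y):
        raise ValueError("x and y must label the same elements")
    a = b = c = d = 0
    # Visit every unordered pair {i, j}: n choose 2 pairs in total.
    for i, j in combinations(range(len(x)), 2):
        same_x = x[i] == x[j]
        same_y = y[i] == y[j]
        if same_x and same_y:
            a += 1  # together in X and together in Y
        elif not same_x and not same_y:
            b += 1  # apart in X and apart in Y
        elif same_x:
            c += 1  # together in X, apart in Y
        else:
            d += 1  # apart in X, together in Y
    return a, b, c, d


def rand_index(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """R = (a + b) / (a + b + c + d): agreeing pairs over all pairs."""
    a, b, c, d = pair_counts(x, y)
    total = a + b + c + d  # equals n choose 2
    # Assumed convention: with fewer than two elements there are no pairs, so return 1.0.
    return (a + b) / total if total else 1.0
```

For example, `rand_index([0, 0, 1, 1], [0, 0, 1, 2])` gives $5/6$: of the six pairs, one is together in both partitions, four are apart in both, and one is together only in $X$. Only co-membership matters, so relabeling subsets does not change $R$. The pairwise loop is $O(n^2)$; for large $n$ one would normally derive the same counts from the $r \times s$ contingency table instead.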
Intuitively, $a + b$ can be considered as the number of agreements between $X$ and $Y$, and $c + d$ as the number of disagreements between $X$ and $Y$.
-------------------------------------------------
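Suppose we have two clusterings (divisions of a set into several subsets) $X$ and $Y$, where $X = \{X_1, X_2, \ldots, X_k\}$, $p_i = |X_i| / n$, and $n = \sum_{i=1}^{k} |X_i|$. Then the variation of information between the two clusterings is:

$$\mathrm{VI}(X; Y) = H(X) + H(Y) - 2\, I(X; Y)$$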
where $H(X)$ is the entropy of $X$ and $I(X; Y)$ is the mutual information between $X$ and $Y$. This is equivalent to the shared information distance.
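The formula above can be evaluated from cluster sizes alone. The following is a minimal Python sketch, assuming both clusterings are given as label sequences over the same $n$ elements and using the natural logarithm (so values are in nats). It computes $H(X) = -\sum_i p_i \log p_i$ and $I(X; Y) = \sum_{i,j} r_{ij} \log \frac{r_{ij}}{p_i q_j}$, where $q_j = |Y_j| / n$ and $r_{ij} = |X_i \cap Y_j| / n$ are introduced here for illustration. The function names are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """H(X) = -sum_i p_i log p_i, with p_i = |X_i| / n (natural log, nats)."""
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X; Y) = sum_ij r_ij log(r_ij / (p_i q_j)), r_ij = |X_i intersect Y_j| / n."""
    if len(x) != len(y):
        raise ValueError("x and y must label the same elements")
    n = len(x)
    px, py = Counter(x), Counter(y)  # cluster sizes |X_i| and |Y_j|
    mi = 0.0
    # Only nonempty intersections appear in the joint Counter, so r_ij > 0 here.
    for (i, j), m in Counter(zip(x, y)).items():
        r = m / n
        mi += r * math.log(r / ((px[i] / n) * (py[j] / n)))
    return mi


def variation_of_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """VI(X; Y) = H(X) + H(Y) - 2 I(X; Y)."""
    return entropy(x) + entropy(y) - 2.0 * mutual_information(x, y)
```

For instance, `variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])` is $2 \ln 2 \approx 1.386$: the two clusterings are statistically independent, so $I(X; Y) = 0$. Identical clusterings (up to relabeling) give $\mathrm{VI} = 0$, apart from floating-point rounding. Because it works from cluster and intersection counts, this runs in $O(n)$ time, unlike the pairwise enumeration typically used to illustrate the Rand index.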