solidot新版网站常见问题,请点击这里查看。
消息
本文已被查看310次
Query Complexity of Clustering with Side Information. (arXiv:1706.07719v1 [stat.ML])
来源于:arXiv
Suppose, we are given a set of $n$ elements to be clustered into $k$
(unknown) clusters, and an oracle/expert labeler that can interactively answer
pair-wise queries of the form, "do two elements $u$ and $v$ belong to the same
cluster?". The goal is to recover the optimum clustering by asking the minimum
number of queries. In this paper, we initiate a rigorous theoretical study of
this basic problem of query complexity of interactive clustering, and provide
strong information theoretic lower bounds, as well as nearly matching upper
bounds. Most clustering problems come with a similarity matrix, which is used
by an automated process to cluster similar points together. Our main
contribution in this paper is to show the dramatic power of side information
aka similarity matrix on reducing the query complexity of clustering. A
similarity matrix represents noisy pair-wise relationships such as one computed
by some function on attributes of the elements. A natural noisy model is where
similar 查看全文>>