Clustering for Data Mining: A Data Recovery Approach

Product Description
Often considered more as an art than a science, the field of clustering has been dominated by learning through examples and by techniques chosen almost through trial-and-error. Even the most popular clustering methods–K-Means for partitioning the data set and Ward’s method for hierarchical clustering–have lacked the theoretical attention that would establish a firm relationship between the two methods and relevant interpretation aids. Rather than the traditiona…

2 Responses to “Clustering for Data Mining: A Data Recovery Approach”

  1. John Matlock says:

    First, understand that the type of clustering discussed in this book is the statistical technique of finding clusters of data in a collection, where the collection is typically a database. It is not about clustered microcomputers being used to work on big computational tasks as though they were a supercomputer.

    Clustering customers is a key area in data mining and knowledge discovery. You are usually trying to find groups of people with similar, but not necessarily identical, buying patterns. For instance, if you have a group of people who have purchased a book on PHP, you might want to try to sell them a book on MySQL, Apache, or Linux. These programs fit together but are not identical. Still, the customer who purchased the PHP book is more likely to want a MySQL book than an audio CD of a murder mystery.

    In this book, two of the most popular clustering techniques, K-Means and Ward’s method, are presented for a reader interested in the technical aspects of data mining, whether as a theoretician or a practitioner. The author says the material is intended to be useful to a reader with no mathematical background beyond high school. But the author also says it might help if the reader is acquainted with basic notions of calculus, statistics, matrix algebra, graph theory and logic. (The author went to a different high school than I did.)

    The book describes clustering as it is used in a wide variety of applications, most of them oriented to discovering social patterns, biological taxonomies, machine learning, etc. It discusses the various techniques that have been developed and gives examples of where they have been used.
    Rating: 5 / 5

  2. Mark Levin says:

    This book gives a smooth, motivated and example-rich introduction to clustering, which is innovative in many respects. It provides answers to important questions that are rarely addressed, if addressed at all.

    Examples:

    (a) what to do if the user has no idea of the number of clusters and/or their location: use what is called intelligent k-means;

    (b) what to do if the data contain both numeric and categorical features: use what is called the three-step standardization procedure;

    (c) how to catch anomalous patterns;

    (d) how to validate clusters, etc.

    Some of these may be subject to criticism; however, some motivation is always supplied, and the results are always reproducible and thus testable.

    The book introduces a number of non-conventional cluster interpretation aids derived from the data geometry view adopted by the author and based on what are referred to as contribution weights, which basically show those elements of cluster structures that distinguish clusters from the rest. These contribution weights, applied to categorical data, appear to be highly compatible with what statisticians such as A. Quetelet and K. Pearson were developing over the past couple of centuries, which is a highly original and welcome development. The book reviews a rich set of approaches being accumulated in such hot areas as text mining and bioinformatics, and shows that clustering is not just a set of naive methods for data processing but forms an evolving area of data science.

    I adopted the book as a text for my courses in data mining for bachelor’s and master’s degrees.

    Rating: 5 / 5
