My problem is that I already have a distance (similarity) matrix, generated
outside R by C++ code, because the criteria I use to calculate the
"distance" between pairs of objects are not among the standard criteria
implemented in R.
If I understand correctly (I may be mistaken), stepFlexclust() optimizes the
cluster layout by calling either kcca or cclust, each of which
calculates its own similarity matrix.
I just need a function or method to optimize the number of clusters, no matter
how the similarity matrix was generated and no matter which clustering
function I use (PAM in my case).
Is this possible at all?
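
A minimal sketch of the kind of procedure in question, under stated
assumptions: pam() from the cluster package accepts a precomputed
dissimilarity object, so the average silhouette width can be compared
across candidate numbers of clusters without R recomputing any distances.
The matrix below is simulated as a stand-in for the externally generated
one, and the names dmat, ks, avg_width and best_k are illustrative only.
It also assumes the matrix holds dissimilarities; a similarity matrix
would first need converting (for instance 1 - s for similarities in [0, 1]).

```r
library(cluster)  # provides pam()

# Hypothetical stand-in for the matrix computed outside R:
# a symmetric dissimilarity matrix with a zero diagonal.
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2))
dmat <- as.matrix(dist(pts))

# Wrap the precomputed matrix so pam() treats it as dissimilarities.
d <- as.dist(dmat)

# Candidate numbers of clusters (an assumed, deliberately small range).
ks <- 2:8

# Run PAM for each k and record the average silhouette width.
avg_width <- sapply(ks, function(k) {
  fit <- pam(d, k = k, diss = TRUE)
  fit$silinfo$avg.width
})

# Pick the k with the largest average silhouette width.
best_k <- ks[which.max(avg_width)]
print(data.frame(k = ks, avg_width = avg_width))
print(best_k)
```

Restricting ks to a plausible range, rather than every possible number of
clusters, is one way to keep the run time manageable on a large matrix.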

Thank you very much,
Maura



On Fri, Oct 31, 2008 at 12:18 AM, Dylan Beaudette <[EMAIL PROTECTED]> wrote:

> On Thursday 30 October 2008, Maura E Monville wrote:
> > I have the book you mentioned. It basically describes the silhouette
> > method. I do not have it handy: I moved, so it is still in a box.
> > However, I cannot remember that book providing any other criterion for
> > finding the best number of clusters.
> > On the other hand, I have the same problem with hierarchical clustering
> > techniques.
> > I use clustering as exploratory analysis because I do not have any a priori
> > knowledge that helps me make a choice.
> > How can multivariate analysis help?
> > I launched a loop in which the silhouette test follows PAM, which is passed
> > a number of clusters increased by 1 at each iteration.
> > Since I am observing that the silhouette value is now oscillating among
> > negative numbers, I wonder whether I can assume that it can only grow worse
> > once it has turned negative for the first time, so that I can leave the loop
> > after the first negative number and choose the number of clusters associated
> > with the biggest positive silhouette value.
> > This procedure would save a lot of CPU time.
>
> Another approach might involve stepFlexclust() from the flexclust package.
> See the manual page for this function for examples.
>
> Dylan
>
>
> > Thank you very much,
> > Maura
> >
> > On Thu, Oct 30, 2008 at 7:25 PM, Dylan Beaudette <[EMAIL PROTECTED]> wrote:
> > > On Thursday 30 October 2008, Maura E Monville wrote:
> > > > I have a pretty big similarity matrix (2870x2870). I will produce even
> > > > bigger ones soon.
> > > > I am using PAM to generate clusters.
> > > > The desired number of output clusters is a PAM input parameter.
> > > > I do not know a priori what the best cluster layout is.
> > > > I resorted to the silhouette test. It takes forever, as I have to run
> > > > PAM with all possible numbers of clusters.
> > > > I wonder whether there is some faster method, either software or some
> > > > theoretical guidelines, to get the optimum number of clusters.
> > > >
> > > > Thank you very much,
> > >
> > > This is a very general topic in the field of multivariate analysis. There
> > > really isn't any way to know the 'correct' number of clusters; however,
> > > there are several metrics that can give you an indication of how messy
> > > your data are.
> > >
> > > For information on the methods in the cluster package, see this book:
> > >
> > > Kaufman, L. & Rousseeuw, P. J. Finding Groups in Data: An Introduction to
> > > Cluster Analysis. Wiley-Interscience, 2005.
> > >
> > > Otherwise, consider a book on multivariate analysis. Alternatively, try a
> > > hierarchical clustering approach, and look for meaningful groupings.
> > > Something like this:
> > >
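> > > library(cluster)  # provides daisy() and diana()
> > >
> > > # divisive hierarchical clustering on pairwise dissimilarities
> > > d <- diana(daisy(your_data_matrix))
> > > d.hc <- as.hclust(d)
> > >
> > > # label the leaves (assumes your_data_matrix is a data frame with an 'id' column)
> > > d.hc$labels <- your_data_matrix$id
> > >
> > > plot(d.hc)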
> > >
> > > Cheers,
> > >
> > > Dylan
> > >
> > >
> > > --
> > > Dylan Beaudette
> > > Soil Resource Laboratory
> > > http://casoilresource.lawr.ucdavis.edu/
> > > University of California at Davis
> > > 530.754.7341



-- 
Maura E.M

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
