f code. On the other hand, the
help page for codetools::checkUsage is quite cryptic. But it's good to know at
least where to look.
Sincerely,
Leonard
From: Ivan Krylov
Sent: Wednesday, February 28, 2024 10:36 AM
To: Leo Mada via R-help
Cc: Leo Mada
Su
On Sat, 24 Feb 2024 03:08:26 +
Leo Mada via R-help writes:
> Are there any tools to extract the function names called by
> reverse-dependencies?
For well-behaved packages that declare their dependencies correctly,
parsing the NAMESPACE for importFrom() and import() calls should give
you the ex
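A minimal base-R sketch of that NAMESPACE-parsing idea (the package name here is arbitrary; any installed package works). NAMESPACE files use R syntax, so parse() can read the directives directly:

```r
# Sketch: collect importFrom()/import() directives from an installed
# package's NAMESPACE file. "stats" is just an example package.
ns_file <- file.path(find.package("stats"), "NAMESPACE")
directives <- as.list(parse(ns_file))
is_import <- vapply(directives,
                    function(e) as.character(e[[1]]) %in% c("import", "importFrom"),
                    logical(1))
imports <- directives[is_import]
# for both import(pkg, ...) and importFrom(pkg, fun, ...), e[[2]] is the
# package being imported from
imported_pkgs <- unique(vapply(imports, function(e) as.character(e[[2]]),
                               character(1)))
imported_pkgs
```

This only sees declared dependencies, of course; packages that call functions via `::` without importing them will slip through.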
Hello,
I am not at all sure that the following answers the question.
The code below tries to find the optimal number of clusters. One of the
changes I have made to your call to kmeans is to subset DMs without dropping
the dim attribute.
library(cluster)
max_clust <- 10
wss <- numeric(max_clust)
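A hedged completion of the truncated snippet above: the standard within-cluster sum-of-squares ("elbow") loop. `DMs` stands in for the poster's matrix and is replaced here by a built-in dataset for illustration:

```r
library(cluster)  # loaded in the original snippet (not needed for kmeans itself)
set.seed(1)
DMs <- as.matrix(scale(USArrests))  # placeholder for the poster's data
max_clust <- 10
wss <- numeric(max_clust)
for (k in seq_len(max_clust)) {
  # tot.withinss = total within-cluster sum of squares for k clusters
  wss[k] <- kmeans(DMs, centers = k, nstart = 10)$tot.withinss
}
plot(seq_len(max_clust), wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")
```

One then looks for the "elbow" where adding another cluster stops paying off.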
Hi Subhamitra,
I think the fact that you are passing a vector of values rather than a
matrix is part of the problem. As you have only one value for each
country, the points plotted will be the index on the x-axis and the
value for each country on the y-axis. Passing a value for ylim= means
that you
Hello Adrian,
It all depends on what the structure of the dataset is. For instance, you said
that all your values are between -1 and 1. Do the data rows sum (squared) to
1? How about the means? Are they zero? I guess all this has to depend on the
application and how the data were processed or
Dear Marianna,
the function agnes in library cluster can compute Ward's method from a raw
data matrix (at least this is what the help page suggests).
Also, you may not be using the most recent version of hclust. The most
recent version has a note in its help page that states:
"Two different
Do not use html in r-help emails. Look below at what happens to your data.
The error message is telling you that t(data) is not numeric.
> str(data)
That will tell you what kind of data you have.
--
David L Carlson
Associate Professor of Anthropology
Please read the posting guide for future questions.
I presume you mean using the vegan package? If so, then see this blog
post of mine which shows how to do something similar:
http://wp.me/pZRQ9-73
If you post more details and an example I will help further if the blog
post is not sufficient for
On 30.04.2012 18:44, borinot wrote:
Hello to all,
I'm new to R so I have a lot of problems with it, but I'll only ask the main
one.
I have clustered an environmental matrix
We do not know what that is. Where is the example data? See the posting
guide.
with 2 different methods,
Which
PS to my previous posting: Also have a look at kmeansruns in fpc. This
runs kmeans for several numbers of clusters and decides the number of
clusters by either Calinski&Harabasz or Average Silhouette Width.
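A hedged sketch of that kmeansruns call (fpc is a CRAN package, not part of base R; install it first if missing):

```r
# kmeansruns tries each k in krange and picks bestk by the chosen criterion:
# "ch" = Calinski-Harabasz, "asw" = Average Silhouette Width.
if (requireNamespace("fpc", quietly = TRUE)) {
  set.seed(1)
  x <- scale(USArrests)
  res <- fpc::kmeansruns(x, krange = 2:6, criterion = "ch")
  print(res$bestk)  # number of clusters chosen by the CH criterion
}
```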
Christian
On Wed, 10 Aug 2011, Ken Hutchison wrote:
Hello all,
I am using the clust
There is a number of methods in the literature to decide the number of
clusters for k-means. Probably the most popular one is the Calinski and
Harabasz index, implemented as calinhara in package fpc. A distance
based version (and several other indexes to do this) is in function
cluster.stats in
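For reference, the Calinski-Harabasz index itself is simple enough to compute straight from a kmeans fit; this base-R sketch should agree with fpc::calinhara up to implementation details:

```r
# CH(k) = (betweenSS / (k - 1)) / (withinSS / (n - k)); larger is better.
ch_index <- function(km, n) {
  k <- nrow(km$centers)
  (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
}
set.seed(1)
x <- scale(USArrests)
km <- kmeans(x, centers = 3, nstart = 10)
ch_index(km, nrow(x))
```

Computing this for each candidate k and taking the maximum reproduces the usual model-selection recipe.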
On Wed, Aug 10, 2011 at 12:07 PM, Ken Hutchison wrote:
> Hello all,
> I am using the clustering functions in R in order to work with large
> masses of binary time series data, however the clustering functions do not
> seem able to fit this size of practical problem. Library 'hclust' is good
> (t
Try the flow cytometry clustering functions in Bioconductor.
-thomas
On Thu, Aug 11, 2011 at 7:07 AM, Ken Hutchison wrote:
> Hello all,
> I am using the clustering functions in R in order to work with large
> masses of binary time series data, however the clustering functions do not
> see
Yes absolutely, your explanation makes sense. Thanks very much.
rgds
Paul
--
View this message in context:
http://r.789695.n4.nabble.com/clustering-based-on-most-significant-pvalues-does-not-separate-the-groups-tp3644249p3649233.html
Sent from the R help mailing list archive at Nabble.com.
t-tests and the like test for a difference in mean value, not for
non-overlapping populations or data sets.
The fact that the mean of one data set differs significantly from the mean of
the other does not mean that the ranges of the individual points in each data
set are disjoint.
set.seed(10
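The truncated set.seed() fragment above presumably introduced a demonstration along these lines (numbers here are arbitrary): two samples whose means differ significantly while their ranges overlap almost completely.

```r
set.seed(1)
a <- rnorm(100, mean = 0)
b <- rnorm(100, mean = 1)
t.test(a, b)$p.value  # small: the means differ "significantly"
range(a)
range(b)              # ...yet the two ranges overlap heavily
```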
Sure,
but in the end I like to call clusters of genes and not of samples. Actually
the experiment is a time-lapse experiment, therefore the samples (columns)
are fixed anyway.
I guess my misunderstanding is that I get clustering of rows in the latter
case (with dist(t(matrix))) because it's actua
Don't you expect it to be a lot faster if you cluster 20 items instead of 25000?
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Maxim
Sent: Wednesday, March 02, 2011 4:08 PM
To: r-help@r-project.org
Subject: [R] clustering problem
After ordering the table of membership degrees, I must get the difference
between the first and second columns, i.e. between the first and second largest
membership degrees of object i. This for K=2, K=3, up to K.max=6.
This difference is multiplied by the crisp silhouette index vector (si). Too
it d
There are quite a few packages that work with finite mixtures, as
evidenced by the descriptions here:
http://cran.r-project.org/web/packages/index.html
These might be useful:
http://cran.r-project.org/web/packages/flexmix/index.html
http://cran.r-project.org/web/packages/mclust/index.html
-Ma
I must get an index (fuzzy silhouette), a weighted average: an average of the
crisp silhouette for every row (i), where the weight of each term is
determined by the difference between the membership degrees of the corresponding
object to its first and second best matching fuzzy clusters.
i need the differe
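A hedged sketch of that weighted average (all values invented for illustration; `u` is a membership-degree matrix with one row per object, `s` the crisp silhouette value per object):

```r
# Weight for object i: gap between its largest and second-largest
# membership degrees; fuzzy silhouette = weighted mean of crisp silhouettes.
u <- matrix(c(0.66, 0.04, 0.01, 0.30,
              0.02, 0.89, 0.09, 0.00,
              0.06, 0.92, 0.01, 0.01), nrow = 3, byrow = TRUE)
s <- c(0.5, 0.8, 0.7)  # toy crisp silhouette values, one per object
w <- apply(u, 1, function(m) { m <- sort(m, decreasing = TRUE); m[1] - m[2] })
fuzzy_sil <- sum(w * s) / sum(w)
fuzzy_sil
```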
thank you ,you have been very kind
--
View this message in context:
http://r.789695.n4.nabble.com/clustering-fuzzy-tp3229853p3230228.html
Sent from the R help mailing list archive at Nabble.com.
use 'apply':
> head(x.m)
V2 V3 V4 V5
[1,] 0.66 0.04 0.01 0.30
[2,] 0.02 0.89 0.09 0.00
[3,] 0.06 0.92 0.01 0.01
[4,] 0.07 0.71 0.21 0.01
[5,] 0.10 0.85 0.04 0.01
[6,] 0.91 0.04 0.02 0.02
> x.m.sort <- apply(x.m, 1, sort, decreasing = TRUE)
> head(t(x.m.sort))
[,1] [,2] [,3] [,4]
Jüri,
How did you create the output?
An example to cluster transactions with arules can be found in:
Michael Hahsler and Kurt Hornik. Building on the arules infrastructure
for analyzing transaction data with R. In R. Decker and H.-J. Lenz,
editors, /Advances in Data Analysis, Proceedings of t
On Oct 30, 2010, at 7:49 AM, dpender wrote:
David Winsemius wrote:
On Oct 29, 2010, at 12:08 PM, David Winsemius wrote:
On Oct 29, 2010, at 11:37 AM, dpender wrote:
Apologies for being vague,
The structure of the output is as follows:
Still no code?
I am using the Clusters functio
David Winsemius wrote:
>
>
> On Oct 29, 2010, at 12:08 PM, David Winsemius wrote:
>
>>
>> On Oct 29, 2010, at 11:37 AM, dpender wrote:
>>
>>> Apologies for being vague,
>>>
>>> The structure of the output is as follows:
>>
>> Still no code?
>>
>
> I am using the Clusters function from the evd
On Oct 29, 2010, at 12:08 PM, David Winsemius wrote:
On Oct 29, 2010, at 11:37 AM, dpender wrote:
Apologies for being vague,
The structure of the output is as follows:
Still no code?
$ cluster1 : Named num [1:131] 3.05 2.71 3.26 2.91 2.88 3.11 3.21
-1 2.97
3.39 ...
..- attr(*, "nam
On Oct 29, 2010, at 11:37 AM, dpender wrote:
Apologies for being vague,
The structure of the output is as follows:
Still no code?
$ cluster1 : Named num [1:131] 3.05 2.71 3.26 2.91 2.88 3.11 3.21
-1 2.97
3.39 ...
..- attr(*, "names")= chr [1:131] "6667" "6668" "6669" "6670" ...
Wi
Apologies for being vague,
The structure of the output is as follows:
$ cluster1 : Named num [1:131] 3.05 2.71 3.26 2.91 2.88 3.11 3.21 -1 2.97
3.39 ...
..- attr(*, "names")= chr [1:131] "6667" "6668" "6669" "6670" ...
With 613 clusters. What I require is abstracting the first and last va
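A sketch of that first-and-last extraction over a list of clusters (list contents invented for illustration):

```r
# Toy stand-in for the poster's 613-element cluster list.
clusters <- list(cluster1 = c(3.05, 2.71, 3.26, 2.91),
                 cluster2 = c(2.88, 3.11, 3.21))
first_last <- t(vapply(clusters,
                       function(v) c(first = v[1], last = v[length(v)]),
                       c(first = 0, last = 0)))
first_last  # one row per cluster, columns "first" and "last"
```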
On Oct 29, 2010, at 5:14 AM, dpender wrote:
That's helpful but the reason I'm using clusters in evd is that I
need to
specify a time condition to ensure independence.
I believe this is the first we heard about any particular function or
package.
I therefore have an output
We woul
That's helpful but the reason I'm using clusters in evd is that I need to
specify a time condition to ensure independence.
I therefore have an output in the form Cluster[[i]][j-k] where i is the
cluster number and j-k is the range of values above the threshold taking
account of the time condi
John,
Hi, just a general question: when we do hierarchical clustering, should we
compute the dissimilarity matrix based on a scaled or a non-scaled dataset?
daisy() in the cluster package allows standardizing the variables before
calculating the dissimilarity matrix;
I'd say that should depend
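Either way, cluster::daisy exposes the choice directly (cluster is a recommended package shipped with R):

```r
library(cluster)
d_raw <- daisy(USArrests)                # dissimilarities on the raw scale
d_std <- daisy(USArrests, stand = TRUE)  # variables standardized first
c(max(d_raw), max(d_std))                # the two scalings clearly differ
```

With unscaled data, variables with large units (here, Assault) dominate the distances; standardizing gives each variable comparable weight.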
On Oct 28, 2010, at 8:00 AM, dpender wrote:
I am looking to use R in order to determine the number of extreme
events for
a high frequency (20 minutes) dataset of wave heights that spans 25
years
(657,432) data points.
I require the number, spacing and duration of the extreme events as an
I have worked with seismic data measured at 100hz, and had no trouble
locating events in "long" records (several times the size of your
dataset). 20 minutes is high frequency? what kind of waves are
these? what is the wavelength? some details would help.
albyn
On Thu, Oct 28, 2010 at 05:00:10A
Hello Steve,
> I've been asked to help evaluate a vegetation data set, specifically to
> examine it for community similarity. The initial problem I see is that the
> data is ordinal. At best this only captures a relative ranking of
> abundance and ordinal ranks are assigned after data collection
Cc: r-help@r-project.org
Subject: Re: [R] Clustering with or
Steve -
Take a look at daisy() in the cluster package.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Be
Hi Ralph,
In case of hclust, the dendrogram does show the "steps" (they are the
heights presented in the graph).
You can present them also in a matrix using "cutree", for example:
dat <- USArrests
n <- nrow(dat)
hc <- hclust(dist(USArrests))
cutree(hc, k = 1:n)
You might then visualize the
Thank you Etienne, this seems to work like a charm. Also thanks to the rest
of you for your help.
Henrik
On 11 June 2010 13:51, Cuvelier Etienne wrote:
>
>
> On 11/06/2010 12:45, Henrik Aldberg wrote:
>
> I have a directed graph which is represented as a matrix on the form
>>
>>
>> 0 4 0 1
Henrik,
the methods you use are NOT applicable to directed graphs, in the
contrary even. They will split up what you want to put together. In
your data, an author never cites himself. Hence, A and B are far more
different than B and D according to the techniques you use.
Please check out Etiennes
Henrik,
Given your initial matrix, that should tell you which authors are
similar/dissimilar to which other authors in terms of which authors they
cite. In this case authors 1 and 3 are most similar because they both
cite authors 2 and 4. Authors 2 and 3 are most different because they
Dave,
I used daisy with the default settings (daisy(M) where M is the matrix).
Henrik
On 11 June 2010 21:57, Dave Roberts wrote:
> Henrik,
>
>The clustering algorithms you refer to (and almost all others) expect
> the matrix to be symmetric. They do not seek a graph-theoretic solution,
>
Henrik,
The clustering algorithms you refer to (and almost all others)
expect the matrix to be symmetric. They do not seek a graph-theoretic
solution, but rather proximity in geometric or topological space.
How did you convert your matrix to a dissimilarity?
Dave Roberts
Henrik Al
On 11/06/2010 12:45, Henrik Aldberg wrote:
I have a directed graph which is represented as a matrix on the form
0 4 0 1
6 0 0 0
0 1 0 5
0 0 4 0
Each row corresponds to an author (A, B, C, D) and the values say how many
times this author has cited the other authors. Hence the first ro
Ah OK, I didn't get your question then.
A dist object is actually a vector of numbers with a couple of attributes.
You can't just cut values out like that. The hclust function needs a complete
distance matrix to do its calculations.
The shortcut is easy: just do f <- f/(2*max(f)), and all values are b
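One caveat worth spelling out: in R, f/2*max(f) parses as (f/2)*max(f), so the division needs explicit grouping to actually bound the values. A quick check:

```r
d <- dist(USArrests)
d2 <- d / (2 * max(d))  # division grouped; values now lie in [0, 0.5]
max(d2)                 # 0.5; note the dist attributes are preserved
```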
I can't run your code.
Please, just give me whatever comes on your screen when you run:
dput(q)
On Fri, May 28, 2010 at 10:57 PM, Ayesha Khan
wrote:
> I assume my matrix should look something like this?..
>
> >round(distance, 4)
>P00A P00B M02A M02B P04A P04B M06A M06B P0
I assume my matrix should look something like this?..
>round(distance, 4)
P00A P00B M02A M02B P04A P04B M06A M06B P08A
P08B M10A
P00B 0.9678
M02A 1.0054 1.0349
M02B 1.0258 1.0052 1.2106
P04A 1.0247 0.9928 1.0145 0.9260
P04B 0.9898 0.9769 0.9875 0.9855 0.6075
M06A 1.0159 0.
v <- dput(x, "sampledata.txt")
dim(v)
q <- v[1:10, 1:10]
f = as.matrix(dist(t(q)))
distB = NULL
for (k in 1:(nrow(f) - 1)) for (m in (k + 1):ncol(f)) {
  if (f[k, m] < 2) distB = rbind(distB, c(k, m, f[k, m]))
}
# now distB looks like this
> distB
     [,1] [,2]      [,3]
[1,]    1    2 1.6275568
[2,]    1    3 0
Yes Joris. I did try that and it does produce the results. I am now
wondering why I wanted a matrix-like structure in the first place. However,
I do want 'f' to contain values less than 2 only, but when I try to get rid
of values greater than 2 by doing N <- f[f<2], the f structure gets disrupted
and hclust
errr, forget about the output of dput(q), but keep it in mind for next time.
f = dist(t(q))
hclust(f,method="single")
it's as simple as that.
Cheers
Joris
On Fri, May 28, 2010 at 10:39 PM, Ayesha Khan
wrote:
> v <- dput(x,"sampledata.txt")
> dim(v)
> q <- v[1:10,1:10]
> f =as.matrix(dist(t(q)))
Hi Ayesha,
I wish to help you, but without a simple self contained example that shows
your issue, I will not be able to help.
Try using the ?dput command to create some simple data, and let us see what
you are doing.
Best,
Tal
Contact
Details:---
Thanks Tal & Joris!
I created my distance matrix distA by using the dist() function in R
manipulating my output in order to get a matrix.
distA =as.matrix(dist(t(x2))) # x2 being my original dataset
according to the documentation on dist():
For the default method, a "dist" object, or a matrix (of
As Tal said.
Next to that, I read that column1 (and column2?) are supposed to be seen as
factors, not as numerical variables. Did you take that into account somehow?
It's easy to reproduce the error:
> n <- NULL
> if(n<2)print("This is OK")
Error in if (n < 2) print("This is OK") : argument
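The usual defensive pattern for that error is to rule out NULL (length-zero) values before comparing; a minimal sketch:

```r
n <- NULL
# length(NULL) is 0, which is falsy for &&, so the comparison on the
# right-hand side is never evaluated and no error is raised.
ok <- if (length(n) && n < 2) "taken" else "skipped"
ok
```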
Hi Ayesha,
hclust is a way to go (much better then trying to invent the wheel here).
Please add what you used to create:
distA
And create a sample data set to show us what you did, using
dput
Best,
Tal
Dear Paco,
as far as I know, there is no such problem with clara, but I may be wrong.
However, in order to help you (though I'm not sure whether I'll be able to
do that), we'd need to understand precisely what you were doing in R and
what your data looks like (code and data; you can show us a r
On Wednesday 14 October 2009, Paul Evans wrote:
> Hi,
>
> I just wanted to check whether there is a clustering package available for
> ordinal data. My data looks something like: #1 #2 #3 #4.
> A B C D...
> D B C A...
> D C A A...
> where each column represents a sample, and each row some ordin
Hi there,
I'm travelling right now so I can't really check this but it seems that
the problem is that cluster.stats needs a partition as input. hclust
doesn't give you a partition but you can generate one from it using
cutree.
BTW, rather use "<-" than "=".
Best wishes,
Christian
On Wed, 1
I don't have any experience with your particular problem, but the thing I
notice about mahalanobis is that by default you specify a covariance
matrix, and it uses solve to calculate its inverse. If you could supply the
inverse covariance matrix (and specify inverted=TRUE to mahalanobis), that
mi
It would help a lot if you told us what the error message was, and provided
some data to work with. As it is, we can't even run the function to find
out what goes wrong.
And also, OS, version of R - all that stuff that the posting guide requests.
Sarah
On Sat, Nov 8, 2008 at 10:31 AM, Bryan Rich
Here is a recent update: any thoughts?
I have collected a list of experiment result data and put them into a
table.
There are N rows corresponding to N data points.
The i-th row contains data of the form y_i = f(a_i, b_i, c_i, d_i,
e_i, f_i),
where f is a possibly stochastic function, a,
Hi there,
whether clara is a proper way of clustering depends strongly on what your data
are and particularly what interpretation or use you want for your
clustering. You may do better with a hierarchical method after having defined a
proper distance (however this would rather go into statisti
Hi Dani,
If you are working with NMR data, which data pretreatment methods are you
using? 13112 variables for NMR data sounds like a lot; you should apply
some data binning or peak picking methods for data reduction.
You must also consider the multicollinearity problems related to
spectroscopic data, the
Karin Lagesen wrote:
> First I just want to say thanks for all the help I've had from the
> list so far..)
>
> I now have what I think is a clustering problem. I have lots of
> objects which I have measured a dissimilarity between. Now, this list
> only has one entry per pair, so it is not symme
On Fri, 2008-02-15 at 10:45 -0800, Tim Smith wrote:
> Hi,
>
> Is there any clustering package in R that can cluster with ordinal data?
>
> thanks!
daisy() in recommended package 'cluster' can generate dissimilarities
for ordinal data using Gower's general (dis)similarity coefficient for
mixed da
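A small invented example: ordered factors in a data frame are recognized by daisy() and handled via Gower's coefficient, which yields dissimilarities in [0, 1]:

```r
library(cluster)
# Toy ordinal data: two ordered factors standing in for abundance ranks.
df <- data.frame(
  abundance = ordered(c("low", "mid", "high", "mid"),
                      levels = c("low", "mid", "high")),
  cover     = ordered(c(1, 2, 3, 1))
)
d <- daisy(df, metric = "gower")
d  # Gower dissimilarities, bounded by [0, 1]
```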
Thank you very much! I had misunderstood, it's true...
On Nov 28, 2007 6:28 PM, Birgit Lemcke <[EMAIL PROTECTED]> wrote:
> Hello Eleni,
>
> as far as I understood and used agnes() the method argument
> determines only the clustering method.
> If you use diss=TRUE the distances should be taken from
Eleni,
The method= argument is in reference to how clusters are
constructed, not how the dissimilarity or distance is calculated. If
you pass agnes diss=TRUE then it will use the distances you have
calculated by whatever means. method="complete" means that clusters are
evaluated by the
Hello Eleni,
as far as I understood and used agnes() the method argument
determines only the clustering method.
If you use diss=TRUE the distances should be taken from the distance
matrix.
Birgit
Am 28.11.2007 um 12:18 schrieb Eleni Christodoulou:
> Hello all!
>
> I am performing some cluste
Sent: Monday, October 01, 2007 1:37 PM
To: Maura E Monville
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Clustering techniques using R
On Mon, 1 Oct 2007, Maura E Monville wrote:
> Now that I've loaded a file into an R data.frame and played with
> linear regression until I got a good model
On Mon, 1 Oct 2007, Maura E Monville wrote:
> Now that I've loaded a file into an R data.frame and played with
> linear regression until I got a good model, my next step is clustering
> using the coefficients of the regression model (I have many files)
> Thanks to some R experts' guidelines I co