Ah OK, I didn't get your question then.
A dist object is actually a vector of numbers with a couple of attributes
(such as "Size"), so you can't just cut out values like that. The hclust
function needs a complete distance object, with a value for every pair of
items, to do its calculations.
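To make that concrete, here is a minimal sketch on made-up data (the matrix q below is random and only stands in for the real dataset, which I don't have). It shows the attributes a dist object carries, and that logical subsetting returns a bare numeric vector without them, which is why hclust can no longer find the number of objects.

```r
set.seed(1)
q <- matrix(rnorm(40), nrow = 4)   # hypothetical stand-in data: 4 rows, 10 columns
f <- dist(t(q))                    # distances between the 10 columns

class(f)                           # "dist"
attr(f, "Size")                    # 10 objects
length(f)                          # 10 * 9 / 2 = 45 pairwise distances

N <- f[f < 2]                      # subsetting drops the class and the attributes
class(N)                           # "numeric"
is.null(attr(N, "Size"))           # TRUE, so hclust(N) has no n to work with
```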
A shortcut is easy: just rescale with f <- 2*f/max(f), and no value exceeds 2.
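A quick check of such a rescaling, again on made-up data. Keep in mind that R evaluates an expression like f/2*max(f) from left to right, as (f/2)*max(f), so the sketch below uses the form 2 * f / max(f). The result is still a dist object, and as far as I can tell the single-linkage tree only changes in its heights.

```r
set.seed(1)
q <- matrix(rnorm(40), nrow = 4)   # hypothetical stand-in data
f <- dist(t(q))

f2 <- 2 * f / max(f)               # rescale so the largest distance equals 2
class(f2)                          # still "dist": arithmetic keeps the attributes
max(f2)                            # 2

# rescaling by a positive constant only rescales the merge heights
h1 <- hclust(f,  method = "single")
h2 <- hclust(f2, method = "single")
all.equal(h2$height, 2 * h1$height / max(f))
```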
Otherwise this function could convert a three-column list of pairs such as
distB into a dist object for you:
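to.dist <- function(x){
  # x: matrix or data frame with columns (item 1, item 2, distance)
  x.names <- sort(unique(c(x[, 1], x[, 2])))
  n <- length(x.names)
  # note: pairs that are not listed in x keep a distance of 0
  x.dist <- matrix(0, n, n)
  dimnames(x.dist) <- list(x.names, x.names)
  # fill both triangles so that the matrix is symmetric
  x.ind <- rbind(cbind(match(x[, 1], x.names), match(x[, 2], x.names)),
                 cbind(match(x[, 2], x.names), match(x[, 1], x.names)))
  x.dist[x.ind] <- rep(x[, 3], 2)
  x.dist <- as.dist(x.dist)
  return(x.dist)
}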
d <- to.dist(distB)
hclust(d)
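If the goal is the "friends of friends" grouping with a cutoff of 2, one possible alternative (a sketch on made-up data, not something tested on the real dataset) is to keep the full dist object and cut the single-linkage tree at height 2 with cutree(). Items then end up in the same group whenever they are connected by a chain of distances of at most 2. The item names below are hypothetical.

```r
set.seed(1)
q <- matrix(rnorm(40), nrow = 4)     # hypothetical stand-in data
colnames(q) <- paste0("item", 1:10)  # made-up labels
f <- dist(t(q))                      # full, untouched distance object

hc <- hclust(f, method = "single")   # single linkage = "friends of friends"
groups <- cutree(hc, h = 2)          # only merges at height <= 2 are kept
split(names(groups), groups)         # members of each group
```

Cutting the tree leaves the dist object intact, so nothing has to be removed from f beforehand.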
Cheers
Joris
On Sat, May 29, 2010 at 12:04 AM, Ayesha Khan
<[email protected]> wrote:
> Yes Joris. I did try that and it does produce the results. I am now
> wondering why I wanted a matrix-like structure in the first place. However,
> I do want 'f' to contain only values less than 2. But when I try to get rid
> of values greater than 2 by doing N <- f[f<2], the structure of f is
> disrupted and hclust doesn't recognize it any more, because obviously the
> data frame changes again with that. Any ideas on how to do that?
>
>
> On Fri, May 28, 2010 at 4:13 PM, Joris Meys <[email protected]> wrote:
>
>> errr, forget about the output of dput(q), but keep it in mind for next
>> time.
>>
>> f = dist(t(q))
>> hclust(f,method="single")
>>
>> it's as simple as that.
>> Cheers
>> Joris
>>
>>
>> On Fri, May 28, 2010 at 10:39 PM, Ayesha Khan <
>> [email protected]> wrote:
>>
>>> v <- dput(x,"sampledata.txt")
>>> dim(v)
>>> q <- v[1:10,1:10]
>>> f =as.matrix(dist(t(q)))
>>>
>>> distB=NULL
>>> for(k in 1:(nrow(f)-1)) for( m in (k+1):ncol(f)) {
>>> if(f[k,m] <2) distB=rbind(distB,c(k,m,f[k,m]))
>>> }
>>> #now distB looks like this
>>>
>>> > distB
>>> [,1] [,2] [,3]
>>> [1,] 1 2 1.6275568
>>> [2,] 1 3 0.5252058
>>> [3,] 1 4 0.7323116
>>> [4,] 1 5 1.9966001
>>> [5,] 1 6 1.6664110
>>> [6,] 1 7 1.0800540
>>> [7,] 1 8 1.8698925
>>> [8,] 1 10 0.5161808
>>> [9,] 2 3 1.7325811
>>> [10,] 2 5 0.8267843
>>> [11,] 2 6 0.5963280
>>> [12,] 2 7 0.8787230
>>>
>>> #now from this output, I want to cluster all 1's, friends of 1 and
>>> friends of friends of 1 in one cluster. The same goes for 2, 3 and so on.
>>> But when I do that using hclust, I get the following error. I think what
>>> I need to do is convert my current matrix somehow into a format that would
>>> be accepted by the hclust function, but I don't know how to achieve that.
>>> distclust <- hclust(distB,method="single")
>>>
>>> Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
>>> argument is of length zero
>>>
>>> P.S.: Please let me know if this makes things clearer, because I don't know
>>> how looking at the original data set would help: the matrix under
>>> consideration right now is the distance matrix, and the question is how it
>>> can be altered. I have tried as.dist; it doesn't work because my matrix, as
>>> I mentioned earlier, is not a square matrix.
>>> On Fri, May 28, 2010 at 2:37 PM, Tal Galili <[email protected]> wrote:
>>>
>>>> Hi Ayesha,
>>>> I wish to help you, but without a simple self contained example that
>>>> shows your issue, I will not be able to help.
>>>> Try using the ?dput command to create some simple data, and let us see
>>>> what you are doing.
>>>>
>>>> Best,
>>>> Tal
>>>> ---------------- Contact Details: ----------------
>>>> Contact me: [email protected] | 972-52-7275845
>>>> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew)
>>>> | www.r-statistics.com (English)
>>>>
>>>> ----------------------------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, May 28, 2010 at 9:04 PM, Ayesha Khan <
>>>> [email protected]> wrote:
>>>>
>>>>> Thanks Tal & Joris!
>>>>> I created my distance matrix distA by using the dist() function in R and
>>>>> converting its output in order to get a matrix:
>>>>> distA = as.matrix(dist(t(x2))) # x2 being my original dataset
>>>>> according to the documentation on dist():
>>>>>
>>>>> For the default method, a "dist" object, or a matrix (of distances) or
>>>>> an object which can be coerced to such a matrix using as.matrix()
>>>>>
>>>>> On Fri, May 28, 2010 at 6:34 AM, Joris Meys <[email protected]> wrote:
>>>>>
>>>>>> As Tal said.
>>>>>>
>>>>>> Next to that, I read that column1 (and column2?) are supposed to be
>>>>>> seen as factors, not as numerical variables. Did you take that into
>>>>>> account somehow?
>>>>>>
>>>>>> It's easy to reproduce the error code :
>>>>>> > n <- NULL
>>>>>> > if(n<2)print("This is OK")
>>>>>> Error in if (n < 2) print("This is OK") : argument is of length zero
>>>>>>
>>>>>> In the hclust code, you find the following line:
>>>>>> n <- as.integer(attr(d, "Size"))
>>>>>> where d is the distance object passed to the hclust function. Looking
>>>>>> at the error you get, this means that the "Size" attribute of your
>>>>>> distance is NULL, so n has length zero. That tells me that distA is
>>>>>> not a dist object.
>>>>>>
>>>>>> > A <- matrix(1:4,ncol=2)
>>>>>> > A
>>>>>> [,1] [,2]
>>>>>> [1,] 1 3
>>>>>> [2,] 2 4
>>>>>> > hclust(A,method="single")
>>>>>>
>>>>>> Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
>>>>>> argument is of length zero
>>>>>>
>>>>>> Did you actually put in a distance object? see also ?dist or ?as.dist.
>>>>>>
>>>>>> Cheers
>>>>>> Joris
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, May 28, 2010 at 1:41 AM, Ayesha Khan <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> i have a matrix with the following dimensions
>>>>>>> 136 3
>>>>>>>
>>>>>>> and it looks something like
>>>>>>>
>>>>>>> [,1] [,2] [,3]
>>>>>>> [1,] 402 675 1.802758
>>>>>>> [2,] 402 696 1.938902
>>>>>>> [3,] 402 699 1.994253
>>>>>>> [4,] 402 945 1.898619
>>>>>>> [5,] 424 470 1.812857
>>>>>>> [6,] 424 905 1.816345
>>>>>>> [7,] 470 905 1.871252
>>>>>>> [8,] 504 780 1.958191
>>>>>>> [9,] 504 848 1.997111...............
>>>>>>>
>>>>>>> ................................................................................
>>>>>>> so you get the idea. I want to group similar items in one
>>>>>>> group/cluster following the "friends of friends" approach. I tried
>>>>>>> doing
>>>>>>>
>>>>>>> distclust <- hclust(distA,method="single")
>>>>>>> However, I got the following error.
>>>>>>>
>>>>>>> Error in if (n < 2) stop("must have n >= 2 objects to cluster") :
>>>>>>> argument is of length zero
>>>>>>> which probably means there's something wrong with my input here. Is
>>>>>>> there another way of doing this kind of clustering without getting
>>>>>>> into all the looping and ifelse etc.? Basically, if 402 is close to
>>>>>>> 675, 696, and 699 and they thus fall in cluster A, then all items
>>>>>>> close to 675, 696, and 699 should also fall into the same cluster A,
>>>>>>> following a friends-of-friends strategy.
>>>>>>> Any help would be highly appreciated.
>>>>>>>
>>>>>>> --
>>>>>>> Ayesha Khan
>>>>>>>
>>>>>>> MS Bioengineering
>>>>>>> Dept. of Bioengineering
>>>>>>> Rice University, TX
>>>>>>>
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> [email protected] mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Joris Meys
>>>>>> Statistical Consultant
>>>>>>
>>>>>> Ghent University
>>>>>> Faculty of Bioscience Engineering
>>>>>> Department of Applied mathematics, biometrics and process control
>>>>>>
>>>>>> Coupure Links 653
>>>>>> B-9000 Gent
>>>>>>
>>>>>> tel : +32 9 264 59 87
>>>>>> [email protected]
>>>>>> -------------------------------
>>>>>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>
--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Coupure Links 653
B-9000 Gent
tel : +32 9 264 59 87
[email protected]
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php