[R] Knowledge discovery
Hi,

I have 10 units with 10 attributes (attr1, attr2, attr3, etc.). For instance:

unit  attr1  attr2  attr3  ...
1     a      ww     12
2     a      re     11
3     b      ww     09
4     c      yt     02
5     a      qw     02
...

I'd like to answer these questions:
a) What are the most frequent combinations of attributes?
b) How could I describe the relations among the attributes?
c) What are the most significant values for each attribute, and how do they relate to the values of the other attributes?

Do you suggest any specific method to answer these questions?

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/Knowledge-discovery-tp2276207p2276207.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Knowledge discovery
With the "table" function you can only build a contingency table. What do you think about the "arules" package? I thought mining association rules was the right approach to the problem.

Thanks
Abanero

--
View this message in context: http://r.789695.n4.nabble.com/Knowledge-discovery-tp2276207p2276368.html
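For readers following the thread, here is a minimal base-R sketch of the contingency-table route for question (a). The data values are made up for illustration, and only two attributes are cross-tabulated. With ten attributes the table grows quickly, which is where an association-rule package such as arules (named above, not used here) may be the better fit.

```r
# Illustrative data in the layout of the question (values are made up).
units <- data.frame(
  attr1 = c("a", "a", "b", "c", "a", "a"),
  attr2 = c("ww", "re", "ww", "yt", "ww", "ww"),
  stringsAsFactors = TRUE
)

# Cross-tabulate the two attributes and flatten to one row per combination.
combos <- as.data.frame(table(units[c("attr1", "attr2")]), responseName = "n")

# Keep only combinations that actually occur, most frequent first.
combos <- combos[combos$n > 0, ]
combos <- combos[order(-combos$n), ]
print(combos)
```

The first row of combos is then the most frequent attr1/attr2 pair in the toy data.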
Re: [R] cluster analysis and supervised classification: an alternative to knn1?
Hi Ulrich,

I'm studying the principles of Affinity Propagation and I'm really glad to use your package (apcluster) to cluster my data. I have just one issue to solve. If I apply the function:

apcluster(sim)

where sim is the matrix of dissimilarities, I sometimes get the warning message:

"Algorithm did not converge. Turn on details and call plot() to monitor net similarity. Consider increasing maxits and convits, and, if oscillations occur also increasing damping factor lam."

together with too many clusters. I tried to solve the problem by setting the argument "p" of apcluster() to mean(preferenceRange(sim)):

apcluster(sim, p=mean(preferenceRange(sim)))

and it does seem to be a good solution, because I no longer receive any warning message and the number of clusters is lower. Do you think it's a good solution?

I should add that I have to use apcluster() in an automatic procedure, so I can't adjust the arguments of the function by hand.

Thanks in advance.
Giuseppe

--
View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2715278.html
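A minimal sketch of the preference setting described above, assuming the apcluster package is installed (the block skips the clustering step otherwise). The toy data, the Gower-based similarity and the raised maxits/convits values are illustrative assumptions, not taken from the post. One point worth checking in this setup: apcluster() expects similarities, so a dissimilarity such as the one returned by daisy() is negated before use.

```r
library(cluster)  # daisy(); a recommended package shipped with R

set.seed(42)
# Illustrative mixed-type data: two well-separated groups.
toy <- data.frame(
  x = c(rnorm(10, mean = 0), rnorm(10, mean = 8)),
  g = factor(rep(c("a", "b"), each = 10))
)

if (requireNamespace("apcluster", quietly = TRUE)) {
  # Negate Gower dissimilarities so that larger values mean "more alike".
  sim <- -as.matrix(daisy(toy, metric = "gower"))

  # preferenceRange() gives a lower and an upper bound for the preference;
  # their mean is the midpoint proposed in the post.
  pr <- apcluster::preferenceRange(sim)

  # maxits and convits are raised here as the warning message suggests.
  res <- apcluster::apcluster(sim, p = mean(pr), maxits = 2000, convits = 200)

  n_clusters <- length(res@exemplars)
  print(n_clusters)
}
```

Since p, maxits, convits and lam are all ordinary arguments, an automatic procedure can set them in the call itself rather than by hand.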
[R] Time series clustering
Hi,

I have 1000 monthly time series (just one year each) and I want to cluster them. For instance (x):

jan 2010 feb 2010 mar 2010 apr 2010 ...
ts 1: 12300 12354550 1233 12312 ...
ts 2: 23423232 2323 232323 ...
...

My approach is to apply the clara algorithm to the standardized data:

clara(x,k=10,stand=TRUE)->clarax

Is that a correct approach?

Thanks
Giuseppe

--
View this message in context: http://r.789695.n4.nabble.com/Time-series-clustering-tp2336343p2336343.html
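One detail that may matter for the approach above: stand=TRUE in clara() standardizes each column (each month), not each series. If the aim is to group series by shape rather than by magnitude, a possible alternative is to scale each row first. The sketch below uses simulated data (rising versus falling series at very different levels) and k = 2. Both are assumptions made for illustration.

```r
library(cluster)  # clara(); a recommended package shipped with R

set.seed(1)
months <- 12
ramp <- seq_len(months)

# Simulated series: 30 rising and 30 falling, at very different levels.
level <- runif(60, min = 100, max = 10000)
up <- outer(level[1:30], ramp)
down <- outer(level[31:60], rev(ramp))
noise <- matrix(rnorm(60 * months, mean = 1, sd = 0.01), nrow = 60)
x <- rbind(up, down) * noise

# Scale each series (row) to mean 0 and sd 1, so only its shape remains.
x_rows <- t(scale(t(x)))

# Cluster the row-scaled series; no further column standardization.
fit <- clara(x_rows, k = 2)
print(table(fit$clustering))
```

Whether row scaling or column scaling is appropriate depends on what "similar" should mean for the series in question.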
[R] daisy(): space allocation issue
Hi,

I'm trying to apply the function daisy() to a data.frame 1x10 but I don't have enough memory (error message: cannot allocate vector of length 1476173280). I didn't expect to be unable to work with a matrix of just 1 observations...

I have set --max-mem-size=2G in Rgui (I'm not able to set more space).

How can I solve this issue? By splitting the observations according to some rules?

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/daisy-space-allocation-issue-tp2339844p2339844.html
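A short arithmetic sketch of why the allocation fails, assuming the vector in the error message is the set of pairwise dissimilarities that daisy() stores: one value per pair of observations, n(n-1)/2 in total, at 8 bytes per double. The function name n_pairs is introduced here only for illustration.

```r
# Number of pairwise dissimilarities for n observations.
n_pairs <- function(n) n * (n - 1) / 2

# Vector length reported in the error message.
len <- 1476173280

# Solve n * (n - 1) / 2 = len for n.
n <- (1 + sqrt(1 + 8 * len)) / 2
print(n)

# Memory for that many doubles, in GiB.
gib <- len * 8 / 2^30
print(round(gib, 1))
```

Under that assumption the length corresponds to 54336 observations and roughly 11 GiB, far beyond a 2G limit. Options that avoid the full matrix include computing daisy() on a random subsample, or clara(), which works on subsamples but accepts numeric variables only.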
[R] cluster analysis and supervised classification: an alternative to knn1?
Hi,

I have 1,000 observations with 10 attributes (of different types: numeric, dichotomous, categorical, etc.) and a measure M.

I need to cluster these observations in order to assign a new observation (with the same 10 attributes but without the measure) to a cluster. For the new observation, I want to calculate a measure as the average of the measures M of the observations in the assigned cluster.

I would use cluster analysis (the “Clara” algorithm?) and then “knn1” (in package class) to assign the new observation to a cluster. The problem is that I’m not able to use “knn1” because some of the attributes are categorical.

Do you know of something like “knn1” that also works with categorical variables? Do you have any suggestions?

--
View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2231656.html
Re: [R] cluster analysis and supervised classification: an alternative to knn1?
Hi,

thank you Joris and Ulrich for your answers.

Joris Meys wrote:
>see the library randomForest for example

I'm trying to find an example in randomForest with categorical variables, but I haven't found anything. Do you know of any example with both categorical and numerical variables? Anyway, I don't have any class labels yet. How could I find clusters with randomForest?

Ulrich wrote:
>Probably the simplest way is Affinity Propagation[...] All you need is a way of measuring the similarity of
>samples which is straightforward both for numerical and categorical variables.

I had a look at the documentation of the package apcluster. That's interesting, but do you have any example using it with both categorical and numerical variables? I'd like to test it with a large dataset.

Thanks a lot!
Cheers
Giuseppe

--
View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2232950.html
Re: [R] cluster analysis and supervised classification: an alternative to knn1?
Ulrich wrote:
>Affinity propagation produces quite a number of clusters.

I tried with q=0 and it produces 17 clusters. Anyway, that's a good idea, thanks. I'm looking forward to testing it with my dataset.

So I'll probably use daisy() to compute an appropriate dissimilarity, then apcluster() or another method to determine the clusters.

What do you suggest in order to assign a new observation to one of the clusters found? It seems that randomForest doesn't work with both numerical and categorical predictors (thanks to Joris).

Christian wrote:
>and the implement
>nearest neighbours classification myself if I needed it.
>It should be pretty straightforward to implement.

Do you mean modifying the code of the knn1() function yourself?

Thanks to everyone!

--
View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2233210.html
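A minimal sketch of the plan outlined in this thread, under stated assumptions: Gower dissimilarities from daisy() for the mixed attributes, pam() as a stand-in for whichever clustering method is finally chosen, and a hand-rolled 1-nearest-neighbour step in place of knn1(). The data, the variable names and k = 2 are made up for illustration.

```r
library(cluster)  # daisy(), pam(); a recommended package shipped with R

# Illustrative training data with numeric and categorical attributes.
train <- data.frame(
  size  = c(1.0, 1.2, 0.9, 5.0, 5.3, 4.8),
  color = factor(c("red", "red", "red", "blue", "blue", "blue")),
  flag  = factor(c("y", "y", "n", "n", "n", "y"))
)
M <- c(10, 12, 11, 50, 52, 48)  # measure known for training rows only

# Cluster on Gower dissimilarities, which handle mixed types.
fit <- pam(daisy(train, metric = "gower"), k = 2)

# New observation, with the same factor levels as the training data.
new_obs <- data.frame(
  size  = 5.1,
  color = factor("blue", levels = levels(train$color)),
  flag  = factor("n", levels = levels(train$flag))
)

# Recompute dissimilarities with the new row appended. Note that Gower
# rescales numeric columns by their range, which includes the new row.
d_all <- as.matrix(daisy(rbind(train, new_obs), metric = "gower"))
n <- nrow(train)

# 1-nearest neighbour: the closest training row decides the cluster.
nearest <- which.min(d_all[n + 1, seq_len(n)])
cluster_new <- fit$clustering[nearest]

# Predicted measure: mean of M over the assigned cluster.
M_hat <- mean(M[fit$clustering == cluster_new])
print(M_hat)
```

For large data the full dissimilarity matrix becomes the bottleneck, so in practice the new row would be compared against cluster representatives (for example medoids or exemplars) rather than against every training row.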