[R] Knowledge discovery

2010-07-02 Thread abanero

Hi,

I have 10 units with 10 attributes (attr1, attr2, attr3, etc.).

For instance:

unit  attr1  attr2  attr3  ...

1     a      ww     12
2     a      re     11
3     b      ww     09
4     c      yt     02
5     a      qw     02
...

I'd like to answer the following questions:

a) What are the most frequent combinations of attributes?
b) How could I describe the relations among the attributes?
c) What are the most significant values of each attribute, and how are they
related to the values of the other attributes?

Can you suggest any specific method to answer these questions?
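
As a starting point for questions a) and b), here is a minimal sketch in base R. The data frame below is made up to mirror the layout in the post, and the column names are simply the ones used there; it is one possible way to count combinations, not a recommendation of a specific method.

```r
# Toy data mirroring the layout in the post (all values are made up).
units <- data.frame(
  attr1 = c("a", "a", "b", "c", "a", "a"),
  attr2 = c("ww", "re", "ww", "yt", "qw", "ww"),
  attr3 = c(12, 11, 9, 2, 2, 12)
)

# a) Most frequent combinations: count identical (attr1, attr2) pairs.
combo_counts <- aggregate(n ~ attr1 + attr2,
                          data = cbind(units, n = 1), FUN = sum)
combo_counts <- combo_counts[order(-combo_counts$n), ]
print(head(combo_counts))

# b) Relation between two categorical attributes: a contingency table,
#    plus the distribution of attr2 within each level of attr1.
tab <- table(units$attr1, units$attr2)
cond <- prop.table(tab, margin = 1)
print(tab)
print(round(cond, 2))
```

With ten attributes the number of possible combinations grows quickly, so in practice one would probably restrict the count to a few columns at a time.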

Thanks  


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Knowledge-discovery-tp2276207p2276207.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Knowledge discovery

2010-07-02 Thread abanero

With the "table" function you can only build a contingency table.
What do you think about the "arules" package? I thought mining association
rules would be the right approach to the problem.
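
A hedged sketch of what the association-rule route might look like. The data are made up, the thresholds are arbitrary, and the arules step only runs if that package happens to be installed; the hand computation of support and confidence underneath is there to show what the two measures mean.

```r
# Toy categorical data (made-up values); arules expects factors or logicals.
units <- data.frame(
  attr1 = factor(c("a", "a", "b", "c", "a", "a", "b", "a")),
  attr2 = factor(c("ww", "re", "ww", "yt", "ww", "ww", "ww", "re"))
)

# Support and confidence of the rule {attr1=b} => {attr2=ww}, by hand.
supp_b_ww <- mean(units$attr1 == "b" & units$attr2 == "ww")
conf_b_ww <- supp_b_ww / mean(units$attr1 == "b")

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(units, "transactions")   # one item per attribute=value pair
  rules <- apriori(trans,
                   parameter = list(support = 0.2, confidence = 0.5,
                                    minlen = 2),
                   control = list(verbose = FALSE))
  inspect(sort(rules, by = "lift"))
} else {
  message("Package 'arules' is not installed; skipping the rule mining step.")
}
```

Numeric attributes would need to be discretized into factors before the conversion to transactions.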

Thanks
Abanero
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Knowledge-discovery-tp2276207p2276368.html


Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-09-27 Thread abanero

Hi Ulrich,
I'm studying the principles of Affinity Propagation and I'm really glad to
use your package (apcluster) to cluster my data. I have just one
issue to solve.

If I call the function: apcluster(sim)

where sim is the matrix of dissimilarities, I sometimes get the
warning message:

"Algorithm did not converge. Turn on details
and call plot() to monitor net similarity. Consider
increasing maxits and convits, and, if oscillations occur
also increasing damping factor lam."
 
together with too high a number of clusters.

I thought I could solve the problem by setting the argument "p" of
apcluster() to mean(preferenceRange(sim)):


apcluster(sim, p=mean(preferenceRange(sim)))

and it actually seems to be a good solution, because I don't receive any
warning message and the number of clusters is lower.

Do you think it's a good solution? I should mention that I have to use
apcluster() in an automatic procedure, so I can't tune the arguments of
the function by hand.
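
For reference, a minimal sketch of the call with an explicit preference and more generous convergence settings, as the warning suggests. The data are simulated and the values of maxits, convits and lam are arbitrary choices. Note that apcluster() expects similarities (larger means more alike), so a dissimilarity matrix would need its sign flipped first; the sketch uses negative squared Euclidean distances.

```r
set.seed(42)
# Two well separated groups of 20 points each (simulated).
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))

# Similarity = negative squared Euclidean distance.
s <- -as.matrix(dist(x))^2

if (requireNamespace("apcluster", quietly = TRUE)) {
  library(apcluster)
  p_range <- preferenceRange(s)        # lower and upper bound for p
  fit <- apcluster(s, p = mean(p_range),
                   maxits = 2000, convits = 200, lam = 0.95)
  cat("clusters found:", length(fit@clusters), "\n")
} else {
  message("Package 'apcluster' is not installed; skipping the clustering step.")
}
```

Lower preferences generally give fewer clusters, so in an automatic procedure the preference is the main knob to fix in advance.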

Thanks in advance.
Giuseppe
-- 
View this message in context: 
http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2715278.html


[R] Time series clustering

2010-08-24 Thread abanero

Hi,

I have 1000 monthly time series (covering just one year) and I want to cluster them.

For instance (x):

  jan 2010  feb 2010  mar 2010  apr 2010 ...

ts 1:   12300 12354550  1233   12312 ...
ts 2:23423232  2323 232323 ...
...  

My approach is to apply the clara algorithm to the standardized data:

clarax <- clara(x, k = 10, stand = TRUE)

Is that a correct approach?
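
One point worth checking with a small experiment: stand = TRUE standardizes each column, i.e. each month across all series, not each series over time. If the goal is to group series by shape rather than by level, scaling each row first is an alternative. The sketch below uses simulated series and k = 2 purely for illustration.

```r
library(cluster)
set.seed(1)
months <- 12

# 200 simulated series: half rising, half falling, at very different levels.
level <- runif(200, 100, 10000)
trend <- rep(c(1, -1), each = 100)
x <- t(sapply(seq_len(200), function(i) {
  level[i] * (1 + trend[i] * 0.05 * seq_len(months)) +
    rnorm(months, sd = level[i] * 0.01)
}))

# Scale each series (row) to mean 0 and sd 1 so that only its shape remains.
x_rows <- t(scale(t(x)))

fit <- clara(x_rows, k = 2, samples = 20)
tab <- table(cluster = fit$clustering, trend = trend)
print(tab)
```

Whether row scaling or column scaling is appropriate depends on what "similar" should mean for the series at hand.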
 
Thanks

Giuseppe
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Time-series-clustering-tp2336343p2336343.html


[R] daisy(): space allocation issue

2010-08-26 Thread abanero

Hi,

I'm trying to apply the function daisy() to a 1x10 data.frame, but I don't
have enough memory (error message: cannot allocate vector of length
1476173280).

I didn't expect to be unable to work with a matrix of just 1
observations. I have set --max-mem-size=2G in Rgui (I'm not able to allocate
more memory).

How can I solve this issue? By splitting the observations according to some rules?
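
Some arithmetic helps explain the error. daisy() stores one dissimilarity per pair of observations, that is n(n-1)/2 values, and the length in the error message equals exactly that for n = 54336. Treating 54336 as the number of rows is an inference from the arithmetic, not something stated in the post. The second part is a rough sketch of working on a random subsample instead; the data frame and the subsample size of 500 are made up.

```r
library(cluster)

# Number of pairwise dissimilarities for n observations.
n_pairs <- function(n) n * (n - 1) / 2

n <- 54336
pairs <- n_pairs(n)            # 1476173280, the length in the error message
gib <- pairs * 8 / 2^30        # 8 bytes per double: roughly 11 GiB
cat(sprintf("%.0f pairs, about %.1f GiB\n", pairs, gib))

# Sketch: compute dissimilarities on a random subsample only.
set.seed(1)
big <- data.frame(num = rnorm(5000),
                  cat = factor(sample(letters[1:4], 5000, replace = TRUE)))
idx <- sample(nrow(big), 500)
d_sub <- daisy(big[idx, ], metric = "gower")
length(d_sub)                  # 500 * 499 / 2 values
```

For purely numeric data, clara() follows the same idea internally by clustering subsamples, so it never builds the full dissimilarity matrix.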

thanks
-- 
View this message in context: 
http://r.789695.n4.nabble.com/daisy-space-allocation-issue-tp2339844p2339844.html


[R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-26 Thread abanero

Hi,
I have 1,000 observations with 10 attributes (of different types: numeric,
dichotomous, categorical, etc.) and a measure M.

I need to cluster these observations in order to assign a new observation
(with the same 10 attributes but not the measure) to a cluster.

For the new observation, I want to calculate a measure as the average of the
measures M of the observations in the assigned cluster.

I would use cluster analysis (the “Clara” algorithm?) and then “knn1” (in
package class) to assign the new observation to a cluster.

The problem is: I’m not able to use “knn1” because some of the attributes are
categorical.

Do you know of something like “knn1” that also works with categorical
variables? Do you have any suggestions?
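
One possible way to handle mixed attribute types, sketched under several assumptions: Gower dissimilarities from daisy(), pam() for the clustering, and assignment of the new observation to the nearest medoid. The data, the column names and k = 2 are made up, and this is only one of several options rather than a drop-in replacement for knn1().

```r
library(cluster)
set.seed(1)
n <- 60

# Simulated training data with numeric, binary and categorical attributes.
train <- data.frame(
  size = c(rnorm(n / 2, 10), rnorm(n / 2, 20)),
  flag = factor(rep(c("no", "yes"), each = n / 2)),
  kind = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
M <- c(rnorm(n / 2, 100), rnorm(n / 2, 200))   # the measure to predict

# Cluster on Gower dissimilarities, which accept mixed types.
fit <- pam(daisy(train, metric = "gower"), k = 2)

# New observation: same attributes and factor levels, but no measure.
new_obs <- data.frame(
  size = 19.5,
  flag = factor("yes", levels = levels(train$flag)),
  kind = factor("b", levels = levels(train$kind))
)

# Dissimilarity of the new observation to each medoid, then nearest medoid.
all_d <- as.matrix(daisy(rbind(train, new_obs), metric = "gower"))
d_to_medoids <- all_d[n + 1, fit$id.med]
cluster_new <- unname(which.min(d_to_medoids))

# Predicted measure: average M within the assigned cluster.
M_hat <- mean(M[fit$clustering == cluster_new])
print(M_hat)
```

Recomputing daisy() on the training rows plus the new row keeps the Gower range scaling consistent, at the cost of redoing the full matrix for each prediction.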

-- 
View this message in context: 
http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2231656.html


Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread abanero

Hi,

Thank you, Joris and Ulrich, for your answers.

Joris Meys wrote: 

>see the library randomForest for example


I'm trying to find an example using randomForest with categorical variables,
but I haven't found any. Do you know of an example with both categorical
and numerical variables? In any case, I don't have any class labels yet. How
could I find clusters with randomForest?
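
For what it is worth, randomForest() can be called without a response, in which case it runs in unsupervised mode and can return a proximity matrix. The sketch below feeds that proximity into hierarchical clustering; the data, ntree and k = 2 are made-up choices, and the step only runs if the package is installed.

```r
set.seed(7)
# Simulated data with one numeric and one categorical variable, no labels.
df <- data.frame(
  size = c(rnorm(30, 10), rnorm(30, 20)),
  kind = factor(c(sample(c("a", "b"), 30, replace = TRUE),
                  sample(c("c", "d"), 30, replace = TRUE)))
)

if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  # No y argument: unsupervised mode; proximity = TRUE keeps the n x n matrix.
  rf <- randomForest(x = df, ntree = 500, proximity = TRUE)
  # Turn proximities into dissimilarities and cluster them.
  hc <- hclust(as.dist(1 - rf$proximity), method = "average")
  groups <- cutree(hc, k = 2)
  print(table(groups))
} else {
  message("Package 'randomForest' is not installed; skipping this step.")
}
```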


Ulrich wrote:

>Probably the simplest way is Affinity Propagation[...] All you need is a
>way of measuring the similarity of samples which is straightforward both
>for numerical and categorical variables.

I had a look at the documentation of the apcluster package. It's
interesting, but do you have an example of using it with both categorical and
numerical variables? I'd like to test it with a large dataset.
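
A sketch of one way to build such a similarity for mixed variables: take Gower dissimilarities from daisy() in package cluster and use one minus the dissimilarity as the similarity. Both the data and this choice of similarity are assumptions made for illustration; the apcluster step only runs if the package is installed.

```r
library(cluster)
set.seed(3)

# Simulated mixed-type data.
df <- data.frame(
  size = c(rnorm(25, 10), rnorm(25, 20)),
  flag = factor(rep(c("no", "yes"), each = 25)),
  kind = factor(sample(c("a", "b", "c"), 50, replace = TRUE))
)

# Gower dissimilarities lie in [0, 1]; turn them into similarities.
s <- 1 - as.matrix(daisy(df, metric = "gower"))

if (requireNamespace("apcluster", quietly = TRUE)) {
  library(apcluster)
  fit <- apcluster(s, q = 0)   # q = 0: preference at the minimum similarity
  cat("clusters found:", length(fit@clusters), "\n")
} else {
  message("Package 'apcluster' is not installed; skipping the clustering step.")
}
```

The similarity matrix is n x n, so for a large dataset memory becomes the limiting factor well before the algorithm itself.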

Thanks a lot!
Cheers

Giuseppe

-- 
View this message in context: 
http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2232950.html


Re: [R] cluster analysis and supervised classification: an alternative to knn1?

2010-05-27 Thread abanero


Ulrich wrote: 
>Affinity propagation produces quite a number of clusters. 


I tried with q=0 and it produces 17 clusters. Anyway, that's a good idea,
thanks. I'm looking forward to testing it with my dataset.

So I'll probably use daisy() to compute an appropriate dissimilarity, then
apcluster() or another method to determine the clusters.

What do you suggest for assigning a new observation to one of the resulting
clusters?

It seems that randomForest doesn't work with both numerical and categorical
predictors (thanks to Joris).

Christian wrote: 
>and the implement
>nearest neighbours classification myself if I needed it. 
>It should be pretty straightforward to implement. 

Do you mean modifying the code of the knn1() function yourself?
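
To illustrate how little code a hand-written nearest-neighbour rule needs, here is a minimal sketch of 1-NN classification on Gower dissimilarities, so that categorical attributes are allowed. The function name nn1_gower and the toy data are hypothetical; this is not the code of knn1().

```r
library(cluster)

# Hypothetical helper: label of the nearest training row for each new row.
nn1_gower <- function(train, labels, new_obs) {
  n <- nrow(train)
  d <- as.matrix(daisy(rbind(train, new_obs), metric = "gower"))
  # Rows after n are the new observations; columns 1..n are training rows.
  d_new <- d[-seq_len(n), seq_len(n), drop = FALSE]
  labels[apply(d_new, 1, which.min)]
}

# Toy training data with cluster labels from some earlier clustering.
train <- data.frame(
  size = c(1, 2, 10, 11),
  kind = factor(c("a", "a", "b", "b"), levels = c("a", "b"))
)
labels <- c(1, 1, 2, 2)

new_obs <- data.frame(
  size = c(1.5, 10.5),
  kind = factor(c("a", "b"), levels = c("a", "b"))
)

pred <- nn1_gower(train, labels, new_obs)
print(pred)
```

The new observations must use the same column names and factor levels as the training data, otherwise rbind() will not line them up correctly.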


Thanks to everyone!

-- 
View this message in context: 
http://r.789695.n4.nabble.com/cluster-analysis-and-supervised-classification-an-alternative-to-knn1-tp2231656p2233210.html