Hello, I've looked around and I can't seem to find a package for data
mining in R on a mixture of categorical and numerical attributes.

If you have this data set:

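## dummy data
## (second categorical column renamed L, an assumed name, so it does not
## clash with the numeric column B used in the boxplot below)
set.seed(123)
dummy <- data.frame(A = sample(paste("tasks", 1:100), 10000, replace = TRUE),
                    L = sample(paste("loads", 1:100), 10000, replace = TRUE),
                    B = rnorm(10000),
                    stringsAsFactors = TRUE)  # factors, so levels(dummy$A) works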

## We can then try this:

op <- par(mar = c(5,6,4,2) + 0.1)
boxplot(B ~ A, data = dummy, horizontal = TRUE, axes = FALSE)
axis(side = 1)
axis(side = 2, at = seq_along(levels(dummy$A)),
    labels = levels(dummy$A), cex.axis = 0.5,
    las = 1)
box()
par(op)

The dummy data above has 10,000 rows with a mix of categorical columns
("tasks", "loads") and numeric columns.
Is it possible to do data mining, such as clustering, on a mixture of
categorical and numeric variables? If so, what package should I be studying
or using? Is random forest the only algorithm that can handle a mixture of
attributes?
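
For reference, here is a minimal sketch of one approach that accepts mixed
columns, assuming the recommended cluster package is installed: Gower
dissimilarity via daisy(), followed by partitioning around medoids with pam().
The toy data frame, its column names, the sample size and k = 3 are all
illustrative assumptions, not a recommendation for the real data.

```r
library(cluster)  # assumed available; it ships with most R installations

set.seed(123)
n <- 300  # kept small: daisy() stores all n * (n - 1) / 2 pairwise values

## hypothetical toy data: two categorical columns and one numeric column
toy <- data.frame(
  task  = factor(sample(paste("tasks", 1:5), n, replace = TRUE)),
  load  = factor(sample(paste("loads", 1:5), n, replace = TRUE)),
  value = rnorm(n)
)

## Gower dissimilarity: simple matching for factors,
## range-scaled absolute difference for numeric columns
d <- daisy(toy, metric = "gower")

## partition around medoids on the precomputed dissimilarities;
## k = 3 is an arbitrary choice for illustration
fit <- pam(d, k = 3)

## cluster sizes
table(fit$clustering)
```

On the full 10,000 rows the dissimilarity object would hold roughly 50 million
values, so clustering a subsample first may be more practical.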

Thanks

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.