Good evening, when I use the Chi2 algorithm for feature selection with discretization, I get a discretization that differs from the one described by Liu and Setiono in their 1997 paper.
As stated in the R documentation (discretization, from CRAN: <https://cran.r-project.org/web/packages/discretization/discretization.pdf>), the R package discretization offers the function chi2, which is based on the following papers:

  Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Tools with Artificial Intelligence, 388–391.
  Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, no. 4, 642–645.

I wrote the following R code, with alpha and delta set to the values used in the papers above; it prints the discretized data frame. I used the iris data set, as in one of the examples in the two papers. The first paper states that alpha = 0.5 and delta = 5%, and that "the originally odd numbered data are selected for training (75 patterns) and rest for testing (75 patterns)". With this setup, the Sepal attributes should be removed.

    library(discretization)
    data(iris)

    # keep the odd-numbered rows as the training set (75 patterns)
    df1 <- iris[FALSE, ]
    for (i in 1:nrow(iris)) {
      if (i %% 2 != 0) {
        df1 <- rbind(df1, iris[i, ])
      }
    }

    chi2(df1, alp = 0.5, del = 0.05)$Disc.data

The point is that, looking at the data frame printed by the last instruction, no attribute is removed. The discretized data frame still has all 4 attributes discretized: if I understood the papers correctly, Sepal Length and Sepal Width should each have been discretized into just one interval by the Chi2 algorithm.

I have posted a question here: http://stats.stackexchange.com/questions/247499/why-does-not-r-chi2-algorithm-discretize-in-the-same-manner-as-in-the-paper-by-l?noredirect=1#comment470974_247499

Moreover, it is really hard to understand the cut points produced by the Chi2 implementation in R.
For example,

    res <- chi2(iris, 0.5, 0.05)
    cut(iris$Sepal.Length, res$cutp, labels=FALSE)

is different from

    res$Disc.data$Sepal.Length

Please help me understand.

Best regards
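
A minimal sketch of how the two observations above could be checked, not a definitive diagnosis. It assumes that res$cutp is a list with one vector of cut points per attribute (as the package documentation describes), and that interval codes can be rebuilt with base cut() using its default right-closed intervals; whether the package closes intervals on the same side is an assumption to verify. The helpers n_intervals and recode are hypothetical names introduced only for this illustration.

```r
# Odd-numbered rows of iris as the training set (75 patterns), without a loop.
train <- iris[seq(1, nrow(iris), by = 2), ]

# Hypothetical helper: number of distinct interval codes per numeric attribute.
# An attribute that ends up with a single interval could be dropped.
n_intervals <- function(disc) {
  disc <- as.data.frame(disc)
  sapply(disc[sapply(disc, is.numeric)], function(x) length(unique(x)))
}

# Hypothetical helper: rebuild interval codes for ONE attribute from ITS cut
# points. Open outer bounds are added so that no value falls outside the breaks.
# Assumption: right-closed intervals (the default of cut()).
recode <- function(x, cuts) {
  cut(x, breaks = c(-Inf, sort(cuts), Inf), labels = FALSE)
}

# The package call is guarded so the sketch still runs without it installed.
if (requireNamespace("discretization", quietly = TRUE)) {
  res <- discretization::chi2(train, alp = 0.5, del = 0.05)
  print(n_intervals(res$Disc.data))

  # Compare one attribute at a time: pass res$cutp[[1]], not the whole list.
  cuts <- res$cutp[[1]]
  if (is.numeric(cuts)) {
    print(table(rebuilt = recode(train$Sepal.Length, cuts),
                package = res$Disc.data$Sepal.Length))
  }
}
```

If the cross-table is not diagonal, the difference may come from the side on which intervals are closed, so trying cut(..., right = FALSE) in recode would be a reasonable next check.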