> On 12 May 2017, at 15:30, Elahe chalabi <chalabi.el...@yahoo.de> wrote:
>
> Thanks for your reply. What I exactly have is a data frame with rows
> containing words which have been used in each speech and columns containing
> frequency of these words, I have an extra row showing the type of the speech
> whether it was from a control group or Alzheimer group. Then I create a
> training and test set for KNN from this data frame and by KNN I classify the
> speeches which assigns every speech (actually text of the speech!) to the
> correct type of group, if it's from control group or Alzheimer group.
> Now my question is how can I visualize my KNN classifier or its results?
> cause now I only have an accuracy matrix from KNN!
>
> Thanks for any help!
> Elahe
It would be very helpful if you provided a minimal example so that we can understand your data and what you have done with it. You described your data in words, but it is still unclear, so I created a minimal example for you.

For simplicity, I have a data.frame with 3 columns. The first two are numeric and the last one is a factor. The Group column holds the real classes. The A and B columns are numeric representations of these classes; let's call them features, because they carry the hidden information that identifies a class. I use about 33% of the data for training and 67% for test.

This is the point you asked about. After classification, I have a test.guess.cluster (factor) variable that contains the classes predicted by the knn method (you mentioned an accuracy matrix from KNN; I don't know what that is). Now I want to see the classes on a plot. That is why I converted the test.guess.cluster variable to numeric, so I can use it to colour the points. I plotted the points in the test.df data.frame (A versus B) and coloured them by predicted class. At the end, I evaluated the overall performance of the knn model: is it good or bad? Please note that you have to choose your own k value and the size of the training dataset by trial and error.
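On the trial-and-error point for k, here is a rough sketch, separate from my example. It assumes the same kind of simulated data; the names train.idx, k.values and k.results and the candidate values of k are my own arbitrary choices. It uses knn.cv() from the class package, which does leave-one-out cross-validation on the training data, so the test set is not touched while choosing k.

```r
library(class)

set.seed(6)
df <- data.frame(A = c(rnorm(30, 0), rnorm(30, 3)),
                 B = c(rnorm(30, 0), rnorm(30, 3)),
                 Group = factor(c(rep("G1", 30), rep("G2", 30))))

# hypothetical split: 20 rows for training (an arbitrary size)
train.idx <- sample(nrow(df), 20)
train.df <- df[train.idx, -3]
train.cl <- df[train.idx, 3]

# candidate k values to try (arbitrary, odd to reduce ties)
k.values <- c(1, 3, 5, 7, 9)

# leave-one-out accuracy on the training data for each k
accuracy <- sapply(k.values, function(k) {
  guess <- knn.cv(train = train.df, cl = train.cl, k = k)
  mean(guess == train.cl)
})

k.results <- data.frame(k = k.values, accuracy = accuracy)
print(k.results)
```

The k with the highest accuracy is a reasonable starting point, but with so few rows the differences between neighbouring k values may just be noise.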
The minimal example:

library(class)
library(gmodels)

set.seed(6)
df <- data.frame(A = c(rnorm(30, 0), rnorm(30, 3)),
                 B = c(rnorm(30, 0), rnorm(30, 3)),
                 Group = factor(c(rep("G1", 30), rep("G2", 30))))

# use about 33% of the data for training and 67% for test
i <- sample(2, nrow(df), replace = TRUE, prob = c(0.67, 0.33))
train.df <- df[i == 2, -3] # do not include last column
train.cl <- df[i == 2, 3] # training result clusters
test.df <- df[i == 1, -3] # test data.frame
test.real.cluster <- df[i == 1, 3] # real clusters for test

# predicted clusters by knn
test.guess.cluster <- knn(train = train.df, test = test.df, cl = train.cl, k = 3)

# convert them to numeric to colorize points on the plot
test.guess.cluster.num <- as.numeric(test.guess.cluster)
plot(test.df, col = test.guess.cluster.num, pch = test.guess.cluster.num)

# examine the result of CrossTable
# The model identified 2 G1 classes as G2 and 1 G2 class as G1.
# Hence, 3 elements are misclassified. (you can distinguish them on the plot)
gm <- gmodels::CrossTable(test.guess.cluster, test.real.cluster, prop.chisq = FALSE)
sum(diag(gm$prop.tbl)) # overall success of the model
(34 - 3)/34

> On Monday, May 8, 2017 3:55 PM, Ismail SEZEN <sezenism...@gmail.com> wrote:
>
> As far as I know, kNN groups by Euclidean distance. So, you need numerical
> data as input. You said your dataset has only “speeches” and “type of
> people”. Are these both inputs, or is one of them the input and the other
> the output? Type of people should be a factor variable (I guess). I don’t
> know how you represent “speech” in your dataset. As character, or as a
> numerical representation of a feature? If you send a minimal example of the
> problem, we can help you. Please read the posting guide.
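One more way to look at the classifier itself, rather than only its predictions on the test set, is to predict the class for every point of a regular grid over the two features and plot the resulting regions. This is only a sketch under the same simulated-data assumption; train.idx, grid and grid.guess are names I made up, and the grid resolution and k = 3 are arbitrary. It works directly only with two features, so with many word-frequency columns you would first have to pick or construct two of them.

```r
library(class)

set.seed(6)
df <- data.frame(A = c(rnorm(30, 0), rnorm(30, 3)),
                 B = c(rnorm(30, 0), rnorm(30, 3)),
                 Group = factor(c(rep("G1", 30), rep("G2", 30))))

# hypothetical split: 20 rows for training (an arbitrary size)
train.idx <- sample(nrow(df), 20)
train.df <- df[train.idx, c("A", "B")]
train.cl <- df$Group[train.idx]

# regular 60 x 60 grid covering the range of both features
grid <- expand.grid(A = seq(min(df$A), max(df$A), length.out = 60),
                    B = seq(min(df$B), max(df$B), length.out = 60))

# class predicted by knn at every grid point
grid.guess <- knn(train = train.df, test = grid, cl = train.cl, k = 3)

# small dots show the predicted class over the grid (the decision regions)
plot(grid$A, grid$B, col = as.numeric(grid.guess), pch = ".", cex = 2,
     xlab = "A", ylab = "B")

# overlay the training points, coloured and shaped by their real class
points(train.df$A, train.df$B,
       col = as.numeric(train.cl), pch = as.numeric(train.cl))
```

The border between the two colours is where the classifier switches from one group to the other; a smaller k usually gives a more ragged border, a larger k a smoother one.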
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.