> On 12 May 2017, at 15:30, Elahe chalabi <chalabi.el...@yahoo.de> wrote:
> 
> 
> 
> Thanks for your reply. What I have is a data frame whose rows contain the 
> words used in each speech and whose columns contain the frequencies of 
> those words. An extra row gives the type of each speech: whether it came 
> from the control group or the Alzheimer group. I then create training and 
> test sets for KNN from this data frame and use KNN to classify the 
> speeches (actually the text of each speech), assigning each one to the 
> control group or the Alzheimer group. 
> Now my question is: how can I visualize my KNN classifier or its results? 
> At the moment I only have an accuracy matrix from KNN.
> 
> Thanks for any help!
> Elahe 


It would be very helpful if you provided a minimal example so that we can 
understand your data and what you have done with it. You described your data 
in words, but it is still unclear, so I have created a minimal example for you.

For simplicity, I have a data.frame with 3 columns. The first two are numeric 
and the last one is a factor. The Group column holds the real classes. The A 
and B columns are a numeric representation of these classes; let's call them 
features, because they carry the hidden information that characterises a 
class. I use about 33% of the data for training and about 67% for testing.

This is the part you asked about. After classification, I have a factor 
variable, test.guess.cluster, that contains the classes predicted by the knn 
method (you mentioned an accuracy matrix from KNN; I don't know what that is). 
Now I want to see the predicted classes on a plot, so I convert 
test.guess.cluster to numeric and use it to colour the points. I plot the 
points in the test.df data.frame (A versus B) and colour them by predicted 
class.

Finally, I evaluate the overall performance of the knn model: is it good or 
bad? Please note that you have to choose your own _k_ value and the size of 
the training set by trial and error.


library(class)
library(gmodels)
set.seed(6)
df <- data.frame(A = c(rnorm(30, 0), rnorm(30, 3)),
                 B = c(rnorm(30, 0), rnorm(30, 3)),
                 Group = factor(c(rep("G1", 30), rep("G2", 30))))
# use about 33% of the data for training and about 67% for testing
i <- sample(2, nrow(df), replace = TRUE, prob = c(0.67, 0.33))
train.df <- df[i == 2, -3] # training features (drop the Group column)
train.cl <- df[i == 2, 3] # true classes of the training rows
test.df <- df[i == 1, -3] # test features
test.real.cluster <- df[i == 1, 3] # true classes of the test rows
# classes predicted by knn
test.guess.cluster <- knn(train = train.df, test = test.df, cl = train.cl, k = 3)
# convert them to numeric to colour the points on the plot
test.guess.cluster.num <- as.numeric(test.guess.cluster)
plot(test.df, col = test.guess.cluster.num, pch = test.guess.cluster.num)

# examine the result of CrossTable
# The model identified 2 G1 classes as G2 and 1 G2 class as G1.
# Hence, 3 elements are misclassified. (you can distinguish them on the plot)
gm <- gmodels::CrossTable(test.guess.cluster, test.real.cluster, prop.chisq = FALSE)
sum(diag(gm$prop.tbl)) # overall success of the model (34 - 3)/34
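
If you also want to see the classifier itself rather than only the predicted 
points, one possible approach is to predict a class for every point on a 
regular grid and draw the resulting regions. The following is only a sketch 
under my own assumptions: it rebuilds the same kind of simulated data as 
above, and the grid resolution (100 x 100), the colours and the variable names 
grid, grid.cl and wrong are arbitrary choices of mine. It only works directly 
when there are two numeric features, so for word-frequency data you would 
first have to pick or construct two dimensions to plot.

```r
library(class)  # knn(); a recommended package that ships with R

set.seed(6)
# Same kind of simulated data as in the example above
df <- data.frame(A = c(rnorm(30, 0), rnorm(30, 3)),
                 B = c(rnorm(30, 0), rnorm(30, 3)),
                 Group = factor(c(rep("G1", 30), rep("G2", 30))))
i <- sample(2, nrow(df), replace = TRUE, prob = c(0.67, 0.33))
train.df <- df[i == 2, -3]
train.cl <- df[i == 2, 3]
test.df <- df[i == 1, -3]
test.real.cluster <- df[i == 1, 3]
test.guess.cluster <- knn(train = train.df, test = test.df, cl = train.cl, k = 3)

# Regular grid covering the feature space (resolution is an arbitrary choice)
grid <- expand.grid(A = seq(min(df$A), max(df$A), length.out = 100),
                    B = seq(min(df$B), max(df$B), length.out = 100))
# Class that knn would assign at every grid point
grid.cl <- knn(train = train.df, test = grid, cl = train.cl, k = 3)

# Pale background: region assigned to each class
plot(grid$A, grid$B, col = c("grey70", "pink")[as.numeric(grid.cl)],
     pch = 15, cex = 0.4, xlab = "A", ylab = "B",
     main = "knn (k = 3) decision regions")
# Test points coloured by their true class
points(test.df$A, test.df$B, pch = 19,
       col = c("black", "red")[as.numeric(test.real.cluster)])
# Blue circles around test points whose predicted class differs from the true one
wrong <- test.guess.cluster != test.real.cluster
points(test.df$A[wrong], test.df$B[wrong], col = "blue", cex = 2.5, lwd = 2)
```

A test point sitting in the region of the other colour is a misclassified 
one, which is why the circled points fall on the "wrong" side of the boundary.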

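The trial and error over _k_ mentioned above can be made a bit more 
systematic. A minimal sketch, assuming leave-one-out cross-validation with 
knn.cv() from the class package is acceptable for your data: the candidate 
range of k (odd values from 1 to 15) and the names features, k.values, 
loo.accuracy and best.k are my own choices, and the data are again simulated. 
In real use you would run this on the training portion only and keep the test 
set aside for the final evaluation.

```r
library(class)  # knn.cv(): leave-one-out cross-validation for knn

set.seed(6)
# Same kind of simulated data as in the example above
df <- data.frame(A = c(rnorm(30, 0), rnorm(30, 3)),
                 B = c(rnorm(30, 0), rnorm(30, 3)),
                 Group = factor(c(rep("G1", 30), rep("G2", 30))))
features <- df[, c("A", "B")]

# Candidate k values; odd values reduce the chance of tied votes with two classes
k.values <- seq(1, 15, by = 2)

# For each k, classify every row using all the other rows and record accuracy
loo.accuracy <- sapply(k.values, function(k) {
  pred <- knn.cv(train = features, cl = df$Group, k = k)
  mean(pred == df$Group)
})

print(data.frame(k = k.values, accuracy = round(loo.accuracy, 3)))

# First k with the highest leave-one-out accuracy
best.k <- k.values[which.max(loo.accuracy)]

plot(k.values, loo.accuracy, type = "b",
     xlab = "k", ylab = "leave-one-out accuracy")
abline(v = best.k, lty = 2)  # mark the selected k
```

The accuracy curve is often fairly flat over a range of k, so treat best.k as 
a starting point rather than a definitive answer.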



> 
> 
> On Monday, May 8, 2017 3:55 PM, Ismail SEZEN <sezenism...@gmail.com> wrote:
> 
> 
> 
> As far as I know, kNN groups by Euclidean distance, so you need numerical 
> data as input. You said your dataset has only “speeches” and “type of 
> people”. Are both of these inputs, or is one the input and the other the 
> output? Type of people should be a factor variable (I guess). I don’t know 
> how you represent “speech” in your dataset: as character data or as a 
> numerical representation of a feature? If you send a minimal example of the 
> problem, we can help you. Please read the posting guide.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
