Good afternoon. I hope I have provided enough info to get my question answered.
I am running Windows 10, R 3.5.1, and RStudio version 1.1.456.

When running a k-means clustering routine, is it possible to get the actual data from each cluster into a data frame? I have reviewed a number of tutorials and, unless I missed it somewhere, none of them covers this, so I would like to know if it is possible.

https://www.datacamp.com/community/tutorials/k-means-clustering-r
https://www.guru99.com/r-k-means-clustering.html
https://datascienceplus.com/k-means-clustering-in-r/
https://datascienceplus.com/finding-optimal-number-of-clusters/
http://enhancedatascience.com/2017/10/24/machine-learning-explained-kmeans/
http://enhancedatascience.com/2017/04/30/r-basics-k-means-r/

For example, I ran the code below and got:

K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797

Can the 1511 values of SavingsReversed and ProviderID, the 1610 values of SavingsReversed and ProviderID, etc. be written out into separate data frames?

Thank you for your help.

WHP

str(rr0)
Classes 'data.table' and 'data.frame': 14355 obs. of 2 variables:
 $ SavingsReversed: num 0 0 61 128 160 ...
 $ ProviderID     : num 113676 113676 116494 116641 116641 ...
 - attr(*, ".internal.selfref")=<externalptr>

head(rr0, n=35)
    SavingsReversed ProviderID
 1:            0.00     113676
 2:            0.00     113676
 3:           61.00     116494
 4:          128.25     116641
 5:          159.60     116641
 6:          372.66     119316
 7:           18.79     121319
 8:           15.64     121319
 9:            0.00     121319
10:           18.79     121319
11:           23.00     121319
12:           18.79     121319
13:            0.00     121319
14:           25.86     121319
15:           14.00     121319
16:          113.00     121545
17:           50.00     121545
18:         1155.32     121545
19:          113.00     121545
20:          197.20     121545
21:            0.00     121780
22:           36.00     122536
23:         1171.32     125198
24:         1171.32     125198
25:           43.00     125303
26:            0.00     125881
27:           69.64     128435
28:          420.18     128435
29:          175.18     128435
30:           71.54     128435
31:           99.85     128435
32:            0.00     128435
33:           42.75     128435
34:          175.18     128435
35:          846.45     128435

set.seed(213)
rr0a <- kmeans(rr0, 10)
View(rr0a)
summary(rr0a)
#              Length Class  Mode
# cluster      14355  -none- numeric
# centers         20  -none- numeric
# totss            1  -none- numeric
# withinss        10  -none- numeric
# tot.withinss     1  -none- numeric
# betweenss        1  -none- numeric
# size            10  -none- numeric
# iter             1  -none- numeric
# ifault           1  -none- numeric

x1 <- as.data.frame(rr0a$centers)
sort(x1)
#    SavingsReversed ProviderID
# 2         75.19665  2773789.2
# 3         99.31959  4147091.6
# 5        101.21070  3558532.7
# 4        103.41147  3893274.4
# 1        105.38310  2241031.2
# 8        114.61562  3240701.5
# 10       121.14184  4718727.6
# 9        153.70536  4470878.9
# 6        156.84426  5560636.6
# 7        185.09745   173732.9

print(rr0a)
# K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797
#
# Cluster means:
#    SavingsReversed ProviderID
# 1        105.38310  2241031.2
# 2         75.19665  2773789.2
# 3         99.31959  4147091.6
# 4        103.41147  3893274.4
# 5        101.21070  3558532.7
# 6        156.84426  5560636.6
# 7        185.09745   173732.9
# 8        114.61562  3240701.5
# 9        153.70536  4470878.9
# 10       121.14184  4718727.6
#
# Within cluster sum of squares by cluster:
#  [1] 74529288379846 25846368411171  4692898666512  6277704963344  8428785199973 90824041558798  1468798013919 12143462193009  5483877005233
# [10] 51547955737867
#  (between_SS / total_SS = 98.7 %)
#
# Available components:
#
# [1]
"cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault" Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}} ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.