Re: Error and doubts in using Mllib Naive bayes for text clasification

2014-07-08 Thread Rahul Bhojwani
Thanks Xiangrui. You have solved almost all my problems :)
On Wed, Jul 9, 2014 at 1:47 AM, Xiangrui Meng wrote:
> 1) The feature dimension should be a fixed number before you run
> NaiveBayes. If you use bag of words, you need to handle the
> word-to-index dictionary by yourself. You can either

Re: Error and doubts in using Mllib Naive bayes for text clasification

2014-07-08 Thread Xiangrui Meng
1) The feature dimension should be a fixed number before you run NaiveBayes. If you use bag of words, you need to handle the word-to-index dictionary yourself. You can either ignore the words that never appear in training (because they have no effect on prediction), or use hashing to randomly pr
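The two options mentioned in this reply can be sketched in plain Python. This is only an illustration under assumptions, not code from the thread: the helper names (`build_vocabulary`, `vectorize_with_vocab`, `vectorize_with_hashing`), the whitespace tokenization, the MD5-based bucket choice, and the default of 1000 features are all hypothetical. The point is that either approach yields vectors of a fixed length before training.

```python
import hashlib
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct training word a fixed column index (hypothetical helper)."""
    vocab = {}
    for doc in documents:
        for word in doc.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def vectorize_with_vocab(doc, vocab):
    """Build a count vector of length len(vocab); words unseen in training are ignored."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(doc.lower().split()).items():
        idx = vocab.get(word)
        if idx is not None:
            vec[idx] = float(count)
    return vec


def vectorize_with_hashing(doc, num_features=1000):
    """Hashing alternative: map each word to one of num_features buckets.

    A stable digest is used instead of the built-in hash(), which is
    randomized per process for strings. Different words may collide.
    """
    vec = [0.0] * num_features
    for word in doc.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % num_features] += 1.0
    return vec


train_docs = ["spark is fast", "naive bayes is simple"]
vocab = build_vocabulary(train_docs)

# "and" never appeared in training, so it is dropped.
print(vectorize_with_vocab("spark is simple and fast", vocab))
```

With the dictionary approach the dimension equals the training vocabulary size; with hashing it is whatever `num_features` is set to, at the cost of occasional collisions. Either list could then be wrapped as the feature vector of a labeled example for training.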

Re: Error and doubts in using Mllib Naive bayes for text clasification

2014-07-08 Thread Rahul Bhojwani
I am really sorry. It's actually my mistake. My problem 2 is wrong because using a single feature makes no sense. Sorry for the inconvenience. I am still waiting for solutions to problems 1 and 3.
Thanks,
On Tue, Jul 8, 2014 at 12:14 PM, Rahul Bhojwani wrote:
> Hello,
>
> I a

Error and doubts in using Mllib Naive bayes for text clasification

2014-07-07 Thread Rahul Bhojwani
Hello, I am a novice. I want to classify text into two classes. For this purpose I want to use the Naive Bayes model, and I am using Python. Here are the problems I am facing: *Problem 1:* I wanted to use all words as features for the bag-of-words model, which means my features will be count