Hi, HashingTF has a parameter called "numFeatures", and I was wondering what the best way to choose its value is. For text categorization, do you need to know the number of words in your vocabulary in advance? Or do you just set it to a large value, greater than the number of words in your vocabulary?
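To make the question concrete, here is a minimal sketch of how I understand the hashing trick behind "numFeatures". It is plain Python, not Spark: the function names `hashing_tf` and `count_collisions` are made up for illustration, and `zlib.crc32` is only a stand-in for whatever hash function Spark actually uses, so the bucket indices will not match HashingTF's output.

```python
import zlib
from collections import Counter


def hashing_tf(tokens, num_features):
    """Map tokens to a sparse term-frequency dict {bucket_index: count}.

    Hypothetical helper: each token is hashed and reduced modulo
    num_features, so no vocabulary has to be built beforehand.
    """
    counts = Counter()
    for tok in tokens:
        # crc32 is an assumption here, chosen because it is deterministic
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        counts[idx] += 1
    return dict(counts)


def count_collisions(vocabulary, num_features):
    """Return how many distinct words are lost to shared buckets."""
    words = set(vocabulary)
    buckets = {zlib.crc32(w.encode("utf-8")) % num_features for w in words}
    return len(words) - len(buckets)


doc = "the quick brown fox jumps over the lazy dog".split()
print(hashing_tf(doc, num_features=16))
print(count_collisions(doc, num_features=4))
print(count_collisions(doc, num_features=1 << 20))
```

If this picture is right, the vocabulary size does not have to be known exactly, but a value of "numFeatures" that is small relative to the vocabulary forces different words into the same bucket, which is why I am asking how large it should be.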
Thanks, Jianguo