Re: Ability to have CountVectorizerModel vocab as empty

2020-08-19 Thread Jatin Puri
Thanks Sean for the quick response.

Logged a Jira: https://issues.apache.org/jira/browse/SPARK-32662

Will send a pull request shortly.

Regards,
Jatin

On Wed, Aug 19, 2020 at 6:58 PM Sean Owen wrote:
> I think that's true. You're welcome to open a pull request / JIRA to
> remove that requireme

Re: Ability to have CountVectorizerModel vocab as empty

2020-08-19 Thread Sean Owen
I think that's true. You're welcome to open a pull request / JIRA to remove that requirement.

On Wed, Aug 19, 2020 at 3:21 AM Jatin Puri wrote:
>
> Hello,
>
> This is wrt
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244
>
>

Ability to have CountVectorizerModel vocab as empty

2020-08-19 Thread Jatin Puri
Hello,

This is wrt
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244

require(vocab.length > 0, "The vocabulary size should be > 0. Lower minDF as necessary.")

Currently, if `CountVectorizer` is trained on an empty dataset resu
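
For readers who want to see the effect of the quoted `require` in isolation, here is a minimal sketch in plain Python. It is not Spark's implementation: `build_vocab`, `count_vector`, and the `allow_empty` flag are hypothetical names made up for illustration, and the only thing taken from the thread is the check itself and its error message. With the check active, fitting on an empty dataset fails; with it relaxed, the result is an empty vocabulary and zero-length count vectors.

```python
from collections import Counter
from typing import Iterable, List


def build_vocab(docs: Iterable[List[str]], min_df: int = 1,
                allow_empty: bool = False) -> List[str]:
    """Hypothetical stand-in for vocabulary fitting (not Spark code)."""
    doc_freq: Counter = Counter()
    for tokens in docs:
        # Count each term at most once per document (document frequency).
        doc_freq.update(set(tokens))
    # Keep terms that appear in at least min_df documents, in sorted order.
    vocab = sorted(term for term, df in doc_freq.items() if df >= min_df)
    if not vocab and not allow_empty:
        # Mirrors the quoted require(vocab.length > 0, ...) check.
        raise ValueError(
            "The vocabulary size should be > 0. Lower minDF as necessary.")
    return vocab


def count_vector(tokens: List[str], vocab: List[str]) -> List[int]:
    """Count occurrences of each vocabulary term in one document."""
    index = {term: i for i, term in enumerate(vocab)}
    counts = [0] * len(vocab)
    for token in tokens:
        if token in index:
            counts[index[token]] += 1
    return counts


if __name__ == "__main__":
    print(build_vocab([["a", "b"], ["a"]]))    # non-empty input works as usual
    print(build_vocab([], allow_empty=True))   # relaxed check: empty vocabulary
    try:
        build_vocab([])                        # strict check: raises
    except ValueError as err:
        print("error:", err)
```

The `allow_empty` flag exists only to show both behaviours side by side; the thread proposes dropping the requirement, not adding a switch.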