Re: Ability to have CountVectorizerModel vocab as empty

2020-08-19 Thread Jatin Puri
> remove that requirement. > > On Wed, Aug 19, 2020 at 3:21 AM Jatin Puri wrote: > > > > Hello, > > > > This is wrt > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244 > > > > require(vocab.

Ability to have CountVectorizerModel vocab as empty

2020-08-19 Thread Jatin Puri
Hello, This is wrt https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244 require(vocab.length > 0, "The vocabulary size should be > 0. Lower minDF as necessary.") Currently, if `CountVectorizer` is trained on an empty dataset resu
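
For readers unfamiliar with the check being discussed, here is a minimal plain-Python sketch of the behaviour. It is not Spark's code. The function names `build_vocab`, `fit_strict` and `fit_lenient` are hypothetical, and the document-frequency ordering is a simplification chosen for illustration. `fit_strict` mirrors the quoted `require`, while `fit_lenient` shows what allowing an empty vocabulary would look like.

```python
from collections import Counter


def build_vocab(docs, min_df=1):
    """Return terms that occur in at least min_df documents.

    Terms are ordered by document frequency (descending), then alphabetically.
    This ordering is an assumption made for the sketch.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, n in ranked if n >= min_df]


def fit_strict(docs, min_df=1):
    """Mimic the quoted require: fail when the vocabulary is empty."""
    vocab = build_vocab(docs, min_df)
    if not vocab:
        raise ValueError(
            "The vocabulary size should be > 0. Lower minDF as necessary."
        )
    return vocab


def fit_lenient(docs, min_df=1):
    """Hypothetical alternative: accept an empty vocabulary."""
    return build_vocab(docs, min_df)
```

With an empty dataset, `fit_strict([])` raises while `fit_lenient([])` returns an empty list, which is the difference the thread is about.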

Re: Java 11 support in Spark 2.5

2020-01-02 Thread Jatin Puri
From this (http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27966), it looks like there is no confirmation yet whether Spark 2.5 would have JDK 11 support at all. Spark 3 would most likely be out soon (tentatively this quarter, as per the mailing list).

Re: Lightweight pipeline execution for single row

2018-09-26 Thread Jatin Puri
and tried a few. But they didn't seem to work. I am using Spark 2.3.1. Thanks. On Sun, Sep 23, 2018 at 6:00 PM Michael Artz wrote: > Are you using the scheduler in fair mode instead of fifo mode? > > Sent from my iPhone > > > On Sep 22, 2018, at 12:58 AM, Jatin Puri

Lightweight pipeline execution for single row

2018-09-21 Thread Jatin Puri
Hi. What tactics can I apply for such a scenario? I have a pipeline of 10 stages. Simple text processing. I train the data with the pipeline and for the fitted data, do some modelling and store the results. I also have a web-server, where I receive requests. For each request (dataframe of single

Spark with Scala 2.12

2018-04-20 Thread Jatin Puri
Hello. I am wondering if there is any new update on the Spark upgrade to Scala 2.12 (https://issues.apache.org/jira/browse/SPARK-14220), especially given that Scala 2.13 is close to release. I ask because there is no recent update on the Jira and related ticket. Maybe someone is act