Hi.
This bug still exists in 2.4.4:
https://issues.apache.org/jira/browse/SPARK-10848
The `nullable` value is always set to `true`, at least when reading via `json()`.
Should I log a new issue?
Is there a temporary workaround?
Regards,
Jatin
This may be a bug. The source can be found at:
https://github.com/purijatin/spark-retrain-bug
*Issue:*
The program takes a set of documents as input, where each document is in a
separate file.
The Spark program computes the tf-idf of the terms (Tokenizer -> stop-word
remover -> stemming -> tf -> tf-idf).
Once
Hello.
As part of `org.apache.spark.ml.feature.IDFModel`, I think it is a good
idea to also expose:
1. Document frequency vector
2. Number of documents
We currently get both of the above for free; they just need to be exposed as
public vals.
This avoids re-implementation for someone who needs to comp
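
For illustration, here is a rough sketch of what a caller could do once the document frequency vector and the number of documents are available. It is plain Python rather than Spark code, the function name and the toy values are hypothetical, and it assumes the smoothed formula given in the Spark IDF documentation, log((m + 1) / (df + 1)), without any `minDocFreq` filtering.

```python
import math


def idf_from_doc_freq(doc_freq, num_docs):
    """Compute smoothed IDF weights from per-term document frequencies.

    Assumes the formula log((m + 1) / (df + 1)), where m is the number of
    documents and df is the number of documents containing the term.
    """
    return [math.log((num_docs + 1) / (df + 1)) for df in doc_freq]


# Hypothetical toy corpus: 3 documents, 3 terms.
doc_freq = [3, 1, 0]  # term 0 is in every document, term 2 is in none
num_docs = 3

idf = idf_from_doc_freq(doc_freq, num_docs)
print(idf)
```

With both values exposed, the weights can be recomputed or inspected directly instead of re-deriving the document frequencies from the corpus.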
one long and is already computed. This would
> have to be added to PySpark too.
>
> On Mon, Jan 14, 2019 at 7:56 AM Jatin Puri wrote:
> >
> > Hello.
> >
> > As part of `org.apache.spark.ml.feature.IDFModel`, I think it is a good
> > idea to also expose:
> >
Thanks. Created: https://issues.apache.org/jira/browse/SPARK-26616
On Mon, Jan 14, 2019 at 9:19 PM Sean Owen wrote:
> Yes that seems OK to me.
>
> On Mon, Jan 14, 2019 at 9:40 AM Jatin Puri wrote:
> >
> > Thanks for the response. So do I go ahead and create a jira ticket