Hello,
This is with regard to
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244
require(vocab.length > 0, "The vocabulary size should be > 0. Lower minDF
as necessary.")
Currently, if `CountVectorizer` is trained on an empty dataset resu
-dev
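The guard in question can be sketched outside Spark; with an empty vocabulary (which is what fitting on an empty dataset produces) the require throws an IllegalArgumentException whose message misleadingly suggests tuning minDF. A standalone sketch (the vocab value is illustrative):

```scala
// Sketch of the guard at CountVectorizer.scala#L244, in isolation.
val vocab: Array[String] = Array.empty[String] // fit() on an empty dataset yields no terms

val message: String =
  try {
    require(vocab.length > 0, "The vocabulary size should be > 0. Lower minDF as necessary.")
    "no error"
  } catch {
    // require prefixes the message with "requirement failed: "
    case e: IllegalArgumentException => e.getMessage
  }

println(message)
```

The point of the report is that "Lower minDF as necessary" is unhelpful when the real cause is an empty input dataset.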
Hi,
I have used Spark with HDFS encrypted via Hadoop KMS, and it worked well. I
cannot recall whether Kubernetes was in the mix, though. From the error alone
it is not clear what caused the failure. Can I reproduce this somehow?
Thanks,
On Sat, Aug 15, 2020 at 7:18 PM Michel Su
Hi!
It seems I'm doing something wrong. I call .checkpoint() on an RDD, but it's
not checkpointed...
What am I doing wrong?
val recordsRDD = convertToRecords(anotherRDD)
recordsRDD.checkpoint()
logger.info("checkpoint done")
logger.info(s"isCheckpointed? ${recordsRDD.isCheckpointed},
getCheckpointFile: ${recor
Hi Ivan,
Unlike cache/persist, calling checkpoint() on an RDD returns Unit and only
marks the RDD for checkpointing; the checkpoint data is written the first
time an action runs on it. It is also recommended to persist the RDD first,
so that checkpointing does not recompute it. In your case:
val recordsRDD = convertToRecords(anotherRDD)
recordsRDD.persist()
recordsRDD.checkpoint()
recordsRDD.count() // any action; this materializes the checkpoint
Best,
Jacob
On Wed, 19 Aug 2020 at 14:39, Ivan Petrov wrote:
> Hi!
> Seems like I do smth wron
I think that's true. You're welcome to open a pull request / JIRA to
remove that requirement.
On Wed, Aug 19, 2020 at 3:21 AM Jatin Puri wrote:
>
> Hello,
>
> This is wrt
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244
>
>
Hi Prashant,
I have the problem only on K8s; it works fine when Spark runs on top of YARN.
I wonder whether the delegation token actually gets saved; any idea how to
check that? Could it be because KMS is in HA and Spark requests 2 delegation
tokens?
For the testing, just running spark3 on top of
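One way to check whether tokens are being obtained at all is to turn on debug logging for Spark's delegation-token machinery (a config sketch; the logger name assumes Spark 3's org.apache.spark.deploy.security package, where HadoopDelegationTokenManager lives):

```
# log4j.properties fragment (driver side) -- a sketch, adjust to your logging setup
log4j.logger.org.apache.spark.deploy.security=DEBUG
log4j.logger.org.apache.hadoop.security=DEBUG
```

With this enabled, the driver logs which token providers run and which tokens are fetched, which should show whether one or two KMS tokens are being requested.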
I did that, and I see the lineage change
BEFORE calling an action. No success.
Job$ - isCheckpointed? false, getCheckpointFile: None
Job$ - recordsRDD.toDebugString:
(2) MapPartitionsRDD[7] at map at Job.scala:112 []
| MapPartitionsRDD[6] at map at Job.scala:111 []
| MapPartitionsRDD[5] at map at s
Spark determines whether it can use the checkpoint at runtime, so you'll be
able to see it in the UI but not in the plan: you are looking at the plan
before the job actually runs, and that is when Spark checks whether it can
use the checkpoint in the lineage.
Here is a two-stage job for example:
*scala
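Russell's example is cut off above; a minimal sketch of the kind of two-stage job he may have meant (illustrative names, assuming a SparkContext sc with a checkpoint directory already set) is:

```scala
// assumes: sc.setCheckpointDir("/tmp/checkpoints") has been called
val base = sc.parallelize(1 to 100).map(x => (x % 10, x))
base.persist()    // recommended, so checkpointing does not recompute the RDD
base.checkpoint() // only marks the RDD; nothing is written yet
base.count()      // first action: the job runs and the checkpoint is saved

val totals = base.reduceByKey(_ + _) // shuffle boundary => a second stage
totals.count()    // this job reads from the checkpoint, not the original lineage

println(base.isCheckpointed) // true only after the first action above
println(base.toDebugString)  // lineage now shows a ReliableCheckpointRDD
```

In the UI, the second job reads from the checkpointed data, even though a toDebugString printed before any action would still show the full lineage.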
Awesome, thanks for explaining it.
On Wed, 19 Aug 2020 at 16:29, Russell Spitzer wrote:
> It determines whether it can use the checkpoint at runtime, so you'll be
> able to see it in the UI but not in the plan since you are looking at the
> plan
> before the job is actually running when it checks to se
Thanks Sean for the quick response.
Logged a Jira: https://issues.apache.org/jira/browse/SPARK-32662
Will send a pull request shortly.
Regards,
Jatin
On Wed, Aug 19, 2020 at 6:58 PM Sean Owen wrote:
> I think that's true. You're welcome to open a pull request / JIRA to
> remove that requireme