Hello,
This is with regard to
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244
require(vocab.length > 0, "The vocabulary size should be > 0. Lower minDF
as necessary.")
Currently, if `CountVectorizer` is trained on an empty dataset resu
-dev
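The guard in question can be sketched outside Spark; with an empty vocabulary (which is what fitting on an empty dataset produces) the require throws an IllegalArgumentException whose message misleadingly suggests tuning minDF. A standalone sketch (the vocab value is illustrative):

```scala
// Sketch of the guard at CountVectorizer.scala#L244, in isolation.
val vocab: Array[String] = Array.empty[String] // fit() on an empty dataset yields no terms

val message: String =
  try {
    require(vocab.length > 0, "The vocabulary size should be > 0. Lower minDF as necessary.")
    "no error"
  } catch {
    // require prefixes the message with "requirement failed: "
    case e: IllegalArgumentException => e.getMessage
  }

println(message)
```

The point of the report is that "Lower minDF as necessary" is unhelpful when the real cause is an empty input dataset.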
Hi,
I have used Spark with HDFS encrypted via Hadoop KMS, and it worked well. I
cannot recall whether Kubernetes was in the mix, though. From the error alone
it is not clear what caused the failure. Can I reproduce this somehow?
Thanks,
On Sat, Aug 15, 2020 at 7:18 PM Michel Su
Hi!
It seems I'm doing something wrong. I call .checkpoint() on an RDD, but it's
not checkpointed...
What am I doing wrong?
val recordsRDD = convertToRecords(anotherRDD)
recordsRDD.checkpoint()
logger.info("checkpoint done")
logger.info(s"isCheckpointed? ${recordsRDD.isCheckpointed},
getCheckpointFile: ${recor
Hi Ivan,
Unlike cache/persist, calling checkpoint() on an RDD returns Unit and only
marks the RDD for checkpointing; the checkpoint data is written the first
time an action runs on it. It is also recommended to persist the RDD first,
so that checkpointing does not recompute it. In your case:
val recordsRDD = convertToRecords(anotherRDD)
recordsRDD.persist()
recordsRDD.checkpoint()
recordsRDD.count() // any action; this materializes the checkpoint
Best,
Jacob
On Wed, 19 Aug 2020 at 14:39, Ivan Petrov wrote:
> Hi!
> Seems like I do smth wron
I think that's true. You're welcome to open a pull request / JIRA to
remove that requirement.
On Wed, Aug 19, 2020 at 3:21 AM Jatin Puri wrote:
>
> Hello,
>
> This is wrt
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244
>
>
Hi Prashant,
I have the problem only on K8s; it works fine when Spark runs on top of YARN.
I wonder whether the delegation token actually gets saved; any idea how to
check that? Could it be because KMS is in HA and Spark requests 2 delegation
tokens?
For the testing, just running spark3 on top of
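One way to check whether tokens are being obtained at all is to turn on debug logging for Spark's delegation-token machinery (a config sketch; the logger name assumes Spark 3's org.apache.spark.deploy.security package, where HadoopDelegationTokenManager lives):

```
# log4j.properties fragment (driver side) -- a sketch, adjust to your logging setup
log4j.logger.org.apache.spark.deploy.security=DEBUG
log4j.logger.org.apache.hadoop.security=DEBUG
```

With this enabled, the driver logs which token providers run and which tokens are fetched, which should show whether one or two KMS tokens are being requested.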
I did that, and I see the lineage change
BEFORE calling an action. No success.
Job$ - isCheckpointed? false, getCheckpointFile: None
Job$ - recordsRDD.toDebugString:
(2) MapPartitionsRDD[7] at map at Job.scala:112 []
| MapPartitionsRDD[6] at map at Job.scala:111 []
| MapPartitionsRDD[5] at map at s
Spark determines whether it can use the checkpoint at runtime, so you'll be
able to see it in the UI but not in the plan: you are looking at the plan
before the job actually runs, and that is when Spark checks whether it can
use the checkpoint in the lineage.
Here is a two-stage job for example:
*scala
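Russell's example is cut off above; a minimal sketch of the kind of two-stage job he may have meant (illustrative names, assuming a SparkContext sc with a checkpoint directory already set) is:

```scala
// assumes: sc.setCheckpointDir("/tmp/checkpoints") has been called
val base = sc.parallelize(1 to 100).map(x => (x % 10, x))
base.persist()    // recommended, so checkpointing does not recompute the RDD
base.checkpoint() // only marks the RDD; nothing is written yet
base.count()      // first action: the job runs and the checkpoint is saved

val totals = base.reduceByKey(_ + _) // shuffle boundary => a second stage
totals.count()    // this job reads from the checkpoint, not the original lineage

println(base.isCheckpointed) // true only after the first action above
println(base.toDebugString)  // lineage now shows a ReliableCheckpointRDD
```

In the UI, the second job reads from the checkpointed data, even though a toDebugString printed before any action would still show the full lineage.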
Awesome, thanks for explaining it.
On Wed, 19 Aug 2020 at 16:29, Russell Spitzer wrote:
> It determines whether it can use the checkpoint at runtime, so you'll be
> able to see it in the UI but not in the plan since you are looking at the
> plan
> before the job is actually running when it checks to se
Thanks Sean for the quick response.
Logged a Jira: https://issues.apache.org/jira/browse/SPARK-32662
Will send a pull request shortly.
Regards,
Jatin
On Wed, Aug 19, 2020 at 6:58 PM Sean Owen wrote:
> I think that's true. You're welcome to open a pull request / JIRA to
> remove that requireme