There were recently some fantastic talks about this at the Spark Summit
conference in San Francisco. I suggest you check out the Spark Summit YouTube
channel after May 9th for a deep dive into this topic.
From: rajat kumar
Date: Monday, April 29, 2019 at 9:34 AM
To: "user@spark.apache.org"
Subject:
Please expand on what you're trying to achieve here.
--
Michael Mansour
Data Scientist
Symantec CASB
On 4/28/18, 8:41 AM, "klrmowse" wrote:
I am currently trying to find a workaround for the Spark application I am
working on so that it does not have to use .collect().
You can take a small sample of the data and pass it into the function. This
alleviates the need to write debugging code, etc. I find this model useful and
a bit faster, but it does not offer step-through capability.
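A minimal sketch of that pattern, assuming a PySpark DataFrame;
process_record() is a hypothetical stand-in for the real per-row logic:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def process_record(row):
        # Hypothetical per-record logic you want to exercise on the driver.
        return row["value"] * 2

    df = spark.range(1000).withColumnRenamed("id", "value")

    # take(n) ships only n rows to the driver; collect() would ship all of them.
    for row in df.take(5):
        print(process_record(row))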
Best of luck!
M
--
Michael Mansour
Data Scientist
Symantec CASB
From: Vitaliy Pisarev
Date: Sunday, March 11, 2018 at 8
Toy,
I suggest you partition your data according to date, and use the
foreachPartition function, using the partition as the bucket location.
This would require you to define a custom hash partitioner function, but that
is not too difficult.
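A minimal sketch of the idea, assuming an RDD of (date, record) pairs;
write_to_bucket() is a hypothetical placeholder for the real sink:

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    dates = ["2018-03-09", "2018-03-10", "2018-03-11"]
    date_index = {d: i for i, d in enumerate(dates)}

    def date_partitioner(date_key):
        # One partition per date, so each partition maps 1:1 to a bucket.
        return date_index[date_key]

    def write_to_bucket(records):
        # Placeholder sink: in practice, open the bucket for this partition's
        # date (e.g. an S3 prefix) once, then stream the records into it.
        for date_key, payload in records:
            print(date_key, payload)

    pairs = sc.parallelize([(d, n) for n, d in enumerate(dates * 2)])
    pairs.partitionBy(len(dates), date_partitioner) \
         .foreachPartition(write_to_bucket)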
--
Michael Mansour
Data Scientist
Symantec
From: Toy
Hi all,
I’m poking around the pyspark.Broadcast class, and I notice that one can pass
in a `pickle_registry` and a `path`. The documentation does not explain what
the pickle registry is for, and I’m curious how to use it and whether it
offers any advantages.
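For context, a minimal sketch of ordinary broadcast usage; in typical code the
Broadcast object comes from sc.broadcast(), and my assumption is that the
constructor arguments above are filled in internally by PySpark rather than
passed by user code:

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    lookup = sc.broadcast({"a": 1, "b": 2})  # ship a read-only value once

    rdd = sc.parallelize(["a", "b", "a"])
    # Tasks read the shared value through .value instead of re-serializing
    # it into every closure.
    print(rdd.map(lambda k: lookup.value[k]).collect())  # [1, 2, 1]

    lookup.unpersist()  # free the broadcast from executor memory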
Thanks,
Michael Mansour
expression” tool, and pass them through the function in the expression evaluator.
Hope this helps!
--
Michael Mansour
Data Scientist
Symantec Cloud Security
From: Pavel Klemenkov
Date: Wednesday, May 10, 2017 at 10:43 AM
To: "user@spark.apache.org"
Subject: [EXT] Re: [