Hi,
For testing things like this you have a couple of options. You could isolate
all your business logic from your read/write/Spark code, which, in my
experience, makes the code harder to write and manage.
The other option is to accept that tests will be slower than you would expect.
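A sketch of what I mean by the first option, with made-up function and
column names:

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

# Pure transformation: no reads or writes, so it can be unit-tested
# against a tiny in-memory DataFrame without touching disk.
def transform_orders(orders: DataFrame) -> DataFrame:
    return (orders
            .filter(F.col("amount") > 0)
            .groupBy("customer_id")
            .agg(F.sum("amount").alias("total_amount")))

# All I/O stays at the edges of the job.
def run_job(spark: SparkSession, in_path: str, out_path: str) -> None:
    orders = spark.read.csv(in_path, header=True, inferSchema=True)
    transform_orders(orders).write.parquet(out_path)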
Hi,
My understanding of window functions is that they can only operate on
fixed window sizes. For example, I can create a window like the following:
Window.partitionBy("group_identifier").orderBy("sequencial_counter").rowsBetween(-4,
5)
or even:
Window.partitionBy("group_identifier").o
To whom it may concern,
Hope this email finds you well.
I am trying to download Spark, but I was not able to select the release and
package type. Could you please help me with this?
Thank you.
Best,
Ming
[image: screenshot.png]
Works for me - do you have JavaScript disabled? It will be necessary.
On Wed, Jul 15, 2020 at 11:52 AM Ming Liao wrote:
> To whom it may concern,
>
> Hope this email finds you well.
> I am trying to download spark but I was not able to select the release and
> package type. Could you please help
Why do you need to mock the read/write at all? Why not keep a test CSV
file, invoke your job on it (which will perform a real Spark DataFrame read
of the CSV), write the result, and assert on the output?
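A minimal pytest sketch of that approach (transform_orders, the module
name, and the fixture path are all hypothetical):

import pytest
from pyspark.sql import SparkSession

from myjob import transform_orders  # hypothetical module under test

@pytest.fixture(scope="session")
def spark():
    # A small local session is plenty for tests.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

def test_transform_orders(spark):
    # Real Spark CSV read of a checked-in fixture file; nothing is mocked.
    df = spark.read.csv("tests/data/orders.csv", header=True, inferSchema=True)
    result = transform_orders(df).collect()
    # Assert on the actual output rows.
    assert len(result) > 0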
On Tue, Jul 14, 2020 at 12:19 PM Dark Crusader wrote:
> Sorry I wasn't very clear in my last email.
>
> I have a
Hi all,
For our use case, we would like to perform an aggregation using a
pandas_udf over DataFrames that have O(100m) rows and a few tens of
columns. Conceptually, this looks a bit like pyspark.RDD.aggregate,
where the user provides:
* A "seqOp" which accepts pandas series(*) and outputs an inter