You need to make sure the delta-core_2.11-0.6.1.jar file is in your
$SPARK_HOME/jars folder.
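If copying the jar around is inconvenient, a rough equivalent (assuming the
same delta-core_2.11:0.6.1 coordinates as above, i.e. Spark 2.4 / Scala 2.11,
and that no Spark session exists yet in the process) is to let Spark fetch
the package at session startup:

    # Sketch: pull delta-core at startup instead of copying the jar into
    # $SPARK_HOME/jars. Must run before the first session is created.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1")
             .getOrCreate())

    # Smoke test that the Delta classes are actually on the classpath.
    spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta_smoke")
    spark.read.format("delta").load("/tmp/delta_smoke").show()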

On Thu, Jan 14, 2021 at 4:59 AM András Kolbert wrote:

Sorry, I missed out a bit. Added, highlighted in yellow.

On Thu, 14 Jan 2021 at 13:54, András Kolbert wrote:

Thanks, Muru, very helpful suggestion! Delta Lake is amazing; it has
completely changed a few of my projects!

One question regarding that: when I use the following statement, everything
works fine and I can use Delta properly in the Spark context that Jupyter
initiates automatically:

export PYSPARK_DRIVER_PYT
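The statement above is cut off; presumably it is the usual
PYSPARK_DRIVER_PYTHON=jupyter / PYSPARK_DRIVER_PYTHON_OPTS=notebook pair. A
common alternative that skips those environment variables entirely is to
build the session from a plain Jupyter kernel, e.g. with findspark (assumed
installed):

    # Sketch of an alternative to the PYSPARK_DRIVER_PYTHON route.
    # Assumes `pip install findspark` and that $SPARK_HOME is set.
    import findspark
    findspark.init()  # locates Spark and puts pyspark on sys.path

    from pyspark.sql import SparkSession

    # Delta must be configured before the first session is created.
    spark = (SparkSession.builder
             .config("spark.jars.packages", "io.delta:delta-core_2.11:0.6.1")
             .getOrCreate())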
You could try Delta Lake or Apache Hudi for this use case.
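With Delta Lake, the "update 5k of 200k records each batch" pattern maps
naturally onto an upsert via MERGE. A minimal sketch (the table path and key
column are hypothetical):

    # Sketch: upsert a small batch of changed records into a large Delta
    # table. Path, dataframe and column names are made up for illustration.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, "/data/user_state")

    (target.alias("t")
     .merge(batch_updates_df.alias("u"), "t.user_id = u.user_id")
     .whenMatchedUpdateAll()      # rewrite the ~5k changed rows
     .whenNotMatchedInsertAll()   # insert brand-new users
     .execute())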

On Sat, Jan 9, 2021 at 12:32 PM András Kolbert wrote:

Sorry if my terminology is misleading.

What I meant by "driver only" is to use a local pandas dataframe (collect
the data to the driver), and keep updating that instead of dealing with a
distributed Spark dataframe for holding this data.

For example, we have a dataframe with all users and their
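The example above is cut off, but a minimal sketch of the driver-local
pattern it describes (source path and column names hypothetical) would be:

    # Sketch of the driver-only idea: keep the full table as pandas on the
    # driver and patch it each micro-batch. Names are hypothetical.
    import pandas as pd

    # Collected once to the driver (~200k rows).
    state = spark.read.parquet("/data/user_state").toPandas().set_index("user_id")

    def apply_batch(state, batch_df):
        """Patch the driver-local table with one micro-batch (~5k rows)."""
        updates = batch_df.toPandas().set_index("user_id")
        state.update(updates)                                   # changed users
        new_rows = updates.loc[updates.index.difference(state.index)]
        return pd.concat([state, new_rows])                     # new users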
Could you please clarify what you mean by 1)? The driver is only
responsible for submitting the Spark job, not for performing the
computation.

-- ND

On 1/9/21 9:35 AM, András Kolbert wrote:
Hi,

I would like to get your advice on my use case.

I have a few Spark streaming applications where I need to keep updating a
dataframe after each batch. Each batch probably affects a small fraction of
the dataframe (5k out of 200k records).

The options I have been considering so far:

1) keep data
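For the pattern described (a streaming job where each micro-batch touches
~5k of 200k rows), the Delta suggestion above would typically be wired in
through foreachBatch; a minimal sketch with hypothetical names:

    # Sketch: apply each structured-streaming micro-batch as an upsert
    # into a Delta table. Source, path and key column are hypothetical.
    from delta.tables import DeltaTable

    def upsert_batch(batch_df, batch_id):
        table = DeltaTable.forPath(spark, "/data/user_state")
        (table.alias("t")
         .merge(batch_df.alias("u"), "t.user_id = u.user_id")
         .whenMatchedUpdateAll()
         .whenNotMatchedInsertAll()
         .execute())

    (stream_df.writeStream
     .foreachBatch(upsert_batch)
     .outputMode("update")
     .start())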