Spark Streaming with Kafka and Python

2020-08-12 Thread Hamish Whittal
Hi folks, Thought I would ask here because it's somewhat confusing. I'm using Spark 2.4.5 on EMR 5.30.1 with Amazon MSK. The version of Scala used is 2.11.12. I'm using this version of the library: spark-streaming-kafka-0-8_2.11-2.4.5.jar. Now I'm wanting to read from Kafka topics using Python (
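[For reference, a minimal untested sketch of what the Kafka 0.8 DStream integration that jar exposes to Python looks like; broker and topic names are placeholders, and the jar must be supplied to spark-submit, e.g. via --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.5:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-dstream-sketch")
ssc = StreamingContext(sc, 10)  # 10-second batches

# broker list and topic name below are placeholders
stream = KafkaUtils.createDirectStream(
    ssc, ["my_topic"], {"metadata.broker.list": "broker1:9092"})

# each record arrives as a (key, value) pair; print the values
stream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()]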

Re: Spark Streaming with Kafka and Python

2020-08-12 Thread German Schiavon
Hey, maybe I'm missing some restriction with EMR, but have you tried using Structured Streaming instead of Spark Streaming? https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html Regards
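[For reference, a minimal sketch of the Structured Streaming approach the linked page describes; broker and topic names are placeholders, and it assumes the Kafka source package is supplied to spark-submit, e.g. via --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-structured").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder
      .option("subscribe", "my_topic")                    # placeholder
      .load())

# Kafka keys/values arrive as binary; cast to strings before use
query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream
         .format("console")
         .start())

query.awaitTermination()]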

How can I use pyspark to upsert one row without replacing entire table

2020-08-12 Thread Siavash Namvar
Hi, I have a use case where I read data from a db table and need to update a few rows based on the primary key without replacing the entire table. For instance, if I have the 3 following rows: --- id | fname --- 1 | jo

Re: Spark Streaming with Kafka and Python

2020-08-12 Thread Sean Owen
What supports Python in (Kafka?) 0.8? I don't think Spark ever had a specific Python-Kafka integration. But you have always been able to use it to read DataFrames as in Structured Streaming. Kafka 0.8 support is deprecated (gone in 3.0), but 0.10 means 0.10+, which works with the latest 2.x. What is the

Re: How can I use pyspark to upsert one row without replacing entire table

2020-08-12 Thread Sean Owen
It's not so much Spark as the data format, and whether it supports upserts. Parquet, CSV, JSON, etc. would not. That is what Delta, Hudi et al. are for, and yes, you can upsert them in Spark.

Re: How can I use pyspark to upsert one row without replacing entire table

2020-08-12 Thread Siavash Namvar
Thanks Sean. Do you have any URL or reference to help me with how to upsert in Spark? I need to update a Sybase db.

Re: How can I use pyspark to upsert one row without replacing entire table

2020-08-12 Thread Nicholas Gustafson
The Delta docs have examples of upserting: https://docs.delta.io/0.4.0/delta-update.html#upsert-into-a-table-using-merge
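[Following the linked docs, a minimal sketch of a MERGE-based upsert with the Delta Lake Python API; the table path and column names are placeholders, and it assumes the delta-core package (e.g. io.delta:delta-core_2.11:0.4.0) is on the classpath:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forPath(spark, "/tmp/delta/people")  # placeholder path
updates = spark.createDataFrame([(1, "joe")], ["id", "fname"])

# update matching rows in place, insert the rest; other rows are untouched
(target.alias("t")
 .merge(updates.alias("u"), "t.id = u.id")
 .whenMatchedUpdate(set={"fname": "u.fname"})
 .whenNotMatchedInsert(values={"id": "u.id", "fname": "u.fname"})
 .execute())]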

[Spark SQL]: Rationale for access modifiers and qualifiers in Spark

2020-08-12 Thread 김민우
https://gist.github.com/JoeyValentine/23821c27c1f540a4ac63e446e2243dbc I wonder why the constructor and stateStoreCoordinator are private[sql]. In other words, I want to know why the scope has to be [sql] and not [streaming]. And the next question is: "The reason for using private[sql

Re: How can I use pyspark to upsert one row without replacing entire table

2020-08-12 Thread ed elliott
You’ll need to do an insert and use a trigger on the table to change it into an upsert; also make sure your mode is append rather than overwrite. Ed
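[For the Spark side of that suggestion, a minimal sketch of an append-mode JDBC write; the Sybase URL, table, and credentials are placeholders, and the Sybase jConnect JDBC driver jar is assumed to be on the classpath. A database-side trigger would then turn each INSERT into an upsert:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

updates = spark.createDataFrame([(1, "joe")], ["id", "fname"])

(updates.write
 .format("jdbc")
 .option("url", "jdbc:sybase:Tds:dbhost:5000/mydb")  # placeholder
 .option("dbtable", "people")                        # placeholder
 .option("user", "dbuser")                           # placeholder
 .option("password", "dbpass")                       # placeholder
 .mode("append")  # append, not overwrite
 .save())]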

Re: [SPARK-STRUCTURED-STREAMING] IllegalStateException: Race while writing batch 4

2020-08-12 Thread Jungtaek Lim
File stream sink doesn't support the functionality. There are several approaches to do so: 1) two queries write to Kafka (or any intermediate storage which allows concurrent writes), and let the next Spark application read and write to the final path; 2) two queries write to two different directories, a