Bobby
Thanks for your answer; it seems that I have misunderstood this paragraph on
the website: *"GPU-accelerate your Apache Spark 3.0 data science
pipelines—without code changes—and speed up data processing and model
training while substantially lowering infrastructure costs."*. So if I am
goin
Hello all,
we have a Hadoop cluster (using YARN) with S3 as the filesystem and S3Guard
enabled.
We are using Hadoop 3.2.1 with Spark 2.4.5.
When I try to save a dataframe in parquet format, I get the following
exception:
java.lang.ClassNotFoundException:
com.hortonworks.spark.cloud.commit.PathOu
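For anyone hitting the same thing, a minimal reproduction sketch follows (the bucket and path names are placeholders, not from the original report). A ClassNotFoundException at write time usually means a commit protocol class was configured cluster-wide (for example in a vendor-supplied spark-defaults.conf) but the matching jar is not on the driver/executor classpath, so checking the commit protocol setting is a reasonable first step:

  import org.apache.spark.sql.SparkSession

  // Minimal sketch, assuming a plain spark-submit against the cluster;
  // the bucket and path below are placeholders.
  val spark = SparkSession.builder().appName("parquet-on-s3a").getOrCreate()

  // The file commit protocol is chosen by this (internal) setting; if it
  // names a class whose jar is missing from the classpath, the write below
  // fails with ClassNotFoundException before any data is written.
  println(spark.conf.getOption("spark.sql.sources.commitProtocolClass"))

  val df = spark.range(1000).toDF("id")
  df.write.mode("overwrite").parquet("s3a://my-bucket/tmp/committer-check/")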
Structured Streaming vs Spark Streaming (DStream)?
Which is recommended for system stability? Exactly-once is NOT the first priority.
The first priority is a STABLE system.
I need to make a decision soon. I need help. Here is the question again:
should I go back and use Spark Streaming DStream b
Frankly speaking, I do not care about EXACTLY ONCE... I am OK with AT LEAST ONCE
as long as the system does not fail every 5 to 7 days with no recovery option.
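For reference, going back to DStreams for this pipeline would look roughly like the sketch below (assuming the spark-streaming-kafka-0-10 integration; broker, topic, group and bucket names are placeholders). Because each micro-batch is written out with no sink-side commit log, a restart can rewrite a batch, which is what makes this at-least-once rather than exactly-once:

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010.KafkaUtils
  import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
  import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

  val conf = new SparkConf().setAppName("kafka-to-s3-dstream")
  val ssc = new StreamingContext(conf, Seconds(60))

  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "broker:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "kafka-to-s3",
    "auto.offset.reset" -> "latest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
  )

  val stream = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

  stream.foreachRDD { (rdd, batchTime) =>
    // One output directory per batch; replaying a batch after a failure can
    // rewrite it, hence at-least-once delivery to S3.
    rdd.map(_.value()).saveAsTextFile(
      s"s3a://my-bucket/raw/batch-${batchTime.milliseconds}")
  }

  ssc.start()
  ssc.awaitTermination()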
On Wednesday, June 17, 2020, 02:31:50 PM PDT, Rachana Srivastava
wrote:
Thanks so much TD. Thanks for forwarding your datalake project, but at this
time we have budget constraints; we can only use open source projects.
I just want the Structured Streaming Application or Spark Streaming DStream
Application to run without an issue for a long time... I do not want the
Kafka-connect (https://docs.confluent.io/current/connect/index.html) may
be an easier solution for this use case of just dumping kafka topics.
On 17/06/2020 18:02, Jungtaek Lim wrote:
Just in case anyone prefers ASF projects, there are other alternative
projects in ASF as well; alphabetically, Apache Hudi [1] and Apache Iceberg [2].
Both recently graduated as top-level projects.
(DISCLAIMER: I'm not involved in either.)
BTW it would be nice if we make the metadata impl
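To make the alternative concrete, here is a rough sketch of pointing a stream at Apache Hudi instead of the plain file sink. This is only an illustration: the option names follow Hudi's Spark datasource documentation and can vary by version, and the rate source, table name, fields and paths are placeholders standing in for the real pipeline.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.streaming.Trigger

  val spark = SparkSession.builder().appName("stream-to-hudi").getOrCreate()

  // The built-in rate source stands in for the real input stream; it emits
  // (timestamp, value) rows.
  val events = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

  val query = events.writeStream
    .format("hudi") // Hudi's structured streaming sink
    .option("hoodie.table.name", "events")
    .option("hoodie.datasource.write.recordkey.field", "value")
    .option("hoodie.datasource.write.precombine.field", "timestamp")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/hudi-events/")
    .outputMode("append")
    .trigger(Trigger.ProcessingTime("1 minute"))
    .start("s3a://my-bucket/hudi/events/")

  query.awaitTermination()

The practical difference from the built-in file sink is that table state lives in the table format's own metadata (and can be cleaned by its own maintenance actions) rather than in the file sink's _spark_metadata log, which is the part that keeps growing in the use case discussed in this thread.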
Hello Rachana,
Getting exactly-once semantics on files and making it scale to a very large
number of files are very hard problems to solve. While Structured Streaming
+ the built-in file sink provides the exactly-once guarantee that DStreams could
not, it is definitely limited in other ways (scaling in
Background: I have written a simple Spark Structured Streaming app to move
data from Kafka to S3. Found that in order to support the exactly-once guarantee
Spark creates a _spark_metadata folder, which ends up growing too large; when the
streaming app runs for a long time the metadata folder grows so
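For readers landing here, a minimal sketch of this kind of job follows (broker, topic, bucket and checkpoint paths are placeholders). The file sink records every committed batch's output files under <output path>/_spark_metadata, and Spark reads of that path use that log rather than a plain directory listing; that log is what provides exactly-once, and it is also what keeps growing for an always-on query.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.streaming.Trigger

  val spark = SparkSession.builder().appName("kafka-to-s3").getOrCreate()

  val kafka = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()

  val query = kafka.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/events/")             // output; _spark_metadata lives here
    .option("checkpointLocation", "s3a://my-bucket/ckpt/") // offsets and commit log
    .trigger(Trigger.ProcessingTime("1 minute"))
    .start()

  query.awaitTermination()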
That is not how you unsubscribe. See here:
https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e
On Wed, Jun 17, 2020 at 8:56 AM DIALLO Ibrahima (BPCE-IT - Consultime)
wrote:
Ibrahima DIALLO
Big Data Consultant - Architect - Analyst
Consultime - Pour BPCE-IT - Groupe BPCE
D2I_FDT_DMA_BD2
BPCE Infogérance & Technologies
110 Avenue de France - 75013 PARIS -Tél. : +33185342104
I have written a simple Spark Structured Streaming app to move data from Kafka
to S3. Found that in order to support the exactly-once guarantee Spark creates a
_spark_metadata folder, which ends up growing too large as the streaming app is
SUPPOSED TO run FOREVER. But when the streaming app runs for a l
That is not how you unsubscribe. See here:
https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e
On Wed, Jun 17, 2020 at 5:39 AM Ferguson, Jon
wrote:
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged
or legal entity information, viruses and monitoring of electronic messages. If
you are not the intended recipient, please delete this message and notify the
sen