Bobby
Thanks for your answer; it seems that I have misunderstood this paragraph on
the website: *"GPU-accelerate your Apache Spark 3.0 data science
pipelines—without code changes—and speed up data processing and model
training while substantially lowering infrastructure costs."*. So if I am
goin
Hello all,
we have a Hadoop cluster (using YARN) with S3 as the filesystem and S3Guard
enabled.
We are using Hadoop 3.2.1 with Spark 2.4.5.
When I try to save a dataframe in parquet format, I get the following
exception:
java.lang.ClassNotFoundException:
com.hortonworks.spark.cloud.commit.PathOu
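For anyone hitting the same thing, a minimal reproduction sketch follows (the bucket and path names are placeholders, not from the original report). A ClassNotFoundException at write time usually means a commit protocol class was configured cluster-wide (for example in a vendor-supplied spark-defaults.conf) but the matching jar is not on the driver/executor classpath, so checking the commit protocol setting is a reasonable first step:

  import org.apache.spark.sql.SparkSession

  // Minimal sketch, assuming a plain spark-submit against the cluster;
  // the bucket and path below are placeholders.
  val spark = SparkSession.builder().appName("parquet-on-s3a").getOrCreate()

  // The file commit protocol is chosen by this (internal) setting; if it
  // names a class whose jar is missing from the classpath, the write below
  // fails with ClassNotFoundException before any data is written.
  println(spark.conf.getOption("spark.sql.sources.commitProtocolClass"))

  val df = spark.range(1000).toDF("id")
  df.write.mode("overwrite").parquet("s3a://my-bucket/tmp/committer-check/")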
Structured Streaming vs Spark Streaming (DStream)?
Which is recommended for system stability? Exactly-once is NOT the first priority.
The first priority is a STABLE system.
I need to make a decision soon. I need help. Here is the question again:
should I go back and use Spark Streaming DStream b
Frankly speaking, I do not care about EXACTLY ONCE... I am OK with AT LEAST ONCE
as long as the system does not fail every 5 to 7 days with no recovery option.
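For reference, going back to DStreams for this pipeline would look roughly like the sketch below (assuming the spark-streaming-kafka-0-10 integration; broker, topic, group and bucket names are placeholders). Because each micro-batch is written out with no sink-side commit log, a restart can rewrite a batch, which is what makes this at-least-once rather than exactly-once:

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010.KafkaUtils
  import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
  import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

  val conf = new SparkConf().setAppName("kafka-to-s3-dstream")
  val ssc = new StreamingContext(conf, Seconds(60))

  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "broker:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "kafka-to-s3",
    "auto.offset.reset" -> "latest",
    "enable.auto.commit" -> (false: java.lang.Boolean)
  )

  val stream = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

  stream.foreachRDD { (rdd, batchTime) =>
    // One output directory per batch; replaying a batch after a failure can
    // rewrite it, hence at-least-once delivery to S3.
    rdd.map(_.value()).saveAsTextFile(
      s"s3a://my-bucket/raw/batch-${batchTime.milliseconds}")
  }

  ssc.start()
  ssc.awaitTermination()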
On Wednesday, June 17, 2020, 02:31:50 PM PDT, Rachana Srivastava
wrote:
Thanks so much TD. Thanks for forwarding your datalake project, but at this
time we have budget constraints; we can only use open source projects.
I just want the Structured Streaming Application or Spark Streaming DStream
Application to run without an issue for a long time... I do not want the
Kafka-connect (https://docs.confluent.io/current/connect/index.html) may
be an easier solution for this use case of just dumping kafka topics.
On 17/06/2020 18:02, Jungtaek Lim wrote:
Just in case anyone prefers ASF projects, there are other alternative
projects in ASF as well; alphabetically, Apache Hudi [1] and Apache Iceberg [2].
Both recently graduated as top-level projects.
(DISCLAIMER: I'm not involved in either.)
BTW it would be nice if we make the metadata impl
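To make the alternative concrete, here is a rough sketch of pointing a stream at Apache Hudi instead of the plain file sink. This is only an illustration: the option names follow Hudi's Spark datasource documentation and can vary by version, and the rate source, table name, fields and paths are placeholders standing in for the real pipeline.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.streaming.Trigger

  val spark = SparkSession.builder().appName("stream-to-hudi").getOrCreate()

  // The built-in rate source stands in for the real input stream; it emits
  // (timestamp, value) rows.
  val events = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

  val query = events.writeStream
    .format("hudi") // Hudi's structured streaming sink
    .option("hoodie.table.name", "events")
    .option("hoodie.datasource.write.recordkey.field", "value")
    .option("hoodie.datasource.write.precombine.field", "timestamp")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/hudi-events/")
    .outputMode("append")
    .trigger(Trigger.ProcessingTime("1 minute"))
    .start("s3a://my-bucket/hudi/events/")

  query.awaitTermination()

The practical difference from the built-in file sink is that table state lives in the table format's own metadata (and can be cleaned by its own maintenance actions) rather than in the file sink's _spark_metadata log, which is the part that keeps growing in the use case discussed in this thread.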
Hello Rachana,
Getting exactly-once semantics on files and making it scale to a very large
number of files are very hard problems to solve. While Structured Streaming
+ the built-in file sink provides the exactly-once guarantee that DStreams could
not, it is definitely limited in other ways (scaling in
Background: I have written a simple Spark Structured Streaming app to move
data from Kafka to S3. Found that in order to support the exactly-once guarantee
Spark creates a _spark_metadata folder, which ends up growing too large; when the
streaming app runs for a long time the metadata folder grows so
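For readers landing here, a minimal sketch of this kind of job follows (broker, topic, bucket and checkpoint paths are placeholders). The file sink records every committed batch's output files under <output path>/_spark_metadata, and Spark reads of that path use that log rather than a plain directory listing; that log is what provides exactly-once, and it is also what keeps growing for an always-on query.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.streaming.Trigger

  val spark = SparkSession.builder().appName("kafka-to-s3").getOrCreate()

  val kafka = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()

  val query = kafka.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/events/")             // output; _spark_metadata lives here
    .option("checkpointLocation", "s3a://my-bucket/ckpt/") // offsets and commit log
    .trigger(Trigger.ProcessingTime("1 minute"))
    .start()

  query.awaitTermination()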
That is not how you unsubscribe. See here:
https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e
On Wed, Jun 17, 2020 at 8:56 AM DIALLO Ibrahima (BPCE-IT - Consultime)
wrote:
Ibrahima DIALLO
Big Data Consultant - Architect - Analyst
Consultime - Pour BPCE-IT - Groupe BPCE
D2I_FDT_DMA_BD2
BPCE Infogérance & Technologies
110 Avenue de France - 75013 PARIS -Tél. : +33185342104
I have written a simple Spark Structured Streaming app to move data from Kafka
to S3. Found that in order to support the exactly-once guarantee Spark creates a
_spark_metadata folder, which ends up growing too large as the streaming app is
SUPPOSED TO run FOREVER. But when the streaming app runs for a l
That is not how you unsubscribe. See here:
https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e
On Wed, Jun 17, 2020 at 5:39 AM Ferguson, Jon
wrote:
This message is confidential and subject to terms at:
https://www.jpmorgan.com/emaildisclaimer including on confidential, privileged
or legal entity information, viruses and monitoring of electronic messages. If
you are not the intended recipient, please delete this message and notify the
sen