I think Spark is trying to ensure that it reads the input "continuously",
without missing anything. Technically it may be valid to say the situation is a
kind of "data loss", as the query couldn't process the offsets that are
being thrown out, and the owner of the query needs to be careful, as it affects
the
There are many use cases for Spark. A Google search for "use cases for
Apache Spark" will give you all the information that you need.
On Tue, 14 Apr 2020 18:44:59 -0400 janethor...@aol.com.INVALID wrote
I did write a long email in response to you.
But then I deleted it because
I see, I wasn't sure if that would work as expected. The docs seem to
suggest being careful before turning off that option, and I'm not sure why
failOnDataLoss is true by default.
On Tue, Apr 14, 2020 at 5:16 PM Burak Yavuz wrote:
> Just set `failOnDataLoss=false` as an option in readStream?
>
Just set `failOnDataLoss=false` as an option in readStream?
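For instance, a minimal sketch of that option on the Kafka source (the broker address and topic name below are placeholders, not from this thread):

// Hypothetical example: Structured Streaming Kafka source with failOnDataLoss
// disabled, so the query keeps running when already-deleted offsets are detected.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
  .option("subscribe", "my-topic")                   // placeholder topic
  .option("failOnDataLoss", "false")
  .load()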
On Tue, Apr 14, 2020 at 4:33 PM Ruijing Li wrote:
> Hi all,
>
> I have a spark structured streaming app that is consuming from a kafka
> topic with retention set up. Sometimes I face an issue where my query has
> not finished processing
Hi all,
I have a Spark Structured Streaming app that is consuming from a Kafka
topic with retention set up. Sometimes I face an issue where my query has
not finished processing a message but the retention kicks in and deletes
the offset, which, since I use the default setting of “failOnDataLoss=true”
I did write a long email in response to you.
But then I deleted it because I felt it would be too revealing.
On Tuesday, 14 April 2020 David Hesson wrote:
I want to know if Spark is headed in my direction.
You are implying Spark could be.
What direction are you headed in, exactly? I
>
> I want to know if Spark is headed in my direction.
>
You are implying Spark could be.
What direction are you headed in, exactly? I don't feel as if anything were
implied when you were asked for use cases or what problem you are solving.
You were asked to identify some use cases, of which yo
That's what I want to know: use cases.
I am looking for direction as I described and I want to know if Spark is
headed in my direction.
You are implying Spark could be.
So tell me about the USE CASES and I'll do the rest.
On Tuesday, 14 April 2020 yeikel valdes wrote:
It depends on y
Hi,
I am trying to set up a cross-region Apache Spark cluster. All my data is
stored in Amazon S3 and well partitioned by region.
For example, I have Parquet files at
S3://mybucket/sales_fact.parquet/us-west
S3://mybucket/sales_fact.parquet/us-east
S3://mybucket/sales_fact.parquet/uk
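For what it's worth, a sketch of how each regional cluster could read only its own prefix (the s3a scheme and the view name are assumptions; adjust to whatever connector the cluster actually uses):

// Each regional cluster scans only its own region's partition, so the read
// itself stays within that region's bucket prefix.
val usWestSales = spark.read.parquet("s3a://mybucket/sales_fact.parquet/us-west")
usWestSales.createOrReplaceTempView("sales_fact_us_west") // hypothetical view name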
It depends on your use case. What are you trying to solve?
On Tue, 14 Apr 2020 15:36:50 -0400 janethor...@aol.com.INVALID wrote
Hi,
I consider myself to be quite good at software development, especially using
frameworks.
I like to get my hands dirty. I have spent the last few mo
Hi,
I consider myself to be quite good at software development, especially using
frameworks.
I like to get my hands dirty. I have spent the last few months understanding
modern frameworks and architectures.
I am looking to invest my energy in a product where I don't have to rely on
the
Looking at the results of explain, I can see a CollectLimit step. Does that
work the same way as a regular .collect()? (where all records are sent to
the driver)
spark.sql("select * from db.table limit 100").explain(false)
== Physical Plan ==
CollectLimit 100
+- FileScan parquet ... 806
I do not know the answer to this question, so I am also looking for it, but
@kant, maybe the generated code can help with this.
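If it helps, this is one way to dump the generated code for the plan above (assuming a Spark shell session; the table name is just the one from the example):

// debugCodegen() prints the whole-stage-codegen Java source for each stage of the plan.
import org.apache.spark.sql.execution.debug._

val q = spark.sql("select * from db.table limit 100")
q.explain(false)  // physical plan: CollectLimit 100 over the FileScan
q.debugCodegen()  // generated code, stage by stage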
In my team, we get elevated access to our Spark cluster using a common
username, which means that we all share the same history. I am not sure if
this is common, but unfortunately there is nothing I can do about it.
Is there any option to set the location of the history? I am looking for
someth
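One possible direction, assuming "history" here means the application event logs that the History Server reads (not the shell's command history), and assuming each user can override configs at submit time; the directory below is a placeholder:

// Hypothetical per-user event-log location; both property names are standard Spark configs.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "hdfs:///spark-events/alice") // placeholder path
  .getOrCreate()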
Hi all,
I have been trying to write batch-synchronized incremental graph algorithms.
More specifically, I want to run an incremental algorithm on a given dataset and,
when a new batch arrives, start the algorithm from the last snapshot and
run the algorithm on the vertices that are effecte
+1 on the previous guess, and additionally I suggest reproducing it with
vanilla Spark.
Amazon's Spark contains modifications that are not available in vanilla Spark,
which makes problem hunting hard or impossible.
In such cases Amazon can help...
On Tue, Apr 14, 2020 at 11:20 AM ZHANG Wei wrote:
> I will
The simplest way is to take a thread dump, which doesn't require any fancy tool
(it's available in the Spark UI).
Without a thread dump it's hard to say anything...
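On the driver side, a dump can even be taken from inside the JVM itself, without any external tool (a sketch; for executors, the "Thread Dump" link on the Executors tab of the Spark UI does the same per executor):

// Prints a dump of all threads in the local JVM (e.g. the driver).
import java.lang.management.ManagementFactory

ManagementFactory.getThreadMXBean
  .dumpAllThreads(true, true)
  .foreach(println)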
On Tue, Apr 14, 2020 at 11:32 AM jane thorpe
wrote:
> Here is another tool I use, Logic Analyser (7:55):
> https://youtu.be/LnzuMJLZRdU
>
> yo
Hi,
Could you share the code that you're using to configure the connection to
the Kafka broker?
This is a bread-and-butter feature. My first thought is that there's
something in your particular setup that prevents this from working.
kind regards, Gerard.
On Fri, Apr 10, 2020 at 7:34 PM Debabrat
Sorry, hit send accidentally...
The symptom is simple: the broker is not responding within 120 seconds.
That's the reason why Debabrata asked for the broker config.
What I can suggest is to check the previous printout, which logs the Kafka
consumer settings.
With the mentioned settings you can start a
The symptom is simple: the broker is not responding within 120 seconds.
That's the reason why Debabrata asked for the broker config.
What I can suggest is to check the previous printout, which logs the Kafka
consumer settings.
With
On Tue, Apr 14, 2020 at 11:44 AM ZHANG Wei wrote:
> Here is the asserti
Provided caching is activated for an RDD, does each executor in the cluster cache
only the partitions it requires for its computations, or does it always cache the
full RDD?
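Not an authoritative answer, but a small sketch of how to check empirically (the input path is a placeholder); as far as I know each executor stores only the partitions it actually computed, and the per-executor breakdown is visible on the Storage tab of the UI:

// Cache an RDD, materialize it, then inspect what got stored.
val rdd = sc.textFile("hdfs:///some/input").setName("myInput").cache() // placeholder path
rdd.count() // forces computation, which populates the cache

sc.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: ${info.numCachedPartitions} of ${info.numPartitions} partitions cached")
}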
Here is the assertion error message format:
s"Failed to get records for $groupId $topic $partition $offset after polling
for $timeout")
You might have to check the Kafka service against the error log:
> 20/04/10 17:28:04 ERROR Executor: Exception in task 0.5 in stage 0.0 (TID 24)
> java.lang.As
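If this is the DStream-based kafka-0-10 integration (a guess on my part), the poll timeout behind that assertion defaults to spark.network.timeout (120s) and can be raised explicitly; a sketch with a placeholder value, though a broker that never answers still needs to be fixed on the Kafka side:

// Hypothetical tuning: give the cached Kafka consumer more time per poll.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.kafka.consumer.poll.ms", "300000") // placeholder: 5 minutes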
Here is another tool I use, Logic Analyser (7:55):
https://youtu.be/LnzuMJLZRdU
You could take some suggestions for improving query performance:
https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
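A tiny illustration of the linked article's point, with made-up table and column names:

// Selecting only the needed columns lets the Parquet reader prune columns,
// so far less data is read than with "select *".
val needed = spark.table("db.table").select("order_id", "amount") // hypothetical columns
needed.explain(false) // the FileScan's ReadSchema should list only these columns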
Jane thorpe
janethor...@aol.com
-Original Message-
From: jane t
I will make a guess: it's not interrupted, it's killed by the driver or the
resource manager, since the executor has been asleep for a long time.
You may have to find the root cause in the driver and failed executor log
contexts.
--
Cheers,
-z
From: L