Thank you all for your help.
The issue was caused by a few failed disks in the cluster. Right after they
had been replaced, everything worked well. Looking forward to moving to
Spark 3.0, which is able to handle corrupted shuffle blocks.
Cheers, Mike Pryakhin.
On Wed, 28 Aug 2019 at 03:44, Darshan Pa
Thank you, I will check it out.
Yaniv Harpaz
[ yaniv.harpaz at gmail.com ]
On Wed, Aug 28, 2019 at 7:14 AM Rao, Abhishek (Nokia - IN/Bangalore) <
abhishek@nokia.com> wrote:
> Hi,
>
>
>
> We have seen this issue when we tried to bring up the UI on a custom ingress
> path (default ingress path “
Hi,
Are groupBy and partition similar in this scenario?
I know they are not the same and are meant for different purposes, but I am
confused here.
Do I still need to do partitioning here to save into Cassandra?
Below is my scenario.
I am using spark-sql-2.4.1, spark-cassandra-connector_2.11-2.4.1 with
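For illustration, a minimal PySpark sketch of the distinction, assuming a DataFrame write through the Cassandra connector; the host, table, keyspace, and column names are hypothetical, and the same idea applies to the Scala API that matches the versions above:

    # groupBy is a logical aggregation: it changes WHAT rows you write.
    # repartition only changes HOW those rows are distributed across tasks;
    # it is not required for a correct write to Cassandra, though co-locating
    # rows that share a partition key can reduce the batches the connector sends.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("cassandra-write-sketch")
             .config("spark.cassandra.connection.host", "127.0.0.1")  # hypothetical host
             .getOrCreate())

    df = spark.read.parquet("/tmp/events")  # hypothetical source data

    agg = df.groupBy("account_id").agg(F.sum("amount").alias("total_amount"))

    (agg.repartition("account_id")          # optional physical re-distribution
        .write
        .format("org.apache.spark.sql.cassandra")
        .options(table="account_totals", keyspace="test_ks")  # hypothetical names
        .mode("append")
        .save())

In other words, groupBy decides which rows end up in the table, while partitioning only affects how those rows are distributed on their way there.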
Hi,
We have seen this issue when we tried to bring up the UI on a custom ingress path
(the default ingress path “/” works). Do you also have a similar configuration?
We tried setting spark.ui.proxyBase and spark.ui.reverseProxy, but it did not help.
As a workaround, we’re using ingress port (port on edge nod
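For reference, a minimal sketch of the two properties mentioned above, set on a PySpark SparkSession (they can equally be passed as --conf flags to spark-submit); the ingress path "/spark-ui" is a hypothetical example:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("ui-behind-ingress")
             # prefix UI links and static assets with the external ingress path
             .config("spark.ui.proxyBase", "/spark-ui")     # hypothetical path
             # have the master/driver reverse-proxy worker and application UIs
             .config("spark.ui.reverseProxy", "true")
             .getOrCreate())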
I wonder if there's some recommended method to convert an in-memory
pyarrow.Table (or pyarrow.RecordBatch) to a PySpark DataFrame without using
pandas?
My motivation is converting nested data (like List[int]) that has an
efficient representation in pyarrow which is not possible with pandas (I
do
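One possible workaround sketch, assuming the data can take a round trip through Parquet: write the Arrow table with pyarrow's Parquet writer and let Spark read it back, which preserves nested types such as list<int64> without going through pandas. The path below is hypothetical and would need to be visible to the executors on a real cluster:

    import pyarrow as pa
    import pyarrow.parquet as pq
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("arrow-to-spark").getOrCreate()

    # An Arrow table with a nested list<int64> column.
    table = pa.Table.from_arrays(
        [pa.array([1, 2], type=pa.int64()),
         pa.array([[1, 2, 3], [4]], type=pa.list_(pa.int64()))],
        names=["id", "values"],
    )

    # Persist with pyarrow and read back with Spark; no pandas involved.
    pq.write_table(table, "/tmp/arrow_table.parquet")
    df = spark.read.parquet("/tmp/arrow_table.parquet")
    df.printSchema()   # the "values" column is read as array<bigint>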
You can also set "spark.io.compression.codec" to "snappy" to try a different
compression codec.
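As a sketch, the property can be set on the session (or passed as --conf spark.io.compression.codec=snappy to spark-submit); lz4 is the usual default:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("snappy-codec")
             # codec used for internal data such as shuffle outputs,
             # spills and broadcast variables
             .config("spark.io.compression.codec", "snappy")
             .getOrCreate())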
On Fri, Aug 16, 2019 at 10:14 AM Vadim Semenov
wrote:
> This is what you're looking for:
>
> Handle large corrupt shuffle blocks
> https://issues.apache.org/jira/browse/SPARK-26089
>
> So until
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#basic-concepts
> *Note that Structured Streaming does not materialize the entire table*. It
> reads the latest available data from the streaming data source, processes
> it incrementally to update the result, and then d
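To illustrate the incremental model described above, a minimal PySpark sketch in which only the (watermark-bounded) aggregation state is kept, not the whole unbounded input table; the S3 paths and schema are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("incremental-agg").getOrCreate()

    events = (spark.readStream
              .format("json")
              .schema("event_time timestamp, user string")
              .load("s3a://my-bucket/events/"))           # hypothetical source

    counts = (events
              .withWatermark("event_time", "10 minutes")  # lets old state be dropped
              .groupBy(F.window("event_time", "5 minutes"), "user")
              .count())

    query = (counts.writeStream
             .outputMode("append")
             .format("parquet")
             .option("path", "s3a://my-bucket/counts/")            # hypothetical sink
             .option("checkpointLocation", "s3a://my-bucket/chk/") # hypothetical checkpoint
             .start())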
I have a quick newbie question.
Spark Structured Streaming creates an unbounded DataFrame that it keeps
appending rows to.
So what's the max size of data it can hold? What if the size becomes bigger
than the JVM heap? Will it spill to disk? I'm using S3 as storage. So will it
write temp data on S3 or
Hello!
We're running Spark 2.3.0 on Scala 2.11. We have a number of Spark
Streaming jobs that are using MapWithState. We've observed that these jobs
will complete some set of stages, and then not schedule the next set of
stages. It looks like the DAG Scheduler correctly identifies required
stag
Hi all,
We are attempting to come up with a blue-green deployment strategy for our
structured streaming job to minimize downtime. The general flow would be:
1. Job A is currently streaming
2. Job B comes up and starts loading Job A state without starting its query.
3. Job B completes
Hello guys,
When I launch driver pods, or even when I use docker run with the Spark image,
the Spark master UI (8080) works great, but the Spark UI (4040) is loading
without the CSS.
When I dig a bit deeper I see
"Refused to apply style from '' because its MIME type ('text/html') is
not supported stylesh