Thank you all for your help.
The issue was caused by a few failed disks in the cluster. Right after they
had been replaced, everything worked well. Looking forward to moving to
Spark 3.0, which is able to handle corrupted shuffle blocks.
Cheers, Mike Pryakhin.
On Wed, 28 Aug 2019 at 03:44, Darshan Pa
Thank you, I will check it out.
Yaniv Harpaz
[ yaniv.harpaz at gmail.com ]
On Wed, Aug 28, 2019 at 7:14 AM Rao, Abhishek (Nokia - IN/Bangalore) <
abhishek@nokia.com> wrote:
> Hi,
>
>
>
> We have seen this issue when we tried to bring up the UI on a custom ingress
> path (default ingress path “
Hi,
Are groupBy and partition similar in this scenario?
I know they are not the same and are meant for different purposes, but I am
confused here.
Do I still need to do partitioning here to save into Cassandra?
Below is my scenario.
I am using spark-sql-2.4.1, spark-cassandra-connector_2.11-2.4.1 with
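For illustration, a minimal PySpark sketch of the distinction, assuming a DataFrame write through the Cassandra connector; the host, table, keyspace, and column names are hypothetical, and the same idea applies to the Scala API that matches the versions above:

    # groupBy is a logical aggregation: it changes WHAT rows you write.
    # repartition only changes HOW those rows are distributed across tasks;
    # it is not required for a correct write to Cassandra, though co-locating
    # rows that share a partition key can reduce the batches the connector sends.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("cassandra-write-sketch")
             .config("spark.cassandra.connection.host", "127.0.0.1")  # hypothetical host
             .getOrCreate())

    df = spark.read.parquet("/tmp/events")  # hypothetical source data

    agg = df.groupBy("account_id").agg(F.sum("amount").alias("total_amount"))

    (agg.repartition("account_id")          # optional physical re-distribution
        .write
        .format("org.apache.spark.sql.cassandra")
        .options(table="account_totals", keyspace="test_ks")  # hypothetical names
        .mode("append")
        .save())

In other words, groupBy decides which rows end up in the table, while partitioning only affects how those rows are distributed on their way there.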
Hi,
We have seen this issue when we tried to bring up the UI on a custom ingress path
(the default ingress path “/” works). Do you also have a similar configuration?
We tried setting spark.ui.proxyBase and spark.ui.reverseProxy, but it did not help.
As a workaround, we’re using ingress port (port on edge nod
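For reference, a minimal sketch of the two properties mentioned above, set on a PySpark SparkSession (they can equally be passed as --conf flags to spark-submit); the ingress path "/spark-ui" is a hypothetical example:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("ui-behind-ingress")
             # prefix UI links and static assets with the external ingress path
             .config("spark.ui.proxyBase", "/spark-ui")     # hypothetical path
             # have the master/driver reverse-proxy worker and application UIs
             .config("spark.ui.reverseProxy", "true")
             .getOrCreate())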
I wonder if there's some recommended method to convert an in-memory
pyarrow.Table (or pyarrow.RecordBatch) to a PySpark DataFrame without using
pandas?
My motivation is converting nested data (like List[int]) that has an
efficient representation in pyarrow which is not possible with pandas (I
do
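One possible workaround sketch, assuming the data can take a round trip through Parquet: write the Arrow table with pyarrow's Parquet writer and let Spark read it back, which preserves nested types such as list<int64> without going through pandas. The path below is hypothetical and would need to be visible to the executors on a real cluster:

    import pyarrow as pa
    import pyarrow.parquet as pq
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("arrow-to-spark").getOrCreate()

    # An Arrow table with a nested list<int64> column.
    table = pa.Table.from_arrays(
        [pa.array([1, 2], type=pa.int64()),
         pa.array([[1, 2, 3], [4]], type=pa.list_(pa.int64()))],
        names=["id", "values"],
    )

    # Persist with pyarrow and read back with Spark; no pandas involved.
    pq.write_table(table, "/tmp/arrow_table.parquet")
    df = spark.read.parquet("/tmp/arrow_table.parquet")
    df.printSchema()   # the "values" column is read as array<bigint>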
You can also set "spark.io.compression.codec" to "snappy" to try a different
compression codec.
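As a sketch, the property can be set on the session (or passed as --conf spark.io.compression.codec=snappy to spark-submit); lz4 is the usual default:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("snappy-codec")
             # codec used for internal data such as shuffle outputs,
             # spills and broadcast variables
             .config("spark.io.compression.codec", "snappy")
             .getOrCreate())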
On Fri, Aug 16, 2019 at 10:14 AM Vadim Semenov
wrote:
> This is what you're looking for:
>
> Handle large corrupt shuffle blocks
> https://issues.apache.org/jira/browse/SPARK-26089
>
> So until
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#basic-concepts
> *Note that Structured Streaming does not materialize the entire table*. It
> reads the latest available data from the streaming data source, processes
> it incrementally to update the result, and then d
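To illustrate the incremental model described above, a minimal PySpark sketch in which only the (watermark-bounded) aggregation state is kept, not the whole unbounded input table; the S3 paths and schema are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("incremental-agg").getOrCreate()

    events = (spark.readStream
              .format("json")
              .schema("event_time timestamp, user string")
              .load("s3a://my-bucket/events/"))           # hypothetical source

    counts = (events
              .withWatermark("event_time", "10 minutes")  # lets old state be dropped
              .groupBy(F.window("event_time", "5 minutes"), "user")
              .count())

    query = (counts.writeStream
             .outputMode("append")
             .format("parquet")
             .option("path", "s3a://my-bucket/counts/")            # hypothetical sink
             .option("checkpointLocation", "s3a://my-bucket/chk/") # hypothetical checkpoint
             .start())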
I have a quick newbie question.
Spark Structured Streaming creates an unbounded DataFrame that it keeps
appending rows to.
So what's the max size of data it can hold? What if the size becomes bigger
than the JVM heap? Will it spill to disk? I'm using S3 as storage. So will it
write temp data on S3 or
Hello!
We're running Spark 2.3.0 on Scala 2.11. We have a number of Spark
Streaming jobs that are using MapWithState. We've observed that these jobs
will complete some set of stages, and then not schedule the next set of
stages. It looks like the DAG Scheduler correctly identifies required
stag
Hi all,
We are attempting to come up with a blue-green deployment strategy for our
structured streaming job to minimize downtime. The general flow would be:
1. Job A is currently streaming
2. Job B comes up and starts loading Job A state without starting its query.
3. Job B completes
Hello guys,
When I launch driver pods, or even when I use docker run with the Spark image,
the Spark master UI (8080) works great, but the Spark UI (4040) is loading
without the CSS.
When I dig a bit deeper I see
"Refused to apply style from '' because its MIME type ('text/html') is
not supported stylesh