Hello all,
I am facing a FileNotFoundException for a shuffle index file when running a
job with large data. The same job runs fine with smaller datasets. These are
my cluster specifications:
Number of nodes - 19
Total cores - 380
Memory per executor - 32G
Spark 1.6 (MapR distribution)
spark.shuffle.service.enabled
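In case it helps frame the question, here is a minimal Scala sketch of the
kind of configuration in play; the property names are standard Spark
settings, but the specific values are illustrative assumptions, not a known
fix for this error:

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("large-shuffle-job")
    // With the external shuffle service on, shuffle index/data files stay
    // readable even if the executor that wrote them is lost.
    .set("spark.shuffle.service.enabled", "true")
    // More shuffle partitions means smaller shuffle blocks per task on
    // large inputs. (Illustrative value.)
    .set("spark.sql.shuffle.partitions", "2000")
    // Shuffle files are written under spark.local.dir; it needs enough
    // free space for the large-data run. (Hypothetical path.)
    .set("spark.local.dir", "/data/spark-tmp")
  val sc = new SparkContext(conf)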
unsubscribe
From: S Malligarjunan
Sent: Saturday, December 3, 2016 11:55:41 AM
To: user@spark.apache.org
Subject: Re: Unsubscribe
Unsubscribe
Thanks and Regards,
Malligarjunan S.
On Saturday, 3 December 2016, 20:42, Sivakumar S wrote:
Unsubscribe
Ephemeral storage on SSD will be very painful to maintain, especially with
large datasets; we will pretty soon be somewhere in the PB range.
I am thinking of leveraging something like the project below, but I am not
sure how much performance gain we could get out of it.
https://github.com/stec-inc/EnhanceIO
On Sat, Dec 3
Hi,
I know this is a broad question. If this is not the right forum, I would
appreciate it if you could point me to other sites/areas that may be helpful.
Before posing this question, I did use our friend Google, but sifting the
results from the angle of my needs hasn't been easy.
Who I am:
- Have done data
What about ephemeral storage on SSD? If performance is required, it's
generally for production, so the cluster would never be stopped. A Spark
job to backup/restore on S3 then allows the cluster to be shut down completely.
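A minimal sketch of that backup/restore idea, Spark 1.6 style (the bucket
and paths are hypothetical, and it assumes the s3a connector and AWS
credentials are already configured):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("s3-backup-restore"))
  val sqlContext = new SQLContext(sc)

  // Backup: copy the dataset from fast ephemeral disks to S3 before
  // shutting the cluster down.
  sqlContext.read.parquet("/data/events")
    .write.mode("overwrite").parquet("s3a://my-backup-bucket/events")

  // Restore: on a fresh cluster, pull the data back onto local SSDs.
  sqlContext.read.parquet("s3a://my-backup-bucket/events")
    .write.mode("overwrite").parquet("/data/events")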
On 3 Dec 2016 1:28 PM, "David Mitchell" wrote:
> To get a node local read
Unsubscribe
To get a node local read from Spark to Cassandra, one has to use a read
consistency level of LOCAL_ONE. For some use cases, this is not an
option. For example, if you need to use a read consistency level
of LOCAL_QUORUM, as many use cases demand, then one is not going to get a
node local read.
A
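For reference, a minimal sketch of where that consistency level gets set
with the DataStax Spark Cassandra Connector (the keyspace and table names
are hypothetical):

  import org.apache.spark.{SparkConf, SparkContext}
  import com.datastax.spark.connector._

  val conf = new SparkConf()
    .setAppName("cassandra-local-read")
    .set("spark.cassandra.connection.host", "127.0.0.1")
    // LOCAL_ONE allows node-local reads; LOCAL_QUORUM, as described above,
    // forces the coordinator to consult replicas on other nodes.
    .set("spark.cassandra.input.consistency.level", "LOCAL_ONE")
  val sc = new SparkContext(conf)

  // cassandraTable comes from the connector's implicits.
  val rows = sc.cassandraTable("my_keyspace", "my_table")
  println(rows.count())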
Guys,
This is my suggestion: use Spark SQL instead of Impala on Hive tables to
get correct timestamp values all the time. The situation is explained below:
I have come across a situation where a multi-tenant cluster is being used
to read and write Parquet files.
This causes some issues, as I
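To make the suggestion concrete, a minimal Spark 1.6-style sketch of
reading the Parquet data through Spark SQL (the path and column name are
hypothetical; spark.sql.parquet.int96AsTimestamp is the Spark setting that
governs how Hive/Impala-written INT96 timestamps are decoded):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("parquet-timestamps"))
  val sqlContext = new SQLContext(sc)
  // Read INT96 values as timestamps (the default), so values written by
  // Hive/Impala come back as java.sql.Timestamp.
  sqlContext.setConf("spark.sql.parquet.int96AsTimestamp", "true")

  val df = sqlContext.read.parquet("/warehouse/events")
  df.select("event_time").show(5)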
On 3 Dec 2016, at 09:16, Manish Malhotra
<manish.malhotra.w...@gmail.com> wrote:
Thanks for sharing the numbers as well!
Nowadays even the network can have very high throughput and might outperform
the disk, but as Sean mentioned, data on the network will have other
dependencies like network
Hmm, GCE pretty much seems to follow the same model as AWS.
On Sat, Dec 3, 2016 at 1:22 AM, kant kodali wrote:
> GCE seems to have better options. Anyone had any experience with GCE?
>
> On Sat, Dec 3, 2016 at 1:16 AM, Manish Malhotra <
> manish.malhotra.w...@gmail.com> wrote:
>
>> thanks for sh
GCE seems to have better options. Anyone had any experience with GCE?
On Sat, Dec 3, 2016 at 1:16 AM, Manish Malhotra <
manish.malhotra.w...@gmail.com> wrote:
> Thanks for sharing the numbers as well!
>
> Nowadays even the network can have very high throughput and might
> outperform the disk, but
Thanks for sharing the numbers as well!
Nowadays even the network can have very high throughput and might
outperform the disk, but as Sean mentioned, data on the network will have
other dependencies like network hops, e.g. if it goes across racks, which
can have a switch in between.
But yes, people are discuss
Forgot to mention, my entire cluster is in one DC, so if it were across
multiple DCs then colocating would make sense in theory as well.
On Sat, Dec 3, 2016 at 1:12 AM, kant kodali wrote:
> Thanks Sean! Just for the record, I am currently seeing 95 MB/s RX (receive
> throughput) on my Spark worker
Thanks Sean! Just for the record, I am currently seeing 95 MB/s RX (receive
throughput) on my Spark worker machine when I run `sudo iftop -B`.
The problem with instance stores on AWS is that they are all ephemeral, so
placing Cassandra on top doesn't make a lot of sense. So in short, AWS
doesn't seem
I'm sure he meant that this is a downside to not colocating.
You are asking the right question. While networking is traditionally much
slower than disk, that changes a bit in the cloud, where attached storage
is remote too.
The disk throughput here is mostly achievable in normal workloads. However
I
Wait, how is that a benefit? Isn't that a bad thing, if you are saying
colocating leads to more latency and overall execution time is longer?
On Sat, Dec 3, 2016 at 12:34 AM, vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:
> You get more latency on reads, so overall execution time is l
You get more latency on reads, so overall execution time is longer.
On 3 Dec 2016 7:39 AM, "kant kodali" wrote:
>
> I wonder what benefits I really get if I colocate my Spark worker
> process and the Cassandra server process on each node?
>
> I understand the concept of moving compute towards