Re: Reading from a centralized stored

2015-01-06 Thread Franc Carter

Ah, so it's rdd specific - that would make sense. For those systems where it is possible to extract sensible subsets, the rdds do so. My use case, which is probably biasing my thinking, is DynamoDB, which I don't think can efficiently extract records from M to N. cheers On Wed, Jan 7, 2015 at 6:59 AM
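
To make the "records from M to N" idea concrete, here is a minimal sketch of how a key range could be cut into one contiguous sub-range per partition, so that each task only asks the store for its own slice. This is an illustration under assumptions, not Spark's or DynamoDB's actual API: the function name split_range and its parameters are hypothetical, and it assumes the store can serve an inclusive integer key range efficiently (which, as noted above, may not hold for DynamoDB).

```python
def split_range(lower, upper, num_partitions):
    """Split the inclusive key range [lower, upper] into contiguous,
    non-overlapping sub-ranges, at most one per partition.

    Hypothetical helper for illustration only; not part of any library.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    total = upper - lower + 1  # number of keys in the full range
    bounds = []
    for i in range(num_partitions):
        # Integer arithmetic spreads any remainder evenly across partitions.
        start = lower + (i * total) // num_partitions
        end = lower + ((i + 1) * total) // num_partitions - 1
        if start <= end:  # skip empty slices when partitions outnumber keys
            bounds.append((start, end))
    return bounds


if __name__ == "__main__":
    # Each tuple is the (M, N) slice a single task would request.
    for m, n in split_range(1, 10, 3):
        print(f"task reads keys {m}..{n}")
```

With a split like this, no node pulls the whole data set: every task fetches only its own M..N slice over the network.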

Re: Reading from a centralized stored

2015-01-06 Thread Cody Koeninger

No, most rdds partition input data appropriately. On Tue, Jan 6, 2015 at 1:41 PM, Franc Carter wrote: > > One more question, to clarify. Will every node pull in all the data? > > thanks > > On Tue, Jan 6, 2015 at 12:56 PM, Cody Koeninger > wrote: > >> If you are not co-locating spark execut

Re: Reading from a centralized stored

2015-01-06 Thread Franc Carter

One more question, to clarify. Will every node pull in all the data? thanks On Tue, Jan 6, 2015 at 12:56 PM, Cody Koeninger wrote: > If you are not co-locating spark executor processes on the same machines > where the data is stored, and using an rdd that knows about which node to > prefer

Re: Reading from a centralized stored

2015-01-05 Thread Franc Carter

Thanks, that's what I suspected. cheers On Tue, Jan 6, 2015 at 12:56 PM, Cody Koeninger wrote: > If you are not co-locating spark executor processes on the same machines > where the data is stored, and using an rdd that knows about which node to > prefer scheduling a task on, yes, the data will

Re: Reading from a centralized stored

2015-01-05 Thread Cody Koeninger

If you are not co-locating spark executor processes on the same machines where the data is stored, and using an rdd that knows about which node to prefer scheduling a task on, yes, the data will be pulled over the network. Of the options you listed, S3 and DynamoDB cannot have spark running on the

5 matches
