Class1.java
@Autowired
private ClassX cx;

public List method1(JavaPairRDD data) {
    List list1 = new ArrayList();
    List list2 = new ArrayList();
    JavaPairRDD computed = data.map(
        new Function<Object, List>() {
            public List call(Object obj) throws Exception {
Can you be a little bit more specific, maybe give a code snippet?
On Tue, Sep 9, 2014 at 5:14 PM, Sudershan Malpani <
sudershan.malp...@gmail.com> wrote:
> Hi all,
>
> I am calling an object which in turn is calling a method inside a map RDD
> in spark. While writing the tests how can I mock tha
Hi all,
I am calling an object which in turn is calling a method inside a map RDD in
Spark. While writing the tests, how can I mock that object's call? Currently I
used doNothing().when(class).method(), but it throws a task not
serializable exception. I tried making the class both spy a
Ok, so looking at the optimizer code for the first time and trying the
simplest rule that could possibly work:

object UnionPushdown extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // Push down filter into union
    case f @ Filter(condition, u @ Union
We were using it until recently; we are talking to our customers to see if
we can get off it.
Chester
Alpine Data Labs
On Tue, Sep 9, 2014 at 10:59 AM, Sean Owen wrote:
> FWIW consensus from Cloudera folk seems to be that there's no need or
> demand on this end for YARN alpha. It wouldn't ha
Last time it did not show up on the Environment tab, but I will give it another
shot... Expected behavior is that this env variable will show up, right?
On Tue, Sep 9, 2014 at 12:15 PM, Sandy Ryza wrote:
> I would expect 2 GB would be enough or more than enough for 16 GB
> executors (unless ALS is usi
since the power incident last thursday, the github pull request builder
plugin is still not really working 100%. i found an open issue
w/jenkins[1] that could definitely be affecting us, i will be pausing
builds early thursday morning and then restarting jenkins.
i'll send out a reminder tomorrow,
What Patrick said is correct. Two other points:
- In the 1.2 release we are hoping to beef up the support for working with
partitioned parquet independent of the metastore.
- You can actually do operations like INSERT INTO for parquet tables to
add data. This creates new parquet files for each
I would expect 2 GB would be enough or more than enough for 16 GB executors
(unless ALS is using a bunch of off-heap memory?). You mentioned earlier
in this thread that the property wasn't showing up in the Environment tab.
Are you sure it's making it in?
-Sandy
On Tue, Sep 9, 2014 at 11:58 AM,
I think what Michael means is people often use this to read existing
partitioned Parquet tables that are defined in a Hive metastore rather
than data generated directly from within Spark and then reading it
back as a table. I'd expect the latter case to become more common, but
for now most users co
Maybe I'm missing something; I thought Parquet was generally a write-once
format, and the sqlContext interface to it seems that way as well.
d1.saveAsParquetFile("/foo/d1")
// another day, another table, with same schema
d2.saveAsParquetFile("/foo/d2")
Will give a directory structure like
/foo/d
Hmm... I did try increasing it to a few GB but did not get a successful run
yet...
Any idea, if I am using say 40 executors, each running 16GB, what's the
typical spark.yarn.executor.memoryOverhead for, say, 100M x 10M large
matrices with a few billion ratings...
On Tue, Sep 9, 2014 at 10:49 AM, Sandy
I think usually people add these directories as multiple partitions of the
same table instead of union. This actually allows us to efficiently prune
directories when reading in addition to standard column pruning.
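To make the "partitions instead of union" suggestion concrete, here is a hypothetical HiveQL sketch (table name, columns, and the day values are all invented; only the /foo/d1 and /foo/d2 paths come from the earlier example): each day's directory becomes a partition of one external table, so a predicate on the partition column prunes whole directories.

```sql
-- Hypothetical sketch: register per-day directories as partitions of one
-- external table instead of unioning separate tables.
CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
PARTITIONED BY (day STRING)
STORED AS PARQUET
LOCATION '/foo/events';

ALTER TABLE events ADD PARTITION (day = '2014-09-08') LOCATION '/foo/d1';
ALTER TABLE events ADD PARTITION (day = '2014-09-09') LOCATION '/foo/d2';

-- A filter on the partition column now skips unmatched directories entirely:
SELECT count(*) FROM events WHERE day = '2014-09-09';
```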
On Tue, Sep 9, 2014 at 11:26 AM, Gary Malouf wrote:
> I'm kind of surprised this
I'm kind of surprised this was not run into before. Do people not
segregate their data by day/week in the HDFS directory structure?
On Tue, Sep 9, 2014 at 2:08 PM, Michael Armbrust
wrote:
> Thanks!
>
> On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger
> wrote:
>
> > Opened
> >
> > https://issue
Thanks!
On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger wrote:
> Opened
>
> https://issues.apache.org/jira/browse/SPARK-3462
>
> I'll take a look at ColumnPruning and see what I can do
>
> On Tue, Sep 9, 2014 at 12:46 PM, Michael Armbrust
> wrote:
>
>> On Tue, Sep 9, 2014 at 10:17 AM, Cody Koen
Opened
https://issues.apache.org/jira/browse/SPARK-3462
I'll take a look at ColumnPruning and see what I can do
On Tue, Sep 9, 2014 at 12:46 PM, Michael Armbrust
wrote:
> On Tue, Sep 9, 2014 at 10:17 AM, Cody Koeninger
> wrote:
>>
>> Is there a reason in general not to push projections and pr
FWIW consensus from Cloudera folk seems to be that there's no need or
demand on this end for YARN alpha. It wouldn't have an impact if it
were removed sooner even.
It will be a small positive to reduce complexity by removing this
support, making it a little easier to develop for current YARN APIs.
Hi Deb,
The current state of the art is to increase
spark.yarn.executor.memoryOverhead until the job stops failing. We do have
plans to try to automatically scale this based on the amount of memory
requested, but it will still just be a heuristic.
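As a concrete illustration of the tuning described above (all values hypothetical, matching the 16 GB executor / 2 GB overhead figures discussed in this thread; in this era of Spark the overhead is given in megabytes):

```shell
# Hypothetical example: 16 GB executors with a 2 GB off-heap cushion.
spark-submit \
  --master yarn \
  --executor-memory 16g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  your-app.jar  # hypothetical application jar
```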
-Sandy
On Tue, Sep 9, 2014 at 7:32 AM, Debasish
On Tue, Sep 9, 2014 at 10:17 AM, Cody Koeninger wrote:
>
> Is there a reason in general not to push projections and predicates down
> into the individual ParquetTableScans in a union?
>
This would be a great case to add to ColumnPruning. Would be awesome if
you could open a JIRA or even a PR :)
I've been looking at performance differences between spark sql queries
against single parquet tables, vs a unionAll of two tables. It's a
significant difference, like 5 to 10x
Is there a reason in general not to push projections and predicates down
into the individual ParquetTableScans in a union
Hi Everyone,
This is a call to the community for comments on SPARK-3445 [1]. In a
nutshell, we are trying to figure out timelines for deprecation of the
YARN-alpha API's as Yahoo is now moving off of them. It's helpful for
us to have a sense of whether anyone else uses these.
Please comment on th
Hi All,
Sorry for my late reply!
Yu Ishikawa, thanks for your interest in the Saury project. You are welcome to
try it out. If you have questions about it, please email me. We keep
improving performance and adding features for the project.
Xiangrui, thanks for your encouragement. If you have
Hi Sandy,
Any resolution for YARN failures ? It's a blocker for running spark on top
of YARN.
Thanks.
Deb
On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng wrote:
> Hi Deb,
>
> I think this may be the same issue as described in
> https://issues.apache.org/jira/browse/SPARK-2121 . We know that th