Class1.java
@Autowired
private ClassX cx;

public List method1(JavaPairRDD data) {
    List list1 = new ArrayList();
    List list2 = new ArrayList();
    JavaPairRDD computed = data.map(
        new Function<Object, List>() {
            public List call(Object obj) throws Exception {
Can you be a little bit more specific, maybe give a code snippet?
On Tue, Sep 9, 2014 at 5:14 PM, Sudershan Malpani <
sudershan.malp...@gmail.com> wrote:
> Hi all,
>
> I am calling an object which in turn is calling a method inside a map RDD
> in spark. While writing the tests how can I mock tha
Hi all,
I am calling an object which in turn is calling a method inside a map RDD in
Spark. While writing the tests, how can I mock that object's call? Currently I
used doNothing().when(class).method(), but it throws a task not
serializable exception. I tried making the class both spy a
Ok, so looking at the optimizer code for the first time and trying the
simplest rule that could possibly work:

object UnionPushdown extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // Push down filter into union
    case f @ Filter(condition, u @ Union
We were using it until recently; we are talking to our customers to see if
we can get off it.
Chester
Alpine Data Labs
On Tue, Sep 9, 2014 at 10:59 AM, Sean Owen wrote:
> FWIW consensus from Cloudera folk seems to be that there's no need or
> demand on this end for YARN alpha. It wouldn't ha
Last time it did not show up on the Environment tab, but I will give it another
shot... Expected behavior is that this env variable will show up, right?
On Tue, Sep 9, 2014 at 12:15 PM, Sandy Ryza wrote:
> I would expect 2 GB would be enough or more than enough for 16 GB
> executors (unless ALS is usi
since the power incident last thursday, the github pull request builder
plugin is still not really working 100%. i found an open issue
w/jenkins[1] that could definitely be affecting us, i will be pausing
builds early thursday morning and then restarting jenkins.
i'll send out a reminder tomorrow,
What Patrick said is correct. Two other points:
- In the 1.2 release we are hoping to beef up the support for working with
partitioned parquet independent of the metastore.
- You can actually do operations like INSERT INTO for parquet tables to
add data. This creates new parquet files for each
I would expect 2 GB would be enough or more than enough for 16 GB executors
(unless ALS is using a bunch of off-heap memory?). You mentioned earlier
in this thread that the property wasn't showing up in the Environment tab.
Are you sure it's making it in?
-Sandy
On Tue, Sep 9, 2014 at 11:58 AM,
I think what Michael means is people often use this to read existing
partitioned Parquet tables that are defined in a Hive metastore rather
than data generated directly from within Spark and then reading it
back as a table. I'd expect the latter case to become more common, but
for now most users co
Maybe I'm missing something; I thought Parquet was generally a write-once
format, and the sqlContext interface to it seems that way as well.
d1.saveAsParquetFile("/foo/d1")
// another day, another table, with same schema
d2.saveAsParquetFile("/foo/d2")
Will give a directory structure like
/foo/d
Hmm... I did try increasing it to a few GB but did not get a successful run
yet...
Any idea, if I am using say 40 executors, each running 16GB, what's the
typical spark.yarn.executor.memoryOverhead for, say, 100M x 10M large
matrices with a few billion ratings...
On Tue, Sep 9, 2014 at 10:49 AM, Sandy
I think usually people add these directories as multiple partitions of the
same table instead of union. This actually allows us to efficiently prune
directories when reading in addition to standard column pruning.
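To make the "partitions instead of union" suggestion concrete, here is a hypothetical HiveQL sketch (table name, columns, and the day values are all invented; only the /foo/d1 and /foo/d2 paths come from the earlier example): each day's directory becomes a partition of one external table, so a predicate on the partition column prunes whole directories.

```sql
-- Hypothetical sketch: register per-day directories as partitions of one
-- external table instead of unioning separate tables.
CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
PARTITIONED BY (day STRING)
STORED AS PARQUET
LOCATION '/foo/events';

ALTER TABLE events ADD PARTITION (day = '2014-09-08') LOCATION '/foo/d1';
ALTER TABLE events ADD PARTITION (day = '2014-09-09') LOCATION '/foo/d2';

-- A filter on the partition column now skips unmatched directories entirely:
SELECT count(*) FROM events WHERE day = '2014-09-09';
```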
On Tue, Sep 9, 2014 at 11:26 AM, Gary Malouf wrote:
> I'm kind of surprised this
I'm kind of surprised this was not run into before. Do people not
segregate their data by day/week in the HDFS directory structure?
On Tue, Sep 9, 2014 at 2:08 PM, Michael Armbrust
wrote:
> Thanks!
>
> On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger
> wrote:
>
> > Opened
> >
> > https://issue
Thanks!
On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger wrote:
> Opened
>
> https://issues.apache.org/jira/browse/SPARK-3462
>
> I'll take a look at ColumnPruning and see what I can do
>
> On Tue, Sep 9, 2014 at 12:46 PM, Michael Armbrust
> wrote:
>
>> On Tue, Sep 9, 2014 at 10:17 AM, Cody Koen
Opened
https://issues.apache.org/jira/browse/SPARK-3462
I'll take a look at ColumnPruning and see what I can do
On Tue, Sep 9, 2014 at 12:46 PM, Michael Armbrust
wrote:
> On Tue, Sep 9, 2014 at 10:17 AM, Cody Koeninger
> wrote:
>>
>> Is there a reason in general not to push projections and pr
FWIW consensus from Cloudera folk seems to be that there's no need or
demand on this end for YARN alpha. It wouldn't have an impact if it
were removed sooner even.
It will be a small positive to reduce complexity by removing this
support, making it a little easier to develop for current YARN APIs.
Hi Deb,
The current state of the art is to increase
spark.yarn.executor.memoryOverhead until the job stops failing. We do have
plans to try to automatically scale this based on the amount of memory
requested, but it will still just be a heuristic.
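As a concrete illustration of the tuning described above (all values hypothetical, matching the 16 GB executor / 2 GB overhead figures discussed in this thread; in this era of Spark the overhead is given in megabytes):

```shell
# Hypothetical example: 16 GB executors with a 2 GB off-heap cushion.
spark-submit \
  --master yarn \
  --executor-memory 16g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  your-app.jar  # hypothetical application jar
```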
-Sandy
On Tue, Sep 9, 2014 at 7:32 AM, Debasish
On Tue, Sep 9, 2014 at 10:17 AM, Cody Koeninger wrote:
>
> Is there a reason in general not to push projections and predicates down
> into the individual ParquetTableScans in a union?
>
This would be a great case to add to ColumnPruning. Would be awesome if
you could open a JIRA or even a PR :)
I've been looking at performance differences between spark sql queries
against single parquet tables, vs a unionAll of two tables. It's a
significant difference, like 5 to 10x
Is there a reason in general not to push projections and predicates down
into the individual ParquetTableScans in a union
Hi Everyone,
This is a call to the community for comments on SPARK-3445 [1]. In a
nutshell, we are trying to figure out timelines for deprecation of the
YARN-alpha API's as Yahoo is now moving off of them. It's helpful for
us to have a sense of whether anyone else uses these.
Please comment on th
Hi All,
Sorry for my late reply!
Yu Ishikawa, thanks for your interest in the Saury project. You are welcome to
try it out. If you have questions about it, please email me. We keep
improving performance and adding features for the project.
Xiangrui, thanks for your encouragement. If you have
Hi Sandy,
Any resolution for YARN failures ? It's a blocker for running spark on top
of YARN.
Thanks.
Deb
On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng wrote:
> Hi Deb,
>
> I think this may be the same issue as described in
> https://issues.apache.org/jira/browse/SPARK-2121 . We know that th