Re: problem with HiveContext inside Actor

2014-09-17 Thread Michael Armbrust
- dev Is it possible that you are constructing more than one HiveContext in a single JVM? Due to global state in Hive code this is not allowed. Michael On Wed, Sep 17, 2014 at 7:21 PM, Cheng, Hao wrote: > Hi, Du > > I am not sure what you mean “triggers the HiveContext to create a > database
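The "one HiveContext per JVM" constraint mentioned above can be sketched as a holder that hands every caller the same instance. This is only an illustration under assumptions: `FakeHiveContext` and `HiveContextHolder` are hypothetical names, and the stand-in class replaces the real `HiveContext` so that the example runs without Spark on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Spark's HiveContext (assumption: not the real class).
// It only counts how many times it has been constructed.
final class FakeHiveContext {
    static final AtomicInteger CREATED = new AtomicInteger();

    FakeHiveContext() {
        CREATED.incrementAndGet();
    }
}

// Hypothetical holder: builds the context on first use and returns
// that same instance to every later caller in this JVM.
final class HiveContextHolder {
    private static FakeHiveContext instance;

    private HiveContextHolder() {}

    static synchronized FakeHiveContext get() {
        if (instance == null) {
            instance = new FakeHiveContext();
        }
        return instance;
    }
}

class HolderDemo {
    public static void main(String[] args) throws InterruptedException {
        // Four threads play the role of actors that each need the context.
        Thread[] actors = new Thread[4];
        for (int i = 0; i < actors.length; i++) {
            actors[i] = new Thread(HiveContextHolder::get);
            actors[i].start();
        }
        for (Thread actor : actors) {
            actor.join();
        }
        System.out.println("contexts created: " + FakeHiveContext.CREATED.get());
    }
}
```

In a real application the holder would wrap the actual HiveContext, and an actor would ask the holder for it instead of constructing its own.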

RE: problem with HiveContext inside Actor

2014-09-17 Thread Cheng, Hao
Hi, Du I am not sure what you mean by "triggers the HiveContext to create a database". Did you create a subclass of HiveContext? Just be sure you call "HiveContext.sessionState" eagerly, since it sets the proper "hiveconf" in the SessionState; otherwise the HiveDriver will always get th

Re: GraphX graph partitioning strategy

2014-09-17 Thread Larry Xiao
Hi Ankur, all, I've implemented a few graph partitioning algorithms and done some evaluation. The goal is to lower the replication factor and produce a better-balanced graph, so that the workload is more balanced. Detailed description and results: https://issues.apache.org/jira/browse/SPARK-3523 Can you

problem with HiveContext inside Actor

2014-09-17 Thread Du Li
Hi, I wonder whether anybody has had a similar experience or has any suggestions here. I have an Akka Actor that processes database requests in high-level messages. Inside this Actor, it creates a HiveContext object that does the actual db work. The main thread creates the needed SparkContext and passes in to the

Re: [mllib] State of Multi-Model training

2014-09-17 Thread Burak Yavuz
I believe it will be in the main repo. Burak ----- Original Message ----- From: "Kyle Ellrott" To: "Burak Yavuz" Cc: dev@spark.apache.org Sent: Wednesday, September 17, 2014 9:48:54 AM Subject: Re: [mllib] State of Multi-Model training This sounds like a pretty major re-write of the system. Is

Re: Workflow Scheduler for Spark

2014-09-17 Thread Reynold Xin
There might've been some misunderstanding. I was referring to the MLlib pipeline design doc when I said the design doc was posted, in response to the first paragraph of your original email. On Wed, Sep 17, 2014 at 2:47 AM, Egor Pahomov wrote: > It's doc about MLLib pipeline functionality. What

Re: network.ConnectionManager error

2014-09-17 Thread Reynold Xin
This is during shutdown, right? Looks OK to me since connections are being closed. We could've handled this more gracefully, but the logs look harmless. On Wednesday, September 17, 2014, wyphao.2007 wrote: > Hi, When I run a Spark job on YARN, the job finishes successfully, but I found > there are som

Re: [mllib] State of Multi-Model training

2014-09-17 Thread Kyle Ellrott
This sounds like a pretty major re-write of the system. Is it going to live in a different repo during development? Or will we be able to track progress in the main Spark repo? Kyle On Tue, Sep 16, 2014 at 10:22 PM, Burak Yavuz wrote: > Hi Kyle, > > Thank you for the code examples. We may be a

Re: network.ConnectionManager error

2014-09-17 Thread Christian Chua
I see the same thing. A workaround is to put a Thread.sleep(5000) statement before sc.stop(). Let us know how it goes. > On Sep 17, 2014, at 3:43 AM, "wyphao.2007" wrote: > > Hi, When I run a Spark job on YARN, the job finishes successfully, but I found > there are some error logs in the logfi
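The workaround described above (pause, then stop the context) can be sketched as a small helper. This is a sketch under assumptions: `Stoppable` and `GracefulStop` are hypothetical names, and `Stoppable` stands in for the single `stop()` call on a SparkContext so that the example runs without Spark. The thread does not confirm that the pause is needed, only that it was reported to help.

```java
// Hypothetical stand-in for the one SparkContext method used here: stop().
interface Stoppable {
    void stop();
}

final class GracefulStop {
    private GracefulStop() {}

    // Sleeps for graceMillis so that in-flight connections can finish,
    // then stops the context. If the sleep is interrupted, the interrupt
    // flag is restored and the context is stopped right away.
    static void stopAfterGrace(Stoppable sc, long graceMillis) {
        try {
            Thread.sleep(graceMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        sc.stop();
    }
}

class GracefulStopDemo {
    public static void main(String[] args) {
        // 200 ms keeps the demo short; the message above suggests 5000 ms.
        GracefulStop.stopAfterGrace(() -> System.out.println("context stopped"), 200);
    }
}
```

With a real SparkContext `sc`, the call would look like `GracefulStop.stopAfterGrace(sc::stop, 5000)`.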

network.ConnectionManager error

2014-09-17 Thread wyphao.2007
Hi, When I run a Spark job on YARN, the job finishes successfully, but I found that there are some error logs in the log file, as follows (the text in red): 14/09/17 18:25:03 INFO ui.SparkUI: Stopped Spark web UI at http://sparkserver2.cn:63937 14/09/17 18:25:03 INFO scheduler.DAGScheduler: Stopping DAGS

Re: Workflow Scheduler for Spark

2014-09-17 Thread Egor Pahomov
It's a doc about MLlib pipeline functionality. What about an Oozie-like workflow? 2014-09-17 13:08 GMT+04:00 Mark Hamstra : > See https://issues.apache.org/jira/browse/SPARK-3530 and this doc, > referenced in that JIRA: > > > https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLm

Re: Workflow Scheduler for Spark

2014-09-17 Thread Mark Hamstra
See https://issues.apache.org/jira/browse/SPARK-3530 and this doc, referenced in that JIRA: https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing On Wed, Sep 17, 2014 at 2:00 AM, Egor Pahomov wrote: > I have problems using Oozie. For example it doesn't

Re: Workflow Scheduler for Spark

2014-09-17 Thread Egor Pahomov
I have problems using Oozie. For example, it doesn't sustain a Spark context like the Ooyala job server does. Other than through GUI interfaces like HUE, it's hard to work with: Scoozie stopped development a year ago (I spoke with the creator), and Oozie XML is very hard to write. Oozie still has all documentation and c

Re: Workflow Scheduler for Spark

2014-09-17 Thread Reynold Xin
Hi Egor, I think the design doc for the pipeline feature has been posted. For the workflow, I believe Oozie actually works fine with Spark if you want some external workflow system. Do you have any trouble using that? On Tue, Sep 16, 2014 at 11:45 PM, Egor Pahomov wrote: > There are two thing

Re: Network Communication - Akka or more?

2014-09-17 Thread Reynold Xin
I'm not familiar with Infiniband, but I can chime in on the Spark part. There are two kinds of communications in Spark: control plane and data plane. Task scheduling / dispatching is control, whereas fetching a block (e.g. shuffle) is data. On Tue, Sep 16, 2014 at 4:22 PM, Trident wrote: > Th