Re: Adding abstraction in MLlib

2014-09-16 Thread Xiangrui Meng
Hi Egor, I posted the design doc for pipelines and parameters on the JIRA, and now I'm trying to work out some details of ML datasets, which I will post later this week. Your feedback is welcome! Best, Xiangrui On Mon, Sep 15, 2014 at 12:44 AM, Reynold Xin wrote: > Hi Egor, > > Thanks for the sugg

Workflow Scheduler for Spark

2014-09-16 Thread Egor Pahomov
There are two things we (Yandex) miss in Spark: good MLlib abstractions and a good workflow job scheduler. From the threads "Adding abstraction in MLlib" and "[mllib] State of Multi-Model training" I got the idea that Databricks is working on it and that we should wait until the first posted doc, which would lead us.

Re: Spark authenticate enablement

2014-09-16 Thread Jun Feng Liu
I see. Thank you, it works for me. It looks confusing to have two ways to expose configuration, though. Best Regards Jun Feng Liu IBM China Systems & Technology Laboratory in Beijing Phone: 86-10-82452683 E-mail: liuj...@cn.ibm.com BLD 28, ZGC Software Park No.8 Rd. Dong Bei Wang West, Dist.H

Re: [mllib] State of Multi-Model training

2014-09-16 Thread Burak Yavuz
Hi Kyle, Thank you for the code examples. We may be able to use some of the ideas there. I think initially the goal is to have the optimizers ready (SGD, LBFGS), and then the evaluation metrics will come next. It might take some time, however, as MLlib is going to have a significant API "face-li
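
For reference, a minimal sketch of the existing single-model L-BFGS entry point that this work would generalize, based on the public MLlib optimization API (the data path and parameter values are placeholders, and an existing SparkContext `sc` is assumed):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.util.MLUtils

// Load LIBSVM data and append an intercept term to each feature vector.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
val numFeatures = data.take(1)(0).features.size
val training = data.map(p => (p.label, MLUtils.appendBias(p.features))).cache()

// Single-model L-BFGS with a logistic gradient and L2 regularization.
val initialWeights = Vectors.dense(new Array[Double](numFeatures + 1))
val (weights, lossHistory) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(),
  10,    // numCorrections
  1e-4,  // convergenceTol
  20,    // maxNumIterations
  0.1,   // regParam
  initialWeights)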

Re: [mllib] State of Multi-Model training

2014-09-16 Thread Kyle Ellrott
I'd be interested in helping to test your code as soon as it's available. The version I wrote used a paired RDD and combined by key; it worked best with a custom partitioner that put all the samples in the same area. Running things in batched matrices would probably speed things up greatly. Yo
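
As a rough illustration of that layout (a sketch under assumptions, not the code from either implementation), a partitioner keyed on a hypothetical model id could look like:

import org.apache.spark.Partitioner

// Keep all samples for the same model id in one partition, so per-model
// combines happen without reshuffling on every iteration.
class ModelPartitioner(override val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = key match {
    case modelId: Int => ((modelId % numPartitions) + numPartitions) % numPartitions
    case other        => ((other.hashCode % numPartitions) + numPartitions) % numPartitions
  }
}

// Hypothetical usage: `samples` is an RDD of records carrying a modelId field.
// val byModel = samples.map(s => (s.modelId, s))
//                      .partitionBy(new ModelPartitioner(numModels))
//                      .cache()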

Re: greeting from new member and jira 3489

2014-09-16 Thread Patrick Wendell
Hi Mohit, Welcome to the Spark community! We normally look at feature proposals via GitHub pull requests; would you mind submitting one? The contribution process is covered here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Tue, Sep 16, 2014 at 9:16 PM, Mohit Jaggi wrote: >

greeting from new member and jira 3489

2014-09-16 Thread Mohit Jaggi
https://issues.apache.org/jira/browse/SPARK-3489 Folks, I am Mohit Jaggi and I work for Ayasdi Inc. After experimenting with Spark for a while and discovering its awesomeness(!), I made an attempt to provide a wrapper API that looks like an R and/or pandas dataframe. https://github.com/AyasdiOpenSou

Re: Wiki page for Operations/Monitoring tools?

2014-09-16 Thread Otis Gospodnetic
Hi Patrick, In our case we have a performance monitoring, alerting, and anomaly detection product (Cloud + On Premises), and we just added Storm performance monitoring to it. My thinking was that Spark, like any similar project really, needs a page/section/something listing various operational tools or

Re: Wiki page for Operations/Monitoring tools?

2014-09-16 Thread Patrick Wendell
Hey Otis, Could you describe a bit more what your program is? Is it an open source project? A product? This would help us understand where it should go. - Patrick On Mon, Sep 15, 2014 at 6:49 PM, Otis Gospodnetic wrote: > Hi, > > I'm looking for a suitable place on the Wiki to add some

Network Communication - Akka or more?

2014-09-16 Thread Trident
Thank you for reading this mail. I'm trying to change the underlying network connection system of Spark to support InfiniBand. 1. I wonder whether ConnectionManager and Netty are still under construction; it seems that they are not usually used. 2. How much of the communication payload is carried by Akka? 3

Re: [mllib] State of Multi-Model training

2014-09-16 Thread Burak Yavuz
Hi Kyle, I'm actively working on it now. It's pretty close to completion; I'm just trying to figure out the bottlenecks and optimize as much as possible. As Phase 1, I implemented multi-model training for Gradient Descent. Instead of performing Vector-Vector operations on rows (examples) and weights,
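
A rough sketch of the matrix-matrix idea for a least-squares objective (assumed for illustration only, not the actual patch; Breeze types are used for brevity): keep one weight column per model, so a batch of n examples updates all k models with BLAS-3 operations.

import breeze.linalg.{diag, DenseMatrix, DenseVector}

def multiModelStep(
    X: DenseMatrix[Double],          // n x d batch of examples
    y: DenseVector[Double],          // n labels
    W: DenseMatrix[Double],          // d x k weights, one column per model
    stepSizes: DenseVector[Double]   // k step sizes, one per model
  ): DenseMatrix[Double] = {
  val k = W.cols
  val Y = y * DenseVector.ones[Double](k).t         // replicate labels to n x k
  val residuals = X * W - Y                         // predictions minus labels, all models at once
  val grads = (X.t * residuals) / X.rows.toDouble   // d x k stacked gradients
  W - grads * diag(stepSizes)                       // scale each model's gradient by its own step size
}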

[mllib] State of Multi-Model training

2014-09-16 Thread Kyle Ellrott
I'm curious about the state of development of multi-model learning in MLlib (training sets of models during the same training session, rather than one at a time). The JIRA lists it as in progress targeting Spark 1.2.0 ( https://issues.apache.org/jira/browse/SPARK-1486 ). But there hasn't been any note

RE: NullWritable not serializable

2014-09-16 Thread Yan Zhou.sc
There appears to be a newly added Boolean in DAGScheduler, defaulting to false: private val localExecutionEnabled = sc.getConf.getBoolean("spark.localExecution.enabled", false) Then: val shouldRunLocally = localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.len
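
For anyone who wants the pre-1.1 behavior back, the flag quoted above can simply be re-enabled; a minimal sketch (it defaults to false in 1.1):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("local-execution-enabled")
  .set("spark.localExecution.enabled", "true")   // allow first()/take() style jobs to run locally on the driver
val sc = new SparkContext(conf)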

Re: NullWritable not serializable

2014-09-16 Thread Du Li
Hi, The test case is separated out as follows. The call to rdd2.first() breaks when the Spark version is changed to 1.1.0, reporting the exception "NullWritable not serializable". However, the same test passed with Spark 1.0.2. The pom.xml file is attached. The test data README.md was copied from Spark.
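
Since the actual test case is in the attachment, the following is only a sketch of the kind of code that exercises this path (an existing SparkContext `sc` and a scratch output path are assumed):

import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.spark.SparkContext._

// Write a sequence file keyed by NullWritable, read it back,
// and materialize one record on the driver.
val rdd1 = sc.textFile("README.md").map(line => (NullWritable.get(), new Text(line)))
rdd1.saveAsSequenceFile("/tmp/readme-seq")

val rdd2 = sc.sequenceFile("/tmp/readme-seq", classOf[NullWritable], classOf[Text])
rdd2.first()   // reported to throw "NullWritable not serializable" on 1.1.0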

Re: how does replicate() method in BlockManager.scala aquires resources for rdd replication

2014-09-16 Thread rapelly kartheek
I could resolve the conflict between my method trace and the details from the webUI. I was modifying and compiling only on the master node, so I found only one node in the print trace. Now I have incorporated the prints on all the nodes and compiled them individually. Then I started all the processes and r

Re: Spark authenticate enablement

2014-09-16 Thread Andrew Or
Hi Jun, You can still set the authentication variables through `spark-env.sh`, by exporting SPARK_MASTER_OPTS, SPARK_WORKER_OPTS, SPARK_HISTORY_OPTS etc to include "-Dspark.auth.{...}". There is an open pull request that allows these processes to also read from spark-defaults.conf, but this is not
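
For example (the property names spark.authenticate and spark.authenticate.secret below are the standard authentication settings, shown only to illustrate the pattern Andrew describes):

# in conf/spark-env.sh on the standalone master and workers
export SPARK_MASTER_OPTS="-Dspark.authenticate=true -Dspark.authenticate.secret=<your-secret>"
export SPARK_WORKER_OPTS="-Dspark.authenticate=true -Dspark.authenticate.secret=<your-secret>"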

Building Spark source error with maven

2014-09-16 Thread wyphao.2007
Hi, when I build Spark with Maven it fails; the error message is as follows. I didn't find a satisfactory solution via Google. Can anyone help me? Thank you! [INFO] Reactor Summary: [INFO] Spark
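
For reference, the invocation documented in the Spark 1.x "Building Spark with Maven" guide (per that guide, omitting the MAVEN_OPTS setting commonly fails the build with PermGen or heap errors):

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -DskipTests clean package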

Re: GraphX: some vertex with specific edge

2014-09-16 Thread Ankur Dave
At 2014-09-16 00:07:34 -0700, sochi wrote: > so, the above example is like a ---(e1)---> b ---(e1)---> c ---(e1)---> d > > In this case, can I find b, c and d when I have just the src vertex a and the edge e1? First, to clarify: the three edges in your example are all distinct, since they have different
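
A sketch in that direction (the graph `graph`, the source id `aId`, and a String edge attribute are assumptions for illustration): restrict the graph to edges carrying the attribute of interest, then look outward from a.

import org.apache.spark.graphx.EdgeDirection

// Keep only edges whose attribute is "e1".
val e1Graph = graph.subgraph(epred = triplet => triplet.attr == "e1")

// One hop: the direct out-neighbors of `a` over e1 edges (i.e. b).
val oneHop = e1Graph.collectNeighborIds(EdgeDirection.Out)
  .filter { case (id, _) => id == aId }
  .flatMap { case (_, nbrs) => nbrs }
  .collect()

// Reaching c and d as well requires following e1 edges transitively,
// e.g. with Pregel over e1Graph, rather than a single neighborhood lookup.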

GraphX: some vertex with specific edge

2014-09-16 Thread sochi
Hi, I'm ChiSeung. 1. How do I know which edges connect to each vertex? I want to know how to find the edges connected to some vertices. And 2. For example, there are vertices a, b, c, d and an edge e1. On the graph, a and b are connected by e1, b and c are connected by e1, and c and d are also con