Hi Egor,
I posted the design doc for pipelines and parameters on the JIRA. Now
I'm trying to work out some details of ML datasets, which I will post
later this week. Your feedback is welcome!
Best,
Xiangrui
On Mon, Sep 15, 2014 at 12:44 AM, Reynold Xin wrote:
> Hi Egor,
>
> Thanks for the sugg
There are two things we (Yandex) miss in Spark: good abstractions in MLlib and
a good workflow job scheduler. From the threads "Adding abstraction in MlLib" and
"[mllib] State of Multi-Model training" I got the idea that Databricks is
working on this, and that we should wait for the first doc to be posted, which would guide us.
I see. Thank you, it works for me. It looks confusing to have two ways
to expose configuration, though.
Best Regards
Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing
Phone: 86-10-82452683
E-mail: liuj...@cn.ibm.com
BLD 28,ZGC Software Park
No.8 Rd.Dong Bei Wang West, Dist.H
Hi Kyle,
Thank you for the code examples. We may be able to use some of the ideas there.
I think initially the goal is to have the optimizers ready (SGD, LBFGS),
and then the evaluation metrics will come next. It might take some time,
however, as MLlib is going to have a significant API "face-li
I'd be interested in helping to test your code as soon as it's available.
The version I wrote used a paired RDD and combined by key; it worked best
if it used a custom partitioner that put all the samples in the same partition.
Running things in batched matrices would probably speed things up greatly.
Yo
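A minimal sketch of the paired-RDD approach described above, with hypothetical names and a plain HashPartitioner standing in for the custom partitioner; it only illustrates keying samples, co-locating them by key, and combining per key:

  import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
  import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x
  import org.apache.spark.rdd.RDD

  val sc = new SparkContext(new SparkConf().setAppName("paired-rdd-sketch"))

  // (modelId, sample) pairs; the sample type is just a placeholder.
  val samples: RDD[(Int, Array[Double])] = sc.parallelize(Seq(
    (0, Array(1.0, 2.0)), (1, Array(3.0, 4.0)), (0, Array(5.0, 6.0))))

  // Co-locate all samples that share a key on the same partition.
  val partitioned = samples.partitionBy(new HashPartitioner(2))

  // Combine by key so each key ends up with one buffer of its samples.
  val grouped = partitioned.combineByKey[Seq[Array[Double]]](
    (s: Array[Double]) => Seq(s),
    (buf: Seq[Array[Double]], s: Array[Double]) => buf :+ s,
    (a: Seq[Array[Double]], b: Seq[Array[Double]]) => a ++ b)

  grouped.collect().foreach { case (id, buf) => println(s"key $id: ${buf.size} samples") }
  sc.stop()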
Hi Mohit,
Welcome to the Spark community! We normally look at feature proposals
via GitHub pull requests; mind submitting one? The contribution
process is covered here:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
On Tue, Sep 16, 2014 at 9:16 PM, Mohit Jaggi wrote:
>
https://issues.apache.org/jira/browse/SPARK-3489
Folks,
I am Mohit Jaggi and I work for Ayasdi Inc. After experimenting with Spark
for a while and discovering its awesomeness(!), I made an attempt to
provide a wrapper API that looks like an R and/or pandas dataframe.
https://github.com/AyasdiOpenSou
Hi Patrick,
In our case we have a performance monitoring, alerting, and anomaly
detection product (Cloud + On Premises), and we just added Storm performance
monitoring to it. My thinking was that Spark, like any similar project really,
needs a page/section/something listing various operational tools or
Hey Otis,
Could you describe a bit more what your program is? Is it an
open source project? A product? This would help us understand a bit where
it should go.
- Patrick
On Mon, Sep 15, 2014 at 6:49 PM, Otis Gospodnetic
wrote:
> Hi,
>
> I'm looking for a suitable place on the Wiki to add some
Thank you for reading this mail.
I'm trying to change the underlying network connection system of Spark to
support InfiniBand.
1. I am not sure whether ConnectionManager and Netty are still under active
development. It seems that they are not usually used.
2. How much of the connection payload is carried by Akka?
3
Hi Kyle,
I'm actively working on it now. It's pretty close to completion; I'm just
trying to figure out the bottlenecks and optimize as much as possible.
As Phase 1, I implemented multi-model training for Gradient Descent. Instead of
performing Vector-Vector operations on rows (examples) and weights,
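A rough sketch of where that is heading, using Breeze (which MLlib uses internally); the names and shapes below are assumptions for illustration, not the actual implementation. Instead of one vector-vector dot product per model, the weight vectors are stacked into a matrix so a single matrix-vector multiply evaluates all models on an example at once:

  import breeze.linalg.{DenseMatrix, DenseVector}

  val k = 4 // number of models trained together (hypothetical)
  val d = 3 // number of features (hypothetical)

  // One weight vector per model, stacked as the rows of a k x d matrix.
  val weights = DenseMatrix.rand(k, d)

  // A single example's feature vector.
  val x = DenseVector(1.0, 2.0, 3.0)

  // Vector-vector version: one dot product per model.
  val marginsOneByOne = (0 until k).map(i => weights(i, ::).t dot x)

  // Batched version: one matrix-vector multiply yields all k margins.
  val marginsBatched: DenseVector[Double] = weights * x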
I'm curious about the state of development of Multi-Model learning in MLlib
(training sets of models during the same training session, rather than one
at a time). The JIRA lists it as in progress targeting Spark 1.2.0 (
https://issues.apache.org/jira/browse/SPARK-1486 ). But there hasn't been
any note
There appears to be a newly added Boolean in DAGScheduler that defaults to false:

  private val localExecutionEnabled =
    sc.getConf.getBoolean("spark.localExecution.enabled", false)

Then:

  val shouldRunLocally =
    localExecutionEnabled && allowLocal && finalStage.parents.isEmpty &&
    partitions.len
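For reference, a small illustrative sketch of turning that flag back on through the regular configuration mechanism; with it enabled, a first()/take()-style job whose final stage has no parent stages may run locally in the driver, per the condition above:

  import org.apache.spark.{SparkConf, SparkContext}

  // Illustrative only: re-enable local execution, which the new Boolean gates off by default.
  val conf = new SparkConf()
    .setAppName("local-execution-sketch")
    .set("spark.localExecution.enabled", "true")
  val sc = new SparkContext(conf)

  // A first() on a stage with no parent stages can now be run locally in the driver.
  val head = sc.parallelize(1 to 100).first()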
Hi,
The test case is separated out as follows. The call to rdd2.first() breaks when
the Spark version is changed to 1.1.0, reporting an exception that NullWritable is not
serializable. However, the same test passes with Spark 1.0.2. The pom.xml file
is attached. The test data README.md was copied from Spark.
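A minimal sketch of the kind of sequence-file round trip being described, with placeholder paths and app name (the actual attached test may differ):

  import org.apache.hadoop.io.{NullWritable, Text}
  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._ // pair-RDD / sequence-file implicits on Spark 1.x

  val sc = new SparkContext(new SparkConf().setAppName("nullwritable-sketch"))

  // Write README.md out as a (NullWritable, Text) sequence file...
  val rdd = sc.textFile("README.md").map(line => (NullWritable.get(), new Text(line)))
  rdd.saveAsSequenceFile("/tmp/readme-seq")

  // ...and read it back. The first() below is the call that reportedly fails on
  // 1.1.0 with "NullWritable not serializable" but passes on 1.0.2.
  val rdd2 = sc.sequenceFile("/tmp/readme-seq", classOf[NullWritable], classOf[Text])
  println(rdd2.first())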
I could resolve the conflict between my method trace and the details from
the web UI.
I was modifying and compiling only on the master node, so I found only one
node in the print trace. Now I have incorporated the prints on all the nodes
and compiled them individually, then started all the processes and r
Hi Jun,
You can still set the authentication variables through `spark-env.sh`, by
exporting SPARK_MASTER_OPTS, SPARK_WORKER_OPTS, SPARK_HISTORY_OPTS, etc. to
include "-Dspark.auth.{...}". There is an open pull request that allows
these processes to also read from spark-defaults.conf, but this is not
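As a sketch of what that looks like in conf/spark-env.sh, assuming the "spark.auth.{...}" shorthand above refers to the standard spark.authenticate properties (the secret value is a placeholder):

  # conf/spark-env.sh (illustrative)
  export SPARK_MASTER_OPTS="-Dspark.authenticate=true -Dspark.authenticate.secret=changeme"
  export SPARK_WORKER_OPTS="-Dspark.authenticate=true -Dspark.authenticate.secret=changeme"
  export SPARK_HISTORY_OPTS="-Dspark.authenticate=true -Dspark.authenticate.secret=changeme"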
Hi, when I build Spark with Maven it fails; the error message is as
follows. I didn't find a satisfactory solution via Google. Can anyone
help me? Thank you!
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Spark
At 2014-09-16 00:07:34 -0700, sochi wrote:
> so, above example is like a ---(e1)---> b ---(e1)---> c ---(e1)---> d
>
> In this case, can I find b,c and d when I have just src vertex, a and edge,
> e1?
First, to clarify: the three edges in your example are all distinct, since they
have different
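For the one-hop part of the question, a small GraphX sketch (the vertex ids and the "e1" attribute are placeholders) that collects the destination vertices of edges with a given attribute leaving a given source; reaching c and d from a would require repeating the expansion (or using Pregel):

  import org.apache.spark.graphx.{Edge, Graph}
  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("graphx-sketch"))

  // a=1, b=2, c=3, d=4; every edge carries the attribute "e1".
  val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d")))
  val edges = sc.parallelize(Seq(Edge(1L, 2L, "e1"), Edge(2L, 3L, "e1"), Edge(3L, 4L, "e1")))
  val graph = Graph(vertices, edges)

  // Destination ids of "e1" edges whose source is a (id 1): just b here.
  val oneHop = graph.edges
    .filter(e => e.srcId == 1L && e.attr == "e1")
    .map(_.dstId)
    .collect()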
Hi. I'm ChiSeung.
1. How do I know which edges are connected to each vertex?
I want to know how to find the edges connected to given vertices.
And
2.
For example, there are vertices a, b, c, d and an edge e1.
On the graph,
a and b are connected by e1,
b and c are connected by e1,
c and d are also con