Re: Emergency maintenance on Jenkins

2014-06-10 Thread Patrick Wendell
No luck with this tonight - unfortunately our Python tests aren't
working well with Python 2.6, and some other issues made it hard to get
the EC2 worker up to speed. Hopefully we can have this up and running
tomorrow.

- Patrick

On Mon, Jun 9, 2014 at 10:17 PM, Patrick Wendell  wrote:
> Just a heads up - due to an outage at UCB we've lost several of the
> Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to
> compensate, but this might fail some ongoing builds.
>
> The good news is if we do get it working with EC2 workers, then we
> will have burst capability in the future - e.g. on release deadlines.
> So it's not all bad!
>
> - Patrick


Re: debugger

2014-06-10 Thread DanielH
Hi Josh,

I came across this post while looking for a debugger or RDD visualization
tool for Spark. I am using Spark 0.9.1 and upgrading soon to Spark 1.0. The
links you posted are dead. Can you please direct me to how I can debug my
existing Spark job?

Will I need to edit my existing job's code in addition to setting any
environment variables/parameters?

The problem: I am running Bagel on a very large graph, and when the job gets
to the final step (saveAsTextFile) it hangs, sometimes for days, until I
kill it. Often, if I simply rerun the job, it finishes in an hour, which is
the expected amount of time it should take.

Thanks!





Re: Emergency maintenance on Jenkins

2014-06-10 Thread Patrick Wendell
Hey just to update people - as of around 1pm PT we were back up and
running with Jenkins slaves on EC2. Sorry about the disruption.

- Patrick

On Tue, Jun 10, 2014 at 1:15 AM, Patrick Wendell  wrote:
> No luck with this tonight - unfortunately our Python tests aren't
> working well with Python 2.6, and some other issues made it hard to get
> the EC2 worker up to speed. Hopefully we can have this up and running
> tomorrow.
>
> - Patrick
>
> On Mon, Jun 9, 2014 at 10:17 PM, Patrick Wendell  wrote:
>> Just a heads up - due to an outage at UCB we've lost several of the
>> Jenkins slaves. I'm trying to spin up new slaves on EC2 in order to
>> compensate, but this might fail some ongoing builds.
>>
>> The good news is if we do get it working with EC2 workers, then we
>> will have burst capability in the future - e.g. on release deadlines.
>> So it's not all bad!
>>
>> - Patrick


Run ScalaTest inside IntelliJ IDEA

2014-06-10 Thread 申毅杰
Hi All,

I want to run a ScalaTest suite in IDEA directly, but it fails during the
make phase before the tests run.
The problems are as follows:

/Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala
Error:(44, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
  .setData(ByteString.copyFrom(data))
  ^
/Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
Error:(119, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
  .setData(ByteString.copyFrom(createExecArg()))
  ^
Error:(257, 35) type mismatch;
 found   : org.apache.mesos.protobuf.ByteString
 required: com.google.protobuf.ByteString
  .setData(ByteString.copyFrom(task.serializedTask))
  ^

Before running the tests in IDEA, I built Spark with ’sbt/sbt assembly’,
imported the projects into IDEA after ’sbt/sbt gen-idea’,
and I am able to run the tests in a terminal with ’sbt/sbt test’.

Is there anything I left out in order to run/debug a test suite inside IDEA?

Best regards,
Yijie

Suggestion: rdd.compute()

2014-06-10 Thread innowireless TaeYun Kim
Hi,

Regarding the following scenario, would it be nice to have an action method
named something like 'compute()' that does nothing but compute/materialize
all the partitions of an RDD?
It could also be useful for profiling.
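
A minimal Scala sketch of what such an action could look like (hypothetical:
compute() is not part of the Spark API; it just runs a no-op over every
partition):

def compute[T](rdd: org.apache.spark.rdd.RDD[T]): Unit =
  rdd.foreach(_ => ())  // no-op action: forces computation of every partition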


-Original Message-
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] 
Sent: Wednesday, June 11, 2014 11:40 AM
To: u...@spark.apache.org
Subject: Question about RDD cache, unpersist, materialization

Hi,

What I (seem to) know about the RDD persisting API is as follows:
- cache() and persist() are not actions. They only mark the RDD.
- unpersist() is also not an action. It only removes the marking. But if the
RDD is already in memory, it is unloaded.

And there seems to be no API to forcibly materialize an RDD without
requesting data through an action method, for example first().
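
A minimal Scala sketch of that behavior (assuming an existing SparkContext
named sc; the file name is illustrative):

val rdd = sc.textFile("input.txt")
rdd.cache()            // lazy: nothing is computed or stored yet
val n = rdd.count()    // action: computes the RDD and fills the cache
rdd.unpersist()        // removes the marking and evicts the cached blocks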

So, I am faced with the following scenario.

{
    // Create an empty RDD for merging.
    JavaRDD<T> rddUnion = sc.parallelize(new ArrayList<T>());
    for (int i = 0; i < 10; i++)
    {
        JavaRDD<String> rdd = sc.textFile(inputFileNames[i]);
        rdd.cache();  // Since it will be used twice, cache.
        // Transform and save; rdd materializes.
        rdd.map(...).filter(...).saveAsTextFile(outputFileNames[i]);
        // Do another transform to T and merge by union.
        rddUnion = rddUnion.union(rdd.map(...).filter(...));
        rdd.unpersist();  // Now it seems not needed. (But needed actually.)
    }
    // Here, rddUnion actually materializes and needs all 10 rdds that were
    // already unpersisted, so all 10 rdds will be rebuilt.
    rddUnion.saveAsTextFile(mergedFileName);
}

If rddUnion could be materialized and cache()d before the rdd.unpersist()
line, the RDDs built in the loop would not be needed at
rddUnion.saveAsTextFile().

Now what is the best strategy?
- Do not unpersist the 10 RDDs in the loop.
- Materialize rddUnion in the loop by calling a 'light' action, like
first().
- Give up and just rebuild/reload all 10 RDDs when saving rddUnion.

Am I misunderstanding something?

Thanks.




Re: Suggestion: rdd.compute()

2014-06-10 Thread Ankur Dave
You can achieve an equivalent effect by calling rdd.foreach(x => {}), which
is the lightest possible action that forces materialization of the whole
RDD.
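
Applied to the earlier scenario, a minimal Scala sketch (with the caveat that
each intermediate rddUnion then stays cached until it is itself unpersisted):

rddUnion.cache()           // keep the partitions once computed
rddUnion.foreach(x => {})  // no-op action: materializes every partition
rdd.unpersist()            // now safe: rddUnion no longer has to rebuild rdd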

Ankur 


Re: Run ScalaTest inside IntelliJ IDEA

2014-06-10 Thread Qiuzhuang Lian
I also ran into this problem when running the examples in IDEA. The issue
seems to be that the project depends on too many jars and the classpath has a
length limit. So I imported the assembly jar, put it at the head of the list
of dependencies, and it works.

Thanks,
Qiuzhuang


On Wed, Jun 11, 2014 at 10:39 AM, 申毅杰  wrote:

> Hi All,
>
> I want to run a ScalaTest suite in IDEA directly, but it fails during the
> make phase before the tests run.
> The problems are as follows:
>
>
> /Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala
> Error:(44, 35) type mismatch;
>  found   : org.apache.mesos.protobuf.ByteString
>  required: com.google.protobuf.ByteString
>   .setData(ByteString.copyFrom(data))
>   ^
>
> /Users/yijie/code/apache.spark.master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
> Error:(119, 35) type mismatch;
>  found   : org.apache.mesos.protobuf.ByteString
>  required: com.google.protobuf.ByteString
>   .setData(ByteString.copyFrom(createExecArg()))
>   ^
> Error:(257, 35) type mismatch;
>  found   : org.apache.mesos.protobuf.ByteString
>  required: com.google.protobuf.ByteString
>   .setData(ByteString.copyFrom(task.serializedTask))
>   ^
>
> Before running the tests in IDEA, I built Spark with ’sbt/sbt assembly’,
> imported the projects into IDEA after ’sbt/sbt gen-idea’,
> and I am able to run the tests in a terminal with ’sbt/sbt test’.
>
> Is there anything I left out in order to run/debug a test suite inside IDEA?
>
> Best regards,
> Yijie


Re: Constraint Solver for Spark

2014-06-10 Thread Debasish Das
Hi,

I am a bit confused with the code here:

// Solve the least-squares problem for each user and return the new feature
// vectors
Array.range(0, numUsers).map { index =>
  // Compute the full XtX matrix from the lower-triangular part we got above
  fillFullMatrix(userXtX(index), fullXtX)

  // Add regularization
  var i = 0
  while (i < rank) {
    fullXtX.data(i * rank + i) += lambda
    i += 1
  }

  // Solve the resulting matrix, which is symmetric and positive-definite
  algo match {
    case ALSAlgo.Implicit =>
      Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
    case ALSAlgo.Explicit =>
      Solve.solvePositive(fullXtX, userXy(index)).data
  }
}


On Fri, Jun 6, 2014 at 10:42 AM, Debasish Das 
wrote:

> Hi Xiangrui,
>
> It's not the linear constraint. It is a quadratic inequality with slack, a
> first-order Taylor approximation of the off-diagonal cross terms, and a
> cyclic coordinate descent, which we think will yield orthogonality... It's
> still in the works...
>
> Also we want to put an L1 constraint as a set of linear equations when
> solving for ALS...
>
> I will create the JIRA... as I see it, this will evolve into a generic
> constraint solver for machine learning problems that have a QP
> structure... ALS is one example... another example is kernel SVMs...
>
> I did not know that an lgpl solver can be added to the classpath... if it
> can be, then definitely we should add these in ALS.scala...
>
> Thanks.
> Deb
>
>
>
> On Thu, Jun 5, 2014 at 11:31 PM, Xiangrui Meng  wrote:
>
>> I don't quite understand why putting linear constraints can promote
>> orthogonality. For the interfaces, if the subproblem is determined by
>> Y^T Y and Y^T b for each iteration, then the least squares solver, the
>> non-negative least squares solver, or your convex solver is simply a
>> function
>>
>> (A, b) -> x.
>>
>> You can define it as an interface, and make the solver pluggable by
>> adding a setter to ALS. If you want to use your lgpl solver, just
>> include it in the classpath. Creating two separate files still seems
>> unnecessary to me. Could you create a JIRA and we can move our
>> discussion there? Thanks!
>>
>> Best,
>> Xiangrui
>>
>> On Thu, Jun 5, 2014 at 7:20 PM, Debasish Das 
>> wrote:
>> > Hi Xiangrui,
>> >
>> > For orthogonality properties in the factors we need a constraint solver
>> > other than the usual ones (l1, upper and lower bounds, l2, etc.)
>> >
>> > The interface of a constraint solver is standard and I can add it to
>> > mllib optimization
>> >
>> > But I am not sure how I will call the gpl-licensed ipm solver from
>> > mllib... assume the solver interface is as follows:
>> >
>> > Qpsolver (densematrix h, array [double] f, int linearEquality, int
>> > linearInequality, bool lb, bool ub)
>> >
>> > And then I have functions to update equalities, inequalities, bounds,
>> > etc., followed by the run which generates the solution
>> >
>> > For l1 constraints I have to use an epigraph formulation which needs a
>> > variable transformation before the solve
>> >
>> > I was thinking that for problems that do not need constraints people
>> > will use ALS.scala, and ConstrainedALS.scala will have the constrained
>> > formulations
>> >
>> > I can point you to the code once it is ready, and then you can guide me
>> > on how to refactor it into mllib ALS?
>> >
>> > Thanks.
>> > Deb
>> > Hi Deb,
>> >
>> > Why do you want to make those methods public? If you only need to
>> > replace the solver for subproblems, you can try to make the solver
>> > pluggable. Now it supports least squares and non-negative least
>> > squares. You can define an interface for the subproblem solvers and
>> > maintain the IPM solver at your own code base, if the only information
>> > you need is Y^T Y and Y^T b.
>> >
>> > Btw, just curious, what is the use case for quadratic constraints?
>> >
>> > Best,
>> > Xiangrui
>> >
>> > On Thu, Jun 5, 2014 at 3:38 PM, Debasish Das 
>> > wrote:
>> >> Hi,
>> >>
>> >> We are adding a constrained ALS solver in Spark to solve matrix
>> >> factorization use-cases which need additional constraints (bounds,
>> >> equality, inequality, quadratic constraints)
>> >>
>> >> We are using a native version of a primal dual SOCP solver due to its
>> >> small memory footprint and the sparse ccs matrix computation it uses...
>> >> The solver depends on AMD and LDL packages from Timothy Davis for
>> >> sparse ccs matrix algebra (released under lgpl)...
>> >>
>> >> Due to GPL dependencies, it won't be possible to release the code as
>> >> Apache license for now... If we get good results on our use-cases, we
>> >> will plan to write a version in breeze/modify joptimizer for sparse ccs
>> >> operations...
>> >>
>> >> I derived ConstrainedALS from Spark mllib ALS and I am comparing the
>> >> performance with default ALS and non-negative ALS as baseline. Plan is
>> >> to release the code as GPL license for community review...
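
As a concrete illustration of the pluggable-solver idea Xiangrui describes
above, here is a minimal Scala sketch (all names are hypothetical, not actual
Spark/mllib API; the Gaussian-elimination stand-in omits pivoting, which is
acceptable for the symmetric positive-definite systems ALS produces):

// Hypothetical subproblem-solver interface: maps (A, b) -> x, where
// A = Y^T Y is a rank x rank row-major matrix and b = Y^T r.
trait SubproblemSolver extends Serializable {
  def solve(a: Array[Double], b: Array[Double], rank: Int): Array[Double]
}

// Stand-in default solver: plain Gaussian elimination without pivoting.
object NaiveSolver extends SubproblemSolver {
  def solve(a: Array[Double], b: Array[Double], rank: Int): Array[Double] = {
    val m = a.clone(); val x = b.clone()
    var k = 0
    while (k < rank) {  // forward elimination
      var i = k + 1
      while (i < rank) {
        val f = m(i * rank + k) / m(k * rank + k)
        var j = k
        while (j < rank) { m(i * rank + j) -= f * m(k * rank + j); j += 1 }
        x(i) -= f * x(k)
        i += 1
      }
      k += 1
    }
    var i = rank - 1
    while (i >= 0) {  // back substitution
      var s = x(i)
      var j = i + 1
      while (j < rank) { s -= m(i * rank + j) * x(j); j += 1 }
      x(i) = s / m(i * rank + i)
      i -= 1
    }
    x
  }
}

An IPM or NNLS solver living in a separate (e.g. LGPL-licensed) jar would then
just implement SubproblemSolver and be handed to ALS through a setter, keeping
the GPL/LGPL code out of the Apache-licensed code base.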

Re: Constraint Solver for Spark

2014-06-10 Thread Debasish Das
Sorry, the last one went out by mistake:

For all users (0 to numUsers), isn't fullXtX the same? In the ALS formulation
this is W^TW or H^TH, which should be the same for all users. Why are we
reading userXtX(index) and adding it to fullXtX in the loop over all
numUsers?

// Solve the least-squares problem for each user and return the new feature
// vectors
Array.range(0, numUsers).map { index =>
  // Compute the full XtX matrix from the lower-triangular part we got above
  fillFullMatrix(userXtX(index), fullXtX)

  // Add regularization
  var i = 0
  while (i < rank) {
    fullXtX.data(i * rank + i) += lambda
    i += 1
  }

  // Solve the resulting matrix, which is symmetric and positive-definite
  algo match {
    case ALSAlgo.Implicit =>
      Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
    case ALSAlgo.Explicit =>
      Solve.solvePositive(fullXtX, userXy(index)).data
  }
}


On Tue, Jun 10, 2014 at 8:56 PM, Debasish Das 
wrote:

> Hi,
>
> I am a bit confused with the code here:
>
> // Solve the least-squares problem for each user and return the new
> // feature vectors
> Array.range(0, numUsers).map { index =>
>   // Compute the full XtX matrix from the lower-triangular part we got above
>   fillFullMatrix(userXtX(index), fullXtX)
>
>   // Add regularization
>   var i = 0
>   while (i < rank) {
>     fullXtX.data(i * rank + i) += lambda
>     i += 1
>   }
>
>   // Solve the resulting matrix, which is symmetric and positive-definite
>   algo match {
>     case ALSAlgo.Implicit =>
>       Solve.solvePositive(fullXtX.addi(YtY.get.value), userXy(index)).data
>     case ALSAlgo.Explicit =>
>       Solve.solvePositive(fullXtX, userXy(index)).data
>   }
> }
>
>
> On Fri, Jun 6, 2014 at 10:42 AM, Debasish Das 
> wrote:
>
>> Hi Xiangrui,
>>
>> It's not the linear constraint. It is a quadratic inequality with slack, a
>> first-order Taylor approximation of the off-diagonal cross terms, and a
>> cyclic coordinate descent, which we think will yield orthogonality... It's
>> still in the works...
>>
>> Also we want to put an L1 constraint as a set of linear equations when
>> solving for ALS...
>>
>> I will create the JIRA... as I see it, this will evolve into a generic
>> constraint solver for machine learning problems that have a QP
>> structure... ALS is one example... another example is kernel SVMs...
>>
>> I did not know that an lgpl solver can be added to the classpath... if it
>> can be, then definitely we should add these in ALS.scala...
>>
>> Thanks.
>> Deb
>>
>>
>>
>> On Thu, Jun 5, 2014 at 11:31 PM, Xiangrui Meng  wrote:
>>
>>> I don't quite understand why putting linear constraints can promote
>>> orthogonality. For the interfaces, if the subproblem is determined by
>>> Y^T Y and Y^T b for each iteration, then the least squares solver, the
>>> non-negative least squares solver, or your convex solver is simply a
>>> function
>>>
>>> (A, b) -> x.
>>>
>>> You can define it as an interface, and make the solver pluggable by
>>> adding a setter to ALS. If you want to use your lgpl solver, just
>>> include it in the classpath. Creating two separate files still seems
>>> unnecessary to me. Could you create a JIRA and we can move our
>>> discussion there? Thanks!
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Thu, Jun 5, 2014 at 7:20 PM, Debasish Das 
>>> wrote:
>>> > Hi Xiangrui,
>>> >
>>> > For orthogonality properties in the factors we need a constraint solver
>>> > other than the usual ones (l1, upper and lower bounds, l2, etc.)
>>> >
>>> > The interface of a constraint solver is standard and I can add it to
>>> > mllib optimization
>>> >
>>> > But I am not sure how I will call the gpl-licensed ipm solver from
>>> > mllib... assume the solver interface is as follows:
>>> >
>>> > Qpsolver (densematrix h, array [double] f, int linearEquality, int
>>> > linearInequality, bool lb, bool ub)
>>> >
>>> > And then I have functions to update equalities, inequalities, bounds,
>>> > etc., followed by the run which generates the solution
>>> >
>>> > For l1 constraints I have to use an epigraph formulation which needs a
>>> > variable transformation before the solve
>>> >
>>> > I was thinking that for problems that do not need constraints people
>>> > will use ALS.scala, and ConstrainedALS.scala will have the constrained
>>> > formulations
>>> >
>>> > I can point you to the code once it is ready, and then you can guide me
>>> > on how to refactor it into mllib ALS?
>>> >
>>> > Thanks.
>>> > Deb
>>> > Hi Deb,
>>> >
>>> > Why do you want to make those methods public? If you only need to
>>> > replace the solver for subproblems, you can try to make the solver
>>> > pluggable. Now it supports least squares and non-negative least
>>> > squares. You can define an interface for the subproblem solvers and
>>> > maintain the IPM solver at your own code base, if the only information
>>> > you need is Y^T Y and Y^T b.