Hi,
I have noticed strange behavior of Spark core in combination with
MLlib. Running my pipeline results in an RDD.
Calling count() on this RDD returns 160055.
Calling count() again directly afterwards returns 160044, and so on.
The RDD seems to be unstable.
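This is only a guess, since the pipeline isn't shown, but one common cause of a changing count() is nondeterminism in the lineage: an uncached RDD re-executes its transformations on every action, so anything involving randomness (e.g. sampling) can produce a different result each time. A plain-Scala sketch of the effect (no Spark dependency; all names here are mine):

```scala
import scala.util.Random

object UnstableCountSketch {
  def main(args: Array[String]): Unit = {
    val rng = new Random()

    // A lazily re-evaluated "lineage" containing randomness, analogous to
    // an uncached RDD whose transformations run again on every count().
    def lineage = (1 to 100000).iterator.filter(_ => rng.nextDouble() < 0.5)

    val first  = lineage.size // one full execution of the pipeline
    val second = lineage.size // a fresh execution: the count can differ

    // Materializing once (analogous to rdd.cache() followed by count())
    // pins the result to a single execution.
    val cached = lineage.toVector
    println(s"first=$first second=$second cached=${cached.size}")
  }
}
```

If the counts stabilize after a cache(), that would support this explanation.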
How can that be? Do you maybe have
build life-cycle.
Thanks,
Niklas
On 29.10.2014 19:01, Patrick Wendell wrote:
> One thing is you need to do a "maven package" before you run tests.
> The "local-cluster" tests depend on Spark already being packaged.
>
> - Patrick
>
> On Wed, Oct 29, 2014 at 1
. I tried some different configurations like
[1,1,512], [2,1,1024] etc. but couldn't get the tests to run without a failure.
Could this be a configuration issue?
On 28.10.2014 19:03, Sean Owen wrote:
> On Tue, Oct 28, 2014 at 6:18 PM, Niklas Wilcke
> <1wil...@informatik.uni-hamburg.de>
Hi,
I want to contribute to the MLlib library, but I can't get the tests
working. I've found three ways of running the tests on the command line.
I just want to execute the MLlib tests.
1. via the dev/run-tests script
This script executes all tests and takes several hours to finish.
Some tests fa
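For reference, two narrower invocations commonly suggested for running only the MLlib tests. These are sketches: the exact module and suite names are assumptions based on the Spark build layout of that era, and the Maven route needs a prior package step, as mentioned elsewhere in this thread.

```shell
# Run only the mllib module's tests with sbt (module name assumed):
sbt/sbt "mllib/test"

# Or with Maven, after packaging Spark first:
mvn package -DskipTests
mvn -pl mllib test

# A single suite can be selected via ScalaTest's wildcardSuites property
# (the suite name here is just an example):
mvn -pl mllib -DwildcardSuites=org.apache.spark.mllib.clustering.KMeansSuite test
```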
Hi Egor Pahomov,
thanks for your suggestions. I think I will go with the dirty workaround,
because I don't want to maintain my own version of Spark for now. Maybe
I will do so later, when I feel ready to contribute to the project.
Kind Regards,
Niklas Wilcke
On 25.09.2014 16:27, Egor Pahomov wrote
LabeledPoint[T](label: T, features: Vector)
In my opinion, making LabeledPoint abstract is necessary, and introducing
a generic label would be nice to have.
Just to clarify my priorities.
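For concreteness, a minimal sketch of what such a generic label could look like. This is hypothetical API, not MLlib's actual class (whose label is a Double), and Vector is replaced by Array[Double] so the snippet is self-contained:

```scala
// Hypothetical generic LabeledPoint; MLlib's real class is
// case class LabeledPoint(label: Double, features: Vector).
case class LabeledPoint[T](label: T, features: Array[Double])

object GenericLabelDemo {
  def main(args: Array[String]): Unit = {
    // A string-valued label, e.g. a class name in duplicate detection.
    val p = LabeledPoint[String]("duplicate", Array(0.1, 0.9))
    assert(p.label == "duplicate")
  }
}
```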
Kind Regards,
Niklas Wilcke
On 25.09.2014 16:02, Yu Ishikawa wrote:
> Hi Niklas Wilcke,
>
> As you
Hi Spark developers,
I am trying to implement a framework with Spark and MLlib for duplicate
detection. I'm not familiar with Spark and Scala, so please be patient
with me. In order to enrich the LabeledPoint class with some information,
I tried to extend it and add some properties.
But the ML algorit
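One likely reason the extension fails (an assumption, since the error message is cut off above) is that MLlib's LabeledPoint is a case class, and case-to-case inheritance is prohibited in Scala, with other subclassing of case classes discouraged. The usual workaround is composition. A sketch, with hypothetical names and Array[Double] standing in for Vector so it compiles on its own:

```scala
// Stand-in for org.apache.spark.mllib.regression.LabeledPoint,
// simplified so the sketch is self-contained.
case class LabeledPoint(label: Double, features: Array[Double])

// Wrap instead of extend: keep the extra information alongside the point.
case class EnrichedPoint(recordId: String, point: LabeledPoint)

object CompositionDemo {
  def main(args: Array[String]): Unit = {
    val e = EnrichedPoint("row-42", LabeledPoint(1.0, Array(0.5, 0.5)))
    // Pass e.point to the MLlib algorithms; e.recordId stays available
    // for the duplicate-detection bookkeeping.
    assert(e.point.label == 1.0)
  }
}
```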