Re: k-means core function for temporal geo data

2015-05-19 Thread Xiangrui Meng
I'm not sure whether k-means would converge with this customized distance measure. You could instead include (weighted) time as a feature alongside the coordinates and then use the ordinary Euclidean distance. For other supported distance measures, you can check Derrick's package: http://spark-packages.org/package/derrickburn
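Meng's suggestion can be sketched in a few lines: rather than a custom spatio-temporal distance (which breaks k-means' convergence guarantee), fold time into the feature vector with a weight and keep plain Euclidean distance. The helper names and the weight `w` are mine, not from the thread.

```java
// Sketch of the suggestion above: scale time by a weight w and append it to the
// coordinates, so standard Euclidean k-means can be used unchanged.
public class GeoTemporalFeatures {

    // Build a 3-d feature vector: coordinates unchanged, time scaled by weight w.
    public static double[] toFeature(double x, double y, double t, double w) {
        return new double[] { x, y, w * t };
    }

    // Ordinary Euclidean distance, as used by standard k-means.
    public static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

Tuning `w` trades off how much temporal proximity counts against spatial proximity; `w = 0` ignores time entirely.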

Re: Spark and Flink

2015-05-19 Thread Alexander Alexandrov
Sorry, we're using a forked version which changed the groupId.

2015-05-19 15:15 GMT+02:00 Till Rohrmann:
> I guess it's a typo: "eu.stratosphere" should be replaced by "org.apache.flink"
>
> On Tue, May 19, 2015 at 1:13 PM, Alexander Alexandrov <alexander.s.alexand...@gmail.com> wrote:
>> We

Re: Spark and Flink

2015-05-19 Thread Till Rohrmann
I guess it's a typo: "eu.stratosphere" should be replaced by "org.apache.flink"

On Tue, May 19, 2015 at 1:13 PM, Alexander Alexandrov <alexander.s.alexand...@gmail.com> wrote:
> We managed to do this with the following config:
>
> // properties
>
> 2.2.0
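The fix Till describes would look like this in the pom (the artifactId and version here are illustrative, not taken from the thread):

```xml
<dependency>
  <!-- was: <groupId>eu.stratosphere</groupId> (pre-incubation coordinates) -->
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>0.9-SNAPSHOT</version>
</dependency>
```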

Re: Informing the runtime about data already repartitioned using "output contracts"

2015-05-19 Thread Fabian Hueske
Alright, so if both inputs of the CoGroup are read from the file system, there should be a way to do the co-group on co-located data without repartitioning. In fact, I have some code lying around to do co-located joins from local FS [1]. Haven't tested it thoroughly and it also relies on a number o
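The idea behind Fabian's co-located co-group can be illustrated without Flink: when both inputs are already partitioned by the same key function on the file system, each pair of matching partitions can be co-grouped locally, with no network repartitioning. This is a plain-Java sketch of what happens inside one such partition pair, not the Flink API or the code in [1].

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One partition pair of two co-partitioned inputs, co-grouped locally.
// Records are (key, value) pairs encoded as String[2].
public class LocalCoGroup {

    // Group one partition's records by key.
    static Map<String, List<String>> byKey(List<String[]> records) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String[] kv : records) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    // For every key present on either side, report how many values each side holds.
    public static Map<String, int[]> coGroupCounts(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> l = byKey(left);
        Map<String, List<String>> r = byKey(right);
        Set<String> keys = new HashSet<>(l.keySet());
        keys.addAll(r.keySet());
        Map<String, int[]> out = new HashMap<>();
        for (String key : keys) {
            int leftCount = l.containsKey(key) ? l.get(key).size() : 0;
            int rightCount = r.containsKey(key) ? r.get(key).size() : 0;
            out.put(key, new int[] { leftCount, rightCount });
        }
        return out;
    }
}
```

The point is that this per-partition work is all that remains once the runtime knows both inputs share the same partitioning, which is exactly what an "output contract" would tell it.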

Re: Informing the runtime about data already repartitioned using "output contracts"

2015-05-19 Thread Alexander Alexandrov
Thanks for the feedback, Fabian. This is related to the question I sent on the user mailing list yesterday. Mustafa is working on a master thesis where we try to abstract an operator for the update of stateful datasets (decoupled from the current native iterations logic) and use it in conjunction

Re: Spark and Flink

2015-05-19 Thread Alexander Alexandrov
We managed to do this with the following config:

// properties
2.2.0
0.9-SNAPSHOT
1.2.1

// from the dependency management
org.apache.hadoop
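The archive has stripped the XML tags from this mail, so only the version values and one groupId survive. A pom shaped like the surviving fragments might look as follows; every tag and property name below is a guess, not Alexander's actual config:

```xml
<!-- Sketch only: tag and property names are guessed from the stripped mail. -->
<properties>
  <hadoop.version>2.2.0</hadoop.version>
  <flink.version>0.9-SNAPSHOT</flink.version>
  <spark.version>1.2.1</spark.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId> <!-- artifactId is a guess -->
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```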

ClassReader could not be created

2015-05-19 Thread Flavio Pompermaier
Hi to all, I tried to run my job on a brand new Flink cluster (0.9-SNAPSHOT) from the web client using the shading strategy of the quickstart example but I get this exception: Caused by: java.lang.RuntimeException: Could not create ClassReader: java.io.IOException: Class not found at org.apache.f

How can I return all rows in a dataset that include multiple values (example)

2015-05-19 Thread hagersaleh
I want to return all rows whose value appears anywhere in valuesfromsubquery, but this code returns only the rows that include the first value, "BUILDING":

public static ArrayList valuesfromsubquery = new ArrayList();
valuesfromsubquery.add("BUILDING");
valuesfromsubquery.add("MACHINERY");
valuesfromsubquery.add("A
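The original program is truncated, but the likely intent is an SQL-style IN: keep every row whose segment value is contained in `valuesfromsubquery`, instead of comparing against the first element only. A minimal plain-Java sketch (the row layout and names are assumptions, and this stands in for whatever Flink filter the poster is writing):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Keep rows whose value in the given column is any member of `wanted`,
// i.e. WHERE row[column] IN (wanted...) rather than row[column] = wanted.get(0).
public class InListFilter {

    public static List<String[]> keepMatching(List<String[]> rows, List<String> wanted, int column) {
        Set<String> lookup = new HashSet<>(wanted);   // O(1) membership test
        return rows.stream()
                   .filter(row -> lookup.contains(row[column]))
                   .collect(Collectors.toList());
    }
}
```

Inside a Flink filter function the same `Set.contains` test would replace an equality check against the list's first element.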

Re: Spark and Flink

2015-05-19 Thread Pa Rö
That sounds good. Maybe you can send me a pseudo structure; this is my first Maven project. Best regards, Paul

2015-05-18 14:05 GMT+02:00 Robert Metzger:
> Hi,
> I would really recommend you to put your Flink and Spark dependencies into different maven modules.
> Having them both in the same proj
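A pseudo structure for Robert's suggestion could look like the parent pom below: the Flink and Spark dependencies live in separate modules so their transitive dependencies never mix. All module and artifact names here are made up for illustration:

```xml
<!-- Parent pom sketch: engine-specific dependencies isolated per module. -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>geo-clustering-parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <module>geo-clustering-common</module> <!-- shared code, no engine deps -->
    <module>geo-clustering-flink</module>  <!-- org.apache.flink deps only -->
    <module>geo-clustering-spark</module>  <!-- org.apache.spark deps only -->
  </modules>
</project>
```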

Re: Package multiple jobs in a single jar

2015-05-19 Thread Flavio Pompermaier
Nice feature, Matthias! My suggestion is to create a specific Flink interface that also provides a description of a job and standardizes parameter passing. Then, somewhere (e.g. in the Manifest) you could specify the list of packages (or directly the classes) to inspect with reflection to extract the list of av
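The interface Flavio proposes might be sketched as follows. Everything here is hypothetical, not a Flink API: a job describes itself and takes parameters uniformly, and a driver lists the available jobs from a registry (which, in a real jar, could be populated via reflection over classes named in the Manifest).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical job-catalog sketch: each packaged job self-describes and
// accepts parameters through a uniform map.
public class JobCatalog {

    public interface DescribedJob {
        String name();
        String description();
        void run(Map<String, String> parameters);
    }

    // Example job implementing the interface.
    public static class WordCountJob implements DescribedJob {
        public String name() { return "word-count"; }
        public String description() { return "Counts words in the input file."; }
        public void run(Map<String, String> parameters) { /* job logic here */ }
    }

    // Populated by hand here; a Manifest-driven reflection scan could fill it
    // instead, as suggested above.
    public static final Map<String, Supplier<DescribedJob>> REGISTRY = new LinkedHashMap<>();
    static {
        REGISTRY.put("word-count", WordCountJob::new);
    }
}
```

A web client could then render `name()` and `description()` for every registry entry instead of asking the user to know the main class.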