spark vs flink low memory available

2015-08-10 Thread Pa
hi community, I have built a Spark and a Flink k-means application. My test case is clustering 1 million points on a 3-node cluster. When memory becomes a bottleneck, Flink starts spilling to disk and runs slowly, but it works. Spark, however, loses executors when memory is full and restarts them (infinite loo

filter as termination condition

2015-07-21 Thread Pa
hello, I have defined a filter as the termination condition for k-means. If I run my app, it always computes only one iteration. I think the problem is here: DataSet finalCentroids = loop.closeWith(newCentroids, newCentroids.join(loop).where("*").equalTo("*").filter(new MyFilter())); or maybe the fi

Re: loop break operation

2015-07-20 Thread Pa
xception e) { e.printStackTrace(); } } 2015-07-20 12:58 GMT+02:00 Fabian Hueske : > Use a broadcast set to distribute the old centers to a map which has the > new centers as regular input. Put the old centers in a HashMap in open() > and check the distance to the n

Re: loop break operation

2015-07-20 Thread Pa
. +91-9871457685 > On Jul 20, 2015 3:21 PM, "Pa Rö" wrote: > >> I did not find the "iterateWithTermination" function, only "iterate" and >> "iterateDelta". I use Flink 0.9.0 with Java. >> >> 2015-07-20 11:30 GMT+02:00 Sachin Goel : &

Re: loop break operation

2015-07-20 Thread Pa
iteration then would be (next solution, isConverged) where > isConverged is an empty data set if you wish to terminate. > However, this is something I have a pull request for: > https://github.com/apache/flink/pull/918. Take a look. > > -- Sachin Goel > Computer Science, IIT Delhi &g

loop break operation

2015-07-20 Thread Pa
hello community, I have written a k-means app in Flink. Now I want to change my termination condition from a maximum iteration count to checking whether the cluster centers still change, but I don't know how I can break the Flink loop. Here is my Flink execution code: public void run() { //load properties
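The convergence test such a termination condition needs can be sketched in plain Java, independent of the Flink loop construct. This is a minimal sketch with illustrative names (in Flink 0.9 the check would typically live inside the filter or join on the old and new centers): it returns true once no center has moved more than eps.

```java
public class ConvergenceCheck {

    // Returns true when every center moved less than eps between two iterations.
    public static boolean hasConverged(double[][] oldCenters, double[][] newCenters, double eps) {
        for (int i = 0; i < oldCenters.length; i++) {
            double dist = 0.0;
            for (int d = 0; d < oldCenters[i].length; d++) {
                double diff = oldCenters[i][d] - newCenters[i][d];
                dist += diff * diff;
            }
            if (Math.sqrt(dist) >= eps) {
                return false; // at least one center still moves
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] old = {{0.0, 0.0}, {1.0, 1.0}};
        double[][] moved = {{0.5, 0.0}, {1.0, 1.0}};
        System.out.println(hasConverged(old, old, 1e-6));   // true
        System.out.println(hasConverged(old, moved, 1e-6)); // false
    }
}
```

Wired into an iteration, the loop would stop as soon as hasConverged returns true instead of running to the fixed maximum.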

flink on yarn configuration

2015-07-14 Thread Pa
hello community, I want to run my Flink app on a cluster (Cloudera 5.4.4) with 3 nodes (one PC has an i7 with 8 cores and 16GB RAM). Now I want to submit my Flink job on YARN (20GB RAM). My script to deploy the Flink cluster on YARN: export HADOOP_CONF_DIR=/etc/hadoop/conf/ ./flink-0.9.0/bin/yarn-session.sh -n

Re: time measured for each iteration in KMeans

2015-07-01 Thread Pa
ager which is running the Sync task is logging when its >> starting the next iteration. I know its not very convenient. >> You can also log the time and Iteration id (from the >> IterationRuntimeContext) in the open() method. >> >> On Fri, Jun 26, 2015 at 9:57 AM, Pa

Re: time measured for each iteration in KMeans

2015-06-26 Thread Pa
ging when its > starting the next iteration. I know its not very convenient. > You can also log the time and Iteration id (from the > IterationRuntimeContext) in the open() method. > > On Fri, Jun 26, 2015 at 9:57 AM, Pa Rö > wrote: > >> hello flink community, >> &g

time measured for each iteration in KMeans

2015-06-26 Thread Pa
hello flink community, I have written a k-means app for clustering temporal geo data. Now I want to know how much time Flink needs to compute one iteration. Is it possible to measure that, given Flink's execution engine? best regards, paul
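As the replies in this thread suggest, absent built-in per-iteration metrics one can log wall-clock time from the open() method of a rich function, using the superstep number from the IterationRuntimeContext. The timing pattern itself, stripped of all Flink types (class and method names here are illustrative, not a Flink API):

```java
public class IterationTimer {
    private long lastStart = -1;

    // Call at the start of each superstep, e.g. from open() of a rich function.
    // Returns the elapsed millis of the previous superstep, or -1 on the first call.
    public long startSuperstep(int superstepNumber) {
        long now = System.currentTimeMillis();
        long elapsed = (lastStart < 0) ? -1 : now - lastStart;
        lastStart = now;
        if (elapsed >= 0) {
            System.out.println("superstep " + (superstepNumber - 1) + " took " + elapsed + " ms");
        }
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        IterationTimer timer = new IterationTimer();
        for (int step = 1; step <= 3; step++) {
            timer.startSuperstep(step);
            Thread.sleep(50); // stand-in for the work of one k-means iteration
        }
    }
}
```

Since open() runs per parallel task, each task manager would log its own times; the maximum across tasks approximates the superstep duration.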

memory flush on cluster

2015-06-23 Thread Pa
hi flink community, at the moment I am testing my Flink app with a benchmark on a Hadoop cluster (Flink on YARN). My results show that Flink needs more time for the first round than for all other rounds. Maybe Flink caches something in memory? And if I run the benchmark for 100 rounds, my system freezes; I think the me

benchmark my application on hadoop cluster

2015-06-18 Thread Pa
hello, I want to benchmark my MapReduce, Mahout, Spark, and Flink k-means on a Hadoop cluster. I have written a JMH benchmark, but I get an error when running it on the cluster; locally it works fine. Maybe someone can solve this problem; I have posted it on Stack Overflow: http://stackoverflow.com/questions/30892720/jmh-benchm

Re: flink k-means on hadoop cluster

2015-06-08 Thread Pa
the path inputs and outputs is not correct since you get > the error message *chown `output’: No such file or directory*. Try to > provide the full path to the chown command such as > hdfs://ServerURI/path/to/your/directory. > ​ > > On Mon, Jun 8, 2015 at 11:23 AM Pa Rö > wrote:

Re: flink k-means on hadoop cluster

2015-06-08 Thread Pa
Hi Robert, I saw that you answered me on Stack Overflow, thanks. Now the path is right and I get the old exception: org.apache.flink.runtime.JobException: Creating the input splits caused an error: File file:/127.0.0.1:8020/home/user/cloudera/outputs/seed-1 does not exist or the user running Flin

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
;KMeans Flink"); } catch (Exception e) { e.printStackTrace(); } } maybe I can't use the following for HDFS? clusteredPoints.writeAsCsv(outputPath+"/points", "\n", " "); finalCentroids.writeAsText(outputPath

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
ileInputFormat.java:51) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(ExecutionJobVertex.java:146) ... 23 more 2015-06-04 17:38 GMT+02:00 Pa Rö : > sorry, i see my yarn end before i can run my app, i must set the write > access for yarn, maybe this solve my problem. > > 2015-06-04 17:

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
sorry, I see my YARN session ends before I can run my app. I must set write access for YARN; maybe this solves my problem. 2015-06-04 17:33 GMT+02:00 Pa Rö : > I start the yarn-session.sh with sudo > and then the flink run command with sudo, > I get the following exception: > > cloudera

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) the FlinkMain.java: 70 is: env.execute("KMeans Flink"); 2015-06-04 17:17 GMT+02:00 Pa Rö : > i try this:

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
esent in your current, local directory. > The bash expansion is not able to expand to the files in HDFS. > > > On Thu, Jun 4, 2015 at 5:08 PM, Pa Rö > wrote: > >> [cloudera@quickstart bin]$ sudo su yarn >> bash-4.1$ hadoop fs -chmod 777 >> -chmod: Not enough

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
I get the same exception: Using JobManager address from YARN properties quickstart.cloudera/127.0.0.1:53874 java.io.IOException: Mkdirs failed to create /user/cloudera/outputs 2015-06-04 17:09 GMT+02:00 Pa Rö : > bash-4.1$ hadoop fs -chmod 777 * > chmod: `config.sh': No such file o

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
client.sh': No such file or directory chmod: `taskmanager.sh': No such file or directory chmod: `webclient.sh': No such file or directory chmod: `yarn-session.sh': No such file or directory 2015-06-04 17:08 GMT+02:00 Pa Rö : > [cloudera@quickstart bin]$ sudo su yarn > b

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
uot; which is running Flink doesn't have > permission to access the files. > > Can you do "sudo su yarn" to become the "yarn" user. Then, you can do > "hadoop fs -chmod 777" to make the files accessible for everyone. > > > On Thu, Jun 4, 201

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
04 16:51 GMT+02:00 Robert Metzger : > Once you've started the YARN session, you can submit a Flink job with > "./bin/flink run ". > > The jar file of your job doesn't need to be in HDFS. It has to be in the > local file system and flink will send it to all

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
okay, now it runs on my Hadoop. How can I start my Flink job? And where must the jar file be saved, in HDFS or as a local file? 2015-06-04 16:31 GMT+02:00 Robert Metzger : > Yes, you have to run these commands in the command line of the Cloudera VM. > > On Thu, Jun 4, 2015 at 4:28 PM, Pa Rö

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
ase post the exact error message you > are getting and I can help you to get it to run. > > > On Thu, Jun 4, 2015 at 4:18 PM, Pa Rö > wrote: > >> hi robert, >> >> I think the problem is the Hue API; >> I had the same problem with the Spark submit script, >> b

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
n the Hue user forum / > mailing list: > https://groups.google.com/a/cloudera.org/forum/#!forum/hue-user. > > On Thu, Jun 4, 2015 at 4:09 PM, Pa Rö > wrote: > >> thanks, >> now I want to run my app on the Cloudera Live VM (single node); >> how can I define my Flink job with Hue? >

Re: flink k-means on hadoop cluster

2015-06-04 Thread Pa
specify the paths like this: hdfs:///path/to/data. > > On Tue, Jun 2, 2015 at 2:48 PM, Pa Rö > wrote: > >> nice, >> >> which file system must I use for the cluster? java.io or hadoop.fs or >> Flink? >> >> 2015-06-02 14:29 GMT+02:00 Robert Metzger :

Re: flink k-means on hadoop cluster

2015-06-02 Thread Pa
4096 > > > > > > On Tue, Jun 2, 2015 at 2:03 PM, Pa Rö > wrote: > >> hi community, >> >> I want to test my Flink k-means on a Hadoop cluster. I use the Cloudera Live >> distribution. How can I run Flink on this cluster? Maybe only the Java >> dependencies are enough? >> >> best regards, >> paul >> > >

flink k-means on hadoop cluster

2015-06-02 Thread Pa
hi community, I want to test my Flink k-means on a Hadoop cluster. I use the Cloudera Live distribution. How can I run Flink on this cluster? Maybe only the Java dependencies are enough? best regards, paul

count the k-means iteration

2015-05-26 Thread Pa
hi community, my k-means works fine now, thanks a lot for your help. Now I want to test something: what is the best way in Flink to count the iterations? best regards, paul

Re: k means - waiting for dataset

2015-05-26 Thread Pa
like my other >> implementation? >> >> best regards, >> paul >> >> >> >> Am 22.05.2015 um 16:52 schrieb Stephan Ewen: >> >> Sorry, I don't understand the question. >> >> Can you describe a bit better what you mean with "h

Re: k means - waiting for dataset

2015-05-22 Thread Pa
Geo()); list.add(input2.getGeo()); LatLongSeriable geo = Geometry.getGeoCenterOf(list); return new GeoTimeDataTupel(geo,time,"POINT"); } how can I sum all points and share them through the counter? 2015-05-22 9:53 GMT+02:00 Pa Rö : > hi, > if I print the centroids

Re: k means - waiting for dataset

2015-05-22 Thread Pa
> Hi! >> >> This problem should not depend on any user code. There are no user-code >> dependent actors in Flink. >> >> Is there more stack trace that you can send us? It looks like it misses >> the core exception that is causing the issue is not part of th

Re: k means - waiting for dataset

2015-05-21 Thread Pa
ohrmann : > Hi Paul, > > could you share your code with us so that we see whether there is any > error. > > Does this error also occurs with 0.9-SNAPSHOT? > > Cheers, > Till > > Che > > On Thu, May 21, 2015 at 11:11 AM, Pa Rö > wrote: > >> hi flin

k means - waiting for dataset

2015-05-21 Thread Pa
hi flink community, I have implemented k-means for clustering temporal geo data. I use the following GitHub project and my own data structure: https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java now I hav

Re: Spark and Flink

2015-05-21 Thread Pa
>>> <dependency> >>> <groupId>eu.stratosphere</groupId> >>> <artifactId>flink-scala</artifactId> >>> </dependency> >>> <dependency> >>> <groupId>eu.stratosphere</groupId> >>> <artifactId>flink-java</artifactId> >>> </dependency> >

Re: Spark and Flink

2015-05-19 Thread Pa
nk > --> yourproject-spark > > > > On Mon, May 18, 2015 at 10:00 AM, Pa Rö > wrote: > >> hi, >> if i add your dependency i get over 100 errors, now i change the version >> number: >> >>

k-means core function for temporal geo data

2015-05-18 Thread Pa
hello, I want to cluster geo data (lat, long, timestamp) with k-means. Now I am searching for a good core function; I cannot find good papers or other sources for that. At the moment I multiply the time distance and the space distance: public static double dis(GeoData input1, GeoData input2) { double timeDis = Math
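One alternative to multiplying the two distances (which collapses to zero whenever either component is zero) is a weighted sum of a normalized great-circle distance and a normalized time gap. The sketch below is illustrative only; the class, parameter names, and scale constants are assumptions, not the author's actual GeoData type:

```java
public class SpatioTemporalDistance {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two (lat, lon) points in km (haversine formula).
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Weighted sum of normalized space and time distances. spaceWeight is in [0, 1];
    // the two scale constants must be tuned to the data set so both terms are comparable.
    public static double distance(double lat1, double lon1, long t1,
                                  double lat2, double lon2, long t2,
                                  double spaceScaleKm, double timeScaleMs, double spaceWeight) {
        double space = haversineKm(lat1, lon1, lat2, lon2) / spaceScaleKm;
        double time = Math.abs(t1 - t2) / timeScaleMs;
        return spaceWeight * space + (1.0 - spaceWeight) * time;
    }

    public static void main(String[] args) {
        // Berlin to Paris, roughly 878 km apart
        System.out.println(haversineKm(52.52, 13.405, 48.8566, 2.3522));
    }
}
```

The weight makes the space/time trade-off explicit and tunable, which a plain product does not.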

Fwd: Spark and Flink

2015-05-18 Thread Pa
ai 2015 15:15:34 MESZ, schrieb Ted Yu : >>> >>> You can run the following command: >>> mvn dependency:tree >>> >>> And see what jetty versions are brought in. >>> >>> Cheers >>> >>> >>> >>> On M

Spark and Flink

2015-05-13 Thread Pa
hi, I use Spark and Flink in the same Maven project. Now I get an exception when working with Spark; Flink works fine. The problem is transitive dependencies. Maybe somebody knows a solution, or versions which work together. best regards, paul PS: a Cloudera Maven repo for Flink would be desirable. my
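The usual Maven fix for such transitive clashes is to run `mvn dependency:tree` (as suggested later in this thread), find the artifact pulled in twice with different versions, and exclude it from one side. A hedged sketch, assuming a Jetty clash between the Spark and Flink dependencies; the group/artifact ids and version shown are illustrative, not the thread's confirmed culprit:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.1</version>
  <exclusions>
    <!-- keep the Jetty version that Flink brings in -->
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

After the exclusion, rerun `mvn dependency:tree` to verify only one version of the conflicting artifact remains on the classpath.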

Re: Channel received an event before completing the current partial record

2015-05-13 Thread Pa
it should not happen > ;-) > > Is this error reproducable? If yes, we can probably fix it well... > > Greetings, > Stephan > > > On Wed, May 13, 2015 at 1:16 PM, Pa Rö > wrote: > >> my function code: >> private static DataSet >> getPoint

Re: Channel received an event before completing the current partial record

2015-05-13 Thread Pa
.map(new TuplePointConverter()); } and I use the GDELT data from here: http://data.gdeltproject.org/events/index.html 2015-05-13 13:09 GMT+02:00 Pa Rö : > hi, > > I read a CSV file from disk with Flink (Java, Maven version 8.1) and get > the following exception: >

Re: flink ml - k-means

2015-05-13 Thread Pa
mple? The code is for three-dimensional points, > but you should be able to generalize it easily. > That would be the fastest way to go. without waiting for any release > dates... > > Stephan > > > On Mon, May 11, 2015 at 2:46 PM, Pa Rö > wrote: > >> hi, >>

Channel received an event before completing the current partial record

2015-05-13 Thread Pa
hi, I read a CSV file from disk with Flink (Java, Maven version 8.1) and get the following exception: ERROR operators.DataSinkTask: Error in user code: Channel received an event before completing the current partial record.: DataSink(Print to System.out) (4/4) java.lang.IllegalStateException: Ch

Re: flink ml - k-means

2015-05-11 Thread Pa
hi, now I want to implement k-means with Flink. Maybe you know a release date for Flink ML k-means? best regards paul 2015-04-27 9:36 GMT+02:00 Pa Rö : > Hi Alexander and Till, > > thanks for your information, I look forward to the release. > I'm curious how well flink ml a

flink ml k means relase

2015-05-11 Thread Pa
hi, now I want to implement k-means with Flink. Maybe you know a release date for Flink ML k-means? best regards paul

Re: flink ml - k-means

2015-04-27 Thread Pa
jira/browse/FLINK-1731 >> >> Regards, >> Alexander >> >> PS. Bear in mind that we will start with a vanilla implementation of >> K-Means. For a thorough evaluation you might want to also check variants >> like K-Means++. >> >> >> 2015-04-

flink ml - k-means

2015-04-24 Thread Pa
hi flink community, at the moment I am writing my master's thesis in the field of machine learning. My main task is to evaluate different k-means variants for large data sets (Big Data). I would like to test Flink ML against Apache Mahout and Apache Hadoop MapReduce in areas of scalability and performance (time a
