hi community,
i have built a Spark and a Flink k-means application.
my test case is clustering 1 million points on a 3-node cluster.
when memory becomes a bottleneck, Flink starts spilling to disk and works
slowly, but it works.
however, Spark loses executors if the memory is full and starts over again
(infinite loo
hello,
i have defined a filter for the termination condition of my k-means.
if i run my app it always computes only one iteration.
i think the problem is here:
DataSet finalCentroids = loop.closeWith(newCentroids,
newCentroids.join(loop).where("*").equalTo("*").filter(new MyFilter()));
or maybe the fi
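for comparison, a minimal sketch of a termination criterion in a bulk
iteration (assuming the Centroid type from the KMeans example, with an id
field, and hypothetical computeNewCentroids() and distance() helpers; the
loop stops as soon as the criterion data set is empty):

IterativeDataSet<Centroid> loop = centroids.iterate(maxIterations);
DataSet<Centroid> newCentroids = computeNewCentroids(points, loop); // hypothetical helper
// keep only the centers that moved more than a threshold; the iteration
// terminates once this data set is empty
DataSet<Centroid> moved = newCentroids.join(loop)
        .where("id").equalTo("id")
        .filter(new FilterFunction<Tuple2<Centroid, Centroid>>() {
            @Override
            public boolean filter(Tuple2<Centroid, Centroid> pair) {
                return distance(pair.f0, pair.f1) > 1e-9; // hypothetical helper
            }
        });
DataSet<Centroid> finalCentroids = loop.closeWith(newCentroids, moved);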
xception e) {
e.printStackTrace();
}
}
2015-07-20 12:58 GMT+02:00 Fabian Hueske :
> Use a broadcast set to distribute the old centers to a map which has the
> new centers as regular input. Put the old centers in a HashMap in open()
> and check the distance to the n
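>
> Roughly, as a sketch (class and field names are assumptions):
>
> DataSet<Centroid> checked = newCentroids
>     .map(new ConvergenceCheck())
>     .withBroadcastSet(loop, "oldCentroids");
>
> public static class ConvergenceCheck extends RichMapFunction<Centroid, Centroid> {
>     private Map<Integer, Centroid> old;
>
>     @Override
>     public void open(Configuration parameters) {
>         // build a lookup of the broadcast old centers, keyed by id
>         old = new HashMap<>();
>         for (Centroid c : getRuntimeContext()
>                 .<Centroid>getBroadcastVariable("oldCentroids")) {
>             old.put(c.id, c);
>         }
>     }
>
>     @Override
>     public Centroid map(Centroid newCenter) {
>         // compare the new center against its old counterpart here,
>         // e.g. with a euclidean distance and a convergence threshold
>         return newCenter;
>     }
> }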
> On Jul 20, 2015 3:21 PM, "Pa Rö" wrote:
>
>> i could not find the "iterateWithTermination" function, only "iterate" and
>> "iterateDelta". i am using flink 0.9.0 with java.
>>
>> 2015-07-20 11:30 GMT+02:00 Sachin Goel :
iteration then would be (next solution, isConverged) where
> isConverged is an empty data set if you wish to terminate.
> However, this is something I have a pull request for:
> https://github.com/apache/flink/pull/918. Take a look.
>
> -- Sachin Goel
> Computer Science, IIT Delhi
hello community,
i have written a k-means app in flink. now i want to change my termination
condition from a max iteration count to checking the change of the cluster
centers, but i don't know how i can break the flink loop. here is my
execution code for flink:
public void run() {
//load properties
hello community,
i want to run my flink app on a cluster (cloudera 5.4.4) with 3 nodes (each node
has an i7 8-core with 16GB RAM). now i want to submit my flink job on yarn (20GB
RAM).
my script to deploy the flink cluster on yarn:
export HADOOP_CONF_DIR=/etc/hadoop/conf/
./flink-0.9.0/bin/yarn-session.sh -n
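for reference, a complete invocation might look like this (the numbers are
assumptions for a 3-node, 16GB setup, not the original values):

export HADOOP_CONF_DIR=/etc/hadoop/conf/
./flink-0.9.0/bin/yarn-session.sh -n 3 -jm 1024 -tm 4096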
ager which is running the Sync task is logging when it's
>> starting the next iteration. I know it's not very convenient.
>> You can also log the time and the iteration id (from the
>> IterationRuntimeContext) in the open() method.
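>>
>> A sketch of that (assuming a rich function that runs inside the
>> iteration; the class name is made up):
>>
>> public static class TimedMapper extends RichMapFunction<Point, Point> {
>>     @Override
>>     public void open(Configuration parameters) {
>>         // called once per superstep; the timestamps of two consecutive
>>         // supersteps give the runtime of one iteration
>>         System.out.println("superstep "
>>                 + getIterationRuntimeContext().getSuperstepNumber()
>>                 + " started at " + System.currentTimeMillis());
>>     }
>>     @Override
>>     public Point map(Point p) { return p; }
>> }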
>>
>> On Fri, Jun 26, 2015 at 9:57 AM, Pa
hello flink community,
i have written a k-means app for clustering temporal geo data. now i want to
know how much time flink needs to compute one iteration. is it possible to
measure that, given flink's execution engine?
best regards,
paul
hi flink community,
currently i am testing my flink app with a benchmark on a hadoop cluster
(flink on yarn).
my results show that flink needs more time for the first round than for all
other rounds. maybe flink caches something in memory? and if i run the
benchmark for 100 rounds my system freezes, i think the me
hello,
i want to benchmark my mapreduce, mahout, spark, and flink k-means on a hadoop
cluster.
i have written a jmh benchmark, but i get an error when running it on the
cluster; locally it works fine.
maybe someone can solve this problem, i have posted it on stackoverflow:
http://stackoverflow.com/questions/30892720/jmh-benchm
the paths inputs and outputs are not correct since you get
> the error message *chown `output’: No such file or directory*. Try to
> provide the full path to the chown command such as
> hdfs://ServerURI/path/to/your/directory.
>
>
> On Mon, Jun 8, 2015 at 11:23 AM Pa Rö
> wrote:
Hi Robert,
i saw that you wrote to me on stackoverflow, thanks. now the path is
right and i get the old exception:
org.apache.flink.runtime.JobException: Creating the input splits caused an
error: File file:/127.0.0.1:8020/home/user/cloudera/outputs/seed-1 does not
exist or the user running Flin
;KMeans Flink");
} catch (Exception e) {
e.printStackTrace();
}
}
maybe i can't use the following for hdfs?
clusteredPoints.writeAsCsv(outputPath+"/points", "\n", " ");
finalCentroids.writeAsText(outputPath
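for what it's worth, a sketch of how these sinks can write to hdfs, assuming
a fully qualified hdfs uri (host, port and directories here are made up):

String outputPath = "hdfs://quickstart.cloudera:8020/user/cloudera/outputs";
clusteredPoints.writeAsCsv(outputPath + "/points", "\n", " ");
finalCentroids.writeAsText(outputPath + "/centers");
env.execute("KMeans Flink");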
ileInputFormat.java:51)
at
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.(ExecutionJobVertex.java:146)
... 23 more
2015-06-04 17:38 GMT+02:00 Pa Rö :
> sorry, i see my yarn session ends before i can run my app; i must set write
> access for yarn, maybe this solves my problem.
>
> 2015-06-04 17:
sorry, i see my yarn session ends before i can run my app; i must set write
access for yarn, maybe this solves my problem.
2015-06-04 17:33 GMT+02:00 Pa Rö :
> i start the yarn-session.sh with sudo
> and then the flink run command with sudo,
> i get the following exception:
>
> cloudera
:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
line 70 of FlinkMain.java is:
env.execute("KMeans Flink");
2015-06-04 17:17 GMT+02:00 Pa Rö :
> i try this:
esent in your current, local directory.
> The bash expansion is not able to expand to the files in HDFS.
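> A variant that should work (the directory is an assumption) passes an
> explicit HDFS path instead of a shell glob:
>
> hadoop fs -chmod -R 777 /user/cloudera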
>
>
> On Thu, Jun 4, 2015 at 5:08 PM, Pa Rö
> wrote:
>
>> [cloudera@quickstart bin]$ sudo su yarn
>> bash-4.1$ hadoop fs -chmod 777
>> -chmod: Not enough
i get the same exception
Using JobManager address from YARN properties quickstart.cloudera/
127.0.0.1:53874
java.io.IOException: Mkdirs failed to create /user/cloudera/outputs
2015-06-04 17:09 GMT+02:00 Pa Rö :
> bash-4.1$ hadoop fs -chmod 777 *
> chmod: `config.sh': No such file o
client.sh': No such file or directory
chmod: `taskmanager.sh': No such file or directory
chmod: `webclient.sh': No such file or directory
chmod: `yarn-session.sh': No such file or directory
2015-06-04 17:08 GMT+02:00 Pa Rö :
> [cloudera@quickstart bin]$ sudo su yarn
> b
" which is running Flink doesn't have
> permission to access the files.
>
> Can you do "sudo su yarn" to become the "yarn" user. Then, you can do
> "hadoop fs -chmod 777" to make the files accessible for everyone.
>
>
> On Thu, Jun 4, 201
04 16:51 GMT+02:00 Robert Metzger :
> Once you've started the YARN session, you can submit a Flink job with
> "./bin/flink run ".
>
> The jar file of your job doesn't need to be in HDFS. It has to be in the
> local file system and flink will send it to all
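> For example (jar name and arguments are made up):
>
> ./bin/flink run ./target/my-kmeans-job.jar \
>   hdfs:///user/cloudera/points hdfs:///user/cloudera/outputs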
okay, now it runs on my hadoop.
how can i start my flink job? and where must the jar file be saved, in hdfs or
as a local file?
2015-06-04 16:31 GMT+02:00 Robert Metzger :
> Yes, you have to run these commands in the command line of the Cloudera VM.
>
> On Thu, Jun 4, 2015 at 4:28 PM, Pa Rö
ase post the exact error message you
> are getting and I can help you to get it to run.
>
>
> On Thu, Jun 4, 2015 at 4:18 PM, Pa Rö
> wrote:
>
>> hi robert,
>>
>> i think the problem is the hue api,
>> i had the same problem with the spark submit script,
>> b
n the Hue user forum /
> mailing list:
> https://groups.google.com/a/cloudera.org/forum/#!forum/hue-user.
>
> On Thu, Jun 4, 2015 at 4:09 PM, Pa Rö
> wrote:
>
>> thanks,
>> now i want to run my app on the cloudera live vm (single node),
>> how can i define my flink job with hue?
>
specify the paths like this: hdfs:///path/to/data.
>
> On Tue, Jun 2, 2015 at 2:48 PM, Pa Rö
> wrote:
>
>> nice,
>>
>> which file system must i use for the cluster? java.io or hadoop.fs or
>> flink?
>>
>> 2015-06-02 14:29 GMT+02:00 Robert Metzger :
> On Tue, Jun 2, 2015 at 2:03 PM, Pa Rö
> wrote:
>
>> hi community,
>>
>> i want to test my flink k-means on a hadoop cluster. i use the cloudera live
>> distribution. how can i run flink on this cluster? maybe only the java
>> dependencies are enough?
>>
>> best regards,
>> paul
>>
>
>
hi community,
i want to test my flink k-means on a hadoop cluster. i use the cloudera live
distribution. how can i run flink on this cluster? maybe only the java
dependencies are enough?
best regards,
paul
hi community,
my k-means works fine now. thanks a lot for your help.
now i want to test something: what is the best way in flink to count
the iterations?
best regards,
paul
like my other
>> implementation?
>>
>> best regards,
>> paul
>>
>>
>>
>> On 22.05.2015 at 16:52, Stephan Ewen wrote:
>>
>> Sorry, I don't understand the question.
>>
>> Can you describe a bit better what you mean with "h
Geo());
list.add(input2.getGeo());
LatLongSeriable geo = Geometry.getGeoCenterOf(list);
return new GeoTimeDataTupel(geo,time,"POINT");
}
how can i sum all points and divide by the counter?
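the usual approach, along the lines of the flink KMeans example linked
below (SelectNearestCenter, CountAppender, CentroidAccumulator and
CentroidAverager are the names used there), is to append a count, sum
coordinates and counts in a reduce, and divide in a final map:

DataSet<Centroid> newCentroids = points
        // assign each point to its nearest center (centers are broadcast)
        .map(new SelectNearestCenter()).withBroadcastSet(loop, "centroids")
        // append a count of 1: (centroidId, point, 1L)
        .map(new CountAppender())
        .groupBy(0)
        // sum the coordinates and the counts per centroid id
        .reduce(new CentroidAccumulator())
        // divide the summed coordinates by the count
        .map(new CentroidAverager());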
2015-05-22 9:53 GMT+02:00 Pa Rö :
> hi,
> if i print the centroids
> Hi!
>>
>> This problem should not depend on any user code. There are no user-code
>> dependent actors in Flink.
>>
>> Is there more stack trace that you can send us? It looks like the core
>> exception that is causing the issue is not part of th
ohrmann :
> Hi Paul,
>
> could you share your code with us so that we see whether there is any
> error.
>
> Does this error also occur with 0.9-SNAPSHOT?
>
> Cheers,
> Till
>
>
> On Thu, May 21, 2015 at 11:11 AM, Pa Rö
> wrote:
>
>> hi flin
hi flink community,
i have implemented k-means for clustering temporal geo data. i use the
following github project and my own data structure:
https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java
now i hav
>>> <dependency>
>>>   <groupId>eu.stratosphere</groupId>
>>>   <artifactId>flink-scala</artifactId>
>>> </dependency>
>>> <dependency>
>>>   <groupId>eu.stratosphere</groupId>
>>>   <artifactId>flink-java</artifactId>
>>> </dependency>
>
> --> yourproject-flink
> --> yourproject-spark
>
>
>
> On Mon, May 18, 2015 at 10:00 AM, Pa Rö
> wrote:
>
>> hi,
>> if i add your dependency i get over 100 errors; now i changed the version
>> number:
>>
>>
hello,
i want to cluster geo data (lat, long, timestamp) with k-means. now i am
searching for a good distance function; i cannot find a good paper or other
sources for that. currently i multiply the time distance and the space
distance:
public static double dis(GeoData input1, GeoData input2)
{
double timeDis = Math
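a common alternative is a weighted sum instead of a product (a product
collapses to zero as soon as one component is zero). a sketch, with assumed
GeoData accessors and a hypothetical haversine() helper:

public static double dis(GeoData input1, GeoData input2) {
    // temporal distance, e.g. in seconds
    double timeDis = Math.abs(input1.getTime() - input2.getTime());
    // spatial distance, e.g. haversine distance in meters
    double spaceDis = haversine(input1.getLat(), input1.getLon(),
                                input2.getLat(), input2.getLon());
    // alpha and beta are tuning weights that balance the two scales
    double alpha = 0.5, beta = 0.5;
    return alpha * timeDis + beta * spaceDis;
}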
ai 2015 15:15:34 MESZ, Ted Yu wrote:
>>>
>>> You can run the following command:
>>> mvn dependency:tree
>>>
>>> And see what jetty versions are brought in.
>>>
>>> Cheers
>>>
>>>
>>>
>>> On M
hi,
i use spark and flink in the same maven project;
now i get an exception when working with spark, while flink works well.
the problem is transitive dependencies.
maybe somebody knows a solution, or versions which work together.
best regards
paul
ps: a flink artifact in the cloudera maven repo would be desirable
my
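one standard fix for such conflicts is a maven exclusion (a sketch; which
artifact actually clashes has to be read off mvn dependency:tree first, the
jetty coordinates below are an assumption):

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>0.9.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
    </exclusion>
  </exclusions>
</dependency>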
it should not happen
> ;-)
>
> Is this error reproducible? If yes, we can probably fix it well...
>
> Greetings,
> Stephan
>
>
> On Wed, May 13, 2015 at 1:16 PM, Pa Rö
> wrote:
>
>> my function code:
>> private static DataSet
>> getPoint
.map(new TuplePointConverter());
}
and i use the GDELT data from here:
http://data.gdeltproject.org/events/index.html
2015-05-13 13:09 GMT+02:00 Pa Rö :
> hi,
>
> i read a csv file from disk with flink (java, maven version 0.8.1) and get
> the following exception:
>
>
mple? The code is for three-dimensional points,
> but you should be able to generalize it easily.
> That would be the fastest way to go, without waiting for any release
> dates...
>
> Stephan
>
>
> On Mon, May 11, 2015 at 2:46 PM, Pa Rö
> wrote:
>
>> hi,
>>
hi,
i read a csv file from disk with flink (java, maven version 0.8.1) and get
the following exception:
ERROR operators.DataSinkTask: Error in user code: Channel received an event
before completing the current partial record.: DataSink(Print to
System.out) (4/4)
java.lang.IllegalStateException: Ch
hi,
now i want to implement k-means with flink;
maybe you know a release date for flink ml k-means?
best regards
paul
2015-04-27 9:36 GMT+02:00 Pa Rö :
> Hi Alexander and Till,
>
> thanks for your information, I look forward to the release.
> I'm curious how well is flink ml a
jira/browse/FLINK-1731
>>
>> Regards,
>> Alexander
>>
>> PS. Bear in mind that we will start with a vanilla implementation of
>> K-Means. For a thorough evaluation you might want to also check variants
>> like K-Means++.
>>
>>
>> 2015-04-
hi flink community,
at the moment I am writing my master thesis in the field of machine learning.
My main task is to evaluate different k-means variants for large data sets
(BigData). I would like to test flink ml against Apache Mahout and Apache
Hadoop MapReduce in the areas of scalability and performance (time a