Re: how to convert DataStream to DataSet

2016-03-15 Thread Balaji Rajagopalan
There was a similar question before; the answer was to use org.apache.flink.api.common.io.OutputFormat to do the conversion. On Tue, Mar 15, 2016 at 7:48 PM, subash basnet wrote: > Hello all, > > In WikipediaAnalysis.java we get *result* of type *DataStream<Tuple2<String, Long>>*, > > I would want to convert *r

Re: Flink streaming throughput

2016-03-15 Thread おぎばやしひろのり
Milinda, Thanks. I will try. Regards, Hironori 2016/03/16 1:31 "Milinda Pathirage" : > Hi Hironori, > > [1] and [2] describe the process of measuring Kafka performance. I think > the perf test code is under the org.apache.kafka.tools package in 0.9, so you > may have to change commands in [2] to re

Re: Integration Alluxio and Flink

2016-03-15 Thread Andrea Sella
Hi Robert, I forgot to mention that I built a fat jar of the job using `sbt assembly`, so my job includes alluxio-core-client. 2016-03-15 17:45 GMT+01:00 Robert Metzger : > Hi Andrea, > > the filesystem class cannot be in the job jar. You have to put it into > the lib folder. > > On Tue, Ma

Re: Integration Alluxio and Flink

2016-03-15 Thread Robert Metzger
Hi Andrea, the filesystem class cannot be in the job jar. You have to put it into the lib folder. On Tue, Mar 15, 2016 at 5:40 PM, Andrea Sella wrote: > Hi Till, > > I added the jar as a dependency of my job in build.sbt. Do I need to do > something else? > > val flinkDependencies = Seq( > "org.ap

Re: JobManager Dashboard and YARN execution

2016-03-15 Thread Andrea Sella
Thanks Till for the update. BR, Andrea 2016-03-15 17:16 GMT+01:00 Till Rohrmann : > Hi Andrea, > > there is also a PR [1] which will allow you to access the TaskManager logs > via the UI. > > [1] https://github.com/apache/flink/pull/1790 > > Cheers, > Till > > On Wed, Mar 9, 2016 at 1:58 PM, Ste

Re: Integration Alluxio and Flink

2016-03-15 Thread Andrea Sella
Hi Till, I added the jar as a dependency of my job in build.sbt. Do I need to do something else? val flinkDependencies = Seq( "org.apache.flink" %% "flink-scala" % flinkVersion % "provided", "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided", ("org.alluxio" % "alluxio-core-

Re: Accumulators checkpointed?

2016-03-15 Thread Fabian Hueske
Hi Zach, at the moment, accumulators are not checkpointed and are reset if a failed task is restarted. Best, Fabian 2016-03-15 17:27 GMT+01:00 Zach Cox : > Are accumulators stored in checkpoint state? If a job fails and restarts, > are all accumulator values lost, or are they restored from check

Javadoc version

2016-03-15 Thread Ken Krugler
Perusing the docs, and noticed this... https://ci.apache.org/projects/flink/flink-docs-release-1.0/api/java/ says "flink 1.0-SNAPSHOT API" I assume this shouldn't be called the snapshot version. -- Ken -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com custo

Re: Flink streaming throughput

2016-03-15 Thread Milinda Pathirage
Hi Hironori, [1] and [2] describe the process of measuring Kafka performance. I think the perf test code is under the org.apache.kafka.tools package in 0.9, so you may have to change commands in [2] to reflect that. Thanks Milinda [1] https://engineering.linkedin.com/kafka/benchmarking-apache-kafka

Accumulators checkpointed?

2016-03-15 Thread Zach Cox
Are accumulators stored in checkpoint state? If a job fails and restarts, are all accumulator values lost, or are they restored from checkpointed state? Thanks, Zach

Re: JobManager Dashboard and YARN execution

2016-03-15 Thread Till Rohrmann
Hi Andrea, there is also a PR [1] which will allow you to access the TaskManager logs via the UI. [1] https://github.com/apache/flink/pull/1790 Cheers, Till On Wed, Mar 9, 2016 at 1:58 PM, Stephan Ewen wrote: > Hi! > > Yes, the dashboard is available in both cases. It is proxied through the >

Re: Integration Alluxio and Flink

2016-03-15 Thread Till Rohrmann
Hi Andrea, can it be that the alluxio.hadoop.FileSystem is not in your classpath? Have you put the respective jar file in Flink’s lib folder? Cheers, Till On Tue, Mar 15, 2016 at 12:55 PM, Andrea Sella wrote: > Hi Till, > > I've tried your suggestion (source-code >
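A quick way to answer Till's classpath question is to ask the JVM directly. The following is a minimal sketch, assuming a hypothetical helper class named ClasspathCheck (it is not part of Flink or Alluxio). It only reports on the JVM it runs in, so it would have to run with the same classpath as the Flink processes (that is, with the jars from Flink's lib folder) to be meaningful.

```java
// Hypothetical diagnostic helper; the class and method names are illustrative only.
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its dependencies could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in this thread; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "alluxio.hadoop.FileSystem";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

If this prints false when run with the contents of the lib folder on the classpath, the Alluxio client jar is most likely missing there.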

Re: Flink job on secure Yarn fails after many hours

2016-03-15 Thread Maximilian Michels
Hi Thomas, Nils (CC) and I found out that you need at least Hadoop version 2.6.1 to properly run Kerberos applications on Hadoop clusters. Versions before that have critical bugs related to the internal security token handling that may expire the token although it is still valid. That said, there

Re: Flink streaming throughput

2016-03-15 Thread おぎばやしひろのり
Robert, Thank you for your response. I would like to try kafka-console-consumer but I have no idea how to measure the consuming throughput. Is there a standard way? I would also try the Kafka broker on physical servers. Regarding the version, I have upgraded to Flink 1.0.0 and replaced FlinkKaf
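On the question of how to measure consuming throughput: independent of any particular tool, it comes down to counting records and dividing by elapsed wall-clock time. The sketch below is a generic illustration with a hypothetical ThroughputMeter class; it does not use Kafka's own perf tools mentioned elsewhere in this thread, and the loop in main is only a stand-in for a real consumer loop.

```java
// Hypothetical helper for illustration; not a Kafka or Flink API.
final class ThroughputMeter {

    /** Converts a record count and an elapsed time in nanoseconds into records per second. */
    static double recordsPerSecond(long recordCount, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return recordCount * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long count = 0;
        // Stand-in for a consumer loop: increment the counter once per consumed record.
        for (int i = 0; i < 1_000_000; i++) {
            count++;
        }
        // Guard against a zero reading on very fast machines or coarse clocks.
        long elapsed = Math.max(1, System.nanoTime() - start);
        System.out.printf("%d records in %d ns: %.1f records/s%n",
                count, elapsed, recordsPerSecond(count, elapsed));
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is monotonic and intended for measuring elapsed time.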

how to convert DataStream to DataSet

2016-03-15 Thread subash basnet
Hello all, In WikipediaAnalysis.java we get *result *of type *DataStream>*, I would want to convert *result* to *newResult* of type *DataSet>*, So tried the following: DataSet>newResult = result.map(new getResult()); public static final class getResult implements MapFunction, Tuple1> { @Overri

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-15 Thread Robert Metzger
Great to hear! On Tue, Mar 15, 2016 at 1:14 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > Robert, > I got it working for 1.0.0. > > balaji > > On Mon, Mar 14, 2016 at 4:41 PM, Balaji Rajagopalan < > balaji.rajagopa...@olacabs.com> wrote: > >> Yep the same issue as before (cla

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-15 Thread Balaji Rajagopalan
Robert, I got it working for 1.0.0. balaji On Mon, Mar 14, 2016 at 4:41 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > Yep the same issue as before (class not found) with flink 0.10.2 with > scala version 2.11. I was not able to use scala 2.10 since connector for > flink_con

Re: Integration Alluxio and Flink

2016-03-15 Thread Andrea Sella
Hi Till, I've tried your suggestion (source code) and now it throws: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found. The core-site.xml has been set correctly and into the alluxio-wordcount jar

RE: TimeWindow not getting last elements any longer with flink 1.0 vs 0.10.1

2016-03-15 Thread LINZ, Arnaud
Hi, All right… I find this new behavior dangerous since you’ll always miss the last elements of a source that does not last forever if you use processing time windows. I’ve created a source wrapper that sleeps after the last element so that unit tests that use processing time work. Chee

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Till Rohrmann
Hi Ravinder, the log of the TM you've sent is the log of the only TM which has not been disassociated from the JM. Can it be that you simply stopped the cluster which results in the disassociation events? Normally, Flink should kill all processes. If you have some processes lingering around, then

Re: XGBoost4J: Portable Distributed XGboost in Flink

2016-03-15 Thread Christophe Salperwyck
Hi, The paper compares the performance between your XGBoost and the Spark MLlib version. It would be nice to see how it scales when using Spark or Flink as an engine and also compare it to your native distributed version (with rabit, right?). If you have some charts, they are welcome :-) BTW, wh

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Ravinder Kaur
Hi Till, Log of JobManager 09:55:31,574 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 09:55:31,742 INFO org.apache.flink.runtime.jobmanager.JobManager -

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Till Rohrmann
Hi Ravinder, this should not be the relevant log extract. The log says that the TM is started on port 49653 and the JM log says that the TM on port 4 is lost. Would you mind sharing the complete JM and TM logs with us? Cheers, Till On Tue, Mar 15, 2016 at 10:54 AM, Ravinder Kaur wrote: >

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Ravinder Kaur
Hello Ufuk, Yes, the same WordCount program is being run. Kind Regards, Ravinder Kaur On Tue, Mar 15, 2016 at 10:45 AM, Ufuk Celebi wrote: > What do you mean by iteration in this context? Are you repeatedly > running the same WordCount program for streaming and batch > respectively? > > – Uf

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Ravinder Kaur
Hi Till, Following is the log file of one of the taskmanagers 09:55:37,071 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to select the network interface and address to use by connecting to the leading JobManager. 09:55:37,072 INFO org.apache.flink.runtime.util.LeaderRetr

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Ufuk Celebi
What do you mean by iteration in this context? Are you repeatedly running the same WordCount program for streaming and batch respectively? – Ufuk On Tue, Mar 15, 2016 at 10:22 AM, Till Rohrmann wrote: > Hi Ravinder, > > could you tell us what's written in the taskmanager log of the failing > t

Re: MatrixMultiplication

2016-03-15 Thread Till Rohrmann
Hi Lydia, the implementation looks correct. What you could do to speed up the computation is to exploit existing partitionings in order to avoid unnecessary network shuffles. Moreover, you could block your matrices to increase the data granularity at the cost of parallelism. Cheers, Till On Mon,
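To illustrate what "blocking" the matrices means, here is a single-machine sketch with a hypothetical BlockedMatMul class. It is not Lydia's implementation and uses no Flink API; in a distributed job the blocks, rather than individual entries, would presumably become the elements that are joined and reduced, which is an assumption on my part. The blockSize parameter is likewise illustrative.

```java
// Hypothetical single-machine illustration of block-wise matrix multiplication.
final class BlockedMatMul {

    /** Computes C = A * B by iterating over blockSize x blockSize tiles. */
    static double[][] multiply(double[][] a, double[][] b, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (a.length == 0 || b.length == 0 || a[0].length != b.length) {
            throw new IllegalArgumentException("incompatible matrix dimensions");
        }
        int n = a.length;    // rows of A
        int k = b.length;    // columns of A = rows of B
        int m = b[0].length; // columns of B
        double[][] c = new double[n][m];

        // Outer loops walk over tiles; each tile pair contributes a partial product to C.
        for (int i0 = 0; i0 < n; i0 += blockSize) {
            for (int p0 = 0; p0 < k; p0 += blockSize) {
                for (int j0 = 0; j0 < m; j0 += blockSize) {
                    int iMax = Math.min(i0 + blockSize, n);
                    int pMax = Math.min(p0 + blockSize, k);
                    int jMax = Math.min(j0 + blockSize, m);
                    // Inner loops multiply one tile of A with one tile of B.
                    for (int i = i0; i < iMax; i++) {
                        for (int p = p0; p < pMax; p++) {
                            double aip = a[i][p];
                            for (int j = j0; j < jMax; j++) {
                                c[i][j] += aip * b[p][j];
                            }
                        }
                    }
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(java.util.Arrays.deepToString(multiply(a, b, 1)));
    }
}
```

Larger blocks mean fewer, coarser units of work (less overhead per element, but less parallelism), which is the trade-off Till describes.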

Re: XGBoost4J: Portable Distributed XGboost in Flink

2016-03-15 Thread Till Rohrmann
Great to hear Tianqi :-) I will try it out. Cheers, Till On Tue, Mar 15, 2016 at 12:41 AM, Tianqi Chen wrote: > Hi Flink Community: > I am sending this email to let you know we just released XGBoost4J > which also runs on Flink. In short, XGBoost is a machine learning package > that is used

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Till Rohrmann
Hi Ravinder, could you tell us what's written in the taskmanager log of the failing taskmanager? There should be some kind of failure why the taskmanager stopped working. Moreover, given that you have 64 GB of main memory, you could easily give 50GB as heap memory to each taskmanager. Cheers, Ti
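Till's heap suggestion could be expressed in flink-conf.yaml roughly as follows. This is a sketch assuming the `taskmanager.heap.mb` key of the Flink 0.10.x / 1.0 configuration, whose value is given in megabytes; the figure simply restates "50GB" and is not a tuned recommendation.

```yaml
# flink-conf.yaml (on every TaskManager machine)
# 50 GB expressed in megabytes: 50 * 1024 = 51200
taskmanager.heap.mb: 51200
```

The TaskManagers need to be restarted for a changed heap size to take effect.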

OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Ravinder Kaur
Hello All, I'm running a simple word count example using the quickstart package from Flink (0.10.1), on an input dataset of 500MB. This dataset is a set of randomly generated words of length 8. Cluster Configuration: Number of machines: 7 Total cores: 25 Memory on each: 64GB I'm interested