XGBoost4J: Portable Distributed XGBoost in Flink

2016-03-14 Thread Tianqi Chen
Hi Flink Community: I am sending this email to let you know that we have just released XGBoost4J, which also runs on Flink. In short, XGBoost is a machine learning package that is used by more than half of the machine learning challenge winning solutions and is already widely used in industry. The distributed versi

MatrixMultiplication

2016-03-14 Thread Lydia Ickler
Hi, I wrote to you before about the MatrixMultiplication in Flink … Unfortunately, the multiplication of a pair of 1000 x 1000 matrices already takes almost a minute. Would you please take a look at my attached code? Maybe you can suggest something to make it faster? Or would it be better

Re: asm IllegalArgumentException with 1.0.0

2016-03-14 Thread Zach Cox
Yes, pretty much. We use sbt to run the job in a local environment, not IntelliJ, but it should be the same thing. We were also seeing that exception when running unit tests locally. We did not see the exception when assembling a fat jar and submitting it to a remote Flink cluster. It seems like the flink-co

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
Correction: the CC I am running successfully is on top of your friend, Spark :) Best, Ovidiu > On 14 Mar 2016, at 20:38, Ovidiu-Cristian MARCU > wrote: > > Yes, largely different. I was expecting the solution set to be spillable. > This is a very hard limitation; the layout of the data ma

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
Yes, largely different. I was expecting the solution set to be spillable. This is a very hard limitation; the layout of the data makes the difference. By contrast, I am able to run CC successfully on the synthetic data, but RDDs are persisted in memory or on disk. Best, Ovidiu > On 14

Re: asm IllegalArgumentException with 1.0.0

2016-03-14 Thread Andrew Whitaker
We're having the same issue (we also have a dependency on flink-connector-elasticsearch). It's only happening to us in IntelliJ though. Is this the case for you as well? On Thu, Mar 10, 2016 at 3:20 PM, Zach Cox wrote: > After some poking around I noticed > that flink-connector-elasticsearch_2.1

Re: Memory ran out PageRank

2016-03-14 Thread Ufuk Celebi
Probably the limitation is that the number of keys is different in the real and the synthetic data set respectively. Can you confirm this? The solution set for delta iterations is currently implemented as an in-memory hash table that works on managed memory segments, but is not spillable. – Ufuk

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
This problem is surprising, as I was able to run PR and CC on a larger graph (2 billion edges), but with this synthetic graph (1 billion edges, groups of 10) I ran out of memory; I was using the same configuration (memory, parallelism, other internals). There is some limitation somewhere I will

Re: Behavior of SlidingProessingTimeWindow with CountTrigger

2016-03-14 Thread Vishnu Viswanath
Hi Aljoscha, Thank you for the explanation and the link on IBM InfoSphere. That explains why I am seeing (a,3) and (b,3) in my example. Yes, the name Evictor is confusing. Thanks and Regards, Vishnu Viswanath, www.vishnuviswanath.com On Mon, Mar 14, 2016 at 11:24 AM, Aljoscha Krettek wrote:

Re: Memory ran out PageRank

2016-03-14 Thread Martin Junghanns
Hi, I understand the confusion. So far, I have not run into the problem, but I think this needs to be addressed, as all our graph processing abstractions are implemented on top of the delta iteration. According to the previous mailing list discussion, the problem is with the solution set and it

Re: Memory ran out PageRank

2016-03-14 Thread Vasiliki Kalavri
Hi Ovidiu, this option won't fix the problem if your system doesn't have enough memory :) It only defines whether the solution set is kept in managed memory or not. For more iteration configuration options, take a look at the Gelly documentation [1]. -Vasia. [1]: https://ci.apache.org/projects/f

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
Thank you for this alternative. I don’t understand how the workaround will fix this on systems with limited memory and possibly a larger graph. Running Connected Components on the same graph gives the same problem. IterationHead(Unnamed Delta Iteration)(82/88) switched to FAILED java.lang.RuntimeExce

Re: Memory ran out PageRank

2016-03-14 Thread Martin Junghanns
Hi, I think this is the same issue we had before on the list [1]. Stephan recommended the following workaround: A possible workaround is to use the option "setSolutionSetUnmanaged(true)" on the iteration. That will eliminate the fragmentation issue, at least. Unfortunately, you cannot set th

Re: Integration Alluxio and Flink

2016-03-14 Thread Till Rohrmann
Hi Andrea, the problem won’t be netty-all but netty, I suspect. Flink is using version 3.8 whereas alluxio-core-client uses version 3.2.2. I think you have to exclude or shade this dependency away. Cheers, Till On Mon, Mar 14, 2016 at 5:12 PM, Andrea Sella wrote: > Hi Till, > I tried to down

Re: Behavior of SlidingProessingTimeWindow with CountTrigger

2016-03-14 Thread Aljoscha Krettek
Hi, sure, the evictors are a bit confusing (especially the fact that they are called evictors). They should more correctly be called “Keepers”. The process is the following: 1. Trigger fires 2. Evictor decides what elements to keep, so a CountEvictor.of(3) says, keep only three elements, all other
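
The "keeper" reading of an evictor described above can be illustrated with a minimal plain-Java sketch. It does not use the Flink API: the class and method names (KeepLastEvictorSketch, keepLast) are hypothetical, and it assumes that the elements kept are the most recently added ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Flink's CountEvictor.
public class KeepLastEvictorSketch {

    // Returns the last `keep` elements of the window buffer and drops the rest.
    // Assumption: "keep only three elements" means the three newest elements.
    public static <T> List<T> keepLast(List<T> windowContents, int keep) {
        int from = Math.max(0, windowContents.size() - keep);
        return new ArrayList<>(windowContents.subList(from, windowContents.size()));
    }

    public static void main(String[] args) {
        // Buffer of a window at the moment the trigger fires.
        List<String> buffer = Arrays.asList("a1", "a2", "a3", "a4", "a5");
        // Only the kept elements would be handed to the window function.
        System.out.println(keepLast(buffer, 3)); // prints [a3, a4, a5]
    }
}
```

With a buffer smaller than the keep count, nothing is dropped and the whole buffer is passed on.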

Re: Integration Alluxio and Flink

2016-03-14 Thread Andrea Sella
Hi Till, I tried to downgrade Alluxio's netty version from 4.0.28.Final to 4.0.27.Final to align Flink and Alluxio dependencies. First of all, Flink 1.0.0 uses 4.0.27.Final, is that correct? Btw it doesn't work, same error as above. BR, Andrea 2016-03-14 15:30 GMT+01:00 Till Rohrmann : > Yes i

Re: Behavior of SlidingProessingTimeWindow with CountTrigger

2016-03-14 Thread Vishnu Viswanath
Hi Aljoscha, Wow, great illustration. That was a very clear explanation. Yes, I did enter the elements fast for case B and I was seeing more of case A. Also, sometimes I have seen a window getting triggered when I enter 1 or 2 elements; I believe that is an expansion of case A, w.r.t. window 2. Al

Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
Hi, While running PageRank on a synthetic graph I ran into this problem. Any advice on how I should proceed to overcome this memory issue? IterationHead(Vertex-centric iteration (org.apache.flink.graph.library.PageRank$VertexRankUpdater@7712cae0 | org.apache.flink.graph.library.PageRank$RankMe

Re: Integration Alluxio and Flink

2016-03-14 Thread Till Rohrmann
Yes, it seems as if you have a netty version conflict. Maybe the alluxio-core-client.jar pulls in an incompatible netty version. Could you check whether this is the case? But maybe you also have another dependency which pulls in a wrong netty version, since the Alluxio documentation indicates that

Re: Application logging on YARN

2016-03-14 Thread Stefano Baghino
Ok, my bad, I was simply looking in the wrong place. I thought the logs were sent to YARN but they were actually stored in the Flink logs folder. Problem solved, sorry for the mix-up. On Sun, Mar 13, 2016 at 8:48 PM, Stefano Baghino < stefano.bagh...@radicalbit.io> wrote: > There's another open th

Integration Alluxio and Flink

2016-03-14 Thread Andrea Sella
Hi all, I'm trying to integrate Alluxio and Apache Flink. I followed "Running Flink on Alluxio" to set up Flink. I tested in local mode, executing: bin/flink run ./examples/batch/WordCount.jar --input alluxio:///flink/README.

Re: TimeWindow not getting last elements any longer with flink 1.0 vs 0.10.1

2016-03-14 Thread Till Rohrmann
Hi Arnaud, with version 1.0 the behaviour of window triggering for finite streams was slightly changed. If you use event time, then all unfinished windows are triggered when your stream ends. This can be motivated by the fact that the end of a stream is equivalent to no elements w

Re: Flink and YARN ship folder

2016-03-14 Thread Andrea Sella
Hi Robert, Ok, thank you. 2016-03-14 11:13 GMT+01:00 Robert Metzger : > Hi Andrea, > > You don't have to manually replicate any operations on the slaves. All > files in the lib/ folder are transferred to all containers (Jobmanagers and > TaskManagers). > > > On Sat, Mar 12, 2016 at 3:25 PM, Andr

TimeWindow not getting last elements any longer with flink 1.0 vs 0.10.1

2016-03-14 Thread LINZ, Arnaud
Hello, I’ve switched my Flink version from 0.10.1 to 1.0 and I have a regression in some of my unit tests. To narrow the problem, here is what I’ve figured out: - I use a simple Streaming application with a source defined as “fromElements("Element 1", "Element 2", "Element 3") -

RE:Flink job on secure Yarn fails after many hours

2016-03-14 Thread Thomas Lamirault
Hello everyone, We are now facing the same problem in our Flink applications, launched using YARN. I just want to know if there is any update about this exception? Thanks, Thomas From: ni...@basj.es [ni...@basj.es] on behalf of Niels Basjes [ni...@basjes

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-14 Thread Balaji Rajagopalan
Yep, the same issue as before (class not found) with Flink 0.10.2 and Scala version 2.11. I was not able to use Scala 2.10 since the flink_connector_kafka connector for 0.10.2 is not available. balaji On Mon, Mar 14, 2016 at 4:20 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: >

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-14 Thread Balaji Rajagopalan
Yes, figured that out, thanks for pointing that out, my bad. I have put back 0.10.2 as the Flink version and will try to reproduce the problem again; this time I have explicitly called out the Scala version as 2.11. On Mon, Mar 14, 2016 at 4:14 PM, Robert Metzger wrote: > Hi, > > flink-connector-kafka_ doesn'

Re: Kafka integration error

2016-03-14 Thread Robert Metzger
Hi Stefanos, this looks like an issue with Kafka. Which version does your broker have? Can you check the logs of the broker you are trying to connect to? On Fri, Mar 11, 2016 at 5:27 PM, Stefanos Antaris < antaris.stefa...@gmail.com> wrote: > Hi to all, > > i am trying to make Flink to work with

Re: Checkpoint

2016-03-14 Thread Aljoscha Krettek
Hi, I’m not aware of a problem where pending files are not moved to their final locations. So if you have such a behavior it would indicate a bug. Also, the “trigger checkpoint” does not yet indicate that the checkpoint is happening. If you have a very long sleep interval in some of your operati

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-14 Thread Robert Metzger
Hi, flink-connector-kafka_ doesn't exist for 1.0.0. You have to use either flink-connector-kafka-0.8_ or flink-connector-kafka-0.9_ On Mon, Mar 14, 2016 at 11:17 AM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > What I noticied was that, if I remove the dependency on > flink-con

Re: Behavior of SlidingProessingTimeWindow with CountTrigger

2016-03-14 Thread Aljoscha Krettek
Hi, I created a visualization to help explain the situation: http://s21.postimg.org/dofhcw52f/window_example.png The SlidingProcessingTimeWindows assigner assigns elements to windows based on the current processing time. The CountTrigger only fires if a window contains 5 elements (or more). In
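
The interplay described above, where each element lands in every sliding window covering its processing time and a window only fires once it holds enough elements, can be sketched in plain Java. This is an illustration under assumptions, not Flink code: the names SlidingCountSketch, windowStarts and firedWindows are hypothetical, the window size and slide used in main are made up (the thread does not state them), and timestamps are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the Flink window assigner or trigger.
public class SlidingCountSketch {

    // Start times of all windows [start, start + size) that contain the timestamp.
    public static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    // Start times of the windows whose element count reached the threshold.
    public static List<Long> firedWindows(long[] timestamps, long size, long slide, int threshold) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestamps) {
            for (long start : windowStarts(ts, size, slide)) {
                counts.merge(start, 1, Integer::sum);
            }
        }
        List<Long> fired = new ArrayList<>();
        for (Map.Entry<Long, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= threshold) {
                fired.add(entry.getKey());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // Assumed values: 10-unit windows sliding every 5 units, threshold of 5 elements.
        long[] arrivals = {1, 2, 3, 6, 7, 8, 9};
        System.out.println(windowStarts(7, 10, 5));            // prints [5, 0]
        System.out.println(firedWindows(arrivals, 10, 5, 5));  // prints [0]
    }
}
```

In this run the windows starting at -5 and 5 hold only 3 and 4 elements, so only the window starting at 0, which holds all 7, reaches the threshold.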

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-14 Thread Balaji Rajagopalan
What I noticed was that, if I remove the dependency on flink-connector-kafka, the error goes away, so it is clearly something to do with that dependency. On Mon, Mar 14, 2016 at 3:46 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > Robert, >I have moved on to latest version of flink of 1.0.0 ho

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-14 Thread Balaji Rajagopalan
Robert, I have moved on to the latest version of Flink, 1.0.0, hoping that will solve my problem with the Kafka connector. Here is my pom.xml, but now I cannot get the code compiled. [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.1:compile (scala-compile-first) on project fl

Re: Log4j configuration on YARN

2016-03-14 Thread Robert Metzger
Hi Nick, the name of the "log4j-yarn-session.properties" file might be a bit misleading. The file is just used for the YARN session client, running locally. The Job- and TaskManager are going to use the log4j.properties on the cluster. On Fri, Mar 11, 2016 at 7:20 PM, Ufuk Celebi wrote: > Hey N

Re: Flink and YARN ship folder

2016-03-14 Thread Robert Metzger
Hi Andrea, You don't have to manually replicate any operations on the slaves. All files in the lib/ folder are transferred to all containers (Jobmanagers and TaskManagers). On Sat, Mar 12, 2016 at 3:25 PM, Andrea Sella wrote: > Hi Ufuk, > > I'm trying to execute the WordCount batch example wit

Using a POJO class wrapping an ArrayList

2016-03-14 Thread Mengqi Yang
Hi all, Now I am building a POJO class for key selectors. Here is the class: public class Ids implements Comparable, Serializable{ private static final long serialVersionUID = 1L; private ArrayList ids = new ArrayList(); Ids() {}

Re: KMeans result folder not created

2016-03-14 Thread Aljoscha Krettek
Hi, the problem is that print() eagerly executes the program even before execute() is called. For running the program on a cluster I would suggest completely removing the “.print()”. Cheers, Aljoscha > On 13 Mar 2016, at 16:13, subash basnet wrote: > > Hello all, > > I created KMeans.jar fro

Re: kafka.javaapi.consumer.SimpleConsumer class not found

2016-03-14 Thread Robert Metzger
Can you send me the full build file to further investigate the issue? On Fri, Mar 11, 2016 at 4:56 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > Robert, > That did not fix it ( using flink and connector same version) . Tried > with scala version 2.11, so will try to see scal

Re: Passing two value to the ConvergenceCriterion function

2016-03-14 Thread Riccardo Diomedi
Ok! On 14 Mar 2016, at 10:41, Robert Metzger wrote: > Hi, > > take a look at the "Record" class. That one implements the Value interface > and can have multiple values. > > On Fri, Mar 11, 2016 at 6:01 PM, Riccardo Diomedi > wrote: > Hi > > I want to send two value to the ConvergenceCriteri

Re: Passing two value to the ConvergenceCriterion function

2016-03-14 Thread Robert Metzger
Hi, take a look at the "Record" class. That one implements the Value interface and can have multiple values. On Fri, Mar 11, 2016 at 6:01 PM, Riccardo Diomedi < riccardo.diomed...@gmail.com> wrote: > Hi > > I want to send two value to the ConvergenceCriterion function, so i > decided to use an a