Re: Powered by Flink

2016-04-06 Thread Sebastian
You should also add Apache Mahout, whose new Samsara DSL also runs on Flink. -s On 06.04.2016 08:50, Henry Saputra wrote: Thanks, Slim. I have just updated the wiki page with this entries. On Tue, Apr 5, 2016 at 10:20 AM, Slim Baltagi mailto:sbalt...@gmail.com>> wrote: Hi The followi

Re: Powered by Flink

2016-04-06 Thread Suneel Marthi
I was gonna hold off on that until we get Mahout 0.12.0 out of the door (targeted for this weekend). I would add Apache NiFi to the list. Future : Apache Mahout Apache BigTop Openstack and Kubernetes (skunkworks) On Wed, Apr 6, 2016 at 3:03 AM, Sebastian wrote: > You should also add Apache

Running Flink jobs directly from Eclipse

2016-04-06 Thread Serhiy Boychenko
Cheerz, I have been working last few month on the comparison of different data processing engines and recently came across Apache Flink. After reading different academic papers on comparison of Flink with other data processing I would definitely give it a shot. The only issue I am currently hav

Re: Running Flink jobs directly from Eclipse

2016-04-06 Thread Christophe Salperwyck
>From my side I was starting the YARN session from the cluster: flink-0.10.1/bin/yarn-session.sh -n 64 -s 4 -jm 4096 -tm 4096 Then getting the IP/port from the WebUI and then from Eclipse: ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment("xx.xx.xx.xx", 40631, "target/FlinkTe

Re: Integrate Flink with S3 on EMR cluster

2016-04-06 Thread Ufuk Celebi
Yes, for sure. I added some documentation for AWS here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html Would be nice to update that page with your pull request. :-) – Ufuk On Wed, Apr 6, 2016 at 4:58 AM, Chiwan Park wrote: > Hi Timur, > > Great! Bootstrap action fo

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Till Rohrmann
Hi Balaji, from the stack trace it looks as if you cannot open a connection redis. Have you checked that you can access redis from all your TaskManager nodes? Cheers, Till On Wed, Apr 6, 2016 at 7:46 AM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > I am trying to use AWS EMR ya

Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-06 Thread norman sp
Hi, I'm trying out the new CEP library but have some problems with event detection. In my case Flink detects the event pattern: A followed by B within 10 seconds. But short time after event detection when the event pattern isn't matched anymore, the program crashes with the error message: 04/06/20

Re: Checkpoint state stored in backend, and deleting old checkpoint state

2016-04-06 Thread Stephan Ewen
Hi Zach! I am working on incremental checkpointing, hope to have it in the master in the next weeks. The current approach is a to have a full self-contained checkpoint every once in a while, and have incremental checkpoints most of the time. Having a full checkpoint every now and then spares you

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Balaji Rajagopalan
Till, I have checked from all the taskmanager nodes I am able to establish a connection by installing a redis-cli on those nodes. The thing is in the constructor I am able to set and get values, also I am getting PONG for the ping. But once object is initialized when I try to call DriverStreamHel

Re: Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-06 Thread Till Rohrmann
Hi Norman, which version of Flink are you using? We recently fixed some issues with the CEP library which looked similar to your error message. The problem occurred when using the CEP library with processing time. Switching to event or ingestion time, solve the problem. The fixes to make it also

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Till Rohrmann
Hmm I'm not a Redis expert, but are you sure that you see a successful ping reply in the logs of the TaskManagers and not only in the client logs? Another thing: Is the redisClient thread safe? Multiple map tasks might be accessing the set and get methods concurrently. Another question: The code

Re: Back Pressure details

2016-04-06 Thread Ufuk Celebi
Hey Zach, just added some documentation, which will be available in ~ 30 mins here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/back_pressure_monitoring.html If you think that something is missing there, I would appreciate some feedback. :-) Back pressure is determined

Re: Running Flink jobs directly from Eclipse

2016-04-06 Thread Shannon Carey
Thanks for the info! It is a bit difficult to tell based on the documentation whether or not you need to put your jar onto the Flink master node and run the flink command from there in order to get a job running. The documentation on https://ci.apache.org/projects/flink/flink-docs-release-1.0/se

Re: Running Flink jobs directly from Eclipse

2016-04-06 Thread Christophe Salperwyck
For me it was taking the local jar and uploading it into the cluster. 2016-04-06 13:16 GMT+02:00 Shannon Carey : > Thanks for the info! It is a bit difficult to tell based on the > documentation whether or not you need to put your jar onto the Flink master > node and run the flink command from th

[ANNOUNCE] Flink 1.0.1 Released

2016-04-06 Thread Ufuk Celebi
The Flink PMC is pleased to announce the availability of Flink 1.0.1. The official release announcement: http://flink.apache.org/news/2016/04/06/release-1.0.1.html Release binaries: http://apache.openmirror.de/flink/flink-1.0.1/ Please update your Maven dependencies to the new 1.0.1 version and

Re: CEP blog post

2016-04-06 Thread Ufuk Celebi
The website has been updated for 1.0.1. :-) @Till: If you don't mention it in the post, it makes sense to have a note at the end of the post saying that the code examples only work with 1.0.1. On Mon, Apr 4, 2016 at 3:35 PM, Till Rohrmann wrote: > Thanks a lot to all for the valuable feedback. I

Re: CEP blog post

2016-04-06 Thread Matthias J. Sax
"Getting Started" in main page shows "Download 1.0" instead of 1.0.1 -Matthias On 04/06/2016 02:03 PM, Ufuk Celebi wrote: > The website has been updated for 1.0.1. :-) > > @Till: If you don't mention it in the post, it makes sense to have a > note at the end of the post saying that the code exam

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Balaji Rajagopalan
Till, Found the issue, it was my bad assumption about GlobalConfiguration, what I thought was once the configuration is read from the client machine GlobalConfiguration params will passed on to the task manager nodes, as well, it was not and values from default was getting pickup, which was local

Re: Accessing RDF triples using Flink

2016-04-06 Thread Flavio Pompermaier
Ho Ritesh, I have sone experience with Rdf and Flink. What do you mean for accessing a Jena model? How do you create it? >From my experience reading triples from jena models is evil because it has some problems with garbage collection. On 6 Apr 2016 00:51, "Ritesh Kumar Singh" wrote: > Hi, > > I

Re: CEP blog post

2016-04-06 Thread Till Rohrmann
That is a good point Ufuk. Will add the note. On Wed, Apr 6, 2016 at 2:03 PM, Ufuk Celebi wrote: > The website has been updated for 1.0.1. :-) > > @Till: If you don't mention it in the post, it makes sense to have a > note at the end of the post saying that the code examples only work > with 1.0

Re: CEP blog post

2016-04-06 Thread Ufuk Celebi
On Wed, Apr 6, 2016 at 2:18 PM, Matthias J. Sax wrote: > "Getting Started" in main page shows "Download 1.0" instead of 1.0.1 We always had it like that, but I agree that it can be confusing. 1.0 indicates the "series" and the download page shows the exact version. We can certainly change it. –

RE: Running Flink jobs directly from Eclipse

2016-04-06 Thread Serhiy Boychenko
What about YARN(and HDFS) configuration? I put yarn-site.xml directly into classpath? Or I can set the variables in the execution environment? I will give it a try tomorrow morning, will report back and if successful blog about it ofc ☺ From: Christophe Salperwyck [mailto:christophe.salperw...@

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Till Rohrmann
Great to hear that you solved your problem :-) On Wed, Apr 6, 2016 at 2:29 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > Till, > Found the issue, it was my bad assumption about GlobalConfiguration, > what I thought was once the configuration is read from the client machine >

Re: Accessing RDF triples using Flink

2016-04-06 Thread Ritesh Kumar Singh
Hi Flavio, 1. How do you access your rdf dataset via flink? Are you reading it as a normal input file and splitting the records or you have some wrappers in place to convert the rdf data into triples? Can you please share some code samples if possible? 2. I am using Jena TDB command

Re: Running Flink jobs directly from Eclipse

2016-04-06 Thread Christophe Salperwyck
I exported it in an environment variable before starting Flink: flink-0.10.1/bin/yarn-session.sh -n 64 -s 4 -jm 4096 -tm 4096 2016-04-06 15:36 GMT+02:00 Serhiy Boychenko : > What about YARN(and HDFS) configuration? I put yarn-site.xml directly into > classpath? Or I can set the variables in the e

Re: State in external db (dynamodb)

2016-04-06 Thread Stephan Ewen
Hi Shannon! Welcome to the Flink community! You are right, sinks need in general to be idempotent if you want "exactly-once" semantics, because there can be a replay of elements that were already written. However, what you describe later, overwriting of a key with a new value (or the same value

Possible use case: Simulating iterative batch processing by rewinding source

2016-04-06 Thread Raul Kripalani
Hello, I'm getting started with Flink for a use case that could leverage the window processing abilities of Flink that Spark does not offer. Basically I have dumps of timeseries data (10y in ticks) which I need to calculate many metrics in an exploratory manner based on event time. NOTE: I don't

Re: Back Pressure details

2016-04-06 Thread Zach Cox
The new back pressure docs are great, thanks Ufuk! I'm sure those will help others as well. In the Source => A => B => Sink example, if A and B show HIGH back pressure, should Source also show HIGH? In my case it's a Kafka source and Elasticsearch sink. I know currently our Kafka can provide data

Re: Checkpoint state stored in backend, and deleting old checkpoint state

2016-04-06 Thread Zach Cox
Hi Stephan - incremental checkpointing sounds really interesting and useful, I look forward to trying it out. Thanks, Zach On Wed, Apr 6, 2016 at 4:39 AM Stephan Ewen wrote: > Hi Zach! > > I am working on incremental checkpointing, hope to have it in the master > in the next weeks. > > The cur

Re: Back Pressure details

2016-04-06 Thread Ufuk Celebi
Ah sorry, I forgot to mention that in the docs. The way that data is pulled from Kafka is bypassing Flink's task Thread. The topic is consumed in a separate Thread and the task Thread is just waiting. That's why you don't see any back pressure for Kafka sources. I would expect your Kafka source to

Re: Back Pressure details

2016-04-06 Thread Zach Cox
Yeah I don't think that's the case for my setup either :) I wrote a simple Flink job that just consumes from Kafka and sinks events/sec rate to Graphite. That consumes from Kafka several orders of magnitude higher than the job that also sinks to Elasticsearch. As you said, the downstream back pres

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-06 Thread Christophe Salperwyck
Hi, I am interested too. For my part, I was thinking to use HBase as a backend so that my data are stored sorted. Nice to have to generate timeseries in the good order. Cheers, Christophe 2016-04-06 21:22 GMT+02:00 Raul Kripalani : > Hello, > > I'm getting started with Flink for a use case that

Re: Handling large state (incremental snapshot?)

2016-04-06 Thread Hironori Ogibayashi
I tried RocksDB, but the result was almost the same. I used the following code and put 2.6million distinct records into Kafka. After processing all records, the state on the HDFS become about 250MB and time needed for the checkpoint was almost 5sec. Processing throughput was FsStateBackend-> 8000m

Using native libraries in Flink EMR jobs

2016-04-06 Thread Timur Fayruzov
Hello, I'm not sure whether it's a Hadoop or Flink-specific question, but since I ran into this in the context of Flink I'm asking here. I would be glad if anyone can suggest a more appropriate place. I have a native library that I need to use in my Flink batch job that I run on EMR, and I try to

Re: Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-06 Thread norman sp
Hi Till, I used Flink version 1.0.0 and tried all three TimeCharacteristics. Not I tried the new Flink 1.0.1 that gives me the following error. After detecting an event it processes a few stream tuples but then crashes. I'm not sure how to solve that part of the error message: "This can indicate t