S3 checkpointing in AWS in Frankfurt

2016-11-22 Thread Jonathan Share
Hi, I'm interested in hearing if anyone else has experience with using Amazon S3 as a state backend in the Frankfurt region. For political reasons we've been asked to keep all European data in Amazon's Frankfurt region. This causes a problem as the S3 endpoint in Frankfurt requires the use of AWS

Re: Why use Kafka after all?

2016-11-22 Thread Tzu-Li (Gordon) Tai
Hi Matt, Just to be clear, what I'm looking for is a way to serialize a POJO class for Kafka but also for Flink. I'm not sure the interfaces of both frameworks are compatible, but it seems they aren't. For Kafka (producer) I need a Serializer and a Deserializer class, and for Flink (consumer) a 

Re: IterativeStream seems to ignore maxWaitTimeMillis

2016-11-22 Thread Juan Rodríguez Hortalá
Thanks for your answer Aljoscha, The source stops: when I comment out all the transformed streams and just print the input, the program completes. But this is a custom SourceFunction, could this be related? Maybe I should implement emitWatermark? I'm using ingestion time so I assumed this wasn't

Re: Flink application and curator integration issues

2016-11-22 Thread Liu Tongwei
Thanks Stephan, what you said really solved the problem. Previously I built Flink against a vendor-specific Hadoop version and my Maven version is 3.3.9. Thanks, Liu

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-22 Thread William Saar
Thanks! One difference is that my topology had 2 sources. I have updated your example to also use 2 sources and that breaks the co-group operation in the example as well! https://gist.github.com/saarw/8f9513435a41ab29b36da77c16a8b0ed Nice to know that purging can be added to the event trigger.

questions about how to implement some functionality in Flink

2016-11-22 Thread Drew Verlee
Greetings Flink Support! I'm reaching out to you with a question about how we might achieve some specific functionality using Flink. As I'm not sure how this type of exchange normally works, I have outlined everything in a document (see attachment). Let me know if you need anything, Drew Verlee S

Re: Flink streaming with 1+ TB of managed state

2016-11-22 Thread Gyula Fóra
Hi Steven, Let me try to address your questions :) 1. We take checkpoints approximately every hour for these large states to remove some strain from our networks. Obviously with incremental checkpoints we would go down to every couple of minutes. 2. We don't have anything additional and you a

Re: Creating Job Savepoints at Regular Intervals

2016-11-22 Thread Gyula Fóra
Hi, I personally use cron, grep the output for the savepoint path, and write it to a file to keep track of the latest savepoints for each job. I can then use the last line of this file to restore from the latest savepoint if necessary. Cheers, Gyula Scott Kidder wrote (date: 2016 Nov 2
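The bookkeeping described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the savepoint command prints the savepoint location as a URI-like token (for example hdfs://... or file:/...), and the function names and the log file layout (one path per line) are hypothetical, not taken from the thread.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the savepoint location shows up in the CLI output as a
# URI-like token with one of these schemes. Adjust to your setup.
SAVEPOINT_URI = re.compile(r"\b(?:hdfs|file|s3|jobmanager):/\S+")


def extract_savepoint_path(cli_output: str) -> Optional[str]:
    """Return the last URI-like token in the CLI output, or None if absent."""
    matches = SAVEPOINT_URI.findall(cli_output)
    return matches[-1] if matches else None


def record_savepoint(log_file: Path, cli_output: str) -> Optional[str]:
    """Append the savepoint path found in cli_output to log_file, if any."""
    path = extract_savepoint_path(cli_output)
    if path is not None:
        with log_file.open("a", encoding="utf-8") as handle:
            handle.write(path + "\n")
    return path


def latest_savepoint(log_file: Path) -> Optional[str]:
    """Return the last non-empty line of log_file, i.e. the newest savepoint."""
    if not log_file.exists():
        return None
    text = log_file.read_text(encoding="utf-8")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[-1] if lines else None
```

A cron job would capture the output of the savepoint command, pass it to record_savepoint with one log file per job, and a restore script would call latest_savepoint on that file. Failed runs that print no path leave the file unchanged, so the last line always points at the most recent successful savepoint.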

Re: Flink application and curator integration issues

2016-11-22 Thread Stephan Ewen
Hi Liu! Did you build Flink from the source yourself? Maybe you ran into the Maven shading problem described here: https://github.com/apache/flink/blob/master/docs/setup/building.md#dependency-shading Best, Stephan On Tue, Nov 22, 2016 at 8:03 PM, Fabian Hueske wrote: > This looks rather lik

Re: Flink application and curator integration issues

2016-11-22 Thread Fabian Hueske
This looks rather like a version conflict. If Curator wasn't on the classpath it should be a ClassNotFoundException. Can you check if any of your job's dependencies depends on a different Curator version? Best, Fabian 2016-11-22 12:06 GMT+01:00 Maximilian Michels : > As far as I know we're shadi

Creating Job Savepoints at Regular Intervals

2016-11-22 Thread Scott Kidder
I'd like to create job savepoints at regular intervals to be used in the event of a total job failure where it's not possible to restore from a checkpoint. I'm aware that automatic savepoints are planned as part of FLIP-10, but I need something more immediate (using Flink 1.1.3). I'm curious how o

Reading files from an S3 folder

2016-11-22 Thread Alex Reid
Hi, I've been playing around with using Apache Flink to process some data, and I'm starting out using the batch DataSet API. To start, I read in some data from files in an S3 folder: DataSet<String> records = env.readTextFile("s3://my-s3-bucket/some-folder/"); Within the folder, there are 20 gzipped fi

Re: flink-dist shading

2016-11-22 Thread Foster, Craig
Thanks Max. Yep, I just confirmed it works. On 11/22/16, 2:09 AM, "Maximilian Michels" wrote: Hi Craig, I've left a comment on the original Maven JIRA issue to revive the discussion. For BigTop, you can handle this in the build script by building flink-dist again after a su

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-11-22 Thread vinay patil
OK, got that, thank you for your solution. It worked for me with the --yarnship option. Regards, Vinay Patil On Tue, Nov 22, 2016 at 4:44 PM, Maximilian Michels [via Apache Flink User Mailing List archive.] wrote: > Hi Vinay, > > I was referring to setting up and deploying a YARN cluster directly > t

Re: Can not stop cluster gracefully

2016-11-22 Thread Andrew Ge Wu
Thanks, I’ll give that a try. > On 22 Nov 2016, at 12:18, Maximilian Michels wrote: > > The stop script relies on a file in the /tmp directory (location can > be changed by setting env.pid.dir in the Flink config). If that file > somehow gets cleaned up occasionally, the stop script can't find th

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-11-22 Thread Maximilian Michels
Hi Vinay, I was referring to setting up and deploying a YARN cluster directly through a Java program, instead of using the command-line interface. When you do that, you typically construct a YarnClusterDescriptor and parameterize it, then you call deploy() to deploy the cluster and create a YarnClu

Re: FLINK-2821 - Flink on Kubernetes

2016-11-22 Thread Maximilian Michels
Hi Aparup, Could you go into a bit more detail on what you're trying to do and what kind of errors you're facing? Thanks, Max -Max On Fri, Nov 18, 2016 at 1:29 AM, Aparup Banerjee (apbanerj) wrote: > Hi Max, > > > > I am running into an issue on running flink on Kubernetes – basically during

Re: Can not stop cluster gracefully

2016-11-22 Thread Maximilian Michels
The stop script relies on a file in the /tmp directory (location can be changed by setting env.pid.dir in the Flink config). If that file somehow gets cleaned up occasionally, the stop script can't find the process identifiers inside that file to kill the processes. Another explanation could be th
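The failure mode described here can be illustrated with a small sketch. It is not the actual stop script: the function names, the one-PID-per-line file layout, and the "Stopping ..." message are assumptions made for illustration; only the "No taskmanager daemon to stop on host ..." wording comes from the thread.

```python
from pathlib import Path
from typing import List


def read_pids(pid_file: Path) -> List[int]:
    """Return the process ids stored in pid_file (assumed one per line).

    A missing file yields an empty list, which is what happens when
    something cleans up the pid directory behind the daemon's back.
    """
    if not pid_file.is_file():
        return []
    return [int(token) for token in pid_file.read_text().split() if token.isdigit()]


def stop_message(pid_file: Path, daemon: str, host: str) -> str:
    """Describe what a stop script could do given the current pid file."""
    pids = read_pids(pid_file)
    if not pids:
        # The process may still be running; the script just cannot find it.
        return f"No {daemon} daemon to stop on host {host}"
    return f"Stopping {daemon} daemon (pid: {pids[-1]}) on host {host}"
```

The point of the sketch is that an empty or missing pid file is indistinguishable from "nothing is running", so moving the pid directory (via env.pid.dir) to a location that is not periodically cleaned avoids the misleading message.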

Re: Flink application and curator integration issues

2016-11-22 Thread Maximilian Michels
As far as I know we're shading Curator so you shouldn't run into class conflicts. Have you checked that Curator is included in your jar? -Max On Tue, Nov 22, 2016 at 9:30 AM, Liu Tongwei wrote: > Hi all, > > I'm using flink 1.1.3. I need to use the curator inside the application to > operate zo

Re: PathIsNotEmptyDirectoryException in Namenode HDFS log when using Jobmanager HA in YARN

2016-11-22 Thread Maximilian Michels
This could be related to https://issues.apache.org/jira/browse/FLINK-5063 where some issues related to the cleanup of checkpointing files were fixed. -Max On Mon, Nov 21, 2016 at 10:05 PM, static-max wrote: > Update: I deleted the /flink/recovery folder on HDFS and even then I get the > same Ex

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-22 Thread Maximilian Michels
Hi William, I've reproduced your example locally for some toy data and everything was working as expected (with the early triggering). So I'm assuming either there is something wrong with your input data or the behavior doesn't always manifest. Here's the example I run in case you want to try: ht

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-22 Thread Gyula Fóra
Hello, I answered on the Flink ml, but we can always have a quick skype chat if you want to discuss some details, that's probably easier :) Gyula William Saar wrote (on Sat, 19 Nov 2016, 18:28): > Hi! > > My topology below seems to work when I comment out all the lines with > Conti

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-22 Thread Gyula Fóra
Hi, The sliding windows don't have to slide by one event at a time, in essence they are "jumping" windows. It is pretty much like saying I am interested in the computation over the last 2 days, computed every 2 hours or so. This also means that we can start preaggregating for every slide so we don
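To make the "last 2 days, computed every 2 hours" example concrete, here is a minimal sketch of how one element timestamp maps onto overlapping sliding windows. It assumes windows aligned to epoch 0 and half-open [start, end) intervals; the function name is hypothetical and this is not claimed to be Flink's own implementation.

```python
from typing import List, Tuple

HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def sliding_windows(timestamp_ms: int, size_ms: int, slide_ms: int) -> List[Tuple[int, int]]:
    """Return every [start, end) window that contains timestamp_ms.

    Window starts are multiples of slide_ms (alignment to epoch 0 is an
    assumption of this sketch). Newest window first.
    """
    # Start of the most recent window that already contains the timestamp.
    last_start = timestamp_ms - (timestamp_ms % slide_ms)
    windows = []
    start = last_start
    # Walk backwards one slide at a time while the window still covers the timestamp.
    while start > timestamp_ms - size_ms:
        windows.append((start, start + size_ms))
        start -= slide_ms
    return windows
```

With a size of 2 days and a slide of 2 hours, sliding_windows(5 * HOUR_MS, 2 * DAY_MS, 2 * HOUR_MS) returns size / slide = 24 windows, so each element contributes to 24 results. That fan-out is why pre-aggregating per slide, as the message suggests, can save work compared with recomputing each window from raw elements.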

Can not stop cluster gracefully

2016-11-22 Thread Andrew Ge Wu
Hi all, You may have hit this problem before: from time to time when I run the stop-cluster script, I get this > No taskmanager daemon to stop on host app25 > No taskmanager daemon to stop on host app26 > No taskmanager daemon to stop on host app27 > No taskmanager daemon to stop on host app83 > No

Re: flink-dist shading

2016-11-22 Thread Maximilian Michels
Hi Craig, I've left a comment on the original Maven JIRA issue to revive the discussion. For BigTop, you can handle this in the build script by building flink-dist again after a successful build. That will always work independently of the Maven 3.x version. -Max On Mon, Nov 21, 2016 at 6:27 PM,

Flink Streaming Data Source Node

2016-11-22 Thread Adrienne Kole
Hi, I noticed that if the number of data input sources is less than or equal to the number of slots in one node, they (input source operators) are all deployed on the same node. What is the logic behind this? Can't this be a bottleneck for throughput and distribution of input sources? Thanks Ad

Re: Cassandra Connector

2016-11-22 Thread Stephan Epping
Hey Chesnay, that looks good. I like to use the same mechanism for all my sinks. Thus, this > readings.addSink(new CassandraTupleSink(, ); will be my desired way. best, Stephan > On 22 Nov 2016, at 09:33, Chesnay Schepler wrote: > > Actually this is a bit inaccurate. _Some_ implementations

Re: Cassandra Connector

2016-11-22 Thread Chesnay Schepler
Actually this is a bit inaccurate. _Some_ implementations are not implemented as a sink. Also, you can in fact instantiate the sinks yourself as well, as in readings.addSink(new CassandraTupleSink(, ); On 22.11.2016 09:30, Chesnay Schepler wrote: Hello, the CassandraSink is not implement

Re: Cassandra Connector

2016-11-22 Thread Chesnay Schepler
Hello, the CassandraSink is not implemented as a sink but as a special operator, so you wouldn't be able to use the addSink() method. (I can't remember the actual method being used.) There are also several different implementations for various types (tuples, POJOs, Scala case classes) but we