Use a round-robin kafka partitioner

2017-10-24 Thread kla
Hey, I would like to use a round-robin Kafka partitioner in Apache Flink (the default one). I forked the code from Kafka's DefaultPartitioner class. public class HashPartitioner extends KafkaPartitioner implements Serializable { private final AtomicInteger counter = new AtomicInte

Re: How to test new sink

2017-10-24 Thread Timo Walther
Yes, if you think you need better public test utilities. Feel free to open an issue for it. Timo Am 10/23/17 um 5:32 PM schrieb Rinat: Timo, thx for your reply. I’m using gradle instead of maven, but I’ll look through the existing similar plugins for it. I don’t think, that sharing of exte

Re: Incompatible types of expression and result type.

2017-10-24 Thread Timo Walther
Hi, I found the problem in your implementation. The Table API program is correct. However, the DataStream program that you construct in your TableSource has a wrong type. Whenever you use a Row type, you need to specify the type either by implementing ResultTypeQueryable or in your can

Re: Questions about checkpoints/savepoints

2017-10-24 Thread Aljoscha Krettek
Hi, That distinction with externalised checkpoints is a bit of a pitfall and I'm hoping that we can actually get rid of that distinction in the next version or the version after that. With that change, all checkpoints would always be externalised, since it's not really any noticeable overhead.

Re: Local combiner on each mapper in Flink

2017-10-24 Thread Kurt Young
I think you can use WindowedStream.aggregate. Best, Kurt On Tue, Oct 24, 2017 at 1:45 PM, Le Xu wrote: > Thanks Kurt. Maybe I wasn't clear before, I was wondering if Flink has > implementation of combiner in DataStream (to use after keyBy and windowing). > > Thanks again! > > Le > > On Sun, Oct 2
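As a rough illustration of what combiner-style (incremental) aggregation means here, the sketch below keeps one small accumulator per key and folds each element in as it arrives, instead of buffering all elements until the end. It is plain Java with no Flink dependency. `SimpleAggregate`, `AverageAggregate` and `KeyedCombiner` are hypothetical names that only loosely mirror the shape of a function passed to WindowedStream.aggregate; they are assumptions for illustration, not Flink's actual interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an incremental aggregate function (not Flink's interface).
interface SimpleAggregate<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC accumulator);
    OUT getResult(ACC accumulator);
}

// Running average: the accumulator is a two-element array {sum, count}.
final class AverageAggregate implements SimpleAggregate<Long, long[], Double> {
    @Override
    public long[] createAccumulator() {
        return new long[] {0L, 0L};
    }

    @Override
    public long[] add(Long value, long[] acc) {
        acc[0] += value; // running sum
        acc[1] += 1;     // element count
        return acc;
    }

    @Override
    public Double getResult(long[] acc) {
        // An empty accumulator yields 0.0 rather than dividing by zero.
        return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1];
    }
}

// Holds one accumulator per key, so memory grows with the number of keys,
// not with the number of elements.
final class KeyedCombiner<K> {
    private final SimpleAggregate<Long, long[], Double> fn = new AverageAggregate();
    private final Map<K, long[]> state = new HashMap<>();

    void accept(K key, long value) {
        long[] acc = state.computeIfAbsent(key, k -> fn.createAccumulator());
        state.put(key, fn.add(value, acc));
    }

    double result(K key) {
        long[] acc = state.get(key);
        return fn.getResult(acc == null ? fn.createAccumulator() : acc);
    }
}
```

Calling `accept("a", 2)` and then `accept("a", 4)` leaves a single accumulator for key "a", and `result("a")` reads the average from it.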

Re: HBase config settings go missing within Yarn.

2017-10-24 Thread Niels Basjes
I did some more digging. I added extra code to print both the environment variables and the classpath that is used by the HBaseConfiguration to load the resource files. I call this both locally and during startup of the job (i.e. these logs arrive in the jobmanager.log on the cluster) Summary of

Re: HBase config settings go missing within Yarn.

2017-10-24 Thread Niels Basjes
Minor correction: The HBase jar files are on the classpath, just in a different order. On Tue, Oct 24, 2017 at 11:18 AM, Niels Basjes wrote: > I did some more digging. > > I added extra code to print both the environment variables and the > classpath that is used by the HBaseConfiguration to loa

Re: HBase config settings go missing within Yarn.

2017-10-24 Thread Niels Basjes
I changed my cluster config (on all nodes) to include the HBase config dir in the classpath. Now everything works as expected. This may very well be a misconfiguration of my cluster. However ... My current assessment: Tools like Pig use the HBase config which has been specified on the LOCAL machin

Could not initialize keyed state backend on restart from checkpoint

2017-10-24 Thread Federico D'Ambrosio
Hello everyone, while trying to restart a flink job from an externalized checkpoint I'm getting the following exception: java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStr

Re: Flink REST API async?

2017-10-24 Thread Aljoscha Krettek
Hi, Unfortunately, the FLIP-6 efforts are taking longer than expected and we won't have those changes to the REST API in the 1.4 release (which should happen in about a month). We are planning to very quickly release 1.5 after that, with the changes to the REST API. The only work-around I can

Re: Avoid duplicate messages while restarting a job for an application upgrade

2017-10-24 Thread Aljoscha Krettek
Hi, Sorry for entering the discussion somewhat late but I wrote on the Issue you created, please have a look. Best, Aljoscha > On 20. Oct 2017, at 16:56, Antoine Philippot > wrote: > > Hi Piotrek, > > I come back to you with a Jira ticket that I created and a proposal > the ticket : https:/

Re: Flink flick cancel vs stop

2017-10-24 Thread Piotr Nowojski
I would propose that implementations of NewSource be non-blocking/asynchronous. For example something like public abstract Future getCurrent(); Which would allow us to perform certain actions while there is no data available to process (for example flush output buffers). Something like this

Re: Use a round-robin kafka partitioner

2017-10-24 Thread Chesnay Schepler
Could you expand a bit more on what you want to achieve? (In particular /where/ you want to use this partitioner; as an operation before a sink or within a kafka sink) On 24.10.2017 09:24, kla wrote: Hey, I would like to use a round-robin kafka partitioner in the apache flink. (the default on

Re: Monitoring folder in flink

2017-10-24 Thread Fabian Hueske
Hi, with PROCESS_CONTINUOUSLY the application monitors the directory and processes newly arriving files or files that have been modified. In this case the application never terminates because it is waiting for new files to appear. With PROCESS_ONCE, the content of a directory is processed as it was

Re: Monitoring folder in flink

2017-10-24 Thread Sugandha Amatya
Hi, I found that Flink polls the directory based on modified date. On Windows, when I copy files the modified date remains the same. So, PROCESS_CONTINUOUSLY resolved the issue. On Tue, Oct 24, 2017 at 6:09 PM, Fabian Hueske wrote: > Hi, > > with PROCESS_CONTINUOUSLY the application monitors the directo
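To make the modified-date behaviour described in this message concrete, here is a minimal sketch of modification-time-based polling in plain Java. It is a simplified illustration under the assumption that the monitor remembers the newest timestamp it has seen and skips anything not newer than that; `ModTimeDirectoryMonitor` is a hypothetical class, not Flink's implementation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of modification-time-based polling (not Flink's code).
final class ModTimeDirectoryMonitor {
    // Newest modification time seen in earlier polls, in epoch milliseconds.
    private long maxSeenMillis = Long.MIN_VALUE;

    // Returns the regular files whose modification time is newer than
    // anything seen in earlier polls, then advances the remembered timestamp.
    List<Path> poll(Path dir) throws IOException {
        List<Path> fresh = new ArrayList<>();
        long newMax = maxSeenMillis;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                long modified = Files.getLastModifiedTime(p).toMillis();
                if (modified > maxSeenMillis) {
                    fresh.add(p);
                    newMax = Math.max(newMax, modified);
                }
            }
        }
        maxSeenMillis = newMax;
        return fresh;
    }
}
```

Under this scheme, a file copied into the directory with its old modification date preserved is never reported if that date is not newer than the remembered timestamp, which matches the symptom described above.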

Re: Monitoring folder in flink

2017-10-24 Thread रविशंकर नायर
Can you please share the full code? Thanks, RAV On Oct 22, 2017 3:37 AM, "Sugandha Amatya" wrote: I have a folder where new files arrive on a schedule. Why is my Flink readFile not reading new files? I have used both *PROCESS_ONCE* and *PROCESS_CONTINUOUSLY*. When I use *PROCESS_CONTINUOUSLY* it rea

Re: Monitoring folder in flink

2017-10-24 Thread kitex
The code I have pasted is all that I have.

RE: Impersonation support in Flink

2017-10-24 Thread Newport, Billy
Our scenario is to enable a specific Kerberos to impersonate any Kerberos in a specific group; this is enabled in the HDFS configuration. That Kerberos does not need to be root, just a Kerberos allowed to impersonate the users in that group. We want the job to access HDFS as the impersonated K

Reading Yarn Application Name in flink

2017-10-24 Thread Navneeth Krishnan
Hi All, Is there a way to read the YARN application id/name within Flink so that the logs can be sent to an external logging stack like ELK or CloudWatch merged by the application? Thanks, Navneeth

Delta iteration not spilling to disk

2017-10-24 Thread Joshua Griffith
I’m currently using a delta iteration within a batch job and received the following error: java.lang.RuntimeException: Memory ran out. Compaction failed. numPartitions: 32 minPartition: 11 maxPartition: 24 number of overflow segments: 0 bucketSize: 125 Overall memory: 23232512 Partition memory:

Re: Use a round-robin kafka partitioner

2017-10-24 Thread kla
Hi Chesnay, Thanks for your reply. I would like to use the partitioner within the Kafka sink operation. By default, the Kafka sink uses FixedPartitioner: public FlinkKafkaProducer010(String topicId, KeyedSerializationSchema serializationSchema, Properties producerConfig) {
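The round-robin selection asked about in this thread can be sketched independently of any Flink or Kafka interface. The class below shows only the partition-choosing logic with an AtomicInteger counter, as in the snippet quoted earlier in the thread. `RoundRobinPartitionChooser` and `nextPartition` are hypothetical names, and wiring this into the Kafka sink's partitioner hook is deliberately not shown, since that depends on the connector version.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Partition-selection logic only; this does not implement Flink's or
// Kafka's partitioner interfaces.
final class RoundRobinPartitionChooser {
    private final AtomicInteger counter = new AtomicInteger(0);

    // Returns 0, 1, ..., numPartitions - 1 and then wraps around.
    int nextPartition(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative after the counter overflows.
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }
}
```

Using `Math.floorMod` rather than `%` is a small design choice: once the int counter wraps past Integer.MAX_VALUE it becomes negative, and `%` would then yield a negative partition index.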

Re: Questions about checkpoints/savepoints

2017-10-24 Thread vipul singh
Thanks Aljoscha for the explanations. I was able to recover from the last externalized checkpoint by using flink run -s. I am curious, are there any options to save the metadata file name to some other place like dynamo etc at the moment? The reason why I am asking is, for the end launcher code