"unable to establish the security context" with shaded Hadoop S3

2018-10-05 Thread Averell
Hi everyone, I'm trying a build after this PR 6795 for the S3 recoverable writer, to write my stream into Parquet files on S3 with Flink running on AWS EMR, and get the error "unable to establish the security context" with the full stacktrace below. The shadin

Re: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

2018-10-05 Thread Till Rohrmann
Hi Samir, I think the problem is that you've specified a different cluster id for the TMs than for the JM: /flick_ns vs. /flink_ns. Cheers, Till On Fri, Oct 5, 2018 at 6:29 PM Samir Tusharbhai Chauhan < samir.tusharbhai.chau...@prudential.com.sg> wrote: > Hi Till, > > Attached are the logs.
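
A minimal sketch of the fix described above, assuming the flink-conf.yaml based ZooKeeper HA setup discussed in this thread: the cluster id has to be spelled identically in the configuration of every JobManager and TaskManager, for example

    # must be identical on all JMs and TMs
    high-availability.cluster-id: /flink_ns

A mismatch such as /flick_ns on one side makes the TaskManager look up a different leader entry in ZooKeeper, which can surface as the FencingTokenException above.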

RE: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

2018-10-05 Thread Samir Tusharbhai Chauhan
Hi Till, Attached are the logs. My architecture is like this: 3 Zookeepers (Confluent Open Source), 2 Job Managers, 2 Task Managers, all running on different Linux VMs. May I ask what the value of high-availability.zookeeper.path.root: /flink should be, as it is running on a different server. Also /shar
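
For context, a sketch of the ZooKeeper HA settings this question is about, assuming a standalone cluster with an external ZooKeeper quorum (host names, paths, and the storage directory below are placeholders, not values from the thread):

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
    high-availability.zookeeper.path.root: /flink
    high-availability.cluster-id: /flink_ns
    high-availability.storageDir: hdfs:///flink/ha/

path.root is the ZooKeeper node under which all Flink clusters keep their data, and cluster-id distinguishes individual clusters beneath it; the defaults work even with ZooKeeper on separate servers, as long as every JM and TM uses the same values.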

Re: org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

2018-10-05 Thread Till Rohrmann
Hi Samir, could you share the logs of the two JMs and the log where you saw the FencingTokenException with us? It looks to me as if the TM had an outdated fencing token (an outdated leader session id) with which it contacted the ResourceManager. This can happen and the TM should try to reconnect

Re: Unable to start session cluster using Docker

2018-10-05 Thread Till Rohrmann
Hi Vinay, are you referring to flink-contrib/docker-flink/docker-compose.yml? We recently fixed the command line parsing with Flink 1.5.4 and 1.6.1. Due to this, the removal of the second command line parameter that was intended to be introduced with 1.5.0 and 1.6.0 (see https://issues.apache.org/jira/brow
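
For reference, a minimal docker-compose sketch for a session cluster in the spirit of flink-contrib/docker-flink/docker-compose.yml (image tag and ports are assumptions, not taken from the thread):

    version: "2.1"
    services:
      jobmanager:
        image: flink:1.6.1
        command: jobmanager
        ports:
          - "8081:8081"
        environment:
          - JOB_MANAGER_RPC_ADDRESS=jobmanager
      taskmanager:
        image: flink:1.6.1
        command: taskmanager
        depends_on:
          - jobmanager
        environment:
          - JOB_MANAGER_RPC_ADDRESS=jobmanager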

flink memory management / temp-io dir question

2018-10-05 Thread anand.gopinath
Hi, I had a question with respect to Flink memory management / overspill to /tmp. In the docs (https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html#configuring-temporary-io-directories) it says: Although Flink aims to process as much data in main memory as possible, it is n
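
The setting that documentation section refers to, shown here as a flink-conf.yaml sketch (the directories below are placeholders):

    # directories Flink spills temporary data to when it does not fit in memory
    io.tmp.dirs: /mnt/disk1/flink-tmp,/mnt/disk2/flink-tmp
    # older key with the same meaning: taskmanager.tmp.dirs

If nothing is configured, Flink falls back to the JVM's java.io.tmpdir, which is typically /tmp on Linux.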

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Averell
Hi Kostas, I tried your PR - trying to write to S3 from Flink running on AWS - and I got the following error. I copied the three jar files flink-hadoop-fs-1.7-SNAPSHOT.jar, flink-s3-fs-base-1.7-SNAPSHOT.jar, flink-s3-fs-hadoop-1.7-SNAPSHOT.jar to the lib/ directory. Do I need to make any change to HADO

Re: Kafka Per-Partition Watermarks

2018-10-05 Thread Andrew Kowpak
Yes, my job does do a keyBy. It never occurred to me that keyBy would distribute data from different partitions to different tasks, but, now that you mention it, it actually makes perfect sense. Thank you for the help. On Thu, Oct 4, 2018 at 5:11 PM Elias Levy wrote: > Does your job perform
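
A short Java sketch of the setup this thread is about, assuming the Flink 1.6 Kafka connector (topic name, record layout, and the out-of-orderness bound are made up for illustration): assigning the timestamp/watermark extractor directly on the Kafka consumer yields one watermark per Kafka partition, and a subsequent keyBy redistributes records from all partitions across the downstream tasks.

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

    public class PerPartitionWatermarksSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092");
            props.setProperty("group.id", "example-group");

            FlinkKafkaConsumer011<String> consumer =
                new FlinkKafkaConsumer011<>("events", new SimpleStringSchema(), props);

            // Assigning the extractor on the consumer (not on the DataStream)
            // makes Flink track watermarks per Kafka partition in each source subtask.
            consumer.assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(10)) {
                    @Override
                    public long extractTimestamp(String element) {
                        return Long.parseLong(element.split(",")[0]); // assumed record layout
                    }
                });

            env.addSource(consumer)
               // keyBy hashes records by key, so events from one Kafka partition can
               // end up on different downstream tasks - the behaviour discussed above.
               .keyBy(new KeySelector<String, String>() {
                   @Override
                   public String getKey(String value) {
                       return value.split(",")[1];
                   }
               })
               .print();

            env.execute("per-partition watermarks sketch");
        }
    }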

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Averell
What great news. Thanks for that, Kostas. Regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: FlinkUserClassLoader in AggregateFunction

2018-10-05 Thread Chirag Dewan
That worked pretty well. Thank you so much Aljoscha :) On Thursday, 4 October, 2018, 5:40:17 PM IST, Aljoscha Krettek wrote: Hi, you are right in that you can't get it from the RuntimeContext because AggregateFunction doesn't have access to that. As an alternative, you can use Thread.c
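
A rough Java sketch of what the cut-off suggestion above points at - using the task thread's context class loader inside the function, since AggregateFunction has no access to the RuntimeContext. Class and field names are illustrative, not taken from the thread:

    import org.apache.flink.api.common.functions.AggregateFunction;

    public class CountingAggregate implements AggregateFunction<String, Long, Long> {

        @Override
        public Long createAccumulator() {
            // The user-code class loader is set as the context class loader of the
            // task thread that invokes the function, so it can be obtained here.
            ClassLoader userCodeClassLoader = Thread.currentThread().getContextClassLoader();
            // ... load user classes or resources via userCodeClassLoader if needed ...
            return 0L;
        }

        @Override
        public Long add(String value, Long accumulator) {
            return accumulator + 1;
        }

        @Override
        public Long getResult(Long accumulator) {
            return accumulator;
        }

        @Override
        public Long merge(Long a, Long b) {
            return a + b;
        }
    }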

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Kostas Kloudas
Hi Averell, There is no such “out-of-the-box” solution, but there is an open PR for adding S3 support to the StreamingFileSink [1]. Cheers, Kostas [1] https://github.com/apache/flink/pull/6795 > On Oct 5, 2018, at 11:14 AM, Averell wrote: > > Hi K

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Averell
Hi Kostas, Thanks for the info. Just one more question regarding writing Parquet. I need to write my stream as Parquet to S3. As per this ticket https://issues.apache.org/jira/browse/FLINK-9752 , it is currently not supported. Is there any ready-to-us

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

2018-10-05 Thread Till Rohrmann
Thanks Aljoscha for starting this discussion. The described problem does indeed put us in a bit of a pickle. Even with option 1) I think it is somewhat API breaking, because everyone who used lambdas without types needs to add them now. Consequently, I only see two real options out of the ones you've p

org.apache.flink.runtime.rpc.exceptions.FencingTokenException:

2018-10-05 Thread Samir Tusharbhai Chauhan
Hi, I am having an issue setting up a cluster for Flink. I have 2 nodes for the Job Manager and 2 nodes for the Task Manager. My configuration file looks like this: jobmanager.rpc.port: 6123 jobmanager.heap.size: 2048m taskmanager.heap.size: 2048m taskmanager.numberOfTaskSlots: 64 parallelism

Re: Streaming to Parquet Files in HDFS

2018-10-05 Thread Kostas Kloudas
Hi Averell, You are right that for bulk formats like Parquet, we roll on every checkpoint. This is currently a limitation that has to do with the fact that bulk formats gather and rely on metadata that they keep internally and which we cannot checkpoint in Flink, as they do not expose it. Set
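
A rough Java sketch of the bulk-format path being described, assuming Flink's StreamingFileSink together with ParquetAvroWriters from flink-parquet (the event POJO and output path are placeholders); because Parquet is a bulk format, the sink finalizes its part files on every checkpoint, as explained above:

    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    public class ParquetBulkSinkSketch {

        // Simple POJO written via Avro reflection; field names are illustrative.
        public static class Event {
            public String id;
            public long timestamp;
            public Event() {}
            public Event(String id, long timestamp) { this.id = id; this.timestamp = timestamp; }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Checkpointing is required: each checkpoint rolls and finalizes the
            // in-progress Parquet part files (the limitation discussed above).
            env.enableCheckpointing(60_000);

            StreamingFileSink<Event> sink = StreamingFileSink
                .forBulkFormat(new Path("hdfs:///data/events"),
                               ParquetAvroWriters.forReflectRecord(Event.class))
                .build();

            env.fromElements(new Event("a", 1L), new Event("b", 2L))
               .addSink(sink);

            env.execute("parquet bulk sink sketch");
        }
    }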