Re: Reading Data from zip/gzip

2018-10-22 Thread Amit Jain
Hi Chris, FileInputFormat automatically takes cares of file decompression for the files with gzip, xz, bz2 and deflate extensions. -- Thanks, Amit Source: https://github.com/apache/flink/blob/7b040b915504e59243c642b1f4a84c956d96d134/flink-core/src/main/java/org/apache/flink/api/common/io/FileInp

Re: Why am I getting AWS access denied error for request type [DeleteObjectRequest] in S3?

2018-10-15 Thread Amit Jain
Hi Harshith, Did you enable delete permission on S3 for running machines? Are you using IAM roles or access key id and secret access key combo? -- Thanks, Amit On Mon, Oct 15, 2018 at 3:15 PM Kumar Bolar, Harshith wrote: > Hi all, > > > > We store Flink checkpoints in Amazon S3. Flink periodic

Re: Flink1.6 Confirm that flink supports the plan function?

2018-10-15 Thread Amit Jain
Hi, 2) You may also want to look into ParameterTool[1] class to parse and read passed properties file [2]. -- Thanks, Amit [1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/utils/ParameterTool.html [2] https://ci.apache.org/projects/flink/flink-docs-re

Re: Flink streaming-job with redis sink: SocketTimeoutException

2018-10-14 Thread Amit Jain
Hi Marke, Stacktrace suggests it is more of a Redis connection issue rather than something with Flink. Could you share JedisPool configuration of Redis sink? Are you writing into Redis in continuity or some bulk logic? Looks like Redis connections are getting timeout here. -- Thanks, Amit On Mon

Re: DatabaseClient at async example

2018-10-03 Thread Amit Jain
Hi Nicos, DatabaseClient is an example class to describe the asyncio concept. There is no interface/class for this client in Flink codebase. You can use any mariaDB client implementation which supports concurrent request to DB. -- Cheers, Amit On Wed, Oct 3, 2018 at 8:14 PM Nicos Maris wrote:

Re: BucketingSink to S3: Missing class com/amazonaws/AmazonClientException

2018-10-03 Thread Amit Jain
Hi Julio, What's the Flink version for this setup? -- Thanks, Amit On Wed, Oct 3, 2018 at 4:22 PM Andrey Zagrebin wrote: > Hi Julio, > > Looks like some problem with dependencies. > Have you followed the recommended s3 configuration guide [1]? > Is it correct that your job already created chec

Re: Exception while submitting jobs through Yarn

2018-06-18 Thread Amit Jain
Hi Gravit, I think Till is interested to know about classpath details present at the start of JM and TM logs e.g. following logs provide classpath details used by TM in our case. 2018-06-17 19:01:30,656 INFO org.apache.flink.yarn.YarnTaskExecutorRunner -

Re: Implementing a “join” between a DataStream and a “set of rules”

2018-06-05 Thread Amit Jain
Hi Sandybayev, In the current state, Flink does not provide a solution to the mentioned use case. However, there is open FLIP[1] [2] which has been created to address the same. I can see in your current approach, you are not able to update the rule set data. I think you can update rule set data b

Re: Flink 1.5, failed to instantiate S3 FS

2018-06-02 Thread Amit Jain
Hi Hao, Have look over https://issues.apache.org/jira/browse/HADOOP-13811?focusedCommentId=15703276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15703276 What version of Hadoop are you using? Could you provide classpath used by Flink Job Manager, it is present in

Re: Batch job stuck in Canceled state in Flink 1.5

2018-05-29 Thread Amit Jain
Thanks Till. `taskmanager.network.request-backoff.max` option helped in my case. We tried this on 1.5.0 and jobs are running fine. -- Thanks Amit On Thu 24 May, 2018, 4:58 PM Amit Jain, wrote: > Thanks! Till. I'll give a try on your suggestions and update the thread. > > On Wed

Re: Batch job stuck in Canceled state in Flink 1.5

2018-05-24 Thread Amit Jain
from the BlobServer, Flink will first try to download them from the > `high-availability-storageDir` > > Let me know if this solves your problem. > > Cheers, > Till > > On Tue, May 22, 2018 at 1:29 PM, Amit Jain wrote: >> >> Hi Nico, >> >> Please find the

Re: Strange Behaviour with task manager oom ?

2018-05-21 Thread Amit Jain
Hi, Could you share log of job and impacted task manager? How much memory you have allocated to the Job Manager? -- Thanks, Amit On Mon, May 21, 2018 at 8:46 PM, sohimankotia wrote: > Hi, > > I am running flink batch job . > > My job is running fine if i use 4 task manger and 8 slots = 32 parall

Re: Checkpointing when reading from files?

2018-05-21 Thread Amit Jain
Hi Alex, StreamingExecutionEnvironment#readFile is a helper function to create file reader data streaming source. It uses ContinuousFileReaderOperator and ContinuousFileMonitoringFunction internally. As both file reader operator and monitoring function uses checkpointing so is readFile [1], you c

Re: Message guarantees with S3 Sink

2018-05-21 Thread Amit Jain
0.n4.nabble.com/sink-with-BucketingSink-to-S3-files-override-td18433.html > [2] https://issues.apache.org/jira/browse/FLINK-6306 > > On Thu, May 17, 2018 at 11:57 AM, Amit Jain wrote: >> >> Hi, >> >> We are using Flink to process click stream data from Kafka and pus

Re: Message guarantees with S3 Sink

2018-05-17 Thread Amit Jain
nce semantics"[2] > > Thanks, > Rong > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/filesystem_sink.html > [2] > https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.h

Message guarantees with S3 Sink

2018-05-17 Thread Amit Jain
Hi, We are using Flink to process click stream data from Kafka and pushing the same in 128MB file in S3. What is the message processing guarantees with S3 sink? In my understanding, S3A client buffers the data on memory/disk. In failure scenario on particular node, TM would not trigger Writer#clo

Re: [Flink 1.5.0] BlobServer data for a job is not getting cleaned up at JM

2018-05-16 Thread Amit Jain
Thanks Chesnay for the quick reply. I raised the ticket https://issues.apache.org/jira/browse/FLINK-9381. On Wed, May 16, 2018 at 5:33 PM, Chesnay Schepler wrote: > Please open a JIRA. > > > On 16.05.2018 13:58, Amit Jain wrote: >> >> Hi, >> >> We are running

[Flink 1.5.0] BlobServer data for a job is not getting cleaned up at JM

2018-05-16 Thread Amit Jain
Hi, We are running Flink 1.5.0 rc3 with YARN as cluster manager and found Job Manager is getting killed due to out of disk error. Upon further analysis, we found blob server data for a job is not getting cleaned up. Right now, we wrote directory cleanup script based on directory creation time of

Re: Question about datasource replication

2018-05-04 Thread Amit Jain
Hi Flavio, Which version of Flink are you using? -- Thanks, Amit On Fri, May 4, 2018 at 6:14 PM, Flavio Pompermaier wrote: > Hi all, > I've a Flink batch job that reads a parquet dataset and then applies 2 > flatMap to it (see pseudocode below). > The problem is that this dataset is quite big a

Re: Batch job stuck in Canceled state in Flink 1.5

2018-05-03 Thread Amit Jain
(maybe we should not print it so prominently to not confuse users). There > must be something else happening on the JobManager. Can you share the JM > logs as well? > > Thanks a lot, > Stephan > > > On Wed, May 2, 2018 at 12:21 PM, Amit Jain wrote: >> >> Thanks! Fabian &

Re: Batch job stuck in Canceled state in Flink 1.5

2018-05-02 Thread Amit Jain
ter your commit. > > Do you have a chance to build the current release-1.5 branch and check if > the fix also resolves your problem? > > Otherwise it would be great if you could open a blocker issue for the 1.5 > release to ensure that this is fixed. > > Thanks, > Fabia

Re: Batch job stuck in Canceled state in Flink 1.5

2018-04-29 Thread Amit Jain
Cluster is running on commit 2af481a On Sun, Apr 29, 2018 at 9:59 PM, Amit Jain wrote: > Hi, > > We are running numbers of batch jobs in Flink 1.5 cluster and few of those > are getting stuck at random. These jobs having the following failure after > which operator status changes

Batch job stuck in Canceled state in Flink 1.5

2018-04-29 Thread Amit Jain
Hi, We are running numbers of batch jobs in Flink 1.5 cluster and few of those are getting stuck at random. These jobs having the following failure after which operator status changes to CANCELED and stuck to same. Please find complete TM's log at https://gist.github.com/imamitjain/066d0e0ee2

Re: Testing on Flink 1.5

2018-04-20 Thread Amit Jain
Hi Gary, This setting has resolved the issue. Does it increase timeout for all the RPC or specific components? We had following settings in Flink 1.3.2 and they did the job for us. akka.watch.heartbeat.pause: 600 s akka.client.timeout: 5 min akka.ask.timeout: 120 s -- Thanks, Amit

Re: Testing on Flink 1.5

2018-04-19 Thread Amit Jain
Hi Gary, We found the underlying issue with the following problem. Few of our jobs are stuck with logs [1], these jobs are only able to allocate JM and couldn't get any TM, however, there are ample resource on our cluster. We are running ETL merge job here. In this job, we first find new deltas a

Re: KeyedSream question

2018-04-04 Thread Amit Jain
Hi, KeyBy operation partition the data on given key and make sure same slot will get all future data belonging to same key. In default implementation, it can also map subset of keys in your DataStream to same slot. Assuming you have number of keys equal to number running slot then you may specif

Re: TaskManager deadlock on NetworkBufferPool

2018-04-04 Thread Amit Jain
+user@flink.apache.org On Wed, Apr 4, 2018 at 11:33 AM, Amit Jain wrote: > Hi, > > We are hitting TaskManager deadlock on NetworkBufferPool bug in Flink 1.3.2. > We have set of ETL's merge jobs for a number of tables and stuck with above > issue randomly daily. > > I&#