Re: Split one dataset into multiple

2018-11-05 Thread vino yang
Hi madan, I think you need to hash partition your records. Flink supports hash partitioning of data; the operator is keyBy. If the values of your tag field are enumerable, you can also use split/select to achieve your purpose. Thanks, vino. madan wrote on Mon, Nov 5, 2018 at 6:37 PM: > Hi, > > I have a custom
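The idea behind keyBy can be illustrated outside Flink. A minimal plain-Java sketch (no Flink dependencies; the class and method names are invented for illustration): a record's key is hashed and mapped to one of N parallel channels, so all records sharing a key reach the same downstream instance:

```java
import java.util.List;

public class HashPartitionSketch {

    // Maps a key to one of `parallelism` channels, as hash partitioning does.
    // Records sharing a key always land on the same channel.
    static int channelFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        for (String tag : List.of("Department", "Employee", "Address")) {
            System.out.println(tag + " -> channel " + channelFor(tag, parallelism));
        }
    }
}
```

Math.floorMod keeps the result non-negative even for negative hash codes, which a plain `%` would not.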

Never gets into ProcessWindowFunction.process()

2018-11-05 Thread Vijay Balakrishnan
Hi, running in the IntelliJ IDE on a Mac with 4 vProcessors. The code compiles fine, but it never gets into Window5SecProcessing's process(). I am able to get data from the Kinesis Consumer and it is deserialized properly when I debug the code. It gets into the Window5SecProcessing.open() method for initial

Understanding checkpoint behavior

2018-11-05 Thread PranjalChauhan
Hi, I am a new Flink user, currently using Flink 1.2.1. I am trying to understand how checkpoints actually work while the window operator is processing events. My pipeline has the following flow, where each operator's parallelism is 1: source -> flatmap -> tumbling window -> sink. In this pipel

Re: Flink Kafka to BucketingSink to S3 - S3Exception

2018-11-05 Thread Ravi Bhushan Ratnakar
Hi there, some questions: 1. Is this using Flink 1.6.2 with dependencies (flink-s3-fs-hadoop, flink-statebackend-rocksdb, hadoop-common, hadoop-aws, hadoop-hdfs, hadoop-common) ? If so, could you please share your dependency versioning? [Ravi]- I am using Aws Emr 5.18 which supports Fli

Re: Flink Kafka to BucketingSink to S3 - S3Exception

2018-11-05 Thread Addison Higham
Hi there, This is going to be a bit of a long post, but I think there has been a lot of confusion around S3, so I am going to go over everything I know in hopes that helps. As mentioned by Rafi, The BucketingSink does not work for file systems like S3, as the bucketing sink makes some assumptions

Re: Non deterministic result with Table API SQL

2018-11-05 Thread Fabian Hueske
Thanks Flavio for reporting the error and helping to debug it. A job to reproduce the error is very valuable :-) Best, Fabian. On Mon., Nov 5, 2018 at 14:38, Flavio Pompermaier < pomperma...@okkam.it> wrote: > Here is the JIRA ticket and, attached to it, the Flink (Java) job to > reproduce th

Re: Flink 1.6 job cluster mode, job_id is always 00000000000000000000000000000000

2018-11-05 Thread Hao Sun
Thanks all. On Mon, Nov 5, 2018 at 2:05 AM Ufuk Celebi wrote: > On Sun, Nov 4, 2018 at 10:34 PM Hao Sun wrote: > > Thanks, that also works. To avoid the same issue with ZooKeeper, I assume I > have to do the same trick? > > Yes, exactly. The following configuration [1] entry takes care of this: > >

Re: Non deterministic result with Table API SQL

2018-11-05 Thread Flavio Pompermaier
Here is the JIRA ticket and, attached to it, the Flink (Java) job to reproduce the error: https://issues.apache.org/jira/browse/FLINK-10795 On Wed, Oct 31, 2018 at 4:46 PM Timo Walther wrote: > As far as I know STDDEV_POP is translated into basic aggregate functions > (SUM/AVG/COUNT). But if

Re: Questions about Savepoints

2018-11-05 Thread Yun Tang
Hi Ning, you have asked several questions; I'll try to answer some of them: - In my job, it takes over 30 minutes to take a savepoint of over 100 GB on 3 TMs. Most of the time is spent after the alignment; I assume it was serialization and uploading to S3. However, when I resume a new job from the save

Split one dataset into multiple

2018-11-05 Thread madan
Hi, I have a custom iterator which gives data for multiple entities. For example, the iterator gives data for Department, Employee and Address. A record's entity type is identified by a field value, and I need to apply a different set of operations on each dataset. E.g., Department data may have aggregations
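The routing madan describes — one mixed stream, different handling per entity type — can be sketched in plain Java (no Flink; the Row class and its tag field are invented for illustration). This mirrors what split/select, or filtering the stream once per tag, achieves:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitByTagSketch {

    static class Row {
        final String entityType; // the tag field identifying the entity
        final String payload;
        Row(String entityType, String payload) {
            this.entityType = entityType;
            this.payload = payload;
        }
    }

    // Buckets a mixed record stream by its tag field so each
    // entity type can get its own set of operations afterwards.
    static Map<String, List<Row>> splitByTag(Iterable<Row> rows) {
        Map<String, List<Row>> byTag = new HashMap<>();
        for (Row r : rows) {
            byTag.computeIfAbsent(r.entityType, t -> new ArrayList<>()).add(r);
        }
        return byTag;
    }

    public static void main(String[] args) {
        List<Row> mixed = List.of(
                new Row("Department", "d1"),
                new Row("Employee", "e1"),
                new Row("Address", "a1"),
                new Row("Employee", "e2"));
        splitByTag(mixed).forEach((tag, rs) ->
                System.out.println(tag + ": " + rs.size() + " record(s)"));
    }
}
```

In Flink itself, each resulting bucket would correspond to one selected side stream on which the per-entity aggregations run.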

Re: Flink 1.6 job cluster mode, job_id is always 00000000000000000000000000000000

2018-11-05 Thread Ufuk Celebi
On Sun, Nov 4, 2018 at 10:34 PM Hao Sun wrote: > Thanks, that also works. To avoid the same issue with ZooKeeper, I assume I have > to do the same trick? Yes, exactly. The following configuration [1] entry takes care of this: high-availability.cluster-id: application-1 This will result in ZooKeeper
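The entry Ufuk refers to, shown as a flink-conf.yaml fragment (the quorum address is a placeholder, and "application-1" is only an example — pick a unique id per job cluster):

```yaml
# flink-conf.yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-host:2181
# Unique per job cluster, so HA state in ZooKeeper is not shared between jobs:
high-availability.cluster-id: application-1
```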

Always trigger calculation of a tumble window in Flink SQL

2018-11-05 Thread yinhua.dai
We have a requirement to always trigger a calculation on a timer basis, e.g. every 1 minute. *If records come into Flink during the time window, then calculate in the normal way, i.e. aggregate each record and getResult() at the end of the time window.* *If there are no recor
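The desired behavior can be simulated without Flink. A plain-Java sketch (the window bucketing and the repeat-last-result rule are assumptions about the requirement, not Flink API): every window fires, and an empty window re-emits the last computed result instead of staying silent:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AlwaysFireWindowSketch {

    // Fires once per window; an empty window repeats the previous result.
    // Returns one emitted line per window.
    static List<String> fireAllWindows(Map<Integer, List<Integer>> recordsByWindow,
                                       int windowCount) {
        Integer lastResult = null;
        List<String> emitted = new ArrayList<>();
        for (int w = 0; w < windowCount; w++) {
            List<Integer> records = recordsByWindow.getOrDefault(w, List.of());
            if (!records.isEmpty()) {
                int sum = 0;
                for (int r : records) sum += r; // the "normal" aggregation
                lastResult = sum;
            }
            emitted.add("window " + w + " -> " + lastResult);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Windows 1 and 3 receive no records.
        Map<Integer, List<Integer>> windows = Map.of(0, List.of(3, 4), 2, List.of(10));
        fireAllWindows(windows, 4).forEach(System.out::println);
    }
}
```

In Flink this would typically be done with a custom trigger that fires on a processing-time timer rather than only on elements.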

"org.apache.flink.client.program.ProgramInvocationException: Unknown I/O error while extracting contained jar files

2018-11-05 Thread wangziyu
Hi, I use monitor Restful api ,“/jars/{jars}/run” to test my environment.The exception is happend. I did exactly that: 1.I use “/jars/upload” to upload my jar. 2.I wanted to test my jar. That is all. How can I solve this exception. -- Sent from: http://apache-flink-user-mailing-list-ar