Re: Move savepoint to another s3 bucket

2022-03-08 Thread Dawid Wysakowicz
Hi Lukas, I am afraid you're hitting this bug: https://issues.apache.org/jira/browse/FLINK-25952 Best, Dawid On 08/03/2022 16:37, Lukáš Drbal wrote: Hello everyone, I'm trying to move savepoint to another s3 account but restore always failed with some weird 404 error. We are using lyft

Problem about adding custom kryo serializer

2022-03-08 Thread guoliubin85
Hi, I have an entity class built by Google Flatbuf, to raise the performance, I have tried written a serializer class. public class TransactionSerializer extends Serializer { @Override public void write(Kryo kryo, Output output, Transaction transaction) { ByteBuffer byteB

Re: Flatmap operator in an Asynchronous call

2022-03-08 Thread Diwakar Jha
Thanks Gen, I will look into customized Source and SpiltEnumerator. On Mon, Mar 7, 2022 at 10:20 PM Gen Luo wrote: > Hi Diwakar, > > An asynchronous flatmap function without the support of the framework can > be problematic. You should not call collector.collect outside the main > thread of the

Re: Left join query not clearing state after migrating from 1.9.0 to 1.14.3

2022-03-08 Thread Prakhar Mathur
Hi Roman, Thanks for the reply, here is the screenshot of the latest failed checkpoint. [image: Screenshot 2022-03-09 at 11.44.46 AM.png] I couldn't find the details for the last successful one as we only store the last 10 checkpoints' details. Also, can you give us an idea of exactly what detai

Re: Controlling group partitioning with DataStream

2022-03-08 Thread Guowei Ma
Hi, Ken If you are talking about the Batch scene, there may be another idea that the engine automatically and evenly distributes the amount of data to be processed by each Stage to each worker node. This also means that, in some cases, the user does not need to manually define a Partitioner. At p

Re: Controlling group partitioning with DataStream

2022-03-08 Thread Ken Krugler
Hi Dario, Just to close the loop on this, I answered my own question on SO. Unfortunately it seems like the recommended solution is to do the same hack I did a while ago, which is to generate (via trial-and-error) a key that gets assigned to the target slot. I was hoping for something a bit mo

replay kinesis events

2022-03-08 Thread Guoqin Zheng
Hi Flink experts, Wondering if there is a built-in way to replay already-processed events in an event queue. For example, if I have a flink app processing event stream from Kinesis. Now if I find a bug in the flink app and make a fix. And I would like to re-process events that are already processe

Re: Question about Flink counters

2022-03-08 Thread Shane Bishop
Hi, My issue has been resolved through discussion with AWS support. It turns out that Kinesis Data Analytics reports to CloudWatch in a way I did not expect. The way to view the accurate values for Flink counters is with Average in CloudWatch metrics. Below is the response from AWS support, fo

Re: PyFlink : submission via rest

2022-03-08 Thread aryan m
Thanks Dian! That worked ! On Sun, Mar 6, 2022 at 10:47 PM Dian Fu wrote: > The dedicated REST API is still not supported. However, you could try to > use PythonDriver just like you said and just submit it like a Java Flink > job. > > Regards, > Dian > > On Sun, Mar 6, 2022 at 3:38 AM aryan m w

Re: [statefun] hadoop dependencies and StatefulFunctionsConfigValidator

2022-03-08 Thread Filip Karnicki
Hi Roman, Igal (@ below) Thank you for your answer. I don't think I'll have access to flink's lib folder given it's a shared Cloudera cluster. The only thing I could think of is to not include com.google.protobuf in the classloader.parent-first-patterns.additional setting, and including protobuf-j

Move savepoint to another s3 bucket

2022-03-08 Thread Lukáš Drbal
Hello everyone, I'm trying to move savepoint to another s3 account but restore always failed with some weird 404 error. We are using lyft k8s operator [1] and flink 1.13.6 (in stacktrace you can see version 1.13.6-396a8d44-szn which is just internal build from flink commit b2ca390d478aa855eb0f202

Re: MapState.entries()

2022-03-08 Thread Alexey Trenikhun
Thank you ! From: Schwalbe Matthias Sent: Monday, March 7, 2022 11:36:22 PM To: Alexey Trenikhun ; Flink User Mail List Subject: RE: MapState.entries() Hi Alexey, To my best knowledge it’s lazy with RocksDBStateBackend, using the Java iterator you could eve

RE: Incremental checkpointing & RocksDB Serialization

2022-03-08 Thread Schwalbe Matthias
Hi Vidya, As to the choice of serializer: * Flink provides two implementations that support state migration, AVRO serializer, and Pojo serializer * Pojo serializer happens to be one of the fastest available serializers (faster than AVRO) * If your record sticks to Pojo coding rules

Re: Flink Checkpoint Timeout

2022-03-08 Thread Mahantesh Patil
I see for every consequential checkpoint timeout fail , number of tasks which completed checkpointing keeps decreasing, why would that happen? Does flink try to process data beyond old checkpoint barrier which failed to complete due to timeout? On Tue, Mar 8, 2022 at 12:48 AM yidan zhao wrote:

Re: [statefun] hadoop dependencies and StatefulFunctionsConfigValidator

2022-03-08 Thread Roman Khachatryan
Hi Filip, Have you tried putting protobuf-java 3.7.1 into the Flink's lib/ folder? Or maybe re-writing the dependencies you mentioned to be loaded as plugins? [1] I don't see any other ways to solve this problem. Probably Chesnay or Seth will suggest a better solution. [1] https://nightlies.apac

Savepoint API challenged with large savepoints

2022-03-08 Thread Schwalbe Matthias
Dear Flink Team, In the last weeks I was faced with a large savepoint (around 40GiB) that contained lots of obsolete data points and overwhelmed our infrastructure (i.e. failed to load/restart). We could not afford to lose the state, hence I spent the time to transcode the savepoint into someth

Re: Left join query not clearing state after migrating from 1.9.0 to 1.14.3

2022-03-08 Thread Roman Khachatryan
Hi Prakhar, Could you please share the statistics about the last successful and failed checkpoints, e.g. from the UI. Ideally, with detailed breakdown for the operators that seems problematic. Regards, Roman On Fri, Mar 4, 2022 at 8:48 AM Prakhar Mathur wrote: > > Hi, > > Can someone kindly hel

Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-08 Thread Jingsong Li
Thanks all for your discussions. I'll share my opinion here: 1. Hive SQL and Hive-like SQL are the absolute mainstay of current Batch ETL in China. Hive+Spark (HiveSQL-like)+Databricks also occupies a large market worldwide. - Unlike OLAP SQL (such as presto, which is ansi-sql rather than hive s

Re: Flink Checkpoint Timeout

2022-03-08 Thread yidan zhao
If the checkpoint timeout leads to the job's fail, then the job will be recovered and data will be reprocessed from the last completed checkpoint. If the job doesn't fail, then not. Mahantesh Patil 于2022年3月8日周二 14:47写道: > Hello Team, > > What happens after checkpoint timeout? > > Does Flink repr

Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-08 Thread Zou Dan
Hi Martijn, Thanks for bringing this up. Hive SQL (using in Hive & Spark) plays an important role in batch processing, it has almost become de facto standard in batch processing. In our company, there are hundreds of thousands of spark jobs each day. IMO, if we want to promote Flink batch, Hive s

Re: Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-08 Thread Jark Wu
Hi Martijn, Thanks for starting this discussion. I think it's great for the community to to reach a consensus on the roadmap of Hive query syntax. I agree that the Hive project is not actively developed nowadays. However, Hive still occupies the majority of the batch market and the Hive ecosystem