Re: Question about processing a 3-level List data type in parquet

2020-11-04 Thread Naehee Kim
Hi Jingsong, Thanks for the feedback. Can you let me know the concept and timeline of BulkFormat/ParquetBulkFormat and how it differs from ParquetInputFormat? Our use case is backfill: processing Parquet files in case any data issue is found in the normal processing of Kafka input. Thus, we

Re: I have some interesting result with my test code

2020-11-04 Thread Jark Wu
Hi Kevin, Could you share the code showing how you register the FlinkKafkaConsumer as a table? Regarding your initialization of FlinkKafkaConsumer, I would recommend calling setStartFromEarliest() to guarantee it consumes all the records in the partitions. Regarding the flush(), it seems it is in the foreach

Re: Dependency injection and flink.

2020-11-04 Thread Dan Diephouse
Just want to chime in here that it would be fantastic to have a way to DI in Flink. Ideally the injected services themselves don't get serialized at all since they're just singletons in our case. E.g. we have an API client that looks up data from our API and caches it for all the functions that nee

Re: A question about flink sql retreact stream

2020-11-04 Thread Jark Wu
Thanks Henry for the detailed example. I will explain why there are so many records at time 5. That is because the retraction mechanism is triggered per record in Flink SQL, so there is record amplification in your case. At time 5, the LAST_VALUE aggregation for stream a will first emit -(1, 12345, 0) and t
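The per-record retraction described in this reply can be illustrated with a toy model. This is a plain-Python sketch, not Flink code: the function name `last_value_changelog` and the `(sign, key, value)` tuple layout are hypothetical, chosen only to mirror the `-(...)` notation used in the thread. It shows why one update to an already-seen key produces two output messages (a retraction of the old value, then the new value), which is the source of the amplification.

```python
from typing import Hashable, Iterable, List, Tuple

# (sign, key, value): "+" adds a row to the result, "-" retracts a row.
Change = Tuple[str, Hashable, object]


def last_value_changelog(records: Iterable[Tuple[Hashable, object]]) -> List[Change]:
    """Toy LAST_VALUE-per-key aggregate that emits a changelog.

    For every input record, the previous value for that key (if any) is
    retracted before the new value is emitted.
    """
    last = {}
    out: List[Change] = []
    for key, value in records:
        if key in last:
            # The key was seen before: withdraw the old result first.
            out.append(("-", key, last[key]))
        last[key] = value
        out.append(("+", key, value))
    return out


if __name__ == "__main__":
    # Two updates to the same key yield three changelog messages.
    for change in last_value_changelog([(1, 0), (1, 1)]):
        print(change)
```

Any downstream operator that reacts to each changelog message individually, such as a join or a second aggregation, would in this model do work for both the retraction and the new value, so the number of emitted records grows with each stage.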

Re: Multi-stream SQL-like processing

2020-11-04 Thread Jark Wu
Yes. The dynamism might be a problem. Does Kafka Connect support discovering new tables and synchronizing them dynamically? Best, Jark On Thu, 5 Nov 2020 at 04:39, Krzysztof Zarzycki wrote: > Hi Jark, thanks for joining the discussion! > I understand your point of view that SQL environment is p

union stream vs multiple operators

2020-11-04 Thread Alexey Trenikhun
Hello, I have two Kafka topics ("A" and "B") which provide structurally similar data but with different load patterns, for example hundreds of records per second in the first topic versus 10 records per second in the second. Events are processed using the same algorithm and output to a common sink, currently

Re: Questions on Table to Datastream conversion and onTimer method in KeyedProcessFunction

2020-11-04 Thread Fuyao Li
Hi Flink Users and Community, For the first part of the question, the 12-hour time difference was caused by a time extraction bug of my own. I can get the time translated correctly now. The type cast problem does have some workarounds. My major blocker right now is that the onTimer part is no

Re: Multi-stream SQL-like processing

2020-11-04 Thread Krzysztof Zarzycki
Hi Jark, thanks for joining the discussion! I understand your point of view that the SQL environment is probably not the best for what I was looking to achieve. The idea of a configuration tool sounds almost perfect :) Almost, because: Without the "StatementSet" that you mentioned at the end I would b

Re: Error while retrieving the leader gateway after making Flink config changes

2020-11-04 Thread Claude M
This issue had to do with the update strategy for the Flink deployment. When I changed it to the following, it worked:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
On Tue, Nov 3, 2020 at 1:39 PM Robert Metzger wrote: > Thanks a lot for prov

Re: LEGACY('STRUCTURED_TYPE' to pojo

2020-11-04 Thread Rex Fenley
Thank you for the info! Is there a timetable for when the next version with this change might be released? On Wed, Nov 4, 2020 at 2:44 AM Timo Walther wrote: > Hi Rex, > > sorry for the late reply. POJOs will have much better support in the > upcoming Flink versions because they have been fully int

A question about flink sql retreact stream

2020-11-04 Thread Henry Dai
Dear Flink developers & users, I have a question about Flink SQL that is giving me a lot of trouble. Thank you very much for any help. Let's assume we have two data streams, `order` and `order_detail`, which come from the MySQL binlog. Table `order` schema: id int primary key

A question of Flink SQL aggregation

2020-11-04 Thread Henry Dai
Hi, Let's assume we have two streams, the order stream and the order detail stream, which come from the MySQL binlog. Table `order` schema: id primary key, order_id and order_status. Table `order_detail` schema: id primary key, order_id and quantity. One order item has several order_detail items. If we have follow

a couple of memory questions

2020-11-04 Thread Colletta, Edward
Using Flink 1.9.2 with FsStateBackend, Session cluster. 1. Does heap state get cleaned up when a job is cancelled? We have jobs that we run on a daily basis. We start each morning and cancel each evening. We noticed that the process size does not seem to shrink. We are looking at the res

Re: Increase in parallelism has very bad impact on performance

2020-11-04 Thread Arvid Heise
Hi Sidney, could you describe how your aggregation works and what your current pipeline looks like? Is the aggregation partially applied before shuffling the data? I'm a bit lost on what aggregation without keyBy looks like. A decrease in throughput may also be a result of more overhead and less av

Re: Increase in parallelism has very bad impact on performance

2020-11-04 Thread Sidney Feiner
You're right, this is a scale problem (for me that's performance). As for what you were saying about the data skew, that could be it, so I tried running the job without using keyBy: I wrote an aggregator that accumulates the events per country and then wrote a FlatMap that takes that map and ret
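The approach described in this message (accumulate per country into a map, then flatten the map back into individual records) can be sketched roughly as follows. This is plain Python rather than Flink code, and it is only an illustration under assumptions: the function names and the `"country"` field on each event are hypothetical, since the message does not show the actual aggregator.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Mapping, Tuple


def accumulate_by_country(events: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Accumulate an event count per country without partitioning by key.

    Assumes each event is a mapping with a "country" field (hypothetical).
    """
    counts: Counter = Counter()
    for event in events:
        counts[event["country"]] += 1
    return dict(counts)


def flat_map_counts(counts: Mapping[str, int]) -> Iterator[Tuple[str, int]]:
    """Flatten the accumulated map into one (country, count) record each."""
    for country, n in counts.items():
        yield (country, n)


if __name__ == "__main__":
    events = [{"country": "IL"}, {"country": "US"}, {"country": "IL"}]
    for record in flat_map_counts(accumulate_by_country(events)):
        print(record)
```

In a parallel job, each parallel instance would hold its own partial map in this scheme, so the per-country results from different instances would still need to be merged somewhere downstream.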

Re: Missing help about run-application action in Flink CLI client

2020-11-04 Thread Flavio Pompermaier
Here it is: https://issues.apache.org/jira/browse/FLINK-19969 Best, Flavio On Wed, Nov 4, 2020 at 11:08 AM Kostas Kloudas wrote: > Could you also post the ticket here @Flavio Pompermaier and we will > have a look before the upcoming release. > > Thanks, > Kostas > > On Wed, Nov 4, 2020 at 10:5

Re: LEGACY('STRUCTURED_TYPE' to pojo

2020-11-04 Thread Timo Walther
Hi Rex, sorry for the late reply. POJOs will have much better support in the upcoming Flink versions because they have been fully integrated with the new table type system mentioned in FLIP-37 [1] (e.g. support for immutable POJOs and nested DataTypeHints etc). For queries, scalar, and table

Re: Flink kafka - Message Prioritization

2020-11-04 Thread Aljoscha Krettek
I'm afraid there's nothing in Flink that would make this possible right now. Have you thought about whether this would be possible using the vanilla Kafka Consumer APIs? I'm not sure that it's possible to read messages with prioritization using their APIs. Best, Aljoscha On 04.11.20 08:34, Rob

Re: Missing help about run-application action in Flink CLI client

2020-11-04 Thread Kostas Kloudas
Could you also post the ticket here @Flavio Pompermaier and we will have a look before the upcoming release. Thanks, Kostas On Wed, Nov 4, 2020 at 10:58 AM Chesnay Schepler wrote: > > Good find, this is an oversight in the CliFrontendParser; no help is > printed for the run-application action.

Re: Does Flink operators synchronize states?

2020-11-04 Thread Yuta Morisawa
Hi Arvid, Thank you for your detailed answer. After reading it, I realized that I did not fully understand the difference between the micro-batch model and the continuous (one-by-one) processing model. I am familiar with the micro-batch model but not with the continuous one. So, I will search some d

Re: Missing help about run-application action in Flink CLI client

2020-11-04 Thread Chesnay Schepler
Good find, this is an oversight in the CliFrontendParser; no help is printed for the run-application action. Can you file a ticket? On 11/4/2020 10:53 AM, Flavio Pompermaier wrote: Hello everybody, I was looking into currently supported application-modes when submitting a Flink job so I tried

Missing help about run-application action in Flink CLI client

2020-11-04 Thread Flavio Pompermaier
Hello everybody, I was looking into the currently supported application modes when submitting a Flink job, so I tried to use the CLI help (I'm using Flink 11.0), but I can't find any help about the action "run-application" at the moment... am I wrong? Is there any JIRA to address this missing documentation

Re: Multi-stream SQL-like processing

2020-11-04 Thread Jark Wu
Hi Krzysztof, This is a very interesting idea. I think SQL is not a suitable tool for this use case, because SQL is a structured query language where the table schema is fixed and never changes while the job is running. However, I think it can be a configuration tool project on top of Flink SQL. The