Re: [DISCUSS] Shall casting functions return null or throw exceptions for invalid input

2021-11-18 Thread Kurt Young
Sorry I forgot to add the user ML. I also would like to gather some user feedback on this, since I haven't gotten any feedback on this topic from users before. Best, Kurt On Thu, Nov 18, 2021 at 6:33 PM Kurt Young wrote: > (added user ML to this thread) > > Hi all, > > I w

Re: State migration for sql job

2021-06-08 Thread Kurt Young
What behavior do you expect after you add the "max(a)" aggregation: a. Keep summing a and start calculating max(a) only from the point you added it. In other words, max(a) won't take the historical data into account. b. First process all the historical data to get a result for max(a), and then start to compu

Re: How to recovery from last count when using CUMULATE window after restart flink-sql job?

2021-05-07 Thread Kurt Young
Hi, please use the user mailing list only to discuss these issues. Best, Kurt On Sat, May 8, 2021 at 1:05 PM 1095193...@qq.com <1095193...@qq.com> wrote: > Hi >I have tried the cumulate window function in Flink-1.13 SQL to accumulate > data from Kafka. When I restart a cumulate window sql job, las

Re: Flink 1.13 and CSV (batch) writing

2021-04-11 Thread Kurt Young
t us know. Best, Kurt On Sun, Apr 11, 2021 at 9:18 PM Flavio Pompermaier wrote: > Thanks for the suggestions Kurt. Actually I could use the Table API I think, > it's just that most of our Flink code uses the DataSet API. > > On Sun, Apr 11, 2021 at 1:44 PM Kurt Young wrote: > >

Re: Flink 1.13 and CSV (batch) writing

2021-04-11 Thread Kurt Young
really migrate legacy code. > Also reduceGroup would help but less urgent. > I hope that my feedback as Flink user could be useful. > > Best, > Flavio > > On Fri, Apr 9, 2021 at 12:38 PM Kurt Young wrote: > >> Converting from table to DataStream in batch mode is indeed a

Re: Flink 1.13 and CSV (batch) writing

2021-04-09 Thread Kurt Young
ment.create(streamEnv, envSettings); > but in that case I can't use inBatchMode.. > > On Fri, Apr 9, 2021 at 11:44 AM Kurt Young wrote: > >> `format.ignore-first-line` is unfortunately a regression compared to the >> old one. >> I've created a ticket [1] to

Re: Flink 1.13 and CSV (batch) writing

2021-04-09 Thread Kurt Young
"format.ignore-first-line" but now I can't find >> another way to skip it. >> I could set csv.ignore-parse-errors to true but then I can't detect other >> parsing errors, otherwise I need to manually transform the header into a >> comment adding the # ch

Re: Flink 1.13 and CSV (batch) writing

2021-04-08 Thread Kurt Young
My DDL is:

  CREATE TABLE csv (
    id BIGINT,
    name STRING
  ) WITH (
    'connector' = 'filesystem',
    'path' = '.',
    'format' = 'csv'
  );

Best, Kurt On Fri, Apr 9, 2021 at 10:00 AM Kurt Young wrote: >

Re: Flink 1.13 and CSV (batch) writing

2021-04-08 Thread Kurt Young
Hi Flavio, We would recommend using the new table source & sink interfaces, which have different property keys compared to the old ones, e.g. 'connector' vs. 'connector.type'. You can follow the 1.12 docs [1] to define your CSV table; everything should work just fine. *Flink SQL> set table.dml-

Re: [DISCUSS] Feature freeze date for 1.13

2021-04-06 Thread Kurt Young
morrow morning. >>> >>> This is a critical fix as now predicate pushdown won't work for any >>> stream which generates a watermark and wants to push down predicates. >>> >>> On Thu, Apr 1, 2021, 10:56 Kurt Young wrote: >>> >>>> Thanks D

Re: [DISCUSS] Feature freeze date for 1.13

2021-04-01 Thread Kurt Young
reen, it's separate from other > components, and it's a super useful feature for Flink users. > > Best, > > Arvid > > [1] https://github.com/apache/flink/pull/15054 > > On Thu, Apr 1, 2021 at 6:21 AM Kurt Young wrote: > >> Hi Guowei and Dawid, >>

Re: [DISCUSS] Feature freeze date for 1.13

2021-03-31 Thread Kurt Young
Hi Guowei and Dawid, I want to request permission to merge this feature [1]; it's a useful improvement to the SQL client and won't affect other components too much. We were planning to merge it yesterday but hit a tricky multi-process issue that has a very high chance of hanging the tests. It to

Re: Flink + Hive + Compaction + Parquet?

2021-03-15 Thread Kurt Young
Hi Theo, Regarding your first 2 questions, the answer is yes, Flink supports streaming writes to Hive. Flink also supports automatically compacting small files during streaming writes [1]. (Hive and Filesystem share the same compaction mechanism; we forgot to add a dedicated document for h
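
A minimal sketch of the compaction setup discussed here, assuming the Flink 1.12+ filesystem connector options; schema, path and format are illustrative, not from the thread:

  // Hypothetical example: streaming write with automatic small-file compaction.
  // assumes org.apache.flink.table.api.* imports
  TableEnvironment tEnv = TableEnvironment.create(
      EnvironmentSettings.newInstance().inStreamingMode().build());
  tEnv.executeSql(
      "CREATE TABLE fs_sink (id BIGINT, name STRING) WITH (" +
      "  'connector' = 'filesystem'," +
      "  'path' = '/tmp/output'," +
      "  'format' = 'parquet'," +
      "  'auto-compaction' = 'true'," +        // merge small files on checkpoint
      "  'compaction.file-size' = '128MB')");  // target size after compaction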

Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-02-25 Thread Kurt Young
Hi Timo, First of all I want to thank you for introducing this planner design back in 1.9; this is great work that allowed lots of blink features to be merged into Flink in a reasonably short time. It greatly accelerated the evolution of Table & SQL. Everything comes with a cost, as you sa

Re: [DISCUSS] Correct time-related function behavior in Flink SQL

2021-01-20 Thread Kurt Young
cc'ing this to the user & user-zh mailing lists because this will affect lots of users, and quite a lot of users have been asking questions around this topic. Let me try to understand this from the user's perspective. Your proposal will affect five functions: - PROCTIME() - NOW() - CURREN

Re: In 1.11.2/flinkSql/batch, tableEnv.getConfig.setNullCheck(false) seems to break group by-s

2020-10-16 Thread Kurt Young
Yes, I think this is a bug, feel free to open a jira and a pull request. Best, Kurt On Fri, Oct 16, 2020 at 4:13 PM Jon Alberdi wrote: > Hello to all, > > on flink-1.11.2 the program written at > https://gist.github.com/yetanotherion/d007fa113d97411226eaea4f20cd4c2d > > creates unexpected sta

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Kurt Young
Congratulations Dian! Best, Kurt On Thu, Aug 27, 2020 at 7:28 PM Rui Li wrote: > Congratulations Dian! > > On Thu, Aug 27, 2020 at 5:39 PM Yuan Mei wrote: > >> Congrats! >> >> On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang wrote: >> >>> Congratulations Dian! >>> >>> Best, >>> Xingbo >>> >>> ji

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-16 Thread Kurt Young
Hi Kostas, Thanks for starting this discussion. The first part of this FLIP: "Batch vs Streaming Scheduling" looks reasonable to me. However, there is another dimension I think we should also take into consideration, which is whether checkpointing is enabled. This option is orthogonal (but not fu

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Kurt Young
+1, looking forward to the follow-up FLIPs. Best, Kurt On Thu, Jul 30, 2020 at 6:40 PM Arvid Heise wrote: > +1 of getting rid of the DataSet API. Is DataStream#iterate already > superseding DataSet iterations or would that also need to be accounted for? > > In general, all surviving APIs shoul

Re: Print table content in Flink 1.11

2020-07-15 Thread Kurt Young
Hi Flavio, In 1.11 we have provided an easier way to print table content: after you get the `table` object, all you need to do is call `table.execute().print();` Best, Kurt On Thu, Jul 16, 2020 at 9:35 AM Leonard Xu wrote: > Hi, Flavio > > > On Jul 16, 2020, at 00:19, Flavio Pompermaier wrote: > > f
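
A self-contained sketch of the 1.11 approach; the datagen source is just for illustration:

  // assumes org.apache.flink.table.api.* imports
  TableEnvironment tEnv = TableEnvironment.create(
      EnvironmentSettings.newInstance().inBatchMode().build());
  tEnv.executeSql(
      "CREATE TABLE src (id BIGINT, name STRING) WITH (" +
      "  'connector' = 'datagen', 'number-of-rows' = '5')");
  tEnv.sqlQuery("SELECT id, name FROM src")
      .execute()
      .print();  // collects the result to the client and prints it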

Re: What's the best practice to determine whether a job has finished or not?

2020-05-08 Thread Kurt Young
+dev Best, Kurt On Fri, May 8, 2020 at 3:35 PM Caizhi Weng wrote: > Hi Jeff, > > Thanks for the response. However I'm using executeAsync so that I can run > the job asynchronously and get a JobClient to monitor the job. JobListener > only works for synchronous execute method. Is there other w

Re: table.show() in Flink

2020-05-05 Thread Kurt Young
A more straightforward way after FLIP-84 would be: TableResult result = tEnv.executeSql("select xxx ..."); result.print(); And if you are using 1.10 now, you can use TableUtils#collectToList(table) to collect the result to a list, and then print rows by yourself. Best, Kurt On Tue, May 5, 2020

Re: [DISCUSS] Hierarchies in ConfigOption

2020-04-29 Thread Kurt Young
IIUC FLIP-122 already delegates the responsibility for designing and parsing connector properties to connector developers. So frankly speaking, no matter which style we choose, there is no strong guarantee for either of these. So it's also possible that developers can choose a totally different way

Re: batch range sort support

2020-04-23 Thread Kurt Young
Hi Benchao, you can create a jira issue to track this. Best, Kurt On Thu, Apr 23, 2020 at 2:27 PM Benchao Li wrote: > Hi Jingsong, > > Thanks for your quick response. I've CC'ed Chongchen who understands the > scenario much better. > > > Jingsong Li wrote on Thu, Apr 23, 2020 at 12:34 PM: >> Hi, Benchao,

Re: Blink SQL java.lang.ArrayIndexOutOfBoundsException

2020-04-20 Thread Kurt Young
already fixed in 1.9.2 and 1.10.0. > Could you try it using 1.9.2? > > Best, > Jark > > On Mon, 20 Apr 2020 at 21:00, Kurt Young <[hidden email]> wrote: > >> Can you reproduce this in a local program with mini-cluster? >> >> Best, >> Kurt >&

Re: Joining table with row attribute against an enrichment table

2020-04-20 Thread Kurt Young
or not it is pretty well defined. > Maybe there is some fundamental concept that I don't understand; I don't > have much experience with this, to be fair. > > Gyula > > On Mon, Apr 20, 2020 at 4:03 PM Kurt Young wrote: > >> The reason here is Flink doesn't know the

Re: Joining table with row attribute against an enrichment table

2020-04-20 Thread Kurt Young
The reason is that Flink doesn't know the Hive table is static. After you create these two tables and try to join them, Flink will assume both tables will be changing with time. Best, Kurt On Mon, Apr 20, 2020 at 9:48 PM Gyula Fóra wrote: > Hi! > > The problem here is that I don't have a temp

Re: Blink SQL java.lang.ArrayIndexOutOfBoundsException

2020-04-20 Thread Kurt Young
Can you reproduce this in a local program with mini-cluster? Best, Kurt On Mon, Apr 20, 2020 at 8:07 PM Zahid Rahman wrote: > You can read this for this type error. > > > https://stackoverflow.com/questions/28189446/i-always-get-this-error-exception-in-thread-main-java-lang-arrayindexoutofbou#

[DISCUSS] Change default planner to blink planner in 1.11

2020-03-31 Thread Kurt Young
Hi Dev and User, The blink planner for Table API & SQL was introduced in Flink 1.9 and is already the default planner for the SQL client in Flink 1.10. And since we already decided not to introduce any new features to the original Flink planner, it already lacks so many great features that the community
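
For context, selecting the blink planner explicitly looks like this; stream mode shown, batch mode is analogous:

  // assumes org.apache.flink.table.api.* imports
  EnvironmentSettings settings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()     // the planner this thread proposes as the default
      .inStreamingMode()
      .build();
  TableEnvironment tEnv = TableEnvironment.create(settings);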

Re: [External] Re: From Kafka Stream to Flink

2020-03-28 Thread Kurt Young
I think this requirement can be satisfied by the temporal table function [1]; am I missing anything? [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/temporal_tables.html#temporal-table-function Best, Kurt On Sat, Mar 28, 2020 at 2:47 PM Maatary Okouya wrote: > Hi al
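
A condensed version of the temporal table function pattern from the linked page; table and field names are illustrative, and it assumes a 1.9/1.10-era StreamTableEnvironment tEnv:

  // assumes org.apache.flink.table.api.* imports
  Table rates = tEnv.sqlQuery("SELECT currency, rate, update_time FROM rates_history");
  TemporalTableFunction ratesFn =
      rates.createTemporalTableFunction("update_time", "currency");
  tEnv.registerFunction("Rates", ratesFn);

  Table result = tEnv.sqlQuery(
      "SELECT o.amount * r.rate " +
      "FROM orders AS o, LATERAL TABLE (Rates(o.order_time)) AS r " +
      "WHERE o.currency = r.currency");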

Re: Re: flink sql-client read hive orc table no result

2020-03-18 Thread Kurt Young
esult from flink sql-client > > Thanks, > Lei > > ------ > wangl...@geekplus.com.cn > > *From:* Kurt Young > *Date:* 2020-03-18 17:41 > *To:* wangl...@geekplus.com.cn; lirui > *CC:* user > *Subject:* Re: flink sql-client read hive orc table

Re: flink sql-client read hive orc table no result

2020-03-18 Thread Kurt Young
My guess is we don't support Hive bucket tables yet. cc Rui Li for confirmation. Best, Kurt On Wed, Mar 18, 2020 at 5:19 PM wangl...@geekplus.com.cn < wangl...@geekplus.com.cn> wrote: > Hive table stored as orc format: > CREATE TABLE `robot_tr`(`robot_id` int, `robot_time` bigint, > `linear_

Re: Issues with Watermark generation after join

2020-03-16 Thread Kurt Young
Hi, could you share the SQL you wrote for your original purpose, not the one with the ProcessFunction you attached for debugging? Best, Kurt On Tue, Mar 17, 2020 at 3:08 AM Dominik Wosiński wrote: > Actually, I just put this process function there for debugging purposes. > My main goal is to join the

Re: Use flink to calculate sum of the inventory under certain conditions

2020-03-11 Thread Kurt Young
> The second reason is this query needs to scan the whole table. I think we can do better :-) Not necessarily: you said all the changes will trigger a DDB stream, so you can use Flink to consume that stream incrementally. For the 1st problem, I think you can use the DataStream API and register a timer on
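
The message is cut off here, but the timer idea presumably looks like this sketch; the Inventory type and its inboundTime field are assumptions, not from the thread:

  import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
  import org.apache.flink.util.Collector;

  // Emit inventory records once they are older than 15 days, so a downstream
  // aggregation can include them in the sum.
  public class AgeOutFunction
          extends KeyedProcessFunction<String, Inventory, Inventory> {
      private static final long FIFTEEN_DAYS = 15L * 24 * 60 * 60 * 1000;

      @Override
      public void processElement(Inventory in, Context ctx, Collector<Inventory> out) {
          // fire at the moment this record turns 15 days old
          ctx.timerService().registerEventTimeTimer(in.inboundTime + FIFTEEN_DAYS);
      }

      @Override
      public void onTimer(long ts, OnTimerContext ctx, Collector<Inventory> out) {
          // the record keyed by ctx.getCurrentKey() is now older than 15 days;
          // emit it or update state here so the sum can include it
      }
  }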

Re: Use flink to calculate sum of the inventory under certain conditions

2020-03-10 Thread Kurt Young
Hi Jiawai, Sorry, I still didn't fully get your question. What's wrong with your proposed SQL? > select vendorId, sum(inventory units) > from dynamodb > where today's time - inbound time > 15 > group by vendorId My guess is that such a query would only trigger calculations on new events. So if a ver

Re: Writing retract streams to Kafka

2020-03-06 Thread Kurt Young
@Gyula Fóra I think your query is right; we should produce insert-only results if you have event time and watermark defined. I've created https://issues.apache.org/jira/browse/FLINK-16466 to track this issue. Best, Kurt On Fri, Mar 6, 2020 at 12:14 PM Kurt Young wrote: > Actually

Re: Writing retract streams to Kafka

2020-03-05 Thread Kurt Young
ctions that >> happened in the 5 seconds before the query. >> Every query.id belongs to a single query (event_time, itemid) but still >> I have to group :/ >> >> Gyula >> >> On Thu, Mar 5, 2020 at 3:45 PM Kurt Young wrote: >> >>> I think the i

Re: How to override flink-conf parameters for SQL Client

2020-03-05 Thread Kurt Young
ie in the together with the > CustomCommandLine logic) > > Gyula > > On Thu, Mar 5, 2020 at 3:49 PM Kurt Young wrote: > >> IIRC the tricky thing here is not all the config options belong to >> flink-conf.yaml can be adjust dynamically in user's program. >&g

Re: How to use self defined json format when create table from kafka stream?

2020-03-05 Thread Kurt Young
User-defined formats also sound like an interesting extension. Best, Kurt On Thu, Mar 5, 2020 at 3:06 PM Jark Wu wrote: > Hi Lei, > > Currently, Flink SQL doesn't support to register a binlog format (i.e. > just define "order_id" and "order_no", but the json schema has other binlog > fields).

Re: How to override flink-conf parameters for SQL Client

2020-03-05 Thread Kurt Young
IIRC the tricky thing here is that not all the config options in flink-conf.yaml can be adjusted dynamically in the user's program. So it would end up with some of the configurations being overridable but not others. That experience is not great for users. Best, Kurt On Thu, Mar 5, 2020 at 10:1

Re: Writing retract streams to Kafka

2020-03-05 Thread Kurt Young
ssages"). > > As for the data completion, in my above example it is basically an event > time interval join. > With watermarks defined Flink should be able to compute results once in > exactly the same way as for the tumbling window. > > Gyula > > On Thu, Mar 5, 2020

Re: Writing retract streams to Kafka

2020-03-05 Thread Kurt Young
5, 2020 at 10:24 PM Kurt Young wrote: > > I also don't completely understand at this point why I can write the > result of a group, tumble window aggregate to Kafka and not this window > join / aggregate. > > If you are doing a tumble window aggregate with watermark enable

Re: Writing retract streams to Kafka

2020-03-05 Thread Kurt Young
> I also don't completely understand at this point why I can write the result of a group, tumble window aggregate to Kafka and not this window join / aggregate. If you are doing a tumble window aggregate with watermarks enabled, Flink will fire only one final result per window, no modifi
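
To make the contrast concrete, a tumbling-window aggregate like the following produces one final, append-only row per window and can therefore be written to Kafka; names are illustrative, and the 1.11+ executeSql call is shown:

  // assumes org.apache.flink.table.api.* imports
  tEnv.executeSql(
      "INSERT INTO kafka_sink " +
      "SELECT itemId, COUNT(*) AS cnt, " +
      "       TUMBLE_END(event_time, INTERVAL '5' SECOND) AS window_end " +
      "FROM transactions " +
      "GROUP BY itemId, TUMBLE(event_time, INTERVAL '5' SECOND)");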

[ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-20 Thread Kurt Young
Hi everyone, I'm very happy to announce that Jingsong Lee accepted the offer of the Flink PMC to become a committer of the Flink project. Jingsong Lee has been an active community member for more than a year now. He mainly focuses on Flink SQL and played an essential role during the blink planner mergi

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Kurt Young
Congratulations to everyone involved! Great thanks to Yu & Gary for being the release manager! Best, Kurt On Thu, Feb 13, 2020 at 10:06 AM Hequn Cheng wrote: > Great thanks to Yu & Gary for being the release manager! > Also thanks to everyone who made this release possible! > > Best, Hequn > >

[DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

2020-02-04 Thread Kurt Young
Hi all, I'd like to bring up a discussion about removing the registration of TableSource and TableSink in TableEnvironment as well as in ConnectTableDescriptor. The affected methods would be: TableEnvironment::registerTableSource TableEnvironment::fromTableSource TableEnvironment::registerTableSink Co

Re: Please suggest helpful tools

2020-01-13 Thread Kurt Young
ON l.coListAgentKeyL = ac.ucPKA AND > l.coListAgentKeyL IS NOT NULL" + > > > I tried this but noticed that it didn't work as the data skew (and heavy load > on one task) continued. Could you please let me know if I missed anything? > > > Thanks, > > Eva > &g

Re: Please suggest helpful tools

2020-01-12 Thread Kurt Young
Hi, You can try to filter NULL values with an explicit condition like " is not NULL". Best, Kurt On Sat, Jan 11, 2020 at 4:10 AM Eva Eva wrote: > Thank you both for the suggestions. > I did a bit more analysis using UI and identified at least one > problem that's occurring with the job rn

Re: Flink SQL Count Distinct performance optimization

2020-01-07 Thread Kurt Young
Hi, Could you try to find out what the bottleneck of your current job is? This would lead to different optimizations, such as whether it's CPU bound, or whether you have too much local state and are thus stuck on too many slow IOs. Best, Kurt On Wed, Jan 8, 2020 at 3:53 PM 贺小令 wrote: > hi sunfulin, > you ca

Re: Controlling the Materialization of JOIN updates

2020-01-05 Thread Kurt Young
ontribute to the Flink codebase. > > Anyway, shout out to Jark for resolving the bug and providing a patch! I > believe this will be a real enabler for CQRS architectures on Flink (we had > subscriptions with regular joins, and this patch enables querying the same > thing with very mi

Re: Duplicate tasks for the same query

2020-01-05 Thread Kurt Young
at's > expected as I need to process latest records, as long as it is sending only > the record(s) that's been updated. > > Thanks, > RKandoji > > On Fri, Jan 3, 2020 at 9:57 PM Kurt Young wrote: > >> Hi RKandoji, >> >> It looks like you have a data

Re: Controlling the Materialization of JOIN updates

2020-01-03 Thread Kurt Young
Hi Benoît, Before discussing all the options you listed, I'd like to understand more about your requirements. The part I don't fully understand is that both your fact (Event) and dimension (DimensionAtJoinTimeX) tables come from the same table, Event or EventRawInput in your case. So it will resul

Re: Flink group with time-windowed join

2020-01-03 Thread Kurt Young
Looks like a bug to me; could you file an issue for this? Best, Kurt On Thu, Jan 2, 2020 at 9:06 PM jeremyji <18868129...@163.com> wrote: > Two streams as table1, table2. We know that a group by with a regular join won't > work, so we have to use a time-windowed join. So here is what my flink sql looks like:
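
The SQL in the thread is truncated by the archive; a generic time-windowed (interval) join of the kind being discussed looks roughly like this, with illustrative names:

  // assumes org.apache.flink.table.api.* imports
  Table joined = tEnv.sqlQuery(
      "SELECT a.id, a.v AS v1, b.v AS v2, a.rowtime " +  // keep the time attribute
      "FROM table1 a, table2 b " +
      "WHERE a.id = b.id " +
      "AND a.rowtime BETWEEN b.rowtime - INTERVAL '10' SECOND " +
      "                  AND b.rowtime + INTERVAL '10' SECOND");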

Re: Duplicate tasks for the same query

2020-01-03 Thread Kurt Young
;>> >>>>>> Yes, I'm planning to try out DeDuplication when I'm done upgrading to >>>>>> version 1.9. Hopefully deduplication is done by only one task and reused >>>>>> everywhere else. >>>>>> >>>>&g

Re: Duplicate tasks for the same query

2019-12-30 Thread Kurt Young
BTW, you could also have a more efficient version of deduplicating the user table by using the Top-N feature [1]. Best, Kurt [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html#top-n On Tue, Dec 31, 2019 at 9:24 AM Jingsong Li wrote: > Hi RKandoji, > > In theory, you
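
The deduplication specialization of Top-N from the linked docs, with illustrative names: keep only the latest row per id.

  // assumes org.apache.flink.table.api.* imports
  Table latest = tEnv.sqlQuery(
      "SELECT id, name, updated_at FROM (" +
      "  SELECT *, ROW_NUMBER() OVER (" +
      "    PARTITION BY id ORDER BY proctime DESC) AS row_num " +
      "  FROM users" +
      ") WHERE row_num = 1");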

Re: Flink SQL + savepoint

2019-12-30 Thread Kurt Young
I created an issue to track this feature: https://issues.apache.org/jira/browse/FLINK-15440 Best, Kurt On Tue, Dec 31, 2019 at 8:00 AM Fanbin Bu wrote: > Kurt, > > Is there any update on this or a roadmap that supports savepoints with Flink > SQL? > > On Sun, Nov 3, 2019 at 1

Re: testing - syncing data timeline

2019-12-25 Thread Kurt Young
window it keeps > history of several days . if I want to put the logic of #2 I will need to > manage it with timers, correct ? > > On Thu, Dec 26, 2019 at 6:28 AM Kurt Young wrote: > >> *This Message originated outside your organization.* >>

Re: testing - syncing data timeline

2019-12-25 Thread Kurt Young
Hi, You can merge the logic of #2 into #4, it will be much simpler. Best, Kurt On Wed, Dec 25, 2019 at 7:36 PM Avi Levi wrote: > Hi , > > I have the following pipeline : > 1. single hour window that counts the number of records > 2. single day window that accepts the aggregated data from #1

Re: Need guidance on a use case

2019-12-19 Thread Kurt Young
Hi Eva, Correct me if I'm wrong: you have an unbounded Task stream and you want to enrich the task events with User info. Meanwhile, the User table is also changing over time, so you basically want to join each incoming task event with the latest data of the User table and emit the results. Even if the

Re: How to convert retract stream to dynamic table?

2019-12-18 Thread Kurt Young
Hi James, If I understand correctly, you can use `TableEnvironment#sqlQuery` to achieve what you want. You can pass the whole SQL statement in and get a `Table` back from the method. I believe this is the table you want, which is semantically equivalent to the stream you mentioned. For example,
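
The example is cut off in the archive; a plausible reconstruction of the pattern, with illustrative names:

  // assumes org.apache.flink.table.api.* imports and a StreamTableEnvironment tEnv
  Table result = tEnv.sqlQuery(
      "SELECT user_id, COUNT(*) AS cnt FROM clicks GROUP BY user_id");
  // the Table can then be converted back to a (retract) stream if needed:
  DataStream<Tuple2<Boolean, Row>> stream = tEnv.toRetractStream(result, Row.class);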

Re: Join a datastream with tables stored in Hive

2019-12-16 Thread Kurt Young
> OK, I believe I know enough to get my hands dirty with the code. I can > share later on what I was able to accomplish. And probably more questions > will show up when I finally start the implementation. > > Thanks > Krzysztof > > On Mon, Dec 16, 2019 at 3:14 AM Kurt Young wrote

Re: Join a datastream with tables stored in Hive

2019-12-15 Thread Kurt Young
that feasible? > Do I understand correctly, that this option is available only with Blink > engine and also only with use of Flink SQL, no Table API? > > Same question comes up regarding reprocessing: do you think it would be > possible to use the same logic / SQL for reprocessing?

Re: Join a datastream with tables stored in Hive

2019-12-13 Thread Kurt Young
On Fri, Dec 13, 2019 at 4:37 PM Kurt Young wrote: > Hi Krzysztof, > > What you raised also interested us a lot to achieve in Flink. > Unfortunately, there > is no in place solution in Table/SQL API yet, but you have 2 options which > are both > close to this thus need some

Re: Join a datastream with tables stored in Hive

2019-12-13 Thread Kurt Young
Hi Krzysztof, What you raised is also something we're very interested in achieving in Flink. Unfortunately, there is no in-place solution in the Table/SQL API yet, but you have 2 options which are both close to this and thus need some modifications. 1. The first one is to use the temporal table function [1]. It needs you to wri

Re: Joining multiple temporal tables

2019-12-06 Thread Kurt Young
Hi Chris, If you are only interested in the latest data of the dimension table, maybe you can try the temporal table join: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html#operations see "Join with Temporal Table" Best, Kurt On Fri, Dec 6, 2019 at 11:13 PM Fabian Hueske wr
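
The "Join with Temporal Table" syntax from the linked page, with illustrative table names; it always joins against the latest version of the dimension table:

  // assumes org.apache.flink.table.api.* imports
  Table enriched = tEnv.sqlQuery(
      "SELECT o.order_id, o.amount * r.rate AS total " +
      "FROM orders AS o " +
      "JOIN latest_rates FOR SYSTEM_TIME AS OF o.proctime AS r " +
      "ON o.currency = r.currency");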

Re: Flink SQL + savepoint

2019-11-03 Thread Kurt Young
It's not possible for SQL and Table API jobs to play with savepoints yet, but I think this is a popular requirement and we should definitely discuss solutions in the following versions. Best, Kurt On Sat, Nov 2, 2019 at 7:24 AM Fanbin Bu wrote: > Kurt, > > What do you recommend for Flink SQ

Re: Add Bucket File System Table Sink

2019-09-16 Thread Kurt Young
3 and see if there is a better solution to > combine these two functions. I am very willing to join this development. > > > > -- Original Message -- > *From:* "Kurt Young"; > *Sent:* Tuesday, September 17, 2019, 11:19 AM > *To:* "Jun Zhang"<825875..

Re: Add Bucket File System Table Sink

2019-09-16 Thread Kurt Young
Kurt: > thank you very much. > I will take a closer look at FLIP-63. > > I developed this PR; the underlying sink is StreamingFileSink, not > BucketingSink, but I gave it the name Bucket. > > > On 09/17/2019 10:57, Kurt Young > wrote: > > Hi Ju

Re: Add Bucket File System Table Sink

2019-09-16 Thread Kurt Young
Hi Jun, Thanks for bringing this up; in general I'm +1 on this feature. As you might know, there is another ongoing effort around this kind of table sink, covered in the newly proposed partition support reworking [1]. In that proposal, we also want to introduce a new file system connector, which

Re: Will there be a Flink 1.9.1 release ?

2019-09-09 Thread Kurt Young
Hi Debasish, I think there is a good chance to have 1.9.1; the only question is when. 1.9.0 was released ~2 weeks ago, and I think some users are still migrating to it. Waiting another 1 or 2 weeks to see whether there are any critical bugs in 1.9.0 sounds reasonable

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread Kurt Young
Great to hear! Thanks Gordon for driving the release; it's been a great pleasure to work with you as release managers for the last couple of weeks. And thanks to everyone who contributed to this version; you're making Flink an even better project! Best, Kurt Yun Tang wrote on Fri, Aug 23, 2019 at 02:17: > G

Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Kurt Young
Congratulations Rong! Best, Kurt On Thu, Jul 11, 2019 at 10:53 PM Kostas Kloudas wrote: > Congratulations Rong! > > On Thu, Jul 11, 2019 at 4:40 PM Jark Wu wrote: > >> Congratulations Rong Rong! >> Welcome on board! >> >> On Thu, 11 Jul 2019 at 22:25, Fabian Hueske wrote: >> >>> Hi everyone,

Re: [ANNOUNCE] Apache Flink 1.8.1 released

2019-07-02 Thread Kurt Young
Thanks for being the release manager and great job! @Jincheng Best, Kurt On Wed, Jul 3, 2019 at 10:19 AM Tzu-Li (Gordon) Tai wrote: > Thanks for being the release manager @jincheng sun > :) > > On Wed, Jul 3, 2019 at 10:16 AM Dian Fu wrote: > >> Awesome! Thanks a lot for being the release ma

Re: How would I use OneInputStreamOperator to deal with data skew?

2019-04-17 Thread Kurt Young
t; > class implementation. > > Best, > Felipe > > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com > <https://felipeogutierrez.blogspot.com>* > > > On Wed, Apr 17, 2019 at 4:13 AM Kurt Young w

Re: How would I use OneInputStreamOperator to deal with data skew?

2019-04-17 Thread Kurt Young
I mean no particular reason. Best, Kurt On Wed, Apr 17, 2019 at 7:44 PM Kurt Young wrote: > There is no reason for it; the operator and function don't rely on the > logic that AbstractUdfStreamOperator supplies. > > Best, > Kurt > > > On Wed, Apr 17, 201

Re: How would I use OneInputStreamOperator to deal with data skew?

2019-04-16 Thread Kurt Young
/MqttSensorDataSkewedCombinerByKeySkewedDAG.java#L86> > . > > Thanks, > Felipe > > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com > <https://felipeogutierrez.blogspot.com>* > > > On Tue, Apr 16,

Re: How would I use OneInputStreamOperator to deal with data skew?

2019-04-15 Thread Kurt Young
> [quoted maven-shade-plugin configuration; the XML tags were stripped by the archive: a ManifestResourceTransformer with main class org.sense.flink.App and a boolean option set to false] > > Thanks > > *--* > *-- Felipe Gutierrez* > > *-- sky

Re: How would I use OneInputStreamOperator to deal with data skew?

2019-04-15 Thread Kurt Young
gt; <https://felipeogutierrez.blogspot.com>* > > > On Mon, Apr 15, 2019 at 9:49 AM Felipe Gutierrez < > felipe.o.gutier...@gmail.com> wrote: > >> Cool, thanks Kurt! >> *-* >> *- Felipe Gutierrez* >> >> *- skype: felipe.o.gutierrez* >> *- **https://felip

Re: How would I use OneInputStreamOperator to deal with data skew?

2019-04-14 Thread Kurt Young
Hi, You can check out the bundle operator, which is used in Blink to perform the kind of thing you mentioned: https://github.com/apache/flink/blob/blink/flink-libraries/flink-table/src/main/java/org/apache/flink/table/runtime/bundle/BundleOperator.java Best, Kurt On Fri, Apr 12, 2019 at 8:05 PM Felipe G
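
Not the BundleOperator itself, but a simplified illustration of the same idea: buffer per-key partial aggregates and flush in batches to shrink the skewed shuffle. This sketch ignores checkpointing; a production version would also flush on snapshots.

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.flink.api.common.functions.RichFlatMapFunction;
  import org.apache.flink.api.java.tuple.Tuple2;
  import org.apache.flink.util.Collector;

  public class PreAggregate
          extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {
      private static final int FLUSH_THRESHOLD = 1000;
      private final Map<String, Long> bundle = new HashMap<>();

      @Override
      public void flatMap(Tuple2<String, Long> in, Collector<Tuple2<String, Long>> out) {
          bundle.merge(in.f0, in.f1, Long::sum);   // local partial sum per key
          if (bundle.size() >= FLUSH_THRESHOLD) {  // flush in batches
              bundle.forEach((k, v) -> out.collect(Tuple2.of(k, v)));
              bundle.clear();
          }
      }
  }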

Re: [VOTE] Release 1.8.0, release candidate #3

2019-03-20 Thread Kurt Young
+1 (non-binding) Checked items: - checked checksums and GPG files - verified that the source archives do not contain any binaries - checked that all POM files point to the same version - built from source successfully Best, Kurt On Wed, Mar 20, 2019 at 2:12 PM jincheng sun wrote: > Hi Aljosc

Re: How to split tuple2 returned by UDAF into two columns in a result table

2019-03-19 Thread Kurt Young
Hi Dongwon, AFAIK, Flink doesn't support usage like "myscalar(col1, col2, col3) as (col4, col5)". Am I missing something? If you want to split a Tuple2 into two different columns, you can use a UDTF. Best, Kurt On Wed, Mar 20, 2019 at 9:59 AM Dongwon Kim wrote: > Hi, > > I want to split Tupl
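
A minimal UDTF along the lines Kurt suggests: it emits a two-column row, so the caller can name the columns in the LATERAL TABLE clause. Names are illustrative, 1.8/1.9-era API.

  import org.apache.flink.api.java.tuple.Tuple2;
  import org.apache.flink.table.functions.TableFunction;

  public class SplitPair extends TableFunction<Tuple2<String, Long>> {
      // each emitted Tuple2 becomes one row with two columns
      public void eval(String name, Long value) {
          collect(Tuple2.of(name, value));
      }
  }

  // usage: tEnv.registerFunction("split_pair", new SplitPair());
  // SELECT col4, col5 FROM t, LATERAL TABLE(split_pair(a, b)) AS T(col4, col5)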

Re: Expressing Flink array aggregation using Table / SQL API

2019-03-18 Thread Kurt Young
ors there. > Or would you typically in such scenarios go the route of either having a > retractable sink / sink that can update partial results by key? > > > > Thanks, > > > > -- Piyush > > > > > > *From: *Kurt Young > *Date: *Tuesday, Marc

Re: What should I take care if I enable object reuse

2019-03-14 Thread Kurt Young
Keep one thing in mind: if you want the element to remain valid after the function call ends (map(), flatMap(), or whatever you are using), you should copy the element. Typical scenarios include: 1. Saving the elements into some collection like an array, list, or map for later usage; you should c
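
A sketch of that rule of thumb with enableObjectReuse() on; the Event type and its copy() method are assumptions for illustration:

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.flink.api.common.functions.RichMapFunction;

  public class BufferingMap extends RichMapFunction<Event, Event> {
      private final List<Event> buffer = new ArrayList<>();

      @Override
      public Event map(Event value) {
          // with object reuse, the runtime may mutate `value` after map()
          // returns, so store a defensive copy rather than the reference
          buffer.add(value.copy());
          return value;
      }
  }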

Re: Expressing Flink array aggregation using Table / SQL API

2019-03-12 Thread Kurt Young
rror like that. Best, Kurt On Tue, Mar 12, 2019 at 9:46 PM Piyush Narang wrote: > Thanks for getting back Kurt. Yeah this might be an option to try out. I > was hoping there would be a way to express this directly in the SQL though > ☹. > > > > -- Piyush > > >

Re: Expressing Flink array aggregation using Table / SQL API

2019-03-11 Thread Kurt Young
Hi Piyush, Could you try to add clientId into your aggregate function, track the map inside your new aggregate function, and assemble whatever result you need on emit? The SQL would look like: SELECT userId, some_aggregation(clientId, eventType, `timestamp`, dataField) FROM my_kafka_stream_t
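
A hedged sketch of that aggregate: one accumulator holds a per-client map and getValue assembles the emitted result. The names, argument list, and result encoding are illustrative.

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.flink.table.functions.AggregateFunction;

  public class SomeAggregation extends AggregateFunction<String, Map<String, Long>> {
      @Override
      public Map<String, Long> createAccumulator() {
          return new HashMap<>();
      }

      // called once per input row; tracks counts per clientId/eventType
      public void accumulate(Map<String, Long> acc, String clientId, String eventType) {
          acc.merge(clientId + "/" + eventType, 1L, Long::sum);
      }

      @Override
      public String getValue(Map<String, Long> acc) {
          return acc.toString();  // assemble whatever shape the sink expects
      }
  }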

Re: [ANNOUNCE] New Flink PMC member Thomas Weise

2019-02-12 Thread Kurt Young
Congrats Thomas! Best, Kurt On Wed, Feb 13, 2019 at 10:02 AM Shaoxuan Wang wrote: > Congratulations, Thomas! > > On Tue, Feb 12, 2019 at 5:59 PM Fabian Hueske wrote: > >> Hi everyone, >> >> On behalf of the Flink PMC I am happy to announce Thomas Weise as a new >> member of the Apache Flink P

Re: Dataset column statistics

2018-12-17 Thread Kurt Young
Hi, We have implemented ANALYZE TABLE in our internal version of Flink, and we will try to contribute it back to the community. Best, Kurt On Thu, Nov 29, 2018 at 9:23 PM Fabian Hueske wrote: > I'd try to tune it in a single query. > If that does not work, go for as few queries as possible, spli

Re: Partitions vs. Subpartitions

2018-10-10 Thread Kurt Young
Hi, A partition is the output of a JobVertex, which you can simply think of as containing an operator. In the real world, a JobVertex runs in parallel, and each parallel instance outputs some data; those outputs are conceptually called subpartitions. Best, Kurt On Thu, Oct 11, 2018 at 10:27 AM Renjie Liu wrote: > Hi, Chris: >

Re: Local combiner on each mapper in Flink

2017-10-25 Thread Kurt Young
return null; > } > }); > > Because it looks like aggregate would only transfer WindowedStream to a > DataStream. But for a global aggregation phase (a reducer), should I > extract the window again? > > > Thanks! I apologize if that s

Re: Local combiner on each mapper in Flink

2017-10-24 Thread Kurt Young
> Le > > On Sun, Oct 22, 2017 at 8:52 PM, Kurt Young wrote: > >> Hi, >> >> The document you are looking at is pretty old, you can check the newest >> version here: https://ci.apache.org/projects/flink/flink-docs-releas >> e-1.3/dev/batch/dataset_tran

Re: Local combiner on each mapper in Flink

2017-10-22 Thread Kurt Young
Hi, The document you are looking at is pretty old; you can check the newest version here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/dataset_transformations.html Regarding your question, you can use combineGroup. Best, Kurt On Mon, Oct 23, 2017 at 5:22 AM, Le Xu wr
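
combineGroup runs a local combine on each mapper before the shuffled group-reduce; a word-count-style sketch, assuming a DataSet<Tuple2<String, Integer>> named words:

  // assumes org.apache.flink.api.java.DataSet, org.apache.flink.api.java.tuple.Tuple2,
  // org.apache.flink.api.common.typeinfo.Types and org.apache.flink.util.Collector imports
  DataSet<Tuple2<String, Integer>> partialSums = words
      .groupBy(0)
      .combineGroup((Iterable<Tuple2<String, Integer>> values,
                     Collector<Tuple2<String, Integer>> out) -> {
          String key = null;
          int sum = 0;
          for (Tuple2<String, Integer> v : values) {
              key = v.f0;
              sum += v.f1;  // local pre-aggregation before the shuffle
          }
          out.collect(Tuple2.of(key, sum));
      })
      .returns(Types.TUPLE(Types.STRING, Types.INT));  // lambdas need a type hint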

Re: the design of spilling to disk

2017-09-19 Thread Kurt Young
Copied from my earlier response to a similar question: "Here is a short description of how it works: there are 3 threads in total working together, one for reading, one for sorting partial data in memory, and the last one responsible for spilling. Flink will first figure out how many memory

Re: Shuffling between map and keyBy operator

2017-09-06 Thread Kurt Young
Hi Marchant, I'm afraid the serde cost still exists even if both operators run in the same TaskManager. Best, Kurt On Tue, Sep 5, 2017 at 9:26 PM, Marchant, Hayden wrote: > I have a streaming application that has a keyBy operator followed by an > operator working on the keyed values (a custom

Re: How to maintain variable for each map operator

2017-07-13 Thread Kurt Young
aintain my ListStates > > 1. Does each Array List has its own ListState? > 2. I am not clear with the open function on the example given by Flink. I > wonder how I can initialize my arraylists with ListStateDescriptor. > > > > Desheng Zhang > E-mail: gzzhangdesh...@corp.

Re: How to maintain variable for each map operator

2017-07-12 Thread Kurt Young
Hi, I think you can use State to achieve your goal: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html Best, Kurt On Thu, Jul 13, 2017 at 1:14 PM, ZalaCheung wrote: > Hi all, > > I am stuck with a problem. I have a stream, I want keyby it and then do a > map func
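
Answering the ListState part of the question, a minimal keyed-state sketch: one ListStateDescriptor per list, initialized in open(). Event is an assumed POJO with a timestamp field.

  import org.apache.flink.api.common.functions.RichFlatMapFunction;
  import org.apache.flink.api.common.state.ListState;
  import org.apache.flink.api.common.state.ListStateDescriptor;
  import org.apache.flink.configuration.Configuration;
  import org.apache.flink.util.Collector;

  public class MyMapper extends RichFlatMapFunction<Event, Event> {
      private transient ListState<Long> seen;  // scoped to the current key

      @Override
      public void open(Configuration parameters) {
          seen = getRuntimeContext().getListState(
              new ListStateDescriptor<>("seen", Long.class));
      }

      @Override
      public void flatMap(Event value, Collector<Event> out) throws Exception {
          seen.add(value.timestamp);  // each descriptor is an independent state
          out.collect(value);
      }
  }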

Re: [POLL] Who still uses Java 7 with Flink ?

2017-07-12 Thread Kurt Young
+1 for dropping Java 7; we have been using Java 8 for more than one year at Alibaba and everything works fine. Best, Kurt On Thu, Jul 13, 2017 at 2:53 AM, Bowen Li wrote: > +1 for dropping Java 7 > > On Wed, Jul 12, 2017 at 9:04 AM, Gyula Fóra wrote: > > > +1 for dropping 1.7 from me as well. >

An addition to Netty's memory footprint

2017-06-30 Thread Kurt Young
Hi, Ufuk wrote up an excellent document about Netty's memory allocation inside Flink [1], and I want to add one more note after running some large-scale jobs. The only inaccurate thing about [1] is how much memory LengthFieldBasedFrameDecoder will use. From our observations, it will cost at m

Re: Cannot write record to fresh sort buffer. Record too large.

2017-06-15 Thread Kurt Young
I think the only way is adding more managed memory. The large record handler only takes effect on the reduce side, where it is used by the merge sorter. According to the exception, it is thrown during the combining phase, which only uses an in-memory sorter that doesn't have the large record handling mechanism. Be

Re: Cannot write record to fresh sort buffer. Record too large.

2017-06-13 Thread Kurt Young
Hi, Can you paste a code snippet to show how you use the DataSet API? Best, Kurt On Tue, Jun 13, 2017 at 4:29 PM, Sebastian Neef < gehax...@mailbox.tu-berlin.de> wrote: > Hi Kurt, > > thanks for the input. > > What do you mean by "try to disable your combiner"? Any tips on how I > can do t

Re: Cannot write record to fresh sort buffer. Record too large.

2017-06-12 Thread Kurt Young
Hi, I think the reason is your record is too large to do an in-memory combine. You can try to disable your combiner. Best, Kurt On Mon, Jun 12, 2017 at 9:55 PM, Sebastian Neef < gehax...@mailbox.tu-berlin.de> wrote: > Hi, > > when I'm running my Flink job on a small dataset, it successfully > fi
