Re: [ANNOUNCE] Donation Flink CDC into Apache Flink has Completed

2024-03-20 Thread Jark Wu
Congratulations and welcome! Best, Jark On Thu, 21 Mar 2024 at 10:35, Rui Fan <1996fan...@gmail.com> wrote: > Congratulations! > > Best, > Rui > > On Thu, Mar 21, 2024 at 10:25 AM Hang Ruan wrote: > > > Congratulations! > > > > Best, > > Hang > > > > Lincoln Lee 于2024年3月21日周四 09:54写道: > > > >

Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-18 Thread Jark Wu
Congrats! Thanks to Lincoln, Jing, Yun and Martijn for driving this release. Thanks to all who were involved in this release! Best, Jark On Mon, 18 Mar 2024 at 16:31, Rui Fan <1996fan...@gmail.com> wrote: > Congratulations, thanks for the great work! > > Best, > Rui > > On Mon, Mar 18, 2024 at 4:26 PM Lincoln Le

Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Jark Wu
Congratulations, and thanks to the release managers and everyone who has contributed! Best, Jark On Fri, 27 Oct 2023 at 12:25, Hang Ruan wrote: > Congratulations! > > Best, > Hang > > Samrat Deb 于2023年10月27日周五 11:50写道: > > > Congratulations on the great release > > > > Bests, > > Samrat > > > > On Fri

Re: [DISCUSS][FLINK-31788][FLINK-33015] Add back Support emitUpdateWithRetract for TableAggregateFunction

2023-09-07 Thread Jark Wu
+1 to fix it first. I also agree to deprecate it if there are few people using it, but this should be another discussion thread within the dev+user ML. In the future, we are planning to introduce a user-defined operator based on the TVF functionality, which I think can fully subsume the UDTAF, cc @Timo

Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-03 Thread Jark Wu
Congrats everyone! Best, Jark > 2023年7月3日 22:37,Yuval Itzchakov 写道: > > Congrats team! > > On Mon, Jul 3, 2023, 17:28 Jing Ge via user > wrote: >> Congratulations! >> >> Best regards, >> Jing >> >> >> On Mon, Jul 3, 2023 at 3:21 PM yuxia >

Re: [DISCUSS] Hive dialect shouldn't fall back to Flink's default dialect

2023-06-01 Thread Jark Wu
+1, I think this can make the grammar clearer. Please remember to add a release note once the issue is finished. Best, Jark On Thu, 1 Jun 2023 at 11:28, yuxia wrote: > Hi, Jingsong. It's hard to provide an option, given that we > also want to decouple Hive from the flink planner.

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-13 Thread Jark Wu
gger a discussion and gave a green pass on the change is that >>>>>> metrics are quite “trivial” to be noticed as public APIs. As mentioned by >>>>>> Martijn I couldn’t find a place noting that metrics are public APIs and >>>>>> should be treat

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-10 Thread Jark Wu
Thanks for discovering this problem, Qingsheng! I'm also +1 for reverting the breaking changes. IIUC, currently, the behavior of "numXXXOut" metrics of the new and old sink is inconsistent. We have to break one of them to have consistent behavior. Sink V2 is an evolving API which is just introduc

Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-02 Thread Jark Wu
Thank Xingtong for making this possible! Cheers, Jark Wu On Thu, 2 Jun 2022 at 15:31, Xintong Song wrote: > Hi everyone, > > I'm very happy to announce that the Apache Flink community has created a > dedicated Slack workspace [1]. Welcome to join us on Slack. > > ## J

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-12 Thread Jark Wu
> Thanks for the proposal, Xintong.

Re: Fwd: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-10 Thread Jark Wu
great if you can create an umbrella ticket for that. >> >> It would be great to get some insights from current Flink and Hive >> users which versions are being used. >> @Jark I would indeed deprecate the old Hive versions in Flink 1.15 and >> then drop them in Flink

Re: Fwd: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-09 Thread Jark Wu
Thanks Martijn for the reply and summary. I totally agree with your plan and thank Yuxia for volunteering the Hive tech debt issue. I think we can create an umbrella issue for this and target version 1.16. We can discuss details and create subtasks there. Regarding dropping old Hive versions, I'm

Re: Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-08 Thread Jark Wu
Hi Martijn, Thanks for starting this discussion. I think it's great for the community to reach a consensus on the roadmap of Hive query syntax. I agree that the Hive project is not actively developed nowadays. However, Hive still occupies the majority of the batch market and the Hive ecosystem

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Jark Wu
for > each connector as Arvid suggested. IMO we shouldn't block this effort on > the stability of the APIs. > > Cheers, > > Konstantin > > > > On Wed, Oct 20, 2021 at 8:56 AM Jark Wu wrote: > >> Hi, >> >> I think Thomas raised very good questions an

Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Jark Wu
Hi, I think Thomas raised very good questions and would like to know your opinions if we want to move connectors out of flink in this version. (1) is the connector API already stable? > Separate releases would only make sense if the core Flink surface is > fairly stable though. As evident from Ic

Re: [ANNOUNCE] Flink mailing lists archive service has migrated to Apache Archive service

2021-09-06 Thread Jark Wu
Thanks Leonard, I have seen many users complaining that the Flink mailing list doesn't work (they were using Nabble). I think this information would be very helpful. Best, Jark On Mon, 6 Sept 2021 at 16:39, Leonard Xu wrote: > Hi, all > > The mailing list archive service Nabble Archive was bro

Re: [DISCUSS] Better user experience in the WindowAggregate upon Changelog (contains update message)

2021-07-01 Thread Jark Wu
cleaned accurately by watermark. We don't need to expose `table.exec.emit.late-fire.enabled` on docs and can remove it in the next version. Best, Jark On Thu, 1 Jul 2021 at 21:20, Jark Wu wrote: > Thanks Jing for bringing up this topic, > > The emit strategy configs are annotated a

Re: [DISCUSS] Better user experience in the WindowAggregate upon Changelog (contains update message)

2021-07-01 Thread Jark Wu
Thanks Jing for bringing up this topic, The emit strategy configs are annotated as Experimental and are not public in the documentation. However, I see this is a very useful feature which many users are looking for. I have posted these configs for many questions like "how to handle late events in SQL". T

Re: [Flink SQL] Lookup join hbase problem

2021-06-27 Thread Jark Wu
Yes. Currently, the HBase lookup source only supports lookup on the rowkey. If there is more than one join condition, it may fail. We should support lookup on HBase with multiple fields (by Get#setFilter). Feel free to open issues. Best, Jark On Mon, 28 Jun 2021 at 12:48, 纳兰清风 wrote: > Hi, > > Whe
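A sketch of the lookup-join syntax this refers to (table and column names are hypothetical); the equality condition must target the column declared as the HBase rowkey:

```sql
-- Hypothetical tables: `orders` is the probe side, `dim_hbase` is an HBase
-- dimension table whose rowkey column is declared as `rowkey`.
SELECT o.order_id, d.cf.user_name
FROM orders AS o
JOIN dim_hbase FOR SYSTEM_TIME AS OF o.proc_time AS d
  ON o.user_id = d.rowkey;
```

A condition on a non-rowkey column (e.g. `ON o.region = d.cf.region`) is what runs into the limitation described above.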

Re: Questions about UPDATE_BEFORE row kind in ElasticSearch sink

2021-06-27 Thread Jark Wu
UPDATE_BEFORE is required in cases such as Aggregation with Filter. For example: SELECT * FROM ( SELECT word, count(*) as cnt FROM T GROUP BY word ) WHERE cnt < 3; There is more discussion in this issue: https://issues.apache.org/jira/browse/FLINK-9528 Best, Jark On Mon, 28 Jun 2021 at 13
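The effect described here can be sketched outside of Flink with a tiny changelog simulation (a hypothetical illustration, not Flink code): when the count for a word crosses the filter threshold, the UPDATE_BEFORE message is the only thing that removes the now-invisible row from the materialized result.

```python
# Hypothetical sketch: materialize "SELECT ... WHERE cnt < 3" from a changelog.
# Message kinds: +I insert, -U update-before, +U update-after.
def apply_changelog(changelog, predicate):
    view = {}
    for kind, word, cnt in changelog:
        if kind == "-U":
            view.pop(word, None)       # retract the old row unconditionally
        elif predicate(cnt):           # +I/+U survive only if they pass the filter
            view[word] = cnt
    return view

changelog = [
    ("+I", "hello", 1),
    ("-U", "hello", 1), ("+U", "hello", 2),
    ("-U", "hello", 2), ("+U", "hello", 3),   # (hello, 3) fails cnt < 3
]

# With UPDATE_BEFORE, the stale row (hello, 2) is correctly removed:
print(apply_changelog(changelog, lambda c: c < 3))                    # {}
# Without UPDATE_BEFORE messages, the stale row would survive forever:
no_ub = [m for m in changelog if m[0] != "-U"]
print(apply_changelog(no_ub, lambda c: c < 3))                        # {'hello': 2}
```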

Re: Add control mode for flink

2021-06-07 Thread Jark Wu
Thanks Xintong for the summary, I'm a big +1 for this feature. Xintong's summary of Table/SQL's needs is correct. The "custom (broadcast) event" feature is important to us, and its absence even blocks further features and optimizations in Table/SQL. I also discussed offline with @Yun Gao several times

Re: How to recovery from last count when using CUMULATE window after restart flink-sql job?

2021-05-09 Thread Jark Wu
Hi, When restarting a Flink job, Flink will start the job with an empty state, because it is a new job. This is not special to the CUMULATE window; it applies to all Flink jobs. If you want to restore a Flink job from a state/savepoint, you have to specify the savepoint path, see [1]. Best, Jark [1]:

Re: Protobuf support with Flink SQL and Kafka Connector

2021-05-05 Thread Jark Wu
Hi Shipeng, Matthias is correct. FLINK-18202 should address this topic. There is already a pull request there which is in good shape. You can also download the PR and build the format jar yourself, and then it should work with Flink 1.12. Best, Jark On Mon, 3 May 2021 at 21:41, Matthias Pohl wr

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-16 Thread Jark Wu
1 at 21:40, Dylan Forciea wrote: > Jark, > > > > Thanks for the heads up! I didn’t see this behavior when running in batch > mode with parallelism turned on. Is it safe to do this kind of join in > batch mode right now, or am I just getting lucky? > > > > Dylan >

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-16 Thread Jark Wu
Hi Dylan, I think this has the same reason as https://issues.apache.org/jira/browse/FLINK-20374. The root cause is that changelogs are shuffled by `attr` at the second join, and thus records with the same `id` will be shuffled to different join tasks (also different sink tasks). So the data arrived at

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-25 Thread Jark Wu
IIUC, pipeline.auto-watermark-interval = 0 just disables **periodic** watermark emission; it doesn't mean the watermark will never be emitted. In Table API/SQL, it has the same meaning. If the watermark interval = 0, we disable periodic watermark emission and emit the watermark once it advances. So I thin

Re: LocalWatermarkAssigner causes predicate pushdown to be skipped

2021-03-07 Thread Jark Wu
Hi Yuval, That's correct: you will always get a LogicalWatermarkAssigner if you assigned a watermark. If you implement SupportsWatermarkPushdown, the LogicalWatermarkAssigner will be pushed into the TableSource, and then you can push the Filter into the source if the source implements SupportsFilterPushdown. Best,

Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-03-04 Thread Jark Wu
big +1 from my side. Best, Jark On Thu, 4 Mar 2021 at 20:59, Leonard Xu wrote: > +1 for the roadmap. > > Thanks Timo for driving this. > > Best, > Leonard > > > 在 2021年3月4日,20:40,Timo Walther 写道: > > > > Last call for feedback on this topic. > > > > It seems everyone agrees to finally complete

Re: Union fields with time attributes have different types

2021-02-28 Thread Jark Wu
Hi Sebastián, `endts` in your case is a time attribute, which is slightly different from a regular TIMESTAMP type. You can manually `cast(endts as timestamp(3))` to make this query work, which removes the time attribute metadata. SELECT `evt`, `value`, `startts`, cast(endts as timestamp(3)) FROM aggs_1m

Re: Best way to handle BIGING to TIMESTAMP conversions

2021-02-28 Thread Jark Wu
Hi Sebastián, You can use `TO_TIMESTAMP(FROM_UNIXTIME(e))` to get a timestamp value. The BIGINT should be in seconds. Please note to declare the computed column in the DDL schema and declare a watermark strategy on this computed field to make the field a rowtime attribute. Because streaming o
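A minimal DDL sketch of that pattern (table and field names are hypothetical; connector options elided):

```sql
CREATE TABLE events (
  e BIGINT,                                      -- epoch seconds from the source
  ts AS TO_TIMESTAMP(FROM_UNIXTIME(e)),          -- computed column
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND   -- makes ts a rowtime attribute
) WITH ( ... );
```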

Re: LEAD/LAG functions

2021-02-01 Thread Jark Wu
Yes. RANK/ROW_NUMBER is not allowed with ROW/RANGE over window, i.e. the "ROWS BETWEEN 1 PRECEDING AND CURRENT ROW" clause. Best, Jark On Mon, 1 Feb 2021 at 22:06, Timo Walther wrote: > Hi Patrick, > > I could imagine that LEAD/LAG are translated into RANK/ROW_NUMBER > operations that are not s

Re: Conflicts between the JDBC and postgresql-cdc SQL connectors

2021-01-28 Thread Jark Wu
ctorformat-resources On Thu, 28 Jan 2021 at 17:28, Sebastián Magrí wrote: > Hi Jark! > > Please find the full pom file attached. > > Best Regards, > > On Thu, 28 Jan 2021 at 03:21, Jark Wu wrote: > >> Hi Sebastián, >> >> I think Dawid is right. >> &

Re: Converting non-mini-batch to mini-batch from checkpoint or savepoint

2021-01-27 Thread Jark Wu
Hi Rex, Currently, it is not state compatible, because we will add a new node called MiniBatchAssigner after the source, which changes the JobGraph and thus the uid is different. Best, Jark On Tue, 26 Jan 2021 at 18:33, Dawid Wysakowicz wrote: > I am pulling in Jark and Godfrey who are more familiar

Re: Conflicts between the JDBC and postgresql-cdc SQL connectors

2021-01-27 Thread Jark Wu
Hi Sebastián, I think Dawid is right. Could you share the pom file? I also tried to package flink-connector-postgres-cdc with ServicesResourceTransformer, and the Factory file contains com.alibaba.ververica.cdc.connectors.postgres.table.PostgreSQLTableFactory Best, Jark On Tue, 26 Jan 2021 a

Re: A few questions about minibatch

2021-01-27 Thread Jark Wu
Hi Rex, Could you share your query here? It would be helpful to identify the root cause if we have the query. 1) watermark The framework automatically adds a node (the MiniBatchAssigner) to generate watermark events as the mini-batch id to broadcast and trigger mini-batch in the pipeline. 2) Min

Re: [DISCUSS] Correct time-related function behavior in Flink SQL

2021-01-20 Thread Jark Wu
Great examples to understand the problem and the proposed changes, @Kurt! Thanks Leonard for investigating this problem. The time-zone problems around time functions and windows have bothered a lot of users. It's time to fix them! The return value changes sound reasonable to me, and keeping the r

Re: Flink SQL - IntervalJoin doesn't support consuming update and delete - trying to deduplicate rows

2021-01-13 Thread Jark Wu
Hi Dan, Sorry for the late reply. I guess you applied a "deduplication with keeping last row" before the interval join? That will produce an updating stream and interval join only supports append-only input. You can try to apply "deduplication with keeping *first* row" before the interval join. T
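The "keep first row" deduplication suggested here looks roughly like this (table and column names are hypothetical); ordering ascending on a time attribute and keeping rownum = 1 produces an append-only stream that the interval join accepts:

```sql
SELECT id, attr, ts
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts ASC) AS rownum
  FROM T
)
WHERE rownum = 1;  -- keep FIRST row per id => append-only output
```

Ordering by `ts DESC` instead ("keep last row") is what produces the updating stream the interval join rejects.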

Re: Statement Sets

2021-01-13 Thread Jark Wu
No. The Kafka reader will be shared, which means the Kafka data is only read once. On Tue, 12 Jan 2021 at 03:04, Aeden Jameson wrote: > When using statement sets, if two select queries use the same table > (e.g. Kafka Topic), does each query get its own copy of data? > > Thank you, > Aeden >

Re: Primary keys go missing after using TableFunction + leftOuterJoinLateral

2020-12-09 Thread Jark Wu
Could you use 4 scalar functions instead of the UDTF and map function? For example: select *, hasOrange(fruits), hasBanana(fruits), hasApple(fruits), hasWatermelon(fruits) from T; I think this can preserve the primary key. Best, Jark On Thu, 3 Dec 2020 at 15:28, Rex Fenley wrote: > It appears tha

Re: Error while connecting with MSSQL server

2020-12-07 Thread Jark Wu
Hi, Currently, flink-connector-jdbc doesn't support the MS SQL Server dialect. Only MySQL and Postgres are supported. Best, Jark On Tue, 8 Dec 2020 at 01:20, aj wrote: > Hello , > > I am trying to create a table with microsoft sql server using flink sql > > CREATE TABLE sampleSQLSink ( > id INTEG

Re: Partitioned tables in SQL client configuration.

2020-12-03 Thread Jark Wu
Only legacy connectors (`connector.type=kafka` instead of `connector=kafka`) are supported in the YAML at the moment. You can use regular DDL instead. There is a similar discussion in https://issues.apache.org/jira/browse/FLINK-20260 these days. Best, Jark On Thu, 3 Dec 2020 at 00:52, Till Rohrma

Re: Questions on Table to Datastream conversion and onTimer method in KeyedProcessFunction

2020-11-23 Thread Jark Wu
AFAIK, FLINK-10886 is not implemented yet. cc @Becket, who may know more about the plans for this feature. Best, Jark On Sat, 21 Nov 2020 at 03:46, wrote: > Hi Timo, > > One more question, the blog also mentioned a jira task to solve this > issue. https://issues.apache.org/jira/browse/FLINK-10886. Will thi

Re: Filesystem as a stream source in Table/SQL API

2020-11-22 Thread Jark Wu
acking > the filesystem streaming data source. > > On Mon, Nov 23, 2020 at 10:33 AM Jark Wu wrote: > >> Hi Kai, >> >> Streaming filesystem source is not supported yet in TableAPI/SQL. >> This is on the roadmap and there are some problems that need to be fixed. >>

Re: Filesystem as a stream source in Table/SQL API

2020-11-22 Thread Jark Wu
Hi Kai, Streaming filesystem source is not supported yet in TableAPI/SQL. This is on the roadmap and there are some problems that need to be fixed. As a workaround, you can use Hive connector to reading files continuously on filesystems [1]. Best, Jark [1]: https://ci.apache.org/projects/flink/f

Re: Force Join Unique Key

2020-11-21 Thread Jark Wu
avoid. >> >> What are the rules for the unique key and unique join key inference? >> Maybe we can reorganize our plan to allow it to infer unique keys more >> correctly. >> >> Thanks >> >> On Wed, Nov 18, 2020 at 9:50 PM Jark Wu wrote: >> >

Re: Filter Null in Array in SQL Connector

2020-11-19 Thread Jark Wu
quot;optional": true, "field": "xmin" } > ], > "optional": false, > "name": "io.debezium.connector.postgresql.Source", > "field": "source" > }, > { "type&qu

Re: Filter Null in Array in SQL Connector

2020-11-19 Thread Jark Wu
Hi Dylan, I think Rex encountered another issue, because he is using Kafka with Debezium format. Hi Rex, If you can share the json data and the exception stack, that would be helpful! Besides, you can try to enable 'debezium-json.ignore-parse-errors' option [1] to skip the dirty data. Best, Ja

Re: Force Join Unique Key

2020-11-18 Thread Jark Wu
oin? > > Thanks for the help so far! > > On Wed, Nov 18, 2020 at 6:30 PM Jark Wu wrote: > >> Actually, if there is no unique key, it's not O(1), because there maybe >> multiple rows are joined by the join key, i.e. iterate all the values in >> the MapState under th

Re: Force Join Unique Key

2020-11-18 Thread Jark Wu
effectively still be an O(1) lookup if the > join key is unique right? > > Also, I've been digging around the code to find where the lookup of rows > for a join key happens and haven't come across anything. Mind pointing me > in the right direction? > > Thanks! >

Re: Kafka SQL table Re-partition via Flink SQL

2020-11-18 Thread Jark Wu
Yes, it works with all the formats supported by the kafka connector. On Thu, 19 Nov 2020 at 10:18, Slim Bouguerra wrote: > Hi Jark > Thanks very much will this work with Avro > > On Tue, Nov 17, 2020 at 07:44 Jark Wu wrote: > >> Hi Slim, >> >> In 1.11, I thi

Re: Force Join Unique Key

2020-11-18 Thread Jark Wu
Views. Best, Jark On Wed, 18 Nov 2020 at 02:30, Rex Fenley wrote: > Ok, what are the performance consequences then of having a join with > NoUniqueKey if the left side's key actually is unique in practice? > > Thanks! > > > On Tue, Nov 17, 2020 at 7:35 AM Jark Wu wrote:

Re: Upsert UDFs

2020-11-18 Thread Jark Wu
he number of >> retracts we're seeing yet preserve the same end results, where our >> Elasticsearch documents stay up to date? >> >> On Sun, Nov 8, 2020 at 6:43 PM Jark Wu wrote: >> >>> Hi Rex, >>> >>> There is a similar question asked rec

Re: Kafka SQL table Re-partition via Flink SQL

2020-11-17 Thread Jark Wu
Hi Slim, In 1.11, I think you have to implement a custom FlinkKafkaPartitioner and set the class name to 'sink.partitioner' option. In 1.12, you can re-partition the data by specifying the key field (Kafka producer will partition data by the message key by default). You can do this by adding some
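A DDL sketch of the 1.12 approach (topic, field, and broker names are hypothetical); with 'key.fields' set, the Kafka producer partitions by the message key by default:

```sql
CREATE TABLE kafka_sink (
  user_id STRING,
  cnt BIGINT
) WITH (
  'connector' = 'kafka',
  'topic' = 'output_topic',                        -- hypothetical
  'properties.bootstrap.servers' = 'broker:9092',  -- hypothetical
  'key.fields' = 'user_id',    -- user_id becomes the Kafka message key
  'key.format' = 'raw',
  'value.format' = 'json'
);
```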

Re: Force Join Unique Key

2020-11-17 Thread Jark Wu
Hi Rex, Currently, the unique key is inferred by the optimizer. However, the inference is not perfect. There are known issues that the unique key is not derived correctly, e.g. FLINK-20036 (is this opened by you?). If you think you have the same case, please open an issue. Query hint is a nice wa

Re: FlinkSQL kafka->dedup->kafka

2020-11-12 Thread Jark Wu
> is this deduplication query the right fit therefore? > - if yes, how should it be written to generate an append-only stream? > - If not, are there other options? (Match Recognize, UDF, ?) > > Thanks a lot for your much appreciated help :). > > Best Regards, > >

Re: Stream aggregation using Flink Table API (Blink plan)

2020-11-12 Thread Jark Wu
ip_strategy" : "HASH", > "side" : "second" > } ] > }, { > "id" : 8, > "type" : "SinkConversionToTuple2", > "pact" : "Operator", > "contents" :

Re: FlinkSQL kafka->dedup->kafka

2020-11-11 Thread Jark Wu
? >3. When will release-1.12 be available? And when would it be >integrated in the Ververica platform? > > Thanks a lot for your help! > > Best Regards, > > Laurent. > > > > On Wed, 11 Nov 2020 at 03:31, Jark Wu wrote: > >> Hi Laurent, >> >&

Re: timestamp parsing in create table statement

2020-11-10 Thread Jark Wu
amp(ts, ...), but got the following error: > > Exception in thread "main" org.apache.flink.table.api.SqlParserException: > SQL parse failed. Encountered "(" at line > > looks like it complains about the second `(` in > create table t (... to_timestamp(...)...)

Re: timestamp parsing in create table statement

2020-11-10 Thread Jark Wu
Hi Fanbin, The example you gave is correct: create table t ( user_id string, action string, ts string, transform_ts_format(ts) as new_ts, watermark for new_ts as new_ts - interval '5' second ) with ( ... ) You can use "TO_TIMESTAMP" built-in function instead of the UDF, e.g. TO_TIMEST

Re: FlinkSQL kafka->dedup->kafka

2020-11-10 Thread Jark Wu
Hi Laurent, This is because the deduplicate node generates an updating stream, however Kafka currently only supports append-only stream. This can be addressed in release-1.12, because we introduce a new connector "upsert-kafka" which supports writing updating streams into Kafka compacted topics.
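A sketch of the upsert-kafka sink introduced in 1.12 (topic, field, and broker names are hypothetical); the PRIMARY KEY becomes the Kafka message key, and updates are written as upserts/tombstones into a compacted topic:

```sql
CREATE TABLE dedup_sink (
  id STRING,
  attr STRING,
  PRIMARY KEY (id) NOT ENFORCED    -- becomes the Kafka message key
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'dedup_topic',                         -- hypothetical, log-compacted
  'properties.bootstrap.servers' = 'broker:9092',  -- hypothetical
  'key.format' = 'json',
  'value.format' = 'json'
);
```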

Re: Stream aggregation using Flink Table API (Blink plan)

2020-11-09 Thread Jark Wu
Hi Felipe, The "Split Distinct Aggregation", i.e. the "table.optimizer.distinct-agg.split.enabled" option, only works for distinct aggregations (e.g. COUNT(DISTINCT ...)). However, the query in your example is using COUNT(driverId). You can update it to COUNT(DISTINCT driverId), and it should sh
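A minimal sketch of the suggested change (table and column names are hypothetical; the option key is the real Flink configuration name):

```sql
SET table.optimizer.distinct-agg.split.enabled=true;

-- COUNT(driverId) is NOT rewritten by the split optimization;
-- COUNT(DISTINCT driverId) is split into a two-level aggregation.
SELECT day_col, COUNT(DISTINCT driverId) AS drivers
FROM rides
GROUP BY day_col;
```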

Re: Upsert UDFs

2020-11-08 Thread Jark Wu
Hi Rex, There is a similar question asked recently which I think is the same reason [1] called retraction amplification. You can try to turn on the mini-batch optimization to reduce the retraction amplification. Best, Jark [1]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
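The mini-batch optimization mentioned here can be turned on like so (values are illustrative; the option keys are the real Flink configuration names):

```sql
SET table.exec.mini-batch.enabled=true;
SET table.exec.mini-batch.allow-latency=2s;  -- buffer input for up to 2 seconds
SET table.exec.mini-batch.size=5000;         -- or until 5000 records are buffered
```

Buffering lets the aggregation fold several changes to the same key into one output change, which reduces the retraction amplification downstream.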

Re: I have some interesting result with my test code

2020-11-05 Thread Jark Wu
Thanks though! I appreciate your help > > On Thu, Nov 5, 2020 at 4:55 AM Jark Wu wrote: > >> Hi Kevin, >> >> Could you share the code of how you register the FlinkKafkaConsumer as a >> table? >> >> Regarding your initialization of FlinkKafka

Re: A question about flink sql retreact stream

2020-11-05 Thread Jark Wu
lieve the ChangeLog is > simply Diff(V2, V1). > > Actually, there are a lot of intermediate changes during processing R1. > > Thanks! > > > > Jark Wu 于2020年11月5日周四 上午11:36写道: > >> Thanks Henry for the detailed example, >> >> I will explain why so m

Re: I have some interesting result with my test code

2020-11-04 Thread Jark Wu
Hi Kevin, Could you share the code of how you register the FlinkKafkaConsumer as a table? Regarding your initialization of FlinkKafkaConsumer, I would recommend calling setStartFromEarliest() to guarantee it consumes all the records in the partitions. Regarding the flush(), it seems it is in the foreach

Re: A question about flink sql retreact stream

2020-11-04 Thread Jark Wu
Thanks Henry for the detailed example, I will explain why there are so many records at time 5. That is because the retraction mechanism is triggered per record in Flink SQL, so there is record amplification in your case. At time 5, the LAST_VALUE aggregation for stream a will first emit -(1, 12345, 0) and t

Re: Multi-stream SQL-like processing

2020-11-04 Thread Jark Wu
g of KConnect). SQL will not be it, it will > probably be barely a way to implement the tool. > > I would appreciate your comments, Jark. > Also if anyone would like to add other ideas, feel welcome! > > Best, > Krzysztof > > śr., 4 lis 2020 o 09:37 Jark Wu napisał(a): &g

Re: Multi-stream SQL-like processing

2020-11-04 Thread Jark Wu
Hi Krzysztof, This is a very interesting idea. I think SQL is not a suitable tool for this use case, because SQL is a structured query language where the table schema is fixed and never changes during job running. However, I think it can be a configuration tool project on top of Flink SQL. The

Re: How to understand NOW() in SQL when using Table & SQL API to develop a streaming app?

2020-10-28 Thread Jark Wu
for the clarification. This improvement would be helpful, I >> believe. >> >> Cheers, >> Till >> >> On Tue, Oct 27, 2020 at 1:19 PM Jark Wu wrote: >> >>> Hi Till, >>> >>> The documentation mentions that "this function

Re: How to understand NOW() in SQL when using Table & SQL API to develop a streaming app?

2020-10-27 Thread Jark Wu
On Tue, 27 Oct 2020 at 15:59, Till Rohrmann wrote: > Quick question Jark: Is this difference in behaviour documented? I > couldn't find it in the docs. > > Cheers, > Till > > On Tue, Oct 27, 2020 at 7:30 AM Jark Wu wrote: > >> Hi Longdexin, >> >> In

Re: ValidationException using DataTypeHint in Scalar Function

2020-10-26 Thread Jark Wu
Hi Steve, Thanks for reaching out to the Flink community. I am pulling in Timo who might be able to help you with this question. Best, Jark On Mon, 26 Oct 2020 at 23:10, Steve Whelan wrote: > Hi, > > I have a column of type *RAW('java.util.Map', ?)* that I want to pass to > a scalar function

Re: How to understand NOW() in SQL when using Table & SQL API to develop a streaming app?

2020-10-26 Thread Jark Wu
Hi Longdexin, In traditional batch SQL, NOW() is evaluated and fixed before the job is submitted and will not change for every processed record. However, this doesn't make much sense in streaming SQL; therefore, the NOW() function in Flink is evaluated for every record. Best, Jark On Fri, 23 Oct

Re: autoCommit for postgres jdbc streaming in Table/SQL API

2020-10-09 Thread Jark Wu
f true. That > would conflict with the 1.11 patch you suggested. Let me know if you think > I should make the default true in the SQL API. > > > > https://github.com/apache/flink/pull/13570 > > > > Regards, > > Dylan > > > > *From: *Jark Wu > *Date: *Th

Re: autoCommit for postgres jdbc streaming in Table/SQL API

2020-10-08 Thread Jark Wu
Hi Dylan, Sorry for the late reply. We've just come back from a long holiday. Thanks for reporting this problem. First, I think this is a bug that `autoCommit` is false by default (JdbcRowDataInputFormat.Builder). We can fix the default to true in 1.11 series, and I think this can solve your prob

Re: [DISCUSS] Move Hive document to "Table & SQL Connectors" from "Table API & SQL"

2020-09-23 Thread Jark Wu
+1 to move it there. On Thu, 24 Sep 2020 at 12:16, Jingsong Li wrote: > Hi devs and users: > > After the 1.11 release, I heard some voices recently: How can't Hive's > documents be found in the "Table & SQL Connectors". > > Actually, Hive's documents are in the "Table API & SQL". Since the "Tabl

Re: Flink Table SQL and Job Names

2020-09-20 Thread Jark Wu
Hi Dan, Supporting customized job names is a nice feature, we have an issue [1] to track this effort. Please feel free to leave your thoughts there. Best, Jark [1]: https://issues.apache.org/jira/browse/FLINK-18545 On Sun, 20 Sep 2020 at 14:06, Dan Hill wrote: > I'm getting names like "insert

Re: Flink SQL - can I have multiple outputs per job?

2020-09-20 Thread Jark Wu
You got it :) On Sun, 20 Sep 2020 at 12:59, Dan Hill wrote: > I figured it out. TableEnvironment.StatementSet. > > Semi-related, query optimizers can mess up the reuse depending on which > tables the join IDs come from. > > > > > > > On Fri, Sep 18, 2020 at 9:40 PM Dan Hill wrote: > >> I have

Re: Debezium Flink EMR

2020-08-27 Thread Jark Wu
DDL? > tableEnv.executeSql(""" > CREATE TABLE ESAddresses ( > id INT, > customer_id INT, > street STRING, > city STRING, > state STRING, > zip STRING, > type STRING, > PRIMARY KEY (id) NOT ENFORCED > ) WITH ( > 'connector' = 'elasticsearc

Re: Async IO with SQL API

2020-08-27 Thread Jark Wu
Hi, Sorry for the late reply. AFAIK, it's impossible to do Async IO on pure Table API / SQL in 1.9 old planner. A doable way is convert the Table into DataStream and apply AsyncFunction on it. Best, Jark On Thu, 20 Aug 2020 at 00:35, Spurthi Chaganti wrote: > Thank you Till for your response.

Re: Debezium Flink EMR

2020-08-27 Thread Jark Wu
se. >> >> About performance, I'm summoning Kurt and @Jark Wu to >> the thread, who will be able to give you a more complete answer and likely >> also some optimization tips for your specific use case. >> >> Marta >> >> On Fri, Aug 21, 2020 at 8:55

Re: [Survey] Demand collection for stream SQL window join

2020-08-27 Thread Jark Wu
Thanks for the survey! I'm also interested in the use cases of DataStream window join. Best, Jark On Thu, 27 Aug 2020 at 14:40, Danny Chan wrote: > Hi, users, here i want to collect some use cases about the window join[1], > which is a supported feature on the data stream. The purpose is to ma

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Jark Wu
Congratulations Dian! Best, Jark On Thu, 27 Aug 2020 at 19:37, Leonard Xu wrote: > Congrats, Dian! Well deserved. > > Best > Leonard > > > 在 2020年8月27日,19:34,Kurt Young 写道: > > > > Congratulations Dian! > > > > Best, > > Kurt > > > > > > On Thu, Aug 27, 2020 at 7:28 PM Rui Li wrote: > > > >>

Re: [SQL DDL] How to extract timestamps from Kafka message's metadata

2020-08-11 Thread Jark Wu
You can also watch this issue: https://issues.apache.org/jira/browse/FLINK-15869 On Tue, 11 Aug 2020 at 16:08, Dongwon Kim wrote: > Hi Timo, > > Thanks for your input. > We've been considering that as well, but this time I just wanted to solely > use TableEnvironment without DataStream APIs. > >

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-07 Thread Jark Wu
I'm +1 to add HBase 2.x. However, I have some concerns about moving HBase 1.x to Bahir: 1) As discussed above, there are still lots of people using HBase 1.x. 2) Bahir doesn't have the infrastructure to run the existing HBase E2E tests. 3) We also put a lot of effort into providing an uber connector ja

Re: Unexpected unnamed sink in SQL job

2020-08-06 Thread Jark Wu
If there is a "Sink: Unnamed" operator using pure SQL, I think we should improve this to give a meaningful operator name. On Tue, 4 Aug 2020 at 21:39, godfrey he wrote: > I think we assign a meaningful name to sink Transformation > like other Transformations in StreamExecLegacySink/BatchExecLe

Re: Re: How to retain the column'name when convert a Table to DataStream

2020-07-31 Thread Jark Wu
Hi, For now, you can explicitly set the RowTypeInfo to retain the field names. This works in the master branch: val t1Stream = t1.toAppendStream[Row](t1.getSchema.toRowType) // t1 stream schema: Row(a: Integer, b: Integer) println(s"t1 stream schema: ${t1Stream.getType()}") tEnv.reg

Re: improve the performance of flink sql job which lookup 40+ table

2020-07-27 Thread Jark Wu
Hi, Yes, currently multiple lookup joins are not parallel and execute one by one. Async lookup + cache is the suggested way to improve performance. If the lookup tables are not large, you can also implement an ALL cache for the LookupTableSource to cache all the data in the database and reload peri
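The ALL-cache idea can be sketched like this (a hypothetical illustration, not the actual LookupTableSource interface): load the whole dimension table into memory once and reload it on a fixed interval, so the per-record lookup never touches the database.

```python
import time

class AllCache:
    """Hypothetical ALL cache: full snapshot in memory, periodic reload."""
    def __init__(self, loader, reload_interval_s):
        self._loader = loader              # callable returning {key: row}
        self._interval = reload_interval_s
        self._data = loader()              # initial full load
        self._loaded_at = time.monotonic()

    def lookup(self, key):
        if time.monotonic() - self._loaded_at >= self._interval:
            self._data = self._loader()    # periodic full reload
            self._loaded_at = time.monotonic()
        return self._data.get(key)

db = {1: "Alice"}                          # stands in for the database
cache = AllCache(lambda: dict(db), reload_interval_s=0.05)
print(cache.lookup(1))   # served from memory
db[2] = "Bob"            # the database changes...
print(cache.lookup(2))   # not visible until the next reload
time.sleep(0.06)
print(cache.lookup(2))   # picked up by the periodic reload
```

The trade-off is the usual one: lookups are O(1) and never block on I/O, but results can be stale by up to one reload interval.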

Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Jark Wu
Congratulations! Thanks Dian for the great work and to be the release manager! Best, Jark On Wed, 22 Jul 2020 at 15:45, Yangze Guo wrote: > Congrats! > > Thanks Dian Fu for being release manager, and everyone involved! > > Best, > Yangze Guo > > On Wed, Jul 22, 2020 at 3:14 PM Wei Zhong wrote:

Re: Flink SQL - Join Lookup Table

2020-07-21 Thread Jark Wu
Hi Kelly, As a simple workaround, you can remove the watermark definition in `KafkaStream`; in this way, the stream-stream join will not raise the "Rowtime attributes" exception. Best, Jark On Wed, 22 Jul 2020 at 03:13, Kelly Smith wrote: > Thanks Leonard and Danny, > > > > This makes a lot of

Re: How to use a nested column for CREATE TABLE PARTITIONED BY

2020-07-21 Thread Jark Wu
Hi Dongwon, I think this is a bug in the Filesystem connector which doesn't exclude the computed columns when building the TableSource. I created an issue [1] to track this problem. Best, Jark [1]: https://issues.apache.org/jira/browse/FLINK-18665 On Tue, 21 Jul 2020 at 17:31, Dongwon Kim wrot

Re: Flink 1.11 test Parquet sink

2020-07-14 Thread Jark Wu
I think this might be a bug in `tableEnv.fromValues`. Could you try to remove the DataType parameter, and let the framework derive the types? final Table inputTable = tableEnv.fromValues( Row.of(1L, "Hello"), // Row.of(2L, "Hello"), // Row.of(3L, ""), // Row.of(4L,

Re: Table API jobs migration to Flink 1.11

2020-07-13 Thread Jark Wu
peWithKryoSerializer(DateTime.class, > JodaDateTimeSerializer.class); > > Best, > Flavio > > On Mon, Jul 13, 2020 at 3:16 PM Jark Wu wrote: > >> Hi Flavio, >> >> tableEnv.registerTableSource is deprecated in order to migrate to use DDL >> and the new conne

Re: Table API jobs migration to Flink 1.11

2020-07-13 Thread Jark Wu
tableSource); > > The javadoc says to use executeSql but this requires some extra steps > (that are not mentioned in the documentation). > Do I have to create a TableFactory, right? How do I register it? Is there > an example somewhere? > > On Mon, Jul 13, 2020 at 2:28 PM Ja

Re: Table API jobs migration to Flink 1.11

2020-07-13 Thread Jark Wu
gal context". > > Thanks a lot for the support, > Flavio > > On Mon, Jul 13, 2020 at 1:41 PM Jark Wu wrote: > >> A typo of "INSERTO"? Try this? >> >> tableEnv.executeSql("INSERT INTO `out` SELECT * FROM MySourceDataset"); >> >&

Re: Table API jobs migration to Flink 1.11

2020-07-13 Thread Jark Wu
A typo of "INSERTO"? Try this? tableEnv.executeSql("INSERT INTO `out` SELECT * FROM MySourceDataset"); Best, Jark On Mon, 13 Jul 2020 at 18:25, Flavio Pompermaier wrote: > Now I'm able to run my code but there's something I don't understand: what > is the difference between the following two?

Re: Flink 1.11 Table API cannot process Avro

2020-07-12 Thread Jark Wu
From the latest exception message, it seems that the avro factory problem has been resolved. The new exception indicates that you don't have proper Apache Avro dependencies (because flink-avro doesn't bundle Apache Avro), so you have to add Apache Avro into your project dependency, or export HADOO

Re: [ANNOUNCE] Apache Flink 1.11.0 released

2020-07-07 Thread Jark Wu
Congratulations! Thanks Zhijiang and Piotr for the great work as release manager, and thanks everyone who makes the release possible! Best, Jark On Wed, 8 Jul 2020 at 10:12, Paul Lam wrote: > Finally! Thanks for Piotr and Zhijiang being the release managers, and > everyone that contributed to t

Re: Re: Flip-105 can the debezium/canal SQL sink to database directly?

2020-07-01 Thread Jark Wu
I have created an issue [1] and a pull request to fix this. Hope we can catch up with this release. Best, Jark [1]: https://issues.apache.org/jira/browse/FLINK-18461 On Wed, 1 Jul 2020 at 18:16, Jingsong Li wrote: > CC: @Jark Wu and @Timo Walther > > Best, > Jingsong > > O

Re: two phase aggregation

2020-06-22 Thread Jark Wu
reduce pressure of the global operator. Best, Jark On Tue, 23 Jun 2020 at 13:09, Fanbin Bu wrote: > Jark, > thanks for the reply. Do you know whether it's on the roadmap or what's > the plan? > > On Mon, Jun 22, 2020 at 9:36 PM Jark Wu wrote: > >> Hi Fanbin,

Re: two phase aggregation

2020-06-22 Thread Jark Wu
Hi Fanbin, Currently, over window aggregation doesn't support two-phase optimization. Best, Jark On Tue, 23 Jun 2020 at 12:14, Fanbin Bu wrote: > Hi, > > Does over window aggregation support two-phase mode? > > https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/config.html#table-
