Re: [DISCUSS] FLIP-188: Introduce Built-in Dynamic Table Storage
Hi Ingo, Really appreciate your feedback. #1. The reason why we insist on using no "connector" option is that we want to bring the following design to users: - With the "connector" option, it is a mapping, unmanaged table. - Without the "connector" option, it is a managed table. It may be an Iceberg managed table, or may be a JDBC managed table, or may be a Flink managed table. #2. About: CREATE TABLE T (f0 INT); ALTER TABLE T SET ('connector' = '…'); I think it is dangerous, even for a generic table. The managed table should prohibit it. #3. DDL and Table API You are right, the Table API should be a superset of SQL. There is no doubt that it should support BDT. Best, Jingsong On Mon, Oct 25, 2021 at 2:18 PM Ingo Bürk wrote: > > Hi Jingsong, > > thanks again for the answers. I think requiring catalogs to implement an > interface to support BDTs is something we'll need (though personally I > still prefer explicit DDL here over the "no connector option" approach). > > What about more edge cases like > > CREATE TABLE T (f0 INT); > ALTER TABLE T SET ('connector' = '…'); > > This would have to first create the physical storage and then delete it > again, right? > > On a separate note, the FLIP currently only discusses SQL DDL, and you have > also mentioned > > > BDT only can be dropped by Flink SQL DDL now. > > Something Flink suffers from a lot is inconsistencies across APIs. I think > it is important that we support features on all major APIs, i.e. including > Table API. > For example for creating a BDT this would mean e.g. adding something like > #forManaged(…) to TableDescriptor. > > > Best > Ingo > > On Mon, Oct 25, 2021 at 5:27 AM Jingsong Li wrote: > > > Hi Ingo, > > > > I thought again. > > > > I'll try to sort out the current catalog behaviors. > > Actually, we can divide catalogs into three categories: > > > > 1. ExternalCatalog: it can only read or create a single table kind > > which connects to external storage. TableFactory is provided by > > Catalog, which can have nothing to do with Flink's Factory discovery > > mechanism, such as IcebergCatalog, JdbcCatalog, PostgresCatalog, etc. > > Catalog manages the life cycle of its **managed** tables, which means > > that creation and drop will affect the real physical storage. The DDL > > has no "connector" option. > > > > 2. GenericCatalog (or FlinkCatalog): only Flink tables are saved and > > factories are created through Flink's factory discovery mechanism. At > > this time, the catalog is actually only a storage medium for saving > > schema and options, such as GenericInMemoryCatalog. Catalog only saves > > meta information and does not manage the underlying physical storage > > of tables. These tables are **unmanaged**. The DDL must have a > > "connector" option. > > > > 3. HybridCatalog: It can save both its own **managed** table and > > generic Flink **unmanaged** table, such as HiveCatalog. > > > > We want to use the "connector" option to distinguish whether it is > > managed or not. > > > > Now, consider the Flink managed table in this FLIP. > > a. ExternalCatalog cannot support Flink managed tables. > > b. GenericCatalog can support Flink managed tables without the > > "connector" option. > > c. What about HybridCatalog (HiveCatalog)? Yes, we want HiveCatalog to > > support Flink managed tables: > > - with "connector" option in Flink dialect is unmanaged tables > > - Hive DDL in Hive dialect is Hive managed tables, the parser will add > > "connector = hive" automatically. 
At present, there are many > > differences between Flink DDL and Hive DDL, and even their features > > have many differences. > > - without "connector" option in Flink dialect is Flink managed tables. > > > > In this way, we can support Flink managed tables while maintaining > > compatibility. > > > > Anyway, we need to introduce a "SupportsFlinkManagedTable" to catalog. > > > > ## Back to your question # > > > > > but we should make it clear that this is a limitation and probably > > document how users can clean up the underlying physical storage manually in > > this case > > > > Yes, it's strange that the catalog should manage tables, but some > > catalogs don't have this ability. > > - For PersistentCatalog, the metadata will remain until the underlying > > physical storage is deleted. > > - For InMemoryCatalog, yes, we should document it for the underlying > > physical storage of Flink managed tables. > > > > > the HiveCatalog doesn't list a 'connector' option for its tables. > > > > Actually, it can be divided into two steps: create and save: > > - When creating a table, the table seen by HiveCatalog must have > > "connector = hive", which is the hive table (Hive managed table). You > > can see the "HiveCatalog.isHiveTable". > > - When saving the table, it will remove the connector of the hive > > table. We can do this: with "connector" option is Flink generic table, > > without "connector" option is Hive table, with "flink-managed = true" > >
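As a rough Table API illustration of the managed/unmanaged split discussed above (a sketch only: TableDescriptor.forConnector exists in current Flink, while the forManaged builder is just the hypothetical API suggested in the thread and is not part of any release):

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.TableDescriptor;

public class ManagedVsUnmanagedSketch {
    public static void main(String[] args) {
        // Unmanaged (mapping) table: today the 'connector' option is mandatory.
        TableDescriptor unmanaged =
                TableDescriptor.forConnector("kafka")
                        .schema(Schema.newBuilder().column("f0", DataTypes.INT()).build())
                        .option("topic", "t")
                        .build();
        System.out.println(unmanaged);

        // Hypothetical managed-table builder from the discussion (does not exist yet):
        // TableDescriptor managed =
        //         TableDescriptor.forManaged()
        //                 .schema(Schema.newBuilder().column("f0", DataTypes.INT()).build())
        //                 .build();
    }
}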
[jira] [Created] (FLINK-24628) Invalid JDBC query template when no fields are selected
Paul Lin created FLINK-24628: Summary: Invalid JDBC query template when no fields are selected Key: FLINK-24628 URL: https://issues.apache.org/jira/browse/FLINK-24628 Project: Flink Issue Type: Bug Components: Connectors / JDBC Affects Versions: 1.12.3 Reporter: Paul Lin A query like `select uuid() from mysql_table` will result in an invalid query template like `select from mysql_table` in JdbcDynamicTableSource. We should avoid making a TableScan when no relevant fields are actually used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
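A hedged reproduction sketch of the report (the table schema and connection options below are placeholders, not taken from the report):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class Flink24628Repro {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical JDBC table; the connection options are placeholders.
        tEnv.executeSql(
                "CREATE TABLE mysql_table (id INT, name STRING) WITH ("
                        + " 'connector' = 'jdbc',"
                        + " 'url' = 'jdbc:mysql://localhost:3306/test',"
                        + " 'table-name' = 't',"
                        + " 'username' = 'user',"
                        + " 'password' = 'pass')");

        // No physical column of mysql_table is referenced, so the pushed-down projection
        // is empty and the generated query template has no select list ("select from ...").
        tEnv.executeSql("SELECT UUID() FROM mysql_table").print();
    }
}
{code}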
Re: [DISCUSS] Should we drop Row SerializationSchema/DeserializationSchema?
Last reminder: unless there are any objections, I will proceed with deprecating these by the end of the week. On Thu, Oct 21, 2021 at 4:28 PM Konstantin Knauf wrote: > +1 for deprecating and then dropping them. > > On Thu, Oct 21, 2021 at 3:31 PM Timo Walther wrote: > > > Hi Francesco, > > > > thanks for starting this discussion. It is definitely time to clean up > > more connectors and formats that were used for the old planner but are > > actually not intended for the DataStream API. > > > > +1 for deprecating and dropping the mentioned formats. Users can either > > use Table API or implement a custom > > SerializationSchema/DeserializationSchema according to their needs. It > > is actually not that complicated to add Jackson and configure the > > ObjectMapper for reading JSON/CSV. > > > > Regards, > > Timo > > > > > > On 18.10.21 17:42, Francesco Guardiani wrote: > > > Hi all, > > > In flink-avro, flink-csv and flink-json we have implementations of > > > SerializationSchema/DeserializationSchema for the > > org.apache.flink.types.Row > > > type. In particular, I'm referring to: > > > > > > - org.apache.flink.formats.json.JsonRowSerializationSchema > > > - org.apache.flink.formats.json.JsonRowDeserializationSchema > > > - org.apache.flink.formats.avro.AvroRowSerializationSchema > > > - org.apache.flink.formats.avro.AvroRowDeserializationSchema > > > - org.apache.flink.formats.csv.CsvRowDeserializationSchema > > > - org.apache.flink.formats.csv.CsvRowSerializationSchema > > > > > > These classes were used in the old table planner, but now the table > > planner > > > doesn't use the Row type internally anymore, so these classes are > unused > > > from the flink-table packages. > > > > > > Because these classes are exposed (some have @PublicEvolving > annotation) > > > there might be some users out there using them when using the > DataStream > > > APIs, for example to convert an input stream of JSON from Kafka to a > Row > > > instance. > > > > > > Do you have any opinions about deprecating these classes in 1.15 and > then > > > dropping them in 1.16? Or are you using them? If yes, can you describe your > > use > > > case? > > > > > > Thank you, > > > > > > FG > > > > > > > > > -- > > Konstantin Knauf > > https://twitter.com/snntrable > > https://github.com/knaufk >
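For DataStream users affected by this, a minimal sketch of the Jackson-based workaround Timo mentions (assuming a Jackson dependency on the classpath; an illustration, not an official replacement for the deprecated classes):

import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

/** Deserializes UTF-8 encoded JSON bytes into Jackson's JsonNode tree model. */
public class JsonNodeDeserializationSchema implements DeserializationSchema<JsonNode> {

    private transient ObjectMapper mapper;

    @Override
    public void open(InitializationContext context) {
        // ObjectMapper is not serializable, so create it lazily on the task side.
        mapper = new ObjectMapper();
    }

    @Override
    public JsonNode deserialize(byte[] message) throws IOException {
        return mapper.readTree(message);
    }

    @Override
    public boolean isEndOfStream(JsonNode nextElement) {
        return false; // unbounded source: never signal end of stream
    }

    @Override
    public TypeInformation<JsonNode> getProducedType() {
        return TypeInformation.of(JsonNode.class);
    }
}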
[jira] [Created] (FLINK-24629) Sorting start-time/duration/end-time not working under pages of vertex taskManagers and sub-tasks
Junhan Yang created FLINK-24629: --- Summary: Sorting start-time/duration/end-time not working under pages of vertex taskManagers and sub-tasks Key: FLINK-24629 URL: https://issues.apache.org/jira/browse/FLINK-24629 Project: Flink Issue Type: Bug Reporter: Junhan Yang Attachments: image-2021-10-25-16-23-55-894.png Based on the definitions of the `VertexTaskManagerDetailInterface` and `JobSubTaskInterface` interfaces, the sort keys for start-time, duration and end-time are incorrectly specified as `details.XXX`. !image-2021-10-25-16-23-55-894.png|width=552,height=73! -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] FLIP-176: Unified Iteration to Support Algorithms (Flink ML)
Excellent work to support iteration for Flink. +1 (binding) Best regards, Yunfeng On Sat, Oct 23, 2021 at 12:27 AM Till Rohrmann wrote: > Thanks for creating this FLIP. > > +1 (binding) > > Cheers, > Till > > > On Fri, Oct 22, 2021 at 6:29 AM Guowei Ma wrote: > > > +1 (binding) > > > > Best, > > Guowei > > > > > > On Thu, Oct 21, 2021 at 3:58 PM Yun Gao > > wrote: > > > > > > > > Hi all, > > > > > > We would like to start the vote for FLIP-176: Unified Iteration to > > Support > > > Algorithms (Flink ML) [1]. > > > This FLIP was discussed in this thread [2][3]. The FLIP-176 targets at > > > implementing the iteration > > > API in flink-ml to support the implementation of the algorithms. > > > > > > The vote will be open for at least 72 hours till 26th Oct morning, > > > including the weekend. Very thanks! > > > > > > Best, > > > Yun > > > > > > [1] https://cwiki.apache.org/confluence/x/hAEBCw > > > [2] > > > > > > https://lists.apache.org/thread.html/r72e87a71b14baac3d7d16268346f5fc7c65f1de989e00b4ab2aab9ab%40%3Cdev.flink.apache.org%3E > > > [3] > > > > > > https://lists.apache.org/thread.html/r63914a616de05a91dbe8e1a3208eb2b7c7c840c5c366bbd224483754%40%3Cdev.flink.apache.org%3E > > >
Re: [VOTE] FLIP-176: Unified Iteration to Support Algorithms (Flink ML)
Sorry that I misunderstood the usage of "binding". I am not a Flink committer so my vote should be a non-binding one. Best, Yunfeng On Mon, Oct 25, 2021 at 4:33 PM Yunfeng Zhou wrote: > Excellent work to support iteration for Flink. > > +1 (binding) > > Best regards, > Yunfeng > > On Sat, Oct 23, 2021 at 12:27 AM Till Rohrmann > wrote: > >> Thanks for creating this FLIP. >> >> +1 (binding) >> >> Cheers, >> Till >> >> >> On Fri, Oct 22, 2021 at 6:29 AM Guowei Ma wrote: >> >> > +1 (binding) >> > >> > Best, >> > Guowei >> > >> > >> > On Thu, Oct 21, 2021 at 3:58 PM Yun Gao >> > wrote: >> > >> > > >> > > Hi all, >> > > >> > > We would like to start the vote for FLIP-176: Unified Iteration to >> > Support >> > > Algorithms (Flink ML) [1]. >> > > This FLIP was discussed in this thread [2][3]. The FLIP-176 targets at >> > > implementing the iteration >> > > API in flink-ml to support the implementation of the algorithms. >> > > >> > > The vote will be open for at least 72 hours till 26th Oct morning, >> > > including the weekend. Very thanks! >> > > >> > > Best, >> > > Yun >> > > >> > > [1] https://cwiki.apache.org/confluence/x/hAEBCw >> > > [2] >> > > >> > >> https://lists.apache.org/thread.html/r72e87a71b14baac3d7d16268346f5fc7c65f1de989e00b4ab2aab9ab%40%3Cdev.flink.apache.org%3E >> > > [3] >> > > >> > >> https://lists.apache.org/thread.html/r63914a616de05a91dbe8e1a3208eb2b7c7c840c5c366bbd224483754%40%3Cdev.flink.apache.org%3E >> > >> >
[jira] [Created] (FLINK-24630) Implement row projection in AvroToRowDataConverters#createRowConverter
Caizhi Weng created FLINK-24630: --- Summary: Implement row projection in AvroToRowDataConverters#createRowConverter Key: FLINK-24630 URL: https://issues.apache.org/jira/browse/FLINK-24630 Project: Flink Issue Type: Improvement Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.15.0 Reporter: Caizhi Weng Currently {{AvroToRowDataConverters#createRowConverter}} converts Avro records to Flink row data directly without any projection. However, users of this method, such as {{AvroFileSystemFormatFactory.RowDataAvroInputFormat}}, need to implement their own projection logic to filter out the columns they need. We can move the projection logic into {{AvroToRowDataConverters#createRowConverter}} both for optimization (users would not need to copy the row) and for ease of coding. -- This message was sent by Atlassian Jira (v8.3.4#803005)
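For context, the post-projection that callers have to implement today looks roughly like the sketch below (the helper and field indices are illustrative only, not existing Flink API):

{code:java}
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;

/** Illustrative helper: copy only the selected positions out of a fully converted row. */
public final class RowProjection {

    private RowProjection() {}

    public static RowData project(GenericRowData fullRow, int[] selectedFields) {
        GenericRowData projected = new GenericRowData(selectedFields.length);
        projected.setRowKind(fullRow.getRowKind());
        for (int i = 0; i < selectedFields.length; i++) {
            // Field i of the projection comes from position selectedFields[i] of the source row.
            projected.setField(i, fullRow.getField(selectedFields[i]));
        }
        return projected;
    }
}
{code}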
Re: [VOTE] FLIP-176: Unified Iteration to Support Algorithms (Flink ML)
+1 (binding) Thanks for the FLIP. Jiangjie (Becket) Qin On Mon, Oct 25, 2021 at 4:45 PM Yunfeng Zhou wrote: > Sorry that I misunderstood the usage of "binding". I am not a Flink > committer so my vote should be a non-binding one. > > Best, > Yunfeng > > On Mon, Oct 25, 2021 at 4:33 PM Yunfeng Zhou > wrote: > > > Excellent work to support iteration for Flink. > > > > +1 (binding) > > > > Best regards, > > Yunfeng > > > > On Sat, Oct 23, 2021 at 12:27 AM Till Rohrmann > > wrote: > > > >> Thanks for creating this FLIP. > >> > >> +1 (binding) > >> > >> Cheers, > >> Till > >> > >> > >> On Fri, Oct 22, 2021 at 6:29 AM Guowei Ma wrote: > >> > >> > +1 (binding) > >> > > >> > Best, > >> > Guowei > >> > > >> > > >> > On Thu, Oct 21, 2021 at 3:58 PM Yun Gao > > >> > wrote: > >> > > >> > > > >> > > Hi all, > >> > > > >> > > We would like to start the vote for FLIP-176: Unified Iteration to > >> > Support > >> > > Algorithms (Flink ML) [1]. > >> > > This FLIP was discussed in this thread [2][3]. The FLIP-176 targets > at > >> > > implementing the iteration > >> > > API in flink-ml to support the implementation of the algorithms. > >> > > > >> > > The vote will be open for at least 72 hours till 26th Oct morning, > >> > > including the weekend. Very thanks! > >> > > > >> > > Best, > >> > > Yun > >> > > > >> > > [1] https://cwiki.apache.org/confluence/x/hAEBCw > >> > > [2] > >> > > > >> > > >> > https://lists.apache.org/thread.html/r72e87a71b14baac3d7d16268346f5fc7c65f1de989e00b4ab2aab9ab%40%3Cdev.flink.apache.org%3E > >> > > [3] > >> > > > >> > > >> > https://lists.apache.org/thread.html/r63914a616de05a91dbe8e1a3208eb2b7c7c840c5c366bbd224483754%40%3Cdev.flink.apache.org%3E > >> > > >> > > >
[jira] [Created] (FLINK-24631) Avoiding directly use the labels as selector for deployment and service
Aitozi created FLINK-24631: -- Summary: Avoiding directly use the labels as selector for deployment and service Key: FLINK-24631 URL: https://issues.apache.org/jira/browse/FLINK-24631 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.14.0 Reporter: Aitozi We currently create the deployment with the pod selector taken directly from the labels, which is not necessary and may cause problems when a user-supplied label value changes (for example, when changed by another system). I think it's better to use a minimal and stable selector, such as {{app=xxx, component=jobmanager}}, to select the JobManager pods, and likewise for the service, the TaskManager pods, and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24632) "where ... in(..)" has wrong result
xuyang created FLINK-24632: -- Summary: "where ... in(..)" has wrong result Key: FLINK-24632 URL: https://issues.apache.org/jira/browse/FLINK-24632 Project: Flink Issue Type: Bug Reporter: xuyang The SQL is: {code:java} // code placeholder CREATE TABLE a( a1 INT , a2 INT ) WITH ( 'connector' = 'filesystem', 'path' = '/Users/tmp/test/testa.csv', 'format' = 'csv', 'csv.field-delimiter'=';') CREATE TABLE b( b1 INT , b2 INT ) WITH ( 'connector' = 'filesystem', 'path' = '/Users/tmp/test/testb.csv', 'format' = 'csv', 'csv.field-delimiter'=';') select * from a where a.a1 in (select a1 from b where a.a1 = b.b2) {code} and the data is: {code:java} // testa.csv 1;1 1;2 4;6 77;88 // testb.csv 2;1 2;2 3;4{code} The result in PostgreSQL is: {code:java} // code placeholder 1 1 1 2 4 6{code} But in Flink, the result is: {code:java} // code placeholder 1 2 1 1 4 6 77 88{code} I think something is going wrong here. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24633) JobManager pod may stuck in containerCreating status during failover
Aitozi created FLINK-24633: -- Summary: JobManager pod may stuck in containerCreating status during failover Key: FLINK-24633 URL: https://issues.apache.org/jira/browse/FLINK-24633 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.14.0 Reporter: Aitozi -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24634) Java 11 profile should target JDK 8
Chesnay Schepler created FLINK-24634: Summary: Java 11 profile should target JDK 8 Key: FLINK-24634 URL: https://issues.apache.org/jira/browse/FLINK-24634 Project: Flink Issue Type: Technical Debt Components: Build System Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.15.0 The {{java11}} profile currently targets Java 11. This was useful because we saw that doing so reveals additional issues that are not detected when building for Java 8. The end goal was to ensure a smooth transition once we switch. However, this has adverse effects on developer productivity. If you happen to switch between Java versions (for example, because of separate environments, or because certain features require Java 8), then you can easily run into UnsupportedClassVersionErrors when attempting to run Java 11 bytecode on Java 8. IntelliJ also picks up on this and automatically sets the language level to 11, which means that it will readily allow you to use Java 11 exclusive APIs that will fail on CI later on. To remedy this I propose to split the profile. The {{java11}} profile will pretty much stay as is, except that it will target Java 8. The value proposition of this profile is being able to build Flink for Java 8 with Java 11. A new, explicitly opt-in {{java11-target}} profile then sets the target version to Java 11, which we will use on CI. This profile will ensure that we can readily switch to Java 11 as the target in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[Discuss] Planning Flink 1.15
Hi all, As people have already started working on their 1.15 contributions, we'd like to start the discussion for the release setup. - Release managers: As a team of three seems to have worked perfectly fine, we'd like to suggest Till, Yun Gao & Joe as the release managers for 1.15. - Timeline: 1.14 was released at the end of September and aiming for a 4-month release cycle including one month of stabilisation would lead to a feature freeze date at the end of December, which would make the European holiday season a bit stressful. One option would have been to aim for early December, but we decided to go for the 17th of January, such that we also have some buffer before Chinese New Year. - Bi-weekly sync: We'd also like to set up a bi-weekly sync again starting from the 9th of November at 9am CET/4pm CST. - Collecting features: As last time, it would be helpful to have a rough overview of the efforts that will likely be included in this release. We have created a wiki page [1] for collecting such information. We'd like to kindly ask all committers to fill in the page with features that they intend to work on. Just copy-pasting what we included into the planning email for 1.14, because it still applies: - Stability of master: This has been an issue during the 1.13 & 1.14 feature freeze phase and it is still going on. We encourage every committer to not merge PRs through the GitHub button, but do this manually, paying attention to the commits merged after CI was triggered. It would be appreciated to always build the project before merging to master. - Documentation: Please try to see documentation as an integrated part of the engineering process and don't push it to the feature freeze phase or even after. You might even think about going documentation first. We, as the Flink community, are adding great stuff that is pushing the limits of streaming data processors, with every release. We should also make this stuff usable for our users by documenting it well. - Promotion of 1.15: What applies to documentation also applies to all the activity around the release. We encourage every contributor to also think about, plan and prepare activities like blog posts and talks that will promote and spread the release once it is done. Please let us know what you think. Thank you~ Till, Yun Gao & Joe [1] https://cwiki.apache.org/confluence/display/FLINK/1.15+Release
[jira] [Created] (FLINK-24635) Clean up flink-examples
Seth Wiesman created FLINK-24635: Summary: Clean up flink-examples Key: FLINK-24635 URL: https://issues.apache.org/jira/browse/FLINK-24635 Project: Flink Issue Type: Improvement Components: Examples Reporter: Seth Wiesman Assignee: Seth Wiesman Fix For: 1.15.0 The Flink DataStream examples have a number of deprecation warnings. These are some of the first things new users look at and we should be showing best practices. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24636) Move cluster deletion operation cache into ResourceManager
Chesnay Schepler created FLINK-24636: Summary: Move cluster deletion operation cache into ResourceManager Key: FLINK-24636 URL: https://issues.apache.org/jira/browse/FLINK-24636 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination, Runtime / REST Reporter: Chesnay Schepler Fix For: 1.15.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24637) Move savepoint disposal operation cache into Dispatcher
Chesnay Schepler created FLINK-24637: Summary: Move savepoint disposal operation cache into Dispatcher Key: FLINK-24637 URL: https://issues.apache.org/jira/browse/FLINK-24637 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination, Runtime / REST Reporter: Chesnay Schepler Fix For: 1.15.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-24638) Unknown variable or type "org.apache.flink.table.utils.DateTimeUtils"
Sergey Nuyanzin created FLINK-24638: --- Summary: Unknown variable or type "org.apache.flink.table.utils.DateTimeUtils" Key: FLINK-24638 URL: https://issues.apache.org/jira/browse/FLINK-24638 Project: Flink Issue Type: Bug Components: Table SQL / API Reporter: Sergey Nuyanzin The problem is not constantly reproduced however it is reproduced almost every 2-nd query via FlinkSqlClient containing {{current_timestamp}}, {{current_date}} e.g. {code:sql} select extract(millennium from current_date); select extract(millennium from current_timestamp); select floor(current_timestamp to day); select ceil(current_timestamp to day); {code} trace {noformat} [ERROR] Could not execute SQL statement. Reason: org.codehaus.commons.compiler.CompileException: Line 59, Column 16: Unknown variable or type "org.apache.flink.table.utils.DateTimeUtils" at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12211) at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6860) at org.codehaus.janino.UnitCompiler.access$13600(UnitCompiler.java:215) at org.codehaus.janino.UnitCompiler$22.visitPackage(UnitCompiler.java:6472) at org.codehaus.janino.UnitCompiler$22.visitPackage(UnitCompiler.java:6469) at org.codehaus.janino.Java$Package.accept(Java.java:4248) at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:6469) at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6855) at org.codehaus.janino.UnitCompiler.access$14200(UnitCompiler.java:215) at org.codehaus.janino.UnitCompiler$22$2$1.visitAmbiguousName(UnitCompiler.java:6497) at org.codehaus.janino.UnitCompiler$22$2$1.visitAmbiguousName(UnitCompiler.java:6494) at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4224) at org.codehaus.janino.UnitCompiler$22$2.visitLvalue(UnitCompiler.java:6494) at org.codehaus.janino.UnitCompiler$22$2.visitLvalue(UnitCompiler.java:6490) at org.codehaus.janino.Java$Lvalue.accept(Java.java:4148) at org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6490) at org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6469) at org.codehaus.janino.Java$Rvalue.accept(Java.java:4116) at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:6469) at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9026) at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:7106) at org.codehaus.janino.UnitCompiler.access$15800(UnitCompiler.java:215) at org.codehaus.janino.UnitCompiler$22$2.visitMethodInvocation(UnitCompiler.java:6517) at org.codehaus.janino.UnitCompiler$22$2.visitMethodInvocation(UnitCompiler.java:6490) at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:5073) at org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6490) at org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6469) at org.codehaus.janino.Java$Rvalue.accept(Java.java:4116) at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:6469) at org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9237) at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9123) at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9025) at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:5062) at org.codehaus.janino.UnitCompiler.access$9100(UnitCompiler.java:215) at org.codehaus.janino.UnitCompiler$16.visitMethodInvocation(UnitCompiler.java:4423) at org.codehaus.janino.UnitCompiler$16.visitMethodInvocation(UnitCompiler.java:4396) at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:5073) at 
org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4396) at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:5662) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3792) at org.codehaus.janino.UnitCompiler.access$6100(UnitCompiler.java:215) at org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3754) at org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3734) at org.codehaus.janino.Java$Assignment.accept(Java.java:4477) at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3734) at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2360) at org.codehaus.janino.UnitCompiler.access$1800(UnitCompiler.java:215) at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1494) at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1487) at org.co
[jira] [Created] (FLINK-24639) Improve assignment of Kinesis shards to subtasks
John Karp created FLINK-24639: - Summary: Improve assignment of Kinesis shards to subtasks Key: FLINK-24639 URL: https://issues.apache.org/jira/browse/FLINK-24639 Project: Flink Issue Type: Improvement Components: Connectors / Kinesis Reporter: John Karp Attachments: Screen Shot 2021-10-25 at 5.11.29 PM.png The default assigner of Kinesis shards to Flink subtasks simply takes the hashCode() of the StreamShardHandle (an integer), which is then interpreted modulo the number of subtasks. This basically does random-ish but deterministic assignment of shards to subtasks. However, this can lead to some subtasks getting several times the number of shards as others. To prevent those unlucky subtasks from being overloaded, the overall Flink cluster must be over-provisioned, so that each subtask has more headroom to handle any over-assignment of shards. We can do better here, at least if Kinesis is being used in a common way. Each record sent to a Kinesis stream has a particular hash key in the range [0, 2^128), which is used to determine which shard gets used; each shard has an assigned range of hash keys. By default Kinesis assigns each shard equal fractions of the hash-key space. And when you scale up or down using UpdateShardCount, it tries to maintain equal fractions to the extent possible. Also, a shard's hash key range is fixed at creation; it can only be replaced by new shards, which split it, or merge it. Given the above, one way to assign shards to subtasks is to do a linear mapping from hash-keys in range [0, 2^128) to subtask indices in [0, nSubtasks). For the 'coordinate' of each shard we pick the middle of the shard's range, to ensure neither subtask 0 nor subtask (n-1) is assigned too many. However this will probably not be helpful for Kinesis users that don't randomly assign partition or hash keys to Kinesis records. The existing assigner is probably better for them. I ran a simulation of the default shard assigner versus some alternatives, using shards taken from one of our Kinesis streams; results attached. The measure I used I call 'overload' and it measures how many times more shards the most heavily-loaded subtask has than is necessary. (DEFAULT is the default assigner, Sha256 is similar to the default but with a stronger hashing function, ShardId extracts the shard number from the shardId and uses that, and HashKey is the one I describe above.) Patch is at: https://github.com/apache/flink/compare/master...john-karp:uniform-shard-assigner?expand=1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
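A simplified sketch of the hash-key-based assignment described above (the real patch plugs into the Kinesis connector's shard assigner interface; the method below only illustrates the mapping):

{code:java}
import java.math.BigInteger;

/** Sketch: map a shard to a subtask by the midpoint of its hash key range. */
public final class UniformShardAssignmentSketch {

    /** Size of the Kinesis hash key space, i.e. keys lie in [0, 2^128). */
    private static final BigInteger HASH_KEY_SPACE = BigInteger.valueOf(2).pow(128);

    private UniformShardAssignmentSketch() {}

    public static int assignSubtask(
            BigInteger startingHashKey, BigInteger endingHashKey, int numSubtasks) {
        // Use the middle of the shard's hash key range as its coordinate, so that
        // neither the first nor the last subtask is systematically over-assigned.
        BigInteger middle = startingHashKey.add(endingHashKey).shiftRight(1);
        // Linear mapping from [0, 2^128) onto the subtask indices [0, numSubtasks).
        return middle.multiply(BigInteger.valueOf(numSubtasks)).divide(HASH_KEY_SPACE).intValue();
    }
}
{code}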
[NOTICE] flink-streaming-java no longer depends on Scala and lost its suffix
Hello all, I just wanted to inform everyone that I merged https://issues.apache.org/jira/browse/FLINK-24018, removing the transitive Scala dependencies from flink-streaming-java. This also means that the module lost its Scala suffix, along with a lot of other modules. Please keep this in mind for a few days when adding Flink dependencies or new modules; it is quite likely that something has changed w.r.t. the Scala suffixes. For completeness' sake, these are the modules that lost the suffix: |flink-batch-sql-test flink-cep flink-cli-test flink-clients flink-connector-elasticsearch-base flink-connector-elasticsearch5 flink-connector-elasticsearch6 flink-connector-elasticsearch7 flink-connector-gcp-pubsub flink-connector-hbase-1.4 flink-connector-hbase-2.2 flink-connector-hbase-base flink-connector-jdbc flink-connector-kafka flink-connector-kinesis flink-connector-nifi flink-connector-pulsar flink-connector-rabbitmq flink-connector-testing flink-connector-twitter flink-connector-wikiedits flink-container flink-distributed-cache-via-blob-test flink-dstl-dfs flink-gelly flink-hadoop-bulk flink-kubernetes flink-parent-child-classloading-test-lib-package flink-parent-child-classloading-test-program flink-queryable-state-test flink-runtime-web flink-scala flink-sql-connector-elasticsearch6 flink-sql-connector-elasticsearch7 flink-sql-connector-hbase-1.4 flink-sql-connector-hbase-2.2 flink-sql-connector-kafka flink-sql-connector-kinesis flink-sql-connector-rabbitmq flink-state-processor-api flink-statebackend-rocksdb flink-streaming-java flink-streaming-kafka-test flink-streaming-kafka-test-base flink-streaming-kinesis-test flink-table-api-java-bridge flink-test-utils flink-walkthrough-common flink-yarn|
Re: [NOTICE] flink-streaming-java no longer depends on Scala and lost its suffix
This time with proper formatting... flink-batch-sql-test flink-cep flink-cli-test flink-clients flink-connector-elasticsearch-base flink-connector-elasticsearch5 flink-connector-elasticsearch6 flink-connector-elasticsearch7 flink-connector-gcp-pubsub flink-connector-hbase-1.4 flink-connector-hbase-2.2 flink-connector-hbase-base flink-connector-jdbc flink-connector-kafka flink-connector-kinesis flink-connector-nifi flink-connector-pulsar flink-connector-rabbitmq flink-connector-testing flink-connector-twitter flink-connector-wikiedits flink-container flink-distributed-cache-via-blob-test flink-dstl-dfs flink-gelly flink-hadoop-bulk flink-kubernetes flink-parent-child-classloading-test-lib-package flink-parent-child-classloading-test-program flink-queryable-state-test flink-runtime-web flink-scala flink-sql-connector-elasticsearch6 flink-sql-connector-elasticsearch7 flink-sql-connector-hbase-1.4 flink-sql-connector-hbase-2.2 flink-sql-connector-kafka flink-sql-connector-kinesis flink-sql-connector-rabbitmq flink-state-processor-api flink-statebackend-rocksdb flink-streaming-java flink-streaming-kafka-test flink-streaming-kafka-test-base flink-streaming-kinesis-test flink-table-api-java-bridge flink-test-utils flink-walkthrough-common flink-yarn On 26/10/2021 01:04, Chesnay Schepler wrote: Hello all, I just wanted to inform everyone that I just merged https://issues.apache.org/jira/browse/FLINK-24018, removing the transitive Scala dependencies from flink-streaming-java. This also means that the module lost it's Scala suffix, along with a lot of other modules. Please keep this mind this for a few days when adding Flink dependencies or new modules; it is quite likely that something has changed w.r.t. the Scala suffixes. For completeness sake, these are the module that lost the suffix: |flink-batch-sql-test flink-cep flink-cli-test flink-clients flink-connector-elasticsearch-base flink-connector-elasticsearch5 flink-connector-elasticsearch6 flink-connector-elasticsearch7 flink-connector-gcp-pubsub flink-connector-hbase-1.4 flink-connector-hbase-2.2 flink-connector-hbase-base flink-connector-jdbc flink-connector-kafka flink-connector-kinesis flink-connector-nifi flink-connector-pulsar flink-connector-rabbitmq flink-connector-testing flink-connector-twitter flink-connector-wikiedits flink-container flink-distributed-cache-via-blob-test flink-dstl-dfs flink-gelly flink-hadoop-bulk flink-kubernetes flink-parent-child-classloading-test-lib-package flink-parent-child-classloading-test-program flink-queryable-state-test flink-runtime-web flink-scala flink-sql-connector-elasticsearch6 flink-sql-connector-elasticsearch7 flink-sql-connector-hbase-1.4 flink-sql-connector-hbase-2.2 flink-sql-connector-kafka flink-sql-connector-kinesis flink-sql-connector-rabbitmq flink-state-processor-api flink-statebackend-rocksdb flink-streaming-java flink-streaming-kafka-test flink-streaming-kafka-test-base flink-streaming-kinesis-test flink-table-api-java-bridge flink-test-utils flink-walkthrough-common flink-yarn|
Re: [Discuss] Planning Flink 1.15
Thanks for the update, Joe. Looking forward to this release. On Mon, Oct 25, 2021 at 11:13 AM Johannes Moser wrote: > Hi all, > > As people have already started working on their 1.15. > contribution, we'd like to start the discussion for > the release setup. > > - Release managers: As a team of three seems to have > worked perfectly fine, we'd like to suggest Till, Yun Gao > & Joe as the release managers for 1.15. > > - Timeline: 1.14 was released at the end of September and > aiming for a 4 months release cycle including one months > of stabilisation would lead to a feature freeze date at the > end of December, which would make the European holiday > season a bit stressful. One option would have been to aim for early > December, but we decided to go for the 17th of January. > Such that we also have some buffer before the Chinese new > year. > > - Bi-weekly sync: We'd also like to setup a bi-weekly sync again > starting from the 9th of November at 9am CET/4pm CST. > > - Collecting features: As last time it would be helpful to have > a rough overview of the efforts that will likely be included in > this release. We have created a wiki page [1] for collecting such > information. We'd like to kindly ask all committers to fill in the > page with features that they intend to work on. > > Just copy pasting what we included into the planning email > for 1.14, because it still applies: > > - Stability of master: This has been an issue during the 1.13 & 1.14 > feature freeze phase and it is still going on. We encourage every > committer to not merge PRs through the Github button, but do this > manually, with caution for the commits merged after the CI being > triggered. It would be appreciated to always build the project before > merging to master. > > - Documentation: Please try to see documentation as an integrated > part of the engineering process and don't push it to the feature > freeze phase or even after. You might even think about going > documentation first. We, as the Flink community, are adding great > stuff, that is pushing the limits of streaming data processors, with > every release. We should also make this stuff usable for our users by > documenting it well. > > - Promotion of 1.15: What applies to documentation also applies > to all the activity around the release. We encourage every contributor > to also think about, plan and prepare activities like blog posts and talk, > that will promote and spread the release once it is done. > > Please let us know what you think. > > Thank you~ > Till, Yun Gao & Joe > > [1] https://cwiki.apache.org/confluence/display/FLINK/1.15+Release >
[jira] [Created] (FLINK-24640) CEIL, FLOOR built-in functions for Timestamp should respect DST
Sergey Nuyanzin created FLINK-24640: --- Summary: CEIL, FLOOR built-in functions for Timestamp should respect DST Key: FLINK-24640 URL: https://issues.apache.org/jira/browse/FLINK-24640 Project: Flink Issue Type: Bug Components: Table SQL / API Reporter: Sergey Nuyanzin The problem is that if the date falls in DST time, then {code:sql} select floor(current_timestamp to year); {code} leads to the result {noformat} 2021-12-31 23:00:00.000 {noformat} while the expected result is {{2022-01-01 00:00:00.000}} The same issue occurs with {{WEEK}}, {{QUARTER}} and {{MONTH}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
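For comparison, flooring in the session time zone with java.time (a sketch of the intended semantics; Europe/Berlin is just an example DST zone, not taken from the report) lands on local midnight of January 1st rather than a value shifted by the zone/DST offset:

{code:java}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.TemporalAdjusters;

public class FloorTimestampToYear {
    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Berlin"); // example zone with DST
        ZonedDateTime now = ZonedDateTime.now(zone);

        // FLOOR(ts TO YEAR) semantics: truncate in local time to January 1st, 00:00 ...
        LocalDateTime floorYear =
                now.toLocalDateTime()
                        .with(TemporalAdjusters.firstDayOfYear())
                        .toLocalDate()
                        .atStartOfDay();

        // ... i.e. exactly local midnight of January 1st of the current year,
        // never a 23:00 instant produced by truncating before applying the offset.
        System.out.println(floorYear);
    }
}
{code}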