Re: [DISCUSS] FLIP-188: Introduce Built-in Dynamic Table Storage

2021-10-25 Thread Jingsong Li
Hi Ingo,

Really appreciate your feedback.

#1. The reason why we insist on omitting the "connector" option is that we
want to bring the following design to users:
- With the "connector" option, the table is a mapping to external
storage, i.e. an unmanaged table.
- Without the "connector" option, it is a managed table. It may be an
Iceberg managed table, a JDBC managed table, or a Flink managed table.
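
For example, a minimal sketch of the intended user-facing distinction
(the connector options are illustrative and incomplete):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ManagedVsUnmanagedSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Unmanaged: the 'connector' option maps the table onto existing
        // external storage (options here are illustrative, not complete).
        tableEnv.executeSql(
                "CREATE TABLE unmanaged_t (f0 INT) WITH ('connector' = 'kafka', 'topic' = 't')");

        // Managed: no 'connector' option, so the catalog / Flink owns the
        // underlying physical storage.
        tableEnv.executeSql("CREATE TABLE managed_t (f0 INT)");
    }
}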

#2. About:
CREATE TABLE T (f0 INT);
ALTER TABLE T SET ('connector' = '…');

I think this is dangerous, even for a generic table. Managed tables
should prohibit it.

#3. DDL and Table API

You are right, the Table API should be a superset of SQL. There is no
doubt that it should support BDT.
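
As a purely hypothetical sketch, a descriptor-based entry point could
look roughly like this (forManaged() does not exist today; it mirrors
the existing TableDescriptor.forConnector(...) factory):

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.TableDescriptor;
import org.apache.flink.table.api.TableEnvironment;

class ManagedTableDescriptorSketch {
    static void createManagedTable(TableEnvironment tableEnv) {
        TableDescriptor managed =
                TableDescriptor.forManaged()   // hypothetical factory method
                        .schema(Schema.newBuilder()
                                .column("f0", DataTypes.INT())
                                .build())
                        .build();
        // Registration works the same way as for unmanaged descriptors.
        tableEnv.createTable("T", managed);
    }
}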

Best,
Jingsong

On Mon, Oct 25, 2021 at 2:18 PM Ingo Bürk  wrote:
>
> Hi Jingsong,
>
> thanks again for the answers. I think requiring catalogs to implement an
> interface to support BDTs is something we'll need (though personally I
> still prefer explicit DDL here over the "no connector option" approach).
>
> What about more edge cases like
>
> CREATE TABLE T (f0 INT);
> ALTER TABLE T SET ('connector' = '…');
>
> This would have to first create the physical storage and then delete it
> again, right?
>
> On a separate note, the FLIP currently only discusses SQL DDL, and you have
> also mentioned
>
> > BDT only can be dropped by Flink SQL DDL now.
>
> Something Flink suffers from a lot is inconsistencies across APIs. I think
> it is important that we support features on all major APIs, i.e. including
> Table API.
> For example for creating a BDT this would mean e.g. adding something like
> #forManaged(…) to TableDescriptor.
>
>
> Best
> Ingo
>
> On Mon, Oct 25, 2021 at 5:27 AM Jingsong Li  wrote:
>
> > Hi Ingo,
> >
> > I thought again.
> >
> > I'll try to sort out the current catalog behaviors.
> > Actually, we can divide catalogs into three categories:
> >
> > 1. ExternalCatalog: it can only read or create a single table kind
> > which connects to external storage. TableFactory is provided by
> > Catalog, which can have nothing to do with Flink's Factory discovery
> > mechanism, such as IcebergCatalog, JdbcCatalog, PostgresCatalog, etc.
> > Catalog manages the life cycle of its **managed** tables, which means
> > that creation and drop will affect the real physical storage. The DDL
> > has no "connector" option.
> >
> > 2. GenericCatalog (or FlinkCatalog): only Flink tables are saved and
> > factories are created through Flink's factory discovery mechanism. At
> > this time, the catalog is actually only a storage medium for saving
> > schema and options, such as GenericInMemoryCatalog. Catalog only saves
> > meta information and does not manage the underlying physical storage
> > of tables. These tables are **unmanaged**. The DDL must have a
> > "connector" option.
> >
> > 3. HybridCatalog: It can save both its own **managed** table and
> > generic Flink **unmanaged** table, such as HiveCatalog.
> >
> > We want to use the "connector" option to distinguish whether it is
> > managed or not.
> >
> > Now, consider the Flink managed table in this FLIP.
> > a. ExternalCatalog cannot support Flink managed tables.
> > b. GenericCatalog can support Flink managed tables without the
> > "connector" option.
> > c. What about HybridCatalog (HiveCatalog)? Yes, we want HiveCatalog to
> > support Flink managed tables:
> > - with "connector" option in Flink dialect is unmanaged tables
> > - Hive DDL in Hive dialect is Hive managed tables, the parser will add
> > "connector = hive" automatically. At present, there are many
> > differences between Flink DDL and Hive DDL, and even their features
> > have many differences.
> > - without "connector" option in Flink dialect is Flink managed tables.
> >
> > In this way, we can support Flink managed tables while maintaining
> > compatibility.
> >
> > Anyway, we need to introduce a "SupportsFlinkManagedTable" interface for catalogs.
> >
> > ## Back to your question #
> >
> > > but we should make it clear that this is a limitation and probably
> > document how users can clean up the underlying physical storage manually in
> > this case
> >
> > Yes, it's strange that the catalog should manage tables, but some
> > catalogs don't have this ability.
> > - For PersistentCatalog, the metadata will remain until the underlying
> > physical storage is deleted.
> > - For InMemoryCatalog, yes, we should document it for the underlying
> > physical storage of Flink managed tables.
> >
> > > the HiveCatalog doesn't list a 'connector' option for its tables.
> >
> > Actually, it can be divided into two steps: create and save:
> > - When creating a table, the table seen by HiveCatalog must have
> > "connector = hive", which marks it as a Hive table (Hive managed table). See
> > "HiveCatalog.isHiveTable".
> > - When saving the table, it removes the connector of the Hive
> > table. We can do this: with a "connector" option it is a Flink generic table,
> > without a "connector" option it is a Hive table, with "flink-managed = true"
> >

[jira] [Created] (FLINK-24628) Invalid JDBC query template when no fields are selected

2021-10-25 Thread Paul Lin (Jira)
Paul Lin created FLINK-24628:


 Summary: Invalid JDBC query template when no fields are selected
 Key: FLINK-24628
 URL: https://issues.apache.org/jira/browse/FLINK-24628
 Project: Flink
  Issue Type: Bug
  Components: Connectors / JDBC
Affects Versions: 1.12.3
Reporter: Paul Lin


A query like `select uuid() from mysql_table` will result in an invalid query 
template like `select from mysql_table` in JdbcDynamicTableSource. 

We should avoid making a TableScan when no relevant fields are 
actually used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Should we drop Row SerializationSchema/DeserializationSchema?

2021-10-25 Thread Francesco Guardiani
Last reminder: unless there are any objections, I will proceed with
deprecating these by the end of the week.
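
For anyone relying on these classes today, a minimal sketch of the custom
DeserializationSchema route discussed below, assuming Jackson is on the
classpath (field names and types are purely illustrative):

import java.io.IOException;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.types.Row;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class MyJsonRowDeserializationSchema implements DeserializationSchema<Row> {

    private transient ObjectMapper mapper;

    @Override
    public void open(InitializationContext context) {
        mapper = new ObjectMapper();
    }

    @Override
    public Row deserialize(byte[] message) throws IOException {
        JsonNode node = mapper.readTree(message);
        // Map only the fields the job actually needs.
        return Row.of(node.get("id").asLong(), node.get("name").asText());
    }

    @Override
    public boolean isEndOfStream(Row nextElement) {
        return false;
    }

    @Override
    public TypeInformation<Row> getProducedType() {
        return Types.ROW(Types.LONG, Types.STRING);
    }
}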

On Thu, Oct 21, 2021 at 4:28 PM Konstantin Knauf  wrote:

> +1 for deprecating and then dropping them.
>
> On Thu, Oct 21, 2021 at 3:31 PM Timo Walther  wrote:
>
> > Hi Francesco,
> >
> > thanks for starting this discussion. It is definitely time to clean up
> > more connectors and formats that were used for the old planner but are
> > actually not intended for the DataStream API.
> >
> > +1 for deprecating and dropping the mentioned formats. Users can either
> > use Table API or implement a custom
> > SerializationSchema/DeserializationSchema according to their needs. It
> > is actually not that complicated to add Jackson and configure the
> > ObjectMapper for reading JSON/CSV.
> >
> > Regards,
> > Timo
> >
> >
> > On 18.10.21 17:42, Francesco Guardiani wrote:
> > > Hi all,
> > > In flink-avro, flink-csv and flink-json we have implementations of
> > > SerializationSchema/DeserializationSchema for the
> > org.apache.flink.types.Row
> > > type. In particular, I'm referring to:
> > >
> > > - org.apache.flink.formats.json.JsonRowSerializationSchema
> > > - org.apache.flink.formats.json.JsonRowDeserializationSchema
> > > - org.apache.flink.formats.avro.AvroRowSerializationSchema
> > > - org.apache.flink.formats.avro.AvroRowDeserializationSchema
> > > - org.apache.flink.formats.csv.CsvRowDeserializationSchema
> > > - org.apache.flink.formats.csv.CsvRowSerializationSchema
> > >
> > > These classes were used in the old table planner, but now the table
> > planner
> > > doesn't use the Row type internally anymore, so these classes are
> unused
> > > from the flink-table packages.
> > >
> > > Because these classes are exposed (some have @PublicEvolving
> annotation)
> > > there might be some users out there using them when using the
> DataStream
> > > APIs, for example to convert an input stream of JSON from Kafka to a
> Row
> > > instance.
> > >
> > > Do you have any opinions about deprecating these classes in 1.15 and
> then
> > > drop them in 1.16? Or are you using them? If yes, can you describe your
> > use
> > > case?
> > >
> > > Thank you,
> > >
> > > FG
> > >
> >
> >
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


[jira] [Created] (FLINK-24629) Sorting start-time/duration/end-time not working under pages of vertex taskManagers and sub-tasks

2021-10-25 Thread Junhan Yang (Jira)
Junhan Yang created FLINK-24629:
---

 Summary: Sorting start-time/duration/end-time not working under 
pages of vertex taskManagers and sub-tasks
 Key: FLINK-24629
 URL: https://issues.apache.org/jira/browse/FLINK-24629
 Project: Flink
  Issue Type: Bug
Reporter: Junhan Yang
 Attachments: image-2021-10-25-16-23-55-894.png

Based on the definitions of the `VertexTaskManagerDetailInterface` and 
`JobSubTaskInterface` interfaces, the sorting functions for start-time, duration 
and end-time incorrectly reference `details.XXX` (see the attached screenshot 
image-2021-10-25-16-23-55-894.png).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-176: Unified Iteration to Support Algorithms (Flink ML)

2021-10-25 Thread Yunfeng Zhou
Excellent work to support iteration for Flink.

+1 (binding)

Best regards,
Yunfeng

On Sat, Oct 23, 2021 at 12:27 AM Till Rohrmann  wrote:

> Thanks for creating this FLIP.
>
> +1 (binding)
>
> Cheers,
> Till
>
>
> On Fri, Oct 22, 2021 at 6:29 AM Guowei Ma  wrote:
>
> > +1 (binding)
> >
> > Best,
> > Guowei
> >
> >
> > On Thu, Oct 21, 2021 at 3:58 PM Yun Gao 
> > wrote:
> >
> > >
> > > Hi all,
> > >
> > > We would like to start the vote for FLIP-176: Unified Iteration to
> > Support
> > > Algorithms (Flink ML) [1].
> > > This FLIP was discussed in this thread [2][3]. The FLIP-176 targets at
> > > implementing the iteration
> > > API in flink-ml to support the implementation of the algorithms.
> > >
> > > The vote will be open for at least 72 hours till 26th Oct morning,
> > > including the weekend. Very thanks!
> > >
> > > Best,
> > > Yun
> > >
> > > [1] https://cwiki.apache.org/confluence/x/hAEBCw
> > > [2]
> > >
> >
> https://lists.apache.org/thread.html/r72e87a71b14baac3d7d16268346f5fc7c65f1de989e00b4ab2aab9ab%40%3Cdev.flink.apache.org%3E
> > > [3]
> > >
> >
> https://lists.apache.org/thread.html/r63914a616de05a91dbe8e1a3208eb2b7c7c840c5c366bbd224483754%40%3Cdev.flink.apache.org%3E
> >
>


Re: [VOTE] FLIP-176: Unified Iteration to Support Algorithms (Flink ML)

2021-10-25 Thread Yunfeng Zhou
Sorry that I misunderstood the usage of "binding". I am not a Flink
committer so my vote should be a non-binding one.

Best,
Yunfeng

On Mon, Oct 25, 2021 at 4:33 PM Yunfeng Zhou 
wrote:

> Excellent work to support iteration for Flink.
>
> +1 (binding)
>
> Best regards,
> Yunfeng
>
> On Sat, Oct 23, 2021 at 12:27 AM Till Rohrmann 
> wrote:
>
>> Thanks for creating this FLIP.
>>
>> +1 (binding)
>>
>> Cheers,
>> Till
>>
>>
>> On Fri, Oct 22, 2021 at 6:29 AM Guowei Ma  wrote:
>>
>> > +1 (binding)
>> >
>> > Best,
>> > Guowei
>> >
>> >
>> > On Thu, Oct 21, 2021 at 3:58 PM Yun Gao 
>> > wrote:
>> >
>> > >
>> > > Hi all,
>> > >
>> > > We would like to start the vote for FLIP-176: Unified Iteration to
>> > Support
>> > > Algorithms (Flink ML) [1].
>> > > This FLIP was discussed in this thread [2][3]. The FLIP-176 targets at
>> > > implementing the iteration
>> > > API in flink-ml to support the implementation of the algorithms.
>> > >
>> > > The vote will be open for at least 72 hours till 26th Oct morning,
>> > > including the weekend. Very thanks!
>> > >
>> > > Best,
>> > > Yun
>> > >
>> > > [1] https://cwiki.apache.org/confluence/x/hAEBCw
>> > > [2]
>> > >
>> >
>> https://lists.apache.org/thread.html/r72e87a71b14baac3d7d16268346f5fc7c65f1de989e00b4ab2aab9ab%40%3Cdev.flink.apache.org%3E
>> > > [3]
>> > >
>> >
>> https://lists.apache.org/thread.html/r63914a616de05a91dbe8e1a3208eb2b7c7c840c5c366bbd224483754%40%3Cdev.flink.apache.org%3E
>> >
>>
>


[jira] [Created] (FLINK-24630) Implement row projection in AvroToRowDataConverters#createRowConverter

2021-10-25 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-24630:
---

 Summary: Implement row projection in 
AvroToRowDataConverters#createRowConverter
 Key: FLINK-24630
 URL: https://issues.apache.org/jira/browse/FLINK-24630
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.15.0
Reporter: Caizhi Weng


Currently {{AvroToRowDataConverters#createRowConverter}} converts Avro records 
to Flink row data directly, without any projection. However, users of this 
method, such as {{AvroFileSystemFormatFactory.RowDataAvroInputFormat}}, need to 
implement their own projection logic to select only the columns they need.

We can move this projection logic into 
{{AvroToRowDataConverters#createRowConverter}}, both as an optimization (users 
no longer need to copy the row for this) and for ease of coding.
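
A rough sketch of the intended behavior, assuming the converter is handed the 
projected field positions (names and signatures are illustrative, not the 
actual Flink internals):

{code:java}
import java.util.function.Function;

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.table.data.GenericRowData;

public class ProjectedAvroConversionSketch {

    /** Converts only the projected Avro fields, so callers no longer project afterwards. */
    public static GenericRowData convert(
            GenericRecord record,
            int[] projectedFields,
            Function<Object, Object>[] fieldConverters) {
        GenericRowData row = new GenericRowData(projectedFields.length);
        for (int i = 0; i < projectedFields.length; i++) {
            Object avroValue = record.get(projectedFields[i]);
            row.setField(i, fieldConverters[i].apply(avroValue));
        }
        return row;
    }
}
{code}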



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-176: Unified Iteration to Support Algorithms (Flink ML)

2021-10-25 Thread Becket Qin
+1 (binding)

Thanks for the FLIP.

Jiangjie (Becket) Qin


On Mon, Oct 25, 2021 at 4:45 PM Yunfeng Zhou 
wrote:

> Sorry that I misunderstood the usage of "binding". I am not a Flink
> committer so my vote should be a non-binding one.
>
> Best,
> Yunfeng
>
> On Mon, Oct 25, 2021 at 4:33 PM Yunfeng Zhou 
> wrote:
>
> > Excellent work to support iteration for Flink.
> >
> > +1 (binding)
> >
> > Best regards,
> > Yunfeng
> >
> > On Sat, Oct 23, 2021 at 12:27 AM Till Rohrmann 
> > wrote:
> >
> >> Thanks for creating this FLIP.
> >>
> >> +1 (binding)
> >>
> >> Cheers,
> >> Till
> >>
> >>
> >> On Fri, Oct 22, 2021 at 6:29 AM Guowei Ma  wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > Best,
> >> > Guowei
> >> >
> >> >
> >> > On Thu, Oct 21, 2021 at 3:58 PM Yun Gao  >
> >> > wrote:
> >> >
> >> > >
> >> > > Hi all,
> >> > >
> >> > > We would like to start the vote for FLIP-176: Unified Iteration to
> >> > Support
> >> > > Algorithms (Flink ML) [1].
> >> > > This FLIP was discussed in this thread [2][3]. The FLIP-176 targets
> at
> >> > > implementing the iteration
> >> > > API in flink-ml to support the implementation of the algorithms.
> >> > >
> >> > > The vote will be open for at least 72 hours till 26th Oct morning,
> >> > > including the weekend. Very thanks!
> >> > >
> >> > > Best,
> >> > > Yun
> >> > >
> >> > > [1] https://cwiki.apache.org/confluence/x/hAEBCw
> >> > > [2]
> >> > >
> >> >
> >>
> https://lists.apache.org/thread.html/r72e87a71b14baac3d7d16268346f5fc7c65f1de989e00b4ab2aab9ab%40%3Cdev.flink.apache.org%3E
> >> > > [3]
> >> > >
> >> >
> >>
> https://lists.apache.org/thread.html/r63914a616de05a91dbe8e1a3208eb2b7c7c840c5c366bbd224483754%40%3Cdev.flink.apache.org%3E
> >> >
> >>
> >
>


[jira] [Created] (FLINK-24631) Avoiding directly use the labels as selector for deployment and service

2021-10-25 Thread Aitozi (Jira)
Aitozi created FLINK-24631:
--

 Summary: Avoiding directly use the labels as selector for 
deployment and service
 Key: FLINK-24631
 URL: https://issues.apache.org/jira/browse/FLINK-24631
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.14.0
Reporter: Aitozi


We create the deployment using a pod selector taken directly from the labels, which is 
not necessary and may cause problems when a user-provided label value changes (it may 
be changed by another system). 

I think it's better to use a minimal and stable selector, such as {{app=xxx, 
component=jobmanager}}, to select the JobManager pod, and likewise for the service, the 
TaskManager pods, and so on.
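
A minimal sketch of the idea (label keys and helper names are illustrative, not 
the actual Flink Kubernetes code):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class SelectorLabelsSketch {

    /** All labels attached to the pods, including user labels that may change over time. */
    public static Map<String, String> podLabels(String clusterId, Map<String, String> userLabels) {
        Map<String, String> labels = new HashMap<>(userLabels);
        labels.put("app", clusterId);
        labels.put("component", "jobmanager");
        return labels;
    }

    /** The selector uses only the minimal, stable subset, so user label changes cannot break it. */
    public static Map<String, String> selectorLabels(String clusterId) {
        Map<String, String> selector = new HashMap<>();
        selector.put("app", clusterId);
        selector.put("component", "jobmanager");
        return selector;
    }
}
{code}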



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24632) "where ... in(..)" has wrong result

2021-10-25 Thread xuyang (Jira)
xuyang created FLINK-24632:
--

 Summary: "where ... in(..)" has wrong result
 Key: FLINK-24632
 URL: https://issues.apache.org/jira/browse/FLINK-24632
 Project: Flink
  Issue Type: Bug
Reporter: xuyang


The SQL is:

{code:sql}
CREATE TABLE a(
  a1 INT , a2 INT
) WITH (
  'connector' = 'filesystem',
  'path' = '/Users/tmp/test/testa.csv',
  'format' = 'csv',
  'csv.field-delimiter'=';')

CREATE TABLE b(
  b1 INT , b2 INT
) WITH (
  'connector' = 'filesystem',
  'path' = '/Users/tmp/test/testb.csv',
  'format' = 'csv',
  'csv.field-delimiter'=';')

select * from a where a.a1 in (select a1 from b where a.a1 = b.b2)

{code}
and the data is
{code:java}
// testa.csv
1;1
1;2
4;6
77;88

// testb.csv
2;1
2;2
3;4{code}
The result in PostgreSQL is:
{code}
1 1
1 2
4 6{code}
But in Flink, the result is:
{code}
1 2
1 1
4 6
77 88{code}
I think something is going wrong here.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24633) JobManager pod may stuck in containerCreating status during failover

2021-10-25 Thread Aitozi (Jira)
Aitozi created FLINK-24633:
--

 Summary: JobManager pod may stuck in containerCreating status 
during failover
 Key: FLINK-24633
 URL: https://issues.apache.org/jira/browse/FLINK-24633
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.14.0
Reporter: Aitozi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24634) Java 11 profile should target JDK 8

2021-10-25 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24634:


 Summary: Java 11 profile should target JDK 8
 Key: FLINK-24634
 URL: https://issues.apache.org/jira/browse/FLINK-24634
 Project: Flink
  Issue Type: Technical Debt
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


The {{java11}} profile currently targets Java 11. This was useful because we 
saw that doing so reveals additional issues that are not detected when building 
for Java 8. The end goal was to ensure a smooth transition once we switch.

However, this has adverse effects on developer productivity.

If you happen to switch between Java versions (for example, because of separate 
environments, or because certain features require Java 8), then you can easily 
run into UnsupportedClassVersionErrors when attempting to run Java 11 bytecode 
on Java 8.

IntelliJ also picks up on this and automatically sets the language level to 11, 
which means that it will readily allow you to use Java 11 exclusive APIs that 
will fail on CI later on.

To remedy this I propose to split the profile.

The {{java11}} profile will pretty much stay as is, except that it will target 
Java 8. The value proposition of this profile is being able to build Flink for 
Java 8 with Java 11.
A new, explicitly opt-in {{java11-target}} profile then sets the target version 
to Java 11, which we will use on CI. This profile will ensure that we can 
readily switch to Java 11 as the target in the future.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[Discuss] Planning Flink 1.15

2021-10-25 Thread Johannes Moser
Hi all,

As people have already started working on their 1.15 
contributions, we'd like to start the discussion about 
the release setup.

- Release managers: As a team of three seems to have
worked perfectly fine, we'd like to suggest Till, Yun Gao
& Joe as the release managers for 1.15.

- Timeline: 1.14 was released at the end of September, and
aiming for a 4-month release cycle including one month
of stabilisation would lead to a feature freeze date at the
end of December, which would make the European holiday
season a bit stressful. One option would have been to aim for early
December, but we decided to go for the 17th of January,
so that we also have some buffer before the Chinese New
Year.

- Bi-weekly sync: We'd also like to set up a bi-weekly sync again
starting from the 9th of November at 9am CET/4pm CST.

- Collecting features: As last time, it would be helpful to have 
a rough overview of the efforts that will likely be included in
this release. We have created a wiki page [1] for collecting such
information. We'd like to kindly ask all committers to fill in the
page with features that they intend to work on.

Just copy-pasting what we included in the planning email
for 1.14, because it still applies:

- Stability of master: This has been an issue during the 1.13 & 1.14
feature freeze phases and it is still going on. We encourage every
committer to not merge PRs through the GitHub button, but to do this
manually, with caution for the commits merged after CI has been
triggered. It would be appreciated to always build the project before
merging to master.

- Documentation: Please try to see documentation as an integrated
part of the engineering process and don't push it to the feature
freeze phase or even after. You might even think about going
documentation first. We, as the Flink community, are adding great
stuff that pushes the limits of streaming data processors with
every release. We should also make this stuff usable for our users by
documenting it well.

- Promotion of 1.15: What applies to documentation also applies
to all the activity around the release. We encourage every contributor
to also think about, plan and prepare activities like blog posts and talks
that will promote and spread the release once it is done.

Please let us know what you think.

Thank you~
Till, Yun Gao & Joe

[1] https://cwiki.apache.org/confluence/display/FLINK/1.15+Release


[jira] [Created] (FLINK-24635) Clean up flink-examples

2021-10-25 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-24635:


 Summary: Clean up flink-examples
 Key: FLINK-24635
 URL: https://issues.apache.org/jira/browse/FLINK-24635
 Project: Flink
  Issue Type: Improvement
  Components: Examples
Reporter: Seth Wiesman
Assignee: Seth Wiesman
 Fix For: 1.15.0


The Flink DataStream examples have a number of deprecation warnings. These are 
some of the first things new users look at and we should be showing best 
practices. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24636) Move cluster deletion operation cache into ResourceManager

2021-10-25 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24636:


 Summary: Move cluster deletion operation cache into ResourceManager
 Key: FLINK-24636
 URL: https://issues.apache.org/jira/browse/FLINK-24636
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination, Runtime / REST
Reporter: Chesnay Schepler
 Fix For: 1.15.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24637) Move savepoint disposal operation cache into Dispatcher

2021-10-25 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24637:


 Summary: Move savepoint disposal operation cache into Dispatcher
 Key: FLINK-24637
 URL: https://issues.apache.org/jira/browse/FLINK-24637
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination, Runtime / REST
Reporter: Chesnay Schepler
 Fix For: 1.15.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24638) Unknown variable or type "org.apache.flink.table.utils.DateTimeUtils"

2021-10-25 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-24638:
---

 Summary: Unknown variable or type 
"org.apache.flink.table.utils.DateTimeUtils"
 Key: FLINK-24638
 URL: https://issues.apache.org/jira/browse/FLINK-24638
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Sergey Nuyanzin


The problem is not reproduced consistently.

However, it is reproduced on roughly every second query issued via the Flink SQL 
Client that contains {{current_timestamp}} or {{current_date}},
e.g.
{code:sql}
select extract(millennium from current_date);
select extract(millennium from current_timestamp);
select floor(current_timestamp to day);
select ceil(current_timestamp to day);
{code}

Stack trace:
{noformat}
[ERROR] Could not execute SQL statement. Reason:
org.codehaus.commons.compiler.CompileException: Line 59, Column 16: Unknown 
variable or type "org.apache.flink.table.utils.DateTimeUtils"
at 
org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12211)
at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6860)
at org.codehaus.janino.UnitCompiler.access$13600(UnitCompiler.java:215)
at 
org.codehaus.janino.UnitCompiler$22.visitPackage(UnitCompiler.java:6472)
at 
org.codehaus.janino.UnitCompiler$22.visitPackage(UnitCompiler.java:6469)
at org.codehaus.janino.Java$Package.accept(Java.java:4248)
at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:6469)
at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:6855)
at org.codehaus.janino.UnitCompiler.access$14200(UnitCompiler.java:215)
at 
org.codehaus.janino.UnitCompiler$22$2$1.visitAmbiguousName(UnitCompiler.java:6497)
at 
org.codehaus.janino.UnitCompiler$22$2$1.visitAmbiguousName(UnitCompiler.java:6494)
at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4224)
at 
org.codehaus.janino.UnitCompiler$22$2.visitLvalue(UnitCompiler.java:6494)
at 
org.codehaus.janino.UnitCompiler$22$2.visitLvalue(UnitCompiler.java:6490)
at org.codehaus.janino.Java$Lvalue.accept(Java.java:4148)
at 
org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6490)
at 
org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6469)
at org.codehaus.janino.Java$Rvalue.accept(Java.java:4116)
at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:6469)
at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9026)
at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:7106)
at org.codehaus.janino.UnitCompiler.access$15800(UnitCompiler.java:215)
at 
org.codehaus.janino.UnitCompiler$22$2.visitMethodInvocation(UnitCompiler.java:6517)
at 
org.codehaus.janino.UnitCompiler$22$2.visitMethodInvocation(UnitCompiler.java:6490)
at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:5073)
at 
org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6490)
at 
org.codehaus.janino.UnitCompiler$22.visitRvalue(UnitCompiler.java:6469)
at org.codehaus.janino.Java$Rvalue.accept(Java.java:4116)
at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:6469)
at 
org.codehaus.janino.UnitCompiler.findMostSpecificIInvocable(UnitCompiler.java:9237)
at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9123)
at org.codehaus.janino.UnitCompiler.findIMethod(UnitCompiler.java:9025)
at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:5062)
at org.codehaus.janino.UnitCompiler.access$9100(UnitCompiler.java:215)
at 
org.codehaus.janino.UnitCompiler$16.visitMethodInvocation(UnitCompiler.java:4423)
at 
org.codehaus.janino.UnitCompiler$16.visitMethodInvocation(UnitCompiler.java:4396)
at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:5073)
at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:4396)
at 
org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:5662)
at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3792)
at org.codehaus.janino.UnitCompiler.access$6100(UnitCompiler.java:215)
at 
org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3754)
at 
org.codehaus.janino.UnitCompiler$13.visitAssignment(UnitCompiler.java:3734)
at org.codehaus.janino.Java$Assignment.accept(Java.java:4477)
at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3734)
at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2360)
at org.codehaus.janino.UnitCompiler.access$1800(UnitCompiler.java:215)
at 
org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1494)
at 
org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1487)
at org.co

[jira] [Created] (FLINK-24639) Improve assignment of Kinesis shards to subtasks

2021-10-25 Thread John Karp (Jira)
John Karp created FLINK-24639:
-

 Summary: Improve assignment of Kinesis shards to subtasks
 Key: FLINK-24639
 URL: https://issues.apache.org/jira/browse/FLINK-24639
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kinesis
Reporter: John Karp
 Attachments: Screen Shot 2021-10-25 at 5.11.29 PM.png

The default assigner of Kinesis shards to Flink subtasks simply takes the 
hashCode() of the StreamShardHandle (an integer), which is then interpreted 
modulo the number of subtasks. This basically does random-ish but deterministic 
assignment of shards to subtasks.

However, this can lead to some subtasks getting several times the number of 
shards as others. To prevent those unlucky subtasks from being overloaded, the 
overall Flink cluster must be over-provisioned, so that each subtask has more 
headroom to handle any over-assignment of shards.

We can do better here, at least if Kinesis is being used in a common way. Each 
record sent to a Kinesis stream has a particular hash key in the range [0, 
2^128), which determines which shard it goes to; each shard has an 
assigned range of hash keys. By default Kinesis assigns each shard equal 
fractions of the hash-key space. And when you scale up or down using 
UpdateShardCount, it tries to maintain equal fractions to the extent possible. 
Also, a shard's hash key range is fixed at creation; it can only be replaced by 
new shards that split or merge it.

Given the above, one way to assign shards to subtasks is to do a linear mapping 
from hash-keys in range [0, 2^128) to subtask indices in [0, nSubtasks). For 
the 'coordinate' of each shard we pick the middle of the shard's range, to 
ensure neither subtask 0 nor subtask (n-1) is assigned too many.
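
In code, the proposed mapping amounts to roughly the following sketch (a 
simplified illustration of the idea, not the actual patch; names are made up):

{code:java}
import java.math.BigInteger;

public class UniformShardAssignerSketch {

    private static final BigInteger HASH_KEY_SPACE = BigInteger.ONE.shiftLeft(128); // [0, 2^128)

    /** Linearly maps the midpoint of a shard's hash-key range onto [0, numSubtasks). */
    public static int selectSubtask(
            BigInteger startingHashKey, BigInteger endingHashKey, int numSubtasks) {
        BigInteger midpoint = startingHashKey.add(endingHashKey).shiftRight(1);
        return midpoint.multiply(BigInteger.valueOf(numSubtasks))
                .divide(HASH_KEY_SPACE)
                .intValueExact();
    }
}
{code}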

However, this will probably not be helpful for Kinesis users who don't randomly 
assign partition or hash keys to Kinesis records. The existing assigner is 
probably better for them.

I ran a simulation of the default shard assigner versus some alternatives, 
using shards taken from one of our Kinesis streams; results attached. The 
measure I used I call 'overload' and it measures how many times more shards the 
most heavily-loaded subtask has than is necessary. (DEFAULT is the default 
assigner, Sha256 is similar to the default but with a stronger hashing 
function, ShardId extracts the shard number from the shardId and uses that, and 
HashKey is the one I describe above.)

Patch is at: 
https://github.com/apache/flink/compare/master...john-karp:uniform-shard-assigner?expand=1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[NOTICE] flink-streaming-java no longer depends on Scala and lost its suffix

2021-10-25 Thread Chesnay Schepler

Hello all,

I just wanted to inform everyone that I just merged 
https://issues.apache.org/jira/browse/FLINK-24018, removing the 
transitive Scala dependencies from flink-streaming-java. This also means 
that the module lost its Scala suffix, along with a lot of other modules.


Please keep this in mind for the next few days when adding Flink dependencies 
or new modules; it is quite likely that something has changed w.r.t. the 
Scala suffixes.


For completeness' sake, these are the modules that lost the suffix:

|flink-batch-sql-test flink-cep flink-cli-test flink-clients 
flink-connector-elasticsearch-base flink-connector-elasticsearch5 
flink-connector-elasticsearch6 flink-connector-elasticsearch7 
flink-connector-gcp-pubsub flink-connector-hbase-1.4 
flink-connector-hbase-2.2 flink-connector-hbase-base 
flink-connector-jdbc flink-connector-kafka flink-connector-kinesis 
flink-connector-nifi flink-connector-pulsar flink-connector-rabbitmq 
flink-connector-testing flink-connector-twitter 
flink-connector-wikiedits flink-container 
flink-distributed-cache-via-blob-test flink-dstl-dfs flink-gelly 
flink-hadoop-bulk flink-kubernetes 
flink-parent-child-classloading-test-lib-package 
flink-parent-child-classloading-test-program flink-queryable-state-test 
flink-runtime-web flink-scala flink-sql-connector-elasticsearch6 
flink-sql-connector-elasticsearch7 flink-sql-connector-hbase-1.4 
flink-sql-connector-hbase-2.2 flink-sql-connector-kafka 
flink-sql-connector-kinesis flink-sql-connector-rabbitmq 
flink-state-processor-api flink-statebackend-rocksdb 
flink-streaming-java flink-streaming-kafka-test 
flink-streaming-kafka-test-base flink-streaming-kinesis-test 
flink-table-api-java-bridge flink-test-utils flink-walkthrough-common 
flink-yarn|


Re: [NOTICE] flink-streaming-java no longer depends on Scala and lost its suffix

2021-10-25 Thread Chesnay Schepler

This time with proper formatting...

flink-batch-sql-test
flink-cep
flink-cli-test
flink-clients
flink-connector-elasticsearch-base
flink-connector-elasticsearch5
flink-connector-elasticsearch6
flink-connector-elasticsearch7
flink-connector-gcp-pubsub
flink-connector-hbase-1.4
flink-connector-hbase-2.2
flink-connector-hbase-base
flink-connector-jdbc
flink-connector-kafka
flink-connector-kinesis
flink-connector-nifi
flink-connector-pulsar
flink-connector-rabbitmq
flink-connector-testing
flink-connector-twitter
flink-connector-wikiedits
flink-container
flink-distributed-cache-via-blob-test
flink-dstl-dfs
flink-gelly
flink-hadoop-bulk
flink-kubernetes
flink-parent-child-classloading-test-lib-package
flink-parent-child-classloading-test-program
flink-queryable-state-test
flink-runtime-web
flink-scala
flink-sql-connector-elasticsearch6
flink-sql-connector-elasticsearch7
flink-sql-connector-hbase-1.4
flink-sql-connector-hbase-2.2
flink-sql-connector-kafka
flink-sql-connector-kinesis
flink-sql-connector-rabbitmq
flink-state-processor-api
flink-statebackend-rocksdb
flink-streaming-java
flink-streaming-kafka-test
flink-streaming-kafka-test-base
flink-streaming-kinesis-test
flink-table-api-java-bridge
flink-test-utils
flink-walkthrough-common
flink-yarn


On 26/10/2021 01:04, Chesnay Schepler wrote:

Hello all,

I just wanted to inform everyone that I just merged 
https://issues.apache.org/jira/browse/FLINK-24018, removing the 
transitive Scala dependencies from flink-streaming-java. This also 
means that the module lost its Scala suffix, along with a lot of 
other modules.


Please keep this in mind for the next few days when adding Flink 
dependencies or new modules; it is quite likely that something has 
changed w.r.t. the Scala suffixes.


For completeness' sake, these are the modules that lost the suffix:

|flink-batch-sql-test flink-cep flink-cli-test flink-clients 
flink-connector-elasticsearch-base flink-connector-elasticsearch5 
flink-connector-elasticsearch6 flink-connector-elasticsearch7 
flink-connector-gcp-pubsub flink-connector-hbase-1.4 
flink-connector-hbase-2.2 flink-connector-hbase-base 
flink-connector-jdbc flink-connector-kafka flink-connector-kinesis 
flink-connector-nifi flink-connector-pulsar flink-connector-rabbitmq 
flink-connector-testing flink-connector-twitter 
flink-connector-wikiedits flink-container 
flink-distributed-cache-via-blob-test flink-dstl-dfs flink-gelly 
flink-hadoop-bulk flink-kubernetes 
flink-parent-child-classloading-test-lib-package 
flink-parent-child-classloading-test-program 
flink-queryable-state-test flink-runtime-web flink-scala 
flink-sql-connector-elasticsearch6 flink-sql-connector-elasticsearch7 
flink-sql-connector-hbase-1.4 flink-sql-connector-hbase-2.2 
flink-sql-connector-kafka flink-sql-connector-kinesis 
flink-sql-connector-rabbitmq flink-state-processor-api 
flink-statebackend-rocksdb flink-streaming-java 
flink-streaming-kafka-test flink-streaming-kafka-test-base 
flink-streaming-kinesis-test flink-table-api-java-bridge 
flink-test-utils flink-walkthrough-common flink-yarn|






Re: [Discuss] Planning Flink 1.15

2021-10-25 Thread Israel Ekpo
Thanks for the update, Joe.

Looking forward to this release.

On Mon, Oct 25, 2021 at 11:13 AM Johannes Moser  wrote:

> Hi all,
>
> As people have already started working on their 1.15
> contributions, we'd like to start the discussion about
> the release setup.
>
> - Release managers: As a team of three seems to have
> worked perfectly fine, we'd like to suggest Till, Yun Gao
> & Joe as the release managers for 1.15.
>
> - Timeline: 1.14 was released at the end of September, and
> aiming for a 4-month release cycle including one month
> of stabilisation would lead to a feature freeze date at the
> end of December, which would make the European holiday
> season a bit stressful. One option would have been to aim for early
> December, but we decided to go for the 17th of January,
> so that we also have some buffer before the Chinese New
> Year.
>
> - Bi-weekly sync: We'd also like to set up a bi-weekly sync again
> starting from the 9th of November at 9am CET/4pm CST.
>
> - Collecting features: As last time, it would be helpful to have
> a rough overview of the efforts that will likely be included in
> this release. We have created a wiki page [1] for collecting such
> information. We'd like to kindly ask all committers to fill in the
> page with features that they intend to work on.
>
> Just copy-pasting what we included in the planning email
> for 1.14, because it still applies:
>
> - Stability of master: This has been an issue during the 1.13 & 1.14
> feature freeze phases and it is still going on. We encourage every
> committer to not merge PRs through the GitHub button, but to do this
> manually, with caution for the commits merged after CI has been
> triggered. It would be appreciated to always build the project before
> merging to master.
>
> - Documentation: Please try to see documentation as an integrated
> part of the engineering process and don't push it to the feature
> freeze phase or even after. You might even think about going
> documentation first. We, as the Flink community, are adding great
> stuff that pushes the limits of streaming data processors with
> every release. We should also make this stuff usable for our users by
> documenting it well.
>
> - Promotion of 1.15: What applies to documentation also applies
> to all the activity around the release. We encourage every contributor
> to also think about, plan and prepare activities like blog posts and talks
> that will promote and spread the release once it is done.
>
> Please let us know what you think.
>
> Thank you~
> Till, Yun Gao & Joe
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.15+Release
>


[jira] [Created] (FLINK-24640) CEIL, FLOOR built-in functions for Timestamp should respect DST

2021-10-25 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-24640:
---

 Summary: CEIL, FLOOR built-in functions for Timestamp should 
respect DST
 Key: FLINK-24640
 URL: https://issues.apache.org/jira/browse/FLINK-24640
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Sergey Nuyanzin


The problem is that if the date falls within DST, then 
{code:sql}
select floor(current_timestamp to year);
{code}
leads to result
{noformat}
2021-12-31 23:00:00.000
{noformat}
while the expected result is {{2022-01-01 00:00:00.000}}.

The same issue exists with {{WEEK}}, {{QUARTER}} and {{MONTH}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)