[jira] [Created] (FLINK-30288) Use visitor to convert predicate for orc

2022-12-04 Thread Shammon (Jira)
Shammon created FLINK-30288:
---

 Summary: Use visitor to convert predicate for orc
 Key: FLINK-30288
 URL: https://issues.apache.org/jira/browse/FLINK-30288
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Shammon


Use `PredicateVisitor` to convert `Predicate` in the table store for ORC.
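
For illustration, a minimal sketch of the visitor shape (the predicate classes
below are simplified stand-ins for the table-store ones, and the ORC builder
calls come from hive-storage-api; the actual table-store interfaces may differ):

import org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;

/** Simplified stand-ins for the table-store predicate classes. */
interface Predicate {
    <T> T visit(PredicateVisitor<T> visitor);
}

interface PredicateVisitor<T> {
    T visitEquals(String field, Object literal);
    T visitAnd(Predicate left, Predicate right);
}

/** Folds a predicate tree into an ORC SearchArgument in one traversal. */
class OrcPredicateConverter implements PredicateVisitor<SearchArgument.Builder> {

    private final SearchArgument.Builder builder = SearchArgumentFactory.newBuilder();

    @Override
    public SearchArgument.Builder visitEquals(String field, Object literal) {
        // assumes string literals for brevity; real code maps the field's type
        return builder.equals(field, PredicateLeaf.Type.STRING, literal);
    }

    @Override
    public SearchArgument.Builder visitAnd(Predicate left, Predicate right) {
        builder.startAnd();
        left.visit(this);
        right.visit(this);
        return builder.end(); // closes the AND group
    }
}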





[jira] [Created] (FLINK-30289) RateLimitedSourceReader uses wrong signal for checkpoint rate-limiting

2022-12-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30289:


 Summary: RateLimitedSourceReader uses wrong signal for checkpoint 
rate-limiting
 Key: FLINK-30289
 URL: https://issues.apache.org/jira/browse/FLINK-30289
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.17.0


The checkpoint rate limiter is notified when the checkpoint is complete, but 
since this signal arrives at some point in the future (or not at all), it can 
result in no records being emitted for a checkpoint, or in more records being 
emitted than expected.
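
To illustrate the intended fix (class and method names here are illustrative,
not the actual RateLimitedSourceReader internals): the per-checkpoint record
budget should be reset by the checkpoint being taken, not by the asynchronous
completion notification.

/** Illustrative per-checkpoint record budget (not the actual Flink class). */
class PerCheckpointBudget {

    private final int recordsPerCheckpoint;
    private int remaining;

    PerCheckpointBudget(int recordsPerCheckpoint) {
        this.recordsPerCheckpoint = recordsPerCheckpoint;
        this.remaining = recordsPerCheckpoint;
    }

    /** Called before emitting a record; false means "wait for the next checkpoint". */
    boolean tryAcquire() {
        if (remaining == 0) {
            return false;
        }
        remaining--;
        return true;
    }

    /** Correct signal: reset when the checkpoint is taken (i.e. from snapshotState). */
    void onCheckpointTriggered(long checkpointId) {
        remaining = recordsPerCheckpoint;
    }

    // Buggy signal: notifyCheckpointComplete arrives some time later, or never,
    // so resetting there lets a checkpoint interval see zero or more than N records.
}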





[jira] [Created] (FLINK-30290) IteratorSourceReaderBase should report END_OF_INPUT sooner

2022-12-04 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-30290:


 Summary: IteratorSourceReaderBase should report END_OF_INPUT sooner
 Key: FLINK-30290
 URL: https://issues.apache.org/jira/browse/FLINK-30290
 Project: Flink
  Issue Type: Technical Debt
  Components: API / Core
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.17.0


The iterator source reader base does not report END_OF_INPUT when the last 
value is emitted, but instead requires an additional call to pollNext to be 
made.
This is fine functionality-wise, and allowed by the source reader API 
contracts, but it's not intuitive behavior and leaks into tests for the datagen 
source.
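
A simplified sketch of the suggested behavior against the generic source
reader API (this is not the actual IteratorSourceReaderBase code):
END_OF_INPUT is reported in the same pollNext call that emits the last element.

import java.util.Iterator;
import org.apache.flink.api.connector.source.ReaderOutput;
import org.apache.flink.core.io.InputStatus;

class EagerEndOfInputPoller<E> {

    private final Iterator<E> iterator;

    EagerEndOfInputPoller(Iterator<E> iterator) {
        this.iterator = iterator;
    }

    InputStatus pollNext(ReaderOutput<E> output) {
        if (iterator.hasNext()) {
            output.collect(iterator.next());
            // report end of input together with the last record,
            // instead of requiring one more pollNext call
            return iterator.hasNext() ? InputStatus.MORE_AVAILABLE : InputStatus.END_OF_INPUT;
        }
        return InputStatus.END_OF_INPUT;
    }
}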





[jira] [Created] (FLINK-30291) Integrate flink-connector-aws into Flink docs

2022-12-04 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-30291:
-

 Summary: Integrate flink-connector-aws into Flink docs
 Key: FLINK-30291
 URL: https://issues.apache.org/jira/browse/FLINK-30291
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / AWS, Documentation
Reporter: Danny Cranmer
 Fix For: 1.17.0, 1.16.1


Update the docs rendering to integrate {{flink-connector-aws}}.

Add a new shortcode to handle rendering the SQL connector correctly.





[jira] [Created] (FLINK-30292) Better support for conversion between DataType and TypeInformation

2022-12-04 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-30292:


 Summary: Better support for conversion between DataType and 
TypeInformation
 Key: FLINK-30292
 URL: https://issues.apache.org/jira/browse/FLINK-30292
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.15.3
Reporter: Yunfeng Zhou


In Flink 1.15, we have the following ways to convert a DataType to a 
TypeInformation. Each of them has disadvantages.

* `TypeConversions.fromDataTypeToLegacyInfo`
It might lose precision for some data types, such as timestamp, and it has 
been deprecated.
* `ExternalTypeInfo.of`
It cannot be used to get detailed type information like `RowTypeInfo`, and it 
might bring some serialization overhead.

Given that neither of the approaches above is perfect, Flink SQL should 
provide a better API to support DataType-TypeInformation conversions, and thus 
better support Table-DataStream conversions.
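
For reference, a short illustration of the two existing conversion paths
described above (both are current Flink APIs; the example row type is
arbitrary):

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.runtime.typeutils.ExternalTypeInfo;
import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.utils.TypeConversions;

public class ConversionPaths {

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        DataType rowType =
                DataTypes.ROW(
                        DataTypes.FIELD("id", DataTypes.BIGINT()),
                        DataTypes.FIELD("ts", DataTypes.TIMESTAMP(9)));

        // Path 1: deprecated, and nanosecond timestamps degrade to a legacy type.
        TypeInformation<?> legacy = TypeConversions.fromDataTypeToLegacyInfo(rowType);

        // Path 2: not deprecated, but the result is an opaque ExternalTypeInfo
        // (no RowTypeInfo-style field access) and serializes via extra conversion.
        TypeInformation<?> external = ExternalTypeInfo.of(rowType);

        System.out.println(legacy + " vs " + external);
    }
}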
 





[jira] [Created] (FLINK-30293) Create an enumerator for static (batch)

2022-12-04 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-30293:


 Summary: Create an enumerator for static (batch)
 Key: FLINK-30293
 URL: https://issues.apache.org/jira/browse/FLINK-30293
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.3.0


In FLINK-30207, we created an enumerator for continuous (streaming) reading.
We should also have an enumerator for static (batch) reading.
For example, beyond the current read-compacted behavior, time traveling may in 
the future specify the commit time at which to read snapshots.
I think these capabilities need to be in the core, but should they be in scan? 
(It seems they should not.)
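
To make the shape concrete, a bare-bones sketch of a static enumerator against
Flink's generic SplitEnumerator interface (the split type and the
snapshot-selection logic are placeholders; where that logic should live is
exactly the open question above):

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import javax.annotation.Nullable;
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;

/** Assigns a fixed set of splits (e.g. from one chosen snapshot), then finishes. */
class StaticSnapshotEnumerator<SplitT extends SourceSplit>
        implements SplitEnumerator<SplitT, Void> {

    private final SplitEnumeratorContext<SplitT> context;
    private final Queue<SplitT> remaining;

    StaticSnapshotEnumerator(SplitEnumeratorContext<SplitT> context, List<SplitT> splits) {
        this.context = context;
        this.remaining = new ArrayDeque<>(splits);
    }

    @Override
    public void start() {}

    @Override
    public void handleSplitRequest(int subtask, @Nullable String hostname) {
        SplitT split = remaining.poll();
        if (split != null) {
            context.assignSplit(split, subtask);
        } else {
            context.signalNoMoreSplits(subtask); // batch: input is finite
        }
    }

    @Override
    public void addSplitsBack(List<SplitT> splits, int subtask) {
        remaining.addAll(splits);
    }

    @Override
    public void addReader(int subtask) {}

    @Override
    public Void snapshotState(long checkpointId) {
        return null; // static split set; nothing to snapshot in this sketch
    }

    @Override
    public void close() throws IOException {}
}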





Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

2022-12-04 Thread yuxia
Hi, Samrat.
I have seen some users asking for GlueCatalog support[1], so it's really 
exciting that you're driving it. 
After a quick look at this FLIP, I have some comments:

1: I noticed there's a YAML part in the "Using the Catalog" section. What do 
you mean by that? Do you mean how to use the Glue catalog in the SQL client? 
If so, just for your information, using a YAML environment file in the SQL 
client is not supported anymore[2].

2: There seems to be a typo in the "Design#views" part: it contains 
"listTables", which I think shouldn't be there. Also, I'm curious about how to 
list views using the Glue API. Is there a ready-made API to list views 
directly, or do we need to list the tables and then filter the views by table 
kind?

3: In the "Flink Glue DataType Mapping" part, CharType is mapped to String. It 
seems the char's length will be lost; is it possible to have a better mapping 
that preserves the length of the char type? (See the sketch after these 
comments for one possibility.)

4: About the "Flink CatalogFunction mapping with Glue Function" part, how do we 
map the function language in Flink's CatalogFunction?
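
Since the Glue Data Catalog stores Hive-style type strings, a mapping along
the following lines could keep the char length (a sketch on my side, not the
FLIP's actual mapping; non-character types are elided):

import org.apache.flink.table.types.logical.CharType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.VarCharType;

final class GlueTypeStrings {

    /** Hive-style type string for Glue; only the character types are shown. */
    static String toGlueTypeString(LogicalType type) {
        switch (type.getTypeRoot()) {
            case CHAR:
                // keep the declared length instead of collapsing to "string"
                return "char(" + ((CharType) type).getLength() + ")";
            case VARCHAR:
                VarCharType varChar = (VarCharType) type;
                return varChar.getLength() == VarCharType.MAX_LENGTH
                        ? "string"
                        : "varchar(" + varChar.getLength() + ")";
            default:
                throw new UnsupportedOperationException(type.asSummaryString());
        }
    }
}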



[1] https://lists.apache.org/thread/pdd780wl4f26p447fohvm9osky2r9fhh
[2] https://issues.apache.org/jira/browse/FLINK-22540

Best regards,
Yuxia

----- Original Message -----
From: "Samrat Deb" 
To: "dev" 
Cc: "prabhujose gates" 
Sent: Saturday, December 3, 2022 12:29:16 PM
Subject: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

Hi everyone,

I would like to open a discussion[1] on providing GlueCatalog support
in Flink.
Currently, Flink offers 3 major types of catalogs[2], of which only
HiveCatalog is a persistent catalog backed by the Hive Metastore. We would like
to introduce GlueCatalog in Flink, offering users another option that is
persistent in nature. The AWS Glue Data Catalog is a centralized data catalog
in the AWS cloud that provides integrations with many different
connectors[3]. A Flink GlueCatalog can use the features provided by Glue and
create strong integration with other services in the cloud.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink

[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/catalogs/

[3]
https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro

[4] https://issues.apache.org/jira/browse/FLINK-29549

Bests
Samrat


Re: [VOTE] FLIP-273: Improve Catalog API to Support ALTER TABLE syntax

2022-12-04 Thread Jark Wu
+1 (binding)

Best,
Jark

On Fri, 2 Dec 2022 at 10:11, Paul Lam  wrote:

> +1 (non-binding)
>
> Best,
> Paul Lam
>
> > On Dec 2, 2022, at 09:17, yuxia wrote:
> >
> > +1 (non-binding)
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "Yaroslav Tkachenko" 
> > To: "dev" 
> > Sent: Friday, December 2, 2022 12:27:24 AM
> > Subject: Re: [VOTE] FLIP-273: Improve Catalog API to Support ALTER TABLE
> syntax
> >
> > +1 (non-binding).
> >
> > Looking forward to it!
> >
> > On Thu, Dec 1, 2022 at 5:06 AM Dong Lin  wrote:
> >
> >> +1 (binding)
> >>
> >> Thanks for the FLIP!
> >>
> >> On Thu, Dec 1, 2022 at 12:20 PM Shengkai Fang 
> wrote:
> >>
> >>> Hi All,
> >>>
> >>> Thanks for all the feedback so far. Based on the discussion[1] we seem
> >>> to have a consensus, so I would like to start a vote on FLIP-273.
> >>>
> >>> The vote will last for at least 72 hours (Dec 5th at 13:00 GMT,
> >>> excluding weekend days) unless there is an objection or insufficient
> >> votes.
> >>>
> >>> Best,
> >>> Shengkai
> >>>
> >>> [1]
> >>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-273%3A+Improve+the+Catalog+API+to+Support+ALTER+TABLE+syntax
> >>> [2] https://lists.apache.org/thread/2v4kh2bpzvk049zdxb687q7o1pcmnnnw
> >>>
> >>
>
>


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

2022-12-04 Thread yu zelin
Hi, Shammon

Thanks for your feedback. I think it's good to support a jdbc-sdk. However, 
it's not supported on the gateway side yet. In my opinion, this FLIP is more 
concerned with the SQL Client. How about putting "supporting jdbc-sdk" under 
'Future Work'? We can discuss how to implement it in another thread.

Best,
Yu Zelin
> On Dec 2, 2022, at 18:12, Shammon FY wrote:
> 
> Hi zelin
> 
> Thanks for driving this discussion.
> 
> I noticed that, in the FLIP, the sql-client will interact with the
> sql-gateway through a `REST Client` in the `Executor`. How about introducing
> a jdbc-sdk for the sql-gateway?
> 
> Then the sql-client can connect to the gateway with the jdbc-sdk; on the other
> hand, other applications and tools such as jmeter can use the jdbc-sdk
> to connect to the sql-gateway too.
> 
> Best,
> Shammon
> 
> 
> On Fri, Dec 2, 2022 at 4:10 PM yu zelin  wrote:
> 
>> Hi Jim,
>> 
>> Thanks for your feedback!
>> 
>>> Should this configuration be mentioned in the FLIP?
>> 
>> Sure.
>> 
>>> some way for the server to be able to limit the number of requests it
>> receives.
>> I'm sorry, but this FLIP is dedicated to implementing the Remote mode, so we
>> didn't consider this much. I think the option is enough currently. I will add
>> the improvement suggestions to the 'Future Work' section.
>> 
>>> I wonder if two other options are possible
>> 
>> Forwarding the raw format to the gateway and then to the client is possible.
>> The raw results from the sink are in 'CollectResultIterator#bufferedResult'.
>> First, we can find a way to get this result without wrapping it. Second, we
>> can construct an 'InternalTypeInfo' using the schema information (the data's
>> logical type). After construction, we can get the 'TypeSerializer' to
>> deserialize the raw results.
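
A short sketch of that second step (these are real Flink classes, but the
wiring is simplified and the field types are just an example):

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.BigIntType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class RawResultDeserialization {

    public static void main(String[] args) {
        // logical row type reconstructed from the schema information at hand
        RowType rowType =
                RowType.of(new BigIntType(), new VarCharType(VarCharType.MAX_LENGTH));

        InternalTypeInfo<RowData> typeInfo = InternalTypeInfo.of(rowType);
        TypeSerializer<RowData> serializer = typeInfo.createSerializer(new ExecutionConfig());

        // serializer.deserialize(...) can now decode the raw result bytes
        // coming out of CollectResultIterator without re-wrapping them
        System.out.println(serializer);
    }
}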
>> 
>> 
>> 
>> 
>>> On Dec 1, 2022, at 04:54, Jim Hughes wrote:
>>> 
>>> Hi Yu,
>>> 
>>> Thanks for moving my comments to this thread!  Also, thank you for
>>> answering my questions; it is helping me understand the SQL Gateway
>>> better.
>>> 
>>> 5.
 Our idea is to introduce a new session option (like
>>> 'sql-client.result.fetch-interval') to control
>>> the fetching requests sending frequency. What do you think?
>>> 
>>> Should this configuration be mentioned in the FLIP?
>>> 
>>> One slight concern I have with having 'sql-client.result.fetch-interval'
>> as
>>> a session configuration is that users could set it low and cause the
>> client
>>> to send a large volume of requests to the SQL gateway.
>>> 
>>> Generally, I'd like to see some way for the server to be able to limit
>> the
>>> number of requests it receives.  If that really needs to be done by a
>> proxy
>>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
>>> think my concern here should be blocking in any way.)
>>> 
>>> 7.
 What is the serialization lifecycle for results?
>>> 
>>> I wonder if two other options are possible:
>>> 3) Could the Gateway just forward the result byte array?  (Or does the
>>> Gateway need to deserialize the response in order to understand it for
>> some
>>> reason?)
>>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
>>> the Client read the format which the JobManager sends?)
>>> 
>>> Thanks again!
>>> 
>>> Cheers,
>>> 
>>> Jim
>>> 
>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin  wrote:
>>> 
 Hi, all
 
 Thanks Jim’s questions below. Here I’d like to reply to them.
 
> 1. For the Client Parser, is it going to work with the extended syntax
> from the Flink Table Store?
> 
> 2. Relatedly, what will happen if an older Client tries to handle
 syntax
> that a newer service supports?  (Suppose I use a 1.17 client with a
 1.18
> Gateway/system which has a new keyword.  Is there anything we should
>> be
> designing for upfront?)
> 
> 3. How will client and server version mismatches be handled?  Will a
> single gateway be able to support multiple endpoint versions?
> 4. How are commands which change a session handled?  Are those sent
>> via
> an ExecuteStatementRequest?
> 
> 5. The remote POC uses polling for getting back status and getting
>> back
> results.  Would it be possible to switch to web sockets or some other
> mechanism to avoid polling?  If polling is used for both, the polling
> frequency should be different between local and remote configurations.
> 
> 6. What does this sentence mean?  "The reason why we didn't get the
>> sql
> type in client side is because it's hard for the lightweight
 client-level
> parser to recognize some sql type  sql, such as query with CTE.  "
> 
> 7. What is the serialization lifecycle for results?  It makes sense to
> have some control over whether the gateway returns results as SQL or
 JSON.
> I'd love to see a way to avoid needing to serialize and deserialize
 results
> on the SQL Gateway if possible.  I'm still new enough to the project
 that
> I'm not sure if t

Patch to support Parquet schema evolution

2022-12-04 Thread sunshun18
Hi there,


I found a null-value issue when using Flink to read Parquet files with 
multiple schema versions (V1->V2->V3->..->Vn).
Assume there are two fields in the given Parquet schema as below, and field F2 
only exists in version 2.


Version1: F1
Version2: F1, F2


Currently the value of field F2 will be empty when reading data from a Parquet 
file using schema version 2.
I explored the implementation and found that Flink uses a collection named 
`unknownFieldsIndices` to track the nonexistent fields, and that this 
collection is applied to all Parquet files under the given path.
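
The gist of the fix, sketched below (`unknownFieldsIndices` is the name from
the code; the surrounding class is illustrative): recompute the missing-field
set for each file that is opened instead of accumulating it across all files
under the path.

import java.util.HashSet;
import java.util.Set;
import org.apache.parquet.schema.MessageType;

final class SchemaEvolutionSketch {

    /** Recomputed per opened file, not shared across the whole path. */
    static Set<Integer> unknownFieldsIndices(MessageType fileSchema, String[] requestedFields) {
        Set<Integer> unknown = new HashSet<>();
        for (int i = 0; i < requestedFields.length; i++) {
            if (!fileSchema.containsField(requestedFields[i])) {
                unknown.add(i); // fill this column with null for this file only
            }
        }
        return unknown;
    }
}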


I drafted a patch, with a unit test, to fix this issue.


https://issues.apache.org/jira/browse/FLINK-29527
https://github.com/apache/flink/pull/21149


As this PR has been pending for a long time, I hope a committer can help 
review it and provide feedback if possible.


Thanks!
Shun

Re: Need right

2022-12-04 Thread Martijn Visser
Hi,

There is no need for additional permissions: you can start working on a
Jira issue of your liking (feel free to ping me to get it assigned to you)
and open up a PR.

Thanks and looking forward to your contribution!

Best regards,

Martijn

On Mon, Dec 5, 2022 at 6:59 AM Stan1005 <532338...@qq.com.invalid> wrote:

> Hi, I want to contribute to Apache Flink. Would you please give me the
> contributor permission? My JIRA ID is StarBoy1005.


Re: Patch to support Parquet schema evolution

2022-12-04 Thread yuxia
Hi, Shun. 
Thanks for the contribution. I'll take a look first and then find some 
committers to help review & merge.

Best regards,
Yuxia



[jira] [Created] (FLINK-30294) Change table property key 'log.scan' to 'startup.mode' and add a default startup mode in Table Store

2022-12-04 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-30294:
---

 Summary: Change table property key 'log.scan' to 'startup.mode' 
and add a default startup mode in Table Store
 Key: FLINK-30294
 URL: https://issues.apache.org/jira/browse/FLINK-30294
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Caizhi Weng
Assignee: Caizhi Weng


We're introducing time-travel reading of Table Store for batch jobs. However, 
this reading mode is quite similar to the "from-timestamp" startup mode for 
streaming jobs, except that "from-timestamp" streaming jobs only consume 
incremental data, not historical data.

We can support startup modes for both batch and streaming jobs. For batch 
jobs, the "from-timestamp" startup mode will produce all records from the last 
snapshot before the specified timestamp. For streaming jobs the behavior 
doesn't change.

Previously, in order to use the "from-timestamp" startup mode, users had to 
specify both "log.scan" and "log.scan.timestamp-millis", which is a little 
inconvenient. We can introduce a "default" startup mode whose behavior is 
based on the execution environment and other configurations. This way, to use 
the "from-timestamp" startup mode, it is enough for users to specify just 
"startup.timestamp-millis".


