[jira] [Created] (FLINK-19084) Remove deprecated methods in ExecutionConfig

2020-08-28 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-19084:


 Summary: Remove deprecated methods in ExecutionConfig
 Key: FLINK-19084
 URL: https://issues.apache.org/jira/browse/FLINK-19084
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.12.0


We should remove the no-op methods in ExecutionConfig:

- ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
- ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9) 

They are {{Public}}; however, they have become no-ops, which arguably 
already broke the stability guarantees.
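For illustration, the pattern in question looks roughly like this (a hedged sketch, not the exact Flink source):

{code:java}
// Sketch of the no-op pattern: the method keeps its stable @Public surface,
// but its body no longer configures anything.
@Deprecated
public ExecutionConfig enableSysoutLogging() {
    // no-op since 1.10: sysout logging of execution updates was removed
    return this;
}
{code}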



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-19085) Remove deprecated methods for writing CSV and Text files from DataStream

2020-08-28 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-19085:


 Summary: Remove deprecated methods for writing CSV and Text files 
from DataStream
 Key: FLINK-19085
 URL: https://issues.apache.org/jira/browse/FLINK-19085
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.12.0


We can remove the long-deprecated {{PublicEvolving}} methods:
- DataStream#writeAsText
- DataStream#writeAsCsv
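
A minimal migration sketch (path and types are illustrative, not from this ticket), replacing {{writeAsText}} with the {{StreamingFileSink}}:

{code:java}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// instead of: stream.writeAsText("/output/path");
StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("/output/path"), new SimpleStringEncoder<String>("UTF-8"))
    .build();

stream.addSink(sink);
{code}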



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Removing deprecated methods from DataStream API

2020-08-28 Thread Yun Tang
I noticed that the ticket to remove the deprecated DataStream#fold() [1] has been 
created, but has not yet reached an agreement or been assigned.

Actually, the fold-related functions and state descriptors have been deprecated for 
longer than 3 years, and I think it's okay to remove that kind of state now.
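
For anyone migrating, a minimal sketch (the window, types, and key are
illustrative, not from the ticket) of replacing a windowed fold() with the
recommended aggregate():

    import org.apache.flink.api.common.functions.AggregateFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // input: DataStream<Tuple2<String, Integer>>. Sum the Integer field per
    // key with an AggregateFunction instead of a deprecated FoldFunction.
    input
        .keyBy(t -> t.f0)
        .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
        .aggregate(new AggregateFunction<Tuple2<String, Integer>, Integer, Integer>() {
            @Override public Integer createAccumulator() { return 0; }
            @Override public Integer add(Tuple2<String, Integer> value, Integer acc) {
                return acc + value.f1;
            }
            @Override public Integer getResult(Integer acc) { return acc; }
            @Override public Integer merge(Integer a, Integer b) { return a + b; }
        });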

[1] https://issues.apache.org/jira/browse/FLINK-19035

Best,
Yun Tang

From: Aljoscha Krettek 
Sent: Thursday, August 27, 2020 16:45
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] Removing deprecated methods from DataStream API

Did you consider DataStream.project() yet? In general I think we should
remove most of the relational-ish methods from DataStream. More
candidates in this set of methods would be the tuple index/expression
methods for aggregations like min/max etc...

Aljoscha

On 25.08.20 20:52, Konstantin Knauf wrote:
> I would argue that the guarantees of @Public methods that became
> ineffective were broken when they became ineffective (and were deprecated).
>
> - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
> - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>
> Removing these methods seems like the better of two evils to me as it shows
> users that they have been using no-ops for some time.
>
> On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen  wrote:
>
>> We have removed some public methods in the past, after a careful
>> deprecation period, if they were not well working any more.
>>
>> The sentiment I got from users is that careful cleanup is in fact
>> appreciated, otherwise things get confusing over time (the deprecated
>> methods cause noise in the API).
>> Still, we need to be very careful here.
>>
>> I would suggest to
>>- start with the non-public breaking methods
>>- remove fold() (very long deprecated)
>>- remove split() (buggy)
>>
>> Removing the env.socketStream() and env.fileStream() methods would
>> probably be good, too. They are very long deprecated and they don't work
>> well (with checkpoints) and the sources are the first thing a user needs to
>> understand when starting with Flink, so removing noise there is super
>> helpful.
>>
>>
>> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz 
>> wrote:
>>
>>> Hey Till,
>>>
>>> You've got a good point here. Removing some of the methods would break
>>> the stability guarantees. I do understand if we decide not to
>>> remove them for that reason, let me explain though why I am thinking it
>>> might make sense to remove them already. First of all I am a bit afraid it
>>> might take a long time before we arrive at the 2.0 version. We have not
>>> ever discussed that in the community. At the same time a lot of the methods
>>> already don't work or are buggy, and we do not fix them any more.
>>>
>>> Methods whose removal would not break the Public guarantees:
>>>
>>> - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>> - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>> - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>> - 
>>> StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>> (not the equivalent in the ExecutionConfig)
>>> - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>> (deprecated in 1.5)
>>>
>>> Methods whose removal would break the Public guarantees:
>>>
>>> which have no effect:
>>>
>>> - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>> - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>
>>> which are buggy or discouraged and thus we do not support fixing them:
>>>
>>> - DataStream#split (deprecated in 1.8)
>>> - DataStream#fold and all related classes and methods such as
>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>> 1.3/1.4)
>>>
>>> The methods like:
>>>
>>> - 
>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>>>
>>> - methods in (Connected)DataStream that specify keys as either
>>> indices or field names
>>> -
>>> ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>
>>> should be working just fine and I feel the least eager to remove those.
>>>
>>> I'd suggest I will open PRs for removing the methods that will not cause
>>> breakage of the Public guarantees as the general feedback was rather
>>> positive. For the rest I do understand the resentment to do so and will not
>>> do it in the 1.x branch. Still I think it is valuable to have the
>>> discussion.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>>
>>> On 18/08/2020 09:26, Till Rohrmann wrote:
>>>
>>> Having looked at the proposed set of methods to remove I've noticed that
>>> some of them are actually annotated with @Public. According to our
>>> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
>>> with this annotation. Hence, I believe that we cannot simply remove them
>

Re: [DISCUSS] Removing deprecated methods from DataStream API

2020-08-28 Thread Dawid Wysakowicz
@Aljoscha I have not thought about DataStream#project. I see your
argument and I do agree with you that it makes sense to eventually remove
these methods. They are not deprecated yet, though. Moreover, I think we
could include deprecating them as part of the FLIP-131/FLIP-134 efforts. I
will add the argument there.

@Konstantin I do share your sentiment that the contract was broken when
they became ineffective. That's why I mentioned the fact they are
ineffective ;)

Let me give a brief summary of how I understand the discussion so far.

I will start the effort of removing the non-breaking APIs:

  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
(deprecated in 1.5) (ALREADY REMOVED)
  * RuntimeContext#getAllAccumulators (deprecated in 0.10) (ALREADY
REMOVED)
  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * 
StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
(not the equivalent in the ExecutionConfig)
  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
(ineffective, thus already broke the guarantees)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
(ineffective, thus already broke the guarantees)
  * DataStream#writeAsCsv,writeAsText (deprecated in 1.10)

I will start a Vote thread for removing:

  * DataStream#split
  * XxxDataStream#fold

as there was quite a bit of positive feedback for doing so, but there
were also concerns.

For now (probably until 2.0), I'll leave the methods:

  * 
StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
  * methods in XxxDataStream that use field names and indices for
addressing projections/groupings
  * ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries

BTW you can track the effort in FLINK-19033[1]

Best,

Dawid

[1] https://issues.apache.org/jira/browse/FLINK-19033


On 27/08/2020 10:45, Aljoscha Krettek wrote:
> Did you consider DataStream.project() yet? In general I think we
> should remove most of the relational-ish methods from DataStream. More
> candidates in this set of methods would be the tuple index/expression
> methods for aggregations like min/max etc...
>
> Aljoscha
>
> On 25.08.20 20:52, Konstantin Knauf wrote:
>> I would argue that the guarantees of @Public methods that became
>> ineffective were broken when they became ineffective (and were
>> deprecated).
>>
>>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in
>> 1.9)
>>
>> Removing these methods seems like the better of two evils to me as it
>> shows
>> users that they have been using no-ops for some time.
>>
>> On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen  wrote:
>>
>>> We have removed some public methods in the past, after a careful
>>> deprecation period, if they were not well working any more.
>>>
>>> The sentiment I got from users is that careful cleanup is in fact
>>> appreciated, otherwise things get confusing over time (the deprecated
>>> methods cause noise in the API).
>>> Still, we need to be very careful here.
>>>
>>> I would suggest to
>>>    - start with the non-public breaking methods
>>>    - remove fold() (very long deprecated)
>>>    - remove split() buggy
>>>
>>> Removing the env.socketStream() and env.fileStream() methods would
>>> probably be good, too. They are very long deprecated and they don't
>>> work
>>> well (with checkpoints) and the sources are the first thing a user
>>> needs to
>>> understand when starting with Flink, so removing noise there is super
>>> helpful.
>>>
>>>
>>> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz
>>> 
>>> wrote:
>>>
 Hey Till,

 You've got a good point here. Removing some of the methods would cause
 breaking the stability guarantees. I do understand if we decide not to
 remove them for that reason, let me explain though why I am
 thinking it
 might make sense to remove them already. First of all I am a bit
 afraid it
 might take a long time before we arrive at the 2.0 version. We have
 not
 ever discussed that in the community. At the same time a lot of the
 methods
 already don't work or are buggy, and we do not fix them any more.

 Methods which removing would not break the Public guarantees:

     - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
     - RuntimeContext#getAllAccumulators (deprecated in 0.10)
     - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
     -
 StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
     (not the equivalent in the ExecutionConfig)
     - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
     (deprecated in 1.5)

 Methods which removing would break the Public guarantees:

 which have no 

Re: [DISCUSS] Removing deprecated methods from DataStream API

2020-08-28 Thread Dawid Wysakowicz
@Yun Yes, I created the ticket with target version 2.0.0, which is what
we agreed on when deprecating the method ;) I will soon start a vote on
actually removing it in 1.12 and, if we agree on that, I will change the
target version.

On 28/08/2020 09:33, Yun Tang wrote:
> I noticed that the ticket to remove the deprecated DataStream#fold() [1] has been 
> created, but has not yet reached an agreement or been assigned.
>
> Actually, the fold-related functions and state descriptors have been deprecated for 
> longer than 3 years, and I think it's okay to remove that kind of state now.
>
> [1] https://issues.apache.org/jira/browse/FLINK-19035
>
> Best,
> Yun Tang
> 
> From: Aljoscha Krettek 
> Sent: Thursday, August 27, 2020 16:45
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] Removing deprecated methods from DataStream API
>
> Did you consider DataStream.project() yet? In general I think we should
> remove most of the relational-ish methods from DataStream. More
> candidates in this set of methods would be the tuple index/expression
> methods for aggregations like min/max etc...
>
> Aljoscha
>
> On 25.08.20 20:52, Konstantin Knauf wrote:
>> I would argue that the guarantees of @Public methods that became
>> ineffective were broken when they became ineffective (and were deprecated).
>>
>> - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>> - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>
>> Removing these methods seems like the better of two evils to me as it shows
>> users that they have been using no-ops for some time.
>>
>> On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen  wrote:
>>
>>> We have removed some public methods in the past, after a careful
>>> deprecation period, if they were not well working any more.
>>>
>>> The sentiment I got from users is that careful cleanup is in fact
>>> appreciated, otherwise things get confusing over time (the deprecated
>>> methods cause noise in the API).
>>> Still, we need to be very careful here.
>>>
>>> I would suggest to
>>>- start with the non-public breaking methods
>>>- remove fold() (very long deprecated)
>>>- remove split() buggy
>>>
>>> Removing the env.socketStream() and env.fileStream() methods would
>>> probably be good, too. They are very long deprecated and they don't work
>>> well (with checkpoints) and the sources are the first thing a user needs to
>>> understand when starting with Flink, so removing noise there is super
>>> helpful.
>>>
>>>
>>> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz 
>>> wrote:
>>>
 Hey Till,

 You've got a good point here. Removing some of the methods would cause
 breaking the stability guarantees. I do understand if we decide not to
 remove them for that reason, let me explain though why I am thinking it
 might make sense to remove them already. First of all I am a bit afraid it
 might take a long time before we arrive at the 2.0 version. We have not
 ever discussed that in the community. At the same time a lot of the methods
 already don't work or are buggy, and we do not fix them any more.

 Methods which removing would not break the Public guarantees:

 - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
 - RuntimeContext#getAllAccumulators (deprecated in 0.10)
 - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
 - 
 StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
 (not the equivalent in the ExecutionConfig)
 - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
 (deprecated in 1.5)

 Methods which removing would break the Public guarantees:

 which have no effect:

 - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
 - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

 which are buggy or discouraged and thus we do not support fixing them:

 - DataStream#split (deprecated in 1.8)
 - DataStream#fold and all related classes and methods such as
 FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
 1.3/1.4)

 The methods like:

 - 
 StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),

 - methods in (Connected)DataStream that specify keys as either
 indices or field names
 -
 ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries

 should be working just fine and I feel the least eager to remove those.

 I'd suggest I will open PRs for removing the methods that will not cause
 breakage of the Public guarantees as the general feedback was rather
 positive. For the rest I do understand the resentment to do so and will not
 do it in the 1.x branch. Still I think it is valuable to have the
 discussion.

[VOTE] Remove deprecated DataStream#fold and DataStream#split in 1.12

2020-08-28 Thread Dawid Wysakowicz
Hi all,

I would like to start a vote for removing deprecated, but
Public(Evolving) methods in the upcoming 1.12 release:

  * XxxDataStream#fold and all related classes (such as
FoldingDescriptor, FoldFunction, ...)
  * DataStream#split
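
For reference, the replacement for split() is side outputs; a minimal sketch
(tag name and element types are illustrative, not part of this vote):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    // input: DataStream<Integer>. Route odd numbers to a side output
    // instead of using split().
    final OutputTag<Integer> oddTag = new OutputTag<Integer>("odd") {};

    SingleOutputStreamOperator<Integer> evens = input
        .process(new ProcessFunction<Integer, Integer>() {
            @Override
            public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                if (value % 2 == 0) {
                    out.collect(value);        // main output
                } else {
                    ctx.output(oddTag, value); // side output
                }
            }
        });

    DataStream<Integer> odds = evens.getSideOutput(oddTag);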

This was discussed in
https://lists.apache.org/thread.html/rf37cd0e00e9adb917b7b75275af2370ec2f3970d17a4abd0db7ead31%40%3Cdev.flink.apache.org%3E

The vote will be open until 2nd September (72h), unless there is an
objection or not enough votes.

Best,

Dawid





[jira] [Created] (FLINK-19086) Performance regression 2020-08-28 in globalWindow benchmark

2020-08-28 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-19086:
-

 Summary: Performance regression 2020-08-28 in globalWindow benchmark
 Key: FLINK-19086
 URL: https://issues.apache.org/jira/browse/FLINK-19086
 Project: Flink
  Issue Type: Bug
  Components: Benchmarks, Runtime / Task
Affects Versions: 1.12.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan


[http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow&env=2]

[http://codespeed.dak8s.net:8000/timeline/#/?exe=1,3&ben=tumblingWindow&env=2&revs=200&equid=off&quarts=on&extr=on]
 

The benchmark results started to decline 2 days before the decommissioning of an 
old Jenkins node.

The other tests, however, were stable.

 

cc: [~pnowojski]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Remove deprecated DataStream#fold and DataStream#split in 1.12

2020-08-28 Thread Aljoscha Krettek

+1

Aljoscha

On 28.08.20 09:41, Dawid Wysakowicz wrote:

Hi all,

I would like to start a vote for removing deprecated, but
Public(Evolving) methods in the upcoming 1.12 release:

   * XxxDataStream#fold and all related classes (such as
 FoldingDescriptor, FoldFunction, ...)
   * DataStream#split

This was discussed in
https://lists.apache.org/thread.html/rf37cd0e00e9adb917b7b75275af2370ec2f3970d17a4abd0db7ead31%40%3Cdev.flink.apache.org%3E

The vote will be open until 2nd September (72h), unless there is an
objection or not enough votes.

Best,

Dawid






Re: [VOTE] Remove deprecated DataStream#fold and DataStream#split in 1.12

2020-08-28 Thread Paul Lam
+1 

Best,
Paul Lam

> On Aug 28, 2020, at 15:41, Dawid Wysakowicz  wrote:
> 
> Hi all,
> 
> I would like to start a vote for removing deprecated, but Public(Evolving) 
> methods in the upcoming 1.12 release:
> 
> XxxDataStream#fold and all related classes (such as FoldingDescriptor, 
> FoldFunction, ...)
> DataStream#split
> This was discussed in 
> https://lists.apache.org/thread.html/rf37cd0e00e9adb917b7b75275af2370ec2f3970d17a4abd0db7ead31%40%3Cdev.flink.apache.org%3E
>  
> 
> The vote will be open until 2nd September (72h), unless there is an objection 
> or not enough votes.
> 
> Best,
> 
> Dawid
> 



Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

2020-08-28 Thread Kurt Young
A quick question: is network memory treated as managed memory now? Or will it
be in the future?

Best,
Kurt


On Wed, Aug 26, 2020 at 5:32 PM Xintong Song  wrote:

> Hi devs,
>
> I'd like to bring the discussion over FLIP-141[1], which proposes how
> managed memory should be shared by various use cases within a slot. This is
> an extension to FLIP-53[2], where we assumed that RocksDB state backend and
> batch operators are the only use cases of managed memory for streaming and
> batch jobs respectively, which is no longer true with the introduction of
> Python UDFs.
>
> Please notice that we have not reached consensus between two different
> designs. The major part of this FLIP describes one of the candidates, while
> the alternative is discussed in the section "Rejected Alternatives". We are
> hoping to borrow intelligence from the community to help us resolve the
> disagreement.
>
> Any feedback would be appreciated.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
>
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>


Re: [DISCUSS] FLIP-138: Declarative Resource management

2020-08-28 Thread Chesnay Schepler

Maybe :)

Imagine a case where the producer and consumer have the same 
ResourceProfile, or at least one where the consumer requirements are 
less than the producer ones.
In this case, the scheduler can happily schedule consumers, because it 
knows it will get enough slots.


If the profiles are different, then the Scheduler _may_ wait for 
numberOf(producer) slots; it _may_ also stick with the parallelism and 
schedule right away, in the worst case running the consumers in sequence.
In fact, for batch jobs there is probably(?) never a reason for the 
scheduler to _reduce_ the parallelism; it can always try to run things 
in sequence if it doesn't get enough slots.
Reducing the parallelism would just mean that you'd have to wait for 
more producers to finish.


The scope of this FLIP is just the protocol, without changes to the 
scheduler; in other words just changing how slots are acquired, but 
change nothing about the scheduling. That is tackled in a follow-up FLIP.


On 28/08/2020 07:34, Zhu Zhu wrote:

Thanks for the response!

>> The scheduler doesn't have to wait for one stage to finish
Does it mean we will declare resources and decide the parallelism for 
a stage which is partially
schedulable, i.e. when input data are ready just for part of the 
execution vertices?


>> This will get more complicated once we allow the scheduler to 
change the parallelism while the job is running
Agreed. Looks to me it's a problem for batch jobs only and can be 
avoided for streaming jobs.
Will this FLIP limit its scope to streaming jobs, and improvements for 
batch jobs are to be done later?


Thanks,
Zhu

Chesnay Schepler  wrote on Fri, Aug 28, 2020 at 2:27 AM:


The scheduler doesn't have to wait for one stage to finish. It is
still aware that the upstream execution vertex has finished, and
can request/use slots accordingly to schedule the consumer.

This will get more complicated once we allow the scheduler to
change the parallelism while the job is running, for which we will
need some enhancements to the network stack to allow the producer
to run without knowing the consumer parallelism ahead of time. I'm
not too clear on the details, but we'll some form of keygroup-like
approach for sub partitions (maxParallelism and all that).


On 27/08/2020 20:05, Zhu Zhu wrote:

Thanks Chesnay&Till for proposing this improvement.
It's of good value to allow jobs to make best use of available
resources adaptively. Not
to mention it further supports reactive mode.
So big +1 for it.

I have a minor concern about possible regression in certain cases
due to the proposed
JobVertex-wise scheduling which replaces current
ExecutionVertex-wise scheduling.
In the proposal, looks to me it requires a stage to finish before
its consumer stage can be
scheduled. This limitation, however, does not exist in current
scheduler. In the case that there
exists a POINTWISE BLOCKING edge, the downstream execution region
can be scheduled
right after its connected upstream execution vertices finishes,
even before the whole upstream
stage finishes. This allows the region to be launched earlier and
make use of available resources.
Do we need to let the new scheduler retain this property?

Thanks,
Zhu

Xintong Song  wrote on Wed, Aug 26, 2020 at 6:59 PM:

Thanks for the quick response.

*Job prioritization, Allocation IDs, Minimum resource
requirements, SlotManager Implementation Plan:* Sounds good
to me.

*FLIP-56*
Good point about the trade-off. I believe maximum resource
utilization and
quick deployment are desired in different scenarios. E.g., a
long running
streaming job deserves some deployment latency to improve the
resource
utilization, which benefits the entire lifecycle of the job.
On the other
hand, short batch queries may prefer quick deployment,
otherwise the time
for resource allocation might significantly increase the
response time.
It would be good enough for me to bring these questions to
attention.
Nothing that I'm aware of should block this FLIP.

Thank you~

Xintong Song



On Wed, Aug 26, 2020 at 5:14 PM Chesnay Schepler
mailto:ches...@apache.org>> wrote:

> Thank you Xintong for your questions!
> Job prioritization
> Yes, the job which declares its initial requirements first
is prioritized.
> This is very much for simplicity; for example this avoids
the nasty case
> where all jobs get some resources, but none get enough to
actually run the
> job.
> Minimum resource requirements
>
> My bad; at some point we want to allow the JobMaster to
declare a range of
> resources it could 

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

2020-08-28 Thread Xintong Song
>
> A quick question: is network memory treated as managed memory now? Or will it
> be in the future?
>
No, network memory is independent of managed memory ATM, and I'm not
aware of any plan to combine the two.

Any insights there?

Thank you~

Xintong Song



On Fri, Aug 28, 2020 at 4:35 PM Kurt Young  wrote:

> A quick question: is network memory treated as managed memory now? Or will it
> be in the future?
>
> Best,
> Kurt
>
>
> On Wed, Aug 26, 2020 at 5:32 PM Xintong Song 
> wrote:
>
> > Hi devs,
> >
> > I'd like to bring the discussion over FLIP-141[1], which proposes how
> > managed memory should be shared by various use cases within a slot. This
> is
> > an extension to FLIP-53[2], where we assumed that RocksDB state backend
> and
> > batch operators are the only use cases of managed memory for streaming
> and
> > batch jobs respectively, which is no longer true with the introduction of
> > Python UDFs.
> >
> > Please notice that we have not reached consensus between two different
> > designs. The major part of this FLIP describes one of the candidates,
> while
> > the alternative is discussed in the section "Rejected Alternatives". We
> are
> > hoping to borrow intelligence from the community to help us resolve the
> > disagreement.
> >
> > Any feedback would be appreciated.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
> >
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >
>


Re: [DISCUSS] FLIP-138: Declarative Resource management

2020-08-28 Thread Zhu Zhu
Thanks for the explanation @Chesnay Schepler  .

Yes, for batch jobs it can be safe to schedule downstream vertices if there
are enough slots in the pool, even if these slots are still in use at that
moment.
And the job can still progress even if the vertices stick to the original
parallelism.

Looks to me several decision makings can be different for streaming and
batch jobs.
Looking forward to the follow-up FLIP on the lazy ExecutionGraph
construction!

Thanks,
Zhu

Chesnay Schepler  wrote on Fri, Aug 28, 2020 at 4:35 PM:

> Maybe :)
>
> Imagine a case where the producer and consumer have the same
> ResourceProfile, or at least one where the consumer requirements are less
> than the producer ones.
> In this case, the scheduler can happily schedule consumers, because it
> knows it will get enough slots.
>
> If the profiles are different, then the Scheduler _may_ wait for
> numberOf(producer) slots; it _may_ also stick with the parallelism and
> schedule right away, in the worst case running the consumers in sequence.
> In fact, for batch jobs there is probably(?) never a reason for the
> scheduler to _reduce_ the parallelism; it can always try to run things in
> sequence if it doesn't get enough slots.
> Reducing the parallelism would just mean that you'd have to wait for more
> producers to finish.
>
> The scope of this FLIP is just the protocol, without changes to the
> scheduler; in other words just changing how slots are acquired, but change
> nothing about the scheduling. That is tackled in a follow-up FLIP.
>
> On 28/08/2020 07:34, Zhu Zhu wrote:
>
> Thanks for the response!
>
> >> The scheduler doesn't have to wait for one stage to finish
> Does it mean we will declare resources and decide the parallelism for a
> stage which is partially
> schedulable, i.e. when input data are ready just for part of the execution
> vertices?
>
> >> This will get more complicated once we allow the scheduler to change
> the parallelism while the job is running
> Agreed. Looks to me it's a problem for batch jobs only and can be avoided
> for streaming jobs.
> Will this FLIP limit its scope to streaming jobs, and improvements for
> batch jobs are to be done later?
>
> Thanks,
> Zhu
>
> Chesnay Schepler  wrote on Fri, Aug 28, 2020 at 2:27 AM:
>
>> The scheduler doesn't have to wait for one stage to finish. It is still
>> aware that the upstream execution vertex has finished, and can request/use
>> slots accordingly to schedule the consumer.
>>
>> This will get more complicated once we allow the scheduler to change the
>> parallelism while the job is running, for which we will need some
>> enhancements to the network stack to allow the producer to run without
>> knowing the consumer parallelism ahead of time. I'm not too clear on the
>> details, but we'll need some form of keygroup-like approach for sub partitions
>> (maxParallelism and all that).
>>
>> On 27/08/2020 20:05, Zhu Zhu wrote:
>>
>> Thanks Chesnay&Till for proposing this improvement.
>> It's of good value to allow jobs to make best use of available resources
>> adaptively. Not
>> to mention it further supports reactive mode.
>> So big +1 for it.
>>
>> I have a minor concern about possible regression in certain cases due to
>> the proposed
>> JobVertex-wise scheduling which replaces current ExecutionVertex-wise
>> scheduling.
>> In the proposal, looks to me it requires a stage to finish before its
>> consumer stage can be
>> scheduled. This limitation, however, does not exist in current scheduler.
>> In the case that there
>> exists a POINTWISE BLOCKING edge, the downstream execution region can be
>> scheduled
>> right after its connected upstream execution vertices finishes, even
>> before the whole upstream
>> stage finishes. This allows the region to be launched earlier and make
>> use of available resources.
>> Do we need to let the new scheduler retain this property?
>>
>> Thanks,
>> Zhu
>>
>> Xintong Song  wrote on Wed, Aug 26, 2020 at 6:59 PM:
>>
>>> Thanks for the quick response.
>>>
>>> *Job prioritization, Allocation IDs, Minimum resource
>>> requirements, SlotManager Implementation Plan:* Sounds good to me.
>>>
>>> *FLIP-56*
>>> Good point about the trade-off. I believe maximum resource utilization
>>> and
>>> quick deployment are desired in different scenarios. E.g., a long running
>>> streaming job deserves some deployment latency to improve the resource
>>> utilization, which benefits the entire lifecycle of the job. On the other
>>> hand, short batch queries may prefer quick deployment, otherwise the time
>>> for resource allocation might significantly increase the response time.
>>> It would be good enough for me to bring these questions to attention.
>>> Nothing that I'm aware of should block this FLIP.
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Wed, Aug 26, 2020 at 5:14 PM Chesnay Schepler 
>>> wrote:
>>>
>>> > Thank you Xintong for your questions!
>>> > Job prioritization
>>> > Yes, the job which declares its initial requirements first is
>>> prioritized.
>>> > This is very muc

Re: [ANNOUNCE] Introducing the GSoD 2020 Participants.

2020-08-28 Thread David Anderson
I'm really pleased we were able to attract such well-qualified
participants, and look forward to working with and learning from you both.

David

On Thu, Aug 27, 2020 at 8:45 PM Arvid Heise  wrote:

> Welcome. I'm grateful for having you two on board. I'll guess that the
> documentation will be top-notched in 3 months judging from your background.
>
> I hope we can learn from the improvements and also apply them to the other
> parts of the documentation (or avoid the fixed issues in the future).
>
> On Thu, Aug 27, 2020 at 6:05 AM Jark Wu  wrote:
>
> > Welcome Kartik and Muhammad! Thanks in advance for helping improve Flink
> > documentation.
> >
> > Best,
> > Jark
> >
> > On Thu, 27 Aug 2020 at 03:59, Till Rohrmann 
> wrote:
> >
> > > Welcome Muhammad and Kartik! Thanks a lot for helping us with improving
> > > Flink's documentation.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Aug 26, 2020 at 7:32 PM Konstantin Knauf 
> > > wrote:
> > >
> > > > Welcome Kartik & Muhammad! Looking very much forward to your
> > > contributions
> > > > :)
> > > >
> > > > On Wed, Aug 26, 2020 at 5:52 PM Kartik Khare 
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > > It's a great opportunity to get to work with you guys. I have
> always
> > > > > admired Flink's performance and simplicity and have been looking
> > > forward
> > > > to
> > > > > contribute more.
> > > > >
> > > > > Looking forward to exciting next 3 months.
> > > > >
> > > > > Regards,
> > > > > Kartik
> > > > >
> > > > > On Wed, 26 Aug 2020, 14:42 Marta Paes Moreira, <
> ma...@ververica.com>
> > > > > wrote:
> > > > >
> > > > > > Hi, Everyone!
> > > > > >
> > > > > > I'd like to officially welcome the applicants that were selected
> to
> > > > work
> > > > > > with the Flink community for this year's Google Season of Docs
> > (GSoD)
> > > > > [1]: *Kartik
> > > > > > Khare* and *Muhammad Haseeb Asif*!
> > > > > >
> > > > > >- Kartik [2] is a software engineer at Walmart Labs and a
> > regular
> > > > > >contributor to multiple Apache projects. He is also a prolific
> > > > writer
> > > > > on
> > > > > >Medium and has previously published on the Flink blog. Last
> > year,
> > > he
> > > > > >contributed to Apache Airflow as part of GSoD and he's
> currently
> > > > > revamping
> > > > > >the Apache Pinot documentation.
> > > > > >
> > > > > >
> > > > > >- Muhammad [3] is a dual degree master student at KTH and TU
> > > Berlin,
> > > > > >focusing on distributed systems and data intensive processing
> > (in
> > > > > >particular, performance optimization of state backends). He
> > writes
> > > > > >frequently about Flink on Medium and you can catch him and his
> > > > > colleague
> > > > > >Sruthi this Friday at Beam Summit [4]!
> > > > > >
> > > > > > They will be working to improve the Table API/SQL documentation
> > over
> > > a
> > > > > > 3-month period, with the support of Aljoscha and Seth as mentors.
> > > > > >
> > > > > > Please give them a warm welcome to the Flink developer community!
> > > > > >
> > > > > > Marta
> > > > > >
> > > > > > [1]
> https://developers.google.com/season-of-docs/docs/participants
> > > > > > [2] https://github.com/KKcorps
> > > > > > [3] https://www.linkedin.com/in/haseebasif/
> > > > > > [4] https://2020.beamsummit.org/sessions/nexmark-beam-flinkndb/
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Konstantin Knauf
> > > >
> > > > https://twitter.com/snntrable
> > > >
> > > > https://github.com/knaufk
> > > >
> > >
> >
>
>
> --
>
> Arvid Heise | Senior Java Developer
>
> 
>
> Follow us @VervericaData
>
> --
>
> Join Flink Forward  - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Toni) Cheng
>


Re: [DISCUSS] Remove Kafka 0.10.x connector (and possibly 0.11.x)

2020-08-28 Thread Aljoscha Krettek
Yes, that should be the process. But I'd try it in a testing environment 
before doing it in the production environment.


Aljoscha

On 27.08.20 11:42, Paul Lam wrote:

Hi,

I think it’s okay, given that we can either migrate to the universal connector 
or still use the compatible 0.10/0.11 connector of the 1.11 release, as Chesnay 
mentioned, when upgrading to 1.12.

IIUC, the migration process to the universal connector would be (please correct 
me if I’m wrong):
1. Stop the job with a savepoint, committing the offset to Kafka brokers.
2. Modify the user code, migrate to the universal connector, and change the source 
operator id to discard the old connector state.
3. Start the job with the savepoint, and read Kafka from group offsets.
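
In code, step 2 might look like the following sketch (topic, group, and uid
are illustrative; FlinkKafkaConsumer is the universal connector's consumer
and env is the StreamExecutionEnvironment):

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "broker:9092");
    props.setProperty("group.id", "my-group");

    DataStream<String> stream = env
        .addSource(new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), props)
            .setStartFromGroupOffsets())  // step 3: resume from committed group offsets
        .uid("kafka-source-universal");   // step 2: new uid discards the old source state

    // When resubmitting from the savepoint, --allowNonRestoredState lets the
    // job ignore the dropped 0.10/0.11 source state.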

Best,
Paul Lam


On Aug 27, 2020, at 16:27, Aljoscha Krettek  wrote:

@Konstantin: Yes, I'm talking about dropping those modules. We don't have any special 
code for supporting Kafka 0.10/0.11 in the "modern" connector, that comes from 
the Kafka Consumer/Producer code we're using.

@Paul: The modern Kafka connector works with Kafka brokers as far back as 0.10, 
would that be enough or do you still think we should have the actual Kafka 0.10 
Consumer code in Flink as well?

Best,
Aljoscha

On 25.08.20 23:15, Chesnay Schepler wrote:

+1 to remove both the 1.10 and 1.11 connectors.
The connectors have not been actively developed for some time. They are 
basically just sitting around causing noise by causing test instabilities and 
eating CI time.
It would also allow us to really simplify the module structure of the Kafka 
connectors.
Users may continue to use the 1.11 version of the connectors with future Flink 
versions, and we may even provide critical bug fixes in a 1.11 bugfix release 
(albeit unlikely).
While ultimately this is a separate topic I would also be in favor of removing 
any migration paths we have from 0.11 to the universal connector;
as these are already present in 1.11 users may migrate to the universal 
connector before jumping to Flink 1.12+.
On 25/08/2020 18:49, Konstantin Knauf wrote:

Hi Aljoscha,

I am assuming you're asking about dropping the flink-connector-kafka-0.10/0.11 
modules, right? Or are you talking about removing support for Kafka 0.10/0.11 
from the universal connector?

I am in favor of removing flink-connector-kafka-0.10/0.11 in the next release. 
These modules would still be available in Flink 1.11- as a reference, and could 
be used with Flink 1.12+ with small or no modifications. To my knowledge, you 
can also use the universal Kafka connector with 0.10 brokers, but there might be a 
performance penalty if I remember correctly. In general, I find it important to 
continuously reduce baggage that accumulates over time and this seems like a 
good opportunity.

Best,

Konstantin

On Tue, Aug 25, 2020 at 4:59 AM Paul Lam  wrote:

 Hi Aljoscha,

 I'm lightly leaning towards keeping the 0.10 connector, for Kafka
 0.10 still has a steady user base in my observation.

 But if we drop 0.10 connector, can we ensure the users would be
 able to smoothly migrate to 0.11 connector/universal connector?

 If I remember correctly, the universal connector is compatible
 with 0.10 brokers, but I want to double check that.

 Best,
 Paul Lam


 On Aug 24, 2020, at 22:46, Aljoscha Krettek  wrote:

 Hi all,

 this thought came up on FLINK-17260 [1] but I think it would be a
 good idea in general. The issue reminded us that Kafka didn't
 have an idempotent/fault-tolerant Producer before Kafka 0.11.0.
 By now we have had the "modern" Kafka connector that roughly
 follows new Kafka releases for a while and this one supports
 Kafka cluster versions as far back as 0.10.2.0 (I believe).

 What are your thoughts on removing support for older Kafka
 versions? And yes, I know that we had multiple discussions like
 this in the past but I'm trying to gauge the current sentiment.

 I'm cross-posting to the user-ml since this is important for both
 users and developers.

 Best,
 Aljoscha

 [1] https://issues.apache.org/jira/browse/FLINK-17260




--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk







Re: [VOTE] Remove deprecated DataStream#fold and DataStream#split in 1.12

2020-08-28 Thread David Anderson
+1

David

On Fri, Aug 28, 2020 at 9:41 AM Dawid Wysakowicz 
wrote:

> Hi all,
>
> I would like to start a vote for removing deprecated, but Public(Evolving)
> methods in the upcoming 1.12 release:
>
>- XxxDataStream#fold and all related classes (such as
>FoldingDescriptor, FoldFunction, ...)
>- DataStream#split
>
> This was discussed in
> https://lists.apache.org/thread.html/rf37cd0e00e9adb917b7b75275af2370ec2f3970d17a4abd0db7ead31%40%3Cdev.flink.apache.org%3E
>
> The vote will be open until 2nd September (72h), unless there is an
> objection or not enough votes.
>
> Best,
>
> Dawid
>


Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-28 Thread Dian Fu
Thank you all! It's my honor and pleasure to work with you in such a great 
community.

Regards,
Dian

> On Aug 28, 2020, at 2:40 PM, Till Rohrmann  wrote:
> 
> Congrats, Dian!
> 
> Cheers,
> Till
> 
> On Fri, Aug 28, 2020 at 8:33 AM Wei Zhong  wrote:
> 
>> Congratulations Dian!
>> 
>>> On Aug 28, 2020, at 14:29, Jingsong Li  wrote:
>>> 
>>> Congratulations , Dian!
>>> 
>>> Best, Jingsong
>>> 
>>> On Fri, Aug 28, 2020 at 11:06 AM Walter Peng  wrote:
>>> congrats!
>>> 
>>> Yun Tang wrote:
 Congratulations , Dian!
>>> 
>>> 
>>> --
>>> Best, Jingsong Lee
>> 
>> 



[jira] [Created] (FLINK-19087) ResultPartitionWriter should not expose subpartitions but only subpartition readers

2020-08-28 Thread Stephan Ewen (Jira)
Stephan Ewen created FLINK-19087:


 Summary: ResultPartitionWriter should not expose subpartitions but 
only subpartition readers
 Key: FLINK-19087
 URL: https://issues.apache.org/jira/browse/FLINK-19087
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.12.0


The {{ResultPartitionWriter}} currently gives arbitrary access to the 
sub-partitions.

These subpartitions may not always exist directly, for example in a sort-based 
shuffle.
Only access to a reader over a sub-partition's data (the 
ResultSubpartitionView) is necessary.

In the spirit of minimal scope of knowledge, the methods should be scoped to 
return readers, not the more general subpartitions.
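
In code, the scoped API could look like this sketch (the listener parameter follows the existing view-creation pattern; the exact signature here is illustrative):

{code:java}
// The writer hands out a reader view, never the subpartition itself.
public interface ResultPartitionWriter {
    // instead of exposing ResultSubpartition directly:
    ResultSubpartitionView createSubpartitionView(
            int subpartitionIndex,
            BufferAvailabilityListener availabilityListener) throws IOException;
}
{code}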



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re:Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-28 Thread Haibo Sun
Congratulations Dian !



Best,
Haibo

At 2020-08-27 18:03:38, "Zhijiang"  wrote:
>Congrats, Dian!
>
>
>--
>From:Yun Gao 
>Send Time: Thursday, Aug 27, 2020, 17:44
>To:dev ; Dian Fu ; user 
>; user-zh 
>Subject:Re: Re: [ANNOUNCE] New PMC member: Dian Fu
>
>Congratulations Dian !
>
> Best
> Yun
>
>
>--
>Sender:Marta Paes Moreira
>Date:2020/08/27 17:42:34
>Recipient:Yuan Mei
>Cc:Xingbo Huang; jincheng sun; 
>dev; Dian Fu; 
>user; user-zh
>Theme:Re: [ANNOUNCE] New PMC member: Dian Fu
>
>Congrats, Dian!
>On Thu, Aug 27, 2020 at 11:39 AM Yuan Mei  wrote:
>
>Congrats!
>On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang  wrote:
>
>Congratulations Dian!
>
>Best,
>Xingbo
>jincheng sun  wrote on Thu, Aug 27, 2020 at 5:24 PM:
>
>Hi all,
>
>On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now part of 
>the Apache Flink Project Management Committee (PMC).
>
>Dian Fu has been very active on the PyFlink component, working on various 
>important features such as the Python UDF and Pandas integration. He keeps 
>checking and voting for our releases, has successfully produced two 
>releases (1.9.3 & 1.11.1) as RM, and is currently working as RM to push 
>forward the release of Flink 1.12.
>
>Please join me in congratulating Dian Fu for becoming a Flink PMC Member!
>
>Best,
>Jincheng(on behalf of the Flink PMC)
>


Re: [DISCUSS] FLIP-138: Declarative Resource management

2020-08-28 Thread Till Rohrmann
Thanks for creating this FLIP @Chesnay and the good input @Xintong and @Zhu
Zhu.

Let me try to add some comments concerning your questions:

# FLIP-56

I think there is nothing fundamentally contradicting FLIP-56 in the FLIP
for declarative resource management. As Chesnay said, we have to keep the
AllocationID around as long as we have the old scheduler implementation.
Once it is replaced, we can think about using the SlotID instead of
AllocationIDs for identifying allocated slots. For dynamic slots we can
keep the special meaning of a SlotID with a negative index. In the future
we might think about making this encoding a bit more explicit by sending a
richer slot request object and reporting the actual SlotID back to the RM.

For the question of resource utilization vs. deployment latency I believe
that this will be a question of requirements and preferences as you've said
Xintong. I can see that we will have different strategies to fulfill the
different needs.

# Offer/free slots between JM/TM

You are right Xintong that the existing slot protocol was developed with
the assumption in mind that the RM and JM can run in separate processes and
that a failure of the RM should only affect the JM in the sense that it
cannot ask for more resources. I believe that one could simplify things a
bit under the assumption that the RM and JM are always colocated in the
same process. However, the discussion whether to change it or not should
indeed be a separate one.

Changing the slot protocol to a declarative resource management should
already solve the first problem you have described because we won't ask for
new slots in case of a failover but simply keep the same resource
requirements declared and let the RM make sure that we will receive at
least this amount of slots.

If releasing a slot should lead to allocating new resources because
decreasing the resource requirement declaration takes longer than releasing
the slot on the TM, then we could apply what Chesnay said. By waiting on
the confirmation of the resource requirement decrease and then freeing the
slot on the TM gives you effectively the same behaviour as if the freeing
of the slot would be done by the RM.

I am not entirely sure whether allocating the slots and receiving the slot
offers through the RM will allow us to get rid of the pending slot state on
the RM side. If the RM needs to communicate with the TM and we want to have
a reconciliation protocol between these components, then I think we would
have to solve the exact same problem of currently waiting on the TM for
confirming that a slot has been allocated.

# Implications for the scheduling

The FLIP does not fully cover the changes for the scheduler and mainly
drafts the rough idea. For the batch scheduling, I believe that we have a
couple degrees of freedom in how to do things. In the scenario you
described, one could choose a simple strategy where we wait for all
producers to stop before deciding on the parallelism of the consumer and
scheduling the respective tasks (even though they have POINTWISE BLOCKING
edges). Or we can try to be smart and say that if we get at least one slot,
we can run the consumers with the same parallelism as the producers; it just
might be that we have to run them one after another in a single slot. One
advantage of not directly schedule the first consumer when the first
producer is finished is that one might schedule the consumer stage with a
higher parallelism because one might acquire more resources a bit later.
But I would see this as different execution strategies which have different
properties.

Cheers,
Till

On Fri, Aug 28, 2020 at 11:21 AM Zhu Zhu  wrote:

> Thanks for the explanation @Chesnay Schepler  .
>
> Yes, for batch jobs it can be safe to schedule downstream vertices if
> there
> are enough slots in the pool, even if these slots are still in use at that
> moment.
> And the job can still progress even if the vertices stick to the original
> parallelism.
>
> Looks to me several decision makings can be different for streaming and
> batch jobs.
> Looking forward to the follow-up FLIP on the lazy ExecutionGraph
> construction!
>
> Thanks,
> Zhu
>
> Chesnay Schepler  于2020年8月28日周五 下午4:35写道:
>
>> Maybe :)
>>
>> Imagine a case where the producer and consumer have the same
>> ResourceProfile, or at least one where the consumer requirements are less
>> than the producer ones.
>> In this case, the scheduler can happily schedule consumers, because it
>> knows it will get enough slots.
>>
>> If the profiles are different, then the Scheduler _may_ wait for
>> numberOf(producer) slots; it _may_ also stick with the parallelism and
>> schedule right away, in the worst case running the consumers in sequence.
>> In fact, for batch jobs there is probably(?) never a reason for the
>> scheduler to _reduce_ the parallelism; it can always try to run things in
>> sequence if it doesn't get enough slots.
>> Reducing the parallelism would just mean that you'd have to wait for more
>

[jira] [Created] (FLINK-19088) Flink SQL 1.11: HBaseTableSource should support FilterPushDown

2020-08-28 Thread sijun.huang (Jira)
sijun.huang created FLINK-19088:
---

 Summary: Flink SQL 1.11: HBaseTableSource should support FilterPushDown
 Key: FLINK-19088
 URL: https://issues.apache.org/jira/browse/FLINK-19088
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.11.1
 Environment: flink sql 1.11
Reporter: sijun.huang


Hi,

In Flink SQL 1.11, if we create an HBase table via the HBase connector through the 
Hive catalog, Flink will do a full table scan on the HBase table when we query it, 
even if we specify a row key filter.

For details, see the post below:

[http://apache-flink.147419.n8.nabble.com/flink-sql-1-11-hbase-td6652.html]

So I strongly recommend that HBaseTableSource in Flink SQL 1.11 support 
FilterPushDown, to avoid full table scans on HBase tables.
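
With push-down, a row key predicate could be translated into a bounded HBase scan instead of a full one; a sketch using the HBase client API (the key is illustrative):

{code:java}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// WHERE rowkey = 'user-42' becomes a single-key range scan.
Scan scan = new Scan();
scan.withStartRow(Bytes.toBytes("user-42"));
scan.withStopRow(Bytes.toBytes("user-42"), true); // inclusive stop row
{code}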

Cheers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-28 Thread Thomas Weise
Congratulations!

On Fri, Aug 28, 2020 at 4:51 AM Haibo Sun  wrote:

> Congratulations Dian !
>
>
>
> Best,
> Haibo
>
> At 2020-08-27 18:03:38, "Zhijiang" 
> wrote:
> >Congrats, Dian!
> >
> >
> >--
> >From:Yun Gao 
> >Send Time: Thursday, Aug 27, 2020, 17:44
> >To:dev ; Dian Fu ; user <
> u...@flink.apache.org>; user-zh 
> >Subject:Re: Re: [ANNOUNCE] New PMC member: Dian Fu
> >
> >Congratulations Dian !
> >
> > Best
> > Yun
> >
> >
> >--
> >Sender:Marta Paes Moreira
> >Date:2020/08/27 17:42:34
> >Recipient:Yuan Mei
> >Cc:Xingbo Huang; jincheng sun<
> sunjincheng...@gmail.com>; dev; Dian Fu<
> dian0511...@gmail.com>; user; user-zh<
> user...@flink.apache.org>
> >Theme:Re: [ANNOUNCE] New PMC member: Dian Fu
> >
> >Congrats, Dian!
> >On Thu, Aug 27, 2020 at 11:39 AM Yuan Mei  wrote:
> >
> >Congrats!
> >On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang  wrote:
> >
> >Congratulations Dian!
> >
> >Best,
> >Xingbo
> >jincheng sun  wrote on Thu, Aug 27, 2020 at 5:24 PM:
> >
> >Hi all,
> >
> >On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now
> part of the Apache Flink Project Management Committee (PMC).
> >
> >Dian Fu has been very active on PyFlink component, working on various
> important features, such as the Python UDF and Pandas integration, and
> keeps checking and voting for our releases, and also has successfully
> produced two releases(1.9.3&1.11.1) as RM, currently working as RM to push
> forward the release of Flink 1.12.
> >
> >Please join me in congratulating Dian Fu for becoming a Flink PMC Member!
> >
> >Best,
> >Jincheng(on behalf of the Flink PMC)
> >
>


Re: Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-28 Thread Yu Li
Congratulations, Dian!

Best Regards,
Yu


On Sat, 29 Aug 2020 at 02:10, Thomas Weise  wrote:

> Congratulations!
>
> On Fri, Aug 28, 2020 at 4:51 AM Haibo Sun  wrote:
>
> > Congratulations Dian !
> >
> >
> >
> > Best,
> > Haibo
> >
> > At 2020-08-27 18:03:38, "Zhijiang" 
> > wrote:
> > >Congrats, Dian!
> > >
> > >
> > >--
> > >From:Yun Gao 
> > >Send Time: Thursday, Aug 27, 2020, 17:44
> > >To:dev ; Dian Fu ; user <
> > u...@flink.apache.org>; user-zh 
> > >Subject:Re: Re: [ANNOUNCE] New PMC member: Dian Fu
> > >
> > >Congratulations Dian !
> > >
> > > Best
> > > Yun
> > >
> > >
> > >--
> > >Sender:Marta Paes Moreira
> > >Date:2020/08/27 17:42:34
> > >Recipient:Yuan Mei
> > >Cc:Xingbo Huang; jincheng sun<
> > sunjincheng...@gmail.com>; dev; Dian Fu<
> > dian0511...@gmail.com>; user; user-zh<
> > user...@flink.apache.org>
> > >Theme:Re: [ANNOUNCE] New PMC member: Dian Fu
> > >
> > >Congrats, Dian!
> > >On Thu, Aug 27, 2020 at 11:39 AM Yuan Mei 
> wrote:
> > >
> > >Congrats!
> > >On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang 
> wrote:
> > >
> > >Congratulations Dian!
> > >
> > >Best,
> > >Xingbo
> > >jincheng sun  wrote on Thu, Aug 27, 2020 at 5:24 PM:
> > >
> > >Hi all,
> > >
> > >On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now
> > part of the Apache Flink Project Management Committee (PMC).
> > >
> > >Dian Fu has been very active on PyFlink component, working on various
> > important features, such as the Python UDF and Pandas integration, and
> > keeps checking and voting for our releases, and also has successfully
> > produced two releases(1.9.3&1.11.1) as RM, currently working as RM to
> push
> > forward the release of Flink 1.12.
> > >
> > >Please join me in congratulating Dian Fu for becoming a Flink PMC
> Member!
> > >
> > >Best,
> > >Jincheng(on behalf of the Flink PMC)
> > >
> >
>


[jira] [Created] (FLINK-19089) Replace ReentrantLock with ReentrantReadWriteLock in ClosableBlockingQueue

2020-08-28 Thread dugenkui (Jira)
dugenkui created FLINK-19089:


 Summary: Replace ReentrantLock with ReentrantReadWriteLock in 
ClosableBlockingQueue
 Key: FLINK-19089
 URL: https://issues.apache.org/jira/browse/FLINK-19089
 Project: Flink
  Issue Type: Improvement
Reporter: dugenkui


1. Replace the ReentrantLock with a ReentrantReadWriteLock to improve concurrency;

2. Use signal instead of signalAll to reduce thread scheduling overhead.
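
A sketch of the intended change (a class fragment; the field names are illustrative, and the real {{ClosableBlockingQueue}} carries more state):

{code:java}
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Readers take the read lock and proceed concurrently; mutations take the
// write lock and wake a single waiter via signal() instead of signalAll().
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
private final Condition nonEmpty = lock.writeLock().newCondition();
private final ArrayDeque<E> elements = new ArrayDeque<>();

public int size() {
    lock.readLock().lock();
    try {
        return elements.size();
    } finally {
        lock.readLock().unlock();
    }
}

public void add(E element) {
    lock.writeLock().lock();
    try {
        elements.addLast(element);
        nonEmpty.signal(); // one new element can satisfy at most one waiter
    } finally {
        lock.writeLock().unlock();
    }
}
{code}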



--
This message was sent by Atlassian Jira
(v8.3.4#803005)