[jira] [Created] (FLINK-15859) Unify identifiers in the interface methods of CatalogManager

2020-02-03 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-15859:


 Summary: Unify identifiers in the interface methods of 
CatalogManager
 Key: FLINK-15859
 URL: https://issues.apache.org/jira/browse/FLINK-15859
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
 Fix For: 1.11.0


We are not very consistent with the type of identifier that the 
FunctionCatalog/CatalogManager accepts.

Some methods accept {{UnresolvedIdentifier}} e.g. 
{{FunctionCatalog#registerTemporaryCatalogFunction}}, 
{{CatalogManager#dropTemporaryView}}. 

Some accept the resolved {{ObjectIdentifier}}, e.g. 
{{CatalogManager#createTemporaryTable}}, {{CatalogManager#createTable}}.

I am not sure which one we should prefer. If we go with 
{{UnresolvedIdentifier}}, the benefit is that we always qualify it in a 
{{Catalog*}}. The downside is that we would use {{UnresolvedIdentifier}} in 
{{*Operations}} (e.g. {{CreateTableOperation}} etc.), whereas we said that all 
Operations should be fully resolved...
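For context, the qualification step that makes {{UnresolvedIdentifier}} attractive looks roughly like the following. This is a simplified, self-contained sketch with stand-in classes; the real Flink identifier classes and {{CatalogManager#qualifyIdentifier}} carry more fields and validation:

```java
import java.util.Optional;

// Simplified stand-ins for Flink's identifier classes (sketch only).
class IdentifierQualification {

    /** 1-, 2- or 3-part identifier as parsed from SQL; catalog/database may be absent. */
    static final class UnresolvedIdentifier {
        final Optional<String> catalog;
        final Optional<String> database;
        final String object;

        UnresolvedIdentifier(String catalog, String database, String object) {
            this.catalog = Optional.ofNullable(catalog);
            this.database = Optional.ofNullable(database);
            this.object = object;
        }
    }

    /** Fully qualified 3-part identifier. */
    static final class ObjectIdentifier {
        final String catalog;
        final String database;
        final String object;

        ObjectIdentifier(String catalog, String database, String object) {
            this.catalog = catalog;
            this.database = database;
            this.object = object;
        }
    }

    /**
     * What qualification conceptually does: fill in the missing parts
     * from the session's current catalog and database.
     */
    static ObjectIdentifier qualify(
            UnresolvedIdentifier id, String currentCatalog, String currentDatabase) {
        return new ObjectIdentifier(
                id.catalog.orElse(currentCatalog),
                id.database.orElse(currentDatabase),
                id.object);
    }
}
```

Accepting {{UnresolvedIdentifier}} means this qualification always happens inside the catalog layer; accepting {{ObjectIdentifier}} pushes it out to every caller.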



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15860) Store temporary functions as CatalogFunctions in FunctionCatalog

2020-02-03 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-15860:


 Summary: Store temporary functions as CatalogFunctions in 
FunctionCatalog
 Key: FLINK-15860
 URL: https://issues.apache.org/jira/browse/FLINK-15860
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Dawid Wysakowicz
 Fix For: 1.11.0


We should change the {{FunctionCatalog}} so that it stores temporary functions 
as {{CatalogFunction}}s instead of instances of {{FunctionDefinition}}, the same 
way we store {{CatalogTable}}s for temporary tables.

For functions that were registered with their instance, we should create a 
{{CatalogFunction}} wrapper similar to {{ConnectorCatalogTable}}.
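Such a wrapper could look roughly like this. This is a hypothetical sketch with simplified stand-in types (the actual class added to Flink, and the real {{CatalogFunction}} interface, may differ):

```java
import java.util.Collections;
import java.util.Map;

class FunctionWrapperSketch {

    /** Stand-in for Flink's FunctionDefinition (sketch only). */
    interface FunctionDefinition {}

    /** Simplified stand-in for the CatalogFunction interface (sketch only). */
    interface CatalogFunction {
        String getClassName();
        Map<String, String> getProperties();
    }

    /**
     * Wraps an already-instantiated FunctionDefinition as a CatalogFunction,
     * analogous to how ConnectorCatalogTable wraps an instantiated
     * TableSource/TableSink.
     */
    static final class FunctionDefinitionCatalogFunction implements CatalogFunction {
        private final FunctionDefinition definition;

        FunctionDefinitionCatalogFunction(FunctionDefinition definition) {
            this.definition = definition;
        }

        FunctionDefinition getDefinition() {
            return definition;
        }

        @Override
        public String getClassName() {
            // The instance was registered directly, so report its runtime class.
            return definition.getClass().getName();
        }

        @Override
        public Map<String, String> getProperties() {
            // An instance-based function has no declarative properties to persist.
            return Collections.emptyMap();
        }
    }
}
```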





Re: [DISCUSS] Improve TableFactory

2020-02-03 Thread Jingsong Li
Hi Jark,

Thanks for getting involved. Yes, it is hard to justify adding isBounded on the
source.
I recommend adding it only to the sink at present, because a sink has an
upstream, and its upstream is either bounded or unbounded.

Hi all,

Let me summarize your suggestions.

public interface TableSourceFactory<T> extends TableFactory {

    // ...

    /**
     * Creates and configures a {@link TableSource} based on the given {@link Context}.
     *
     * @param context context of this table source.
     * @return the configured table source.
     */
    default TableSource<T> createTableSource(Context context) {
        ObjectIdentifier tableIdentifier = context.getTableIdentifier();
        return createTableSource(
            new ObjectPath(tableIdentifier.getDatabaseName(), tableIdentifier.getObjectName()),
            context.getTable());
    }

    /**
     * Context of table source creation. Contains table information and environment information.
     */
    interface Context {

        /**
         * @return full identifier of the given {@link CatalogTable}.
         */
        ObjectIdentifier getTableIdentifier();

        /**
         * @return table {@link CatalogTable} instance.
         */
        CatalogTable getTable();

        /**
         * @return readable config of this table environment.
         */
        ReadableConfig getTableConfig();
    }
}

public interface TableSinkFactory<T> extends TableFactory {

    // ...

    /**
     * Creates and configures a {@link TableSink} based on the given {@link Context}.
     *
     * @param context context of this table sink.
     * @return the configured table sink.
     */
    default TableSink<T> createTableSink(Context context) {
        ObjectIdentifier tableIdentifier = context.getTableIdentifier();
        return createTableSink(
            new ObjectPath(tableIdentifier.getDatabaseName(), tableIdentifier.getObjectName()),
            context.getTable());
    }

    /**
     * Context of table sink creation. Contains table information and environment information.
     */
    interface Context {

        /**
         * @return full identifier of the given {@link CatalogTable}.
         */
        ObjectIdentifier getTableIdentifier();

        /**
         * @return table {@link CatalogTable} instance.
         */
        CatalogTable getTable();

        /**
         * @return readable config of this table environment.
         */
        ReadableConfig getTableConfig();

        /**
         * @return whether the input of this table sink is bounded.
         */
        boolean isBounded();
    }
}

If there is no objection, I will start a vote thread. (if necessary, I can
also edit a FLIP).
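To illustrate how a connector could consume the proposed Context, here is a self-contained sketch. It uses simplified stand-in interfaces (the table identifier is reduced to a String, and the real `TableSink`/`Context` types have more methods), so it only shows the shape of the `isBounded()` dispatch:

```java
class FactoryContextSketch {

    /** Minimal stand-in for the proposed TableSinkFactory.Context (sketch only). */
    interface Context {
        String getTableIdentifier(); // simplified to a String here
        boolean isBounded();
    }

    /** Minimal stand-in for a TableSink (sketch only). */
    interface TableSink {
        String describe();
    }

    /** A factory choosing between a batch-style and a streaming-style sink. */
    static TableSink createTableSink(Context context) {
        String table = context.getTableIdentifier();
        if (context.isBounded()) {
            // Bounded upstream: a one-shot, batch-style sink is sufficient.
            return () -> "bounded sink for " + table;
        }
        // Unbounded upstream: the sink must handle continuous writes.
        return () -> "unbounded sink for " + table;
    }
}
```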

Best,
Jingsong Lee

On Thu, Jan 16, 2020 at 7:56 PM Jingsong Li  wrote:

> Thanks Bowen and Timo for involving.
>
> Hi Bowen,
>
> > 1. is it better to have explicit APIs like "createBatchTableSource(...)"
> I think it is better to keep one method, since in [1], we have reached one
> in DataStream layer to maintain a single API in "env.source". I think it is
> good to not split batch and stream, And our TableSource/TableSink are the
> same class for both batch and streaming too.
>
> > 2. I'm not sure of the benefits to have a CatalogTableContext class.
> As Timo said, We may have more parameters to add in the future, take a
> look to "AbstractRichFunction.RuntimeContext", It's added little by little.
>
> Hi Timo,
>
> Your suggestion about Context looks good to me.
> "TablePath" used in Hive for updating the catalog information of this
> table. Yes, "ObjectIdentifier" looks better than "ObjectPath".
>
> > Can we postpone the change of TableValidators?
> Yes, ConfigOption validation looks good to me. It seems that you have been
> thinking about this for a long time. It's very good. Looking forward to the
> promotion of FLIP-54.
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-27-Refactor-Source-Interface-td24952i80.html#a36692
>
> Best,
> Jingsong Lee
>
> On Thu, Jan 16, 2020 at 6:01 PM Timo Walther  wrote:
>
>> Hi Jingsong,
>>
>> +1 for adding a context in the source and sink factories. A context
>> class also allows for future modifications without touching the
>> TableFactory interface again.
>>
>> How about:
>>
>> interface TableSourceFactory {
>>  interface Context {
>> // ...
>>  }
>> }
>>
>> Because I find the name `CatalogTableContext` confusing and we can bound
>> the interface to the factory class itself as an inner interface.
>>
>> Readable access to configuration sounds also right to me. Can we remove
>> the `ObjectPath getTablePath()` method? I don't see a reason why a
>> factory should know the path. And if so, it should be an
>> `ObjectIdentifier` instead to also know about the catalog we are using.
>>
>> The `isStreamingMode()` should be renamed to `isBounded()` because we
>> would like to use terminology around boundedness rather than
>> streaming/batch.
>>
>> @Bowen: We are in the process of unifying the APIs and thus explicitly
>> avoid specialized methods in the future.
>>
>> Can we postpone the change of Tab

Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-03 Thread jincheng sun
Thank you for pushing forward, @Hequn Cheng!

Hi @Becket Qin, do you have any concerns on this?

Best,
Jincheng

Hequn Cheng  于2020年2月3日周一 下午2:09写道:

> Hi everyone,
>
> Thanks for the feedback. As there are no objections, I've opened a JIRA
> issue(FLINK-15847[1]) to address this issue.
> The implementation details can be discussed in the issue or in the
> following PR.
>
> Best,
> Hequn
>
> [1] https://issues.apache.org/jira/browse/FLINK-15847
>
> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng  wrote:
>
> > Hi Jincheng,
> >
> > Thanks a lot for your feedback!
> > Yes, I agree with you. There are cases that multi jars need to be
> > uploaded. I will prepare another discussion later. Maybe with a simple
> > design doc.
> >
> > Best, Hequn
> >
> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun 
> > wrote:
> >
> >> Thanks for bring up this discussion Hequn!
> >>
> >> +1 for include `flink-ml-api` and `flink-ml-lib` in opt.
> >>
> >> BTW: I think would be great if bring up a discussion for upload multiple
> >> Jars at the same time. as PyFlink JOB also can have the benefit if we do
> >> that improvement.
> >>
> >> Best,
> >> Jincheng
> >>
> >>
> >> Hequn Cheng  于2020年1月8日周三 上午11:50写道:
> >>
> >> > Hi everyone,
> >> >
> >> > FLIP-39[1] rebuilds Flink ML pipeline on top of TableAPI which moves
> >> Flink
> >> > ML a step further. Base on it, users can develop their ML jobs and
> more
> >> and
> >> > more machine learning platforms are providing ML services.
> >> >
> >> > However, the problem now is the jars of flink-ml-api and flink-ml-lib
> >> are
> >> > only exist on maven repo. Whenever users want to submit ML jobs, they
> >> can
> >> > only depend on the ml modules and package a fat jar. This would be
> >> > inconvenient especially for the machine learning platforms on which
> >> nearly
> >> > all jobs depend on Flink ML modules and have to package a fat jar.
> >> >
> >> > Given this, it would be better to include jars of flink-ml-api and
> >> > flink-ml-lib in the `opt` folder, so that users can directly use the
> >> jars
> >> > with the binary release. For example, users can move the jars into the
> >> > `lib` folder or use -j to upload the jars. (Currently, -j only support
> >> > upload one jar. Supporting multi jars for -j can be discussed in
> another
> >> > discussion.)
> >> >
> >> > Putting the jars in the `opt` folder instead of the `lib` folder is
> >> because
> >> > currently, the ml jars are still optional for the Flink project by
> >> default.
> >> >
> >> > What do you think? Welcome any feedback!
> >> >
> >> > Best,
> >> >
> >> > Hequn
> >> >
> >> > [1]
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
> >> >
> >>
> >
>


[jira] [Created] (FLINK-15861) Change TableEnvironmentImpl to StreamTableEnvironmentImpl in python blink batch mode

2020-02-03 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-15861:


 Summary: Change TableEnvironmentImpl to 
StreamTableEnvironmentImpl in python blink batch mode 
 Key: FLINK-15861
 URL: https://issues.apache.org/jira/browse/FLINK-15861
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Huang Xingbo
 Fix For: 1.10.0, 1.9.3


We need to change TableEnvironmentImpl to StreamTableEnvironmentImpl in python 
blink batch mode; otherwise we can't register TableFunction and 
AggregateFunction in that mode.





[jira] [Created] (FLINK-15862) Remove deprecated class KafkaPartitioner and all its usages

2020-02-03 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-15862:


 Summary: Remove deprecated class KafkaPartitioner and all its 
usages
 Key: FLINK-15862
 URL: https://issues.apache.org/jira/browse/FLINK-15862
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.11.0


The {{KafkaPartitioner}} was deprecated in 1.3.0. It should be safe to remove 
this interface now.

Moreover, it has the wrong annotation {{@Internal}} even though it is exposed in 
a user-facing API.
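For anyone migrating, the replacement is {{FlinkKafkaPartitioner}}. A minimal custom partitioner in its style looks roughly like this; note the abstract class below is a simplified stand-in mirroring the connector API's shape, not the real dependency:

```java
import java.util.Arrays;

class PartitionerSketch {

    /** Simplified stand-in mirroring FlinkKafkaPartitioner's shape (sketch only). */
    abstract static class FlinkKafkaPartitioner<T> {
        public void open(int parallelInstanceId, int parallelInstances) {}

        public abstract int partition(
                T record, byte[] key, byte[] value, String targetTopic, int[] partitions);
    }

    /** Routes records to a Kafka partition by the hash of their serialized key. */
    static final class KeyHashPartitioner<T> extends FlinkKafkaPartitioner<T> {
        @Override
        public int partition(
                T record, byte[] key, byte[] value, String targetTopic, int[] partitions) {
            if (partitions == null || partitions.length == 0) {
                throw new IllegalStateException("No partitions for topic " + targetTopic);
            }
            // Deterministic mapping: same key always lands on the same partition.
            int hash = key == null ? 0 : Arrays.hashCode(key);
            return partitions[Math.abs(hash % partitions.length)];
        }
    }
}
```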





Re: [DISCUSS] Update UpsertStreamTableSink and RetractStreamTableSink to new type system

2020-02-03 Thread Kurt Young
Would overriding `getConsumedDataType` do the job?

Best,
Kurt


On Mon, Feb 3, 2020 at 3:52 PM Zhenghua Gao  wrote:

> Hi all,
>
> FLINK-12254[1] [2] updated TableSink and related interfaces to new type
> system which
> allows connectors use the new type system based on DataTypes.
>
> But FLINK-12911 port UpsertStreamTableSink and RetractStreamTableSink to
> flink-api-java-bridge and returns TypeInformation of the requested record
> type which
> can't support types with precision and scale, e.g. TIMESTAMP(p),
> DECIMAL(p,s).
>
> /**
>  * Returns the requested record type.
>  */
> TypeInformation getRecordType();
>
>
> A proposal is deprecating the *getRecordType* API and adding a
> *getRecordDataType* API instead to return the data type of the requested
> record. I have filed the issue FLINK-15469 and
> an initial PR to verify it.
>
> What do you think about this API changes? Any feedback are appreciated.
> [1] https://issues.apache.org/jira/browse/FLINK-12254
> [2] https://github.com/apache/flink/pull/8596
> [3] https://issues.apache.org/jira/browse/FLINK-15469
>
> *Best Regards,*
> *Zhenghua Gao*
>


Re: [DISCUSS] Update UpsertStreamTableSink and RetractStreamTableSink to new type system

2020-02-03 Thread Jingsong Li
Hi Zhenghua,

The *getRecordDataType* looks good to me.

But the main problem is how to represent the tuple type in DataType. I
understand that it is necessary to use StructuredType, but at present the
planner does not support StructuredType, so the prerequisite would be to first
add StructuredType support to the planner.

Best,
Jingsong Lee

On Mon, Feb 3, 2020 at 4:49 PM Kurt Young  wrote:

> Would overriding `getConsumedDataType` do the job?
>
> Best,
> Kurt
>
>
> On Mon, Feb 3, 2020 at 3:52 PM Zhenghua Gao  wrote:
>
>> Hi all,
>>
>> FLINK-12254[1] [2] updated TableSink and related interfaces to new type
>> system which
>> allows connectors use the new type system based on DataTypes.
>>
>> But FLINK-12911 port UpsertStreamTableSink and RetractStreamTableSink to
>> flink-api-java-bridge and returns TypeInformation of the requested record
>> type which
>> can't support types with precision and scale, e.g. TIMESTAMP(p),
>> DECIMAL(p,s).
>>
>> /**
>>  * Returns the requested record type.
>>  */
>> TypeInformation getRecordType();
>>
>>
>> A proposal is deprecating the *getRecordType* API and adding a
>> *getRecordDataType* API instead to return the data type of the requested
>> record. I have filed the issue FLINK-15469 and
>> an initial PR to verify it.
>>
>> What do you think about this API changes? Any feedback are appreciated.
>> [1] https://issues.apache.org/jira/browse/FLINK-12254
>> [2] https://github.com/apache/flink/pull/8596
>> [3] https://issues.apache.org/jira/browse/FLINK-15469
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>

-- 
Best, Jingsong Lee


Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-03 Thread Becket Qin
Thanks for bringing up the discussion, Hequn.

+1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would make it
much easier for the users to try out some simple ml tasks.

Thanks,

Jiangjie (Becket) Qin

On Mon, Feb 3, 2020 at 4:34 PM jincheng sun 
wrote:

> Thank you for pushing forward @Hequn Cheng  !
>
> Hi  @Becket Qin  , Do you have any concerns on this
> ?
>
> Best,
> Jincheng
>
> Hequn Cheng  于2020年2月3日周一 下午2:09写道:
>
>> Hi everyone,
>>
>> Thanks for the feedback. As there are no objections, I've opened a JIRA
>> issue(FLINK-15847[1]) to address this issue.
>> The implementation details can be discussed in the issue or in the
>> following PR.
>>
>> Best,
>> Hequn
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-15847
>>
>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng  wrote:
>>
>> > Hi Jincheng,
>> >
>> > Thanks a lot for your feedback!
>> > Yes, I agree with you. There are cases that multi jars need to be
>> > uploaded. I will prepare another discussion later. Maybe with a simple
>> > design doc.
>> >
>> > Best, Hequn
>> >
>> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun 
>> > wrote:
>> >
>> >> Thanks for bring up this discussion Hequn!
>> >>
>> >> +1 for include `flink-ml-api` and `flink-ml-lib` in opt.
>> >>
>> >> BTW: I think would be great if bring up a discussion for upload
>> multiple
>> >> Jars at the same time. as PyFlink JOB also can have the benefit if we
>> do
>> >> that improvement.
>> >>
>> >> Best,
>> >> Jincheng
>> >>
>> >>
>> >> Hequn Cheng  于2020年1月8日周三 上午11:50写道:
>> >>
>> >> > Hi everyone,
>> >> >
>> >> > FLIP-39[1] rebuilds Flink ML pipeline on top of TableAPI which moves
>> >> Flink
>> >> > ML a step further. Base on it, users can develop their ML jobs and
>> more
>> >> and
>> >> > more machine learning platforms are providing ML services.
>> >> >
>> >> > However, the problem now is the jars of flink-ml-api and flink-ml-lib
>> >> are
>> >> > only exist on maven repo. Whenever users want to submit ML jobs, they
>> >> can
>> >> > only depend on the ml modules and package a fat jar. This would be
>> >> > inconvenient especially for the machine learning platforms on which
>> >> nearly
>> >> > all jobs depend on Flink ML modules and have to package a fat jar.
>> >> >
>> >> > Given this, it would be better to include jars of flink-ml-api and
>> >> > flink-ml-lib in the `opt` folder, so that users can directly use the
>> >> jars
>> >> > with the binary release. For example, users can move the jars into
>> the
>> >> > `lib` folder or use -j to upload the jars. (Currently, -j only
>> support
>> >> > upload one jar. Supporting multi jars for -j can be discussed in
>> another
>> >> > discussion.)
>> >> >
>> >> > Putting the jars in the `opt` folder instead of the `lib` folder is
>> >> because
>> >> > currently, the ml jars are still optional for the Flink project by
>> >> default.
>> >> >
>> >> > What do you think? Welcome any feedback!
>> >> >
>> >> > Best,
>> >> >
>> >> > Hequn
>> >> >
>> >> > [1]
>> >> >
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
>> >> >
>> >>
>> >
>>
>


Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-03 Thread Hequn Cheng
Thank you all for your feedback and suggestions!

Best, Hequn

On Mon, Feb 3, 2020 at 5:07 PM Becket Qin  wrote:

> Thanks for bringing up the discussion, Hequn.
>
> +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would make
> it much easier for the users to try out some simple ml tasks.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Feb 3, 2020 at 4:34 PM jincheng sun 
> wrote:
>
>> Thank you for pushing forward @Hequn Cheng  !
>>
>> Hi  @Becket Qin  , Do you have any concerns on
>> this ?
>>
>> Best,
>> Jincheng
>>
>> Hequn Cheng  于2020年2月3日周一 下午2:09写道:
>>
>>> Hi everyone,
>>>
>>> Thanks for the feedback. As there are no objections, I've opened a JIRA
>>> issue(FLINK-15847[1]) to address this issue.
>>> The implementation details can be discussed in the issue or in the
>>> following PR.
>>>
>>> Best,
>>> Hequn
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-15847
>>>
>>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng  wrote:
>>>
>>> > Hi Jincheng,
>>> >
>>> > Thanks a lot for your feedback!
>>> > Yes, I agree with you. There are cases that multi jars need to be
>>> > uploaded. I will prepare another discussion later. Maybe with a simple
>>> > design doc.
>>> >
>>> > Best, Hequn
>>> >
>>> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun 
>>> > wrote:
>>> >
>>> >> Thanks for bring up this discussion Hequn!
>>> >>
>>> >> +1 for include `flink-ml-api` and `flink-ml-lib` in opt.
>>> >>
>>> >> BTW: I think would be great if bring up a discussion for upload
>>> multiple
>>> >> Jars at the same time. as PyFlink JOB also can have the benefit if we
>>> do
>>> >> that improvement.
>>> >>
>>> >> Best,
>>> >> Jincheng
>>> >>
>>> >>
>>> >> Hequn Cheng  于2020年1月8日周三 上午11:50写道:
>>> >>
>>> >> > Hi everyone,
>>> >> >
>>> >> > FLIP-39[1] rebuilds Flink ML pipeline on top of TableAPI which moves
>>> >> Flink
>>> >> > ML a step further. Base on it, users can develop their ML jobs and
>>> more
>>> >> and
>>> >> > more machine learning platforms are providing ML services.
>>> >> >
>>> >> > However, the problem now is the jars of flink-ml-api and
>>> flink-ml-lib
>>> >> are
>>> >> > only exist on maven repo. Whenever users want to submit ML jobs,
>>> they
>>> >> can
>>> >> > only depend on the ml modules and package a fat jar. This would be
>>> >> > inconvenient especially for the machine learning platforms on which
>>> >> nearly
>>> >> > all jobs depend on Flink ML modules and have to package a fat jar.
>>> >> >
>>> >> > Given this, it would be better to include jars of flink-ml-api and
>>> >> > flink-ml-lib in the `opt` folder, so that users can directly use the
>>> >> jars
>>> >> > with the binary release. For example, users can move the jars into
>>> the
>>> >> > `lib` folder or use -j to upload the jars. (Currently, -j only
>>> support
>>> >> > upload one jar. Supporting multi jars for -j can be discussed in
>>> another
>>> >> > discussion.)
>>> >> >
>>> >> > Putting the jars in the `opt` folder instead of the `lib` folder is
>>> >> because
>>> >> > currently, the ml jars are still optional for the Flink project by
>>> >> default.
>>> >> >
>>> >> > What do you think? Welcome any feedback!
>>> >> >
>>> >> > Best,
>>> >> >
>>> >> > Hequn
>>> >> >
>>> >> > [1]
>>> >> >
>>> >> >
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
>>> >> >
>>> >>
>>> >
>>>
>>


Re: connection timeout during shuffle initialization

2020-02-03 Thread Piotr Nowojski
Hi,

Coming back to this idea:

>> Removing synchronization *did solve* the problem for me, because it
>> allows flink to leverage the whole netty event loop pool and it's ok to
>> have a single thread blocked for a little while (we still can accept
>> connections with other threads).


Assignment between channels and threads is not done on an “assign to some 
idle channel” basis, but via a trivial hash function 
(`PowerOfTwoEventExecutorChooser` or `GenericEventExecutorChooser`). 

Without the global ResultPartitionManager lock (assuming 16 threads):
1. If a thread A blocks for 2+ minutes,
2. a new incoming connection has at best a 1/16 chance of being blocked (instead 
of 100% as it is now).

So removing the lock at best decreases the magnitude of the problem by a factor 
of the number of Netty threads. At worst, it might not help much: if one thread 
is blocked on IO, it usually doesn’t mean that other threads can perform IO 
without blocking themselves and being enqueued one after another by the kernel, 
or even completely overloading the IO system.
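The selection logic behind that fixed channel-to-thread mapping is tiny: a round-robin counter masked against a power-of-two pool size, so a channel is pinned to one thread at accept time regardless of load. A simplified sketch of that logic (not Netty's actual class):

```java
import java.util.concurrent.atomic.AtomicInteger;

class ChooserSketch {

    /** Round-robin selection over a power-of-two pool, in the style of Netty's chooser. */
    static final class PowerOfTwoChooser {
        private final AtomicInteger idx = new AtomicInteger();
        private final int poolSize; // must be a power of two

        PowerOfTwoChooser(int poolSize) {
            if (Integer.bitCount(poolSize) != 1) {
                throw new IllegalArgumentException("pool size must be a power of two");
            }
            this.poolSize = poolSize;
        }

        /** The event-loop index a newly accepted channel is pinned to. */
        int next() {
            // Masking replaces the modulo; works because poolSize is a power of two.
            return idx.getAndIncrement() & (poolSize - 1);
        }
    }
}
```

With 16 threads, each new channel lands on one fixed slot of the pool, which is where the 1/16 figure above comes from.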

This also ignores the issue that if a thread is currently blocked on blocking 
IO outside of the lock, it will still prevent 1/16th of incoming connections 
from being accepted, both now and after a theoretical removal of the 
ResultPartitionManager's lock.

In short, I’m +1 for trying to remove/limit a scope of this lock, but I 
wouldn’t expect wonders :(

Piotrek

> On 3 Feb 2020, at 08:39, Zhijiang  wrote:
> 
> Sorry for touching this issue late, just come back from Chinese Spring 
> Festival.
> 
> Actually we have not encountered this problem in production before. The 
> problem of connection timeout was mainly caused by netty server starting 
> delay on upstream side (big jar loading might cause as Piotr mentioned) or 
> TaskManager exits early (as Stephan mentioned).
> 
> As we know, one connection channel would be bound to only one netty thread by 
> design, and one netty thread might be responsible for multiple channels. In 
> this case, if the respective netty thread is blocking in heavy IO operator, 
> then it would not respond to the connection request in time to cause timeout. 
> Even though we remove the global lock from ResultPartitionManager, I guess 
> that it can not fully solve this issue and actually the connection process 
> does not touch the global lock.
> 
> The global lock in ResultPartitionManager is mainly working on 
> registering/releasing partitions, and for maintaining the global states of 
> `isShutDown`,`registeredPartitions`. It is feasible to remove the global lock 
> in technology/theory which might get a bit benefit to not delay create other 
> subpartition views if one view is blocking into IO operation in some 
> scenarios. But from another aspect, it is also meaningful to try best not 
> block netty thread long time, that could solve the connection timeout 
> completely. In our previous assumption/suggestion it is better to make netty 
> thread involve in light-weight operations if possible.
> 
> Let's forward the further solutions on the jira page as Piotr suggested. :)
> 
> Best,
> Zhijiang
> 
> --
> From:Piotr Nowojski 
> Send Time:2020 Jan. 30 (Thu.) 19:29
> To:dev ; zhijiang 
> Subject:Re: connection timeout during shuffle initialization
> 
> One more thing. Could you create a JIRA ticket for this issue? We could also 
> move the discussion there.
> 
> Piotrek
> 
> > On 30 Jan 2020, at 12:14, Piotr Nowojski  wrote:
> > 
> > Hi,
> > 
> >>> I think it's perfectly ok to perform IO ops in netty threads,
> > (…)
> >>> Removing synchronization *did solve* the problem for me, because it
> >>> allows flink to leverage the whole netty event loop pool and it's ok to
> >>> have a single thread blocked for a little while (we still can accept
> >>> connections with other threads).
> > 
> > 
> > It’s discouraged pattern, as Netty have a thread pool for processing 
> > multiple channels, but a single channel is always handled by the same 
> > pre-defined thread (to the best of my knowledge). In Flink we are lucky 
> > that Netty threads are not doing anything critical besides registering 
> > partitions (heartbeats are handled independently) that could fail the job 
> > if blocked. And I guess you are right, if some threads are blocked on the 
> > IO, new (sub)partition registration should be handled by the non blocked 
> > threads, if not for the global lock. 
> > 
> > It sounds very hacky though. Also that's ignoring performance implication - 
> > one thread blocked on the disks IO, wastes CPU/network potential of ~1/16 
> > channels (due to this fix pre determined assignment between channels <-> 
> > threads). In some scenarios that might be acceptable, with uniform tasks 
> > without data skew. But if there are simultaneously running multiple tasks 
> > with different work load patterns and/or a data skew, this can cause 
> > visible performance issues.
> > 
> > Havi

[DISCUSS] FLIP-94 Rework 2-phase commit abstractions

2020-02-03 Thread Roman Khachatryan
Hi everyone,

I'd like to kick off the discussion on the redesign of
TwoPhaseCommitSinkFunction [1].

The primary motivation is to provide a solution that suits the needs of
both Kafka Sink and JDBC exactly once sink. Other possible scenarios
include File Sink, WAL and batch jobs.

The current abstraction doesn't support all of the requirements of the JDBC
exactly-once sink (e.g. retries); besides that, it needs some (minor)
changes in the API.

FLIP-94 proposes more fine-grained abstractions and the use of composition
instead of inheritance (please see the diagram). This enables customizing
various aspects independently and eventually supporting more use cases.
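The FLIP's actual interfaces are on the wiki page; purely to illustrate the composition idea, a minimal hypothetical decomposition might look like this (all names below are invented for the sketch, not the proposed API):

```java
class TwoPhaseCommitSketch {

    /** Hypothetical aspect: creates and prepares transactions. */
    interface TransactionWriter<TXN> {
        TXN beginTransaction();

        /** Flush/prepare; in the JDBC case this is the step that may be retried. */
        void preCommit(TXN txn);
    }

    /** Hypothetical aspect: finalizes prepared transactions. */
    interface Committer<TXN> {
        void commit(TXN txn);
        void abort(TXN txn);
    }

    /** The sink composes the two aspects instead of inheriting one big base class. */
    static final class Sink<TXN> {
        private final TransactionWriter<TXN> writer;
        private final Committer<TXN> committer;

        Sink(TransactionWriter<TXN> writer, Committer<TXN> committer) {
            this.writer = writer;
            this.committer = committer;
        }

        /** One checkpoint cycle: prepare on snapshot, commit on notification. */
        void checkpointAndCommit(TXN txn) {
            writer.preCommit(txn);
            committer.commit(txn);
        }
    }
}
```

Swapping out either aspect independently (e.g. a retrying Committer for JDBC, a Kafka-transaction-based one for Kafka) is the customization the FLIP aims for.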

Any feedback welcome.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-94%3A+Rework+2-phase+commit+abstractions

-- 
Regards,
Roman


Re: [DISCUSS] FLIP-75: Flink Web UI Improvement Proposal

2020-02-03 Thread Yadong Xie
Hi Till,
I didn’t find how to create a sub-FLIP at cwiki.apache.org.
Do you mean to create 9 more FLIPs instead of FLIP-75?

Till Rohrmann  于2020年1月30日周四 下午11:12写道:

> Would it be easier if FLIP-75 would be the umbrella FLIP and we would vote
> on the individual improvements as sub FLIPs? Decreasing the scope should
> make things easier.
>
> Cheers,
> Till
>
> On Thu, Jan 30, 2020 at 2:35 PM Robert Metzger 
> wrote:
>
> > Thanks a lot for this work! I believe the web UI is very important, in
> > particular to new users. I'm very happy to see that you are putting
> effort
> > into improving the visibility into Flink through the proposed changes.
> >
> > I can not judge if all the changes make total sense, but the discussion
> has
> > been open since September, and a good number of people have commented in
> > the document.
> > I wonder if we can move this FLIP to the VOTing stage?
> >
> > On Wed, Jan 22, 2020 at 6:27 PM Till Rohrmann 
> > wrote:
> >
> > > Thanks for the update Yadong. Big +1 for the proposed improvements for
> > > Flink's web UI. I think they will be super helpful for our users.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jan 7, 2020 at 10:00 AM Yadong Xie 
> wrote:
> > >
> > > > Hi everyone
> > > >
> > > > We have spent some time updating the documentation since the last
> > > > discussion.
> > > >
> > > > In short, the latest FLIP-75 contains the following
> proposal(including
> > > both
> > > > frontend and RestAPI)
> > > >
> > > >1. Job Level
> > > >   - better job backpressure detection
> > > >   - load more feature in job exception
> > > >   - show attempt history in the subtask
> > > >   - show attempt timeline
> > > >   - add pending slots
> > > >2. Task Manager Level
> > > >   - add more metrics
> > > >   - better log display
> > > >3. Job Manager Level
> > > >   - add metrics tab
> > > >   - better log display
> > > >
> > > > To help everyone better understand the proposal, we spent efforts on
> > > making
> > > > an online POC .
> > > >
> > > > Now you can compare the difference between the new and old
> Web/RestAPI
> > > (the
> > > > link is inside the doc)!
> > > >
> > > > Here is the latest FLIP-75 doc:
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
> > > >
> > > > Looking forward to your feedback
> > > >
> > > >
> > > > Best,
> > > > Yadong
> > > >
> > > > lining jing  于2019年10月24日周四 下午2:11写道:
> > > >
> > > > > Hi all, I have updated the backend design in FLIP-75
> > > > > <
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > >
> > > > > .
> > > > >
> > > > > Here are some brief introductions:
> > > > >
> > > > >- Add metric for manage memory FLINK-14406
> > > > >.
> > > > >- Expose TaskExecutor resource configurations to REST API
> > > FLINK-14422
> > > > >.
> > > > >- Add TaskManagerResourceInfo in TaskManagerDetailsInfo to show
> > > > >TaskManager Resource FLINK-14435
> > > > >.
> > > > >
> > > > > I will continue to update the rest part of the backend design in
> the
> > > doc,
> > > > > let's keep discuss here, any feedback is appreciated.
> > > > >
> > > > > Yadong Xie  于2019年9月27日周五 上午10:13写道:
> > > > >
> > > > > > Hi all
> > > > > >
> > > > > > Flink Web UI is the main platform for most users to monitor their
> > > jobs
> > > > > and
> > > > > > clusters. We have reconstructed Flink web in 1.9.0 version, but
> > there
> > > > are
> > > > > > still some shortcomings.
> > > > > >
> > > > > > This discussion thread aims to provide a better experience for
> > Flink
> > > UI
> > > > > > users.
> > > > > >
> > > > > > Here is the design doc I drafted:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > >
> > > > > >
> > > > > > The FLIP can be found at [2].
> > > > > >
> > > > > > Please keep the discussion here, in the mailing list.
> > > > > >
> > > > > > Looking forward to your opinions, any feedbacks are welcome.
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
> > > > > > >
> > > > > > [2]:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Update UpsertStreamTableSink and RetractStreamTableSink to new type system

2020-02-03 Thread Zhenghua Gao
Hi Jingsong,

For now, only UpsertStreamTableSink and RetractStreamTableSink consume JTuple2,
so the 'getConsumedDataType' interface is not necessary in the validate &
codegen phases.
See
https://github.com/apache/flink/pull/10880/files#diff-137d090bd719f3a99ae8ba6603ed81ebR52
and
https://github.com/apache/flink/pull/10880/files#diff-8405c17e5155aa63d16e497c4c96a842R304

What about staying with the RAW type?

*Best Regards,*
*Zhenghua Gao*


On Mon, Feb 3, 2020 at 4:59 PM Jingsong Li  wrote:

> Hi Zhenghua,
>
> The *getRecordDataType* looks good to me.
>
> But the main problem is how to represent the tuple type in DataType. I
> understand that it would require StructuredType, but at present the
> planner does not support StructuredType, so the alternative is to add
> StructuredType support first.
>
> Best,
> Jingsong Lee
>
> On Mon, Feb 3, 2020 at 4:49 PM Kurt Young  wrote:
>
> > Would overriding `getConsumedDataType` do the job?
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Feb 3, 2020 at 3:52 PM Zhenghua Gao  wrote:
> >
> >> Hi all,
> >>
> >> FLINK-12254[1] [2] updated TableSink and related interfaces to the new
> >> type system, which allows connectors to use the new type system based
> >> on DataTypes.
> >>
> >> But FLINK-12911 ported UpsertStreamTableSink and RetractStreamTableSink
> >> to flink-api-java-bridge; they return TypeInformation of the requested
> >> record type, which cannot represent types with precision and scale,
> >> e.g. TIMESTAMP(p), DECIMAL(p,s).
> >>
> >> /**
> >>  * Returns the requested record type.
> >>  */
> >> TypeInformation<T> getRecordType();
> >>
> >>
> >> A proposal is to deprecate the *getRecordType* API and add a
> >> *getRecordDataType* API instead, returning the data type of the
> >> requested record. I have filed issue FLINK-15469 [3] and opened an
> >> initial PR to verify it.
> >>
> >> What do you think about this API change? Any feedback is appreciated.
> >> [1] https://issues.apache.org/jira/browse/FLINK-12254
> >> [2] https://github.com/apache/flink/pull/8596
> >> [3] https://issues.apache.org/jira/browse/FLINK-15469
> >>
> >> *Best Regards,*
> >> *Zhenghua Gao*
> >>
> >
>
> --
> Best, Jingsong Lee
>


Re: [VOTE] Release 1.10.0, release candidate #1

2020-02-03 Thread Kostas Kloudas
+1 (binding)

- Built Flink locally
- Tested quickstart by writing simple, WordCount-like jobs
- Submitted them to Yarn both "per-job" and "session" mode

For Thomas' comment, I agree that in this release we changed how some of
the execution options are propagated through the stack. This was done as
part of the "Executors" effort, which required all parameters to be
passed to an executor through a Configuration object.

That said, the classes and methods mentioned in the test are Internal and
were not meant to be used directly by users. This is also the reason why
their behaviour was not described in the 1.9 documentation.

But if this is something worth documenting, then we could in the future
create a page in the "Internals" section of the documentation where we
describe the "Executors" and how to use them "at-your-own-risk".

What do you think @Thomas Weise ?

Cheers,
Kostas


On Mon, Feb 3, 2020 at 5:44 AM Thomas Weise  wrote:
>
> The above issue was resolved by adding
>
> RemoteEnvironmentConfigUtils.setJarURLsToConfig(new String[] {JAR_PATH},
> config);
>
> It might be helpful to provide migration instructions to users as part of
> this release.
>
>
> On Fri, Jan 31, 2020 at 9:20 PM Thomas Weise  wrote:
>
> > As part of testing the RC, I run into the following issue with a test case
> > that runs a job from a packaged jar on a MiniCluster. This test had to be
> > modified due to the client-side API changes in 1.10.
> >
> > The issue is that the jar file that also contains the entry point isn't
> > part of the user classpath on the task manager. The entry point is executed
> > successfully; when removing all user code from the job graph, the test
> > passes.
> >
> > If the jar isn't shipped automatically to the task manager, what do I need
> > to set for it to occur?
> >
> > Thanks,
> > Thomas
> >
> >
> >   @ClassRule
> >   public static final MiniClusterResource MINI_CLUSTER_RESOURCE = new
> > MiniClusterResource(
> >   new MiniClusterResourceConfiguration.Builder()
> >   .build());
> >
> >   @Test(timeout = 3)
> >   public void test() throws Exception {
> > final URI restAddress =
> > MINI_CLUSTER_RESOURCE.getMiniCluster().getRestAddress().get();
> > Configuration config = new Configuration();
> > config.setString(JobManagerOptions.ADDRESS, restAddress.getHost());
> > config.setString(RestOptions.ADDRESS, restAddress.getHost());
> > config.setInteger(RestOptions.PORT, restAddress.getPort());
> > config.set(CoreOptions.DEFAULT_PARALLELISM, 1);
> > config.setString(DeploymentOptions.TARGET, RemoteExecutor.NAME);
> >
> > String entryPoint = "my.TestFlinkJob";
> >
> > PackagedProgram.Builder program = PackagedProgram.newBuilder()
> > .setJarFile(new File(JAR_PATH))
> > .setEntryPointClassName(entryPoint);
> >
> > ClientUtils.executeProgram(DefaultExecutorServiceLoader.INSTANCE,
> > config, program.build());
> >   }
> >
> > The user function deserialization error:
> >
> > org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot
> > instantiate user function.
> > at
> > org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:269)
> > at
> > org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:115)
> > at
> > org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
> > at
> > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
> > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
> > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.io.StreamCorruptedException: unexpected block data
> > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1581)
> > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
> > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2158)
> >
> > With 1.9, the driver code would be:
> >
> > PackagedProgram program = new PackagedProgram(new File(JAR_PATH),
> > entryPoint, new String[]{});
> > RestClusterClient client = new RestClusterClient(config,
> > "RemoteExecutor");
> > client.run(program, 1);
> >
> > On Fri, Jan 31, 2020 at 9:16 PM Jingsong Li 
> > wrote:
> >
> >> Thanks Jincheng,
> >>
> >> FLINK-15840 [1] should be a blocker: it means
> >> "TableEnvironment.from/scan(string path)" cannot be used for any
> >> temporaryTable or catalogTable (not DataStreamTable). Of course, it can
> >> be bypassed by "TableEnvironment.sqlQuery("select * from t")", but
> >> "from/scan" are very important APIs of TableEnvironment, and pure Table
> >> API can't be used seriously without them.
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-15840
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >> On Sat, Feb 1, 2020 at 12:47 PM Benchao Li  wrote:
> >>
> >> > Hi all,
> >> >
> >> > I also have an issue[1] wh

Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Becket Qin
Bump up the thread.

On Tue, Jan 21, 2020 at 10:43 AM Becket Qin  wrote:

> Hi Folks,
>
> I'd like to resume the voting thread for FlIP-27.
>
> Please note that the FLIP wiki has been updated to reflect the latest
> discussions in the discussion thread.
>
> To avoid confusion, I'll only count the votes cast after this point.
>
> FLIP wiki:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
> %3A+Refactor+Source+Interface
>
> Discussion thread:
>
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
>
> The vote will last for at least 72 hours, following the consensus voting
>  process.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Dec 5, 2019 at 10:31 AM jincheng sun 
> wrote:
>
>> +1 (binding), and looking forward to seeing the new interface in the
>> master.
>>
>> Best,
>> Jincheng
>>
>> Becket Qin  于2019年12月5日周四 上午8:05写道:
>>
>> > Hi all,
>> >
>> > I would like to start the vote for FLIP-27 which proposes to introduce a
>> > new Source connector interface to address a few problems in the existing
>> > source connector. The main goals of the FLIP are the following:
>> >
>> > 1. Unify the Source interface in Flink for batch and stream.
>> > 2. Significantly reduce the work for developers to develop new source
>> > connectors.
>> > 3. Provide a common abstraction for all the sources, as well as a
>> mechanism
>> > to allow source subtasks to coordinate among themselves.
>> >
>> > The vote will last for at least 72 hours, following the consensus voting
>> > process.
>> >
>> > FLIP wiki:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
>> >
>> > Discussion thread:
>> >
>> >
>> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>>
>


Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Yu Li
+1, thanks for the efforts Becket!

Best Regards,
Yu


On Mon, 3 Feb 2020 at 17:52, Becket Qin  wrote:

> Bump up the thread.
>
> On Tue, Jan 21, 2020 at 10:43 AM Becket Qin  wrote:
>
> > Hi Folks,
> >
> > I'd like to resume the voting thread for FlIP-27.
> >
> > Please note that the FLIP wiki has been updated to reflect the latest
> > discussions in the discussion thread.
> >
> > To avoid confusion, I'll only count the votes cast after this point.
> >
> > FLIP wiki:
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
> > %3A+Refactor+Source+Interface
> >
> > Discussion thread:
> >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
> >
> > The vote will last for at least 72 hours, following the consensus voting
> >  process.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Thu, Dec 5, 2019 at 10:31 AM jincheng sun 
> > wrote:
> >
> >> +1 (binding), and looking forward to seeing the new interface in the
> >> master.
> >>
> >> Best,
> >> Jincheng
> >>
> >> Becket Qin  于2019年12月5日周四 上午8:05写道:
> >>
> >> > Hi all,
> >> >
> >> > I would like to start the vote for FLIP-27 which proposes to
> introduce a
> >> > new Source connector interface to address a few problems in the
> existing
> >> > source connector. The main goals of the FLIP are the following:
> >> >
> >> > 1. Unify the Source interface in Flink for batch and stream.
> >> > 2. Significantly reduce the work for developers to develop new source
> >> > connectors.
> >> > 3. Provide a common abstraction for all the sources, as well as a
> >> mechanism
> >> > to allow source subtasks to coordinate among themselves.
> >> >
> >> > The vote will last for at least 72 hours, following the consensus
> voting
> >> > process.
> >> >
> >> > FLIP wiki:
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> >> >
> >> > Discussion thread:
> >> >
> >> >
> >>
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >>
> >
>
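For readers following the vote without the FLIP wiki at hand, the third goal (a common abstraction with coordination between source subtasks) can be caricatured in a few lines. The classes below are hypothetical, heavily simplified stand-ins, not the actual FLIP-27 interfaces; they only illustrate the split-enumerator/reader separation that lets one Source abstraction cover both bounded (batch) and unbounded (stream) execution:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-ins (NOT the real FLIP-27 interfaces).
public class SplitSourceSketch {

    // A unit of work handed out by the enumerator.
    static final class Split {
        final int start, end;
        Split(int start, int end) { this.start = start; this.end = end; }
    }

    // Central coordinator: discovers splits and assigns them to readers.
    static final class Enumerator {
        private final Queue<Split> pending = new ArrayDeque<>();
        Enumerator(List<Split> splits) { pending.addAll(splits); }
        Split nextSplit() { return pending.poll(); } // null => no more splits
    }

    // Per-subtask reader: processes whatever splits it is assigned.
    static final class Reader {
        int recordsRead = 0;
        void read(Split s) { recordsRead += s.end - s.start; }
    }

    // In the bounded case the job simply finishes when the enumerator runs
    // dry; an unbounded source would instead keep discovering new splits.
    static int runBounded(Enumerator enumerator, Reader reader) {
        for (Split s = enumerator.nextSplit(); s != null; s = enumerator.nextSplit()) {
            reader.read(s);
        }
        return reader.recordsRead;
    }

    public static void main(String[] args) {
        Enumerator e = new Enumerator(List.of(new Split(0, 5), new Split(5, 8)));
        System.out.println(runBounded(e, new Reader())); // prints 8
    }
}
```

The actual FLIP separates these concerns into an enumerator running on the coordinator and per-subtask source readers; the sketch above only mirrors that division of labor under made-up names.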


Re: REST Monitoring Savepoint failed

2020-02-03 Thread Till Rohrmann
At the moment this is the case, Ramya. We plan to add the auto-scaling
feature back again in one of the future Flink versions, though.

Cheers,
Till

On Mon, Feb 3, 2020 at 5:27 AM Ramya Ramamurthy  wrote:

> Thanks Till Rohrmann for the update.
>
> So even if we upgrade to the newer version of Flink, we have to rescale
> manually. Is that correct? That is, to stop the job and start it again
> with the desired parallelism.
>
> Thanks.
>
> On Fri, Jan 31, 2020 at 6:42 PM Till Rohrmann 
> wrote:
>
>> Thanks for providing us with the logs Ramya. I think the problem is that
>> with FLINK-10354 [1], we accidentally broke the rescaling feature in Flink
>> >= 1.7.0. The problem is that, before, savepoints weren't used for recovery
>> and, hence, they were not part of the CompletedCheckpointStore. With
>> FLINK-10354, this changed and now the savepoints are part of the completed
>> checkpoint store. This breaks an assumption of the rescaling feature.
>>
>> I would recommend rescaling manually by (1) taking a savepoint, (2)
>> stopping the job, and (3) resubmitting the job with the changed
>> parallelism, resuming from the taken savepoint.
>>
>> A side note, the rescaling feature has been removed in Flink >= 1.9.0
>> because of some inherent limitations.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-10354
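For reference, the three manual steps above correspond roughly to the following CLI sequence; the job ID, savepoint path, parallelism, and jar name are placeholders, so treat this as a sketch rather than literal commands to copy:

```
# (1) take a savepoint (job ID and target directory are assumed placeholders)
bin/flink savepoint <jobId> gs://my-bucket/savepoints/

# (2) stop the job
bin/flink cancel <jobId>

# (3) resubmit with the new parallelism, resuming from the savepoint
bin/flink run -s gs://my-bucket/savepoints/savepoint-<id> -p <newParallelism> my-job.jar
```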
>>
>> Cheers,
>> Till
>>
>> On Fri, Jan 31, 2020 at 11:49 AM Ramya Ramamurthy 
>> wrote:
>>
>>> Hi,
>>>
>>> Please find the below steps.
>>>
>>> 1) Trigger a savepoint to GCS
>>> 2) Trigger /rescaling REST API to dynamically increase parallelism.
>>> 3) check on the trigger id received, we get the below error.
>>>
>>> 2020-01-31 10:35:44,211 INFO
>>>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering
>>> checkpoint 2 @ 1580466943982 for job 7a085511514ba68ad07de1945dbf40a2.
>>> 2020-01-31 10:35:44,648 INFO
>>>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Starting
>>> job 7a085511514ba68ad07de1945dbf40a2 from savepoint
>>> gs://ss-enigma-bucket/flink/flink-gcs/flink-savepoints/savepoint-7a0855-de3002bc066b
>>> ()
>>> 2020-01-31 10:35:45,191 INFO
>>>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed
>>> checkpoint 2 for job 7a085511514ba68ad07de1945dbf40a2 (25449 bytes in 665
>>> ms).
>>> 2020-01-31 10:35:45,337 ERROR
>>> org.apache.flink.runtime.rest.handler.job.JobDetailsHandler   - Exception
>>> occurred in REST handler: Job c6dd27ec36392e4af8fa55066e16a2e2 not found
>>> 2020-01-31 10:35:45,395 INFO
>>>  org.apache.flink.runtime.jobmaster.JobMaster  - Could not
>>> restore from temporary rescaling savepoint. This might indicate that the
>>> savepoint
>>> gs://ss-enigma-bucket/flink/flink-gcs/flink-savepoints/savepoint-7a0855-de3002bc066b
>>> got corrupted. Deleting this savepoint as a precaution.
>>> 2020-01-31 10:35:45,397 INFO
>>>  org.apache.flink.runtime.jobmaster.JobMaster  - Attempting
>>> to load configured state backend for savepoint disposal
>>> 2020-01-31 10:35:45,397 INFO
>>>  org.apache.flink.runtime.jobmaster.JobMaster  - No state
>>> backend configured, attempting to dispose savepoint with default backend
>>> (file system based)
>>> 2020-01-31 10:35:45,398 INFO
>>>  org.apache.flink.runtime.executiongraph.ExecutionGraph- Job stuart
>>> (7a085511514ba68ad07de1945dbf40a2) switched from state CREATED to FAILING.
>>> org.apache.flink.runtime.execution.SuppressRestartsException:
>>> Unrecoverable failure. This suppresses job restarts. Please check the stack
>>> trace for the root cause.
>>>
>>> Attaching the file for reference. Job
>>> ID: 7a085511514ba68ad07de1945dbf40a2,
>>>
>>> Thanks.
>>>
>>>
>>>
>>> On Fri, Jan 31, 2020 at 3:20 PM Till Rohrmann 
>>> wrote:
>>>
 Hi Ramya,

 could you share the logs with us?

 Cheers,
 Till

 On Fri, Jan 31, 2020 at 9:31 AM Yun Tang  wrote:

> Hi Ramya
>
> Removed the dev mail list in receiver.
>
> Can you check the configuration of your "Job Manager" tab via web UI
> to see whether state.savepoints.dir [1] existed? If that existed, default
> savepoint directory is already given and such problem should not happen.
>
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#state-savepoints-dir
>
> Best
> Yun Tang
> --
> *From:* Ramya Ramamurthy 
> *Sent:* Friday, January 31, 2020 15:03
> *To:* dev@flink.apache.org 
> *Cc:* user 
> *Subject:* Re: REST Monitoring Savepoint failed
>
> Hi Till,
>
> I am using flink 1.7.
> This is my observation.
>
> a) I first trigger a savepoint. this is stored on my Google cloud
> storage.
> b) When i invoke the rescale HTTP API, i get the error telling
> savepoints
> dir is not configured. But post triggering a), i could verify the
> savepoint
> directory present in GCS in the mentioned pa

Re: Quick Introduction of myself - Kartheek.

2020-02-03 Thread Till Rohrmann
Hi Kartheek,

welcome to the Flink community. The best way to get started is to read the
contribution guidelines [1].

[1] https://flink.apache.org/contributing/how-to-contribute.html

Cheers,
Till

On Sun, Feb 2, 2020 at 6:52 AM Kartheek kark  wrote:

> Good Morning everyone,
> I am kartheek, currently working in saas and analytics domain.
> How I would like to contribute -
> 1. Understanding problems in various architectures, and helping to solve
> them
> 2. Code and solve the problems, which helps to further enhance my
> knowledge.
> Also, can I be introduced to peers who are working in a timezone close
> to the Indian time zone? Is there a way to communicate via chat channels?
>
>I am happy to be a part of this team.
>
> Thanks and regards,
> Kartheek B,
> Linkedin 
>


Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-03 Thread Till Rohrmann
An alternative solution would be to offer the flink-ml libraries as
optional dependencies on the download page. Similar to how we offer the
different SQL formats and Hadoop releases [1].

[1] https://flink.apache.org/downloads.html

Cheers,
Till

On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng  wrote:

> Thank you all for your feedback and suggestions!
>
> Best, Hequn
>
> On Mon, Feb 3, 2020 at 5:07 PM Becket Qin  wrote:
>
> > Thanks for bringing up the discussion, Hequn.
> >
> > +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would make
> > it much easier for the users to try out some simple ml tasks.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Feb 3, 2020 at 4:34 PM jincheng sun 
> > wrote:
> >
> >> Thank you for pushing forward @Hequn Cheng  !
> >>
> >> Hi  @Becket Qin  , Do you have any concerns on
> >> this ?
> >>
> >> Best,
> >> Jincheng
> >>
> >> Hequn Cheng  于2020年2月3日周一 下午2:09写道:
> >>
> >>> Hi everyone,
> >>>
> >>> Thanks for the feedback. As there are no objections, I've opened a JIRA
> >>> issue(FLINK-15847[1]) to address this issue.
> >>> The implementation details can be discussed in the issue or in the
> >>> following PR.
> >>>
> >>> Best,
> >>> Hequn
> >>>
> >>> [1] https://issues.apache.org/jira/browse/FLINK-15847
> >>>
> >>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng 
> wrote:
> >>>
> >>> > Hi Jincheng,
> >>> >
> >>> > Thanks a lot for your feedback!
> >>> > Yes, I agree with you. There are cases that multi jars need to be
> >>> > uploaded. I will prepare another discussion later. Maybe with a
> simple
> >>> > design doc.
> >>> >
> >>> > Best, Hequn
> >>> >
> >>> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <
> sunjincheng...@gmail.com>
> >>> > wrote:
> >>> >
> >>> >> Thanks for bring up this discussion Hequn!
> >>> >>
> >>> >> +1 for include `flink-ml-api` and `flink-ml-lib` in opt.
> >>> >>
> >>> >> BTW: I think it would be great to bring up a discussion on
> >>> >> uploading multiple jars at the same time, as PyFlink jobs would
> >>> >> also benefit from that improvement.
> >>> >>
> >>> >> Best,
> >>> >> Jincheng
> >>> >>
> >>> >>
> >>> >> Hequn Cheng  于2020年1月8日周三 上午11:50写道:
> >>> >>
> >>> >> > Hi everyone,
> >>> >> >
> >>> >> > FLIP-39[1] rebuilds Flink ML pipeline on top of TableAPI which
> moves
> >>> >> Flink
> >>> >> > ML a step further. Base on it, users can develop their ML jobs and
> >>> more
> >>> >> and
> >>> >> > more machine learning platforms are providing ML services.
> >>> >> >
> >>> >> > However, the problem now is that the jars of flink-ml-api and
> >>> >> > flink-ml-lib only exist in the Maven repo. Whenever users want to
> >>> >> > submit ML jobs, they can only depend on the ml modules and package
> >>> >> > a fat jar. This would be inconvenient especially for the machine
> >>> >> > learning platforms on which nearly all jobs depend on Flink ML
> >>> >> > modules and have to package a fat jar.
> >>> >> >
> >>> >> > Given this, it would be better to include jars of flink-ml-api and
> >>> >> > flink-ml-lib in the `opt` folder, so that users can directly use
> the
> >>> >> jars
> >>> >> > with the binary release. For example, users can move the jars into
> >>> the
> >>> >> > `lib` folder or use -j to upload the jars. (Currently, -j only
> >>> support
> >>> >> > upload one jar. Supporting multi jars for -j can be discussed in
> >>> another
> >>> >> > discussion.)
> >>> >> >
> >>> >> > Putting the jars in the `opt` folder instead of the `lib` folder
> is
> >>> >> because
> >>> >> > currently, the ml jars are still optional for the Flink project by
> >>> >> default.
> >>> >> >
> >>> >> > What do you think? Welcome any feedback!
> >>> >> >
> >>> >> > Best,
> >>> >> >
> >>> >> > Hequn
> >>> >> >
> >>> >> > [1]
> >>> >> >
> >>> >> >
> >>> >>
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
> >>> >> >
> >>> >>
> >>> >
> >>>
> >>
>
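If the jars do end up in `opt`, the "move into the `lib` folder" option mentioned above amounts to something like the following; the file names are assumed, so this is only a sketch:

```
# Run from the root of an (assumed) Flink distribution that ships the ML jars
# in opt/; promoting them to lib/ makes them available to every submitted job.
cp opt/flink-ml-api-*.jar opt/flink-ml-lib-*.jar lib/
```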


Re: [DISCUSS] FLIP-75: Flink Web UI Improvement Proposal

2020-02-03 Thread Till Rohrmann
I think there is no such description because we never did it before. I just
figured that FLIP-75 could actually be a good candidate to start this
practice. We would need a community discussion first, though.

Cheers,
Till

On Mon, Feb 3, 2020 at 10:28 AM Yadong Xie  wrote:

> Hi Till
> I didn’t find how to create of sub flip at cwiki.apache.org
> do you mean to create 9 more FLIPS instead of FLIP-75?
>
> Till Rohrmann  于2020年1月30日周四 下午11:12写道:
>
> > Would it be easier if FLIP-75 would be the umbrella FLIP and we would
> vote
> > on the individual improvements as sub FLIPs? Decreasing the scope should
> > make things easier.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jan 30, 2020 at 2:35 PM Robert Metzger 
> > wrote:
> >
> > > Thanks a lot for this work! I believe the web UI is very important, in
> > > particular to new users. I'm very happy to see that you are putting
> > effort
> > > into improving the visibility into Flink through the proposed changes.
> > >
> > > I can not judge if all the changes make total sense, but the discussion
> > has
> > > been open since September, and a good number of people have commented
> in
> > > the document.
> > > I wonder if we can move this FLIP to the voting stage?
> > >
> > > On Wed, Jan 22, 2020 at 6:27 PM Till Rohrmann 
> > > wrote:
> > >
> > > > Thanks for the update Yadong. Big +1 for the proposed improvements
> for
> > > > Flink's web UI. I think they will be super helpful for our users.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jan 7, 2020 at 10:00 AM Yadong Xie 
> > wrote:
> > > >
> > > > > Hi everyone
> > > > >
> > > > > We have spent some time updating the documentation since the last
> > > > > discussion.
> > > > >
> > > > > In short, the latest FLIP-75 contains the following
> > proposal(including
> > > > both
> > > > > frontend and RestAPI)
> > > > >
> > > > >1. Job Level
> > > > >   - better job backpressure detection
> > > > >   - "load more" feature in the job exceptions view
> > > > >   - show attempt history in the subtask
> > > > >   - show attempt timeline
> > > > >   - add pending slots
> > > > >2. Task Manager Level
> > > > >   - add more metrics
> > > > >   - better log display
> > > > >3. Job Manager Level
> > > > >   - add metrics tab
> > > > >   - better log display
> > > > >
> > > > > To help everyone better understand the proposal, we spent efforts
> on
> > > > making
> > > > > an online POC .
> > > > >
> > > > > Now you can compare the difference between the new and old
> > Web/RestAPI
> > > > (the
> > > > > link is inside the doc)!
> > > > >
> > > > > Here is the latest FLIP-75 doc:
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
> > > > >
> > > > > Looking forward to your feedback
> > > > >
> > > > >
> > > > > Best,
> > > > > Yadong
> > > > >
> > > > > lining jing  于2019年10月24日周四 下午2:11写道:
> > > > >
> > > > > > Hi all, I have updated the backend design in FLIP-75
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > > >
> > > > > > .
> > > > > >
> > > > > > Here are some brief introductions:
> > > > > >
> > > > > >- Add metric for managed memory (FLINK-14406).
> > > > > >- Expose TaskExecutor resource configurations to REST API
> > > > > >  (FLINK-14422).
> > > > > >- Add TaskManagerResourceInfo in TaskManagerDetailsInfo to show
> > > > > >  TaskManager resources (FLINK-14435).
> > > > > >
> > > > > > I will continue to update the rest part of the backend design in
> > the
> > > > doc,
> > > > > > let's keep discuss here, any feedback is appreciated.
> > > > > >
> > > > > > Yadong Xie  于2019年9月27日周五 上午10:13写道:
> > > > > >
> > > > > > > Hi all
> > > > > > >
> > > > > > > Flink Web UI is the main platform for most users to monitor
> their
> > > > jobs
> > > > > > and
> > > > > > > clusters. We have reconstructed Flink web in 1.9.0 version, but
> > > there
> > > > > are
> > > > > > > still some shortcomings.
> > > > > > >
> > > > > > > This discussion thread aims to provide a better experience for
> > > Flink
> > > > UI
> > > > > > > users.
> > > > > > >
> > > > > > > Here is the design doc I drafted:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > > >
> > > > > > >
> > > > > > > The FLIP can be found at [2].
> > > > > > >
> > > > > > > Please keep the discussion here, in the mailing list.
> > > > > > >
> > > > > > > Looking forward to your opinions; any feedback is welcome.
> > > > > > >
> > > > > > > [1]:
> > > > > > >

[jira] [Created] (FLINK-15863) Fix docs stating that savepoints are relocatable

2020-02-03 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-15863:
---

 Summary: Fix docs stating that savepoints are relocatable
 Key: FLINK-15863
 URL: https://issues.apache.org/jira/browse/FLINK-15863
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.9.2, 1.10.0
Reporter: Nico Kruber


This section from 
https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html#preconditions
 states that savepoints are relocatable, which they are not yet (see 
FLINK-5763). It should be fixed and/or removed; I'm unsure what change from 1.3 
it should actually reflect.

{quote}Another important precondition is that for savepoints taken before Flink 
1.3.x, all the savepoint data must be accessible from the new installation and 
reside under the same absolute path. Before Flink 1.3.x, the savepoint data is 
typically not self-contained in just the created savepoint file. Additional 
files can be referenced from inside the savepoint file (e.g. the output from 
state backend snapshots). Since Flink 1.3.x, this is no longer a limitation; 
savepoints can be relocated using typical filesystem operations.{quote}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15864) Upgrade jackson-databind dependency to 2.10.1 for security reasons

2020-02-03 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-15864:
-

 Summary: Upgrade jackson-databind dependency to 2.10.1 for 
security reasons
 Key: FLINK-15864
 URL: https://issues.apache.org/jira/browse/FLINK-15864
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Reporter: Till Rohrmann
 Fix For: 1.11.0, 1.10.1


The module {{flink-kubernetes}} defines an explicit dependency on 
{{jackson-databind:2.9.8}}. This is problematic since this jackson version 
contains security vulnerabilities. See FLINK-14104 for more information.

If possible, I would suggest removing the explicit version tag and relying on 
the parent's dependency management.
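A sketch of what the suggested change could look like in the flink-kubernetes pom; the exact coordinates and parent setup are assumptions, not a verified patch:

```xml
<!-- Hypothetical sketch: drop the explicit (vulnerable) version and let the
     parent's dependencyManagement pin the patched jackson-databind. -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <!-- no version element here: it is inherited from the parent's
       dependencyManagement, which would be bumped to 2.10.1 -->
</dependency>
```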



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release 1.10.0, release candidate #1

2020-02-03 Thread Gary Yao
Hi everyone,

I am hereby canceling the vote due to:

FLINK-15837
FLINK-15840

Another RC will be created later today.

Best,
Gary

On Mon, Jan 27, 2020 at 10:06 PM Gary Yao  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 1.10.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.10.0-rc1" [5],
>
> The announcement blog post is in the works. I will update this voting
> thread with a link to the pull request soon.
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Yu & Gary
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1325
> [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1
>


[jira] [Created] (FLINK-15865) When to add .uid() call: inconsistent definition of operators in Flink docs

2020-02-03 Thread Jun Qin (Jira)
Jun Qin created FLINK-15865:
---

 Summary: When to add .uid() call: inconsistent definition of 
operators in Flink docs
 Key: FLINK-15865
 URL: https://issues.apache.org/jira/browse/FLINK-15865
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.9.1
Reporter: Jun Qin


On the one hand, the Flink docs suggest adding a .uid() call for *all* 
operators [1]; on the other hand, they list all operators in Flink [2]. The 
issues are:
 # KeyBy is listed as an operator, but .keyBy().uid() is not a valid call. The 
same holds for window(), split(), etc.

 # addSource() and addSink() are not listed as operators, but we do expect 
users to call .uid() after addSource() and addSink(), especially in the 
exactly-once scenario.

This creates confusion, especially for beginners. There should be a better 
definition of which kinds of operators can be followed by a uid() call.

[1] [Should I assign ids to all operators in my 
job|https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/state/savepoints.html#should-i-assign-ids-to-all-operators-in-my-job]
[2] [Flink 
Operators|https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/]
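The confusion has an API-shape explanation: in the DataStream API, uid() is defined on the return type of record-producing transformations (and on sinks), while keyBy() returns a KeyedStream that has no uid() method at all. The following self-contained stand-in classes are hypothetical, not Flink code; they only mirror that API shape:

```java
// Hypothetical, minimal stand-ins (NOT Flink classes) that mirror the API
// shape behind the confusion: uid() exists on the result of record-producing
// transformations, but keyBy() returns a stream type without uid(), because
// keying is not an operator of its own.
public class UidSketch {

    static class DataStream {
        SingleOutputStreamOperator map(String fn) { return new SingleOutputStreamOperator(); }
        KeyedStream keyBy(String field) { return new KeyedStream(); }
    }

    // Transformation results: these can carry a uid.
    static class SingleOutputStreamOperator extends DataStream {
        String uid;
        SingleOutputStreamOperator uid(String uid) { this.uid = uid; return this; }
    }

    // Result of keyBy(): just a re-partitioned view, deliberately no uid().
    static class KeyedStream extends DataStream {}

    public static void main(String[] args) {
        DataStream source = new DataStream();
        SingleOutputStreamOperator mapped = source.map("parse").uid("parse-uid"); // OK
        KeyedStream keyed = source.keyBy("user");
        // keyed.uid("...") would not compile: KeyedStream has no uid() method.
        System.out.println(mapped.uid); // prints parse-uid
    }
}
```

In other words, "operators" in the savepoint documentation means the stateful transformations whose result type accepts uid(), not every method listed on the operators page.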



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Update UpsertStreamTableSink and RetractStreamTableSink to new type system

2020-02-03 Thread Jark Wu
Thanks Zhenghua for starting this discussion.

Currently, none of the UpsertStreamTableSinks can upgrade to the new type
system, which hurts usability a lot.
I hope we can fix that in 1.11.

I'm fine with *getRecordDataType* as a temporary solution.
IIUC, the framework will only recognize getRecordDataType and
ignore getConsumedDataType for UpsertStreamTableSink, is that right?

I guess Timo is planning to design a new source/sink interface which would
also fix this problem, but I'm not sure about the timeline. cc @Timo
It would be better if we could have a new and complete interface, because
getRecordDataType is a little confusing as UpsertStreamTableSink already
has three getXXXType() methods.

Best,
Jark


On Mon, 3 Feb 2020 at 17:48, Zhenghua Gao  wrote:

> Hi Jingsong,
>
> For now, only UpsertStreamTableSink and RetractStreamTableSink consume
> JTuple2, so the 'getConsumedDataType' interface is not needed in the
> validate & codegen phases.
> See
>
> https://github.com/apache/flink/pull/10880/files#diff-137d090bd719f3a99ae8ba6603ed81ebR52
>  and
>
> https://github.com/apache/flink/pull/10880/files#diff-8405c17e5155aa63d16e497c4c96a842R304
>
> What about staying with the RAW type as we do today?
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Mon, Feb 3, 2020 at 4:59 PM Jingsong Li  wrote:
>
> > Hi Zhenghua,
> >
> > The *getRecordDataType* looks good to me.
> >
> > But the main problem is how to represent the tuple type in DataType. I
> > understand that it would require StructuredType, but at present the
> > planner does not support StructuredType, so the alternative is to add
> > StructuredType support first.
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Feb 3, 2020 at 4:49 PM Kurt Young  wrote:
> >
> > > Would overriding `getConsumedDataType` do the job?
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Mon, Feb 3, 2020 at 3:52 PM Zhenghua Gao  wrote:
> > >
> > >> Hi all,
> > >>
> > >> FLINK-12254[1] [2] updated TableSink and related interfaces to new
> type
> > >> system which
> > >> allows connectors use the new type system based on DataTypes.
> > >>
> > >> But FLINK-12911 ported UpsertStreamTableSink and RetractStreamTableSink
> to
> > >> flink-api-java-bridge and returns TypeInformation of the requested
> > record
> > >> type which
> > >> can't support types with precision and scale, e.g. TIMESTAMP(p),
> > >> DECIMAL(p,s).
> > >>
> > >> /**
> > >>  * Returns the requested record type.
> > >>  */
> > >> TypeInformation<T> getRecordType();
> > >>
> > >>
> > >> A proposal is deprecating the *getRecordType* API and adding a
> > >> *getRecordDataType* API instead to return the data type of the
> requested
> > >> record. I have filed the issue FLINK-15469 and
> > >> an initial PR to verify it.
> > >>
> > >> What do you think about these API changes? Any feedback is
> appreciated.
> > >> [1] https://issues.apache.org/jira/browse/FLINK-12254
> > >> [2] https://github.com/apache/flink/pull/8596
> > >> [3] https://issues.apache.org/jira/browse/FLINK-15469
> > >>
> > >> *Best Regards,*
> > >> *Zhenghua Gao*
> > >>
> > >
> >
> > --
> > Best, Jingsong Lee
> >
>


[jira] [Created] (FLINK-15866) ClosureCleaner#getSuperClassOrInterfaceName throw NPE

2020-02-03 Thread Aven Wu (Jira)
Aven Wu created FLINK-15866:
---

 Summary: ClosureCleaner#getSuperClassOrInterfaceName throw NPE
 Key: FLINK-15866
 URL: https://issues.apache.org/jira/browse/FLINK-15866
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.9.2
Reporter: Aven Wu
 Fix For: 1.10.0, 1.9.3


When the param 'cls' is the Object class, it will throw an NPE:
{code:java}
Class<?> superclass = cls.getSuperclass();{code}
because the superclass is null for Object.
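A minimal reproduction and a null-safe variant (a sketch; the method name below is illustrative, not the actual ClosureCleaner code):

```java
public class SuperclassProbe {

    // Mirrors the failing spot in ClosureCleaner#getSuperClassOrInterfaceName:
    // Class#getSuperclass() returns null for Object (and for interfaces and
    // primitives), so the result must be null-checked before use.
    static String superName(Class<?> cls) {
        Class<?> superclass = cls.getSuperclass();
        return superclass == null ? null : superclass.getName();
    }

    public static void main(String[] args) {
        System.out.println(superName(String.class)); // java.lang.Object
        System.out.println(superName(Object.class)); // null -- no NPE thanks to the check
    }
}
```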

 

 





[jira] [Created] (FLINK-15867) LAST_VALUE aggregate function does not support time-related types

2020-02-03 Thread Jira
Benoît Paris created FLINK-15867:


 Summary: LAST_VALUE aggregate function does not support 
time-related types
 Key: FLINK-15867
 URL: https://issues.apache.org/jira/browse/FLINK-15867
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.9.2, 1.10.0
Reporter: Benoît Paris
 Attachments: flink-test-lastvalue-timestamp.zip

The following fails:
{code:java}
LAST_VALUE(TIMESTAMP '2020-02-03 16:17:20')
LAST_VALUE(DATE '2020-02-03')
LAST_VALUE(TIME '16:17:20')
LAST_VALUE(NOW()){code}
But this works:

 
{code:java}
LAST_VALUE(UNIX_TIMESTAMP()) 
{code}
This leads me to think it might be more of a type/format issue than an actual
time-processing issue.

Attached is java + pom + full stacktrace, for reproduction. Stacktrace part is 
below.

 

The ByteLastValueAggFunction etc. types seem trivial to implement, but in
createLastValueAggFunction only basic types seem to be dealt with. Is there
a reason the more complicated LogicalTypeRoots are not implemented? (old vs
new types?)


Caused by: org.apache.flink.table.api.TableException: LAST_VALUE aggregate function does not support type: ''TIMESTAMP_WITHOUT_TIME_ZONE''.
Please re-check the data type.
	at org.apache.flink.table.planner.plan.utils.AggFunctionFactory.createLastValueAggFunction(AggFunctionFactory.scala:617)
	at org.apache.flink.table.planner.plan.utils.AggFunctionFactory.createAggFunction(AggFunctionFactory.scala:113)
	at org.apache.flink.table.planner.plan.utils.AggregateUtil$$anonfun$9.apply(AggregateUtil.scala:285)
	at org.apache.flink.table.planner.plan.utils.AggregateUtil$$anonfun$9.apply(AggregateUtil.scala:279)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at org.apache.flink.table.planner.plan.utils.AggregateUtil$.transformToAggregateInfoList(AggregateUtil.scala:279)
	at org.apache.flink.table.planner.plan.utils.AggregateUtil$.transformToStreamAggregateInfoList(AggregateUtil.scala:228)
	at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecGroupAggregate.<init>(StreamExecGroupAggregate.scala:72)
	at org.apache.flink.table.planner.plan.rules.physical.stream.StreamExecGroupAggregateRule.convert(StreamExecGroupAggregateRule.scala:68)
	at org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:139)
	at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
	at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:631)
	at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:328)
	at org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:64)





Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-03 Thread Stephan Ewen
We have had much trouble in the past from "too deep, too custom"
integrations that everyone got out of the box, i.e., Hadoop.
Flink has such a broad spectrum of use cases; if we have a custom build
for every other framework in that spectrum, we'll be in trouble.

So I would also be -1 for custom builds.

Couldn't we do something similar as we started doing for Hadoop? Moving
away from convenience downloads to allowing users to "export" their setup
for Flink?

  - We can have a "hive module (loader)" in flink/lib by default
  - The module loader would look for an environment variable like
"HIVE_CLASSPATH" and load these classes (ideally in a separate classloader).
  - The loader can search for certain classes and, when it finds them,
instantiate the catalog / functions / etc. and the hive module referencing
them
  - That way, we use exactly what users have installed, without needing to
build our own bundles.
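A rough sketch of what such a module loader could look like (plain Java; the HIVE_CLASSPATH variable name and the marker class are assumptions taken from this proposal, not an existing Flink API):

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class HiveModuleLoader {

    // Build an isolated child classloader from the HIVE_CLASSPATH environment
    // variable (assumed name from this proposal).
    static ClassLoader buildHiveClassLoader() throws Exception {
        String cp = System.getenv("HIVE_CLASSPATH");
        List<URL> urls = new ArrayList<>();
        if (cp != null && !cp.isEmpty()) {
            for (String entry : cp.split(File.pathSeparator)) {
                urls.add(new File(entry).toURI().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]),
                HiveModuleLoader.class.getClassLoader());
    }

    // Probe for a marker class; only instantiate the Hive catalog/module when
    // the user's installation actually provides it.
    static boolean hiveAvailable(ClassLoader loader) {
        try {
            Class.forName("org.apache.hadoop.hive.conf.HiveConf", false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Hive on classpath: " + hiveAvailable(buildHiveClassLoader()));
    }
}
```

That way the bundle question disappears: whatever jars the user's Hive installation points to are what gets loaded.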

Could that work?

Best,
Stephan


On Wed, Dec 18, 2019 at 9:43 AM Till Rohrmann  wrote:

> Couldn't it simply be documented which jars are in the convenience jars,
> which are pre-built and can be downloaded from the website? Then people who
> need a custom version know which jars they need to provide to Flink?
>
> Cheers,
> Till
>
> On Tue, Dec 17, 2019 at 6:49 PM Bowen Li  wrote:
>
> > I'm not sure providing an uber jar would be possible.
> >
> > Different from kafka and elasticsearch connector who have dependencies
> for
> > a specific kafka/elastic version, or the kafka universal connector that
> > provides good compatibilities, hive connector needs to deal with hive
> jars
> > in all 1.x, 2.x, 3.x versions (let alone all the HDP/CDH distributions)
> > with incompatibility even between minor versions, different versioned
> > hadoop and other extra dependency jars for each hive version.
> >
> > Besides, users usually need to be able to easily see which individual
> jars
> > are required, which is invisible from an uber jar. Hive users already
> have
> > their hive deployments. They usually have to use their own hive jars
> > because, unlike hive jars on mvn, their own jars contain changes in-house
> > or from vendors. They need to easily tell which jars Flink requires for
> > corresponding open sourced hive version to their own hive deployment, and
> > copy in-house jars over from hive deployments as replacements.
> >
> > Providing a script to download all the individual jars for a specified
> hive
> > version can be an alternative.
> >
> > The goal is we need to provide a *product*, not a technology, to make it
> > less hassle for Hive users. After all, it's Flink embracing the Hive community
> > and ecosystem, not the other way around. I'd argue the Hive connector can be
> > treated differently because its community/ecosystem/userbase is much larger
> > than the other connectors, and it's way more important than other
> > connectors to Flink on the mission of becoming a batch/streaming unified
> > engine and get Flink more widely adopted.
> >
> >
> > On Sun, Dec 15, 2019 at 10:03 PM Danny Chan 
> wrote:
> >
> > > Also -1 on separate builds.
> > >
> > > After referencing some other BigData engines for distribution [1], I
> > didn't
> > > find a strong need to publish a separate build
> > > for just a separate Hive version; indeed, there are builds for different
> > > Hadoop versions.
> > >
> > > Just like Seth and Aljoscha said, we could push a
> > > flink-hive-version-uber.jar to use as a lib of SQL-CLI or other use
> > cases.
> > >
> > > [1] https://spark.apache.org/downloads.html
> > > [2]
> > https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html
> > >
> > > Best,
> > > Danny Chan
On 2019-12-14 at 3:03 AM +0800, dev@flink.apache.org wrote:
> > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies
> > >
> >
>


Re: [ANNOUNCE] Yu Li became a Flink committer

2020-02-03 Thread Henry Saputra
Belated congrats to Yu Li

- Henry

On Thu, Jan 23, 2020 at 12:59 AM Stephan Ewen  wrote:

> Hi all!
>
> We are announcing that Yu Li has joined the rank of Flink committers.
>
> Yu joined already in late December, but the announcement got lost because
> of the Christmas and New Years season, so here is a belated proper
> announcement.
>
> Yu is one of the main contributors to the state backend components in the
> recent year, working on various improvements, for example the RocksDB
> memory management for 1.10.
> He has also been one of the release managers for the big 1.10 release.
>
> Congrats for joining us, Yu!
>
> Best,
> Stephan
>


[jira] [Created] (FLINK-15868) Kinesis consumer fails due to jackson-cbor conflict in 1.10 RC1

2020-02-03 Thread Thomas Weise (Jira)
Thomas Weise created FLINK-15868:


 Summary: Kinesis consumer fails due to jackson-cbor conflict in 
1.10 RC1
 Key: FLINK-15868
 URL: https://issues.apache.org/jira/browse/FLINK-15868
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: 1.10.0
Reporter: Thomas Weise
Assignee: Thomas Weise
 Fix For: 1.10.0


There appears to be an issue with incompatible dependencies being shaded in the 
connector. This only happens when running against the actual Kinesis service 
(i.e. when CBOR isn't disabled).





Re: [VOTE] Release 1.10.0, release candidate #1

2020-02-03 Thread Thomas Weise
I found another issue with the Kinesis connector:

https://issues.apache.org/jira/browse/FLINK-15868


On Mon, Feb 3, 2020 at 3:35 AM Gary Yao  wrote:

> Hi everyone,
>
> I am hereby canceling the vote due to:
>
> FLINK-15837
> FLINK-15840
>
> Another RC will be created later today.
>
> Best,
> Gary
>
> On Mon, Jan 27, 2020 at 10:06 PM Gary Yao  wrote:
>
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version
> 1.10.0,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-1.10.0-rc1" [5],
> >
> > The announcement blog post is in the works. I will update this voting
> > thread with a link to the pull request soon.
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Yu & Gary
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1325
> > [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1
> >
>


RocksDB Compaction filter to clean up state with TTL

2020-02-03 Thread Seth, Abhilasha
Hello,
Flink 1.8 introduced the config
'state.backend.rocksdb.ttl.compaction.filter.enabled' to enable or disable the
compaction filter that cleans up state with TTL. I was curious why it's disabled by
default. Are there any performance implications of turning it ON by default?
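The per-state opt-in I'm referring to looks roughly like this configuration sketch (assuming the Flink 1.8 StateTtlConfig API):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlSketch {
    static ValueStateDescriptor<Long> describe() {
        // sketch, assuming the 1.8 StateTtlConfig API; the global flag
        // state.backend.rocksdb.ttl.compaction.filter.enabled must also be true
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.days(7))
                .cleanupInRocksdbCompactFilter() // opt this state into compaction-time cleanup
                .build();

        ValueStateDescriptor<Long> desc =
                new ValueStateDescriptor<>("last-seen", Long.class);
        desc.enableTimeToLive(ttlConfig);
        return desc;
    }
}
```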
Thanks,
Abhilasha



[jira] [Created] (FLINK-15869) Support to read and write meta information of a table

2020-02-03 Thread Jark Wu (Jira)
Jark Wu created FLINK-15869:
---

 Summary: Support to read and write meta information of a table
 Key: FLINK-15869
 URL: https://issues.apache.org/jira/browse/FLINK-15869
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Jark Wu


Support to read and write meta information of connectors. For example,
currently we only support reading & writing the value part of Kafka; we should also
support reading & writing the key part and the
timestamp, partition, and topic meta information.

This is an umbrella issue for the feature. It can be used to collect initial ideas
and use cases until a FLIP is proposed.





Re: [ANNOUNCE] Yu Li became a Flink committer

2020-02-03 Thread Danny Chan
Congratulations Yu!

Best,
Danny Chan
On 2020-02-04 at 1:46 AM +0800, dev@flink.apache.org wrote:
>
> Congratulations Yu!


Re: [VOTE] Release 1.10.0, release candidate #1

2020-02-03 Thread Thomas Weise
I opened a PR for FLINK-15868
:
https://github.com/apache/flink/pull/11006

With that change, I was able to run an application that consumes from
Kinesis.

I should have data tomorrow regarding the performance.

Two questions/observations:

1) Is the low watermark display in the UI still broken?
2) Was there a change in how job recovery reflects in the uptime metric?
Didn't uptime previously reset to 0 on recovery (now it just keeps
increasing)?

Thanks,
Thomas




On Mon, Feb 3, 2020 at 10:55 AM Thomas Weise  wrote:

> I found another issue with the Kinesis connector:
>
> https://issues.apache.org/jira/browse/FLINK-15868
>
>
> On Mon, Feb 3, 2020 at 3:35 AM Gary Yao  wrote:
>
>> Hi everyone,
>>
>> I am hereby canceling the vote due to:
>>
>> FLINK-15837
>> FLINK-15840
>>
>> Another RC will be created later today.
>>
>> Best,
>> Gary
>>
>> On Mon, Jan 27, 2020 at 10:06 PM Gary Yao  wrote:
>>
>> > Hi everyone,
>> > Please review and vote on the release candidate #1 for the version
>> 1.10.0,
>> > as follows:
>> > [ ] +1, Approve the release
>> > [ ] -1, Do not approve the release (please provide specific comments)
>> >
>> >
>> > The complete staging area is available for your review, which includes:
>> > * JIRA release notes [1],
>> > * the official Apache source release and binary convenience releases to
>> be
>> > deployed to dist.apache.org [2], which are signed with the key with
>> > fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
>> > * all artifacts to be deployed to the Maven Central Repository [4],
>> > * source code tag "release-1.10.0-rc1" [5],
>> >
>> > The announcement blog post is in the works. I will update this voting
>> > thread with a link to the pull request soon.
>> >
>> > The vote will be open for at least 72 hours. It is adopted by majority
>> > approval, with at least 3 PMC affirmative votes.
>> >
>> > Thanks,
>> > Yu & Gary
>> >
>> > [1]
>> >
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
>> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
>> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> > [4]
>> https://repository.apache.org/content/repositories/orgapacheflink-1325
>> > [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1
>> >
>>
>


Re: Large intervaljoin related question

2020-02-03 Thread Jark Wu
Hi Chen,

AFAIK, DataStream doesn't have many operator-level optimizations for
large windows and interval joins.
The suggested best practices are
 - please do not slide a window with very small step
 - use rocksdb statebackend on SSD for large state.
 - increase parallelism for the operators of large state
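A sketch of what such a setup could look like (the event types and key fields are illustrative, not from this thread):

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class IntervalJoinSketch {
    static DataStream<String> join(StreamExecutionEnvironment env,
                                   DataStream<Order> orders,
                                   DataStream<Click> clicks) throws Exception {
        // large state: RocksDB backend (ideally on SSD) with incremental checkpoints
        env.setStateBackend(new RocksDBStateBackend("file:///ssd/checkpoints", true));

        return orders
                .keyBy(o -> o.userId)
                .intervalJoin(clicks.keyBy(c -> c.userId))
                .between(Time.minutes(-10), Time.minutes(10)) // bounded relative window
                .process(new ProcessJoinFunction<Order, Click, String>() {
                    @Override
                    public void processElement(Order left, Click right,
                                               Context ctx, Collector<String> out) {
                        out.collect(left.id + "/" + right.id);
                    }
                })
                .setParallelism(64); // scale out the stateful operator
    }

    static class Order { String id; String userId; }
    static class Click { String id; String userId; }
}
```

This is only a sketch; it needs the Flink and RocksDB state-backend dependencies to run.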

The pane optimization mentioned in FLINK-7001 has been partially
implemented in the Flink SQL blink planner's window operator. You can try it
there.

Regarding the rocksdb statebackend tuning, I'm not an expert on rocksdb
statebackend. Hence, I'm pulling in Yu Li who might help you.

Best,
Jark

On Sat, 14 Dec 2019 at 03:24, Chen Qin  wrote:

> Hi there,
>
> We have seen growing interest in using large window and interval join
> operations. What is the recommended way of handling these use cases? (e.g.
> DeltaLake in Spark)
> After some benchmarking, we found performance still seems to be a bottleneck in
> supporting those use cases.
> How is the performance improvement
> https://issues.apache.org/jira/browse/FLINK-7001 going?
>
> On the tuning side, we plan to test giving a larger blob cache on the rocksdb side
> (~4GB); will this help?
> Otherwise, we plan to write to an external hive table (it seems partitions are
> not supported yet) and run frequent ETL jobs there.
>
>
> Thanks,
> Chen


Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Jark Wu
Thanks for driving this Becket!

+1 from my side.

Cheers,
Jark

On Mon, 3 Feb 2020 at 18:06, Yu Li  wrote:

> +1, thanks for the efforts Becket!
>
> Best Regards,
> Yu
>
>
> On Mon, 3 Feb 2020 at 17:52, Becket Qin  wrote:
>
> > Bump up the thread.
> >
> > On Tue, Jan 21, 2020 at 10:43 AM Becket Qin 
> wrote:
> >
> > > Hi Folks,
> > >
> > > I'd like to resume the voting thread for FLIP-27.
> > >
> > > Please note that the FLIP wiki has been updated to reflect the latest
> > > discussions in the discussion thread.
> > >
> > > To avoid confusion, I'll only count the votes cast after this point.
> > >
> > > FLIP wiki:
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
> > > %3A+Refactor+Source+Interface
> > >
> > > Discussion thread:
> > >
> > >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
> > >
> > > The vote will last for at least 72 hours, following the consensus
> voting
> > >  process.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Thu, Dec 5, 2019 at 10:31 AM jincheng sun  >
> > > wrote:
> > >
> > >> +1 (binding), and looking forward to seeing the new interface in the
> > >> master.
> > >>
> > >> Best,
> > >> Jincheng
> > >>
> > >> Becket Qin  于2019年12月5日周四 上午8:05写道:
> > >>
> > >> > Hi all,
> > >> >
> > >> > I would like to start the vote for FLIP-27 which proposes to
> > introduce a
> > >> > new Source connector interface to address a few problems in the
> > existing
> > >> > source connector. The main goals of the FLIP are the following:
> > >> >
> > >> > 1. Unify the Source interface in Flink for batch and stream.
> > >> > 2. Significantly reduce the work for developers to develop new
> source
> > >> > connectors.
> > >> > 3. Provide a common abstraction for all the sources, as well as a
> > >> mechanism
> > >> > to allow source subtasks to coordinate among themselves.
> > >> >
> > >> > The vote will last for at least 72 hours, following the consensus
> > voting
> > >> > process.
> > >> >
> > >> > FLIP wiki:
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > >> >
> > >> > Discussion thread:
> > >> >
> > >> >
> > >>
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >>
> > >
> >
>


[jira] [Created] (FLINK-15870) Wait the job's terminal state through REST interface

2020-02-03 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-15870:
--

 Summary: Wait the job's terminal state through REST interface
 Key: FLINK-15870
 URL: https://issues.apache.org/jira/browse/FLINK-15870
 Project: Flink
  Issue Type: Improvement
  Components: Test Infrastructure
Reporter: Yangze Guo
 Fix For: 1.11.0


As discussed in [this
PR|https://github.com/apache/flink/pull/10746#discussion_r374127425],
it's better to judge the job's terminal state through the REST interface.





[jira] [Created] (FLINK-15871) Support to start sidecar container

2020-02-03 Thread Yang Wang (Jira)
Yang Wang created FLINK-15871:
-

 Summary: Support to start sidecar container
 Key: FLINK-15871
 URL: https://issues.apache.org/jira/browse/FLINK-15871
 Project: Flink
  Issue Type: Sub-task
  Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Yang Wang
 Fix For: 1.11.0


>> How does a sidecar container work?

A sidecar container runs beside the JobManager and TaskManager containers.
It could be used to collect logs or to debug problems. For example, when we
configure the sidecar container as fluentd and share the TaskManager log via a
volume, it could be used to upload the logs to HDFS, Elasticsearch, etc.
Also, we could start a sidecar container with a debugging image which contains
lots of tools and helps to debug network problems.





Re: [DISCUSS] Update UpsertStreamTableSink and RetractStreamTableSink to new type system

2020-02-03 Thread Zhenghua Gao
Should we distinguish *record data type* and *consumed data type*?
Currently the design of UpsertStreamTableSink and RetractStreamTableSink
DOES distinguish them.

In my proposal the framework will ignore *getConsumedDataType*,
so it's ok to use *getConsumedDataType* to do the job if we
don't distinguish *record data type* and *consumed data type*.

*Best Regards,*
*Zhenghua Gao*


On Mon, Feb 3, 2020 at 4:49 PM Kurt Young  wrote:

> Would overriding `getConsumedDataType` do the job?
>
> Best,
> Kurt
>
>
> On Mon, Feb 3, 2020 at 3:52 PM Zhenghua Gao  wrote:
>
> > Hi all,
> >
> > FLINK-12254[1] [2] updated TableSink and related interfaces to new type
> > system which
> > allows connectors use the new type system based on DataTypes.
> >
> > But FLINK-12911 ported UpsertStreamTableSink and RetractStreamTableSink to
> > flink-api-java-bridge and returns TypeInformation of the requested record
> > type which
> > can't support types with precision and scale, e.g. TIMESTAMP(p),
> > DECIMAL(p,s).
> >
> > /**
> >  * Returns the requested record type.
> >  */
> > TypeInformation<T> getRecordType();
> >
> >
> > A proposal is deprecating the *getRecordType* API and adding a
> > *getRecordDataType* API instead to return the data type of the requested
> > record. I have filed the issue FLINK-15469 and
> > an initial PR to verify it.
> >
> > What do you think about these API changes? Any feedback is appreciated.
> > [1] https://issues.apache.org/jira/browse/FLINK-12254
> > [2] https://github.com/apache/flink/pull/8596
> > [3] https://issues.apache.org/jira/browse/FLINK-15469
> >
> > *Best Regards,*
> > *Zhenghua Gao*
> >
>


Re: [DISCUSS] Update UpsertStreamTableSink and RetractStreamTableSink to new type system

2020-02-03 Thread Zhenghua Gao
Hi Jark, thanks for your comments.
>>>IIUC, the framework will only recognize getRecordDataType and
>>>ignore getConsumedDataType for UpsertStreamTableSink, is that right?
You are right.

>>>getRecordDataType is little confused as UpsertStreamTableSink already has
>>>three getXXXType().
getRecordType and getOutputType are deprecated and are mainly kept for backward
compatibility.
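A hypothetical sketch of how the proposal could bridge the deprecated API (the default-method bridge and the interface name below are my assumptions for illustration, not the actual PR):

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.utils.TypeConversions;

// hypothetical sketch of the proposed addition, not the actual PR
public interface RecordDataTypeSink<T> {

    /** Deprecated: loses precision/scale, e.g. TIMESTAMP(p), DECIMAL(p, s). */
    @Deprecated
    TypeInformation<T> getRecordType();

    /** Proposed: the new-type-system view of the requested record. */
    default DataType getRecordDataType() {
        // bridge from the legacy type so existing sinks keep working unchanged
        return TypeConversions.fromLegacyInfoToDataType(getRecordType());
    }
}
```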

*Best Regards,*
*Zhenghua Gao*


On Mon, Feb 3, 2020 at 10:11 PM Jark Wu  wrote:

> Thanks Zhenghua for starting this discussion.
>
> Currently, all the UpsertStreamTableSinks can't be upgraded to the new type
> system, which affects usability a lot.
> I hope we can fix that in 1.11.
>
> I'm fine with *getRecordDataType* as a temporary solution.
> IIUC, the framework will only recognize getRecordDataType and
> ignore getConsumedDataType for UpsertStreamTableSink, is that right?
>
> I guess Timo is planning to design a new source/sink interface which will
> also fix this problem, but I'm not sure about the timelines. cc @Timo
> It would be better if we can have a new and complete interface, because
> getRecordDataType is a little confusing, as UpsertStreamTableSink already has
> three getXXXType() methods.
>
> Best,
> Jark
>
>
> On Mon, 3 Feb 2020 at 17:48, Zhenghua Gao  wrote:
>
> > Hi Jingsong, for now, only UpsertStreamTableSink and
> > RetractStreamTableSink consume JTuple2.
> > So the 'getConsumedDataType' interface is not necessary in the validate &
> > codegen phase.
> > See
> >
> >
> https://github.com/apache/flink/pull/10880/files#diff-137d090bd719f3a99ae8ba6603ed81ebR52
> >  and
> >
> >
> https://github.com/apache/flink/pull/10880/files#diff-8405c17e5155aa63d16e497c4c96a842R304
> >
> > What about staying the same and using the RAW type?
> >
> > *Best Regards,*
> > *Zhenghua Gao*
> >
> >
> > On Mon, Feb 3, 2020 at 4:59 PM Jingsong Li 
> wrote:
> >
> > > Hi Zhenghua,
> > >
> > > The *getRecordDataType* looks good to me.
> > >
> > > But the main problem is how to represent the tuple type in DataType. I
> > > understand that it is necessary to use StructuredType, but at present,
> > > planner does not support StructuredType, so the other way is to support
> > > StructuredType.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Mon, Feb 3, 2020 at 4:49 PM Kurt Young  wrote:
> > >
> > > > Would overriding `getConsumedDataType` do the job?
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Mon, Feb 3, 2020 at 3:52 PM Zhenghua Gao 
> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> FLINK-12254[1] [2] updated TableSink and related interfaces to new
> > type
> > > >> system which
> > > >> allows connectors use the new type system based on DataTypes.
> > > >>
> > > >> But FLINK-12911 ported UpsertStreamTableSink and
> RetractStreamTableSink
> > to
> > > >> flink-api-java-bridge and returns TypeInformation of the requested
> > > record
> > > >> type which
> > > >> can't support types with precision and scale, e.g. TIMESTAMP(p),
> > > >> DECIMAL(p,s).
> > > >>
> > > >> /**
> > > >>  * Returns the requested record type.
> > > >>  */
> > > >> TypeInformation<T> getRecordType();
> > > >>
> > > >>
> > > >> A proposal is deprecating the *getRecordType* API and adding a
> > > >> *getRecordDataType* API instead to return the data type of the
> > requested
> > > >> record. I have filed the issue FLINK-15469 and
> > > >> an initial PR to verify it.
> > > >>
> > > >> What do you think about these API changes? Any feedback is
> > appreciated.
> > > >> [1] https://issues.apache.org/jira/browse/FLINK-12254
> > > >> [2] https://github.com/apache/flink/pull/8596
> > > >> [3] https://issues.apache.org/jira/browse/FLINK-15469
> > > >>
> > > >> *Best Regards,*
> > > >> *Zhenghua Gao*
> > > >>
> > > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
> >
>


Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Jingsong Li
+1 (non-binding), thanks for driving.
FLIP-27 is the basis of a lot of follow-up work.

Best,
Jingsong Lee

On Tue, Feb 4, 2020 at 10:26 AM Jark Wu  wrote:

> Thanks for driving this Becket!
>
> +1 from my side.
>
> Cheers,
> Jark
>
> On Mon, 3 Feb 2020 at 18:06, Yu Li  wrote:
>
> > +1, thanks for the efforts Becket!
> >
> > Best Regards,
> > Yu
> >
> >
> > On Mon, 3 Feb 2020 at 17:52, Becket Qin  wrote:
> >
> > > Bump up the thread.
> > >
> > > On Tue, Jan 21, 2020 at 10:43 AM Becket Qin 
> > wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I'd like to resume the voting thread for FLIP-27.
> > > >
> > > > Please note that the FLIP wiki has been updated to reflect the latest
> > > > discussions in the discussion thread.
> > > >
> > > > To avoid confusion, I'll only count the votes cast after this
> point.
> > > >
> > > > FLIP wiki:
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
> > > > %3A+Refactor+Source+Interface
> > > >
> > > > Discussion thread:
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
> > > >
> > > > The vote will last for at least 72 hours, following the consensus
> > voting
> > > >  process.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Thu, Dec 5, 2019 at 10:31 AM jincheng sun <
> sunjincheng...@gmail.com
> > >
> > > > wrote:
> > > >
> > > >> +1 (binding), and looking forward to seeing the new interface in the
> > > >> master.
> > > >>
> > > >> Best,
> > > >> Jincheng
> > > >>
> > > >> Becket Qin  于2019年12月5日周四 上午8:05写道:
> > > >>
> > > >> > Hi all,
> > > >> >
> > > >> > I would like to start the vote for FLIP-27 which proposes to
> > > introduce a
> > > >> > new Source connector interface to address a few problems in the
> > > existing
> > > >> > source connector. The main goals of the FLIP are the following:
> > > >> >
> > > >> > 1. Unify the Source interface in Flink for batch and stream.
> > > >> > 2. Significantly reduce the work for developers to develop new
> > source
> > > >> > connectors.
> > > >> > 3. Provide a common abstraction for all the sources, as well as a
> > > >> mechanism
> > > >> > to allow source subtasks to coordinate among themselves.
> > > >> >
> > > >> > The vote will last for at least 72 hours, following the consensus
> > > voting
> > > >> > process.
> > > >> >
> > > >> > FLIP wiki:
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > > >> >
> > > >> > Discussion thread:
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Jiangjie (Becket) Qin
> > > >> >
> > > >>
> > > >
> > >
> >
>


-- 
Best, Jingsong Lee


Re: [VOTE] Release 1.10.0, release candidate #1

2020-02-03 Thread Jingsong Li
Another critical issue is FLINK-15858[1].
It is indeed a regression, but we don't want to block the release.
We will try our best to fix it.

[1] https://issues.apache.org/jira/browse/FLINK-15858

Best,
Jingsong Lee

On Tue, Feb 4, 2020 at 9:56 AM Thomas Weise  wrote:

> I opened a PR for FLINK-15868
> :
> https://github.com/apache/flink/pull/11006
>
> With that change, I was able to run an application that consumes from
> Kinesis.
>
> I should have data tomorrow regarding the performance.
>
> Two questions/observations:
>
> 1) Is the low watermark display in the UI still broken?
> 2) Was there a change in how job recovery reflects in the uptime metric?
> Didn't uptime previously reset to 0 on recovery (now it just keeps
> increasing)?
>
> Thanks,
> Thomas
>
>
>
>
> On Mon, Feb 3, 2020 at 10:55 AM Thomas Weise  wrote:
>
> > I found another issue with the Kinesis connector:
> >
> > https://issues.apache.org/jira/browse/FLINK-15868
> >
> >
> > On Mon, Feb 3, 2020 at 3:35 AM Gary Yao  wrote:
> >
> >> Hi everyone,
> >>
> >> I am hereby canceling the vote due to:
> >>
> >> FLINK-15837
> >> FLINK-15840
> >>
> >> Another RC will be created later today.
> >>
> >> Best,
> >> Gary
> >>
> >> On Mon, Jan 27, 2020 at 10:06 PM Gary Yao  wrote:
> >>
> >> > Hi everyone,
> >> > Please review and vote on the release candidate #1 for the version
> >> 1.10.0,
> >> > as follows:
> >> > [ ] +1, Approve the release
> >> > [ ] -1, Do not approve the release (please provide specific comments)
> >> >
> >> >
> >> > The complete staging area is available for your review, which
> includes:
> >> > * JIRA release notes [1],
> >> > * the official Apache source release and binary convenience releases
> to
> >> be
> >> > deployed to dist.apache.org [2], which are signed with the key with
> >> > fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
> >> > * all artifacts to be deployed to the Maven Central Repository [4],
> >> > * source code tag "release-1.10.0-rc1" [5],
> >> >
> >> > The announcement blog post is in the works. I will update this voting
> >> > thread with a link to the pull request soon.
> >> >
> >> > The vote will be open for at least 72 hours. It is adopted by majority
> >> > approval, with at least 3 PMC affirmative votes.
> >> >
> >> > Thanks,
> >> > Yu & Gary
> >> >
> >> > [1]
> >> >
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
> >> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
> >> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> > [4]
> >> https://repository.apache.org/content/repositories/orgapacheflink-1325
> >> > [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1
> >> >
> >>
> >
>


-- 
Best, Jingsong Lee


[jira] [Created] (FLINK-15872) Remove unnecessary code in InputFormat

2020-02-03 Thread Paul Lin (Jira)
Paul Lin created FLINK-15872:


 Summary: Remove unnecessary code in InputFormat
 Key: FLINK-15872
 URL: https://issues.apache.org/jira/browse/FLINK-15872
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Reporter: Paul Lin


InputFormat unnecessarily overrides 
InputSplitSource#getInputSplitAssigner(T[]) with the very same method signature 
and a mismatched javadoc. We could remove the method to improve the code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Danny Chan
+1 (non-binding), thanks for the great work !

Best,
Danny Chan
在 2020年2月4日 +0800 AM11:20,dev@flink.apache.org,写道:
>
> +1 (non-binding), thanks for driving.
> FLIP-27 is the basis of a lot of follow-up work.


[DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread jincheng sun
Hi folks,

I am very happy to receive some user inquiries about the use of the Flink
Python API (PyFlink) recently. One of the more common questions is whether
it is possible to install PyFlink without building from source. The most
convenient and natural way for users is to use `pip install apache-flink`.
We originally planned to support `pip install apache-flink` starting from
Flink 1.10; the reason for this decision was that when Flink 1.9 was
released on August 22, 2019 [1], Flink's PyPI account system was not ready.
Our PyPI account has been available since October 09, 2019 [2] (only PMC
can access it). So, for the convenience of users, I propose:

- Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
- Update the Flink 1.9 documentation to add support for `pip install`.

As we all know, Flink 1.9.2 was just released on January 31, 2020 [3].
There are still at least 1 to 2 months before the release of 1.9.3, so
my proposal is made entirely from the perspective of user convenience.
Although the proposed work is not large, we have not set a precedent for
an independent release of the Flink Python API (PyFlink) in the previous
release process. I hereby initiate this discussion and look forward to
your feedback!

Best,
Jincheng

[1]
https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
[2]
https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
[3]
https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E


[jira] [Created] (FLINK-15873) Matched result may not be output if existing earlier partial matches

2020-02-03 Thread shuai.xu (Jira)
shuai.xu created FLINK-15873:


 Summary: Matched result may not be output if existing earlier 
partial matches
 Key: FLINK-15873
 URL: https://issues.apache.org/jira/browse/FLINK-15873
 Project: Flink
  Issue Type: Bug
  Components: Library / CEP
Affects Versions: 1.9.0
Reporter: shuai.xu


When running some CEP jobs with a skip strategy, I found that when we get a 
matched result, if there is an earlier partial match, the result will not 
be returned.

I think this is due to a bug in processMatchesAccordingToSkipStrategy() in the 
NFA class. It should return the matched result without checking whether there 
are earlier partial matches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Wei Zhong
+1 (non-binding), thanks for driving this!

Best,
Wei

> 在 2020年2月4日,11:47,Danny Chan  写道:
> 
> +1 (non-binding), thanks for the great work !
> 
> Best,
> Danny Chan
> 在 2020年2月4日 +0800 AM11:20,dev@flink.apache.org,写道:
>> 
>> +1 (non-binding), thanks for driving.
>> FLIP-27 is the basis of a lot of follow-up work.



[jira] [Created] (FLINK-15874) Setup Travis cron job for Stateful Functions documentation build

2020-02-03 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-15874:
---

 Summary: Setup Travis cron job for Stateful Functions 
documentation build
 Key: FLINK-15874
 URL: https://issues.apache.org/jira/browse/FLINK-15874
 Project: Flink
  Issue Type: Sub-task
  Components: Stateful Functions, Test Infrastructure
Affects Versions: statefun-1.1
Reporter: Tzu-Li (Gordon) Tai


We should consider setting up a Travis cron for the Stateful Functions 
documentation, as it has silently broken a few times already.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15875) Bump Beam to 2.19.0

2020-02-03 Thread Dian Fu (Jira)
Dian Fu created FLINK-15875:
---

 Summary: Bump Beam to 2.19.0
 Key: FLINK-15875
 URL: https://issues.apache.org/jira/browse/FLINK-15875
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.11.0


Currently PyFlink depends on Beam's portability framework for Python UDF 
execution. The current dependent version is 2.15.0. We should bump it to 2.19.0 
as it includes several critical features/fixes needed, e.g.
1) BEAM-7951: It allows to not serialize the window/timestamp/pane info between 
the Java operator and the Python worker which could definitely improve the 
performance a lot
2) BEAM-8935: It allows to fail fast if the Python worker start up failed. 
Currently it takes 2 minutes to detect whether the Python worker is started 
successfully. 
3) BEAM-7948: It supports periodically flush the data between the Java operator 
and the Python worker. This's especially useful for streaming jobs and could 
improve the latency.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Guowei Ma
+1 (non-binding), thanks for driving.

Best,
Guowei


Jingsong Li  于2020年2月4日周二 上午11:20写道:

> +1 (non-binding), thanks for driving.
> FLIP-27 is the basis of a lot of follow-up work.
>
> Best,
> Jingsong Lee
>
> On Tue, Feb 4, 2020 at 10:26 AM Jark Wu  wrote:
>
> > Thanks for driving this Becket!
> >
> > +1 from my side.
> >
> > Cheers,
> > Jark
> >
> > On Mon, 3 Feb 2020 at 18:06, Yu Li  wrote:
> >
> > > +1, thanks for the efforts Becket!
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Mon, 3 Feb 2020 at 17:52, Becket Qin  wrote:
> > >
> > > > Bump up the thread.
> > > >
> > > > On Tue, Jan 21, 2020 at 10:43 AM Becket Qin 
> > > wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I'd like to resume the voting thread for FlIP-27.
> > > > >
> > > > > Please note that the FLIP wiki has been updated to reflect the
> latest
> > > > > discussions in the discussion thread.
> > > > >
> > > > > To avoid confusion, I'll only count the votes casted after this
> > point.
> > > > >
> > > > > FLIP wiki:
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
> > > > > %3A+Refactor+Source+Interface
> > > > >
> > > > > Discussion thread:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
> > > > >
> > > > > The vote will last for at least 72 hours, following the consensus
> > > voting
> > > > >  process.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Thu, Dec 5, 2019 at 10:31 AM jincheng sun <
> > sunjincheng...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > >> +1 (binding), and looking forward to seeing the new interface in
> the
> > > > >> master.
> > > > >>
> > > > >> Best,
> > > > >> Jincheng
> > > > >>
> > > > >> Becket Qin  于2019年12月5日周四 上午8:05写道:
> > > > >>
> > > > >> > Hi all,
> > > > >> >
> > > > >> > I would like to start the vote for FLIP-27 which proposes to
> > > > introduce a
> > > > >> > new Source connector interface to address a few problems in the
> > > > existing
> > > > >> > source connector. The main goals of the the FLIP are following:
> > > > >> >
> > > > >> > 1. Unify the Source interface in Flink for batch and stream.
> > > > >> > 2. Significantly reduce the work for developers to develop new
> > > source
> > > > >> > connectors.
> > > > >> > 3. Provide a common abstraction for all the sources, as well as
> a
> > > > >> mechanism
> > > > >> > to allow source subtasks to coordinate among themselves.
> > > > >> >
> > > > >> > The vote will last for at least 72 hours, following the
> consensus
> > > > voting
> > > > >> > process.
> > > > >> >
> > > > >> > FLIP wiki:
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > > > >> >
> > > > >> > Discussion thread:
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Jiangjie (Becket) Qin
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Zhijiang
+1 (binding), we are waiting too long for it. :)

Best,
Zhijiang


--
From:Guowei Ma 
Send Time:2020 Feb. 4 (Tue.) 12:34
To:dev 
Subject:Re: [VOTE] FLIP-27 - Refactor Source Interface

+1 (non-binding), thanks for driving.

Best,
Guowei


Jingsong Li  于2020年2月4日周二 上午11:20写道:

> +1 (non-binding), thanks for driving.
> FLIP-27 is the basis of a lot of follow-up work.
>
> Best,
> Jingsong Lee
>
> On Tue, Feb 4, 2020 at 10:26 AM Jark Wu  wrote:
>
> > Thanks for driving this Becket!
> >
> > +1 from my side.
> >
> > Cheers,
> > Jark
> >
> > On Mon, 3 Feb 2020 at 18:06, Yu Li  wrote:
> >
> > > +1, thanks for the efforts Becket!
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Mon, 3 Feb 2020 at 17:52, Becket Qin  wrote:
> > >
> > > > Bump up the thread.
> > > >
> > > > On Tue, Jan 21, 2020 at 10:43 AM Becket Qin 
> > > wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I'd like to resume the voting thread for FlIP-27.
> > > > >
> > > > > Please note that the FLIP wiki has been updated to reflect the
> latest
> > > > > discussions in the discussion thread.
> > > > >
> > > > > To avoid confusion, I'll only count the votes casted after this
> > point.
> > > > >
> > > > > FLIP wiki:
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
> > > > > %3A+Refactor+Source+Interface
> > > > >
> > > > > Discussion thread:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
> > > > >
> > > > > The vote will last for at least 72 hours, following the consensus
> > > voting
> > > > >  process.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Thu, Dec 5, 2019 at 10:31 AM jincheng sun <
> > sunjincheng...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > >> +1 (binding), and looking forward to seeing the new interface in
> the
> > > > >> master.
> > > > >>
> > > > >> Best,
> > > > >> Jincheng
> > > > >>
> > > > >> Becket Qin  于2019年12月5日周四 上午8:05写道:
> > > > >>
> > > > >> > Hi all,
> > > > >> >
> > > > >> > I would like to start the vote for FLIP-27 which proposes to
> > > > introduce a
> > > > >> > new Source connector interface to address a few problems in the
> > > > existing
> > > > >> > source connector. The main goals of the the FLIP are following:
> > > > >> >
> > > > >> > 1. Unify the Source interface in Flink for batch and stream.
> > > > >> > 2. Significantly reduce the work for developers to develop new
> > > source
> > > > >> > connectors.
> > > > >> > 3. Provide a common abstraction for all the sources, as well as
> a
> > > > >> mechanism
> > > > >> > to allow source subtasks to coordinate among themselves.
> > > > >> >
> > > > >> > The vote will last for at least 72 hours, following the
> consensus
> > > > voting
> > > > >> > process.
> > > > >> >
> > > > >> > FLIP wiki:
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > > > >> >
> > > > >> > Discussion thread:
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Jiangjie (Becket) Qin
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
>
> --
> Best, Jingsong Lee
>



Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Wei Zhong
Hi Jincheng,

Thanks for bringing up this discussion!

+1 for this proposal. Building from source takes long time and requires a
good network environment. Some users may not have such an environment.
Uploading to PyPI will greatly improve the user experience.

Best,
Wei

jincheng sun  于2020年2月4日周二 上午11:49写道:

> Hi folks,
>
> I am very happy to receive some user inquiries about the use of Flink
> Python API (PyFlink) recently. One of the more common questions is whether
> it is possible to install PyFlink without using source code build. The most
> convenient and natural way for users is to use `pip install apache-flink`.
> We originally planned to support the use of `pip install apache-flink` in
> Flink 1.10, but the reason for this decision was that when Flink 1.9 was
> released at August 22, 2019[1], Flink's PyPI account system was not ready.
> At present, our PyPI account is available at October 09, 2019 [2](Only PMC
> can access), So for the convenience of users I propose:
>
> - Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
> - Update Flink 1.9 documentation to add support for `pip install`.
>
> As we all know, Flink 1.9.2 was just completed released at January 31, 2020
> [3]. There is still at least 1 to 2 months before the release of 1.9.3, so
> my proposal is completely considered from the perspective of user
> convenience. Although the proposed work is not large, we have not set a
> precedent for independent release of the Flink Python API(PyFlink) in the
> previous release process. I hereby initiate the current discussion and look
> forward to your feedback!
>
> Best,
> Jincheng
>
> [1]
>
> https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
> [2]
>
> https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
> [3]
>
> https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E
>


Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Xingbo Huang
Hi Jincheng,

Thanks for driving this.
+1 for this proposal.

Compared to building from source, downloading directly from PyPI will
greatly save the development cost of Python users.

Best,
Xingbo



Wei Zhong  于2020年2月4日周二 下午12:43写道:

> Hi Jincheng,
>
> Thanks for bring up this discussion!
>
> +1 for this proposal. Building from source takes long time and requires a
> good network environment. Some users may not have such an environment.
> Uploading to PyPI will greatly improve the user experience.
>
> Best,
> Wei
>
> jincheng sun  于2020年2月4日周二 上午11:49写道:
>
>> Hi folks,
>>
>> I am very happy to receive some user inquiries about the use of Flink
>> Python API (PyFlink) recently. One of the more common questions is whether
>> it is possible to install PyFlink without using source code build. The
>> most
>> convenient and natural way for users is to use `pip install apache-flink`.
>> We originally planned to support the use of `pip install apache-flink` in
>> Flink 1.10, but the reason for this decision was that when Flink 1.9 was
>> released at August 22, 2019[1], Flink's PyPI account system was not ready.
>> At present, our PyPI account is available at October 09, 2019 [2](Only PMC
>> can access), So for the convenience of users I propose:
>>
>> - Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
>> - Update Flink 1.9 documentation to add support for `pip install`.
>>
>> As we all know, Flink 1.9.2 was just completed released at January 31,
>> 2020
>> [3]. There is still at least 1 to 2 months before the release of 1.9.3, so
>> my proposal is completely considered from the perspective of user
>> convenience. Although the proposed work is not large, we have not set a
>> precedent for independent release of the Flink Python API(PyFlink) in the
>> previous release process. I hereby initiate the current discussion and
>> look
>> forward to your feedback!
>>
>> Best,
>> Jincheng
>>
>> [1]
>>
>> https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
>> [2]
>>
>> https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
>> [3]
>>
>> https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E
>>
>


Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Jeff Zhang
+1


Xingbo Huang  于2020年2月4日周二 下午1:07写道:

> Hi Jincheng,
>
> Thanks for driving this.
> +1 for this proposal.
>
> Compared to building from source, downloading directly from PyPI will
> greatly save the development cost of Python users.
>
> Best,
> Xingbo
>
>
>
> Wei Zhong  于2020年2月4日周二 下午12:43写道:
>
>> Hi Jincheng,
>>
>> Thanks for bring up this discussion!
>>
>> +1 for this proposal. Building from source takes long time and requires a
>> good network environment. Some users may not have such an environment.
>> Uploading to PyPI will greatly improve the user experience.
>>
>> Best,
>> Wei
>>
>> jincheng sun  于2020年2月4日周二 上午11:49写道:
>>
>>> Hi folks,
>>>
>>> I am very happy to receive some user inquiries about the use of Flink
>>> Python API (PyFlink) recently. One of the more common questions is
>>> whether
>>> it is possible to install PyFlink without using source code build. The
>>> most
>>> convenient and natural way for users is to use `pip install
>>> apache-flink`.
>>> We originally planned to support the use of `pip install apache-flink` in
>>> Flink 1.10, but the reason for this decision was that when Flink 1.9 was
>>> released at August 22, 2019[1], Flink's PyPI account system was not
>>> ready.
>>> At present, our PyPI account is available at October 09, 2019 [2](Only
>>> PMC
>>> can access), So for the convenience of users I propose:
>>>
>>> - Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
>>> - Update Flink 1.9 documentation to add support for `pip install`.
>>>
>>> As we all know, Flink 1.9.2 was just completed released at January 31,
>>> 2020
>>> [3]. There is still at least 1 to 2 months before the release of 1.9.3,
>>> so
>>> my proposal is completely considered from the perspective of user
>>> convenience. Although the proposed work is not large, we have not set a
>>> precedent for independent release of the Flink Python API(PyFlink) in the
>>> previous release process. I hereby initiate the current discussion and
>>> look
>>> forward to your feedback!
>>>
>>> Best,
>>> Jincheng
>>>
>>> [1]
>>>
>>> https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
>>> [2]
>>>
>>> https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
>>> [3]
>>>
>>> https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E
>>>
>>

-- 
Best Regards

Jeff Zhang


Re: [DISCUSS] Improve TableFactory

2020-02-03 Thread Jingsong Li
Hi all,

After rethinking and a discussion with Kurt, I'd like to remove "isBounded".
We can defer this boundedness information to TableSink.
With the TableSink refactor, we need to consider "consumeDataStream"
and "consumeBoundedStream".

Best,
Jingsong Lee

On Mon, Feb 3, 2020 at 4:17 PM Jingsong Li  wrote:

> Hi Jark,
>
> Thanks involving, yes, it's hard to understand to add isBounded on the
> source.
> I recommend adding only to sink at present, because sink has upstream. Its
> upstream is either bounded or unbounded.
>
> Hi all,
>
> Let me summarize with your suggestions.
>
> public interface TableSourceFactory extends TableFactory {
>
>..
>
>
>/**
> * Creates and configures a {@link TableSource} based on the given {@link 
> Context}.
> *
> * @param context context of this table source.
> * @return the configured table source.
> */
>default TableSource createTableSource(Context context) {
>   ObjectIdentifier tableIdentifier = context.getTableIdentifier();
>   return createTableSource(
> new ObjectPath(tableIdentifier.getDatabaseName(), 
> tableIdentifier.getObjectName()),
> context.getTable());
>}
>
>/**
> * Context of table source creation. Contains table information and 
> environment information.
> */
>interface Context {
>
>   /**
>* @return full identifier of the given {@link CatalogTable}.
>*/
>   ObjectIdentifier getTableIdentifier();
>
>   /**
>* @return table {@link CatalogTable} instance.
>*/
>   CatalogTable getTable();
>
>   /**
>* @return readable config of this table environment.
>*/
>   ReadableConfig getTableConfig();
>}
> }
>
> public interface TableSinkFactory extends TableFactory {
>
>..
>
>/**
> * Creates and configures a {@link TableSink} based on the given {@link 
> Context}.
> *
> * @param context context of this table sink.
> * @return the configured table sink.
> */
>default TableSink createTableSink(Context context) {
>   ObjectIdentifier tableIdentifier = context.getTableIdentifier();
>   return createTableSink(
> new ObjectPath(tableIdentifier.getDatabaseName(), 
> tableIdentifier.getObjectName()),
> context.getTable());
>}
>
>/**
> * Context of table sink creation. Contains table information and 
> environment information.
> */
>interface Context {
>
>   /**
>* @return full identifier of the given {@link CatalogTable}.
>*/
>   ObjectIdentifier getTableIdentifier();
>
>   /**
>* @return table {@link CatalogTable} instance.
>*/
>   CatalogTable getTable();
>
>   /**
>* @return readable config of this table environment.
>*/
>   ReadableConfig getTableConfig();
>
>   /**
>* @return Input whether or not it is bounded.
>*/
>   boolean isBounded();
>}
> }
>
> If there is no objection, I will start a vote thread. (if necessary, I can
> also edit a FLIP).
>
> Best,
> Jingsong Lee
>
> On Thu, Jan 16, 2020 at 7:56 PM Jingsong Li 
> wrote:
>
>> Thanks Bowen and Timo for involving.
>>
>> Hi Bowen,
>>
>> > 1. is it better to have explicit APIs like "createBatchTableSource(...)"
>> I think it is better to keep one method, since in [1], we have reached
>> one in DataStream layer to maintain a single API in "env.source". I think
>> it is good to not split batch and stream, And our TableSource/TableSink are
>> the same class for both batch and streaming too.
>>
>> > 2. I'm not sure of the benefits to have a CatalogTableContext class.
>> As Timo said, We may have more parameters to add in the future, take a
>> look to "AbstractRichFunction.RuntimeContext", It's added little by little.
>>
>> Hi Timo,
>>
>> Your suggestion about Context looks good to me.
>> "TablePath" used in Hive for updating the catalog information of this
>> table. Yes, "ObjectIdentifier" looks better than "ObjectPath".
>>
>> > Can we postpone the change of TableValidators?
>> Yes, ConfigOption validation looks good to me. It seems that you have
>> been thinking about this for a long time. It's very good. Looking forward
>> to the promotion of FLIP-54.
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-27-Refactor-Source-Interface-td24952i80.html#a36692
>>
>> Best,
>> Jingsong Lee
>>
>> On Thu, Jan 16, 2020 at 6:01 PM Timo Walther  wrote:
>>
>>> Hi Jingsong,
>>>
>>> +1 for adding a context in the source and sink factories. A context
>>> class also allows for future modifications without touching the
>>> TableFactory interface again.
>>>
>>> How about:
>>>
>>> interface TableSourceFactory {
>>>  interface Context {
>>> // ...
>>>  }
>>> }
>>>
>>> Because I find the name `CatalogTableContext` confusing and we can bound
>>> the interface to the factory class itself as an inner interface.
>>>
>>> Readable access to configuration sounds also 

Re: [DISCUSS] Improve TableFactory

2020-02-03 Thread Jingsong Li
So the interface will be:

public interface TableSourceFactory extends TableFactory {
   ..

   /**
* Creates and configures a {@link TableSource} based on the given
{@link Context}.
*
* @param context context of this table source.
* @return the configured table source.
*/
   default TableSource createTableSource(Context context) {
  ObjectIdentifier tableIdentifier = context.getTableIdentifier();
  return createTableSource(
new ObjectPath(tableIdentifier.getDatabaseName(),
tableIdentifier.getObjectName()),
context.getTable());
   }
   /**
* Context of table source creation. Contains table information and
environment information.
*/
   interface Context {
  /**
   * @return full identifier of the given {@link CatalogTable}.
   */
  ObjectIdentifier getTableIdentifier();
  /**
   * @return table {@link CatalogTable} instance.
   */
  CatalogTable getTable();
  /**
   * @return readable config of this table environment.
   */
  ReadableConfig getTableConfig();
   }
}

public interface TableSinkFactory extends TableFactory {
   ..
   /**
* Creates and configures a {@link TableSink} based on the given
{@link Context}.
*
* @param context context of this table sink.
* @return the configured table sink.
*/
   default TableSink createTableSink(Context context) {
  ObjectIdentifier tableIdentifier = context.getTableIdentifier();
  return createTableSink(
new ObjectPath(tableIdentifier.getDatabaseName(),
tableIdentifier.getObjectName()),
context.getTable());
   }
   /**
* Context of table sink creation. Contains table information and
environment information.
*/
   interface Context {
  /**
   * @return full identifier of the given {@link CatalogTable}.
   */
  ObjectIdentifier getTableIdentifier();
  /**
   * @return table {@link CatalogTable} instance.
   */
  CatalogTable getTable();
  /**
   * @return readable config of this table environment.
   */
  ReadableConfig getTableConfig();
   }
}
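To make the proposed Context concrete, here is a self-contained sketch of how a connector factory might consume it. All names below (FactoryContextSketch, PrintTableSinkFactory, and the stand-in ObjectIdentifier/CatalogTable/TableSink types) are hypothetical stand-ins for illustration only, not the real Flink classes; the point is just that the factory reads the table identity and options from the context instead of receiving separate arguments.

```java
// Hypothetical, self-contained sketch of the proposed Context-based factory.
// The types here are minimal stand-ins, NOT the real Flink classes.
import java.util.HashMap;
import java.util.Map;

public class FactoryContextSketch {

    static final class ObjectIdentifier {
        final String catalog, database, object;
        ObjectIdentifier(String catalog, String database, String object) {
            this.catalog = catalog; this.database = database; this.object = object;
        }
        String asSummaryString() { return catalog + "." + database + "." + object; }
    }

    static final class CatalogTable {
        final Map<String, String> options;
        CatalogTable(Map<String, String> options) { this.options = options; }
    }

    /** The proposed creation context: table identity plus table information. */
    interface Context {
        ObjectIdentifier getTableIdentifier();
        CatalogTable getTable();
    }

    interface TableSink { String describe(); }

    interface TableSinkFactory { TableSink createTableSink(Context context); }

    /** A hypothetical connector factory reading everything it needs from the context. */
    static final class PrintTableSinkFactory implements TableSinkFactory {
        @Override
        public TableSink createTableSink(Context context) {
            String path = context.getTableIdentifier().asSummaryString();
            String format = context.getTable().options.getOrDefault("format", "csv");
            return () -> path + ":" + format;
        }
    }

    static String run() {
        Map<String, String> options = new HashMap<>();
        options.put("format", "json");
        Context context = new Context() {
            public ObjectIdentifier getTableIdentifier() {
                return new ObjectIdentifier("default_catalog", "default_database", "t1");
            }
            public CatalogTable getTable() { return new CatalogTable(options); }
        };
        return new PrintTableSinkFactory().createTableSink(context).describe();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints default_catalog.default_database.t1:json
    }
}
```

One consequence of this design is that adding a new piece of creation-time information later only requires a new getter on Context, not a new factory method overload.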


Best,
Jingsong Lee

On Tue, Feb 4, 2020 at 1:22 PM Jingsong Li  wrote:

> Hi all,
>
> After rethinking and discussion with Kurt, I'd like to remove "isBounded".
> We can delay this is bounded message to TableSink.
> With TableSink refactor, we need consider "consumeDataStream"
> and "consumeBoundedStream".
>
> Best,
> Jingsong Lee
>
> On Mon, Feb 3, 2020 at 4:17 PM Jingsong Li  wrote:
>
>> Hi Jark,
>>
>> Thanks involving, yes, it's hard to understand to add isBounded on the
>> source.
>> I recommend adding only to sink at present, because sink has upstream.
>> Its upstream is either bounded or unbounded.
>>
>> Hi all,
>>
>> Let me summarize with your suggestions.
>>
>> public interface TableSourceFactory extends TableFactory {
>>
>>..
>>
>>
>>/**
>> * Creates and configures a {@link TableSource} based on the given {@link 
>> Context}.
>> *
>> * @param context context of this table source.
>> * @return the configured table source.
>> */
>>default TableSource createTableSource(Context context) {
>>   ObjectIdentifier tableIdentifier = context.getTableIdentifier();
>>   return createTableSource(
>> new ObjectPath(tableIdentifier.getDatabaseName(), 
>> tableIdentifier.getObjectName()),
>> context.getTable());
>>}
>>
>>/**
>> * Context of table source creation. Contains table information and 
>> environment information.
>> */
>>interface Context {
>>
>>   /**
>>* @return full identifier of the given {@link CatalogTable}.
>>*/
>>   ObjectIdentifier getTableIdentifier();
>>
>>   /**
>>* @return table {@link CatalogTable} instance.
>>*/
>>   CatalogTable getTable();
>>
>>   /**
>>* @return readable config of this table environment.
>>*/
>>   ReadableConfig getTableConfig();
>>}
>> }
>>
>> public interface TableSinkFactory extends TableFactory {
>>
>>..
>>
>>/**
>> * Creates and configures a {@link TableSink} based on the given {@link 
>> Context}.
>> *
>> * @param context context of this table sink.
>> * @return the configured table sink.
>> */
>>default TableSink createTableSink(Context context) {
>>   ObjectIdentifier tableIdentifier = context.getTableIdentifier();
>>   return createTableSink(
>> new ObjectPath(tableIdentifier.getDatabaseName(), 
>> tableIdentifier.getObjectName()),
>> context.getTable());
>>}
>>
>>/**
>> * Context of table sink creation. Contains table information and 
>> environment information.
>> */
>>interface Context {
>>
>>   /**
>>* @return full identifier of the given {@link CatalogTable}.
>>*/
>>   ObjectIdentifier getTableIdentifier();
>>
>>   /**
>>* @return table {@link CatalogTable} instance.
>>   

[VOTE] Improve TableFactory to add Context

2020-02-03 Thread Jingsong Li
Hi all,

I would like to start the vote for the improvement of
TableFactory, which was discussed and
reached a consensus in the discussion thread [1].

The vote will be open for at least 72 hours. I'll try to close it then
unless there is an objection or not enough votes.

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-TableFactory-td36647.html

Best,
Jingsong Lee


[jira] [Created] (FLINK-15876) The alternative code for GroupedProcessingTimeWindowExample don't compile pass current version

2020-02-03 Thread Wong (Jira)
Wong created FLINK-15876:


 Summary: The alternative code for 
GroupedProcessingTimeWindowExample don't compile pass current version
 Key: FLINK-15876
 URL: https://issues.apache.org/jira/browse/FLINK-15876
 Project: Flink
  Issue Type: Bug
  Components: Examples
Affects Versions: 1.9.2, 1.7.2, 1.6.3
 Environment: Mac osx 10.14

JDK 1.8.202
Reporter: Wong
 Fix For: 1.10.0


stream
 .keyBy(0)
 .timeWindow(Time.of(2500, MILLISECONDS), Time.of(500, MILLISECONDS))
 .reduce(new SummingReducer())

 // alternative: use an apply function which does not pre-aggregate
// .keyBy(new FirstFieldKeyExtractor<Tuple2<Long, Long>, Long>())
// .window(Time.of(2500, MILLISECONDS), Time.of(500, MILLISECONDS))
// .apply(new SummingWindowFunction())

 .addSink(new SinkFunction<Tuple2<Long, Long>>() {
 @Override
 public void invoke(Tuple2<Long, Long> value) {
 }
 });

 

 

If the alternative code is used, it doesn't compile successfully; the API it 
uses is from several major versions ago.

I changed it to this:

.keyBy(new KeySelector<Tuple2<Long, Long>, Long>() {
 @Override
 public Long getKey(Tuple2<Long, Long> value) throws Exception {
 return value.f0;
 }
})
.timeWindow(Time.of(2500, MILLISECONDS), Time.of(500, MILLISECONDS))

 

private static class SummingWindowFunction implements 
WindowFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Long, TimeWindow> {

 @Override
 public void apply(Long key, TimeWindow window, Iterable<Tuple2<Long, Long>> 
values, Collector<Tuple2<Long, Long>> out) {
 long sum = 0L;
 for (Tuple2<Long, Long> value : values) {
 sum += value.f1;
 }

 out.collect(new Tuple2<>(key, sum));
 }
}

And it compiled successfully.
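The fix above hinges on spelling out the generic type arguments that the raw-typed `new KeySelector, Long>()` variant lost. The parameterization can be demonstrated without a Flink dependency; `KeySelector` below is a simplified stand-in for Flink's interface (the real one declares `throws Exception`), and `Entry<Long, Long>` plays the role of `Tuple2<Long, Long>`:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

// Minimal mock illustrating the generic parameterization the corrected
// example relies on. KeySelector here is NOT the real Flink interface.
public class KeySelectorSketch {

    public interface KeySelector<IN, KEY> {
        KEY getKey(IN value);
    }

    public static void main(String[] args) {
        // Both type arguments must be spelled out: the input type
        // (Entry<Long, Long>) and the key type (Long).
        KeySelector<Entry<Long, Long>, Long> selector =
            new KeySelector<Entry<Long, Long>, Long>() {
                @Override
                public Long getKey(Entry<Long, Long> value) {
                    return value.getKey(); // analogous to value.f0
                }
            };
        System.out.println(selector.getKey(new SimpleEntry<>(7L, 42L))); // 7
    }
}
```

With the type arguments present, the compiler can check that `getKey` extracts a `Long` from each element, which is what `.keyBy(...)` needs.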





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: RocksDB Compaction filter to clean up state with TTL

2020-02-03 Thread Yu Li
Hi Abhilasha,

We were conservative about enabling this by default to prevent any
unexpected problems. However, since there have been no reported issues so
far, we will enable it by default in the 1.10.0 release. Please refer to
FLINK-14898 [1], FLINK-15506 [2] and the (currently drafted) 1.10.0 release
notes [3] for more details.

Best Regards,
Yu

[1] https://issues.apache.org/jira/browse/FLINK-14898
[2] https://issues.apache.org/jira/browse/FLINK-15506
[3]
https://ci.apache.org/projects/flink/flink-docs-master/release-notes/flink-1.10.html#state

On Tue, 4 Feb 2020 at 03:32, Seth, Abhilasha 
wrote:

> Hello,
> Flink 1.8 introduces the config
> ‘state.backend.rocksdb.ttl.compaction.filter.enabled’ to enable or disable
> the compaction filter that cleans up state with TTL. I was curious why it's
> disabled by default. Are there any performance implications of turning it
> ON by default?
> Thanks,
> Abhilasha
>
>
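For users on 1.8/1.9 who want the filter before upgrading, the flag discussed above goes in flink-conf.yaml. A minimal sketch (key name taken from the thread; the surrounding `state.backend` line is an assumption about a typical setup):

```yaml
# flink-conf.yaml — enable the RocksDB compaction filter for TTL state
# cleanup (Flink 1.8/1.9; enabled by default and deprecated from 1.10.0).
state.backend: rocksdb
state.backend.rocksdb.ttl.compaction.filter.enabled: true
```

As far as I know, this global flag alone is not sufficient in 1.8/1.9: each TTL state descriptor also has to opt in via its `StateTtlConfig` builder (the `cleanupInRocksdbCompactFilter()` option) for the filter to apply to that state.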


Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Hequn Cheng
Hi Jincheng,

+1 for this proposal.
From the perspective of users, I think it would be nice to have PyFlink on
PyPI, which makes it much easier to install PyFlink.

Best, Hequn

On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang  wrote:

> +1
>
>
> Xingbo Huang wrote on Tue, Feb 4, 2020 at 1:07 PM:
>
>> Hi Jincheng,
>>
>> Thanks for driving this.
>> +1 for this proposal.
>>
>> Compared to building from source, downloading directly from PyPI will
>> greatly save the development cost of Python users.
>>
>> Best,
>> Xingbo
>>
>>
>>
>> Wei Zhong wrote on Tue, Feb 4, 2020 at 12:43 PM:
>>
>>> Hi Jincheng,
>>>
>>> Thanks for bringing up this discussion!
>>>
>>> +1 for this proposal. Building from source takes long time and requires
>>> a good network environment. Some users may not have such an environment.
>>> Uploading to PyPI will greatly improve the user experience.
>>>
>>> Best,
>>> Wei
>>>
>>> jincheng sun wrote on Tue, Feb 4, 2020 at 11:49 AM:
>>>
 Hi folks,

 I have been very happy to receive some user inquiries about the use of the
 Flink Python API (PyFlink) recently. One of the more common questions is
 whether it is possible to install PyFlink without building from source. The
 most convenient and natural way for users is `pip install apache-flink`.
 We originally planned to support `pip install apache-flink` in Flink 1.10;
 the reason for this decision was that when Flink 1.9 was released on
 August 22, 2019 [1], Flink's PyPI account system was not ready.
 Our PyPI account has been available since October 9, 2019 [2] (only PMC
 can access), so for the convenience of users I propose:

 - Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
 - Update Flink 1.9 documentation to add support for `pip install`.

 As we all know, Flink 1.9.2 was just released on January 31, 2020 [3].
 There is still at least 1 to 2 months before the release of 1.9.3, so my
 proposal is made purely from the perspective of user convenience. Although
 the proposed work is not large, we have not set a precedent for an
 independent release of the Flink Python API (PyFlink) in previous release
 processes. I hereby initiate this discussion and look forward to your
 feedback!

 Best,
 Jincheng

 [1]

 https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
 [2]

 https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
 [3]

 https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E

>>>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-03 Thread Hequn Cheng
Hi Till,

Thanks a lot for your suggestion. It's a good idea to offer the flink-ml
libraries as optional dependencies on the download page, which would make
the dist smaller.

But I also have some concerns about it, e.g., the download page currently
only includes the latest 3 releases, so we may need to find ways to support
more versions.
On the other hand, the flink-ml libraries are currently very small (about
246K), so they would not have much impact on the size of the dist.

What do you think?

Best,
Hequn

On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann  wrote:

> An alternative solution would be to offer the flink-ml libraries as
> optional dependencies on the download page. Similar to how we offer the
> different SQL formats and Hadoop releases [1].
>
> [1] https://flink.apache.org/downloads.html
>
> Cheers,
> Till
>
> On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng  wrote:
>
> > Thank you all for your feedback and suggestions!
> >
> > Best, Hequn
> >
> > On Mon, Feb 3, 2020 at 5:07 PM Becket Qin  wrote:
> >
> > > Thanks for bringing up the discussion, Hequn.
> > >
> > > +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would
> make
> > > it much easier for the users to try out some simple ml tasks.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Feb 3, 2020 at 4:34 PM jincheng sun 
> > > wrote:
> > >
> > >> Thank you for pushing forward @Hequn Cheng  !
> > >>
> > >> Hi  @Becket Qin  , Do you have any concerns on
> > >> this ?
> > >>
> > >> Best,
> > >> Jincheng
> > >>
> > >> Hequn Cheng wrote on Mon, Feb 3, 2020 at 2:09 PM:
> > >>
> > >>> Hi everyone,
> > >>>
> > >>> Thanks for the feedback. As there are no objections, I've opened a
> JIRA
> > >>> issue(FLINK-15847[1]) to address this issue.
> > >>> The implementation details can be discussed in the issue or in the
> > >>> following PR.
> > >>>
> > >>> Best,
> > >>> Hequn
> > >>>
> > >>> [1] https://issues.apache.org/jira/browse/FLINK-15847
> > >>>
> > >>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng 
> > wrote:
> > >>>
> > >>> > Hi Jincheng,
> > >>> >
> > >>> > Thanks a lot for your feedback!
> > >>> > Yes, I agree with you. There are cases that multi jars need to be
> > >>> > uploaded. I will prepare another discussion later. Maybe with a
> > simple
> > >>> > design doc.
> > >>> >
> > >>> > Best, Hequn
> > >>> >
> > >>> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <
> > sunjincheng...@gmail.com>
> > >>> > wrote:
> > >>> >
> > >>> >> Thanks for bringing up this discussion, Hequn!
> > >>> >>
> > >>> >> +1 for include `flink-ml-api` and `flink-ml-lib` in opt.
> > >>> >>
> > >>> >> BTW: I think it would be great to bring up a discussion on uploading
> > >>> >> multiple jars at the same time, as PyFlink jobs can also benefit if
> > >>> >> we do that improvement.
> > >>> >>
> > >>> >> Best,
> > >>> >> Jincheng
> > >>> >>
> > >>> >>
> > >>> >> Hequn Cheng wrote on Wed, Jan 8, 2020 at 11:50 AM:
> > >>> >>
> > >>> >> > Hi everyone,
> > >>> >> >
> > >>> >> > FLIP-39 [1] rebuilds the Flink ML pipeline on top of the Table API,
> > >>> >> > which moves Flink ML a step further. Based on it, users can develop
> > >>> >> > their ML jobs, and more and more machine learning platforms are
> > >>> >> > providing ML services.
> > >>> >> >
> > >>> >> > However, the problem now is that the jars of flink-ml-api and
> > >>> >> > flink-ml-lib only exist in the Maven repo. Whenever users want to
> > >>> >> > submit ML jobs, they can only depend on the ml modules and package
> > >>> >> > a fat jar. This would be inconvenient, especially for the machine
> > >>> >> > learning platforms on which nearly all jobs depend on Flink ML
> > >>> >> > modules and have to package a fat jar.
> > >>> >> >
> > >>> >> > Given this, it would be better to include jars of flink-ml-api
> and
> > >>> >> > flink-ml-lib in the `opt` folder, so that users can directly use
> > the
> > >>> >> jars
> > >>> >> > with the binary release. For example, users can move the jars
> into
> > >>> the
> > >>> >> > `lib` folder or use -j to upload the jars. (Currently, -j only
> > >>> support
> > >>> >> > upload one jar. Supporting multi jars for -j can be discussed in
> > >>> another
> > >>> >> > discussion.)
> > >>> >> >
> > >>> >> > Putting the jars in the `opt` folder instead of the `lib` folder
> > is
> > >>> >> because
> > >>> >> > currently, the ml jars are still optional for the Flink project
> by
> > >>> >> default.
> > >>> >> >
> > >>> >> > What do you think? Welcome any feedback!
> > >>> >> >
> > >>> >> > Best,
> > >>> >> >
> > >>> >> > Hequn
> > >>> >> >
> > >>> >> > [1]
> > >>> >> >
> > >>> >> >
> > >>> >>
> > >>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
> > >>> >> >
> > >>> >>
> > >>> >
> > >>>
> > >>
> >
>


[jira] [Created] (FLINK-15877) Stop using deprecated methods from TableSource interface

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15877:
--

 Summary: Stop using deprecated methods from TableSource interface
 Key: FLINK-15877
 URL: https://issues.apache.org/jira/browse/FLINK-15877
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Kurt Young
 Fix For: 1.11.0


This is an *umbrella* issue to track the cleanup work on the current TableSource 
interface. 

Currently, methods like `getReturnType` and `getTableSchema` are already 
deprecated, but still used by lots of code in various connectors and tests. 
We should make sure no connector or test code uses these deprecated methods 
anymore, except for backward-compatibility calls. This prepares for further 
interface improvements.
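The compatibility pattern implied here — a deprecated method kept only as a bridge while new code targets its replacement — can be sketched in plain Java. This is an illustration of the pattern, not Flink's actual TableSource interface: the names `getProducedDataType`/`getReturnType`, the `String` result, and the bridge direction are all simplified assumptions:

```java
// Sketch of keeping a deprecated method only as a backward-compatibility
// bridge. TableSource here is a made-up stand-in, not the Flink interface.
public class DeprecatedBridgeSketch {

    public interface TableSource {
        // New method that implementations should provide and callers use.
        String getProducedDataType();

        // Deprecated method kept so legacy callers keep working; by
        // default it derives its answer from the new method.
        @Deprecated
        default String getReturnType() {
            return getProducedDataType();
        }
    }

    public static void main(String[] args) {
        TableSource source = () -> "ROW<id BIGINT, name STRING>";
        System.out.println(source.getProducedDataType()); // new code path
        System.out.println(source.getReturnType());       // legacy callers, bridged
    }
}
```

Connector and test code migrating off the deprecated method can then be switched call site by call site, with the bridge removed once no callers remain.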

 

 





[jira] [Created] (FLINK-15878) Stop overriding TableSource::getReturnType in tests

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15878:
--

 Summary: Stop overriding TableSource::getReturnType in tests
 Key: FLINK-15878
 URL: https://issues.apache.org/jira/browse/FLINK-15878
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Kurt Young
 Fix For: 1.11.0








[jira] [Created] (FLINK-15879) Stop overriding TableSource::getReturnType in kafka connector

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15879:
--

 Summary: Stop overriding TableSource::getReturnType in kafka 
connector
 Key: FLINK-15879
 URL: https://issues.apache.org/jira/browse/FLINK-15879
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Kafka, Table SQL / API
Reporter: Kurt Young
Assignee: Jark Wu
 Fix For: 1.11.0








[jira] [Created] (FLINK-15880) Stop overriding TableSource::getReturnType in hbase connector

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15880:
--

 Summary: Stop overriding TableSource::getReturnType in hbase 
connector
 Key: FLINK-15880
 URL: https://issues.apache.org/jira/browse/FLINK-15880
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / HBase, Table SQL / API
Reporter: Kurt Young
Assignee: Leonard Xu
 Fix For: 1.11.0








[jira] [Created] (FLINK-15881) Stop overriding TableSource::getReturnType in jdbc connector

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15881:
--

 Summary: Stop overriding TableSource::getReturnType in jdbc 
connector
 Key: FLINK-15881
 URL: https://issues.apache.org/jira/browse/FLINK-15881
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC, Table SQL / API
Reporter: Kurt Young
Assignee: Zhenghua Gao
 Fix For: 1.11.0








[jira] [Created] (FLINK-15882) Stop overriding TableSource::getReturnType in orc table source

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15882:
--

 Summary: Stop overriding TableSource::getReturnType in orc table 
source
 Key: FLINK-15882
 URL: https://issues.apache.org/jira/browse/FLINK-15882
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / ORC, Table SQL / API
Reporter: Kurt Young
Assignee: Jingsong Lee
 Fix For: 1.11.0








[jira] [Created] (FLINK-15883) Stop overriding TableSource::getReturnType in parquet table source

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15883:
--

 Summary: Stop overriding TableSource::getReturnType in parquet 
table source
 Key: FLINK-15883
 URL: https://issues.apache.org/jira/browse/FLINK-15883
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Kurt Young
Assignee: Jingsong Lee
 Fix For: 1.11.0








[jira] [Created] (FLINK-15884) Stop overriding TableSource::getReturnType in python table source

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15884:
--

 Summary: Stop overriding TableSource::getReturnType in python 
table source
 Key: FLINK-15884
 URL: https://issues.apache.org/jira/browse/FLINK-15884
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Table SQL / API
Reporter: Kurt Young
 Fix For: 1.11.0








[jira] [Created] (FLINK-15886) Stop overriding TableSource::getTableSchema in hive table source

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15886:
--

 Summary: Stop overriding TableSource::getTableSchema in hive table 
source
 Key: FLINK-15886
 URL: https://issues.apache.org/jira/browse/FLINK-15886
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive, Table SQL / API
Reporter: Kurt Young
Assignee: Rui Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-15885) Stop overriding TableSource::getTableSchema in tests

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15885:
--

 Summary: Stop overriding TableSource::getTableSchema in tests
 Key: FLINK-15885
 URL: https://issues.apache.org/jira/browse/FLINK-15885
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Tests
Reporter: Kurt Young
 Fix For: 1.11.0








[jira] [Created] (FLINK-15888) Stop overriding TableSource::getTableSchema in hbase

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15888:
--

 Summary: Stop overriding TableSource::getTableSchema in hbase
 Key: FLINK-15888
 URL: https://issues.apache.org/jira/browse/FLINK-15888
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / HBase, Table SQL / API
Reporter: Kurt Young
Assignee: Leonard Xu
 Fix For: 1.11.0








[jira] [Created] (FLINK-15887) Stop overriding TableSource::getTableSchema in kafka

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15887:
--

 Summary: Stop overriding TableSource::getTableSchema in kafka
 Key: FLINK-15887
 URL: https://issues.apache.org/jira/browse/FLINK-15887
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Kafka, Table SQL / API
Reporter: Kurt Young
Assignee: Jark Wu
 Fix For: 1.11.0








[jira] [Created] (FLINK-15890) Stop overriding TableSource::getTableSchema in orc table source

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15890:
--

 Summary: Stop overriding TableSource::getTableSchema in orc table 
source
 Key: FLINK-15890
 URL: https://issues.apache.org/jira/browse/FLINK-15890
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / ORC, Table SQL / API
Reporter: Kurt Young
Assignee: Jingsong Lee
 Fix For: 1.11.0








[jira] [Created] (FLINK-15891) Stop overriding TableSource::getTableSchema in parquet table source

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15891:
--

 Summary: Stop overriding TableSource::getTableSchema in parquet 
table source
 Key: FLINK-15891
 URL: https://issues.apache.org/jira/browse/FLINK-15891
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Kurt Young
Assignee: Jingsong Lee
 Fix For: 1.11.0








[jira] [Created] (FLINK-15889) Stop overriding TableSource::getTableSchema in jdbc connector

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15889:
--

 Summary: Stop overriding TableSource::getTableSchema in jdbc 
connector
 Key: FLINK-15889
 URL: https://issues.apache.org/jira/browse/FLINK-15889
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC, Table SQL / API
Reporter: Kurt Young
Assignee: Zhenghua Gao
 Fix For: 1.11.0








[jira] [Created] (FLINK-15892) Stop overriding TableSource::getTableSchema in csv table source

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15892:
--

 Summary: Stop overriding TableSource::getTableSchema in csv table 
source
 Key: FLINK-15892
 URL: https://issues.apache.org/jira/browse/FLINK-15892
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Kurt Young
Assignee: Jingsong Lee
 Fix For: 1.11.0








[jira] [Created] (FLINK-15893) Stop overriding TableSource::getTableSchema in python table source

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15893:
--

 Summary: Stop overriding TableSource::getTableSchema in python 
table source
 Key: FLINK-15893
 URL: https://issues.apache.org/jira/browse/FLINK-15893
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Table SQL / API
Reporter: Kurt Young
 Fix For: 1.11.0








[jira] [Created] (FLINK-15894) Stop overriding TableSource::getTableSchema in flink walkthrough

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15894:
--

 Summary: Stop overriding TableSource::getTableSchema in flink 
walkthrough
 Key: FLINK-15894
 URL: https://issues.apache.org/jira/browse/FLINK-15894
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Kurt Young
 Fix For: 1.11.0








[jira] [Created] (FLINK-15896) Stop using TableSource::getTableSchema

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15896:
--

 Summary: Stop using TableSource::getTableSchema
 Key: FLINK-15896
 URL: https://issues.apache.org/jira/browse/FLINK-15896
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Kurt Young
 Fix For: 1.11.0








[jira] [Created] (FLINK-15895) Stop using TableSource::getReturnType except for compatibility purpose

2020-02-03 Thread Kurt Young (Jira)
Kurt Young created FLINK-15895:
--

 Summary: Stop using TableSource::getReturnType except for 
compatibility purpose
 Key: FLINK-15895
 URL: https://issues.apache.org/jira/browse/FLINK-15895
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Kurt Young
 Fix For: 1.11.0








Re: [VOTE] FLIP-27 - Refactor Source Interface

2020-02-03 Thread Piotr Nowojski
+1 (binding)

Piotrek

> On 4 Feb 2020, at 05:39, Zhijiang  wrote:
> 
> +1 (binding), we have been waiting too long for it. :)
> 
> Best,
> Zhijiang
> 
> 
> --
> From:Guowei Ma 
> Send Time:2020 Feb. 4 (Tue.) 12:34
> To:dev 
> Subject:Re: [VOTE] FLIP-27 - Refactor Source Interface
> 
> +1 (non-binding), thanks for driving.
> 
> Best,
> Guowei
> 
> 
> Jingsong Li wrote on Tue, Feb 4, 2020 at 11:20 AM:
> 
>> +1 (non-binding), thanks for driving.
>> FLIP-27 is the basis of a lot of follow-up work.
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Tue, Feb 4, 2020 at 10:26 AM Jark Wu  wrote:
>> 
>>> Thanks for driving this Becket!
>>> 
>>> +1 from my side.
>>> 
>>> Cheers,
>>> Jark
>>> 
>>> On Mon, 3 Feb 2020 at 18:06, Yu Li  wrote:
>>> 
 +1, thanks for the efforts Becket!
 
 Best Regards,
 Yu
 
 
 On Mon, 3 Feb 2020 at 17:52, Becket Qin  wrote:
 
> Bump up the thread.
> 
> On Tue, Jan 21, 2020 at 10:43 AM Becket Qin 
 wrote:
> 
>> Hi Folks,
>> 
>> I'd like to resume the voting thread for FlIP-27.
>> 
>> Please note that the FLIP wiki has been updated to reflect the
>> latest
>> discussions in the discussion thread.
>> 
>> To avoid confusion, I'll only count the votes casted after this
>>> point.
>> 
>> FLIP wiki:
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27
>> %3A+Refactor+Source+Interface
>> 
>> Discussion thread:
>> 
>> 
> 
 
>>> 
>> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c%40%3Cdev.flink.apache.org%3E
>> 
>> The vote will last for at least 72 hours, following the consensus
 voting
>> process.
>> 
>> Thanks,
>> 
>> Jiangjie (Becket) Qin
>> 
>> On Thu, Dec 5, 2019 at 10:31 AM jincheng sun <
>>> sunjincheng...@gmail.com
> 
>> wrote:
>> 
>>> +1 (binding), and looking forward to seeing the new interface in
>> the
>>> master.
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> Becket Qin wrote on Thu, Dec 5, 2019 at 8:05 AM:
>>> 
 Hi all,
 
 I would like to start the vote for FLIP-27 which proposes to
> introduce a
 new Source connector interface to address a few problems in the
> existing
 source connector. The main goals of the the FLIP are following:
 
 1. Unify the Source interface in Flink for batch and stream.
 2. Significantly reduce the work for developers to develop new
 source
 connectors.
 3. Provide a common abstraction for all the sources, as well as
>> a
>>> mechanism
 to allow source subtasks to coordinate among themselves.
 
 The vote will last for at least 72 hours, following the
>> consensus
> voting
 process.
 
 FLIP wiki:
 
 
>>> 
> 
 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
 
 Discussion thread:
 
 
>>> 
> 
 
>>> 
>> https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
 
 Thanks,
 
 Jiangjie (Becket) Qin
 
>>> 
>> 
> 
 
>>> 
>> 
>> 
>> --
>> Best, Jingsong Lee
>> 
>