[jira] [Created] (FLINK-19051) Exception message should be preserved in the log when the Job Manager fails

2020-08-26 Thread Pua (Jira)
Pua created FLINK-19051:
---

 Summary: Exception message should be preserved in the log when the Job 
Manager fails 
 Key: FLINK-19051
 URL: https://issues.apache.org/jira/browse/FLINK-19051
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Task
Affects Versions: 1.11.1, 1.12.0
Reporter: Pua


{code:java}
protected void closeJobManagerConnection(JobID jobId, Exception cause) {
    JobManagerRegistration jobManagerRegistration = jobManagerRegistrations.remove(jobId);

    if (jobManagerRegistration != null) {
        final ResourceID jobManagerResourceId = jobManagerRegistration.getJobManagerResourceID();
        final JobMasterGateway jobMasterGateway = jobManagerRegistration.getJobManagerGateway();
        final JobMasterId jobMasterId = jobManagerRegistration.getJobMasterId();

        log.info("Disconnect job manager {}@{} for job {} from the resource manager.",
            jobMasterId,
            jobMasterGateway.getAddress(),
            jobId);

        jobManagerHeartbeatManager.unmonitorTarget(jobManagerResourceId);

        jmResourceIdRegistrations.remove(jobManagerResourceId);

        // tell the job manager about the disconnect
        jobMasterGateway.disconnectResourceManager(getFencingToken(), cause);
    } else {
        log.debug("There was no registered job manager for job {}.", jobId);
    }
}
{code}
The exception message (the {{cause}}) is currently not preserved in the log: the {{log.info}} statement above does not include it.
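
A minimal sketch of the suggested change (not an actual patch): pass the cause as the last logger argument so its message and stack trace are preserved.
{code:java}
// SLF4J logs a trailing Throwable argument with its full stack trace.
log.info("Disconnect job manager {}@{} for job {} from the resource manager.",
    jobMasterId,
    jobMasterGateway.getAddress(),
    jobId,
    cause);
{code}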





Re: [DISCUSS] FLIP-138: Declarative Resource management

2020-08-26 Thread Xintong Song
Thanks for preparing the FLIP and driving this discussion, @Chesnay & @Till.

I really like the idea. I see great value in the proposed declarative
resource management, in terms of flexibility, usability, and efficiency.

I have a few comments and questions regarding the FLIP design. In general,
the protocol design makes good sense to me. My main concern is that it is
not very clear to me what changes are required from the
Resource/SlotManager side to adapt to the new protocol.

*1. Distributed slots across different jobs*

> Jobs which register their requirements first, will have precedence over
> other jobs also if the requirements change during the runtime.

Just trying to understand: does this mean jobs are prioritized by the order
of their first resource declaration?

*2. AllocationID*

Is this FLIP suggesting to completely remove AllocationID?

I'm fine with this change. It seems the usages of AllocationID can either
be removed or be replaced by JobID. This reflects the concept that slots
are now assigned to a job instead of to its individual slot requests.

I would like to bring to attention that this also requires changes on the
TM side, with respect to FLIP-56[1].

In the context of dynamic slot allocation introduced by FLIP-56, slots do
not pre-exist on TM and are dynamically created when RM calls
TaskExecutorGateway.requestSlot. Since the slots do not pre-exist, nor
their SlotIDs, RM requests slots from TM with a special SlotID (negative
slot index). The semantic changes from "requesting the slot identified by
the given SlotID" to "requesting a slot with the given resource profile".
The AllocationID is used for identifying the dynamic slots in such cases.

From the perspective of FLIP-56 and fine-grained resource management, I'm
fine with removing AllocationID. In the meantime, we would need TM to
recognize the special negative indexed SlotID and generate a new unique
SlotID for identifying the slot.
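
A minimal sketch of what I mean on the TM side (names assumed; this is
not actual Flink code):

    // A TaskExecutor-side helper that recognizes the special negative-index
    // SlotID and generates a new unique SlotID for the dynamically created slot.
    final class DynamicSlotIdResolver {
        private final ResourceID taskManagerId;
        private int nextDynamicSlotIndex; // starts after the statically configured slots

        DynamicSlotIdResolver(ResourceID taskManagerId, int numStaticSlots) {
            this.taskManagerId = taskManagerId;
            this.nextDynamicSlotIndex = numStaticSlots;
        }

        SlotID resolve(SlotID requested) {
            // negative slot index means "a slot with the given resource profile"
            return requested.getSlotNumber() < 0
                    ? new SlotID(taskManagerId, nextDynamicSlotIndex++)
                    : requested;
        }
    }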

*3. Minimum resource requirement*

> However, we can let the JobMaster know if we cannot fulfill the minimum
> resource requirement for a job after
> resourcemanager.standalone.start-up-time has passed.

What is the "minimum resource requirement for a job"? Did I overlook
anything?

*4. Offer/free slots between JM/TM*

This probably deserves a separate discussion thread. Just want to bring it
up.

This idea has been on my mind for quite some time: is this design, where JM
requests resources from RM while accepting/releasing resources from/to TM,
the right thing?

The pain point is that events of JM's activities (requesting/releasing
resources) arrive at RM out of order. This leads to several problems.

   - When a job fails and task cancelation takes long, some of the slots
   might be released from the slot pool due to being unused for a while. Then
   the job restarts and requests these slots again. At this time, RM may
   receive slot requests before noticing from TM heartbeats that previous
   slots are released, thus requesting new resources. I've seen many times
   that the Yarn cluster has a heavy load and is not allocating resources
   quickly enough, which leads to slot request timeout and job failover, and
   during the failover more resources are requested which adds more load to
   the Yarn cluster. Happily, this should be improved with the declarative
   resource management. :)
   - As described in this FLIP, it is possible that RM learns about the
   release of slots from TM heartbeats before noticing the decreased resource
   requirements; it may then allocate more resources, which need to be
   released soon after.
   - It complicates the ResourceManager/SlotManager by requiring an
   additional slot state PENDING, meaning the slot is assigned by RM but the
   assignment has not yet been confirmed by the TM.

Why not just make RM offer the allocated resources (TM address, SlotID,
etc.) to JM, and JM release resources to RM? So that for all the resource
management JM talks to RM, and for the task deployment and execution it
talks to TM?

I tried to understand the benefits of having the current design, and found
the following in FLIP-6[2].

> All that the ResourceManager does is negotiate between the
> cluster-manager, the JobManager, and the TaskManagers. Its state can hence
> be reconstructed from re-acquiring containers and re-registration from
> JobManagers and TaskManagers


Correct me if I'm wrong, but it seems the original purpose is to make sure
the assignment between jobs and slots is confirmed between JM and TMs, so
that failures of RM will not lead to any inconsistency. However, this only
benefits scenarios where RM fails while JM and TMs live. Currently, JM and
RM are in the same process; we do not really have any scenario where RM
fails alone. We might separate JM and RM into different processes in the
future, but as far as I can see we don't have such a requirement at the
moment. It seems to me that we are suffering the current problems for the
sake of potential future benefits.

Maybe I overlooked something.


[ANNOUNCE] Introducing the GSoD 2020 Participants.

2020-08-26 Thread Marta Paes Moreira
Hi, Everyone!

I'd like to officially welcome the applicants that were selected to work
with the Flink community for this year's Google Season of Docs (GSoD)
[1]: *Kartik
Khare* and *Muhammad Haseeb Asif*!

   - Kartik [2] is a software engineer at Walmart Labs and a regular
   contributor to multiple Apache projects. He is also a prolific writer on
   Medium and has previously published on the Flink blog. Last year, he
   contributed to Apache Airflow as part of GSoD and he's currently revamping
   the Apache Pinot documentation.


   - Muhammad [3] is a dual-degree master's student at KTH and TU Berlin,
   focusing on distributed systems and data-intensive processing (in
   particular, performance optimization of state backends). He writes
   frequently about Flink on Medium and you can catch him and his colleague
   Sruthi this Friday at Beam Summit [4]!

They will be working to improve the Table API/SQL documentation over a
3-month period, with the support of Aljoscha and Seth as mentors.

Please give them a warm welcome to the Flink developer community!

Marta

[1] https://developers.google.com/season-of-docs/docs/participants
[2] https://github.com/KKcorps
[3] https://www.linkedin.com/in/haseebasif/
[4] https://2020.beamsummit.org/sessions/nexmark-beam-flinkndb/


Re: [DISCUSS] FLIP-138: Declarative Resource management

2020-08-26 Thread Chesnay Schepler

Thank you Xintong for your questions!


   Job prioritization

Yes, the job which declares its initial requirements first is prioritized.
This is very much for simplicity; for example, it avoids the nasty case 
where all jobs get some resources, but none gets enough to actually run 
the job.



   Minimum resource requirements

My bad; at some point we want to allow the JobMaster to declare a range 
of resources it could use to run a job, for example min=1, target=10, 
max=+inf.


With this model, the RM would then try to balance the resources such 
that as many jobs as possible are as close to the target state as possible.


Currently, the minimum/target/maximum resources are all the same. So the 
notification is sent whenever the current requirements cannot be met.
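
A purely hypothetical sketch of such a ranged declaration (none of these
names are an existing or proposed API):

    final class ResourceRequirementRange {
        final int minSlots;    // below this the job cannot run at all
        final int targetSlots; // the RM balances jobs towards this value
        final int maxSlots;    // e.g. Integer.MAX_VALUE for "+inf"

        ResourceRequirementRange(int minSlots, int targetSlots, int maxSlots) {
            this.minSlots = minSlots;
            this.targetSlots = targetSlots;
            this.maxSlots = maxSlots;
        }
    }

    // e.g. min=1, target=10, max=+inf from the example above:
    // declareRequirements(jobId, new ResourceRequirementRange(1, 10, Integer.MAX_VALUE));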



   Allocation IDs

We do intend to, at the very least, remove AllocationIDs on the 
SlotManager side, as they are just not required there.


On the slotpool side we have to keep them around at least until the 
existing Slotpool implementations are removed (not sure whether we'll 
fully commit to this in 1.12), since the interfaces use AllocationIDs, 
which also bleed into the JobMaster.

The TaskExecutor is in a similar position.
But in the long-term, yes they will be removed, and most usages will 
probably be replaced by the SlotID.



   FLIP-56

Dynamic slot allocations are indeed quite interesting and raise a few 
questions; for example, the main purpose of it is to ensure maximum 
resource utilization. In that case, should the JobMaster be allowed to 
re-use a slot if the task requires fewer resources than the slot 
provides, or should it always request a new slot that exactly matches?


There is a trade-off to be made between maximum resource utilization 
(request exactly matching slots, and only re-use exact matches) and 
quicker job deployment (re-use slot even if they don't exactly match, 
skip round-trip to RM).


As for how to handle the lack of preemptively known SlotIDs, that 
should be fine in and of itself; we already handle a similar case when 
we request a new TaskExecutor to be started. So long as there is some 
way to know how many resources the TaskExecutor has in total, I do not 
see a problem at the moment. We will get the SlotID eventually by virtue 
of the heartbeat SlotReport.



   Implementation plan (SlotManager)

You are on the right track. The SlotManager tracks the declared resource 
requirements, and if the requirements increased it creates a 
SlotRequest, which then goes through similar code paths as we have at 
the moment (try to find a free slot, if found tell the TM, otherwise try 
to request new TM).
The SlotManager changes are not that substantial to get a working 
version; we have a PoC and most of the work went into refactoring the 
SlotManager into a more manageable state. (split into several 
components, stricter and simplified Slot life-cycle, ...).
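
Roughly, the flow looks like this (helper names assumed; this is not the
PoC code):

    // SlotManager side: declared requirements drive slot requests.
    void onRequirementsDeclared(JobID jobId, int declaredSlots, ResourceProfile profile) {
        int increase = declaredSlots - trackedRequirements.getOrDefault(jobId, 0);
        trackedRequirements.put(jobId, declaredSlots);
        for (int i = 0; i < increase; i++) {
            SlotRequest request = new SlotRequest(jobId, profile);
            Optional<TaskManagerSlot> freeSlot = findMatchingFreeSlot(request);
            if (freeSlot.isPresent()) {
                allocateSlot(freeSlot.get(), request);   // tell the TM
            } else {
                requestNewTaskExecutor(profile);         // otherwise try to start a new TM
            }
        }
    }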



   Offer/free slots between JM/TM

Gotta run, but that's a good question and I'll think about it. I think 
it comes down to making fewer changes, and being able to leverage 
existing reconciliation protocols.
Do note that TaskExecutors also explicitly inform the RM about freed 
slots; the heartbeat slot report is just a safety net.
I'm not sure whether slot requests are able to overtake a slot release; 
@till do you have thoughts on that?
As for the race condition between the requirements reduction and slot 
release, if we run into problems we have the backup plan of only 
releasing the slot after the requirement reduction has been acknowledged.


On 26/08/2020 10:31, Xintong Song wrote:

> [quoted message elided]

[DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

2020-08-26 Thread Xintong Song
Hi devs,

I'd like to bring the discussion over FLIP-141[1], which proposes how
managed memory should be shared by various use cases within a slot. This is
an extension to FLIP-53[2], where we assumed that RocksDB state backend and
batch operators are the only use cases of managed memory for streaming and
batch jobs respectively, which is no longer true with the introduction of
Python UDFs.

Please note that we have not yet reached consensus between two different
designs. The major part of this FLIP describes one of the candidates, while
the alternative is discussed in the section "Rejected Alternatives". We hope
to draw on the community's insights to help us resolve the disagreement.

Any feedback would be appreciated.

Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility

[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management


[jira] [Created] (FLINK-19052) Performance issue with PojoSerializer

2020-08-26 Thread Roman Grebennikov (Jira)
Roman Grebennikov created FLINK-19052:
-

 Summary: Performance issue with PojoSerializer
 Key: FLINK-19052
 URL: https://issues.apache.org/jira/browse/FLINK-19052
 Project: Flink
  Issue Type: Improvement
  Components: API / Type Serialization System
Affects Versions: 1.11.1
 Environment: Flink 1.12 master on 26.08.2020
Reporter: Roman Grebennikov
 Attachments: image-2020-08-26-10-46-19-800.png, 
image-2020-08-26-10-49-59-400.png

Currently, PojoSerializer.createInstance() uses a reflective call to create a 
class instance. As this method is called for each stream element on 
deserialization, the reflection overhead can become noticeable in 
serialization-bound cases when:
 # the POJO class is small, so instantiation cost is noticeable;
 # the job does not have heavy CPU-bound event processing.

See this flamegraph built for 
flink-benchmarks/SerializationFrameworkMiniBenchmarks.serializerPojo benchmark:

!image-2020-08-26-10-46-19-800.png!

The Reflection.getCallerClass method consumes a lot of CPU, mostly performing a 
security check to determine whether we are allowed to make this reflective call.

 

There is no real reason to perform this check for every deserialized event, so 
to speed things up we can cache the constructor using a MethodHandle, so that 
the check is performed only once. With this tiny fix, getCallerClass is gone:

 

!image-2020-08-26-10-49-59-400.png!

 

The benchmark results (before and after the fix):
{noformat}
serializerPojo thrpt 100 487.706 ± 30.480 ops/ms
serializerPojo thrpt 100 569.828 ±  8.815 ops/ms{noformat}
Which is roughly +15% throughput.
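
A minimal sketch of the caching idea using plain reflection (assumed shape; 
the actual fix may use a MethodHandle instead):
{code:java}
import java.lang.reflect.Constructor;

// Look up the no-arg constructor once, so the reflective access check happens
// a single time instead of on every createInstance() call.
final class CachedInstantiator<T> {

    private final Constructor<T> constructor;

    CachedInstantiator(Class<T> clazz) throws NoSuchMethodException {
        this.constructor = clazz.getDeclaredConstructor();
        this.constructor.setAccessible(true); // security check performed once, here
    }

    T createInstance() {
        try {
            return constructor.newInstance(); // no per-element getCallerClass check
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Cannot instantiate " + constructor.getDeclaringClass(), e);
        }
    }
}
{code}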





Re: [DISCUSS] FLIP-138: Declarative Resource management

2020-08-26 Thread Xintong Song
Thanks for the quick response.

*Job prioritization, Allocation IDs, Minimum resource
requirements, SlotManager Implementation Plan:* Sounds good to me.

*FLIP-56*
Good point about the trade-off. I believe maximum resource utilization and
quick deployment are desired in different scenarios. E.g., a long-running
streaming job can afford some deployment latency to improve resource
utilization, which benefits the entire lifecycle of the job. On the other
hand, short batch queries may prefer quick deployment; otherwise the time
for resource allocation might significantly increase the response time.
For me it is good enough to bring these questions to attention; nothing
that I'm aware of should block this FLIP.

Thank you~

Xintong Song



On Wed, Aug 26, 2020 at 5:14 PM Chesnay Schepler  wrote:

> [quoted message elided]

Re: [ANNOUNCE] Apache Flink 1.10.2 released

2020-08-26 Thread Congxian Qiu
Thanks ZhuZhu for managing this release and everyone else who contributed
to this release!

Best,
Congxian


Xingbo Huang wrote on Wed, Aug 26, 2020 at 1:53 PM:

> Thanks Zhu for the great work and everyone who contributed to this release!
>
> Best,
> Xingbo
>
> Guowei Ma wrote on Wed, Aug 26, 2020 at 12:43 PM:
>
>> Hi,
>>
>> Thanks a lot for being the release manager Zhu Zhu!
>> Thanks everyone contributed to this!
>>
>> Best,
>> Guowei
>>
>>
>> On Wed, Aug 26, 2020 at 11:18 AM Yun Tang  wrote:
>>
>>> Thanks for Zhu's work to manage this release and everyone who
>>> contributed to this!
>>>
>>> Best,
>>> Yun Tang
>>> 
>>> From: Yangze Guo 
>>> Sent: Tuesday, August 25, 2020 14:47
>>> To: Dian Fu 
>>> Cc: Zhu Zhu ; dev ; user <
>>> u...@flink.apache.org>; user-zh 
>>> Subject: Re: [ANNOUNCE] Apache Flink 1.10.2 released
>>>
>>> Thanks a lot for being the release manager Zhu Zhu!
>>> Congrats to all others who have contributed to the release!
>>>
>>> Best,
>>> Yangze Guo
>>>
>>> On Tue, Aug 25, 2020 at 2:42 PM Dian Fu  wrote:
>>> >
>>> > Thanks ZhuZhu for managing this release and everyone else who
>>> contributed to this release!
>>> >
>>> > Regards,
>>> > Dian
>>> >
>>> > On Aug 25, 2020, at 2:22 PM, Till Rohrmann wrote:
>>> >
>>> > Great news. Thanks a lot for being our release manager Zhu Zhu and to
>>> all others who have contributed to the release!
>>> >
>>> > Cheers,
>>> > Till
>>> >
>>> > On Tue, Aug 25, 2020 at 5:37 AM Zhu Zhu  wrote:
>>> >>
>>> >> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.10.2, which is the first bugfix release for the Apache Flink
>>> 1.10 series.
>>> >>
>>> >> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>> >>
>>> >> The release is available for download at:
>>> >> https://flink.apache.org/downloads.html
>>> >>
>>> >> Please check out the release blog post for an overview of the
>>> improvements for this bugfix release:
>>> >> https://flink.apache.org/news/2020/08/25/release-1.10.2.html
>>> >>
>>> >> The full release notes are available in Jira:
>>> >>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12347791
>>> >>
>>> >> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>> >>
>>> >> Thanks,
>>> >> Zhu
>>> >
>>> >
>>>
>>


[jira] [Created] (FLINK-19053) Flink Kafka Connector Dependency Error

2020-08-26 Thread liugh (Jira)
liugh created FLINK-19053:
-

 Summary: Flink Kafka Connector Dependency Error
 Key: FLINK-19053
 URL: https://issues.apache.org/jira/browse/FLINK-19053
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.11.1
 Environment: Flink V1.11.0 or Flink V1.11.1
Reporter: liugh
 Attachments: 0.10.png, 0.11.png

From the Flink documentation page 
[https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/kafka.html], 
the "Dependency" section shows the following:

{code:xml}
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka-011_2.11</artifactId>
  <version>1.11.0</version>
</dependency>

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka-010_2.11</artifactId>
  <version>1.11.0</version>
</dependency>
{code}

However, I couldn't resolve the jar with the dependency configured as shown 
above.

I then searched [https://mvnrepository.com/] and the Aliyun Maven repository, 
and found that the artifact names need a dot in the Kafka version suffix 
(*0.11*/*0.10* instead of *011*/*010*):

{code:xml}
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
  <version>1.11.0</version>
</dependency>

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
  <version>1.11.0</version>
</dependency>
{code}





Re: [ANNOUNCE] Apache Flink 1.10.2 released

2020-08-26 Thread Leonard Xu
Thanks ZhuZhu for being the release manager and everyone who contributed to 
this release.

Best,
Leonard



Re: [ANNOUNCE] Apache Flink 1.10.2 released

2020-08-26 Thread liupengcheng
Thanks ZhuZhu for managing this release and everyone who contributed to this.

Best,
Pengcheng

On 2020/8/26 at 7:06 PM, "Congxian Qiu" wrote:

> [quoted message elided]




[jira] [Created] (FLINK-19054) KafkaTableITCase.testKafkaSourceSink hangs

2020-08-26 Thread Dian Fu (Jira)
Dian Fu created FLINK-19054:
---

 Summary: KafkaTableITCase.testKafkaSourceSink hangs
 Key: FLINK-19054
 URL: https://issues.apache.org/jira/browse/FLINK-19054
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka, Table SQL / API
Affects Versions: 1.11.2
Reporter: Dian Fu


[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5844&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20]

{code}
2020-08-25T09:04:57.3569768Z "Kafka Fetcher for Source: KafkaTableSource(price, currency, log_date, log_time, log_ts) -> SourceConversion(table=[default_catalog.default_database.kafka, source: [KafkaTableSource(price, currency, log_date, log_time, log_ts)]], fields=[price, currency, log_date, log_time, log_ts]) -> Calc(select=[(price + 1.0:DECIMAL(2, 1)) AS computed-price, price, currency, log_date, log_time, log_ts, (log_ts + 1000:INTERVAL SECOND) AS ts]) -> WatermarkAssigner(rowtime=[ts], watermark=[ts]) -> Calc(select=[ts, log_date, log_time, CAST(ts) AS ts0, price]) (1/1)" #1501 daemon prio=5 os_prio=0 tid=0x7f25b800 nid=0x22b8 runnable [0x7f2127efd000]
2020-08-25T09:04:57.3571373Z    java.lang.Thread.State: RUNNABLE
2020-08-25T09:04:57.3571672Z        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
2020-08-25T09:04:57.3572191Z        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
2020-08-25T09:04:57.3572921Z        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
2020-08-25T09:04:57.3573419Z        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
2020-08-25T09:04:57.3573957Z        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377)
2020-08-25T09:04:57.3574809Z        - locked <0xfde5a308> (a java.lang.Object)
2020-08-25T09:04:57.3575448Z        at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:103)
2020-08-25T09:04:57.3576309Z        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
2020-08-25T09:04:57.3577086Z        at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
2020-08-25T09:04:57.3577727Z        at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
2020-08-25T09:04:57.3578403Z        at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
2020-08-25T09:04:57.3579486Z        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
2020-08-25T09:04:57.3580240Z        at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
2020-08-25T09:04:57.3580880Z        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:547)
2020-08-25T09:04:57.3581756Z        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
2020-08-25T09:04:57.3583015Z        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
2020-08-25T09:04:57.3583847Z        at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1300)
2020-08-25T09:04:57.3584555Z        at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1240)
2020-08-25T09:04:57.3585197Z        at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1168)
2020-08-25T09:04:57.3585961Z        at org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:253)
{code}





[jira] [Created] (FLINK-19055) MemoryManagerSharedResourcesTest contains three tests running extraordinarily long

2020-08-26 Thread Matthias (Jira)
Matthias created FLINK-19055:


 Summary: MemoryManagerSharedResourcesTest contains three tests 
running extraordinarily long
 Key: FLINK-19055
 URL: https://issues.apache.org/jira/browse/FLINK-19055
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.11.1
Reporter: Matthias


The following tests are taking quite some time to succeed:
* {{MemoryManagerSharedResourcesTest::testLastReleaseReleasesMemory}}
* {{MemoryManagerSharedResourcesTest::testPartialReleaseDoesNotDisposeResource}}
* {{MemoryManagerSharedResourcesTest::testPartialReleaseDoesNotReleaseMemory}}

This is due to the fact that {{MemoryManager::verifyEmpty()}} causes 
exponentially-increasing sleep times in {{UnsafeMemoryBudget::reserveMemory}}.
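
An illustrative sketch of the pattern (assumed shape; not the actual 
{{UnsafeMemoryBudget}} code):
{code:java}
// Reservation retries with exponentially growing sleep times while waiting for
// garbage-collected memory segments to be reclaimed; repeated verification calls
// therefore accumulate long sleeps.
void reserveWithBackOff(long size) throws InterruptedException {
    long sleepMillis = 1;
    while (!tryReserveMemory(size)) { // tryReserveMemory: assumed helper
        System.gc(); // trigger cleanup of collected segments
        Thread.sleep(sleepMillis);
        sleepMillis = Math.min(sleepMillis * 2, 1024); // exponentially increasing back-off
    }
}
{code}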





Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-08-26 Thread Aljoscha Krettek

I added more changes to the FLIP to try and address comments.

You can see the changes from the last version here: 
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158866741&selectedPageVersions=31&selectedPageVersions=27


If no-one objects anymore I would like to proceed to a VOTE soon.

Best,
Aljoscha

On 30.07.20 17:36, Aljoscha Krettek wrote:
I see, we actually have some thoughts along that line as well. We have 
ideas about adding such functionality for `Transformation`, which is the 
graph structure that underlies both the DataStream API and the newer 
Table API Runner/Planner.


There is a very rough PoC for that available at [1]. It's a very contrived 
example but it shows off what would be possible. The `Sink` interface 
here is just a subclass of the general `TransformationApply` [2] and we 
could envision a `DataStream.apply()` that lets you apply these general 
transformation "bundles".


Keep in mind that these are just rough, early ideas and the naming/location 
of things is preliminary. We might not do it like this in the end.


Best,
Aljoscha

[1] 
https://github.com/aljoscha/flink/blob/poc-transform-apply-sink/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/windowing/SinkExample.java 



[2] 
https://github.com/aljoscha/flink/blob/poc-transform-apply-sink/flink-core/src/main/java/org/apache/flink/api/dag/TransformationApply.java 



On 30.07.20 17:26, Flavio Pompermaier wrote:

We use runCustomOperation to group a set of operators into a single
functional unit, just to make the code more modular.
It's very convenient indeed.
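
For context, a minimal sketch of that pattern (usage assumed, not our
actual code):

    // A reusable unit packaging several chained operators behind one call.
    class WordCountUnit implements CustomUnaryOperation<String, Tuple2<String, Integer>> {

        private DataSet<String> input;

        @Override
        public void setInput(DataSet<String> inputData) {
            this.input = inputData;
        }

        @Override
        public DataSet<Tuple2<String, Integer>> createResult() {
            return input
                .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                    for (String w : line.split("\\s+")) {
                        out.collect(Tuple2.of(w, 1));
                    }
                })
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .groupBy(0)
                .sum(1);
        }
    }

    // usage: DataSet<Tuple2<String, Integer>> counts = lines.runOperation(new WordCountUnit());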

On Thu, Jul 30, 2020 at 5:20 PM Aljoscha Krettek 
wrote:


That is good input! I was not aware that anyone was actually using
`runCustomOperation()`. Out of curiosity, what are you using that for?

We have definitely thought about the first two points you mentioned,
though. Especially processing-time will make it tricky to define unified
execution semantics.

Best,
Aljoscha

On 30.07.20 17:10, Flavio Pompermaier wrote:

I just wanted to be constructive about the missing APIs.. :D

On Thu, Jul 30, 2020 at 4:29 PM Seth Wiesman wrote:

+1 It's time to drop DataSet.

Flavio, those issues are expected. This FLIP isn't just to drop DataSet
but to also add the necessary enhancements to DataStream such that it
works well on bounded input.

On Thu, Jul 30, 2020 at 8:49 AM Flavio Pompermaier <pomperma...@okkam.it>
wrote:


Just to contribute to the discussion: when we tried to do the migration we
faced some problems that could make it quite difficult.
1 - It's difficult to test because of
https://issues.apache.org/jira/browse/FLINK-18647
2 - missing mapPartition
3 - missing DataSet runOperation(CustomUnaryOperation operation)

On Thu, Jul 30, 2020 at 12:40 PM Arvid Heise wrote:

+1 for getting rid of the DataSet API. Is DataStream#iterate already
superseding DataSet iterations, or would that also need to be accounted
for?

In general, all surviving APIs should also offer a smooth experience for
switching back and forth.

On Thu, Jul 30, 2020 at 9:39 AM Márton Balassi <balassi.mar...@gmail.com>
wrote:

Hi All,

Thanks for the write-up and starting the discussion. I am in favor of
unifying the APIs the way described in the FLIP and deprecating the DataSet
API. I am looking forward to the detailed discussion of the changes
necessary.

Best,
Marton

On Wed, Jul 29, 2020 at 12:46 PM Aljoscha Krettek <aljos...@apache.org>
wrote:

Hi Everyone,

my colleagues (in cc) and I would like to propose this FLIP for
discussion. In short, we want to reduce the number of APIs that we have
by deprecating the DataSet API. This is a big step for Flink, that's why
I'm also cross-posting this to the User Mailing List.

FLIP-131: http://s.apache.org/FLIP-131

I'm posting the introduction of the FLIP below but please refer to the
document linked above for the full details:

--
Flink provides three main SDKs/APIs for writing Dataflow Programs: Table
API/SQL, the DataStream API, and the DataSet API. We believe that this
is one API too many and propose to deprecate the DataSet API in favor of
the Table API/SQL and the DataStream API. Of course, this is easier said
than done, so in the following, we will outline why we think that having
too many APIs is detrimental to the project and community. We will then
describe how we can enhance the Table API/SQL and the DataStream API to
subsume the DataSet API's functionality.

In this FLIP, we will not describe all the technical details of how the
Table API/SQL and DataStream will be enhanced. The goal is to achieve
consensus on the idea of deprecating the DataSet API. There will have to
be follow-up FLIPs that describe the necessary changes for the APIs that
we maintain.
--

Please let us know if you have any concerns or comments. Also, please
keep discussion to this ML thread instead of commenting in the Wiki so
that we

Re: [ANNOUNCE] Introducing the GSoD 2020 Participants.

2020-08-26 Thread Kartik Khare
Hi all,
It's a great opportunity to get to work with you guys. I have always
admired Flink's performance and simplicity and have been looking forward to
contributing more.

Looking forward to an exciting next 3 months.

Regards,
Kartik

On Wed, 26 Aug 2020, 14:42 Marta Paes Moreira,  wrote:

> [quoted message elided]


Re: [ANNOUNCE] Introducing the GSoD 2020 Participants.

2020-08-26 Thread Konstantin Knauf
Welcome Kartik & Muhammad! Looking very much forward to your contributions
:)

On Wed, Aug 26, 2020 at 5:52 PM Kartik Khare  wrote:

> [quoted message elided]


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [ANNOUNCE] Introducing the GSoD 2020 Participants.

2020-08-26 Thread Till Rohrmann
Welcome Muhammad and Kartik! Thanks a lot for helping us with improving
Flink's documentation.

Cheers,
Till

On Wed, Aug 26, 2020 at 7:32 PM Konstantin Knauf  wrote:

> [quoted message elided]


[jira] [Created] (FLINK-19056) Investigate multipart upload performance regression

2020-08-26 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-19056:


 Summary: Investigate multipart upload performance regression
 Key: FLINK-19056
 URL: https://issues.apache.org/jira/browse/FLINK-19056
 Project: Flink
  Issue Type: Task
  Components: Runtime / REST
Affects Versions: 1.12.0
Reporter: Chesnay Schepler
 Fix For: 1.12.0


When using Netty 4.1.50, the multipart upload of files is more than 100 times 
slower in the {{FileUploadHandlerTest}}.

This test has traditionally been somewhat heavy, since it repeatedly tests the 
upload of 60 MB files.

On my machine this test currently finishes in 2-3 seconds, but with the 
upgraded Netty version it runs for several _minutes_ instead. I have not 
verified yet whether this is purely an issue of the test, but I would consider 
it unlikely.

This would make Flink effectively unusable when uploading larger jars or 
JobGraphs.

 

My theory is that this is due to [this|https://github.com/netty/netty/pull/10226] 
change in Netty.

Before this change, the {{HttpPostMultipartRequestDecoder}} was always creating 
unpooled heap buffers for _something_; after the change the buffer type is 
dependent on the input buffer. The input buffer is a direct one, so my 
conclusion is that with the upgrade we ended up allocating more direct buffers 
than we did previously.

 

One solution I found was to explicitly create an {{UnpooledByteBufAllocator}} 
for the {{RestServerEndpoint}} that prefers heap buffers, which results in the 
input buffer being a heap buffer, so we never allocate direct ones.
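
A sketch of that workaround (assuming Netty 4.1.x; {{serverBootstrap}} is a 
placeholder for wherever the REST server channel is configured):
{code:java}
import io.netty.buffer.UnpooledByteBufAllocator;
import io.netty.channel.ChannelOption;

// preferDirect = false: inbound buffers are allocated on the heap
UnpooledByteBufAllocator heapAllocator = new UnpooledByteBufAllocator(false);
serverBootstrap.childOption(ChannelOption.ALLOCATOR, heapAllocator);
{code}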

However, this should also imply that we are creating more heap buffers than we 
did previously; I don't know how much of a problem that is. It seems a 
reasonable thing to do, since we should at least be able to skip a bunch of 
memory copies.

 

On a somewhat related note, we could think about increasing the chunkSize from 
8 KB to 64 KB to reduce the GC pressure a bit, along with adding some arenas 
for the REST API.





[jira] [Created] (FLINK-19057) Avoid usage of GlobalConfiguration in ResourceManager

2020-08-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-19057:


 Summary: Avoid usage of GlobalConfiguration in ResourceManager 
 Key: FLINK-19057
 URL: https://issues.apache.org/jira/browse/FLINK-19057
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Reporter: Xintong Song


This is a follow up of this PR 
[discussion|https://github.com/apache/flink/pull/13186/#discussion_r476459874].

On Kubernetes/Yarn deployments, the resource manager tries to compare the 
effective configuration with the original configuration file shipped from the 
client, and only passes the differences to task managers as dynamic properties.

For this, {{GlobalConfiguration.loadConfiguration()}} is used to load the 
original configuration file. This strongly relies on the fact that 
Kubernetes/Yarn entry points do not support custom configuration directories, 
which is true at the moment but brittle in the future.
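
A rough sketch of the described diffing (helper shape assumed; not the 
actual code):
{code:java}
Configuration effective = ...; // effective configuration after all overrides
Configuration original = GlobalConfiguration.loadConfiguration(); // relies on the default conf dir
Map<String, String> dynamicProperties = new HashMap<>();
for (Map.Entry<String, String> entry : effective.toMap().entrySet()) {
    if (!entry.getValue().equals(original.toMap().get(entry.getKey()))) {
        dynamicProperties.put(entry.getKey(), entry.getValue()); // ship only the diff to TMs
    }
}
{code}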

It would be better to rethink the usage of GlobalConfiguration.





Re: [DISCUSS] Whether catalog table factory should be used for temporary tables

2020-08-26 Thread Rui Li
Hi everyone,

According to the feedback, it seems adding `isTemporary` to the factory
context is the preferred way to fix the issue. I'll go ahead and make the
change if no one objects.
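
A rough sketch of the proposed shape (assumed, not the final API):

    public interface TableSourceFactory<T> extends TableFactory {

        interface Context {
            ObjectIdentifier getObjectIdentifier();
            CatalogTable getTable();
            ReadableConfig getConfiguration();
            boolean isTemporary(); // the proposed flag, so e.g. HiveTableFactory
                                   // can tell temporary tables apart
        }
    }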

On Tue, Aug 25, 2020 at 5:36 PM Rui Li  wrote:

> Hi,
>
> Thanks everyone for your inputs.
>
> Temporary hive table is not supported at the moment. If we want to support
> it, I agree with Jingsong that the life cycle of the temporary table should
> somehow be bound to the hive catalog. For example, hive catalog should be
> responsible to delete the table folder when the temporary table is dropped,
> or when hive catalog itself gets unregistered. Whether such a table should
> be stored in metastore is another topic and open to discussion. Hive
> doesn't store temporary tables in metastore, so we probably won't want to
> do it either.
>
> I'm fine with either of the solutions proposed. I think we can add the
> `isTemporary` flag to factory context, even though temporary tables
> currently don't belong to a catalog. Because conceptually, whether a table
> is temporary should be part of the context that a factory may be interested
> in.
>
> On Tue, Aug 25, 2020 at 4:48 PM Jingsong Li 
> wrote:
>
>> Hi Jark,
>>
>> You raised a good point: Creating the Hive temporary table.
>> AFAIK, Hive temporary tables should be stored in metastore, Hive
>> metastore will maintain their life cycle. Correct me if I am wrong.
>>
>> So actually, if we want to support Hive temporary tables, we should
>> finish one thing:
>> - A temporary table should belong to Catalog.
>> - Instead of current, a temporary table belongs to CatalogManager.
>>
>> It means, `createTemporaryTable` and `dropTemporaryTable` should be
>> proxied into the Catalog.
>> In this situation, actually, we don't need the "isTemporary" flag. (But
>> we can have it too)
>>
>> Best,
>> Jingsong
>>
>> On Tue, Aug 25, 2020 at 4:32 PM Jark Wu  wrote:
>>
>>> Hi,
>>>
>>> I'm wondering: if we always fall back to using SPI for temporary tables,
>>> then how does creating a Hive temporary table using the Hive dialect work?
>>>
>>> IMO, adding an "isTemporary" to the factory context sounds reasonable to
>>> me, because the factory context should describe the full content of
>>> create
>>> table DDL.
>>>
>>> Best,
>>> Jark
>>>
>>>
>>> On Tue, 25 Aug 2020 at 16:01, Jingsong Li 
>>> wrote:
>>>
>>> > Hi Dawid,
>>> >
>>> > But the temporary table does not belong to the Catalog; actually, the
>>> > Catalog doesn't know about the existence of the temporary table. Letting
>>> > the catalog's table factory create the source/sink sounds a little abrupt.
>>> >
>>> > If we want to make temporary tables belong to Catalog, I think we need
>>> to
>>> > involve catalog when creating temporary tables.
>>> >
>>> > Best,
>>> > Jingsong
>>> >
>>> > On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz <
>>> dwysakow...@apache.org>
>>> > wrote:
>>> >
>>> > > Hi Rui,
>>> > >
>>> > > My take is that temporary tables should use the factory of the
>>> catalog
>>> > > they were registered with.
>>> > >
>>> > > What you are describing sounds very much like a limitation/bug in
>>> Hive
>>> > > catalog only. I'd be in favor of passing the *isTemporary* flag.
>>> > >
>>> > > Best,
>>> > >
>>> > > Dawid
>>> > >
>>> > > On 25/08/2020 09:37, Rui Li wrote:
>>> > > > Hi Dev,
>>> > > >
>>> > > > Currently temporary generic tables cannot work with hive catalog
>>> [1].
>>> > > When
>>> > > > hive catalog is chosen as the current catalog, planner will use
>>> > > > HiveTableFactory to create source/sink for the temporary
>>> > > > table. HiveTableFactory cannot tell whether a table is temporary or
>>> > not,
>>> > > > and considers it as a Hive table, which leads to job failure.
>>> > > > I've discussed with Jingsong offline and we believe one solution
>>> is to
>>> > > make
>>> > > > planner avoid using catalog table factory for temporary tables.
>>> But I'd
>>> > > > also like to hear more opinions from others whether this is the
>>> right
>>> > way
>>> > > > to go. I think a possible alternative is to add an *isTemporary*
>>> field
>>> > > > to TableSourceFactory.Context & TableSinkFactory.Context, so that
>>> > > > HiveTableFactory knows how to handle such tables. What do you
>>> think?
>>> > > >
>>> > > > [1] https://issues.apache.org/jira/browse/FLINK-18999
>>> > > >
>>> > >
>>> > >
>>> >
>>> > --
>>> > Best, Jingsong Lee
>>> >
>>>
>>
>>
>> --
>> Best, Jingsong Lee
>>
>
>
> --
> Best regards!
> Rui Li
>


-- 
Best regards!
Rui Li


Re: [ANNOUNCE] Introducing the GSoD 2020 Participants.

2020-08-26 Thread Jark Wu
Welcome Kartik and Muhammad! Thanks in advance for helping improve Flink
documentation.

Best,
Jark

On Thu, 27 Aug 2020 at 03:59, Till Rohrmann  wrote:

> Welcome Muhammad and Kartik! Thanks a lot for helping us with improving
> Flink's documentation.
>
> Cheers,
> Till
>
> On Wed, Aug 26, 2020 at 7:32 PM Konstantin Knauf 
> wrote:
>
> > Welcome Kartik & Muhammad! Looking very much forward to your
> contributions
> > :)
> >
> > On Wed, Aug 26, 2020 at 5:52 PM Kartik Khare 
> > wrote:
> >
> > > Hi all,
> > > It's a great opportunity to get to work with you guys. I have always
> > > admired Flink's performance and simplicity and have been looking
> forward
> > to
> > > contribute more.
> > >
> > > Looking forward to exciting next 3 months.
> > >
> > > Regards,
> > > Kartik
> > >
> > > On Wed, 26 Aug 2020, 14:42 Marta Paes Moreira, 
> > > wrote:
> > >
> > > > Hi, Everyone!
> > > >
> > > > I'd like to officially welcome the applicants that were selected to
> > work
> > > > with the Flink community for this year's Google Season of Docs (GSoD)
> > > [1]: *Kartik
> > > > Khare* and *Muhammad Haseeb Asif*!
> > > >
> > > >- Kartik [2] is a software engineer at Walmart Labs and a regular
> > > >contributor to multiple Apache projects. He is also a prolific
> > writer
> > > on
> > > >Medium and has previously published on the Flink blog. Last year,
> he
> > > >contributed to Apache Airflow as part of GSoD and he's currently
> > > revamping
> > > >the Apache Pinot documentation.
> > > >
> > > >
> > > >- Muhammad [3] is a dual degree master student at KTH and TU
> Berlin,
> > > >focusing on distributed systems and data intensive processing (in
> > > >particular, performance optimization of state backends). He writes
> > > >frequently about Flink on Medium and you can catch him and his
> > > colleague
> > > >Sruthi this Friday at Beam Summit [4]!
> > > >
> > > > They will be working to improve the Table API/SQL documentation over
> a
> > > > 3-month period, with the support of Aljoscha and Seth as mentors.
> > > >
> > > > Please give them a warm welcome to the Flink developer community!
> > > >
> > > > Marta
> > > >
> > > > [1] https://developers.google.com/season-of-docs/docs/participants
> > > > [2] https://github.com/KKcorps
> > > > [3] https://www.linkedin.com/in/haseebasif/
> > > > [4] https://2020.beamsummit.org/sessions/nexmark-beam-flinkndb/
> > > >
> > >
> >
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
> >
>


[jira] [Created] (FLINK-19058) DataSet transformations documentation page error

2020-08-26 Thread Fin-chan (Jira)
Fin-chan created FLINK-19058:


 Summary: DataSet transformations documentation page error
 Key: FLINK-19058
 URL: https://issues.apache.org/jira/browse/FLINK-19058
 Project: Flink
  Issue Type: Bug
  Components: API / DataSet, Documentation
Affects Versions: 1.11.0
Reporter: Fin-chan


In "OuterJoin with Flat-Join Function", the Java example reads:

public void join(Tuple2 movie, Rating rating

Collector> out)

A "," is missing after the {{rating}} parameter.


