[jira] [Created] (FLINK-19051) Exception message should be preserved in the log when the JobManager fails
Pua created FLINK-19051:
----------------------------
Summary: Exception message should be preserved in the log when the JobManager fails
Key: FLINK-19051
URL: https://issues.apache.org/jira/browse/FLINK-19051
Project: Flink
Issue Type: Improvement
Components: Runtime / Task
Affects Versions: 1.11.1, 1.12.0
Reporter: Pua

{code:java}
protected void closeJobManagerConnection(JobID jobId, Exception cause) {
    JobManagerRegistration jobManagerRegistration = jobManagerRegistrations.remove(jobId);

    if (jobManagerRegistration != null) {
        final ResourceID jobManagerResourceId = jobManagerRegistration.getJobManagerResourceID();
        final JobMasterGateway jobMasterGateway = jobManagerRegistration.getJobManagerGateway();
        final JobMasterId jobMasterId = jobManagerRegistration.getJobMasterId();

        log.info("Disconnect job manager {}@{} for job {} from the resource manager.",
            jobMasterId, jobMasterGateway.getAddress(), jobId);

        jobManagerHeartbeatManager.unmonitorTarget(jobManagerResourceId);
        jmResourceIdRegistrations.remove(jobManagerResourceId);

        // tell the job manager about the disconnect
        jobMasterGateway.disconnectResourceManager(getFencingToken(), cause);
    } else {
        log.debug("There was no registered job manager for job {}.", jobId);
    }
}
{code}

The {{cause}} passed into this method is never logged, so the exception message is lost when the connection is closed. The exception message should be preserved in the log.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
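For illustration, a minimal sketch of the suggested fix, assuming SLF4J-style logging as in the snippet above (the exact call site is illustrative):

{code:java}
// SLF4J treats a trailing Throwable specially: it is printed together with
// its stack trace, in addition to the formatted message arguments.
log.info("Disconnect job manager {}@{} for job {} from the resource manager.",
    jobMasterId, jobMasterGateway.getAddress(), jobId, cause);
{code}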
Re: [DISCUSS] FLIP-138: Declarative Resource management
Thanks for preparing the FLIP and driving this discussion, @Chesnay & @Till. I really like the idea. I see great value in the proposed declarative resource management, in terms of flexibility, usability and efficiency.

I have a few comments and questions regarding the FLIP design. In general, the protocol design makes good sense to me. My main concern is that it is not very clear to me what changes are required on the Resource/SlotManager side to adapt to the new protocol.

*1. Distributed slots across different jobs*

> Jobs which register their requirements first, will have precedence over other jobs also if the requirements change during the runtime.

Just trying to understand: does this mean jobs are prioritized by the order in which they first declare their resource requirements?

*2. AllocationID*

Is this FLIP suggesting to completely remove AllocationID? I'm fine with this change. It seems the places where AllocationID is used can either be removed or be replaced by JobID. This reflects the concept that slots are now assigned to a job instead of to its individual slot requests.

I would like to bring to attention that this also requires changes on the TM side, with respect to FLIP-56 [1]. In the context of dynamic slot allocation introduced by FLIP-56, slots do not pre-exist on the TM and are dynamically created when the RM calls TaskExecutorGateway.requestSlot. Since the slots do not pre-exist, nor do their SlotIDs, the RM requests slots from the TM with a special SlotID (a negative slot index). The semantics change from "requesting the slot identified by the given SlotID" to "requesting a slot with the given resource profile". The AllocationID is used for identifying the dynamic slots in such cases.

From the perspective of FLIP-56 and fine-grained resource management, I'm fine with removing AllocationID. In the meantime, we would need the TM to recognize the special negative-indexed SlotID and generate a new unique SlotID for identifying the slot (see the sketch after this message).

*3. Minimum resource requirement*

> However, we can let the JobMaster know if we cannot fulfill the minimum resource requirement for a job after resourcemanager.standalone.start-up-time has passed.

What is the "minimum resource requirement for a job"? Did I overlook anything?

*4. Offer/free slots between JM/TM*

This probably deserves a separate discussion thread; I just want to bring it up. The idea has been coming to me for quite some time: is this design, in which the JM requests resources from the RM while accepting/releasing resources from/to the TM, the right thing?

The pain point is that events of the JM's activities (requesting/releasing resources) arrive at the RM out of order. This leads to several problems.

- When a job fails and task cancelation takes long, some of the slots might be released from the slot pool due to being unused for a while. Then the job restarts and requests these slots again. At this time, the RM may receive slot requests before noticing from TM heartbeats that previous slots were released, and thus request new resources. I've seen many times that the Yarn cluster has a heavy load and is not allocating resources quickly enough, which leads to slot request timeouts and job failover, and during the failover more resources are requested, which adds more load to the Yarn cluster. Happily, this should be improved with the declarative resource management. :)
- As described in this FLIP, it is possible that the RM learns of the releasing of slots from a TM heartbeat before noticing the decrease in resource requirements, so it may allocate more resources which then need to be released soon.
- It complicates the ResourceManager/SlotManager by requiring an additional slot state PENDING, which means the slot is assigned by the RM but the assignment has not yet been confirmed by the TM.

Why not just make the RM offer the allocated resources (TM address, SlotID, etc.) to the JM, and have the JM release resources to the RM? That way, for all resource management the JM talks to the RM, and for task deployment and execution it talks to the TMs.

I tried to understand the benefits of the current design, and found the following in FLIP-6 [2]:

> All that the ResourceManager does is negotiate between the cluster-manager, the JobManager, and the TaskManagers. Its state can hence be reconstructed from re-acquiring containers and re-registration from JobManagers and TaskManagers

Correct me if I'm wrong, but it seems the original purpose is to make sure the assignment between jobs and slots is confirmed between the JM and TMs, so that failures of the RM will not lead to any inconsistency. However, this only benefits scenarios where the RM fails while the JM and TMs live. Currently, the JM and RM are in the same process, so we do not really have any scenario where the RM fails alone. We might separate the JM and RM into different processes in the future, but as far as I can see we don't have such requirements at the moment. It seems to me that we are suffering the current problems in exchange for potential future benefits. Maybe I overlooked something.
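Referring to point 2 above, here is a minimal sketch of what the TM-side handling of the special negative-indexed SlotID could look like. The class shape, field names, and index-allocation scheme are illustrative assumptions, not Flink's actual implementation; only SlotID and ResourceID are real Flink types.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.flink.runtime.clusterframework.types.ResourceID;
import org.apache.flink.runtime.clusterframework.types.SlotID;

// Hypothetical TM-side helper: a negative slot index marks a "dynamic" slot
// request (FLIP-56), for which the TaskExecutor must mint a fresh, unique SlotID.
final class DynamicSlotIdResolver {

    private final AtomicInteger nextDynamicIndex = new AtomicInteger(0);

    SlotID resolve(SlotID requested, ResourceID taskExecutorId) {
        if (requested.getSlotNumber() < 0) {
            // dynamic slot allocation: generate a new unique SlotID
            return new SlotID(taskExecutorId, nextDynamicIndex.getAndIncrement());
        }
        // static slot allocation: the requested slot already exists
        return requested;
    }
}
{code}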
[ANNOUNCE] Introducing the GSoD 2020 Participants.
Hi, Everyone!

I'd like to officially welcome the applicants that were selected to work with the Flink community for this year's Google Season of Docs (GSoD) [1]: *Kartik Khare* and *Muhammad Haseeb Asif*!

- Kartik [2] is a software engineer at Walmart Labs and a regular contributor to multiple Apache projects. He is also a prolific writer on Medium and has previously published on the Flink blog. Last year, he contributed to Apache Airflow as part of GSoD and he's currently revamping the Apache Pinot documentation.
- Muhammad [3] is a dual degree master student at KTH and TU Berlin, focusing on distributed systems and data-intensive processing (in particular, performance optimization of state backends). He writes frequently about Flink on Medium and you can catch him and his colleague Sruthi this Friday at Beam Summit [4]!

They will be working to improve the Table API/SQL documentation over a 3-month period, with the support of Aljoscha and Seth as mentors.

Please give them a warm welcome to the Flink developer community!

Marta

[1] https://developers.google.com/season-of-docs/docs/participants
[2] https://github.com/KKcorps
[3] https://www.linkedin.com/in/haseebasif/
[4] https://2020.beamsummit.org/sessions/nexmark-beam-flinkndb/
Re: [DISCUSS] FLIP-138: Declarative Resource management
Thank you Xintong for your questions!

*Job prioritization*

Yes, the job which declares its initial requirements first is prioritized. This is very much for simplicity; for example, this avoids the nasty case where all jobs get some resources, but none get enough to actually run the job.

*Minimum resource requirements*

My bad; at some point we want to allow the JobMaster to declare a range of resources it could use to run a job, for example min=1, target=10, max=+inf. With this model, the RM would then try to balance the resources such that as many jobs as possible are as close to the target state as possible. Currently, the minimum/target/maximum resources are all the same, so the notification is sent whenever the current requirements cannot be met.

*Allocation IDs*

We do intend to, at the very least, remove AllocationIDs on the SlotManager side, as they are just not required there. On the slotpool side we have to keep them around at least until the existing Slotpool implementations are removed (not sure whether we'll fully commit to this in 1.12), since the interfaces use AllocationIDs, which also bleed into the JobMaster. The TaskExecutor is in a similar position. But in the long term, yes, they will be removed, and most usages will probably be replaced by the SlotID.

*FLIP-56*

Dynamic slot allocations are indeed quite interesting and raise a few questions. For example, the main purpose of it is to ensure maximum resource utilization. In that case, should the JobMaster be allowed to re-use a slot if the task requires fewer resources than the slot provides, or should it always request a new slot that exactly matches? There is a trade-off to be made between maximum resource utilization (request exactly matching slots, and only re-use exact matches) and quicker job deployment (re-use slots even if they don't exactly match, skipping the round-trip to the RM).

As for how to handle the lack of preemptively known SlotIDs, that should be fine in and of itself; we already handle a similar case when we request a new TaskExecutor to be started. So long as there is some way to know how many resources the TaskExecutor has in total, I do not see a problem at the moment. We will get the SlotID eventually by virtue of the heartbeat SlotReport.

*Implementation plan (SlotManager)*

You are on the right track. The SlotManager tracks the declared resource requirements, and if the requirements increased it creates a SlotRequest, which then goes through similar code paths as we have at the moment (try to find a free slot; if found, tell the TM; otherwise try to request a new TM). The SlotManager changes are not that substantial to get a working version; we have a PoC, and most of the work went into refactoring the SlotManager into a more manageable state (split into several components, stricter and simplified Slot life-cycle, ...).

*Offer/free slots between JM/TM*

Gotta run, but that's a good question and I'll think about it. I think it comes down to making fewer changes, and being able to leverage existing reconciliation protocols. Do note that the TaskExecutor also explicitly informs the RM about freed slots; the heartbeat slot report is just a safety net. I'm not sure whether slot requests are able to overtake a slot release; @till, do you have thoughts on that? As for the race condition between the requirements reduction and slot release, if we run into problems we have the backup plan of only releasing the slot after the requirement reduction has been acknowledged.
On 26/08/2020 10:31, Xintong Song wrote: [...]
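The min/target/max range described above could be declared along the following lines. This is a hedged sketch: the class and field names are illustrative assumptions, not the FLIP's actual API.

{code:java}
// Hypothetical shape of a declarative resource requirement range: the RM
// tries to keep every job as close to "target" as possible, never below
// "min"; Integer.MAX_VALUE stands in for "+inf".
final class ResourceRequirementRange {

    final int min;    // the job cannot make progress below this
    final int target; // desired number of slots
    final int max;    // upper bound the job can make use of

    ResourceRequirementRange(int min, int target, int max) {
        if (min > target || target > max) {
            throw new IllegalArgumentException("require min <= target <= max");
        }
        this.min = min;
        this.target = target;
        this.max = max;
    }

    // Chesnay's example: min=1, target=10, max=+inf
    static ResourceRequirementRange example() {
        return new ResourceRequirementRange(1, 10, Integer.MAX_VALUE);
    }
}
{code}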
[DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing
Hi devs,

I'd like to bring up the discussion over FLIP-141 [1], which proposes how managed memory should be shared by various use cases within a slot. This is an extension to FLIP-53 [2], where we assumed that the RocksDB state backend and batch operators are the only use cases of managed memory for streaming and batch jobs respectively, which is no longer true with the introduction of Python UDFs.

Please notice that we have not reached consensus between two different designs. The major part of this FLIP describes one of the candidates, while the alternative is discussed in the section "Rejected Alternatives". We are hoping to borrow intelligence from the community to help us resolve the disagreement.

Any feedback would be appreciated.

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-141%3A+Intra-Slot+Managed+Memory+Sharing#FLIP141:IntraSlotManagedMemorySharing-compatibility
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
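To make the problem concrete, here is a hedged sketch of one way intra-slot sharing can work: splitting a slot's managed memory across use cases by relative weights. The consumer names and weights are made up for illustration and are not the design proposed in the FLIP.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: derive each consumer's share of a slot's managed
// memory (e.g. state backend vs. Python UDF workers) from relative weights.
final class ManagedMemorySharing {

    static Map<String, Long> split(long totalBytes, Map<String, Integer> weights) {
        long weightSum = weights.values().stream().mapToLong(Integer::longValue).sum();
        Map<String, Long> shares = new HashMap<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            shares.put(e.getKey(), totalBytes * e.getValue() / weightSum);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new HashMap<>();
        weights.put("STATE_BACKEND", 70);
        weights.put("PYTHON", 30);
        // a 100 MB slot => 70 MB for the state backend, 30 MB for Python
        System.out.println(split(100L << 20, weights));
    }
}
{code}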
[jira] [Created] (FLINK-19052) Performance issue with PojoSerializer
Roman Grebennikov created FLINK-19052:
-----------------------------------------
Summary: Performance issue with PojoSerializer
Key: FLINK-19052
URL: https://issues.apache.org/jira/browse/FLINK-19052
Project: Flink
Issue Type: Improvement
Components: API / Type Serialization System
Affects Versions: 1.11.1
Environment: Flink 1.12 master on 26.08.2020
Reporter: Roman Grebennikov
Attachments: image-2020-08-26-10-46-19-800.png, image-2020-08-26-10-49-59-400.png

Currently PojoSerializer.createInstance() uses a reflection call to create a class instance. As this method is called for each stream element on deserialization, the reflection overhead can become noticeable in serialization-bound cases when:
# the POJO class is small, so instantiation cost is noticeable;
# the job does not have heavy CPU-bound event processing.

See this flamegraph built for the flink-benchmarks/SerializationFrameworkMiniBenchmarks.serializerPojo benchmark: !image-2020-08-26-10-46-19-800.png!

The Reflection.getCallerClass method consumes a lot of CPU, mostly doing a security check of whether we are allowed to make this reflective call. There is no real reason to perform this check for each deserialized event, so to speed things up we can just cache the constructor using a MethodHandle, so the check is performed only once. With this tiny fix, getCallerClass is gone: !image-2020-08-26-10-49-59-400.png!

The benchmark result:
{noformat}
serializerPojo  thrpt  100  487.706 ± 30.480 ops/ms
serializerPojo  thrpt  100  569.828 ±  8.815 ops/ms
{noformat}
which is +15% throughput.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
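A minimal sketch of the kind of constructor caching described above, assuming a public no-arg constructor (as Flink's POJO rules require). This is illustrative, not the actual patch: the access check happens once at lookup time instead of on every invocation.

{code:java}
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CachedInstantiator<T> {

    private final MethodHandle constructorHandle;

    CachedInstantiator(Class<T> clazz) throws ReflectiveOperationException {
        // the security/access check is performed once here, not per element
        this.constructorHandle = MethodHandles.lookup()
                .findConstructor(clazz, MethodType.methodType(void.class));
    }

    @SuppressWarnings("unchecked")
    T newInstance() {
        try {
            // invoking a cached MethodHandle avoids Reflection.getCallerClass
            return (T) constructorHandle.invoke();
        } catch (Throwable t) {
            throw new RuntimeException("Could not instantiate type", t);
        }
    }
}
{code}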
Re: [DISCUSS] FLIP-138: Declarative Resource management
Thanks for the quick response.

*Job prioritization, Allocation IDs, Minimum resource requirements, SlotManager Implementation Plan:*

Sounds good to me.

*FLIP-56*

Good point about the trade-off. I believe maximum resource utilization and quick deployment are desired in different scenarios. E.g., a long-running streaming job deserves some deployment latency to improve the resource utilization, which benefits the entire lifecycle of the job. On the other hand, short batch queries may prefer quick deployment, otherwise the time for resource allocation might significantly increase the response time. It is good enough for me to have brought these questions to attention; nothing that I'm aware of should block this FLIP.

Thank you~

Xintong Song

On Wed, Aug 26, 2020 at 5:14 PM Chesnay Schepler wrote: [...]
Re: [ANNOUNCE] Apache Flink 1.10.2 released
Thanks ZhuZhu for managing this release and everyone else who contributed to this release! Best, Congxian Xingbo Huang 于2020年8月26日周三 下午1:53写道: > Thanks Zhu for the great work and everyone who contributed to this release! > > Best, > Xingbo > > Guowei Ma 于2020年8月26日周三 下午12:43写道: > >> Hi, >> >> Thanks a lot for being the release manager Zhu Zhu! >> Thanks everyone contributed to this! >> >> Best, >> Guowei >> >> >> On Wed, Aug 26, 2020 at 11:18 AM Yun Tang wrote: >> >>> Thanks for Zhu's work to manage this release and everyone who >>> contributed to this! >>> >>> Best, >>> Yun Tang >>> >>> From: Yangze Guo >>> Sent: Tuesday, August 25, 2020 14:47 >>> To: Dian Fu >>> Cc: Zhu Zhu ; dev ; user < >>> u...@flink.apache.org>; user-zh >>> Subject: Re: [ANNOUNCE] Apache Flink 1.10.2 released >>> >>> Thanks a lot for being the release manager Zhu Zhu! >>> Congrats to all others who have contributed to the release! >>> >>> Best, >>> Yangze Guo >>> >>> On Tue, Aug 25, 2020 at 2:42 PM Dian Fu wrote: >>> > >>> > Thanks ZhuZhu for managing this release and everyone else who >>> contributed to this release! >>> > >>> > Regards, >>> > Dian >>> > >>> > 在 2020年8月25日,下午2:22,Till Rohrmann 写道: >>> > >>> > Great news. Thanks a lot for being our release manager Zhu Zhu and to >>> all others who have contributed to the release! >>> > >>> > Cheers, >>> > Till >>> > >>> > On Tue, Aug 25, 2020 at 5:37 AM Zhu Zhu wrote: >>> >> >>> >> The Apache Flink community is very happy to announce the release of >>> Apache Flink 1.10.2, which is the first bugfix release for the Apache Flink >>> 1.10 series. >>> >> >>> >> Apache Flink® is an open-source stream processing framework for >>> distributed, high-performing, always-available, and accurate data streaming >>> applications. >>> >> >>> >> The release is available for download at: >>> >> https://flink.apache.org/downloads.html >>> >> >>> >> Please check out the release blog post for an overview of the >>> improvements for this bugfix release: >>> >> https://flink.apache.org/news/2020/08/25/release-1.10.2.html >>> >> >>> >> The full release notes are available in Jira: >>> >> >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12347791 >>> >> >>> >> We would like to thank all contributors of the Apache Flink community >>> who made this release possible! >>> >> >>> >> Thanks, >>> >> Zhu >>> > >>> > >>> >>
[jira] [Created] (FLINK-19053) Flink Kafka Connector Dependency Error
liugh created FLINK-19053:
-----------------------------
Summary: Flink Kafka Connector Dependency Error
Key: FLINK-19053
URL: https://issues.apache.org/jira/browse/FLINK-19053
Project: Flink
Issue Type: Bug
Components: Documentation
Affects Versions: 1.11.1
Environment: Flink V1.11.0 or Flink V1.11.1
Reporter: liugh
Attachments: 0.10.png, 0.11.png

From the Flink documentation ([https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/kafka.html]), the "Dependency" section reads as follows:

{code:xml}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-011_2.11</artifactId>
    <version>1.11.0</version>
</dependency>

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-010_2.11</artifactId>
    <version>1.11.0</version>
</dependency>
{code}

However, I couldn't get the correct jar with the dependency configured in pom.xml as shown above. I then searched [https://mvnrepository.com/] and the Aliyun Maven repository, and found the dependency should be as follows (note the dots in *0.11*/*0.10*):

{code:xml}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
    <version>1.11.0</version>
</dependency>

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
    <version>1.11.0</version>
</dependency>
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: [ANNOUNCE] Apache Flink 1.10.2 released
Thanks ZhuZhu for being the release manager and everyone who contributed to this release. Best, Leonard
Re: [ANNOUNCE] Apache Flink 1.10.2 released
Thanks ZhuZhu for managing this release and everyone who contributed to this.

Best,
Pengcheng

On 2020/8/26 7:06 PM, "Congxian Qiu" wrote: [...]
[jira] [Created] (FLINK-19054) KafkaTableITCase.testKafkaSourceSink hangs
Dian Fu created FLINK-19054:
-------------------------------
Summary: KafkaTableITCase.testKafkaSourceSink hangs
Key: FLINK-19054
URL: https://issues.apache.org/jira/browse/FLINK-19054
Project: Flink
Issue Type: Bug
Components: Connectors / Kafka, Table SQL / API
Affects Versions: 1.11.2
Reporter: Dian Fu

[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5844&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20]

{code}
"Kafka Fetcher for Source: KafkaTableSource(price, currency, log_date, log_time, log_ts) -> SourceConversion(table=[default_catalog.default_database.kafka, source: [KafkaTableSource(price, currency, log_date, log_time, log_ts)]], fields=[price, currency, log_date, log_time, log_ts]) -> Calc(select=[(price + 1.0:DECIMAL(2, 1)) AS computed-price, price, currency, log_date, log_time, log_ts, (log_ts + 1000:INTERVAL SECOND) AS ts]) -> WatermarkAssigner(rowtime=[ts], watermark=[ts]) -> Calc(select=[ts, log_date, log_time, CAST(ts) AS ts0, price]) (1/1)" #1501 daemon prio=5 os_prio=0 tid=0x7f25b800 nid=0x22b8 runnable [0x7f2127efd000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377)
    - locked <0xfde5a308> (a java.lang.Object)
    at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:103)
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
    at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:547)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1300)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1240)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1168)
    at org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:253)
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (FLINK-19055) MemoryManagerSharedResourcesTest contains three tests running extraordinarily long
Matthias created FLINK-19055:
--------------------------------
Summary: MemoryManagerSharedResourcesTest contains three tests running extraordinarily long
Key: FLINK-19055
URL: https://issues.apache.org/jira/browse/FLINK-19055
Project: Flink
Issue Type: Improvement
Components: Runtime / Coordination
Affects Versions: 1.11.1
Reporter: Matthias

The following tests take quite some time to succeed:
* {{MemoryManagerSharedResourcesTest::testLastReleaseReleasesMemory}}
* {{MemoryManagerSharedResourcesTest::testPartialReleaseDoesNotDisposeResource}}
* {{MemoryManagerSharedResourcesTest::testPartialReleaseDoesNotReleaseMemory}}

This is due to the fact that {{MemoryManager::verifyEmpty()}} causes exponentially increasing sleep times in {{UnsafeMemoryBudget::reserveMemory}}.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
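For context, a hedged sketch of the kind of retry loop the ticket refers to; the structure, names, and constants are assumptions for illustration, not Flink's actual {{UnsafeMemoryBudget}} code. The point is that each failed reservation attempt sleeps longer than the last, so a test that repeatedly waits for memory to become reclaimable spends most of its runtime sleeping.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative exponential-backoff reservation loop.
final class BackoffBudget {

    private final AtomicLong available;

    BackoffBudget(long totalBytes) {
        this.available = new AtomicLong(totalBytes);
    }

    private boolean tryReserve(long bytes) {
        long cur = available.get();
        return cur >= bytes && available.compareAndSet(cur, cur - bytes);
    }

    void reserve(long bytes, long maxSleepMs) throws InterruptedException {
        long sleepMs = 1;
        while (!tryReserve(bytes)) {
            System.gc();                                  // nudge cleaners to free segments
            Thread.sleep(sleepMs);                        // this is where tests burn time
            sleepMs = Math.min(sleepMs * 2, maxSleepMs);  // exponentially increasing sleeps
        }
    }
}
{code}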
Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)
I added more changes to the FLIP to try and address comments. You can see the changes from the last version here: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158866741&selectedPageVersions=31&selectedPageVersions=27

If no-one objects anymore I would like to proceed to a VOTE soon.

Best, Aljoscha

On 30.07.20 17:36, Aljoscha Krettek wrote:
I see, we actually have some thoughts along that line as well. We have ideas about adding such functionality for `Transformation`, which is the graph structure that underlies both the DataStream API and the newer Table API Runner/Planner. There is a very rough PoC for that available at [1]. It's a very contrived example but it shows off what would be possible. The `Sink` interface here is just a subclass of the general `TransformationApply` [2] and we could envision a `DataStream.apply()` that lets you apply these general transformation "bundles". Keep in mind that these are just rough early ideas and the naming/location of things is somewhat rough. And we might not do it like this in the end.

Best, Aljoscha

[1] https://github.com/aljoscha/flink/blob/poc-transform-apply-sink/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/windowing/SinkExample.java
[2] https://github.com/aljoscha/flink/blob/poc-transform-apply-sink/flink-core/src/main/java/org/apache/flink/api/dag/TransformationApply.java

On 30.07.20 17:26, Flavio Pompermaier wrote:
We use runCustomOperation to group a set of operators into a single functional unit, just to make the code more modular. It's very comfortable indeed.

On Thu, Jul 30, 2020 at 5:20 PM Aljoscha Krettek wrote:
That is good input! I was not aware that anyone was actually using `runCustomOperation()`. Out of curiosity, what are you using that for? We have definitely thought about the first two points you mentioned, though. Especially processing-time will make it tricky to define unified execution semantics.

Best, Aljoscha

On 30.07.20 17:10, Flavio Pompermaier wrote:
I just wanted to be propositive about the missing API.. :D

On Thu, Jul 30, 2020 at 4:29 PM Seth Wiesman wrote:
+1, it's time to drop DataSet. Flavio, those issues are expected. This FLIP isn't just to drop DataSet but to also add the necessary enhancements to DataStream such that it works well on bounded input.

On Thu, Jul 30, 2020 at 8:49 AM Flavio Pompermaier <pomperma...@okkam.it> wrote:
Just to contribute to the discussion, when we tried to do the migration we faced some problems that could make migration quite difficult.
1 - It's difficult to test because of https://issues.apache.org/jira/browse/FLINK-18647
2 - missing mapPartition
3 - missing DataSet<X> runOperation(CustomUnaryOperation<T, X> operation) (see the sketch after this message)

On Thu, Jul 30, 2020 at 12:40 PM Arvid Heise wrote:
+1 to getting rid of the DataSet API. Is DataStream#iterate already superseding DataSet iterations or would that also need to be accounted for? In general, all surviving APIs should also offer a smooth experience for switching back and forth.

On Thu, Jul 30, 2020 at 9:39 AM Márton Balassi <balassi.mar...@gmail.com> wrote:
Hi All, Thanks for the write-up and starting the discussion. I am in favor of unifying the APIs the way described in the FLIP and deprecating the DataSet API. I am looking forward to the detailed discussion of the changes necessary.

Best, Marton

On Wed, Jul 29, 2020 at 12:46 PM Aljoscha Krettek <aljos...@apache.org> wrote:
Hi Everyone, my colleagues (in cc) and I would like to propose this FLIP for discussion.
In short, we want to reduce the number of APIs that we have by deprecating the DataSet API. This is a big step for Flink, that's why I'm also cross-posting this to the User Mailing List.

FLIP-131: http://s.apache.org/FLIP-131

I'm posting the introduction of the FLIP below but please refer to the document linked above for the full details:

--

Flink provides three main SDKs/APIs for writing Dataflow Programs: Table API/SQL, the DataStream API, and the DataSet API. We believe that this is one API too many and propose to deprecate the DataSet API in favor of the Table API/SQL and the DataStream API. Of course, this is easier said than done, so in the following, we will outline why we think that having too many APIs is detrimental to the project and community. We will then describe how we can enhance the Table API/SQL and the DataStream API to subsume the DataSet API's functionality.

In this FLIP, we will not describe all the technical details of how the Table API/SQL and DataStream will be enhanced. The goal is to achieve consensus on the idea of deprecating the DataSet API. There will have to be follow-up FLIPs that describe the necessary changes for the APIs that we maintain.

--

Please let us know if you have any concerns or comments. Also, please keep discussion to this ML thread instead of commenting in the Wiki so that we
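For readers unfamiliar with the API that item 3 in the quoted discussion refers to, here is a hedged sketch of how `DataSet#runOperation` and `CustomUnaryOperation` are typically used to bundle several operators into one reusable unit. The concrete operation is made up for illustration.

{code:java}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.operators.CustomUnaryOperation;

// Illustrative composite operation: trims strings and drops empty ones.
// Packaging the two operators behind one class keeps pipelines modular.
public class NormalizeStrings implements CustomUnaryOperation<String, String> {

    private DataSet<String> input;

    @Override
    public void setInput(DataSet<String> inputData) {
        this.input = inputData;
    }

    @Override
    public DataSet<String> createResult() {
        return input
                .map(String::trim)
                .filter(s -> !s.isEmpty());
    }
}

// usage: DataSet<String> cleaned = rawStrings.runOperation(new NormalizeStrings());
{code}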
Re: [ANNOUNCE] Introducing the GSoD 2020 Participants.
Hi all,

It's a great opportunity to get to work with you guys. I have always admired Flink's performance and simplicity and have been looking forward to contributing more.

Looking forward to an exciting next 3 months.

Regards, Kartik

On Wed, 26 Aug 2020, 14:42 Marta Paes Moreira wrote: [...]
Re: [ANNOUNCE] Introducing the GSoD 2020 Participants.
Welcome Kartik & Muhammad! Looking very much forward to your contributions :)

On Wed, Aug 26, 2020 at 5:52 PM Kartik Khare wrote: [...]

--
Konstantin Knauf
https://twitter.com/snntrable
https://github.com/knaufk
Re: [ANNOUNCE] Introducing the GSoD 2020 Participants.
Welcome Muhammad and Kartik! Thanks a lot for helping us with improving Flink's documentation.

Cheers,
Till

On Wed, Aug 26, 2020 at 7:32 PM Konstantin Knauf wrote: [...]
[jira] [Created] (FLINK-19056) Investigate multipart upload performance regression
Chesnay Schepler created FLINK-19056:
----------------------------------------
Summary: Investigate multipart upload performance regression
Key: FLINK-19056
URL: https://issues.apache.org/jira/browse/FLINK-19056
Project: Flink
Issue Type: Task
Components: Runtime / REST
Affects Versions: 1.12.0
Reporter: Chesnay Schepler
Fix For: 1.12.0

When using Netty 4.1.50, the multipart upload of files is more than 100 times slower in the {{FileUploadHandlerTest}}. This test has traditionally been somewhat heavy, since it repeatedly tests the upload of 60mb files. On my machine this test currently finishes in 2-3 seconds, but with the upgraded Netty version it runs for several _minutes_ instead.

I have not verified yet whether this is purely an issue of the test, but I would consider it unlikely. This would make Flink effectively unusable when uploading larger jars or JobGraphs.

My theory is that this is due to [this|https://github.com/netty/netty/pull/10226] change in Netty. Before this change, the {{HttpPostMultipartRequestDecoder}} was always creating unpooled heap buffers for _something_; after the change the buffer type is dependent on the input buffer. The input buffer is a direct one, so my conclusion is that with the upgrade we ended up allocating more direct buffers than we did previously.

One solution I found was to explicitly create an {{UnpooledByteBufAllocator}} for the {{RestServerEndpoint}} that prefers heap buffers, which results in the input buffer being a heap buffer, so we never allocate direct ones. However, this also implies that we are creating more heap buffers than we did previously; I don't know how much of a problem that is. It would seem a reasonable thing to do, since we should at least be able to skip a bunch of memory copies?

On a somewhat related note, we could think about increasing the chunkSize from 8kb to 64kb to reduce the GC pressure a bit, along with some arenas for the REST API.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
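A minimal sketch of the workaround described above, assuming standard Netty bootstrap wiring; the surrounding server setup is illustrative, not Flink's actual {{RestServerEndpoint}} code:

{code:java}
import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.UnpooledByteBufAllocator;
import io.netty.channel.ChannelOption;

public class HeapAllocatorExample {

    public static void main(String[] args) {
        // preferDirect = false: the allocator hands out heap buffers by default,
        // so the multipart decoder's input (and its copies) stay on the heap.
        UnpooledByteBufAllocator heapAllocator = new UnpooledByteBufAllocator(false);

        ServerBootstrap bootstrap = new ServerBootstrap()
                .option(ChannelOption.ALLOCATOR, heapAllocator)
                .childOption(ChannelOption.ALLOCATOR, heapAllocator);
        // ... continue with group/channel/handler setup as usual
    }
}
{code}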
[jira] [Created] (FLINK-19057) Avoid usage of GlobalConfiguration in ResourceManager
Xintong Song created FLINK-19057:
------------------------------------
Summary: Avoid usage of GlobalConfiguration in ResourceManager
Key: FLINK-19057
URL: https://issues.apache.org/jira/browse/FLINK-19057
Project: Flink
Issue Type: Task
Components: Runtime / Coordination
Reporter: Xintong Song

This is a follow-up of this PR [discussion|https://github.com/apache/flink/pull/13186/#discussion_r476459874].

On Kubernetes/Yarn deployments, the resource manager tries to compare the effective configuration with the original configuration file shipped from the client, and only sets the differences as dynamic properties for the task managers. For this, {{GlobalConfiguration.loadConfiguration()}} is used to load the original configuration file. This strongly relies on the fact that the Kubernetes/Yarn entry points do not support custom configuration directories, which is true at the moment but brittle going forward. It would be better to rethink the usage of GlobalConfiguration.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
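For illustration, the kind of diffing described above could look roughly like the following. This is a hedged sketch, not the actual ResourceManager code; only {{GlobalConfiguration.loadConfiguration()}} and {{Configuration#toMap}} are real Flink APIs here.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.GlobalConfiguration;

public class DynamicPropertiesExample {

    // Collect every key whose effective value differs from the value in the
    // original configuration file; only those become TM dynamic properties.
    static Map<String, String> diff(Configuration effective) {
        // Relies on the default configuration directory -- exactly the
        // brittleness the ticket points out.
        Map<String, String> fileConf = GlobalConfiguration.loadConfiguration().toMap();

        Map<String, String> dynamicProperties = new HashMap<>();
        for (Map.Entry<String, String> e : effective.toMap().entrySet()) {
            if (!Objects.equals(fileConf.get(e.getKey()), e.getValue())) {
                dynamicProperties.put(e.getKey(), e.getValue());
            }
        }
        return dynamicProperties;
    }
}
{code}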
Re: [DISCUSS] Whether catalog table factory should be used for temporary tables
Hi everyone, According to the feedback, it seems adding `isTemporary` to factory context is the preferred way to fix the issue. I'll go ahead and make the change if no one objects. On Tue, Aug 25, 2020 at 5:36 PM Rui Li wrote: > Hi, > > Thanks everyone for your inputs. > > Temporary hive table is not supported at the moment. If we want to support > it, I agree with Jingsong that the life cycle of the temporary table should > somehow be bound to the hive catalog. For example, hive catalog should be > responsible to delete the table folder when the temporary table is dropped, > or when hive catalog itself gets unregistered. Whether such a table should > be stored in metastore is another topic and open to discussion. Hive > doesn't store temporary tables in metastore, so we probably won't want to > do it either. > > I'm fine with either of the solutions proposed. I think we can add the > `isTemporary` flag to factory context, even though temporary tables > currently don't belong to a catalog. Because conceptually, whether a table > is temporary should be part of the context that a factory may be interested > in. > > On Tue, Aug 25, 2020 at 4:48 PM Jingsong Li > wrote: > >> Hi Jark, >> >> You raised a good point: Creating the Hive temporary table. >> AFAIK, Hive temporary tables should be stored in metastore, Hive >> metastore will maintain their life cycle. Correct me if I am wrong. >> >> So actually, if we want to support Hive temporary tables, we should >> finish one thing: >> - A temporary table should belong to Catalog. >> - Instead of current, a temporary table belongs to CatalogManager. >> >> It means, `createTemporaryTable` and `dropTemporaryTable` should be >> proxied into the Catalog. >> In this situation, actually, we don't need the "isTemporary" flag. (But >> we can have it too) >> >> Best, >> Jingsong >> >> On Tue, Aug 25, 2020 at 4:32 PM Jark Wu wrote: >> >>> Hi, >>> >>> I'm wondering if we always fallback to using SPI for temporary tables, >>> then >>> how does the create Hive temporary table using Hive dialect work? >>> >>> IMO, adding an "isTemporary" to the factory context sounds reasonable to >>> me, because the factory context should describe the full content of >>> create >>> table DDL. >>> >>> Best, >>> Jark >>> >>> >>> On Tue, 25 Aug 2020 at 16:01, Jingsong Li >>> wrote: >>> >>> > Hi Dawid, >>> > >>> > But the temporary table does not belong to Catalog, actually >>> > Catalog doesn't know the existence of the temporary table. Let the >>> table >>> > factory of catalog to create source/sink sounds a little sudden. >>> > >>> > If we want to make temporary tables belong to Catalog, I think we need >>> to >>> > involve catalog when creating temporary tables. >>> > >>> > Best, >>> > Jingsong >>> > >>> > On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz < >>> dwysakow...@apache.org> >>> > wrote: >>> > >>> > > Hi Rui, >>> > > >>> > > My take is that temporary tables should use the factory of the >>> catalog >>> > > they were registered with. >>> > > >>> > > What you are describing sounds very much like a limitation/bug in >>> Hive >>> > > catalog only. I'd be in favor of passing the *isTemporary* flag. >>> > > >>> > > Best, >>> > > >>> > > Dawid >>> > > >>> > > On 25/08/2020 09:37, Rui Li wrote: >>> > > > Hi Dev, >>> > > > >>> > > > Currently temporary generic tables cannot work with hive catalog >>> [1]. >>> > > When >>> > > > hive catalog is chosen as the current catalog, planner will use >>> > > > HiveTableFactory to create source/sink for the temporary >>> > > > table. 
HiveTableFactory cannot tell whether a table is temporary or >>> > not, >>> > > > and considers it as a Hive table, which leads to job failure. >>> > > > I've discussed with Jingsong offline and we believe one solution >>> is to >>> > > make >>> > > > planner avoid using catalog table factory for temporary tables. >>> But I'd >>> > > > also like to hear more opinions from others whether this is the >>> right >>> > way >>> > > > to go. I think a possible alternative is to add an *isTemporary* >>> field >>> > > > to TableSourceFactory.Context & TableSinkFactory.Context, so that >>> > > > HiveTableFactory knows how to handle such tables. What do you >>> think? >>> > > > >>> > > > [1] https://issues.apache.org/jira/browse/FLINK-18999 >>> > > > >>> > > >>> > > >>> > >>> > -- >>> > Best, Jingsong Lee >>> > >>> >> >> >> -- >> Best, Jingsong Lee >> > > > -- > Best regards! > Rui Li > -- Best regards! Rui Li
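A hedged sketch of what the change discussed in this thread could look like. The interface shape is an assumption based on how TableSourceFactory.Context is described above, not the final committed API:

{code:java}
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.catalog.CatalogTable;
import org.apache.flink.table.catalog.ObjectIdentifier;

// Hypothetical extension of the factory context: an isTemporary flag lets a
// catalog table factory (e.g. HiveTableFactory) tell temporary generic
// tables apart from persistent Hive tables.
public interface Context {

    ObjectIdentifier getObjectIdentifier();

    CatalogTable getTable();

    ReadableConfig getConfiguration();

    // New: true if the table is registered as a temporary table and therefore
    // does not actually live in the catalog (e.g. in the Hive metastore).
    boolean isTemporary();
}
{code}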
Re: [ANNOUNCE] Introducing the GSoD 2020 Participants.
Welcome Kartik and Muhammad! Thanks in advance for helping improve Flink documentation.

Best,
Jark

On Thu, 27 Aug 2020 at 03:59, Till Rohrmann wrote: [...]
[jira] [Created] (FLINK-19058) Dataset transformations document page error
Fin-chan created FLINK-19058:
--------------------------------
Summary: Dataset transformations document page error
Key: FLINK-19058
URL: https://issues.apache.org/jira/browse/FLINK-19058
Project: Flink
Issue Type: Bug
Components: API / DataSet, Documentation
Affects Versions: 1.11.0
Reporter: Fin-chan

In "OuterJoin with Flat-Join Function" (Java):

{code:java}
public void join(Tuple2 movie, Rating rating Collector> out)
{code}

A "," is missing between {{Rating rating}} and the {{Collector}} parameter.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)