[jira] [Created] (FLINK-27729) Support constructing StartCursor and StopCursor from MessageId
Dian Fu created FLINK-27729:
--------------------------------

             Summary: Support constructing StartCursor and StopCursor from MessageId
                 Key: FLINK-27729
                 URL: https://issues.apache.org/jira/browse/FLINK-27729
             Project: Flink
          Issue Type: Improvement
          Components: API / Python, Connectors / Pulsar
            Reporter: Dian Fu
             Fix For: 1.16.0

Currently, StartCursor.fromMessageId and StopCursor.fromMessageId are not yet supported in the Python Pulsar connector. I think we could leverage the [pulsar Python library|https://pulsar.apache.org/api/python/#pulsar.MessageId] to implement these methods.
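(Not part of the JIRA description: a minimal, hedged sketch of how the existing Java Pulsar connector accepts a MessageId for both cursors, which the proposed Python wrappers would mirror. The service/admin URLs, topic and subscription names are placeholders, MessageId.earliest/MessageId.latest stand in for real message ids, and the exact StopCursor factory name may differ between connector versions.)

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.pulsar.source.PulsarSource;
import org.apache.flink.connector.pulsar.source.enumerator.cursor.StartCursor;
import org.apache.flink.connector.pulsar.source.enumerator.cursor.StopCursor;
import org.apache.flink.connector.pulsar.source.reader.deserializer.PulsarDeserializationSchema;
import org.apache.pulsar.client.api.MessageId;

// Start reading from a concrete MessageId and stop once another MessageId is reached.
// In a real job both ids would come from previously consumed messages.
PulsarSource<String> source =
        PulsarSource.builder()
                .setServiceUrl("pulsar://localhost:6650")   // placeholder
                .setAdminUrl("http://localhost:8080")       // placeholder
                .setTopics("my-topic")                      // placeholder
                .setSubscriptionName("my-subscription")     // placeholder
                .setDeserializationSchema(
                        PulsarDeserializationSchema.flinkSchema(new SimpleStringSchema()))
                .setStartCursor(StartCursor.fromMessageId(MessageId.earliest))
                .setUnboundedStopCursor(StopCursor.atMessageId(MessageId.latest))
                .build();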
[jira] [Created] (FLINK-27730) Kafka connector documentation sink code has an error
liuwei created FLINK-27730:
--------------------------------

             Summary: Kafka connector documentation sink code has an error
                 Key: FLINK-27730
                 URL: https://issues.apache.org/jira/browse/FLINK-27730
             Project: Flink
          Issue Type: Bug
          Components: Documentation
    Affects Versions: 1.14.4
         Environment: Flink 1.14.4
            Reporter: liuwei
             Fix For: 1.14.4
         Attachments: kafka-sink.png

The sample code in the Kafka Sink documentation contains an incorrect API call.
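(The attached kafka-sink.png is not inlined in this archive. Purely for reference, a hedged sketch of the KafkaSink builder pattern that the 1.14 DataStream documentation describes; the broker address, topic name and input stream below are placeholders, not the snippet from the attachment.)

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> stream = env.fromElements("a", "b", "c"); // stand-in for the real pipeline

// The unified KafkaSink is attached with sinkTo(...), not the legacy addSink(...).
KafkaSink<String> sink =
        KafkaSink.<String>builder()
                .setBootstrapServers("broker-1:9092")        // placeholder broker
                .setRecordSerializer(
                        KafkaRecordSerializationSchema.builder()
                                .setTopic("output-topic")    // placeholder topic
                                .setValueSerializationSchema(new SimpleStringSchema())
                                .build())
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                .build();

stream.sinkTo(sink);
env.execute("kafka-sink-example");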
Request for Review: FLINK-27507 and FLINK-27509
Hi Everyone,

I am not sure whom to reach out to for reviews of these changesets, so I am posting this on the mailing list.

I have raised reviews for:
FLINK-27507 - https://github.com/apache/flink-playgrounds/pull/30
FLINK-27509 - https://github.com/apache/flink-playgrounds/pull/29

I would appreciate it if somebody could review these change sets, as I want to make similar changes for the 1.15 version afterwards. Let me know if you need more information regarding these PRs; I would be happy to connect with you and explain the changes.

Thanks,
Shubham Bansal
Contributing to Apache Flink
Hi Everyone,

I have been looking at some starter-tagged JIRA issues over the last couple of weeks:
https://issues.apache.org/jira/browse/FLINK-27506
https://issues.apache.org/jira/browse/FLINK-27508
https://issues.apache.org/jira/browse/FLINK-27507
https://issues.apache.org/jira/browse/FLINK-27509

One of the changes is checked in, and the others are in review. I want to be more involved in the code contribution process and would be happy to look at non-starter issues that can give me more insight into the codebase, eventually moving to core components over time.

My background: I have worked in backend APIs, infrastructure, cloud storage and system security for the past 6 years.

I skimmed through the different components on JIRA and found the following to be very interesting:
Runtime / Coordination
Runtime / Queryable State
Runtime / State Backends
Runtime / Task
Stateful Functions
Table SQL / Planner
Table SQL / Runtime
Table Store
FileSystems
Library / CEP
Library / Graph Processing
Runtime / Checkpointing

I would be happy to start in any of the above areas, or anything else you think would be more suitable. Please let me know.

Thanks,
Shubham
Re: Flatmap node at 100%
Hi Yuxia,

I did increase the parallelism to 16, but that is causing memory overflow issues: the task manager heap memory collapses after the job has run for a certain time. I'm attaching the metrics. The flatmap converts JSON records and parses them into comma-separated strings. Could you suggest how to optimize it?

[image: image.png]

On Fri, May 20, 2022 at 2:39 PM yuxia wrote:

> Hi, I think you can increase the parallelism of the flatmap operator. For a SQL job, you can refer to the doc [1] to set the parallelism. For a DataStream job, you can set the parallelism in your code.
>
> Also, if possible, you can try to optimize your code in the flatmap node.
>
> [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-resource-default-parallelism
>
> Best regards,
> Yuxia
>
> From: "Zain Haider Nemati"
> To: "User" , "dev"
> Sent: Friday, May 20, 2022, 4:51:14 PM
> Subject: Flatmap node at 100%
>
> Hi,
> I'm seeing this behaviour in my flink job, what can I do to remove this bottleneck?
>
> [image: image.png]
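(The poster's flatmap code is not shown in the thread. The following is only a hedged sketch of one common fix for this JSON-to-CSV pattern, assuming Jackson and a String-in/String-out mapping: create the ObjectMapper once per task in open() rather than once per record.)

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class JsonToCsvFlatMap extends RichFlatMapFunction<String, String> {

    private transient ObjectMapper mapper;

    @Override
    public void open(Configuration parameters) {
        // Build the (relatively expensive) parser once per parallel task,
        // not once per record.
        mapper = new ObjectMapper();
    }

    @Override
    public void flatMap(String json, Collector<String> out) throws Exception {
        JsonNode root = mapper.readTree(json);
        List<String> fields = new ArrayList<>();
        Iterator<JsonNode> it = root.elements();
        while (it.hasNext()) {
            fields.add(it.next().asText());
        }
        out.collect(String.join(",", fields));
    }
}

The operator's parallelism can then be raised independently of the rest of the job, e.g. stream.flatMap(new JsonToCsvFlatMap()).setParallelism(16), as Yuxia suggested.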
Re: Job Logs - Yarn Application Mode
Hi,

Thanks for your responses, folks. Is it possible to access the logs via the Flink UI in YARN application mode, similar to how we can access them in standalone mode?

On Fri, May 20, 2022 at 11:06 AM Shengkai Fang wrote:

> Thanks for Biao's explanation.
>
> Best,
> Shengkai
>
> Biao Geng wrote on Friday, May 20, 2022 at 11:16:
>
>> Hi there,
>> @Zain, Weihua's suggestion should be able to fulfill the request to check JM logs. If you do want to use the YARN CLI for running Flink applications, it is possible to check the JM's log with a YARN command like:
>> yarn logs -applicationId application_xxx_yyy -am -1 -logFiles jobmanager.log
>> For the TM log, the command would be like:
>> yarn logs -applicationId <application_id> -containerId <container_id> -logFiles taskmanager.log
>> Note, it is not super easy to find the container id of the TM. One workaround is to check the JM's log first and get the container id for the TM from there. You can also learn more about the details of the above commands from yarn logs -help.
>>
>> @Shengkai, yes, you are right that the actual JM address is managed by YARN. To access the JM launched by YARN, users need to open the YARN web UI, find the YARN application by applicationId and then click the 'application master url' of that application to be redirected to the Flink web UI.
>>
>> Best,
>> Biao Geng
>>
>> Shengkai Fang wrote on Friday, May 20, 2022 at 10:59:
>>
>>> Hi.
>>>
>>> I am not familiar with YARN application mode. Since the job manager is started when the job is submitted, how can users know the address of the JM? Do we need to look up the YARN UI to search for the submitted job with the JobID?
>>>
>>> Best,
>>> Shengkai
>>>
>>> Weihua Hu wrote on Friday, May 20, 2022 at 10:23:
>>>
>>>> Hi,
>>>> You can get the logs from the Flink Web UI if the job is running.
>>>>
>>>> Best,
>>>> Weihua
>>>>
>>>> On May 19, 2022, at 10:56 PM, Zain Haider Nemati wrote:
>>>>
>>>> Hey All,
>>>> How can I check the logs for my job when it is running in application mode via YARN?
Task Manager Heap Memory Overflowing
Hi,

I'm running a job with a Kafka source pulling in millions of records with parallelism 16, and I'm seeing the heap memory on the task manager overflow after the job has run for some time, no matter how much I increase the memory allocation. Can someone help me out in this regard?

Job stats and memory configs:
[image: image.png]
[image: image.png]
Json Deserialize in DataStream API with array length not fixed
Hi Folks,

I have data coming in this format:

{
  "data": {
    "oid__id": "61de4f26f01131783f162453",
    "array_coordinates": "[ { \"speed\" : \"xxx\", \"accuracy\" : \"xxx\", \"bearing\" : \"xxx\", \"altitude\" : \"xxx\", \"longitude\" : \"xxx\", \"latitude\" : \"xxx\", \"dateTimeStamp\" : \"xxx\", \"_id\" : { \"$oid\" : \"xxx\" } }, { \"speed\" : \"xxx\", \"isFromMockProvider\" : \"false\", \"accuracy\" : \"xxx\", \"bearing\" : \"xxx\", \"altitude\" : \"xxx\", \"longitude\" : \"xxx\", \"latitude\" : \"xxx\", \"dateTimeStamp\" : \"xxx\", \"_id\" : { \"$oid\" : \"xxx\" } } ]",
    "batchId": "xxx",
    "agentId": "xxx",
    "routeKey": "40042-12-01-2022",
    "__v": 0
  },
  "metadata": {
    "timestamp": "2022-05-02T18:49:52.619827Z",
    "record-type": "data",
    "operation": "load",
    "partition-key-type": "primary-key",
    "schema-name": "xxx",
    "table-name": "xxx"
  }
}

The length of the array_coordinates array is not fixed in the source. Is there any way to define a JSON deserializer for this? If so, I would really appreciate some help with it.
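(Not from the original thread: a minimal sketch of one way to handle this, assuming Jackson is available. The outer document is parsed first, then array_coordinates is parsed a second time because it is itself a JSON array encoded as a string, so any number of elements works. The TrackingRecord type and the choice of fields are hypothetical.)

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type; keep only the fields you actually need (its own file in practice).
class TrackingRecord {
    public String batchId;
    public String agentId;
    public List<String> coordinates = new ArrayList<>(); // one JSON object per array element
}

public class TrackingRecordDeserializer extends AbstractDeserializationSchema<TrackingRecord> {

    private transient ObjectMapper mapper;

    @Override
    public TrackingRecord deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper(); // created lazily so the schema stays serializable
        }
        JsonNode root = mapper.readTree(message);
        JsonNode data = root.get("data");

        TrackingRecord record = new TrackingRecord();
        record.batchId = data.path("batchId").asText();
        record.agentId = data.path("agentId").asText();

        // array_coordinates arrives as a string containing a JSON array, so parse it again;
        // the loop handles arrays of any length.
        JsonNode coordinateArray = mapper.readTree(data.path("array_coordinates").asText());
        for (JsonNode coordinate : coordinateArray) {
            record.coordinates.add(coordinate.toString());
        }
        return record;
    }
}

If the records come from Kafka, this schema could be passed to the KafkaSource builder via setValueOnlyDeserializer(new TrackingRecordDeserializer()).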
Re: Flatmap node at 100%
Hi,

Your picture is not visible here; you could upload it to an image hosting site. Can you also send out the memory stack information?

Best,
JasonLee

---- Replied Message ----
From: Zain Haider Nemati
Date: 05/22/2022 13:41
To: yuxia
Cc: User , dev
Subject: Re: Flatmap node at 100%

Hi Yuxia,
I did increase the parallelism to 16, but that is causing memory overflow issues: the task manager heap memory collapses after the job has run for a certain time. I'm attaching the metrics. The flatmap converts JSON records and parses them into comma-separated strings. Could you suggest how to optimize it?

On Fri, May 20, 2022 at 2:39 PM yuxia wrote:

Hi, I think you can increase the parallelism of the flatmap operator. For a SQL job, you can refer to the doc [1] to set the parallelism. For a DataStream job, you can set the parallelism in your code.

Also, if possible, you can try to optimize your code in the flatmap node.

[1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-resource-default-parallelism

Best regards,
Yuxia

From: "Zain Haider Nemati"
To: "User" , "dev"
Sent: Friday, May 20, 2022, 4:51:14 PM
Subject: Flatmap node at 100%

Hi,
I'm seeing this behaviour in my flink job, what can I do to remove this bottleneck?
[jira] [Created] (FLINK-27731) Cannot build documentation with Hugo docker image
Xintong Song created FLINK-27731:
--------------------------------

             Summary: Cannot build documentation with Hugo docker image
                 Key: FLINK-27731
                 URL: https://issues.apache.org/jira/browse/FLINK-27731
             Project: Flink
          Issue Type: Bug
          Components: Documentation
    Affects Versions: 1.16.0
            Reporter: Xintong Song

Flink provides two ways of building the documentation: 1) using a Hugo docker image, and 2) using a local Hugo installation. Currently, 1) is broken because the `setup_docs.sh` script requires a local Hugo installation. This was introduced in FLINK-27394.