Re: Need suggestion on Flink-Kafka stream processing design

2020-05-13 Thread Arvid Heise
Hi Hemant, what you described is an aggregation. You aggregate 15 small records into one large record. The concept of aggregation goes beyond pure numeric operations; for example, forming one large string with CONCAT is also a type of aggregation. In your case, I'd still follow my general outline

Re: 回复:Re: Writing _SUCCESS Files (Streaming and Batch)

2020-05-13 Thread Guowei Ma
Hi, Theo Thank you for sharing your solution. >From your description, it seems that what you need is a listener that could notify the state change of the partition/bucket: created/updated/closed. (maybe you don't need the close notify). I am not familiar with Impala. So what I want to know is why y

Re: Flink Metrics in kubernetes

2020-05-13 Thread Averell
Hi Gary, Sorry for the false alarm. It's caused by a bug in my deployment - no metrics were added into the registry. Sorry for wasting your time. Thanks and best regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink consuming rate increases slowly

2020-05-13 Thread Dawid Wysakowicz
Hi Eyal, Could you explain your job a bit more? Did you increase the parallelism of your job? What does it do? Does it perform any time based operations? How do you measure the processing rate? Best, Dawid On 10/05/2020 21:18, Chen Qin wrote: > Hi Eyal, > > It’s unclear what warmup phase does i

Re: Flink BLOB server port exposed externally

2020-05-13 Thread Dawid Wysakowicz
Hi Omar, Theoretically I think it could be possible to change the address on which the BlobServer runs (even to localhost). There is no configuration option for it now and the BlobServer always binds to the wildcard. One important aspect to consider here is that the BlobServer must be accessible f

Re: 回复:Re: Writing _SUCCESS Files (Streaming and Batch)

2020-05-13 Thread Theo Diefenthal
Hi Guowei, Impala is a database that can execute fast SQL Queries on parquet data. It has its own small metadata store for the parquet-tables created in there. In that store, it remembers the .parquet files in each partition and also stores some statistics like number of rows and so on. If I

Re: Statefun 2.0 questions

2020-05-13 Thread Wouter Zorgdrager
Dear Igal, all, Thanks a lot. This is very helpful. I understand the architecture a bit more now. We can just scale the stateful functions and put a load balancer in front and Flink will contact them. The only part of the scaling I don't understand yet is how to scale the 'Flink side'. So If I und

Re: Register time attribute while converting a DataStream to Table

2020-05-13 Thread Dawid Wysakowicz
Hi, Unfortunately support for consuming upsert stream is not supported yet. It's not as easy as adding the type information there as you suggested. Even if you do that it will still be considered to be an append message internally by the planner. There is an ongoing effort (FLIP-95[1]) to support

Re: Prometheus Pushgateway Reporter Can not DELETE metrics on pushgateway

2020-05-13 Thread Yun Tang
Hi >From our experience, instead of offering more resource for Prometheus >push-gateway and servers. We could leverage Flink' feature to avoid sending >unnecessary data (especially high-dimension tags, e,g task_attempt_id) after >Flink-1.10. In general, we could exclude >"operator_id;task_id;t

Re: Incremental state with purging

2020-05-13 Thread Yun Tang
Hi >From your description: "output the state every y seconds and remove old >elements", I think TTL [1] is the proper solution for your scenario. And you >could define the ttl of your state as y seconds so that processfunction could >only print elements in the last y seconds. [1] https://ci.

Re: Flink restart strategy on specific exception

2020-05-13 Thread Till Rohrmann
Yes, you are right Zhu Zhu. Extending the RestartBackoffTimeStrategyFactoryLoader to also load custom RestartBackoffTimeStrategies sound like a good improvement for the future. @Ken Krugler , the old RestartStrategy interface did not provide the cause of the failure, unfortunately. Cheers, Till

Re: Flink Streaming Job Tuning help

2020-05-13 Thread Senthil Kumar
Zhijiang, Thanks for your suggestions. We will keep it in mind! Kumar From: Zhijiang Reply-To: Zhijiang Date: Tuesday, May 12, 2020 at 10:10 PM To: Senthil Kumar , "user@flink.apache.org" Subject: Re: Flink Streaming Job Tuning help Hi Kumar, I can give some general ideas for further anal

Re: Flink Memory analyze on AWS EMR

2020-05-13 Thread Jacky D
Hi, Xintong Thanks for point it out, after I set up the log path it's working now . so , for conclusion . on emr , to set up jitwatch in flink-conf.yaml, we should not include quotes and give a path to output the jit log file . this is different from setting it on standalone cluster . example : e

[ANNOUNCE] Apache Flink 1.10.1 released

2020-05-13 Thread Yu Li
The Apache Flink community is very happy to announce the release of Apache Flink 1.10.1, which is the first bugfix release for the Apache Flink 1.10 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming a

Re: changing the output files names in Streamfilesink from part-00 to something else

2020-05-13 Thread dhurandar S
Yes we looked at it , The problem is the file name gets generated in a dynamic fashion, based on which organization data we are getting we generate the file name from the coming data. Is there any way we can achieve this ?? On Tue, May 12, 2020 at 8:38 PM Yun Gao wrote: > Hi Dhurandar: > >

[CVE-2020-1960] Apache Flink JMX information disclosure vulnerability

2020-05-13 Thread Chesnay Schepler
CVE-2020-1960: Apache Flink JMX information disclosure vulnerability Severity: Medium (CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:L/A:H) Vendor: The Apache Software Foundation Versions Affected: Flink 1.1.0 to 1.1.5 Flink 1.2.0 to 1.2.1 Flink 1.3.0 to 1.3.3 Flink 1.4.0 to 1.4.2 Flink 1.5.0 to 1.5.6

Flink Key based watermarks with event time

2020-05-13 Thread Gnanasoundari Soundarajan
Hi all, We have a requirement where we need to maintain key based watermarks with event time. Each sensor will communicate with different timestamp where we need to maintain watermark separately for each sensor. Is this requirement can be achieved with Flink? Thanks. Regards, Gnana

Watermarks and parallelism

2020-05-13 Thread Gnanasoundari Soundarajan
Hi all, I have below queries in flink. Could anyone help me to understand? Query: 1 Is watermark maintained globally at the operator level? 2 When we have a keyByOperator with parallelism >1, is there a single watermark maintained across all the parallel subtasks or for each of the parallel s

Re: Broadcast state vs data enrichment

2020-05-13 Thread Manas Kale
I see, thank you Roman! On Tue, May 12, 2020 at 4:59 PM Khachatryan Roman < khachatryan.ro...@gmail.com> wrote: > Thanks for the clarification. > > Apparently, the second option (with enricher) creates more load by adding > configuration to every event. Unless events are much bigger than the > co

Re: changing the output files names in Streamfilesink from part-00 to something else

2020-05-13 Thread Jingsong Li
Hi, Dhurandar, Can you describe your needs? Why do you need to modify file names flexibly? What kind of name do you want? Best, Jingsong Lee On Thu, May 14, 2020 at 2:05 AM dhurandar S wrote: > Yes we looked at it , > The problem is the file name gets generated in a dynamic fashion, based on >

Re: changing the output files names in Streamfilesink from part-00 to something else

2020-05-13 Thread Sivaprasanna
Hi Just shooting away my thoughts. Based on your what you had described so far, I think your objective is to have some unique way to identify/filter the output based on the organization. If that's the case, you can implement a BucketAssigner with the logic to create a bucket key based on the organ