Re:

2020-09-16 Thread Dawid Wysakowicz
It should work the way you're describing. Can you share a reproducible example? Best, Dawid On 14/08/2020 11:38, Jaswin Shah wrote: > Hi, > > I have a CoProcessFunction which emits data to the same side output from > the processElement1 method and the onTimer method. But data is not getting > emitted to si…
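For reference, emitting to one side output from both processElement1 and onTimer is expected to work; here is a minimal sketch (the class name, output-tag name, and timer logic are illustrative, not taken from the thread):

```java
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputSketch extends KeyedCoProcessFunction<String, String, String, String> {
    // The tag must be an anonymous subclass so the element type is not erased.
    private static final OutputTag<String> SIDE = new OutputTag<String>("side-output") {};

    @Override
    public void processElement1(String value, Context ctx, Collector<String> out) throws Exception {
        ctx.output(SIDE, "from-processElement1: " + value);
        ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 1000L);
    }

    @Override
    public void processElement2(String value, Context ctx, Collector<String> out) {
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        // OnTimerContext also supports ctx.output, so the same tag works here.
        ctx.output(SIDE, "from-onTimer: " + timestamp);
    }
}
```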

Re: The rpc invocation size exceeds the maximum akka framesize when the job was re-submitted.

2020-09-16 Thread shravan
Hi, We are observing the same error as well ("The rpc invocation size exceeds the maximum akka framesize") and have follow-up questions on it. Why do we face this issue, and how can we know the expected size for which it is failing? The error message does not indicate that. Does the…

Re: [ANNOUNCE] Apache Flink 1.11.2 released

2020-09-16 Thread Jingsong Li
Thanks ZhuZhu for driving the release. Best, Jingsong On Thu, Sep 17, 2020 at 1:29 PM Zhu Zhu wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink 1.11.2, which is the second bugfix release for the Apache Flink 1.11 > series. > > Apache Flink® is an open-s

[ANNOUNCE] Apache Flink 1.11.2 released

2020-09-16 Thread Zhu Zhu
The Apache Flink community is very happy to announce the release of Apache Flink 1.11.2, which is the second bugfix release for the Apache Flink 1.11 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming

Re: FileSystemHaServices and BlobStore

2020-09-16 Thread Yang Wang
Hi Alexey, I have created FLIP-144[1] for the native Kubernetes HA support. Please have a look if you are interested. Frankly speaking, I am not against the "StatefulSet + PV + FileSystemHAService". Maybe in the future, we could have the both in Flink. [1]. https://cwiki.apache.org/confluence/dis

Re: [DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-09-16 Thread Yang Wang
Hi Xintong and Stephan, Thanks a lot for your attention on this FLIP. I will address the comments inline. # Architecture -> One or two ConfigMaps Both of you are right. One ConfigMap will make the design and implementation easier. Actually, in my POC code, I am using just one ConfigMap (e.g. "k8…

Re: Python UDF's in DataStream API?

2020-09-16 Thread Dian Fu
Hi Edward, Do you mean that using Python UDF in the Java DataStream API job? If so, there is still no plan to support this directly in the Java DataStream API. However, there is one way to achieve this as it supports to use Python UDF in the Java Table API, so you can do as following: - Convert
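The workaround Dian describes can be sketched roughly as follows, assuming the Flink 1.11 Table API; the Python module `my_udfs.py` and its function `py_upper` are hypothetical stand-ins for an actual Python UDF:

```java
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class PythonUdfViaTableApi {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Register the Python scalar function in the Table API.
        tEnv.executeSql(
            "CREATE TEMPORARY FUNCTION py_upper AS 'my_udfs.py_upper' LANGUAGE PYTHON");

        // Convert the DataStream to a Table, apply the UDF, convert back.
        DataStream<String> words = env.fromElements("flink", "python");
        Table in = tEnv.fromDataStream(words, $("word"));
        Table out = tEnv.sqlQuery("SELECT py_upper(word) AS word FROM " + in);
        DataStream<Row> result = tEnv.toAppendStream(out, Row.class);
    }
}
```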

Maximum query and refresh rate for metrics from REST API

2020-09-16 Thread Piper Piper
Hello, What is the recommended way to get metrics (such as CPU, Memory and user defined meters and gauges) at the highest frequency rate (i.e. with the highest/fastest refresh rate) such as every 500 milliseconds or less? Is there any rate limiting by default on querying the REST API for metrics?

global state and single stream

2020-09-16 Thread Adam Atrea
Hi, I am pretty new to Flink and I'm trying to implement - which seems to me - a pretty basic pattern: Say I have a single stream of *Price* objects encapsulating a price value and a symbol (for example A to Z). They are emitted at very random intervals all day: could be 1/day or once a we…

Re: Flink Table SQL, Kafka, partitions and unnecessary shuffling

2020-09-16 Thread Dan Hill
Hi Dawid! I see. Yea, this would break my job after I move away from the prototype. How do other Flink devs avoid unnecessary reshuffles when sourcing data from Kafka? Is the Table API early or not used often? On Wed, Sep 16, 2020 at 12:31 PM Dawid Wysakowicz wrote: > Hi Dan, > > I am afr

Re: Flink Table SQL, Kafka, partitions and unnecessary shuffling

2020-09-16 Thread Dawid Wysakowicz
Hi Dan, I am afraid there is no mechanism to do that purely in the Table API yet. Or I am not aware of one. If the reinterpretAsKeyedStream works for you, you could use this approach and convert a DataStream (with the reinterpretAsKeyedStream applied) to a Table[1] and then continue with the Table
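The approach Dawid outlines can be sketched as below; `MyEvent`, its key accessor, and the column names are hypothetical placeholders for the actual job's types:

```java
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.table.api.Table;

// Tell Flink the Kafka-sourced stream is already partitioned by key,
// avoiding a keyBy shuffle (experimental API), then hand it to the Table API.
KeyedStream<MyEvent, String> keyed =
    DataStreamUtils.reinterpretAsKeyedStream(kafkaSourceStream, MyEvent::getKey);
Table table = tableEnv.fromDataStream(keyed, $("key"), $("value"));
```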

Re: Error: Avro "CREATE TABLE" with nested rows and missing fields

2020-09-16 Thread Dan Hill
Interesting. How does schema evolution work with Avro and Flink? E.g. adding new fields or enum values. On Wed, Sep 16, 2020 at 12:13 PM Dawid Wysakowicz wrote: > Hi Dan, > > I'd say this is a result of a few assumptions. > >1. We try to separate the concept of format from the connector. >

Re: Error: Avro "CREATE TABLE" with nested rows and missing fields

2020-09-16 Thread Dawid Wysakowicz
Hi Dan, I'd say this is a result of a few assumptions. 1. We try to separate the concept of format from the connector. Therefore we did not make too many assumption which connector does a format work with. 2. Avro needs the original schema that the incoming record was serialized wit

Python UDF's in DataStream API?

2020-09-16 Thread Edward
Is there any plan to allow Python UDF's within the DataStream API (as opposed to an entire job defined in Python)? FLIP-130 discusses Python support for the DataStream API, but it's not clear whether this will include th

Re: Flink performance testing

2020-09-16 Thread mahesh salunkhe
I would like to do performance testing for my Flink job, especially related to volume: how does my Flink job perform if more streaming data comes into my source connectors, and how do I measure benchmarks for various operators? On Wed, 16 Sep 2020 at 12:03, Piotr Nowojski wrote: > Hi, > > I'm not sure what you a…

Re: Flink Table SQL, Kafka, partitions and unnecessary shuffling

2020-09-16 Thread Dan Hill
Hi Piotr! Yes, that's what I'm using with DataStream. It works well in my prototype. On Wed, Sep 16, 2020 at 8:58 AM Piotr Nowojski wrote: > Hi, > > Have you seen "Reinterpreting a pre-partitioned data stream as keyed > stream" feature? [1] However I'm not sure if and how can it be integrated

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-16 Thread Piotr Nowojski
Hi, Could it be related to https://issues.apache.org/jira/browse/FLINK-18223 ? Also, as a workaround, does it work if you enable object reuse (`StreamExecutionEnvironment#getConfig()#enableObjectReuse()`)? Best regards Piotrek On Wed, 16 Sep 2020 at 08:09, Lian Jiang wrote: > Hi, > > i…
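Spelled out, the workaround Piotr suggests is a one-line change on the execution config of an ordinary DataStream job:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Reuse objects between chained operators instead of deep-copying each record.
env.getConfig().enableObjectReuse();
```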

Re: Flink performance testing

2020-09-16 Thread Piotr Nowojski
Hi, I'm not sure what you are asking for. We do not provide benchmarks for all of the operators. We currently have a couple of micro benchmarks [1] for some of the operators, and we are also setting up some adhoc benchmarks when implementing various features. If you want to benchmark something bes

Re: Flink Table SQL, Kafka, partitions and unnecessary shuffling

2020-09-16 Thread Piotr Nowojski
Hi, Have you seen "Reinterpreting a pre-partitioned data stream as keyed stream" feature? [1] However I'm not sure if and how can it be integrated with the Table API. Maybe someone more familiar with the Table API can help with that? Piotrek [1] https://ci.apache.org/projects/flink/flink-docs-st

Re: I have a job with multiple Kafka sources. They all contain certain historical data.

2020-09-16 Thread Piotr Nowojski
Hey, If you are worried about increased amount of buffered data by the WindowOperator if watermarks/event time is not progressing uniformly across multiple sources, then there is little you can do currently. FLIP-27 [1] will allow us to address this problem in more generic way. What you can curren

Flink performance testing

2020-09-16 Thread mahesh salunkhe
Team, What frameworks should I be using for Flink end-to-end performance testing? I would like to test the performance of each Flink operator, back pressure, etc.

Re: BufferTimeout throughput vs latency.

2020-09-16 Thread David Anderson
The performance loss being referred to there is reduced throughput. There's a blog post by Nico Kruber [1] that covers Flink's network stack in considerable detail. The last section on latency vs throughput gives some more insight on this point. In the experiment reported on there, the difference

BufferTimeout throughput vs latency.

2020-09-16 Thread Mazen Ezzeddine
Hi all, I have read the below in the documentation : "To maximize throughput, set setBufferTimeout(-1) which will remove the timeout and buffers will only be flushed when they are full. To minimize latency, set the timeout to a value close to 0 (for example 5 or 10 ms). A buffer timeout of 0 sh
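The three regimes the quoted documentation describes map onto a single setting on the execution environment (the concrete timeout values here are just examples):

```java
env.setBufferTimeout(-1);  // flush only when a buffer is full: maximum throughput
env.setBufferTimeout(10);  // flush at least every 10 ms: low latency, small throughput cost
env.setBufferTimeout(0);   // flush after every record: lowest latency, but can degrade performance
```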

Re: [DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-09-16 Thread Stephan Ewen
This is a very cool feature proposal. One lesson learned from the ZooKeeper-based HA is that it is overly complicated to have the Leader RPC address in a different node than the LeaderLock. There is extra code needed to make sure these converge, and they can be temporarily out of sync. A much easie…

Re: [DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-09-16 Thread Xintong Song
Thanks for preparing this FLIP, @Yang. In general, I'm +1 for this new feature. Leveraging Kubernetes's built-in ConfigMap for Flink's HA services should significantly reduce the maintenance overhead compared to deploying a ZK cluster. I think this is an attractive feature for users. Concerning t…

Re: Connecting two streams and order of their processing

2020-09-16 Thread David Anderson
The details of what can go wrong will vary depending on the precise scenario, but no, Flink is unable to provide any such guarantee. Doing so would require being able to control the scheduling of various threads running on different machines, which isn't possible. Of course, if event A becomes ava

Re: Connecting two streams and order of their processing

2020-09-16 Thread Jaswin Shah
With keyed dual-stream processing, you make sure that events for the same key are routed to processElement1 and processElement2 on the same partition. However, when you receive an event in processElement1, you should store it in Flink's state so that if another event arrives late at processElement2, you ca…
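The buffering pattern Jaswin describes can be sketched as follows; `EventA`, `EventB`, and `Matched` are hypothetical types standing in for the job's actual records:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class BufferFirstEventSketch extends KeyedCoProcessFunction<String, EventA, EventB, Matched> {
    private transient ValueState<EventA> pendingA;

    @Override
    public void open(Configuration parameters) {
        pendingA = getRuntimeContext().getState(
            new ValueStateDescriptor<>("pendingA", EventA.class));
    }

    @Override
    public void processElement1(EventA a, Context ctx, Collector<Matched> out) throws Exception {
        // Park the event in keyed state until its counterpart arrives.
        pendingA.update(a);
    }

    @Override
    public void processElement2(EventB b, Context ctx, Collector<Matched> out) throws Exception {
        EventA a = pendingA.value();
        if (a != null) {
            out.collect(new Matched(a, b));
            pendingA.clear();
        }
        // else: EventB arrived first; a symmetric state for B (and a cleanup
        // timer for abandoned entries) would be needed in a complete job.
    }
}
```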

Connecting two streams and order of their processing

2020-09-16 Thread Mazen Ezzeddine
Hello all, If an event is available now to a Flink KeyedCoProcessFunction operator, and another event (for the same key) will be available 1 minute later to that operator as a result of connecting the two streams, Flink does not provide any guarantee that the event available now will be processed (processElemen…

Error: Avro "CREATE TABLE" with nested rows and missing fields

2020-09-16 Thread Dan Hill
I might be misunderstanding Flink Avro support. I assumed not including a field in "CREATE TABLE" would work fine. If I leave out any field before a nested row, "CREATE TABLE" fails. If I include all of the fields, this succeeds. I assumed fields would be optional. I'm using Flink v1.11.1 with