Re: [QUESTION] How to parallelize with explicit punctuation in Flink?

2019-10-07 Thread Yun Gao
Hi Filip, I have one question on the problem: what is the expected behavior when the parallelism of the FlatMapFunction is increased to more than 1? Should each subtask maintain the partial sum of all values received, and whenever the barrier is received, then it just outputs

Re: Building Flink 1.6.4 fails with "object scala in compiler mirror not found"

2019-10-07 Thread Congxian Qiu
Hi, if you just want to skip the tests, could you try adding `-DskipTests` to the Maven command? Best, Congxian Aikio, Torste wrote on Monday, October 7, 2019 at 11:36 PM: > Hi, > > I'm trying to build Flink 1.6.4 from source and some of the tests for > flink-scala module are failing for me. Are there some add
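The suggestion above can be run as follows; `-DskipTests` compiles but does not run the tests, while `-Dmaven.test.skip=true` skips compiling them entirely:

```shell
# Build Flink 1.6.4 from source, compiling but not running the test suite
mvn clean install -DskipTests

# Alternatively, skip compiling the tests entirely
mvn clean install -Dmaven.test.skip=true
```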

Apache Flink Kafka Stream get all messages and stop

2019-10-07 Thread k.lesha...@yandex.ru
Hello everyone! I created a question on SO https://stackoverflow.com/questions/58280077/apache-flink-kafka-stream-get-all-messages-and-stop What I want to do: using the Flink DataStream API, create a Kafka consumer, get all messages from the topic up to the current moment, stop the consumer (the main problem is
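In the Flink versions current at the time of this thread, one way to stop a `FlinkKafkaConsumer` cleanly is the `isEndOfStream` hook of a `KafkaDeserializationSchema`: when it returns true, the consumer shuts down. A sketch, assuming the end offsets were captured before the job started (e.g. via `KafkaConsumer#endOffsets`) and simplified to the single-partition-per-subtask case; `BoundedStringSchema` and the constructor argument are illustrative names:

```java
import java.util.Map;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;

// Sketch: stop consuming once the offsets captured before the job started
// have been reached. A production version would track completion per
// partition assigned to this subtask.
public class BoundedStringSchema implements KafkaDeserializationSchema<String> {

    private final Map<TopicPartition, Long> endOffsets; // captured up front
    private transient boolean reachedEnd;

    public BoundedStringSchema(Map<TopicPartition, Long> endOffsets) {
        this.endOffsets = endOffsets;
    }

    @Override
    public String deserialize(ConsumerRecord<byte[], byte[]> record) {
        TopicPartition tp = new TopicPartition(record.topic(), record.partition());
        Long end = endOffsets.get(tp);
        if (end != null && record.offset() >= end - 1) {
            reachedEnd = true; // last pre-captured message of this partition
        }
        return new String(record.value());
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        // When this returns true the FlinkKafkaConsumer shuts down cleanly.
        return reachedEnd;
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return Types.STRING;
    }
}
```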

Re: Group by multiple fields

2019-10-07 Thread Congxian Qiu
Hi Miguel, Maybe the doc[1] about how to specify the keys can help. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/api_concepts.html#specifying-keys Best, Congxian Miguel Farrajota wrote on Tuesday, October 8, 2019 at 12:09 AM: > Hi, > > I'm looking to do some result aggregations on an event
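As the linked documentation describes, a composite key over several fields can be expressed with field expressions on a POJO. A minimal sketch against the Flink 1.9 DataStream API; the `Event` POJO, its field names, and the window size are assumptions for illustration:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class GroupByExample {

    // Hypothetical event POJO: public fields plus a no-arg constructor,
    // so Flink can treat it as a POJO and key on field names.
    public static class Event {
        public String country;
        public String city;
        public int amount;
        public Event() {}
        public Event(String country, String city, int amount) {
            this.country = country;
            this.city = city;
            this.amount = amount;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                new Event("PT", "Lisbon", 10),
                new Event("PT", "Lisbon", 5),
                new Event("PT", "Porto", 7))
            // Key on multiple fields by name; Flink groups on the composite key.
            .keyBy("country", "city")
            .timeWindow(Time.minutes(5))
            .sum("amount")
            .print();

        env.execute("group-by-country-city");
    }
}
```

A `KeySelector` returning a `Tuple2<String, String>` achieves the same grouping in a type-safe way.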

[QUESTION] How to parallelize with explicit punctuation in Flink?

2019-10-07 Thread Filip Niksic
Hi all, What would be a natural way to implement a parallel version of the following Flink program? Suppose I have a stream of items of type DataItem with two concrete implementations of DataItem: Value and Barrier. Let’s say that Value holds an integer value, and Barrier acts as explicit punctua
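The sequential version described in the question can be sketched as a stateful `FlatMapFunction`; a plain instance field like this only behaves deterministically at parallelism 1, which is exactly what makes the scaling question interesting. `DataItem`, `Value`, and `Barrier` are the types from the question; the `getValue()` accessor is an assumption:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// Sketch of the parallelism-1 version: accumulate Values, emit on Barrier.
public class SumUntilBarrier implements FlatMapFunction<DataItem, Integer> {

    private int partialSum = 0;

    @Override
    public void flatMap(DataItem item, Collector<Integer> out) {
        if (item instanceof Value) {
            partialSum += ((Value) item).getValue();
        } else if (item instanceof Barrier) {
            // Emit the running sum at each punctuation mark, then reset.
            out.collect(partialSum);
            partialSum = 0;
        }
    }
}
```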

Kafka producer throws exceptions when publishing a keyed stream

2019-10-07 Thread Shahid Chohan
Hello, I am running into issues writing a keyed stream from sink subtasks to an output kafka topic. The job is of the form: source -> filter -> keyby(id) -> flatmap -> sink The exceptions are coming from the kafka producer and cause checkpointing to timeout: - FlinkKafkaException: Failed to

Re: kafka offset not working

2019-10-07 Thread Eduardo Winpenny Tejedor
Flink only uses the specified starting point consumer-side setting the first time the application is deployed. After that Flink will store the Kafka offsets as part of the (source) operator state and when restarted from a checkpoint/savepoint it'll continue to consume from where it left off. Hope
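The behavior described above corresponds to the consumer's start-position setters; a sketch, with the broker address and topic name as placeholder assumptions:

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaStartPosition {

    public static FlinkKafkaConsumer<String> buildConsumer() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "testben");

        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);

        // Only applies when the job starts WITHOUT a checkpoint/savepoint;
        // on restore, Flink continues from the offsets in its own operator state.
        consumer.setStartFromGroupOffsets(); // alternatives: setStartFromEarliest(),
                                             //               setStartFromLatest()
        return consumer;
    }
}
```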

[SURVEY] How do people upgrade their Flink applications?

2019-10-07 Thread Konstantinos Kallas
Hi everyone, I am interested in how people safely upgrade their Flink applications when they apply a patch, or when they develop a new feature. How do you ensure that no bug was introduced? Do you run two versions of the application simultaneously (the old and the upgraded one), monitoring the

Group by multiple fields

2019-10-07 Thread Miguel Farrajota
Hi, I'm looking to do some result aggregations on an event stream using several fields (for example, country and city), but I'm having a hard time finding any information about how to do this programmatically in the official documentation. I've found a couple of examples using SQL, but I'm looking t

Re: Flink using Oozie in Kerberized cluster

2019-10-07 Thread Srivastava,Rajat
Sounds like a good idea. Thanks for your help! Best, Rajat Srivastava From: sri hari kali charan Tummala Date: Monday, October 7, 2019 at 8:11 AM To: "Srivastava,Rajat" Subject: Re: Flink using Oozie in Kerberized cluster Please raise a ticket with Cloudera; it's a Kerberos issue. On Sun, Oct 6,

kafka offset not working

2019-10-07 Thread Benjamin Cuthbert
All, Our Kafka consumer has the group.id property set, but when messages arrive on the topic while we are not connected and we then reconnect, we don't receive those messages. The Kafka console consumer, when run with "-group testben", works perfectly fine. Is there some other place to set up Flink to read from the late

Re: POJO serialization vs immutability

2019-10-07 Thread Arvid Heise
The POJOs that Flink supports follow the Java Bean style, so they are mutable. I agree that direct support for immutable types would be desirable, but in this case, we need to differentiate a bit more. Any mutable object can be effectively immutable, if the state is not changed after a certain point

Building Flink 1.6.4 fails with "object scala in compiler mirror not found"

2019-10-07 Thread Aikio, Torste
Hi, I'm trying to build Flink 1.6.4 from source and some of the tests for the flink-scala module are failing for me. Are there some additional dependencies that I need to install to get the tests to pass? The essential part of the Maven output is here: Running org.apache.flink.api.scala.runtime.Tuple

Re: POJO serialization vs immutability

2019-10-07 Thread Jan Lukavský
Having said that - the same logic applies to using POJOs as keys in grouping operations, which heavily rely on hashCode() and equals(). That might suggest that using mutable objects is not the best option there either. But that might be a very subjective claim. Jan On 10/7/19 3:13 PM, Jan

Re[2]: POJO serialization vs immutability

2019-10-07 Thread Протченко Алексей
Sorry, but what about immutability in general? It seems there is no way to have normal immutable chunks inside the stream (but mutable chunks inside a stream seem to be some kind of "code smell"). Or am I just missing something? Best regards, Alex >Monday, October 7, 2019, 16:13 +03:00 from

Re: POJO serialization vs immutability

2019-10-07 Thread Jan Lukavský
Exactly. And that's why it is good for mutable data, because they are not suited for keys either. Jan On 10/7/19 2:58 PM, Chesnay Schepler wrote: The default hashCode implementation is effectively random and not suited for keys as they may not be routed to the same instance. On 07/10/2019 14

Re: POJO serialization vs immutability

2019-10-07 Thread Chesnay Schepler
The default hashCode implementation is effectively random and not suited for keys as they may not be routed to the same instance. On 07/10/2019 14:54, Jan Lukavský wrote: Hi Stephen, I found a very nice article [1], which might help you solve the issues you are concerned about. The elegant s

Re: POJO serialization vs immutability

2019-10-07 Thread Jan Lukavský
Hi Stephen, I found a very nice article [1], which might help you solve the issues you are concerned about. The elegant solution to this problem might be summarized as "do not implement equals() and hashCode() for POJO types, use Object's default implementation". I'm not 100% sure that this wi

Re: POJO serialization vs immutability

2019-10-07 Thread Chesnay Schepler
This question should only be relevant for cases where POJOs are used as keys, in which case their hashCode() /must not/ return a class-constant or effectively-random value, as this would break the hash partitioning. This is somewhat alluded to in the keyBy() documentation
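The requirement discussed in this thread can be sketched with a plain-Java key type whose equals()/hashCode() are value-based, so equal keys hash identically and are routed to the same parallel instance; the class name and fields are illustrative:

```java
import java.util.Objects;

// A POJO suitable as a keyBy() key: equals()/hashCode() are value-based
// (not class-constant, not identity/random), so two equal keys always
// produce the same hash and land on the same parallel instance.
public class CountryCityKey {
    public String country;
    public String city;

    public CountryCityKey() {} // no-arg constructor (Flink POJO rule)

    public CountryCityKey(String country, String city) {
        this.country = country;
        this.city = city;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CountryCityKey)) return false;
        CountryCityKey that = (CountryCityKey) o;
        return Objects.equals(country, that.country)
                && Objects.equals(city, that.city);
    }

    @Override
    public int hashCode() {
        return Objects.hash(country, city);
    }
}
```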

Re: Flink KPL based on a custom class object

2019-10-07 Thread Chesnay Schepler
You have to implement a SerializationSchema and pass that into the FlinkKinesisProducer. The error message you received is caused by the compiler attempting to determine the generic type of the producer, but not being able to do so since your myObject class does not implement the correct inter
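A minimal sketch of the interface implementation described above; `MyObject` stands in for the custom class from the thread, and its `toJson()` accessor is an assumption about how the object can be encoded:

```java
import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.serialization.SerializationSchema;

// Turns each element into bytes for the Kinesis producer. Any encoding
// works; UTF-8 JSON is a common simple choice.
public class MyObjectSerializationSchema implements SerializationSchema<MyObject> {
    @Override
    public byte[] serialize(MyObject element) {
        return element.toJson().getBytes(StandardCharsets.UTF_8);
    }
}
```

The schema is then passed to the producer, e.g. `new FlinkKinesisProducer<>(new MyObjectSerializationSchema(), producerConfig)`, which also lets the compiler infer the producer's generic type.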

Re: Difficult to debug reason for checkpoint decline

2019-10-07 Thread Chesnay Schepler
There does indeed appear to be a code path in the StreamTask where an exception might not be logged on the TaskExecutor (StreamTask#handleExecutionException). In FLINK-10753 the CheckpointCoordinator was adjusted to log the full stacktrace; that change is part of 1.5.6. On 07/10/2019 09:51, Daniel Ha

Re: kinesis consumer metrics user variables

2019-10-07 Thread Chesnay Schepler
How do the shard name and id appear in the tags when you remove the metric groups? There should be name clashes within Flink for any consumer that reads from multiple shards, since the metrics for individual shards are no longer uniquely identified. Which reporter are you using? I would rec

Re: containThrowable missing in ExceptionUtils

2019-10-07 Thread Chesnay Schepler
The listed method no longer exists and was subsumed by ExceptionUtils#findThrowable, which also gives access to the Throwable if it could be found. I have filed FLINK-14334 for updating the documentation. On 02/10/2019 15:48, Nicholas Walton wrote: Hi, I’m trying to implement a failure handl
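A short sketch of the replacement: `findThrowable` covers the old boolean check and additionally hands back the matching cause; the `isIoFailure` helper and the choice of `IOException` are illustrative:

```java
import java.io.IOException;
import java.util.Optional;
import org.apache.flink.util.ExceptionUtils;

public class FailureHandling {

    // findThrowable walks the cause chain and returns the first cause of the
    // requested type, wrapped in an Optional, so callers can both test for
    // and inspect the underlying Throwable.
    static boolean isIoFailure(Throwable t) {
        Optional<IOException> cause =
                ExceptionUtils.findThrowable(t, IOException.class);
        return cause.isPresent();
    }
}
```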

Re: kinesis consumer metrics user variables

2019-10-07 Thread Yitzchak Lieberman
Hi. Yes, I prefer to have the option to remove the new metric groups. It shouldn't cause any name clashes, as they appear in the tags. Right now I've compiled the flink kinesis connector with a boolean option to control it. On Mon, Oct 7, 2019 at 11:05 AM Chesnay Schepler wrote: > What exactly would you prefer

Re: kinesis consumer metrics user variables

2019-10-07 Thread Chesnay Schepler
What exactly would you prefer? Without the stream name and shard id you'd end up with name clashes all over the place. Why can you not aggregate them? Surely Datadog supports some way to define a wildcard when defining the tags to aggregate. On 03/10/2019 09:09, Yitzchak Lieberman wrote: H

Difficult to debug reason for checkpoint decline

2019-10-07 Thread Daniel Harper
We had an issue recently where no checkpoints were able to complete, with the following message in the job manager logs 2019-09-25 12:27:57,159 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 7041 by task 1f789ac3c5df655fe5482932b2255fd3 of job 214ccf9a

Re: Questions about how to use State Processor API

2019-10-07 Thread Chesnay Schepler
1. Only the Java API is supported. 2. As far as I can tell you are correct, the given checkpoint path isn't really used. On 04/10/2019 10:39, Tony Wei wrote: Hi, I'm recently trying to use State Processor API, but I have some questions during the development. 1. Does `OperatorTransformati