Re: Consume only a few Kafka topic partitions

2019-04-02 Thread shengjk1
Hi Marke, I understand that you want to consume only specified partitions of a Kafka topic. If so, current Flink (1.7.x/1.8.x) does not support this; see https://issues.apache.org/jira/browse/FLINK-11257 Best, Shengjk1 On 03/8/2019 01:03, Marke Builder wrote: Hi, it is possi

Re: Use different functions for different signal values

2019-04-02 Thread Hequn Cheng
Hi Marke, Ken is right. We can use split and select to achieve what you want. Besides, I am wondering: if there are a lot of ingested signals with unique IDs, why not use one function and process the different logic inside that function? For example, we can call different methods in the function according
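Hequn's alternative (one function that dispatches internally by signal id) can be sketched in plain Java; the signal ids and handler bodies below are hypothetical, and this is not Flink API code:

```java
import java.util.Map;
import java.util.function.Function;

// Sketch: a single processing function that looks up per-signal logic
// in a handler table instead of splitting the stream (hypothetical ids).
public class DispatchSketch {
    static final Map<String, Function<Double, String>> HANDLERS = Map.of(
            "Signal1", v -> "function1:" + v,
            "Signal2", v -> "function1:" + v,
            "Signal3", v -> "function2:" + (v * 2));

    static String process(String signalId, double value) {
        // Unknown signal ids fall through to a default handler.
        return HANDLERS.getOrDefault(signalId, v -> "unhandled").apply(value);
    }

    public static void main(String[] args) {
        System.out.println(process("Signal1", 1.0)); // function1:1.0
        System.out.println(process("Signal3", 1.0)); // function2:2.0
    }
}
```

Inside a real Flink function the dispatch table would live in the operator and be consulted per element; the advantage is that adding a new signal type does not change the job topology.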

Re: Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple

2019-04-02 Thread Timothy Victor
Flink needs type information for serializing and deserializing objects, and that is lost due to Java type erasure. The only way to work around this is to specify the return type of the function called in the lambda. Fabian's answer here explains it well: https://stackoverflow.com/questions/50945
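The underlying mechanism can be shown with plain JDK reflection: a lambda's generic signature is erased at runtime, while an anonymous subclass keeps its type arguments in class metadata. This is a self-contained sketch of that distinction (the `TypedFunction` class is illustrative, not a Flink type):

```java
import java.lang.reflect.ParameterizedType;
import java.util.function.Function;

public class ErasureDemo {
    // An anonymous subclass records its generic arguments in the class file,
    // which is why type extraction works for anonymous classes but not lambdas.
    abstract static class TypedFunction<T, R> implements Function<T, R> {}

    public static void main(String[] args) {
        // Lambda: the generic signature is erased; nothing is recoverable.
        Function<String, Integer> lambda = s -> s.length();
        System.out.println(lambda.getClass().getGenericSuperclass()); // class java.lang.Object

        // Anonymous subclass: the actual type arguments survive.
        TypedFunction<String, Integer> anon = new TypedFunction<String, Integer>() {
            @Override public Integer apply(String s) { return s.length(); }
        };
        ParameterizedType t = (ParameterizedType) anon.getClass().getGenericSuperclass();
        System.out.println(t.getActualTypeArguments()[0]); // class java.lang.String
        System.out.println(t.getActualTypeArguments()[1]); // class java.lang.Integer
    }
}
```

This is why Flink asks you to state the return type explicitly (e.g. via a type hint) when a lambda is used: there is simply nothing left to reflect on.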

Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple

2019-04-02 Thread Vijay Balakrishnan
Hi, I am trying to use a KeyedStream with Tuple to handle different types of Tuples, including Tuple6. I keep getting the exception: *Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: Usage of class Tuple as a type is not allowed. Use a concrete subclass (e.g. Tu

Re: Use different functions for different signal values

2019-04-02 Thread Ken Krugler
Hi Marke, You can use DataStream.split() to create a SplitStream, and then call SplitStream.select() to create the three different paths to the three functions. See https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#datastream-transformations
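The routing that split/select performs can be sketched in plain Java (this is not the Flink API; the path names mirror the example in the question). Note, hedged: later Flink versions deprecate split/select in favor of side outputs, so treat this only as an illustration of the semantics:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of the split/select idea: a selector assigns each element a path
// name, and "selecting" a path retrieves the elements routed to it.
public class SplitSketch {
    static <T> Map<String, List<T>> split(List<T> stream, Function<T, String> selector) {
        Map<String, List<T>> paths = new HashMap<>();
        for (T element : stream) {
            paths.computeIfAbsent(selector.apply(element), k -> new ArrayList<>()).add(element);
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> signals = Arrays.asList("Signal1", "Signal2", "Signal3", "Signal4");
        // Signal1/Signal2 go to function1's path, Signal3/Signal4 to function2's.
        Map<String, List<String>> paths = split(signals,
                s -> (s.equals("Signal1") || s.equals("Signal2")) ? "function1" : "function2");
        System.out.println(paths.get("function1")); // [Signal1, Signal2]
        System.out.println(paths.get("function2")); // [Signal3, Signal4]
    }
}
```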

Re: long lived standalone job session cluster in kubernetes

2019-04-02 Thread Till Rohrmann
Hi Heath, I think some of the PRs are already open and ready for review [1, 2]. [1] https://issues.apache.org/jira/browse/FLINK-10932 [2] https://issues.apache.org/jira/browse/FLINK-10935 Cheers, Till On Wed, Feb 27, 2019 at 10:48 AM Heath Albritton wrote: > Great, my team is eager to get sta

Re: Metrics for received records per TaskManager

2019-04-02 Thread Yun Tang
Hi Benjamin, Flink can report its metrics to external systems such as Prometheus, Graphite and so on [1]. You could then use a web front end such as Grafana to query those systems. Take the `numBytesInLocalPerSecond` metric for example: it has many metric tags, and one of them i

Re: BucketAssigner - Confusion

2019-04-02 Thread Jeff Crane
According to my IDE (JetBrains), I get an error with the getBucketID(IN, Context) signature requiring a return type of String (Flink 1.7 libs), so I still don't think the BucketID is a variable type. I still don't understand the role of: public SimpleVersionedSerializer getSerializer() {

Re: Metrics for received records per TaskManager

2019-04-02 Thread Benjamin Burkhardt
Hi Yun, thank you for the advice, but how would you suggest getting the metrics for each TaskManager as well? I do not urgently need to use REST because I'm running my code within Flink. Maybe there is another way to access it? Thanks a lot. Benjamin On 2 Apr 2019, 18:26 +0200, Y

Re: Metrics for received records per TaskManager

2019-04-02 Thread Yun Tang
Hi Benjamin, Try this: http://localhost:8081/jobs/{job-id}/vertices/{vertices-id}/subtasks/{subtask-index}/metrics?get=numBytesInLocalPerSecond You could GET http://localhost:8081/jobs/

Use different functions for different signal values

2019-04-02 Thread Marke Builder
Hi, I want to implement the following behavior: [image: image.png] There are a lot of ingest signals with unique IDs, and I would like to use a dedicated function for each set of signals, e.g. Signal1, Signal2 ==> function1; Signal3, Signal4 ==> function2. What is the recommended way to implement this pattern? T

Re: How to run a job with job cluster mode on top of mesos?

2019-04-02 Thread Jacky Yin 殷传旺
Yes, it worked for me. However, just like what you said, it is not that straightforward, so I would like to learn from `StandaloneJobClusterEntrypoint` and try to enhance the `MesosJobClusterEntrypoint`. 😊 Jacky Yin From: Till Rohrmann Date: Tuesday, 2 April 2019, 10:50 PM To: Jacky Yin 殷传旺 Cc: "user@f

Re: How to run a job with job cluster mode on top of mesos?

2019-04-02 Thread Till Rohrmann
By the way, did the Mesos job mode work for you in the end? On Tue, Apr 2, 2019 at 7:47 AM Till Rohrmann wrote: > Sure I will help with the review. Thanks for opening the PR Jacky. > > Cheers, > Till > > On Tue, Apr 2, 2019 at 2:17 AM Jacky Yin 殷传旺 wrote: > >> Hello Till, >> >> >> >> I submitte

Re: How to run a job with job cluster mode on top of mesos?

2019-04-02 Thread Till Rohrmann
Sure, I will help with the review. Thanks for opening the PR Jacky. Cheers, Till On Tue, Apr 2, 2019 at 2:17 AM Jacky Yin 殷传旺 wrote: > Hello Till, > > > > I submitted a PR (#8084) for this issue. Could you help review it? > > > > Many thanks! > > > > *Jacky Yin* > > *From:* Till Rohrmann *Date*

Re: kafka corrupt record exception

2019-04-02 Thread Dominik Wosiński
Hey, as far as I understand, the error is not caused by the deserialization but by the polling of the message, so a custom deserialization schema won't really help in this case. There seems to be an error in the messages in your topic. You can see here

Re: Metrics for received records per TaskManager

2019-04-02 Thread Benjamin Burkhardt
Hi Yun, thanks for the hint. I tried to access the metric through the REST API by calling http://localhost:8081/taskmanagers/2264f296385854f2d1fb4d121495822a/metrics?get=numBytesInRemotePerSecond. Unfortunately the metric is not available... Only these are available: [{"id":"Status.Network.Avai

Re: Any suggestions about which GC collector to use in Flink?

2019-04-02 Thread qi luo
+1. It would be great if someone could benchmark the different GCs in Flink (we may do it in the next few months). I'm told that the default parallel GC provides better throughput but longer pauses (we encountered 2min+ GC pauses on large datasets), whereas G1GC provides shorter pauses but also

End to End Performance Testing

2019-04-02 Thread WILSON Frank
Hi, I am looking for resources to help me test the performance of my Flink pipelines end-to-end. I want to verify that my pipelines meet throughput and latency requirements (so, for a given number of events per second, the latency of the output is under so many seconds). I read that Alibaba had d

InvalidProgramException when trying to sort a group within a dataset

2019-04-02 Thread Papadopoulos, Konstantinos
Hi all, I am trying to sort a group within a dataset using KeySelector as follows: in .groupBy("productId", "timePeriodId", "geographyId") .sortGroup(new KeySelector() { @Override public Double getKey(ThresholdAcvFact thresholdAcvFact) throws Exception { return Optional.ofNul
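The "group by key, then sort within each group" shape can be illustrated with plain Java streams (this is not the Flink DataSet API; the `Fact` record below is a hypothetical stand-in for ThresholdAcvFact):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: group records by a key, then sort each group by a value field,
// mirroring what groupBy(...).sortGroup(...) expresses in Flink's DataSet API.
public class GroupSortSketch {
    record Fact(int productId, double acv) {}

    static Map<Integer, List<Fact>> groupAndSort(List<Fact> facts) {
        return facts.stream()
                .collect(Collectors.groupingBy(Fact::productId,
                        Collectors.collectingAndThen(Collectors.toList(), group -> {
                            group.sort(Comparator.comparingDouble(Fact::acv));
                            return group;
                        })));
    }

    public static void main(String[] args) {
        List<Fact> facts = List.of(new Fact(1, 3.0), new Fact(1, 1.0), new Fact(2, 2.0));
        Map<Integer, List<Fact>> grouped = groupAndSort(facts);
        System.out.println(grouped.get(1)); // group 1 sorted ascending by acv
    }
}
```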

Re: How to run a job with job cluster mode on top of mesos?

2019-04-02 Thread Jacky Yin 殷传旺
Hello Till, I submitted a PR (#8084) for this issue. Could you help review it? Many thanks! Jacky Yin From: Till Rohrmann Date: Friday, 29 March 2019, 11:06 PM To: Jacky Yin 殷传旺 Cc: "user@flink.apache.org" Subject: Re: How to run a job with job cluster mode on top of mesos? Thanks a lot Jacky. Cheers, Till

Re: Metrics for received records per TaskManager

2019-04-02 Thread Yun Tang
Hi Benjamin, I think 'numBytesInLocalPerSecond' and 'numBytesInRemotePerSecond', which indicate 'the number of bytes this task reads from a local source per second' and 'the number of bytes this task reads from a remote source per second' respectively, could help you. If you want to track the info

Re: How can I get the right TaskExecutor in ProcessFunction

2019-04-02 Thread Andrey Zagrebin
Hi, what kind of information do you need about the TaskExecutor? This is usually quite low-level information which might change randomly, e.g. after a restore. What is the original problem for which you need it? Maybe there is another solution, e.g. you can get the index of the local parallel subtask of

Re: kafka corrupt record exception

2019-04-02 Thread Ilya Karpov
According to the docs (here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#the-deserializationschema , last paragraph) that's an expected behaviour. May be

Any suggestions about which GC collector to use in Flink?

2019-04-02 Thread 徐涛
Hi Experts, In my environment, when I submit the Flink program to YARN, I do not specify which GC collector to use. On the web monitor page, I found it uses PS_Scavenge as the young-generation GC collector and PS_MarkSweep as the old-generation GC collector. I wonder if I can use G1 as the
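One way to switch the JVM's collector is to pass the G1 flag through Flink's `env.java.opts` setting in `flink-conf.yaml`; this is a minimal sketch, assuming the extra options shown (heap-region size, pause target) suit your workload, which you would need to verify:

```yaml
# flink-conf.yaml — request G1 for the Flink JVMs (TaskManagers and JobManager).
# The tuning flags after -XX:+UseG1GC are illustrative, not recommendations.
env.java.opts: "-XX:+UseG1GC -XX:MaxGCPauseMillis=200"
```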

Metrics for received records per TaskManager

2019-04-02 Thread Benjamin Burkhardt
Hi all, I'm looking for a metric which allows me to keep track of the records or bytes each TaskManager has received or processed for the current task. Can anyone help me get this? Thanks. Benjamin