Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-10 Thread Yang Wang
Could you find the logs under /opt/flink/log/jobmanager.log? If not, please share the commands the JobManager and TaskManager are using? If the command is correct and the log4j under /opt/flink/conf is expected, it is so curious why we could not get the logs. Best, Yang Li Peng 于2019年12月11日周三 下

Re: Event Timestamp corrupted by timezone

2019-12-10 Thread Jingsong Li
Hi Wojciech, You can try 1.9/1.10 with blink planner. As Timo said, the timestamp is TimestampType without time zone in new type system and Blink planner support it. And you should use LocalDateTime in sink/down stream, LocalDateTime has no time zone. Best, Jingsong Lee On Tue, Dec 10, 2019 at

Request for removal from subscription

2019-12-10 Thread L Jainkeri, Suman (Nokia - IN/Bangalore)
Unsubscribe

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-10 Thread Li Peng
Ah I see. I think the Flink app is reading files from /opt/flink/conf correctly as it is, since changes I make to flink-conf are picked up as expected, it's just the log4j properties that are either not being used, or don't apply to stdout or whatever source k8 uses for its logs? Given that the pod

Re: Processing Events by custom rules kept in Broadcast State

2019-12-10 Thread vino yang
Hi KristoffSC, It seems the main differences are when to parse your rules and what could be put into the broadcast state. IMO, multiple solutions all can take effect. I prefer option 3. I'd like to parse the rules ASAP and let them be real rule event stream (not ruleset stream) in the source. The

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-10 Thread Yun Tang
Sure, /opt/flink/conf is mounted as a volume from the configmap. Best Yun Tang From: Li Peng Date: Wednesday, December 11, 2019 at 9:37 AM To: Yang Wang Cc: vino yang , user Subject: Re: Flink on Kubernetes seems to ignore log4j.properties 1. Hey Yun, I'm calling /opt/flink/bin/standalone-job

Re: Thread access and broadcast state initialization in BroadcastProcessFunction

2019-12-10 Thread vino yang
Hi kristoffSC, >> I've noticed that all methods are called by the same thread. Would it be always the case, or could those methods be called by different threads? No, open/processXXX/close methods are called in the different stages of a task thread's life cycle. The framework must keep the call o

Re: Flink ML feature

2019-12-10 Thread vino yang
Hi Benoit, I can only try to ping @Till Rohrmann @Kurt Young who may know more information to answer this question. Best, Vino Benoît Paris 于2019年12月10日周二 下午7:06写道: > Is there any information as to whether Alink is going to be contributed to > Apache Flink as the official ML Lib? > > > On T

Re: Apache Flink - Retries for async processing

2019-12-10 Thread Zhu Zhu
Hi M Singh, I think you would be able to know the request failure cause and whether it is recoverable or not. You can handle the error as you like. For example, if you think the error is unrecoverable, you can complete the ResultFuture exceptionally to expose this failure to Flink framework. If th

Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

2019-12-10 Thread Yangze Guo
Thanks for the feedback, Yang. Some updates I want to share in this thread. I have built a PoC version of Meos e2e test with WordCount workflow.[1] Then, I ran it in the testing environment. As the result shown here[2]: - For pulling image from DockerHub, it took 1 minute and 21 seconds - For buil

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-10 Thread Li Peng
1. Hey Yun, I'm calling /opt/flink/bin/standalone-job.sh and /opt/flink/bin/taskmanager.sh on my job and task managers respectively. It's based on the setup described here: http://shzhangji.com/blog/2019/08/24/deploy-flink-job-cluster-on-kubernetes/ . I haven't tried the configmap approach yet, doe

Re: KeyBy/Rebalance overhead?

2019-12-10 Thread Komal Mariam
Thank you so much for the detailed reply. I understand the usage for keyBy a lot better now. You are correct about the time variation too. We will apply different network settings and extend our datasets to check performance on different use cases. On Mon, 9 Dec 2019 at 20:45, Arvid Heise wrote:

Interval Join Late Record Metrics

2019-12-10 Thread Chris Gillespie
Hello Flink users, first time poster here. I'm using an interval join in my Flink project, however I haven't found where late records get logged in metrics. Window Joins have "numLateRecordsDropped" implemented

Re: Help to Understand cutoff memory

2019-12-10 Thread Theo Diefenthal
Hi Lu, I found this talk on last Flink Forward in Berlin very helpful in order to understand JVM RAM and cutoff memory [1]. Maybe it helps you understand that stuff better. In my experiences on YARN, the author was totally correct. I was able to reproduce that by assigning something about 12G

Processing Events by custom rules kept in Broadcast State

2019-12-10 Thread KristoffSC
Hi, I think this would be the very basic use case for Broadcast State Pattern but I would like to know what are the best approaches to solve this problem. I have an operator that extends BroadcastProcessFunction. The brodcastElement is an element sent as Json format message by Kafka. It describes

Help to Understand cutoff memory

2019-12-10 Thread Lu Niu
Hi, flink users I have some question regarding memory allocation. According to doc, containerized.heap-cutoff-ratio means: ``` Percentage of heap space to remove from containers (YARN / Mesos), to compensate for other JVM memory usage ``` However, I find cutoff memory is actually treated as "part

Thread access and broadcast state initialization in BroadcastProcessFunction

2019-12-10 Thread KristoffSC
Hi, I was playing around with BroadcastProcessFunction and I've observe a specific behavior. My setup: MapStateDescriptor ruleStateDescriptor = new MapStateDescriptor<>( "RulesBroadcastState", Types.VOID, TypeInformation.of(new TypeHint() {

Re: Need help using AggregateFunction instead of FoldFunction

2019-12-10 Thread Devin Bost
I did confirm that I got no resulting output after 20 seconds and after sending additional data after waiting over a minute between batches of data. My code looks like this: PulsarSourceBuilder builder = PulsarSourceBuilder .builder(new SimpleStringSchema()) .serviceUrl(SERVICE_URL)

Apache Flink - Clarifications about late side output

2019-12-10 Thread M Singh
Hi: I have a few questions about the side output late data.   Here is the API stream .keyBy(...) <- keyed versus non-keyed windows .window(...) <- required: "assigner" [.trigger(...)]<- optional: "trigger" (else default trigger) [.

Re: Flink 'Job Cluster' mode Ui Access

2019-12-10 Thread Jatin Banger
Yes, I did. On Tue, Dec 10, 2019 at 3:47 PM Arvid Heise wrote: > Hi Jatin, > > just to be sure. Did you increase the log level to debug [1] before > checking for *StaticFileServerHandler*? > > Best, > > Arvid > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/logging.htm

Order events by filed that does not represent time

2019-12-10 Thread KristoffSC
Hi, Is it possible to use an field that does not represent timestamp to order events in Flink's pipeline? In other words, I will receive a stream of events that will ha a sequence number (gaps are possible). Can I maintain the order of those events based on this field same as I would do for time r

Re: Event Timestamp corrupted by timezone

2019-12-10 Thread Timo Walther
Hi, I hope we can solve this issues with the new type system. The core problem is the old planner uses java.sql.Timestamp which depends on the timezone of the current machine. I would recommend to set everything to UTC if possible for now. Regards, Timo On 03.12.19 18:49, Lasse Nedergaard

Re: Job manager is failing to start with an S3 no key specified exception [1.7.2]

2019-12-10 Thread Andrey Zagrebin
`flink-2`Hi Harshith, Could you share your full log files from the job master? As I understand, this stack trace already belongs to a failover attempt, what was the original cause of failover? Do you still have any other job state in S3 for this cluster id `flink-2`? Have you tried the latest vers

Re: Side output question

2019-12-10 Thread Arvid Heise
There is no clear reference as it's not a use case that has occurred yet. I'd be careful with all metrics related to output. Shuffle service should be fine [1] as side-output also go over it. I wouldn't be surprised if currentOutputWatermark is not updated though. [1] https://ci.apache.org/project

Re: Apache Flink - Retries for async processing

2019-12-10 Thread M Singh
Thanks Jingsong for sharing your solution. Since both refreshing the token and the actual API request can fail with either recoverable and unrecoverable exceptions, are there any patterns for retrying both and making the code robust to failures. Thanks again. On Monday, December 9, 2019, 10:

Re: Side output question

2019-12-10 Thread M Singh
Thanks Arvid for your answer. Can you please point me to any documentation/reference as to which metrics might be impacted ? Also, let me know of any other pitfall. Once again, I appreciate your help. On Tuesday, December 10, 2019, 03:23:01 AM EST, Arvid Heise wrote: Hi Mans, there sho

Re: Flink ML feature

2019-12-10 Thread Benoît Paris
Is there any information as to whether Alink is going to be contributed to Apache Flink as the official ML Lib? On Tue, Dec 10, 2019 at 7:11 AM vino yang wrote: > Hi Chandu, > > AFAIK, there is a project named Alink[1] which is the Machine Learning > algorithm platform based on Flink, developed

Re: Basic question about flink programms

2019-12-10 Thread KristoffSC
Hi Arvid Heise-3, Thanks for your answer. I took this approach. I did not want to start a new thread since I wanted to avoid "subject duplication" :) Regards, Krzysztof -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Need help using AggregateFunction instead of FoldFunction

2019-12-10 Thread Arvid Heise
getResult will only be called when the window is triggered. For a fixed-time window, it triggers at the end of the window. However, for EventTimeSessionWindows you need to have gaps in the data. Can you verify that there is actually a 20sec pause inbetween data points for your keys? Additionally,

Re: Flink 'Job Cluster' mode Ui Access

2019-12-10 Thread Arvid Heise
Hi Jatin, just to be sure. Did you increase the log level to debug [1] before checking for *StaticFileServerHandler*? Best, Arvid [1] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/logging.html#configuring-log4j On Mon, Dec 9, 2019 at 7:54 AM Jatin Banger wrote: > Hi, > >

Re: Basic question about flink programms

2019-12-10 Thread Arvid Heise
Hi KristoffSC, it would be better if you'd open up a new thread. It's very rare for users to check user lists after 1 year on a regular basis. In general, if you have a cache, you usually don't want to serialize it. So add the cache as a field inside the respective function (rewrite a lambda to a

Re: Flink authentication hbase use kerberos

2019-12-10 Thread Aljoscha Krettek
Hi, I believe that accessing a Kerberos-secured HBase only works from a kerberized YARN, because you need the key tab shipping. But I’m not 100 % sure. Best, Aljoscha > On 4. Dec 2019, at 07:41, venn wrote: > > Hi Guys: > I wonder about, it is work that flink on yarn deploy on no

Re: Side output question

2019-12-10 Thread Arvid Heise
Hi Mans, there should be no issue to only have side-outputs in your operator. There should also be no big drawbacks. I guess mostly some metrics will not be properly populated, but you can always populate them manually or add new ones. Best, Arvid On Mon, Dec 2, 2019 at 8:40 PM M Singh wrote: