Re: Apache Flink - Throttling stream flow

2019-11-29 Thread M Singh
Thanks Rong for your references.  >From what I can see, the rate limiter is initialized statically.  But if the >load on downstream services varies, is there a way to update the rater limiter >at runtime ?  Please let me know if I missed anything. Thanks again for your advice.On Wednesday,

Re: Apache Flink - Troubleshooting exception while starting the job from savepoint

2019-11-29 Thread M Singh
Thanks Congxian for your references.  Mans On Wednesday, November 27, 2019, 07:12:57 AM EST, Congxian Qiu wrote: Hi, As the doc[1] said we should assign uid to all the stateful operators. If you do not set uid for an operator, Flink will generate an operatorId for it, AFAIK, operatorI

Re: Some doubts about window start time and end time

2019-11-29 Thread Dawid Wysakowicz
Hi Jun, First of all how do you actually get the ranges? TimeWindow#getStart returns a long, which you must've interpreted somehow. I will try to describe my suspicions. I think the core problem is that you pass the input in one timezone, but interpret in a different one. (the processing time uses

[DISCUSSION] Using `DataStreamUtils.reinterpretAsKeyedStream` on a pre-partitioned Kafka topic

2019-11-29 Thread Robin Cassan
Hi all! We are trying to build a Flink job that consumes a Kafka topic, groups the incoming events in Session Windows according to a String that can be generated by parsing the message (we call it `SessionKey`) and does some processing on the windows before sending them to another Kafka topic. Our

Re: Read multiline JSON/XML

2019-11-29 Thread Flavio Pompermaier
Parallel files processing would be enough, inner file parallelism would be awesome but it's a plus On Fri, Nov 29, 2019 at 3:46 PM Arvid Heise wrote: > A while ago, I implemented XML and Json input formats. However, having > proper split support for structured formats without sync markers is not

Re: Read multiline JSON/XML

2019-11-29 Thread Arvid Heise
A while ago, I implemented XML and Json input formats. However, having proper split support for structured formats without sync markers is not that easy. Any split that has a random start offset need to figure out the start of the next record on its own, which is fragile by definition. That's why s

Re: Read multiline JSON/XML

2019-11-29 Thread Chesnay Schepler
I know that at least the Table API can read json, but I don't know how well this translates into other APIs. On 29/11/2019 12:09, Flavio Pompermaier wrote: Hi to all, is there any out-of-the-box opt

Re: Read multiline JSON/XML

2019-11-29 Thread Suneel Marthi
For XML, u could look at Mahout's XMLInputFormat (if u r using HadoopInput Format). On Fri, Nov 29, 2019 at 9:01 AM Chesnay Schepler wrote: > Why vino? > > He's specifically asking whether Flink offers something _like_ spark. > > On 29/11/2019 14:39, vino yang wrote: > > Hi Flavio, > > IMO, it w

Re: Read multiline JSON/XML

2019-11-29 Thread Chesnay Schepler
Why vino? He's specifically asking whether Flink offers something _like_ spark. On 29/11/2019 14:39, vino yang wrote: Hi Flavio, IMO, it would take more effect to ask this question in the Spark user mailing list. WDYT? Best, Vino Flavio Pompermaier > 于2019年11月

Re: Read multiline JSON/XML

2019-11-29 Thread vino yang
Hi Flavio, IMO, it would take more effect to ask this question in the Spark user mailing list. WDYT? Best, Vino Flavio Pompermaier 于2019年11月29日周五 下午7:09写道: > Hi to all, > is there any out-of-the-box option to read multiline JSON or XML like in > Spark? > It would be awesome to have something

Re: How to recover state from savepoint on embedded mode?

2019-11-29 Thread Biao Liu
Hi Reo, Maybe we could find another way. > why I am not use the standalnoe mode to run the job is because the running env haven't zookeeper, and would not install the zookeeper. So I need to depend on the embedded mode to run my job. You could set up a standalone cluster without zookeeper. Do no

Re: Apache Flink - High Available K8 Setup Wrongly Marked Dead TaskManager

2019-11-29 Thread Chesnay Schepler
Does this happen regularly? As in, the cluster initially runs fine and around the same time-frame runs into problems? Can you provide the full logs for the task and jobmanager? On 29/11/2019 08:42, Eray Arslan wrote: Hi Chesnay, Thank you for reply. I figure out that issue with using livenessP

Read multiline JSON/XML

2019-11-29 Thread Flavio Pompermaier
Hi to all, is there any out-of-the-box option to read multiline JSON or XML like in Spark? It would be awesome to have something like spark.read .option("multiline", true) .json("/path/to/user.json") Best, Flavio

Re: Flink 'Job Cluster' mode Ui Access

2019-11-29 Thread Chesnay Schepler
To clarify, you ran "mvn package -pl flink-dist -am" to build Fink? If so, could you run that again and provide us with the maven output? | | On 29/11/2019 11:23, Jatin Banger wrote: Hi, @vino yang   I am using flink 1.8.1 I am using the following procedure for th

Re: How to recover state from savepoint on embedded mode?

2019-11-29 Thread Dawid Wysakowicz
Hi, I would like to clarify previous responses a bit. 1. From the architectural point of view yes it is true it is possible to restore from a savepoint from a local jvm as long as this jvm has access to the checkpoint. 2. Unfortunately the configuration you pass to the ctor of LocalStreamEnviron

Re: ProcessFunction collect and close, when to use?

2019-11-29 Thread Chesnay Schepler
1) You should never call close() on the collector; Flink will do that automatically. 2) No, it shouldn't block anything. Flink will look at the next record to process, notice it's a barrier and pass it on immediately. On 29/11/2019 05:29, shuwen zhou wrote: Hi Community, In ProcessFunction cla

Temporary failure in name resolution on JobManager

2019-11-29 Thread David Maddison
I have a Flink 1.7 cluster using the "flink:1.7.2" (OpenJDK build 1.8.0_222-b10) image on Kubernetes. As part of a MasterRestoreHook (for checkpointing) the JobManager needs to communicate with an external security service. This all works well until there's a DNS lookup failure (due to network is

Re: Flink 'Job Cluster' mode Ui Access

2019-11-29 Thread Jatin Banger
Hi, @vino yang I am using flink 1.8.1 I am using the following procedure for the deployment: https://github.com/apache/flink/blob/master/flink-container/docker/README.md And i tried accessing the path you mentioned: # curl :4081/#/overview {"errors":["Not found."]} Best Regards, Jatin On Th

Re: ArrayIndexOutOfBoundException on checkpoint creation

2019-11-29 Thread Gyula Fóra
Hi Theo! I have not seen this error before however I have encountered many strange things when using Kryo for serialization. From the stack trace it seems that this might indeed be a Kryo related issue. I am not sure what it is but what I would try is to change the state serializers to a non Kryo

Re: ProcessFunction collect and close, when to use?

2019-11-29 Thread shuwen zhou
Thank you Jiayi, that helps a lot! On Fri, 29 Nov 2019 at 13:44, bupt_ljy wrote: > Hi Shuwen, > > > > When to call close() ? After every element processed? Or > on ProcessFunction.close() ? Or never to use it? > > > IMO, the #close() function is used to manage the lifecycle of #Collector > inste