Re: How would I use OneInputStreamOperator to deal with data skew?

2019-04-17 Thread Felipe Gutierrez
Thanks for the tip! I guess now it is working as it should be. Just one last question: why did you decide to use "AbstractStreamOperator" instead of "AbstractUdfStreamOperator"?

kafka partitions, data locality

2019-04-17 Thread Smirnov Sergey Vladimirovich (39833)
Hello, We are planning to use Apache Flink as a core component of our new streaming system for internal processes (finance, banking business) based on Apache Kafka. We have started some research with Apache Flink, and one of the questions arising during that work is how Flink handles data locality

Fast restart of a job with a large state

2019-04-17 Thread Sergey Zhemzhitsky
Hi Flinkers, Operating different Flink jobs, I've discovered that restarts of jobs with a pretty large state (in my case up to 100GB+) take quite a lot of time. For example, to restart a job (e.g. to update it) a savepoint is created, and in the case of savepoints all the state seems to be pushed

Re: Scalaj vs akka as http client for Asyncio Flink

2019-04-17 Thread Andy Hoang
Hi Till, Sorry to bother you again. I managed to build and run the akka http client locally, but after deploying to the YARN node the ActorSystem can't be connected. ``` PPLogger.getActivityLogger.info("### 1") implicit val system = ActorSystem("my-system") PPLogger.getActivityLogger.info("

Re: Scalaj vs akka as http client for Asyncio Flink

2019-04-17 Thread Till Rohrmann
Check what Akka is logging and verify that the port you are trying to bind to is free. Cheers, Till On Wed, Apr 17, 2019 at 12:50 PM Andy Hoang wrote: > Hi Till, > > Sorry to bother you again, so I manage to build and work with akka http > client in my local > After deploy to yarn node, the a

Re: How would I use OneInputStreamOperator to deal with data skew?

2019-04-17 Thread Kurt Young
I mean, no particular reason. Best, Kurt On Wed, Apr 17, 2019 at 7:44 PM Kurt Young wrote: > There is no reason for it; the operator and function don't rely on the > logic that AbstractUdfStreamOperator supplies. > > Best, > Kurt > > > On Wed, Apr 17, 2019 at 4:35 PM Felipe Gutierrez < > fel

Re: How would I use OneInputStreamOperator to deal with data skew?

2019-04-17 Thread Kurt Young
There is no reason for it; the operator and function don't rely on the logic that AbstractUdfStreamOperator supplies. Best, Kurt On Wed, Apr 17, 2019 at 4:35 PM Felipe Gutierrez < felipe.o.gutier...@gmail.com> wrote: > Thanks for the tip! I guess now it is working as it should be >

Re: Re: Is it possible to handle late data when using table API?

2019-04-17 Thread Lasse Nedergaard
Hi Hequn, Thanks for the details. I will give it a try. Med venlig hilsen / Best regards Lasse Nedergaard > On 17 Apr 2019, at 04:09, Hequn Cheng wrote: > > Hi Lasse, > > > some devices can deliver data days back in time and I would like to have > > the results as fast as possible. > > W

Re: What is the best way to handle data skew processing in Data Stream applications?

2019-04-17 Thread Felipe Gutierrez
I guess I could implement a solution which is not static and implements Flink's OneInputStreamOperator. https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/examples/stream/MqttSensorDataSkewedCombinerByKeySkewedDAG.java#L84 Best, Felipe
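A minimal sketch of such a custom operator, assuming a Flink version from around the time of this thread (1.8.x), could look like the following; the class name SkewCombiner and the String in/out types are illustrative and not taken from the linked repository:

```java
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Illustrative sketch: a custom operator that extends AbstractStreamOperator
// and implements OneInputStreamOperator, the pattern discussed in this thread.
public class SkewCombiner extends AbstractStreamOperator<String>
        implements OneInputStreamOperator<String, String> {

    @Override
    public void processElement(StreamRecord<String> element) throws Exception {
        // A real combiner would pre-aggregate skewed keys here before
        // forwarding; this sketch simply passes the record downstream.
        output.collect(element);
    }
}
```

It would then be wired into a pipeline with something like `stream.transform("skew-combiner", Types.STRING, new SkewCombiner())`.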

Service discovery on YARN - find out which port was dynamically assigned to the JobManager Web Interface

2019-04-17 Thread Olivier Solliec
Hello, I want to be able to register a Flink cluster into a service discovery system (Consul in our case). This Flink cluster is scheduled on YARN. Is there a way to know which port was assigned to the REST interface? Via the REST API /jobmanager/config, I see a key "jobmanager.rpc.address

assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-17 Thread an00na
`assignTimestampsAndWatermarks` before `keyBy` works: ```java DataStream<Trip> trips = env.addSource(consumer).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Trip>(Time.days(1)) { @Override public long extractTimestamp(Trip trip) { ret
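A self-contained version of the working variant above might look like this; `env` and `consumer` come from the original snippet, and the `Trip` POJO with a `getEventTime()` accessor is assumed for illustration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

// Assign timestamps and watermarks directly on the source stream,
// i.e. before keyBy(), which is the variant reported to work.
DataStream<Trip> trips = env.addSource(consumer)
        .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Trip>(Time.days(1)) {
                    @Override
                    public long extractTimestamp(Trip trip) {
                        // getEventTime() is an assumed accessor on the Trip POJO.
                        return trip.getEventTime();
                    }
                });
```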

BucketingSink compressed text exactlyonce

2019-04-17 Thread Shengnan YU
Hi guys, Any good ideas to achieve an exactly-once BucketingSink for text files? Truncating a compressed binary file will corrupt the gzip file, which means I need to -text that gzip, redirect it to a text file, compress it again, and finally upload it to HDFS. It's really inefficient. Any other compress

Re: java.io.IOException: NSS is already initialized

2019-04-17 Thread Hao Sun
I think I found the root cause: https://bugs.alpinelinux.org/issues/10126 I have to re-install nss after apk update/upgrade. Hao Sun On Sun, Nov 11, 2018 at 10:50 AM Ufuk Celebi wrote: > Hey Hao, > > 1) Regarding Hadoop S3: are you using the repackaged Hadoop S3 > dependency from the /opt fold

Re: Service discovery on YARN - find out which port was dynamically assigned to the JobManager Web Interface

2019-04-17 Thread Rong Rong
As far as I know, the port will be set to random binding. YARN actually has the ability to translate the proxy link to the right node/port. If your goal is to avoid going through the YARN REST proxy, this could be a problem: there's a chance that the host/port will get changed by YARN witho
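For registering the endpoint in Consul, one possible approach (not from this thread, and subject to the caveat above that YARN may move the application master) is to ask the YARN ResourceManager for the application report and use the original tracking URL, which points at the JobManager's REST interface rather than the proxy:

```java
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: look up the REST endpoint of a Flink-on-YARN cluster by asking the
// ResourceManager for the application report. getOriginalTrackingUrl() points
// at the application master itself, while getTrackingUrl() usually goes
// through the YARN web proxy.
public class FlinkRestEndpointLookup {
    public static String restEndpoint(ApplicationId appId) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        try {
            ApplicationReport report = yarnClient.getApplicationReport(appId);
            return report.getOriginalTrackingUrl();
        } finally {
            yarnClient.stop();
        }
    }
}
```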

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-17 Thread Paul Lam
Hi, Could you check the watermark of the window operator? One possible situation is that some of the keys are not getting enough input, so their watermarks remain below the window end time and hold the window operator's watermark back. IMO, it's a good practice to assign watermarks earlier in th

flink program in a spring bean can not consume from kafka

2019-04-17 Thread jszhouch...@163.com
Hi, I met a strange issue: the same code running in a plain Java class can consume from Kafka, but when I change the Java class to a Spring bean (annotated with @Service), the program cannot consume from Kafka anymore. Has anyone met a similar problem, or how can I debug this? Thanks a lot

[Discuss] Add JobListener (hook) in flink job lifecycle

2019-04-17 Thread Jeff Zhang
Hi All, I created FLINK-12214 for adding a JobListener (hook) to the Flink job lifecycle. Since this is a new public API for Flink, I'd like to discuss it more widely in the community to get more feedback. The background and motivation is that I am int
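To make the discussion concrete, a purely hypothetical shape for such a hook could be the following; this is only an illustration of the idea, not the interface actually proposed in FLINK-12214:

```java
import org.apache.flink.api.common.JobID;

// Hypothetical illustration of a job lifecycle hook; the actual interface
// discussed in FLINK-12214 may differ.
public interface JobListener {

    /** Called when a job has been submitted to the cluster. */
    void onJobSubmitted(JobID jobId);

    /** Called when a submitted job finishes; failureCause is null on success. */
    void onJobExecuted(JobID jobId, Throwable failureCause);

    /** Called when a job is canceled by the user. */
    void onJobCanceled(JobID jobId);
}
```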

Flink Metrics

2019-04-17 Thread Brian Ramprasad
Hi, I am trying to profile my Flink job. For example, I want to output the results of the TaskIOMetricGroup to a log file. Does anyone know if there is a way to access this object at runtime and execute its methods to get the data from within the user code that I submit to Flink to start a j
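TaskIOMetricGroup is an internal class that user code normally cannot reach, but a common workaround (a sketch, not a confirmed answer from this thread) is to register user-defined metrics from a rich function; they land in the same metric system as the task I/O metrics and can be exported by any configured reporter, for example one that writes to a log file:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Sketch: a user-scope counter registered from within user code.
public class CountingMapper extends RichMapFunction<String, String> {

    private transient Counter recordsSeen;

    @Override
    public void open(Configuration parameters) {
        recordsSeen = getRuntimeContext().getMetricGroup().counter("recordsSeen");
    }

    @Override
    public String map(String value) {
        recordsSeen.inc();
        return value;
    }
}
```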

Re: [Discuss] Add JobListener (hook) in flink job lifecycle

2019-04-17 Thread vino yang
Hi Jeff, I personally like this proposal. From the perspective of programmability, the JobListener can make it much easier for third-party programs to integrate with Flink jobs. The scenario where I need the listener is the Flink cube engine for Apache Kylin. In that case, the Flink job program is embedded into Kylin's executable