Re: Flink ML feature

2019-12-09 Thread vino yang
Hi Chandu, AFAIK, there is a project named Alink[1] which is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform. FYI Best, Vino [1]: https://github.com/alibaba/Alink Tom Blackwood 于2019年12月10日周二 下午2:07写道: > You may try Spark ML, whi

Re: Flink ML feature

2019-12-09 Thread Tom Blackwood
You may try Spark ML, which is a production ready library for ML stuff. regards. On Tue, Dec 10, 2019 at 1:04 PM chandu soa wrote: > Hello Community, > > Can you please give me some pointers for implementing Machine Learning > using Flink. > > I see Flink ML libraries were dropped in v1.9. It l

Flink SQL Kafka topic DDL ,the kafka' json field conflict with flink SQL Keywords

2019-12-09 Thread LakeShen
Hi community, when I use Flink SQL DDL ,the kafka' json field conflict with flink SQL Keywords,my thought is that using the UDTF to solve it . Is there graceful way to solve this problem?

Re: SQL for Avro GenericRecords on Parquet

2019-12-09 Thread Peter Huang
Hi Hanan, I created a fix for the problem. Would you please try it from your side? https://github.com/apache/flink/pull/10371 Best Regards Peter Huang On Tue, Nov 26, 2019 at 8:07 AM Peter Huang wrote: > Hi Hanan, > > After investigating the issue by using the test case you provided, I think

Flink ML feature

2019-12-09 Thread chandu soa
Hello Community, Can you please give me some pointers for implementing Machine Learning using Flink. I see Flink ML libraries were dropped in v1.9. It looks like ML feature in Flink going to be enhanced. What is the recommended approach for implementing production grade ML based apps using Flink

Re: Emit intermediate accumulator state of a session window

2019-12-09 Thread chandu soa
Thank you all for your responses. I've created a custom trigger similar to flink provided EventTimeTrigger, with few changes. Fire event on onElement(), and do not fire event on onEventTime() to satisfy my requirement - whenever new event arrives fire incremental result(result of AggregateFunction

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread Yang Wang
Hi Li Peng, You are running standalone session cluster or per-job cluster on kubernetes. Right? If so, i think you need to check your log4j.properties in the image, not local. The log is stored to /opt/flink/log/jobmanager.log by default. If you are running active Kubernetes integration for a fre

Re: Apache Flink - Retries for async processing

2019-12-09 Thread Jingsong Li
Hi M Singh, Our internal has this scenario too, as far as I know, Flink does not have this internal mechanism in 1.9 too. I can share my solution: - In async function, start a thread factory. - Send the call to thread factory when this call has failed. Do refresh security token too. Actually, deal

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread Yun Tang
Hi Peng What kind of deployment of K8s did you try in flink-doc[1], if using session mode, you can control your log4j configuration via configmap [2]. From my experience, this could control the log4j well. If you did not override the command of flink docker, it will start-foreground the taskma

Re: StreamingFileSink doesn't close multipart uploads to s3?

2019-12-09 Thread Jingsong Li
Hi Kostas, I took a look to StreamingFileSink.close, it just delete all temporary files. I know it is for failover. When Job fail, it should just delete temp files for next restart. But for testing purposes, we just want to run a bounded streaming job. If there is no checkpoint trigger, no one wi

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread vino yang
Hi Li, A potential reason could be conflicting logging frameworks. Can you share the log in your .out file and let us know if the print format of the log is the same as the configuration file you gave. Best, Vino Li Peng 于2019年12月10日周二 上午10:09写道: > Hey folks, I noticed that my kubernetes flink

Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread Li Peng
Hey folks, I noticed that my kubernetes flink logs (reached via *kubectl logs *) completely ignore any of the configurations I put into /flink/conf/. I set the logger level to WARN, yet I still see INFO level logging from flink loggers like org.apache.flink.runtime.checkpoint.CheckpointCoordinator.

Apache Flink - Retries for async processing

2019-12-09 Thread M Singh
Hi Folks: I am working on a project where I will be using Flink's async processing capabilities.  The job has to make http request using a token.  The token expires periodically and needs to be refreshed. So, I was looking for patterns for handling async call failures and retries when the token

Re: StreamingFileSink doesn't close multipart uploads to s3?

2019-12-09 Thread Kostas Kloudas
Hi Li, This is the expected behavior. All the "exactly-once" sinks in Flink require checkpointing to be enabled. We will update the documentation to be clearer in the upcoming release. Thanks a lot, Kostas On Sat, Dec 7, 2019 at 3:47 AM Li Peng wrote: > > Ok I seem to have solved the issue by e

Re: User program failures cause JobManager to be shutdown

2019-12-09 Thread 김동원
Hi Robert, Yeah, I know. For the moment, I warned my colleagues not to call System.exit() :-) But it needs to be implemented for the sake of Flink usability as you described in the issue. Thanks a lot for taking care of this issue. Best, Dongwon > 2019. 12. 9. 오후 9:55, Robert Metzger 작성: >

Re: User program failures cause JobManager to be shutdown

2019-12-09 Thread Robert Metzger
Hey Dongwon, I filed a ticket: https://issues.apache.org/jira/browse/FLINK-15156 This does not mean it will be implemented anytime soon :) On Mon, Dec 9, 2019 at 2:25 AM Dongwon Kim wrote: > Hi Robert and Roman, > Yeah, letting users know System.exit() is called would be much more > appropriate

Re: Change Flink binding address in local mode

2019-12-09 Thread Andrea Cardaci
On Mon, 9 Dec 2019 at 12:54, Chesnay Schepler wrote: > At this point I would suggest to file a ticket Here it is: https://issues.apache.org/jira/browse/FLINK-15154

Re: Change Flink binding address in local mode

2019-12-09 Thread Chesnay Schepler
At this point I would suggest to file a ticket; these are the options that _should_ control the behavior but apparently aren't in all cases. On 08/12/2019 12:23, Andrea Cardaci wrote: Hi, Flink (or some of its services) listens on three random TCP ports during the local[1] execution, e.g., 399

Re: KeyBy/Rebalance overhead?

2019-12-09 Thread Arvid Heise
Hi Komal, as a general rule of thumb, you want to avoid network shuffles as much as possible. As vino pointed out, you need to reshuffle, if you need to group by key. Another frequent usecase is for a rebalancing of data in case of a heavy skew. Since neither applies to you, removing the keyby is

Job manager is failing to start with an S3 no key specified exception [1.7.2]

2019-12-09 Thread Kumar Bolar, Harshith
Hi all, I'm running a standalone Flink cluster with Zookeeper and S3 for high availability storage. All of a sudden, the job managers started failing with an S3 `UnrecoverableS3OperationException` error. Here is the full error trace - ``` java.lang.RuntimeException: org.apache.flink.runtime.cl

Re: KeyBy/Rebalance overhead?

2019-12-09 Thread vino yang
Hi Komal, Actually, the main factor about choosing the type of the partition depends on your business logic. If you want to do some aggregation logic based on a group. You must choose KeyBy to guarantee the correctness semantics. Best, Vino Komal Mariam 于2019年12月9日周一 下午5:07写道: > Thank you @vin

Re: Sample Code for querying Flink's default metrics

2019-12-09 Thread vino yang
Hi Pankaj, > Is there any sample code for how to read such default metrics? Is there any way to query the default metrics, such as CPU usage and Memory, without using REST API or Reporters? What's your real requirement? Can you use code to call REST API? Why does it not match your requirements?

Re: KeyBy/Rebalance overhead?

2019-12-09 Thread Komal Mariam
Thank you @vino yang for the reply. I suspect keyBy will beneficial in those cases where my subsequent operators are computationally intensive. Their computation time being > than network reshuffling cost. Regards, Komal On Mon, 9 Dec 2019 at 15:23, vino yang wrote: > Hi Komal, > > KeyBy(Hash