Re: flink on kubernetes

2018-08-13 Thread Till Rohrmann
Hi Mingliang, I'm currently writing the updated documentation for Flink's job cluster container entrypoint. It should be ready later today. In the meantime, you can check out the `flink-container` module and its subdirectories. They already contain some information in the README.md on how to create

Re: Skip event in case of key extraction exception

2018-08-13 Thread Dawid Wysakowicz
Hi, You cannot filter out events in the KeyExtractor. What you can do though is to move the conversion logic to e.g. a flatMap function and emit only those events that were successfully converted. Then your KeyExtractor would be a single getter for the UUID. Best, Dawid On 13/08/18 07:45, Jayant Am
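A minimal sketch of Dawid's suggestion, assuming hypothetical `Event.parse` and `getUuid` helpers (not from the original thread): do the fallible conversion in a flatMap, emit only successes, then key by a getter that can no longer fail.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.util.Collector;

DataStream<Event> parsed = rawStream.flatMap(
    new FlatMapFunction<String, Event>() {
        @Override
        public void flatMap(String raw, Collector<Event> out) {
            try {
                // conversion may throw on malformed input
                out.collect(Event.parse(raw));
            } catch (Exception e) {
                // skip the malformed event instead of failing the job
            }
        }
    });

// the key extractor is now a simple, exception-free getter
KeyedStream<Event, String> keyed = parsed.keyBy(event -> event.getUuid());
```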

Re: 1.5.1

2018-08-13 Thread Juho Autio
I also have jobs failing on a daily basis with the error "Heartbeat of TaskManager with id timed out". I'm using Flink 1.5.2. Could anyone suggest how to debug the possible causes? I already set these in flink-conf.yaml, but I'm still getting failures: heartbeat.interval: 1 heartbeat.timeout: 10

Flink socketTextStream UDP connection

2018-08-13 Thread Soheil Pourbafrani
Flink socketTextStream receives data using the TCP protocol. Is there any way to get data using the UDP protocol?

Error with Cassandra

2018-08-13 Thread yuvraj singh
Hi, I am getting java.lang.NoClassDefFoundError: Could not initialize class com.datastax.driver.core.Cluster. Please help with it. my pom is 1.5.0 3.7.0 1.8 2.11 2.9.4 3.5.1 1.7.5 1.16.18 2.0.4.RELEASE 2.1.4.RELEASE UTF-8 3.11.3 3.5.1

Re: Flink socketTextStream UDP connection

2018-08-13 Thread Fabian Hueske
Hi, ExecutionEnvironment.socketTextStream is deprecated and it is very likely that it will be removed because of its limited use. I would recommend to have a look at the implementation of the SourceFunction [1] and adapt it to your needs. Best, Fabian [1] https://github.com/apache/flink/blob/master/fli
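A minimal sketch of Fabian's suggestion: a custom SourceFunction that reads UDP datagrams instead of a TCP socket. This is not an official API; the class name, the one-message-per-datagram framing, and the 1-second poll timeout are assumptions.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class UdpTextSource implements SourceFunction<String> {
    private final int port;
    private volatile boolean running = true;

    public UdpTextSource(int port) { this.port = port; }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(port)) {
            socket.setSoTimeout(1000); // wake up periodically to check 'running'
            byte[] buffer = new byte[65507]; // maximum UDP payload size
            while (running) {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                try {
                    socket.receive(packet);
                    // emit each datagram as one record
                    ctx.collect(new String(packet.getData(), 0, packet.getLength(),
                            StandardCharsets.UTF_8));
                } catch (SocketTimeoutException ignored) {
                    // no datagram within the timeout; loop and re-check 'running'
                }
            }
        }
    }

    @Override
    public void cancel() { running = false; }
}
```

Usage would be `env.addSource(new UdpTextSource(9999))` in place of `socketTextStream`.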

Flink keyed stream windows

2018-08-13 Thread Garvit Sharma
Hi, I am working on a use case where I have a stream of users' active locations and I want to store (by hitting an HTTP API) the latest active location for each of the unique users every 30 minutes. Since I have a lot of unique users (rpm 1.5 million), how to use Flink's timed windows on keyed strea

Re: Flink keyed stream windows

2018-08-13 Thread Garvit Sharma
Clarification: it's 30 seconds, not 30 minutes. On Mon, Aug 13, 2018 at 3:20 PM Garvit Sharma wrote: > Hi, > > I am working on a use case where I have a stream of users active locations > and I want to store(by hitting an HTTP API) the latest active location for > each of the unique users every 30

Re: Error with Cassandra

2018-08-13 Thread Deepak Sharma
can you try with this? 3.3.0 On Mon, Aug 13, 2018 at 1:52 PM yuvraj singh <19yuvrajsing...@gmail.com> wrote: > Hi , > > I am getting > > java.lang.NoClassDefFoundError: Could not initialize class > com.datastax.driver.core.Cluster > > please help with it . > > my pom is > > > > 1.5.0 >

Re: How to do test in Flink?

2018-08-13 Thread Chang Liu
Hi Hequn, Thanks for your reply. Just to understand: these harness tests and the example you provided actually have the same concept as the integration test with the example given here (https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/testing.html#integration-testing

Re: How to do test in Flink?

2018-08-13 Thread Chang Liu
Hi Dawid, Many Thanks :) Best regards/祝好, Chang Liu 刘畅 > On 13 Aug 2018, at 09:21, Dawid Wysakowicz wrote: > > Hi Chang, > > Just to add to how you could test the function you've posted. The Collector > is an interface so you can just implement a stub that will keep the results > in e.g.

Re: Select node for a Flink DataStream execution

2018-08-13 Thread Rafael Leira Osuna
Thanks for your reply, Vino. I've checked your solution and it may solve my requirements. However, it presents a new issue: how can 2 different jobs interact with each other? As said, I'm new at this, and all I know is the basics to create and interconnect different datastreams over the same job / java.class. I k

Re: How to do test in Flink?

2018-08-13 Thread Chang Liu
And another question: which library should I include in order to use these harnesses? I do have this flink-test-utils_2.11 in my pom, but I cannot find the harnesses. I also have the following in my pom: flink-core flink-clients_2.11 flink-scala_2.11 flink-streaming-java_2.11 flink-streaming-jav

Re: Flink keyed stream windows

2018-08-13 Thread vino yang
Hi Garvit, Please refer to the Flink official documentation for the window description. [1] In this scenario, you should use Tumbling Windows. [2] If you want to call your own API to handle the window, you can extend the process window function to achieve your needs. [3] [1]: https://ci.apache.org
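A hedged sketch of vino's suggestion for this use case: key by user, collect into a 30-second tumbling window, and keep only the latest location per user per window. `UserLocation`, its getters, and `HttpSink` are assumed names, not from the original thread.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<UserLocation> locations = ...; // the stream of active locations

locations
    .keyBy(loc -> loc.getUserId())
    .window(TumblingProcessingTimeWindows.of(Time.seconds(30)))
    // reduce keeps only the last element seen per user in each window
    .reduce((older, newer) -> newer)
    .addSink(new HttpSink()); // hypothetical sink that calls the HTTP API
```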

Re: Select node for a Flink DataStream execution

2018-08-13 Thread vino yang
Hi Rafael, Flink does not support the interaction of DataStreams in two jobs. I don't know what your scenario is. Usually, if you need two stream interactions, you can import them into the same job. You can do this through the DataStream join/connect API. [1] [1]: https://ci.apache.org/projects/flink/f
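A minimal sketch of the connect approach vino mentions, combining two streams inside one job. The types `A`, `B`, `Out` and the `Out.from` helpers are hypothetical placeholders.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.CoMapFunction;

DataStream<A> first = env.addSource(firstSource);
DataStream<B> second = env.addSource(secondSource);

// connect keeps both element types and processes them in one operator
DataStream<Out> combined = first.connect(second)
    .map(new CoMapFunction<A, B, Out>() {
        @Override
        public Out map1(A a) { return Out.from(a); } // elements of the first stream
        @Override
        public Out map2(B b) { return Out.from(b); } // elements of the second stream
    });
```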

Re: How to do test in Flink?

2018-08-13 Thread Hequn Cheng
Hi Change, Try org.apache.flink flink-streaming-java_2.11 ${flink.version} test-jar test . On Mon, Aug 13, 2018 at 6:42 PM, Chang Liu wrote: > And another question: which library should I include in order to use these > harnesses? I do have this flink-test-utils_2.11 in my pom, but I cannot >

Re: Select node for a Flink DataStream execution

2018-08-13 Thread Rafael Leira Osuna
Thanks Vino. As an example of what we need: we want 2 different tasks executed on 2 different nodes, and to choose which one is going to execute what. Given what you have taught me: - A job can be assigned to a node in 1.6 with YARN - A set of tasks (datastream) are executed inside a job - Two different

Re: Select node for a Flink DataStream execution

2018-08-13 Thread vino yang
Hi Rafael, One thing in your needs is achievable: *want 2 different tasks executed on 2 different nodes*. You can give a node a single slot and disable operator chaining, but you can't choose which node runs which task. I don't understand why you need to artificially specify this mapping. Do you have a very di

Re: Select node for a Flink DataStream execution

2018-08-13 Thread Rafael Leira Osuna
Hi Vino, The cluster is not completely homogeneous in CPU/memory, but quite similar. The real issue comes from the networking (which is what we want to test). The interfaces on each node vary from 100Mb/s to 1Gb/s, 10Gb/s, 25Gb/s, 40Gb/s and 100Gb/s (all Ethernet), and each node has more than on

Kerberos Configuration Does Not Apply To Krb5LoginModule

2018-08-13 Thread Paul Lam
Hi, I built Flink from the latest 1.5.x source code, and got some strange outputs from the command line when submitting a Flink job to the YARN cluster. 2018-08-13 19:29:47,325 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successful

Flink 1.6 ExecutionJobVertex.getTaskInformationOrBlobKey OutOfMemoryError

2018-08-13 Thread 杨力
I used to run Flink SQL in streaming mode with more than 70 SQL queries in version 1.4. With so many queries loaded, akka.framesize has to be set to 200 MB to submit the job. When I am trying to run the job with Flink 1.6.0, the HTTP-based job submission works perfectly but an OutOfMemoryError is thrown when

Introduce Barriers in stream source

2018-08-13 Thread Darshan Singh
Hi, I am implementing a source and I want to use checkpointing and would like to restore the job from these external checkpoints. I used Kafka for my tests and it worked fine. However, I would like to know if I have my own source what do I need to do. I am sure that I will need to implement Check

Re: JDBCInputFormat and SplitDataProperties

2018-08-13 Thread Fabian Hueske
Hi Alexis, Yes, the job cannot be executed until the required number of processing slots becomes available. IIRC, there is a timeout and a job gets canceled once the waiting time exceeds the threshold. Best, Fabian 2018-08-10 15:35 GMT+02:00 Alexis Sarda : > It ended up being a wrong configurat

Re: Yahoo Streaming Benchmark on a Flink 1.5 cluster

2018-08-13 Thread Chesnay Schepler
@vino Please take a deeper look at the question before answering. The benchmark implements custom operators that make use of several non-Public APIs, so API backwards compatibility simply doesn't apply here. Updating the benchmark will require a bit of work. Below are some hints I gather

Re: Tuning checkpoint

2018-08-13 Thread Fabian Hueske
Hi Mingliang, let me answer your second question first: > Another question is about the alignment buffer, I thought it was only used for multiple input stream cases. But for keyed process function , what is actually aligned? When a task sends records to multiple downstream tasks (task not operat

Re: Introduce Barriers in stream source

2018-08-13 Thread Fabian Hueske
Hi, It is sufficient to implement the CheckpointedFunction interface. Since SourceFunctions emit records in a separate thread, you need to ensure that no record is emitted while the snapshotState method is called. Flink provides a lock to synchronize data emission and state snapshotting. See the
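A sketch of the pattern Fabian describes: a source that implements CheckpointedFunction and only emits while holding the checkpoint lock, so no record is in flight during snapshotState. The counting logic and state name are illustrative, not from the thread.

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class CountingSource implements SourceFunction<Long>, CheckpointedFunction {
    private volatile boolean running = true;
    private long offset;
    private transient ListState<Long> offsetState;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // emit and advance the offset atomically w.r.t. checkpoints
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(offset);
                offset++;
            }
        }
    }

    @Override
    public void cancel() { running = false; }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // called under the same lock, so 'offset' is consistent with what was emitted
        offsetState.clear();
        offsetState.add(offset);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        offsetState = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("offset", Long.class));
        for (Long restored : offsetState.get()) {
            offset = restored; // restore position after a failure
        }
    }
}
```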

Re: Kerberos Configuration Does Not Apply To Krb5LoginModule

2018-08-13 Thread Fabian Hueske
Hi Paul, Maybe Aljoscha (in CC) can help you with this question. AFAIK, he has some experience with Flink and Kerberos. Best, Fabian 2018-08-13 14:51 GMT+02:00 Paul Lam : > Hi, > > I built Flink from the latest 1.5.x source code, and got some strange > outputs from the command line when submitt

[1.6.0] Read content of Csv file with new table connectors

2018-08-13 Thread françois lacombe
Hi all, Following the release of Flink 1.6.0, I'm trying to set up a simple CSV file reader to load its content into a SQL table. I'm using the new table connectors and this code recommended in Jira FLINK-9813 and this doc: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/conn

Re: Flink keyed stream windows

2018-08-13 Thread Ken Krugler
Hi Garvit, One other point - once you start making HTTP requests, you likely want to use an AsyncFunction to avoid the inefficiencies of your process spending most of its time waiting for the remote server to handle the request. Which means emitting the results (user + active location) from the
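A hedged sketch of Ken's suggestion: wrap the HTTP call in an async operator so the task doesn't block on the remote server. `latestLocations`, `UserLocation`, and `httpClient` are assumptions; the timeout and capacity values are arbitrary.

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

AsyncDataStream.unorderedWait(
    latestLocations,
    new RichAsyncFunction<UserLocation, String>() {
        @Override
        public void asyncInvoke(UserLocation loc, ResultFuture<String> resultFuture) {
            // hypothetical non-blocking client call executed off the task thread
            CompletableFuture
                .supplyAsync(() -> httpClient.post("/locations", loc))
                .thenAccept(resp -> resultFuture.complete(Collections.singleton(resp)));
        }
    },
    5, TimeUnit.SECONDS, // timeout per request
    100);                // max in-flight requests
```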

Limit on number of files to read for Dataset

2018-08-13 Thread Darshan Singh
Hi guys, Is there a limit on the number of files a Flink DataSet can read? My question is: will there be any sort of issues if I have, say, millions of files to read to create a single dataset? Thanks

RE: flink telemetry/metrics

2018-08-13 Thread John O
I have tried two reporter types (Graphite, JMX) Graphite metrics.reporters: grph metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter metrics.reporter.grph.host: metrics.reporter.grph.port: 2003 metrics.reporter.grph.protocol: TCP What I see on graphite

Hive Integration

2018-08-13 Thread yuvraj singh
I want to know if Flink has support for Hive. Thanks Yubraj Singh

Managed Keyed state update

2018-08-13 Thread Alexey Trenikhun
Let’s say I have Managed Keyed state - MapState> x. I initialize the state for “k0” - x.put(“k0”, new Tuple2<>(“a”, “b”)); Later I retrieve the state Tuple2 v = x.get(“k0”); and change the value: v.f0 = “U”; does it make the state ‘dirty’? In other words, do I need to call x.put(“k0”, v) again or change will

Re: UTF-16 support for TextInputFormat

2018-08-13 Thread David Dreyfus
Hi Fabian, I've added FLINK-10134. I'm not sure you'd consider it a blocker or that I've identified the right component. I'm afraid I don't have the bandwidth or knowledge to make the kind of pull request you really need. I do hope m

Re: cannot find TwoInputStreamOperatorTestHarness after upgrade to Flink 1.6.0

2018-08-13 Thread Dmitry Minaev
Thank you for your help, vino. I've resolved it, the issue was on my side, I forgot to include flink-streaming-java_2.11 with type test-jar like the following: ``` org.apache.flink flink-streaming-java_${scala.binary.version} ${flink.version} test-jar ``` Once I included it, the

Re: Managed Keyed state update

2018-08-13 Thread Renjie Liu
Hi Alexey: It depends on the state backend you use. If you use the heap memory backend, then you don't need to do the put again. However, if you use the RocksDB state backend, then you need to do the put again so that it will be saved by the checkpoint. On Tue, Aug 14, 2018 at 4:58 AM Alexey Trenikhun wrote
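A short sketch of Renjie's point, using the `x`/“k0” state from the question: with the RocksDB backend, mutating a retrieved value does not update the serialized state, so write it back explicitly. The explicit put is harmless on the heap backend, so it is the portable pattern.

```java
import org.apache.flink.api.java.tuple.Tuple2;

Tuple2<String, String> v = x.get("k0");
v.f0 = "U";
// heap backend: already visible; RocksDB backend: required so the change
// is re-serialized and included in the next checkpoint
x.put("k0", v);
```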

Re: Hive Integration

2018-08-13 Thread Renjie Liu
Hi, yuvraj: Do you mean querying hive with sql? Or anything else? On Tue, Aug 14, 2018 at 3:52 AM yuvraj singh <19yuvrajsing...@gmail.com> wrote: > I want to know ,if FLink have support for hive . > > Thanks > Yubraj Singh > -- Liu, Renjie Software Engineer, MVAD

Re: Hive Integration

2018-08-13 Thread Will Du
This is a missing piece in Flink. Flink currently does not provide Spark SQL-like integration with Hive. However, you can write the ORC/Avro formats, which can be read by Hive later. Thanks, Will > On Aug 13, 2018, at 7:34 PM, Renjie Liu wrote: > > Hi, yuvraj: > > Do you mean querying hive with sql? Or anyt

Re: Managed Keyed state update

2018-08-13 Thread Alexey Trenikhun
Clear. Thank you. From: Renjie Liu Sent: Monday, August 13, 2018 4:33 PM To: Alexey Trenikhun Cc: user@flink.apache.org Subject: Re: Managed Keyed state update Hi, Alexey: It depends on the state backend you use. If you

Re: Flink 1.6 ExecutionJobVertex.getTaskInformationOrBlobKey OutOfMemoryError

2018-08-13 Thread Hequn Cheng
Hi, Have you tried increasing the memory of the job master? If you run a Flink job on YARN, you can increase the job master's memory with "-yjm 1024m" [1]. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html#run-a-flink-job-on-yarn On Mon, Aug 13, 2018 at 10

Re: Flink 1.6 ExecutionJobVertex.getTaskInformationOrBlobKey OutOfMemoryError

2018-08-13 Thread 杨力
Thanks for the tip! It works. I forgot the job manager. Hequn Cheng 于 2018年8月14日周二 上午9:15写道: > Hi, > > Have you ever increased the memory of job master? > If you run a flink job on yarn, you can increase job master's memory by > "-yjm 1024m"[1]. > > Best, Hequn > > [1] > https://ci.apache.org/p

Re: Tuning checkpoint

2018-08-13 Thread 祁明良
Thank you for this great answer, Fabian. Regarding the YARN JVM heap size, I tried to change containerized.heap-cutoff-ratio: 0.25 and it somehow looks like it is working, but the actual memory needed for RocksDB still looks like a black box to me. I see there’s already a JIRA ticket talking about t

Re: Tuning checkpoint

2018-08-13 Thread Hequn Cheng
Hi mingliang, Considering your first question. I answered it on stack overflow[1]. Hope it helps. Best, Hequn [1] https://stackoverflow.com/questions/51832577/what-may-probably-cause-large-alignment-duration-for-flink-job On Tue, Aug 14, 2018 at 10:16 AM, 祁明良 wrote: > Thank you for this grea

Re: cannot find TwoInputStreamOperatorTestHarness after upgrade to Flink 1.6.0

2018-08-13 Thread vino yang
OK, it sounds great. Dmitry Minaev 于2018年8月14日周二 上午7:28写道: > Thank you for your help, vino. > > I've resolved it, the issue was on my side, I forgot to > include flink-streaming-java_2.11 with a type test-jar like the following: > > ``` > > org.apache.flink > *flink-streaming-java_${scala.b

Re: Limit on number of files to read for Dataset

2018-08-13 Thread vino yang
Hi Darshan, In a distributed scenario, there is no limit in theory, but there are still some real-world conditions that will cause some constraints, such as the size of your individual files, the memory size of your TM configuration, and so on. In addition, is your "single" here logical or physica

Flink SQL does not support rename after cast type

2018-08-13 Thread 徐涛
Hi All, I am working on a project based on Flink SQL, but found that I can't rename a column after casting. The code is as below: cast(json_type as INTEGER) as xxx And the following exception is reported: org.apache.calcite.runtime.CalciteContextException: Fr

Re: Flink SQL does not support rename after cast type

2018-08-13 Thread Hequn Cheng
Hi Henry, Flink does support renaming a column after casting. The exception is not caused by the cast. It is caused by the mixing of types; for example, the query > "CASE 1 WHEN 1 THEN *true* WHEN 2 THEN *'string'* ELSE NULL END" will throw the same exception since the types of true and 'string' are not the same.
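To illustrate Hequn's point that the alias itself is fine, here is a hedged sketch via the Java Table API; the table name `events` and field `json_type` are assumptions taken from the question's snippet.

```java
import org.apache.flink.table.api.Table;

// CAST with an AS alias parses and validates fine on its own;
// the reported CalciteContextException comes from mixed branch types
// elsewhere in the query, not from this expression.
Table result = tableEnv.sqlQuery(
    "SELECT CAST(json_type AS INTEGER) AS xxx FROM events");
```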

Re: Standalone cluster instability

2018-08-13 Thread Shailesh Jain
Hi Piotrek, Thanks for your reply. I checked through the syslogs for that time, and I see this: Aug 8 13:20:52 smoketest kernel: [1786160.856662] Out of memory: Kill process 2305 (java) score 468 or sacrifice child Aug 8 13:20:52 smoketest kernel: [1786160.859091] Killed process 2305 (java) tot

Re: Limit on number of files to read for Dataset

2018-08-13 Thread Jörn Franke
It causes more overhead (processes etc) which might make it slower. Furthermore if you have them stored on HDFS then the bottleneck is the namenode which will have to answer millions of requests. The latter point will change in future Hadoop versions with http://ozone.hadoop.apache.org/ > On 1

Flink REST api for cancel with savepoint on yarn

2018-08-13 Thread vipul singh
Hello, I have a question about the Flink 1.5/1.6 REST endpoints. I was trying to see how the REST endpoints have changed wrt cancelling with savepoint; it seems like now, to cancel with savepoint, one needs to use POST api /jobs/:jobid/savepoints