Re: Clean way of expressing UNNEST operations

2019-06-03 Thread JingsongLee
Hi @Piyush Narang I tried again: if the type of advertiser_event.products is derived correctly (ObjectTypeInfo(RowTypeInfo(fields...))), it will work. See more information in the Calcite code: SqlUnnestOperator.inferReturnType. So I think maybe your type is not passed to the engine correctly. Best,

Re: Is there any way to get the ExecutionGraph of a Job

2019-06-03 Thread Debasish Ghosh
Thanks a lot for the clarification. On Tue, Jun 4, 2019 at 8:37 AM Yun Gao wrote: > Hi Debasish, > > You cannot get ExecutionGraph since it resides in the JobMaster, > which is not in the same process with Client. > > In my opinion, currently you may not be able to stop the job or quer

Re: Clean way of expressing UNNEST operations

2019-06-03 Thread JingsongLee
Hi @Piyush Narang It seems that Calcite's type inference is not perfect, and the fields of the return type cannot be inferred in UNNEST. (Errors were reported during the Calcite Validate phase.) But UDTF supports this usage, and if it's convenient, you might consider writing a UDTF with similar UN

Re: Flink job server with HA

2019-06-03 Thread Xintong Song
If that is the case, then I would suggest you check the following two things: 1. Is the HA mode configured properly in Flink configuration? There should be a config option "high-availability" in your flink-conf.yaml. If not configured, the default value would be "NONE". 2. It "ClassPathJobGraph

Re: Is there any way to get the ExecutionGraph of a Job

2019-06-03 Thread Yun Gao
Hi Debasish, You cannot get the ExecutionGraph since it resides in the JobMaster, which is not in the same process as the Client. In my opinion, currently you may not be able to stop the job or query the job status using the Client API. The community is currently trying to enhance the Clie

Read file from S3 and write to kafka

2019-06-03 Thread anurag
Hi All, I have searched a lot on Google but could not find how I can write a Flink function which reads a file in S3 and, for each line in the file, writes a message to Kafka. Thanks a lot, much appreciated. I am sorry if I did not search properly. Thanks, Anurag

Re: Flink job server with HA

2019-06-03 Thread Boris Lublinsky
I am running on k8s. The Job Master runs as a deployment of 1, so just killing a pod restarts it. Boris Lublinsky FDP Architect boris.lublin...@lightbend.com https://www.lightbend.com/ > On Jun 3, 2019, at 9:46 PM, Xintong Song wrote: > > So here are my questions: > 1. What environment do you run Flin

Re: Flink job server with HA

2019-06-03 Thread Xintong Song
So here are my questions: 1. What environment do you run Flink in? Is it locally, on Yarn or Mesos? 2. How do you trigger "restart a Job Master"? Thank you~ Xintong Song On Tue, Jun 4, 2019 at 10:35 AM Boris Lublinsky < boris.lublin...@lightbend.com> wrote: > Thanks, > Thats what I thought in

Re: Flink job server with HA

2019-06-03 Thread Boris Lublinsky
Thanks, That's what I thought initially. The issue is that because of this, during restart, it does not know which job was running before (it is obtained from the submitted job graph store). Because this is empty, there are no restarted jobs and the cluster does not even try to restore checkpoints. I c

Re: Flink job server with HA

2019-06-03 Thread Xintong Song
Hi Boris, I think what you described, that putJobGraph is not invoked in a Flink job cluster, is by design and should not cause job recovery to fail. For a Flink job cluster, there is only one job graph to execute. Instead of uploading job graph to an already running cluster (like in a session

Is it possible to configure Flink pre-flight type serialization scanning?

2019-06-03 Thread John Tipper
Flink performs significant scanning during the pre-flight phase of a Flink application (https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html). The act of creating sources, operators and sinks causes Flink to scan the data types of the objects that are used within

Controlling the amount of checkpoint files

2019-06-03 Thread Boris Lublinsky
Is there a way to limit the number of checkpoint files? The parameter that I set, state.checkpoints.num-retained: 5, does not seem to have any effect. Is there anything else I can set to prevent infinite growth of checkpointing info? Boris Lublinsky FDP Architect boris.lublin...@lightbend.com http

Issue using Flink on EMR

2019-06-03 Thread Ayush Verma
Hello, We have a Flink on EMR setup following this guide. YARN apparently changes the io.tmp.dirs property to /mnt/yarn & /mnt1/yarn. When using these directories, the Flink job gets the following error. 2019-05-22 12:23:12,515 INF

Clean way of expressing UNNEST operations

2019-06-03 Thread Piyush Narang
Hi folks, I’m using the SQL API and trying to figure out the best way to unnest and operate on some data. My data is structured as follows: Table: Advertiser_event: * Partnered: Int * Products: Array< Row< price: Double, quantity: Int, … > > * … I’m trying to unnest the products arr
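As a rough illustration of the UNNEST semantics this thread is about, the sketch below models in plain Python (not Flink SQL) how each element of a products array becomes its own output row, joined with the columns of its parent row. The lowercase field names follow the table described in the post; the function name unnest_products and the sample values are hypothetical.

```python
from typing import Any, Dict, Iterator, List


def unnest_products(events: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per element of each event's products array.

    This mirrors a cross join between a row and its own array column:
    an event with an empty products array produces no output rows.
    """
    for event in events:
        for product in event["products"]:
            yield {
                "partnered": event["partnered"],
                "price": product["price"],
                "quantity": product["quantity"],
            }


# Hypothetical sample data shaped like the Advertiser_event table.
sample_events = [
    {"partnered": 1, "products": [{"price": 2.0, "quantity": 3},
                                  {"price": 5.0, "quantity": 1}]},
    {"partnered": 2, "products": []},
]

flat_rows = list(unnest_products(sample_events))
```

Once the rows are flat, per-product expressions such as price * quantity can be computed and aggregated like any other column.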

Re: Pipeline TimeOut with Beam SideInput

2019-06-03 Thread bjbq4d
On 2019/05/30 14:12:35, bjb...@gmail.com wrote: > Hi everyone, > > I've made a Beam pipeline that makes use of a SideInput which in my case is a > Map of key/values. I'm running Flink (1.7.1) on yarn (hadoop 2.6.0). I've > found that if my map is small enough everything works fine but if I

Flink job server with HA

2019-06-03 Thread Boris Lublinsky
I am trying to experiment with Flink Job server with HA and I am noticing that in this case the method putJobGraph in the class SubmittedJobGraphStore is never invoked. (I can see that it is invoked in the case of session cluster when a job is added) As a result, when I am trying to restart a Job Ma

Re: Restore state class not found exception in 1.8

2019-06-03 Thread Lasse Nedergaard
Hi Gordon, To us it looks like the env.registerclass is needed when we write the save point. If we have an existing save point without the classes registered it doesn’t work. We have only seen the exception in our own sink that stores pending data in operator state through CheckpointedFunction

Re: count(DISTINCT) in flink SQL

2019-06-03 Thread Fabian Hueske
Hi Vinod, IIRC, support for DISTINCT aggregates was added in Flink 1.6.0 (released August 9th, 2018) [1]. Also note that by default, this query will accumulate more and more state, i.e., for each grouping key it will hold all unique event_ids. You could configure an idle state retention time to cl
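To make the state-growth point concrete, here is a minimal plain-Python model (not Flink code, and not how Flink implements it) of a per-key COUNT(DISTINCT event_id) whose state is dropped for keys that have been idle longer than a retention period. The class name DistinctCounter, its parameters, and the use of plain numeric timestamps are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Set


class DistinctCounter:
    """Toy model of COUNT(DISTINCT event_id) per key with idle-state cleanup."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self.seen: Dict[Hashable, Set[Hashable]] = {}   # key -> unique event ids
        self.last_access: Dict[Hashable, float] = {}    # key -> last update time

    def add(self, key: Hashable, event_id: Hashable, now: float) -> int:
        """Record an event and return the current distinct count for its key."""
        self._expire_idle_keys(now)
        ids = self.seen.setdefault(key, set())
        ids.add(event_id)
        self.last_access[key] = now
        return len(ids)

    def _expire_idle_keys(self, now: float) -> None:
        """Drop all state for keys not updated within the retention period."""
        idle = [k for k, t in self.last_access.items()
                if now - t > self.retention_seconds]
        for k in idle:
            del self.seen[k]
            del self.last_access[k]


counter = DistinctCounter(retention_seconds=10)
```

Without the cleanup step, the seen sets would grow for as long as new event ids keep arriving. The trade-off is that a key which reappears after its state was dropped starts counting again from zero.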

Re: Restore state class not found exception in 1.8

2019-06-03 Thread Tzu-Li (Gordon) Tai
Hi Lasse, This is indeed a bit odd. I'll need to reproduce this locally before I can figure out the root problem. Please bear with me for a while, will get back to you on this. Meanwhile, you mentioned that you only had some jobs failing with the posted exception. Did you figure out any more deta

Re: Table API and nested JSON

2019-06-03 Thread Pramit Vamsi
I looked at the implementation in TableSourceUtil's resolveInputField method. It seems like the fieldName lookup happens at the JSON top level and does not look at the nested structure. The returnType references RowTypeInfo which reflects the correct nested schema. Any pointers here? On Mon, Jun 3,

Is there any way to get the ExecutionGraph of a Job

2019-06-03 Thread Debasish Ghosh
Hello - I am trying to build an API that can start, control and stop a Flink Job programmatically. When I do an executionEnv.execute() where executionEnv is a StreamExecutionEnvironment, I get back a JobExecutionResult. I find no way to stop the job (say based on some timeout) from a JobExecutio

Re: How to build dependencies and connections between stream jobs?

2019-06-03 Thread Konstantin Knauf
Hi Henry, Apache Kafka or other message queues like Apache Pulsar or AWS Kinesis are in general the most common way to connect multiple streaming jobs. The dependencies between streaming jobs are in my experience of a different nature though. For batch jobs, it makes sense to schedule one after the