Flink SQL

2018-11-29 Thread Steve Bistline
Hi, I have a silly question about Flink SQL that I cannot seem to find a clear answer to. If I have the following code, will the "result" from the SQL SELECT statement only return, and the data then be written to S3, if and only if the statement returns data that matches the criteria? Does "nothing

Flink window operation based on event time is triggered when the watermark is less than the end of the window

2018-11-29 Thread X L
Please refer to the stackoverflow post. Thanks. -- Thanks. · Lx wlxwol...@gmail.com

Re: how to override s3 key config in flink job

2018-11-29 Thread Tony Wei
Hi Andrey, Thanks for your detailed answer, and I have created a JIRA issue to discuss it [1]. Please check the description and help me to fill the details, like component/s, since I'm not sure where it should be put. Thank you very much. Best, Tony Wei [1] https://issues.apache.org/jira/browse/

long lived standalone job session cluster in kubernetes

2018-11-29 Thread Derek VerLee
I'm looking at the job cluster mode; it looks great and I am considering migrating our jobs off our "legacy" session cluster and into Kubernetes. I do need to ask some questions because I haven't found a lot of details in the documentation about how it works y

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-29 Thread Avi Levi
Thanks a lot Kostas, but the file is not created. What am I doing wrong? BTW, how can you set the encoding etc. in Flink's Avro-Parquet writer? object Tester extends App { val env = StreamExecutionEnvironment.getExecutionEnvironment def now = System.currentTimeMillis() val path = new Path(s"t

Re: SQL Query named operator exceeds 80 characters

2018-11-29 Thread shkob1
OK, thanks for the help -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Table exception

2018-11-29 Thread Timo Walther
Hi Michael, this dependency issue should have been fixed recently. Which Flink version are you using? Regards, Timo On 29.11.18 at 16:01, TechnoMage wrote: I have a simple test for looking at Flink SQL and hit an exception reported as a bug. I wonder though if it is a missing dependency.

Table exception

2018-11-29 Thread TechnoMage
I have a simple test for looking at Flink SQL and hit an exception reported as a bug. I wonder though if it is a missing dependency. Michael Error in test org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmaster.JobResult.t

Re: number of files in checkpoint directory grows endlessly

2018-11-29 Thread Andrey Zagrebin
Could you share the logs to check possible failures to subsume or remove previous checkpoints? What are the sizes of the files? That can help to understand how compaction goes. Could you also provide more details on how you set up TtlDb with Flink? Best, Andrey > On 29 Nov 2018, at 11:34, Andrey Zagreb

Re: Dataset column statistics

2018-11-29 Thread Fabian Hueske
I'd try to tune it in a single query. If that does not work, go for as few queries as possible, splitting by column for better projection push-down. This is the first time I hear somebody requesting ANALYZE TABLE; I don't see a reason why it shouldn't be added in the future. On Thu., 29 Nov 20
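
A minimal sketch of what "a single query" for column stats could look like with the Table API in Java; the table name and columns (MyTable, a, b) are assumptions for illustration, not from the thread:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);
    // one pass over the data, computing several column statistics at once
    Table stats = tEnv.sqlQuery(
        "SELECT COUNT(*) AS row_cnt, " +
        "       MIN(a) AS min_a, MAX(a) AS max_a, AVG(a) AS avg_a, " +
        "       COUNT(DISTINCT b) AS distinct_b " +
        "FROM MyTable");

Splitting this into one query per column only helps if the table is wide and the optimizer can push the projection of each column down to the source.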

Will Zookeeper HA work when the cluster is run in standalonejob mode?

2018-11-29 Thread Sergei Poganshev
If I run the cluster in "standalonejob" mode (by providing the job arguments to the job manager upon starting it) and configure HA using Zookeeper, will the job restore correctly after the job manager restarts with the same "standalonejob" arguments? Will it restart the job (due to job arguments pas

Re: Dulicated messages in kafka sink topic using flink cancel-with-savepoint operation

2018-11-29 Thread Piotr Nowojski
Hi Nastaran, When you are checking for duplicated messages, are you reading from Kafka using `read_committed` mode (this is not the default value)? https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-producer-partitioning-scheme > Semantic.EXACTLY_ONCE: uses Ka
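
For reference, a minimal sketch of a consumer configured with read_committed (broker address, topic and group id are made up for illustration); the producing job must use Semantic.EXACTLY_ONCE for the isolation level to have any effect:

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "dup-check");
    // only messages of committed transactions become visible to this consumer
    props.setProperty("isolation.level", "read_committed");
    FlinkKafkaConsumer011<String> consumer =
        new FlinkKafkaConsumer011<>("output-topic", new SimpleStringSchema(), props);

With the default read_uncommitted isolation level, a consumer will also see messages from transactions that Flink later aborted, which looks like duplicates.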

Re: Check-pointing error

2018-11-29 Thread Chesnay Schepler
Would it be possible for you to try this with 1.6-SNAPSHOT? This issue may have been fixed with https://issues.apache.org/jira/browse/FLINK-10839. On 29.11.2018 12:11, Felipe Quirce wrote: Hi, I'm using Flink 1.6.2, and the full stack trace is java.io.IOException: Exception while applying Aggr

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-29 Thread Kostas Kloudas
Sorry, previously I got confused and I assumed you were using Flink's StreamingFileSink. Could you try to use Flink's Avro-Parquet writer? StreamingFileSink.forBulkFormat( Path...(MY_PATH), ParquetAvroWriters.forGenericRecord(MY_SCHEMA)) .build() Cheers, Kostas On Thu, Nov 29, 20
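
Spelled out a bit more, the suggestion could look roughly like this sketch (the path, schema and record stream are placeholders, not from the thread):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(10_000); // bulk formats roll their part files on checkpoints
    Schema schema = ...;                       // your Avro schema
    DataStream<GenericRecord> records = ...;   // your record stream
    StreamingFileSink<GenericRecord> sink = StreamingFileSink
        .forBulkFormat(new Path("/tmp/parquet-out"), ParquetAvroWriters.forGenericRecord(schema))
        .build();
    records.addSink(sink);
    env.execute();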

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-29 Thread Avi Levi
Thanks. Yes, *env.execute* is called and checkpoints are enabled. I think the problem is where to place *writer.close* to flush the cache. If I place it in the sink after the write event, e.g. addSink{ writer.write writer.close }, then only the first record will be included in the file but n

Re: Questions about UDTF in flink SQL

2018-11-29 Thread Timo Walther
Hi Wangsan, currently, UDFs have very strict result type assumptions. This is necessary to determine the serializers for the cluster. There were multiple requests for more flexible handling of types in UDFs. Please have a look at: - [FLINK-7358] Add implicitly converts support for User-define

Re: Check-pointing error

2018-11-29 Thread Felipe Quirce
Hi, I'm using Flink 1.6.2, and the full stack trace is java.io.IOException: Exception while applying AggregateFunction in aggregating state 4308 at org.apache.flink.runtime.state.heap.HeapAggregatingState.add(HeapAggregatingState.java:107) 4309 at org.apache.flink.streaming.runti

Re: Dataset column statistics

2018-11-29 Thread Flavio Pompermaier
What do you advise for computing column stats? Should I run multiple jobs (one per column) or try to compute all at once? Are you ever going to consider supporting ANALYZE TABLE (like in Hive or Spark) in the Flink Table API? Best, Flavio On Thu, Nov 29, 2018 at 9:45 AM Fabian Hueske wrote: > Hi, > >

Re: Check-pointing error

2018-11-29 Thread Chesnay Schepler
Please provide the full exception stack trace and the version of Flink that you are using. On 29.11.2018 11:55, Felipe Quirce wrote: Hi, I have found a problem during the checkpoint. Could anyone help me debug it? Exception: 1804 2018-11-29 11:31:00,448 INFO org.apache.flink

Re: Looking for relevant sources related to connecting Apache Flink and Edgent.

2018-11-29 Thread Felipe Gutierrez
Thanks Kostas for the quick reply, yes, it is related to my previous question. When you said "But if you know what operation to push down" -> this is what I am trying to find in the Flink code. I want to know the operation on the fly: the component in Flink that will tell me that there is a filte

Check-pointing error

2018-11-29 Thread Felipe Quirce
Hi, I have found a problem during the checkpoint. Could anyone help me debug it? Exception: > 1804 2018-11-29 11:31:00,448 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph > - keyedstats-processor-165 -> map2alert-165 -> Process -> Sink: > sink-level165 (1/2) (d

Re: SQL Query named operator exceeds 80 characters

2018-11-29 Thread Chesnay Schepler
This is a safeguard in the metric system to prevent extremely long names (as these could cause the reporting to fail); as long as the prefix is unique you can safely ignore this warning. On 29.11.2018 10:40, Timo Walther wrote: Unfortunately, renaming of operators is not supported so far. We

Re: Checkpointing to gcs taking too long

2018-11-29 Thread Chesnay Schepler
Please provide the full exception stack trace and the configuration of your job (parallelism, number of stateful operators). Have you tried using the gcs-connector in isolation? This may not be an issue with Flink. On 28.11.2018 07:01, prakhar_mathur wrote: I am trying to run flink on kubernet

Re: Looking for relevant sources related to connecting Apache Flink and Edgent.

2018-11-29 Thread Kostas Kloudas
Hi again, I forgot to say that, unfortunately, I am not familiar with Apache Edgent, but if you can write your filter in Edgent's programming model, then you can push your data from Edgent to a third-party storage system (e.g. Kafka, HDFS, etc.) and use Flink's connectors, instead of having to impl
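
As a rough sketch of that idea (topic name, broker address and schema are assumptions, not something from the thread): Edgent publishes the pre-filtered sensor readings to a Kafka topic and Flink consumes them with the standard Kafka connector:

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "raspberrypi:9092");
    props.setProperty("group.id", "flink-edgent");
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<String> readings = env
        .addSource(new FlinkKafkaConsumer011<>("edgent-sensor", new SimpleStringSchema(), props));
    // the coarse filtering already happened on the edge; Flink runs the heavier analytics
    readings.filter(r -> !r.isEmpty()).print();
    env.execute("edgent-to-flink");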

Re: Looking for relevant sources related to connecting Apache Flink and Edgent.

2018-11-29 Thread Kostas Kloudas
Hi Felipe, This seems related to your previous question about a custom scheduler that knows which task to run on which machine. As Chesnay said, this is a rather involved and laborious task, if you want to do it as a general framework. But if you know what operation to push down, then why not dec

Re: number of files in checkpoint directory grows endlessly

2018-11-29 Thread Andrey Zagrebin
If you use incremental checkpoints, the state backend stores raw RocksDB SST files which represent all state data. Each checkpoint adds SST files with new updates which are not present in the previous checkpoint, basically their difference. One of the following could be happening: - old keys are not exp
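
If the suspicion is that old keys never expire, one option besides a patched TtlDB is Flink's own state TTL (available since 1.6); a minimal sketch, keeping in mind that in these versions expired entries are only removed lazily when the state is read (the retention period and state name below are example values):

    StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.days(7))
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        .build();
    ValueStateDescriptor<Long> descriptor = new ValueStateDescriptor<>("lastSeen", Long.class);
    descriptor.enableTimeToLive(ttlConfig);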

Re: Memory does not be released after job cancellation

2018-11-29 Thread Kostas Kloudas
Hi Nastaran, Can you specify what additional information you need? From the discussion that you posted: 1) If you have batch jobs, then Flink does its own memory management (outside the heap, so it is not subject to the JVM's GC) and although when you cancel the job, you do not see the memory being

AW: number of files in checkpoint directory grows endlessly

2018-11-29 Thread Bernd.Winterstein
Hi, we use Flink 1.6.2. As for the checkpoint directory, there is only one chk-xxx directory. Therefore I would expect that only one checkpoint remains. The value of 'state.checkpoints.num-retained' is not set explicitly. The problem is not the number of checkpoints but the number of files in the "sh

Re: number of files in checkpoint directory grows endlessly

2018-11-29 Thread Andrey Zagrebin
Hi Bernd, Did you change 'state.checkpoints.num-retained' in flink-conf.yaml? By default, only one checkpoint should be retained. Which version of Flink do you use? Can you check the Job Master logs for a warning like this: `Fail to subsume the old checkpoint`? Best, Andrey > On 29

Re: SQL Query named operator exceeds 80 characters

2018-11-29 Thread Timo Walther
Unfortunately, renaming of operators is not supported so far. We are currently thinking about a way of having fine-grained control over the properties of SQL operators, but this is in an early design phase and might take a while. Regards, Timo On 29.11.18 at 10:32, Kostas Kloudas wrote: Hi, I

Re: SQL Query named operator exceeds 80 characters

2018-11-29 Thread Kostas Kloudas
Hi, I think that you cannot set it explicitly. The reason I would say that is that the SQL query gets parsed through Calcite and then gets translated to a DataStream program through a process that is rather opaque to the user. That said, I also cc'ed Fabian and Timo who know more on the topic

Looking for relevant sources related to connecting Apache Flink and Edgent.

2018-11-29 Thread Felipe Gutierrez
Hi, I am trying to design a little prototype with Flink and Apache Edgent ( http://edgent.apache.org/) and I would like some help on the direction for it. I am running Flink on my laptop and Edgent on my Raspberry Pi with a simple filter for a proximity sensor ( https://github.com/felipegutierrez/

Re: number of files in checkpoint directory grows endlessly

2018-11-29 Thread Kostas Kloudas
Hi Bernd, I think Till, Stefan, or Stephan (cc'ed) are the best to answer your question. Cheers, Kostas

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-29 Thread Kostas Kloudas
Hi again Avi, In the first example that you posted (the one with the Kafka source), do you call env.execute()? Cheers, Kostas On Thu, Nov 29, 2018 at 10:01 AM Kostas Kloudas wrote: > Hi Avi, > > In the last snippet that you posted, you have not activated checkpoints. > > Checkpoints are needed

Re: Stream in loop and not getting to sink (Parquet writer )

2018-11-29 Thread Kostas Kloudas
Hi Avi, In the last snippet that you posted, you have not activated checkpoints. Checkpoints are needed for the StreamingFileSink to produce results, especially in the case of BulkWriters (like Parquet) where the part file is rolled upon reception of a checkpoint and the part is finalised (i.e. "
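
In other words, a minimal sketch (the interval is just an example): without the line below, a bulk-encoded part file (e.g. Parquet) is never rolled and finalised, so nothing visible appears in the output directory:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000);  // part files are rolled and committed on each checkpoint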

Re: Dataset column statistics

2018-11-29 Thread Fabian Hueske
Hi, You could try to enable object reuse. Alternatively you can give more heap memory or fine-tune the GC parameters. I would not consider it a bug in Flink, but it might be something that could be improved. Fabian On Wed., 28 Nov 2018 at 18:19, Flavio Pompermaier < pomperma...@okkam.
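
Enabling object reuse is a one-liner; a sketch, with the caveat that it is only safe if your user functions do not cache or modify input objects across calls:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // reuse record instances between operators instead of creating a new object for every record
    env.getConfig().enableObjectReuse();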

how to stop hung job manager

2018-11-29 Thread Ali, Kasif
Hello, Is there any config which can be set to stop/kill the job manager if the client has crashed due to some exception/error? We see this in our application where the client has crashed due to some issue like making a REST call, or the application crashed. In such cases the job manager continues to run ex

number of files in checkpoint directory grows endlessly

2018-11-29 Thread Bernd.Winterstein
I have a Flink job running with the following settings: * CheckpointingMode.EXACTLY_ONCE * RocksDB backend (modified with TtlDB usage) * CheckpointConfig.ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION * 60 sec interval * Async snapshots * Incremental checkp
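
For reference, those settings roughly correspond to a configuration like the following sketch (the checkpoint URI is a placeholder; the custom TtlDB modification is not shown):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
    env.getCheckpointConfig().enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);
    // second argument enables incremental checkpoints; RocksDB snapshots are asynchronous by default
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));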