Re: Backquote in SQL dialect

2020-09-17 Thread Timo Walther
Hi Satyam, this has historical reasons. In the beginning all SQL queries were embedded in Java programs and thus Java strings. So single quote was handy for declaring SQL strings in a Java string and backticks for escaping keywords. But I agree that we should make this configurable. Feel free

Automatically restore from checkpoint

2020-09-17 Thread Arpith P
Hi, I'm running Flink job in distributed mode deployed in Yarn; I've enabled externalized checkpoint to save in Hdfs, but I don't have access to read checkpoints folder. To restart Flink job from the last saved checkpoint is it possible to do without passing "-s :checkpointPath". If this is not po

Re: I hit a bad jobmanager address when trying to use Flink SQL Client

2020-09-17 Thread Dan Hill
Hi Robert! Sorry for the delay. This worked! Thanks! I used slightly different deployment parameters. deployment: gateway-address: flink-jobmanager gateway-port: 8081 On Mon, Sep 14, 2020 at 6:21 AM Robert Metzger wrote: > Hi Dan, > > I don't think the SQL Client officially supports run

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-17 Thread Alexey Trenikhun
Unfortunately it looks like impossible to change backend of AbstractStreamOperatorTestHarness without resorting to reflection, stateBackend initialized in constructor as `this.stateBackend = new MemoryStateBackend();`, since it is protected, I can change it in derived class, but checkpointStora

Re: Speeding up CoGroup in batch job

2020-09-17 Thread Ken Krugler
Hi Robert, Thanks for the input. I did increase the amount of managed memory, and confirmed that both SSDs (on each slave) are being used for temp data. I haven’t been able to figure out why the server CPU usage is low, but I did notice that it fluctuated from very low (10%) on up to 95+%, with

Re: FileSystemHaServices and BlobStore

2020-09-17 Thread Alexey Trenikhun
Hi Yang, I saw this FLIP, it is very good feature, I think overall for Kubernetes, it is preferred over “StatefulSet + PV + FileSystemHAService” approach, when it will be available we plan to use it. On other hand looks like FileSystemHAService is easier to implement, I thought about contributin

Backquote in SQL dialect

2020-09-17 Thread Satyam Shekhar
Hello, I have been happily using Flink as the SQL engine for running streaming and batch queries. I am curious to understand the rationale behind Flink using backticks (`) for quoting purposes instead of standard double quotes ("). Is double-quote reserved for some other usage? Regards, Satyam

Re: Flink Table SQL, Kafka, partitions and unnecessary shuffling

2020-09-17 Thread Dan Hill
Hi Godfrey! I'll describe the overall setup and then I'll describe the joins. One of the goals of my Flink jobs is to join incoming log records (User, Session, PageView, Requests, Insertions, Impressions, etc) and do useful things with the joined results. Input = Kafka. Value = batch log recor

Re: Error: Avro "CREATE TABLE" with nested rows and missing fields

2020-09-17 Thread Dan Hill
Thanks Dawid! You answered most of my questions: 1) Kafka to Flink - Is the most common practice to use the Confluent Schema Registry and then use ConfluentRegistryAvroDeserializationSchema? 2) Flink to State - great. 3) Flink to Avro file output - great. 4) Avro file output to Flink (batch) - Yes

Support for gRPC in Flink StateFun 2.x

2020-09-17 Thread Dalmo Cirne
Hi, In the latest Flink Forward, from April 2020, there were mentions that adding support to gRPC, in addition to HTTP, was in the works and would be implemented in the future. Looking into the flink-statefun repository on GitHub, one can see that ther

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-17 Thread Dawid Wysakowicz
Thanks for the update. First of all, why did you decide to build your own DeserializationSchema instead of using ConfluentRegistryDeserializationSchema? Your implementation is quite inefficient you do deserialize > serialize > deserialize. Serialization/deserialization is usually one of the heavie

Re: sideOutputLateData doesn't work with map()

2020-09-17 Thread Chesnay Schepler
This is working as intended, but is admittedly inconvenient. The reason why the original version does not work is that the side-output is scoped to the DataStream that the process function creates; the Map function creates another DataStream though that does not retain the side-output of the pr

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-17 Thread Lian Jiang
Piotr/Dawid, Thanks for the reply. FLINK-18223 seems not to related to this issue and I double checked that I am using Flink 1.11.0 instead of 1.10.0. My mistake. StreamExecutionEnvironment#getConfig()#enableObjectReuse()) solved the issue. I am not using ConfluentRegistryDeserializationSchema. I

Re: sideOutputLateData doesn't work with map()

2020-09-17 Thread Ori Popowski
Turns out that this is the way to solve this problem: val senv = StreamExecutionEnvironment.getExecutionEnvironment senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) val tag = OutputTag[Tuple1[Int]]("late") val stream = senv .addSource(new SourceFunction[Int] { override def run(

sideOutputLateData doesn't work with map()

2020-09-17 Thread Ori Popowski
Hi, I have this simple flow: val senv = StreamExecutionEnvironment.getExecutionEnvironment senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) val tag = OutputTag[Tuple1[Int]]("late") val stream = senv .addSource(new SourceFunction[Int] { override def run(ctx: SourceFunction.Sour

Re: Flink Table SQL, Kafka, partitions and unnecessary shuffling

2020-09-17 Thread godfrey he
Hi Dan, What kind of joins [1] you are using? Currently, only temporal join and join with table function do not reshuffle the input data in Table API and SQL, other joins always reshuffle the input data based on join keys. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table

Re: Error: Avro "CREATE TABLE" with nested rows and missing fields

2020-09-17 Thread Dawid Wysakowicz
Hi Dan, It depends which part of the system you have in mind. Generally though Avro itself does need the original schema of the record it was written with. There are a couple of alternatives. You have RegistryAvroDeserializationSchema for DataStream, which looks up the old schema in schema registr

Re: Flink multiple task managers setup

2020-09-17 Thread Yangze Guo
Sorry that the community decided to not maintain it anymore, you could take a look at [1]. [1] https://lists.apache.org/thread.html/r7693d0c06ac5ced9a34597c662bcf37b34ef8e799c32cc0edee373b2%40%3Cdev.flink.apache.org%3E Best, Yangze Guo On Thu, Sep 17, 2020 at 5:21 PM saksham sapra wrote: > > T

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-17 Thread Dawid Wysakowicz
Hi, Could you share exactly how do you configure avro & kafka? Do you use Table API or DataStream API? Do you use the ConfluentRegistryDeserializationSchema that comes with Flink or did you built custom DeserializationSchema? Could you maybe share the code for instantiating the source with us? It

Re: Flink performance testing

2020-09-17 Thread Piotr Nowojski
Hi, But what are you asking for? Is it possible to do such benchmarks? Yes, it is possible. People are doing it all the time. Start a cluster, feed the data, measure the throughput (either via custom diagnostic operators, or via metrics [1]). Is there some framework to do it? Not that I know of.

Re: Flink multiple task managers setup

2020-09-17 Thread Yangze Guo
Hi, It seems you run it in Windows. In that case, only start-cluster.bat could be used. However, this script could only start one TM[1] no matter how you configure the slaves/workers. [1] https://github.com/apache/flink/blob/release-1.9/flink-dist/src/main/flink-bin/bin/start-cluster.bat Best,

Re: Flink multiple task managers setup

2020-09-17 Thread Yangze Guo
Hi, > I wasnt having "workers" file in conf/workers so i created one, but i have > "slaves" file in conf/workers, so i edited both two localhost like > screenshot given below : Yes, for 1.9.3, you need to edit the 'slaves' file. I think we need more information to figure out what happened. - W

Re: Maximum query and refresh rate for metrics from REST API

2020-09-17 Thread Chesnay Schepler
By default metrics are only updated every 10 seconds; this can be controlled via https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#metrics-fetcher-update-interval. On 9/17/2020 12:22 AM, Piper Piper wrote: Hello, What is the recommended way to get metrics (such as C

Re: Flink multiple task managers setup

2020-09-17 Thread Yangze Guo
Hi, >From my understanding, you want to set up a standalone cluster in your local machine. If that is the case, you could simply edit the $FLINK_DIST/conf/workers, in which each line represents a TM host. By default, there is only one TM in localhost. In your case, you could add a line 'localhost'

Flink multiple task managers setup

2020-09-17 Thread saksham sapra
Hi , I am unable to set two task managers in my local machine and neither any documentation provided for the same. I want to run a parallel job in two task managers using flink. kindly help me with the same, how can i set up in my local without using any zookeeper or something. Thanks & Regards