Re: Flink SQL update-mode set to retract in env file.

2019-09-26 Thread Terry Wang
Hi srikanth~ The Flink SQL update-mode is inferred from the target table type. For now, there are three StreamTableSink types: `AppendStreamTableSink`, `UpsertStreamTableSink` and `RetractStreamTableSink`. If the target table is a Kafka table, which implements AppendStreamTableSink, the update-mode…
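For the SQL Client, the update-mode is declared per table in the environment file. A minimal sketch, assuming the Flink 1.9 environment-file layout (table name, topic and surrounding properties are hypothetical; an append-only Kafka sink per Terry's explanation):

```yaml
tables:
  - name: kafkaSink            # hypothetical table name
    type: sink-table
    update-mode: append        # Kafka connector implements AppendStreamTableSink
    connector:
      type: kafka
      version: "universal"
      topic: outputTopic       # hypothetical topic
```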

CSV Table source as data-stream in environment file

2019-09-26 Thread Nishant Gupta
Hi Team, How do we define a CSV table source as a data-stream instead of a data-set in the environment file? Whether or not I specify update-mode: append, it takes the CSV file only as a data-set. Is there any detailed reference for the environment file configuration where sinks and sources are defined?

Re: Flink SQL update-mode set to retract in env file.

2019-09-26 Thread srikanth flink
Hi Terry Wang, Thanks for the quick reply. I would like to understand more about your line "If the target table is a type of Kafka which implements AppendStreamTableSink, the update-mode will be append only". Does your statement mean that retract mode cannot be used for Kafka sinks, as Kafka implements Appen…

Re: Multiple Job Managers in Flink HA Setup

2019-09-26 Thread Yang Wang
Hi Steven, I have tested the standalone cluster on Kubernetes with 2 jobmanagers. Submitting Flink jobs through either the active or the standby jobmanager web UI works fine. So I think the problem may be with your HAProxy. Does your HAProxy forward traffic to both active and standby…

Problems with java.utils

2019-09-26 Thread Nicholas Walton
I’m having a problem using ArrayList in Scala. The code is below: import org.apache.flink.core.fs._ import org.apache.flink.streaming.api._ import org.apache.flink.streaming.api.scala._ import org.apache.flink.table.api._ import org.apache.flink.table.api.scala._ import org.apache.flink.table.sink…

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-26 Thread Yang Wang
Hi Aleksandar, The savepoint option in a standalone job cluster is optional. If you want to always recover from the latest checkpoint, then as Aleksandar and Yun Tang said, you could use the high-availability configuration. Make sure the cluster-id is not changed; I think the job could recover both at ex…

Re: Flink SQL update-mode set to retract in env file.

2019-09-26 Thread Terry Wang
Hi, Srikanth~ In your code, DataStream outStreamAgg = tableEnv.toRetractStream(resultTable, Row.class).map(t -> {}); has converted the resultTable into a DataStream that is unrelated to the Table API, and the following code `outStreamAgg.addSink(…)` is just a normal stream write to a FlinkKafka…

Re: Flink SQL update-mode set to retract in env file.

2019-09-26 Thread srikanth flink
Awesome, thanks! On Thu, Sep 26, 2019 at 5:50 PM Terry Wang wrote: > Hi, Srikanth~ > > In your code, > DataStream outStreamAgg = tableEnv.toRetractStream(resultTable, > Row.class).map(t -> {}); has converted the resultTable into a DataStream > that’s unrelated with tableApi, > And the following…

Re: Problems with java.utils

2019-09-26 Thread Nicholas Walton
I’ve shrunk the problem down to a minimal size. The code: package org.example import org.apache.flink.table.api._ import org.apache.http.HttpHost import java.util.ArrayList object foo { val httpHosts = new ArrayList[HttpHost] httpHosts.add(new HttpHost("samwise.local", 9200, "http")) } wi…

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-26 Thread Sean Hester
Thanks to everyone for all the replies. I think the original concern here with "just" relying on the HA option is that there are some disaster recovery and data center migration use cases where the continuity of the job managers is difficult to preserve. But those are admittedly very edgy use cases…

Flink- Heap Space running out

2019-09-26 Thread Nishant Gupta
I am running a query to join a stream and a table, as below. It is running out of heap space even though there is ample heap in the Flink cluster (60 GB * 3). Is an eviction strategy needed for this query? *SELECT sourceKafka.* FROM sourceKafka INNER JOIN DefaulterTable ON sourceKafka.CC=Def…

Re: Problems with java.utils

2019-09-26 Thread Dian Fu
Hi Nick, >> [error] ……/src/main/scala/org/example/Job.scala:30:13: object util is >> not a member of package org.apache.flink.table.api.java >> [error] import java.util.ArrayList The error message shows that it tries to find "util.ArrayList" under package "org.apache.flink.table.api.java".

Re: Problems with java.utils

2019-09-26 Thread Nicholas Walton
Dian, that fixed the problem, thank you. It would appear that someone has taken it upon themselves to redefine part of the Java standard library in org.apache.flink.table.api._ Nick > On 26 Sep 2019, at 15:16, Dian Fu wrote: > > Hi Nick, > >>> [error] ……/src/main/scala/org/example/Job.sc…

Re: Problems with java.utils

2019-09-26 Thread Dian Fu
Hi Nick, There is a package named "org.apache.flink.table.api.java" in Flink, so the import of "org.apache.flink.table.api._" causes "org.apache.flink.table.api.java" to be imported. Then any import of a package starting with "java", such as "import java.util.ArrayList", will try to find the cla…
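Because Scala resolves imports relative to packages already in scope, the wildcard import of `org.apache.flink.table.api._` brings its nested `java` package into scope and shadows the top-level `java` package. One common workaround, offered here as an assumption since Dian's reply is truncated, is to anchor the import at the root package:

```scala
import org.apache.flink.table.api._
// _root_ forces resolution from the top-level package,
// bypassing the shadowing org.apache.flink.table.api.java
import _root_.java.util.ArrayList

val hosts = new ArrayList[String]()
```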

Re: Flink- Heap Space running out

2019-09-26 Thread miki haiat
You can configure the task manager memory in the config.yaml file. What is the current configuration? On Thu, Sep 26, 2019, 17:14 Nishant Gupta wrote: > am running a query to join a stream and a table as below. It is running > out of heap space. Even though it has enough heap space in flink clu

Re: Flink- Heap Space running out

2019-09-26 Thread Fabian Hueske
Hi, I don't think the memory configuration is the issue. The problem is the join query: the join does not have any temporal boundaries, so both tables are stored completely in memory and never released. You can configure a memory eviction strategy via idle state retention [1], but you…
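For SQL Client jobs, the idle state retention [1] that Fabian mentions can be set in the environment file's execution section. A sketch, assuming the Flink 1.9 key names (values are illustrative, in milliseconds; the max must exceed the min by at least 5 minutes):

```yaml
execution:
  type: streaming
  min-idle-state-retention: 3600000   # evict key state idle for at least 1 hour
  max-idle-state-retention: 7200000   # evict at the latest after 2 hours
```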

Re: Flink job manager doesn't remove stale checkmarks

2019-09-26 Thread Clay Teeter
I see, I'll try turning off incremental checkpoints to see if that helps. Re: disk space, I could see a scenario with my application where I could get 10,000+ checkpoints if the checkpoints are additive. I'll let you know what I see. Thanks! Clay On Wed, Sep 25, 2019 at 5:40 PM Fabian Hueske…

UnitTests and ProcessTimeWindows - Missing results

2019-09-26 Thread Clay Teeter
What is the best way to run unit tests on streams that contain ProcessTimeWindows? Example: def bufferDataStreamByProcessId(ds: DataStream[MaalkaRecord]): DataStream[MaalkaRecord] = { ds.map { r => println(s"data in: $r") // Data shows up here r }.keyBy { mr => val r = mr.asInstan…

Debugging slow/failing checkpoints

2019-09-26 Thread Steven Nelson
I am working with an application that hasn't gone to production yet. We run Flink as a cluster within a K8s environment. It has the following attributes: 1) 2 job managers configured using HA, backed by ZooKeeper and HDFS; 2) 4 task managers; 3) configured to use RocksDB. The actual RocksDB files are…

Re: Flink job manager doesn't remove stale checkmarks

2019-09-26 Thread Clay Teeter
I looked into the disk issues and found that Fabian was on the right path: the checkpoints that were lingering were in fact in use. Thanks for the help! Clay On Thu, Sep 26, 2019 at 8:09 PM Clay Teeter wrote: > I see, I'll try turning off incremental checkpoints to see if that helps. > > Re:…

** Help need w.r.t parallelism settings in flink **

2019-09-26 Thread Akshay Iyangar
Hi, So we are running a Beam pipeline that uses Flink as its execution engine. We are currently on Flink 1.8. Per the Flink documentation, I see there is an option that allows you to set parallelism and maxParallelism. We actually want to set both so that we can dynamically scale the pipeline…

Re: ** Help need w.r.t parallelism settings in flink **

2019-09-26 Thread Zhu Zhu
Hi Akshay, For your questions: 1. One main purpose of maxParallelism is to decide the count of key groups. A key group is the bucket for keys when doing keyBy partitioning, so a larger maxParallelism means a finer granularity of key distribution, no matter whether the operator is stateful or not. 2. You…
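Zhu Zhu's first point can be illustrated with the key-group arithmetic Flink uses (a simplified sketch of `KeyGroupRangeAssignment`; real Flink additionally applies a murmur hash to the key's hashCode, which is omitted here):

```java
class KeyGroupSketch {
    // Which of the maxParallelism key groups a key's hash falls into.
    static int keyGroupFor(int keyHash, int maxParallelism) {
        return Math.floorMod(keyHash, maxParallelism);
    }

    // Which operator subtask owns a given key group: each of the
    // `parallelism` subtasks gets a contiguous range of key groups.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128, parallelism = 4;
        // Key groups 0..31 go to subtask 0, 32..63 to subtask 1, and so on;
        // a larger maxParallelism gives finer-grained state redistribution.
        for (int kg = 0; kg < maxParallelism; kg += 32) {
            System.out.println("keyGroup " + kg + " -> subtask "
                + operatorIndexFor(kg, maxParallelism, parallelism));
        }
    }
}
```

Since rescaling moves whole key groups between subtasks, maxParallelism is also the upper bound on the useful parallelism of a keyed operator.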

Re: Debugging slow/failing checkpoints

2019-09-26 Thread Congxian Qiu
Hi Steve, 1. Do you use exactly-once or at-least-once? 2. Do you use incremental checkpoints or not? 3. Do you have any timers, and where are the timers stored (heap or RocksDB)? You can check the config here [1]; you can try storing the timers in RocksDB. 4. Is the alignment time too long? 5. You can check whether it is the sync…
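Congxian's third point, storing timers in RocksDB instead of on the heap, is a flink-conf.yaml setting in Flink 1.8/1.9 (shown as a sketch; verify the option against the [1] reference in the reply):

```yaml
# With RocksDBStateBackend: keep timer state in RocksDB rather than on the JVM heap
state.backend.rocksdb.timer-service.factory: ROCKSDB
```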

Re: Subscribing to the mailing list

2019-09-26 Thread Dian Fu
To subscribe to the mailing lists, you need to send an email to the following addresses: dev-subscr...@flink.apache.org, user-subscr...@flink.apache.org and user-zh-subscr...@flink.apache.org

Re: CSV Table source as data-stream in environment file

2019-09-26 Thread Dian Fu
You need to add the following configuration to make it run in streaming mode [1]: execution: type: streaming Regards, Dian [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sqlClient.html#environment-files
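Putting Dian's answer in context, here is a sketch of an environment file that defines a CSV source and runs in streaming mode. Table name, path, and schema are hypothetical, and the exact connector/format keys should be checked against [1]:

```yaml
tables:
  - name: csvSource              # hypothetical table name
    type: source-table
    update-mode: append
    connector:
      type: filesystem
      path: "/tmp/input.csv"     # hypothetical path
    format:
      type: csv
      fields:
        - name: id
          type: VARCHAR
    schema:
      - name: id
        type: VARCHAR
execution:
  type: streaming                # stream instead of batch (data-set)
```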

Reading Key-Value from Kafka | Eviction policy.

2019-09-26 Thread srikanth flink
Hi, My data source is Kafka. All these days I have been reading the values from the Kafka stream into a table. The table just grows and runs into a heap issue. I came across the eviction policy, which works only on keys, right? I have researched how to configure the environment file (Flink SQL) to read both key an…

Re: Reading Key-Value from Kafka | Eviction policy.

2019-09-26 Thread miki haiat
I'm sure there are several ways to implement it. Can you elaborate more on your use case? On Fri, Sep 27, 2019, 08:37 srikanth flink wrote: > Hi, > > My data source is Kafka. All these days I have been reading the values from > the Kafka stream into a table. The table just grows and runs into a heap iss…

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-26 Thread Vijay Bhaskar
Suppose my cluster crashed and I need to bring the entire cluster back up. Does HA still help to run the cluster from the latest savepoint? Regards Bhaskar On Thu, Sep 26, 2019 at 7:44 PM Sean Hester wrote: > thanks to everyone for all the replies. > > i think the original concern here with "ju…

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-26 Thread Vijay Bhaskar
I don't think HA will help to recover from a cluster crash; for that we should take periodic savepoints, right? Please correct me if I am wrong. Regards Bhaskar On Fri, Sep 27, 2019 at 11:48 AM Vijay Bhaskar wrote: > Suppose my cluster got crashed and need to bring up the entire cluster > back?…

Re: Reading Key-Value from Kafka | Eviction policy.

2019-09-26 Thread srikanth flink
Hi Miki, What are those several ways? Could you help me with references? Use case: we have a continuous credit card transaction stream flowing into a Kafka topic, along with a set of credit card defaulters in a .csv file (which gets updated every day). Thanks Srikanth On Fri, Sep 27, 2019…