Re: Flink long state TTL Concerns

2020-03-19 Thread Matthew Rafael Magsombol
And another follow-up! (Sorry if there are a lot of follow-ups! We've run a Flink consumer, but these are very basic consumers without state!) Suppose I want to use a MapState[String, ]... in order to make that happen, following this link https://ci.apache.org/projects/flink/flink-docs-re

Re: Flink long state TTL Concerns

2020-03-19 Thread Matthew Rafael Magsombol
Also as a follow up question with respect to state cleanup, I see that there's an incremental cleanup option: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#incremental-cleanup It has notes indicating that if no access happens to that state/no records processed,
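For reference, incremental cleanup is enabled on the `StateTtlConfig` builder. A minimal sketch under the thread's assumptions (7-day TTL, keyed MapState; state name and types are illustrative):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

// Sketch: 7-day TTL with incremental cleanup for a keyed MapState.
// Caveat matching the question above: incremental cleanup only advances
// when state is accessed (or, optionally, per processed record), so keys
// that are never touched again may linger until a full snapshot/cleanup.
StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.days(7))
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        // check up to 10 entries per state access; 'false' = do not also
        // run the cleanup step for every processed record
        .cleanupIncrementally(10, false)
        .build();

MapStateDescriptor<String, Long> descriptor =
        new MapStateDescriptor<>("myMapState", String.class, Long.class);
descriptor.enableTimeToLive(ttlConfig);
```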

Re: Streaming kafka data sink to hive

2020-03-19 Thread Jingsong Li
Hi wanglei, > 1 Is there any flink-hive-connector that i can use to write to hive streamingly? "Streaming kafka data sink to hive" is under discussion.[1] And POC work is ongoing.[2] We want to support it in release-1.11. > 2 Since HDFS is not friendly to frequently append and hive's data is s

Savepoint Location from Flink REST API

2020-03-19 Thread Aaron Langford
Hey Flink Community, I'm combing through docs right now, and I don't see that a savepoint location is returned or surfaced anywhere. When I do this in the CLI, I get a nice message that tells me where in S3 it put my savepoint (unique savepoint ID included). I'm looking for that same result to be
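For what it's worth, the REST API does surface the savepoint location via a trigger/status pair of endpoints; a hedged sketch (host, port, job id and target directory are placeholders):

```shell
# Trigger a savepoint; the response body contains a "request-id".
curl -s -X POST http://localhost:8081/jobs/<job-id>/savepoints \
  -H 'Content-Type: application/json' \
  -d '{"target-directory": "s3://my-bucket/savepoints", "cancel-job": false}'

# Poll with the returned request-id; once status.id is COMPLETED, the
# "operation" object carries the final savepoint "location" (including
# the unique savepoint ID, same as the CLI prints).
curl -s http://localhost:8081/jobs/<job-id>/savepoints/<request-id>
```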

How can i set the value of taskmanager.network.numberOfBuffers ?

2020-03-19 Thread forideal
Hi community, this parameter confuses me: taskmanager.network.numberOfBuffers (70). In my job I use 700 slots, but I have to set this parameter to 70. If not, I get an exception: java.io.IOException: Insufficient n
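For context, the legacy docs give a rule of thumb for sizing this value; a flink-conf.yaml sketch (all numbers are illustrative, not prescriptive):

```yaml
# Legacy rule of thumb from the Flink docs:
#   numberOfBuffers ~= slots-per-TM^2 * number-of-TMs * 4
taskmanager.network.numberOfBuffers: 70000

# On recent versions it is usually easier to size network memory as a
# fraction/range instead of an absolute buffer count:
taskmanager.network.memory.fraction: 0.1
taskmanager.network.memory.min: 64mb
taskmanager.network.memory.max: 1gb
```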

Streaming kafka data sink to hive

2020-03-19 Thread wangl...@geekplus.com.cn
We have many app logs on our app server and want to parse the logs into a structured table format and then sink them to Hive. It seems good to use batch mode: the app log is compressed hourly and it is convenient to do partitioning. We want to use streaming mode. Tail the app logs to Kafka, then use

RE: Need help on timestamp type conversion for Table API on Pravega Connector

2020-03-19 Thread B.Zhou
Hi, Thanks for the reference, Jark. In Pravega connector, user will define Schema first and then create the table with the descriptor using the schema, see [1] and error also came from this test case. We also tried the recommended `bridgedTo(Timestamp.class)` method in the schema construction,

Re: Help with flink hdfs sink

2020-03-19 Thread Jingsong Li
Hi Nick, You can try "new Path("hdfs:///tmp/auditlog/")". There is one additional / after hdfs://, which is a protocol name. Best, Jingsong Lee On Fri, Mar 20, 2020 at 3:13 AM Nick Bendtner wrote: > Hi guys, > I am using flink version 1.7.2. > I am trying to write to hdfs sink from my flink jo
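The three slashes matter because the segment after `hdfs://` up to the next `/` is parsed as the URI authority (the namenode address); with `hdfs:///` the authority is empty and Hadoop falls back to the default filesystem from its configuration. A quick stdlib illustration:

```java
import java.net.URI;

public class HdfsUriDemo {
    public static void main(String[] args) {
        // hdfs://tmp/auditlog/ -> "tmp" is parsed as the authority (host!)
        URI twoSlashes = URI.create("hdfs://tmp/auditlog/");
        // hdfs:///tmp/auditlog/ -> empty authority, path starts at /tmp
        URI threeSlashes = URI.create("hdfs:///tmp/auditlog/");

        System.out.println(twoSlashes.getAuthority());   // tmp
        System.out.println(twoSlashes.getPath());        // /auditlog/
        System.out.println(threeSlashes.getAuthority()); // null
        System.out.println(threeSlashes.getPath());      // /tmp/auditlog/
    }
}
```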

Re: Flink long state TTL Concerns

2020-03-19 Thread Matthew Rafael Magsombol
I see... The way we run our setup is that we run these in a kubernetes cluster where we have one cluster running one job. The total parallelism of the whole cluster is equal to the number of taskmanagers where each task manager has 1 core cpu accounting for 1 slot. If we add a state ttl, do you hav

Re: KafkaConsumer keeps getting InstanceAlreadyExistsException

2020-03-19 Thread Becket Qin
Hi Rong, The issue here is that the PartitionDiscoverer has an internal KafkaConsumer which reuses the client.id set by the users for the actual fetching KafkaConsumer. Different KafkaConsumers distinguish their metrics by client.id, therefore if there are two KafkaConsumers in the same JVM with t

Re: Flink long state TTL Concerns

2020-03-19 Thread Andrey Zagrebin
Hi Matt, Generally speaking, using state with TTL in Flink should not differ a lot from just using Flink with state [1]. You have to provision your system so that it can keep state worth 7 days of data. The existing Flink state backends provide background cleanup to automatically

Help with flink hdfs sink

2020-03-19 Thread Nick Bendtner
Hi guys, I am using flink version 1.7.2. I am trying to write to hdfs sink from my flink job. I setup HADOOP_HOME. Here is the debug log for this : 2020-03-19 18:59:34,316 DEBUG org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-default configuration-file path in Flin

java.lang.AbstractMethodError when implementing KafkaSerializationSchema

2020-03-19 Thread Steve Whelan
Hi, I am attempting to create a Key/Value serializer for the Kafka table connector. I forked `KafkaTableSourceSinkFactoryBase`[1] and other relevant classes, updating the serializer. First, I created `JsonRowKeyedSerializationSchema` which implements `KeyedSerializationSchema`[2], which is deprec

Flink long state TTL Concerns

2020-03-19 Thread Matt Magsombol
Suppose I'm using state stored in-memory that has a TTL of 7 days max. Should I run into any issues with state this long other than potential OOM? Let's suppose I extend this such that we add rocksdb...any concerns with this with respect to maintenance? Most of the examples that I've been seein

Re: Issues with Watermark generation after join

2020-03-19 Thread Dominik Wosiński
I have created a simple minimal reproducible example that shows what I am talking about: https://github.com/DomWos/FlinkTTF/tree/sql-ttf It contains a test that shows that even if the output is in order which is enforced by multiple sleeps, then for parallelism > 1 there is no output and for paral

Re: Can't create a savepoint with State Processor API

2020-03-19 Thread Dmitry Minaev
Yep, that works! Many thanks David, really appreciate it!

Re: Can't create a savepoint with State Processor API

2020-03-19 Thread David Anderson
You are very close. I got your example to work by switching from the MemoryStateBackend to the FsStateBackend, and adding bEnv.execute(); at the end of main(). I'm not sure why either of those might be necessary, but it doesn't seem to work without both changes. See https://gist.github.com/alpi
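Putting David's two fixes together, a hedged sketch of the working shape (Flink 1.9/1.10 State Processor API; paths, the operator uid and the bootstrap function are illustrative):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapSavepoint {

    // Hypothetical bootstrap function that seeds a ValueState per key.
    static class SeedFn extends KeyedStateBootstrapFunction<String, String> {
        transient ValueState<String> state;

        @Override
        public void open(Configuration parameters) {
            state = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("seen", String.class));
        }

        @Override
        public void processElement(String value, Context ctx) throws Exception {
            state.update(value);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();

        BootstrapTransformation<String> transformation = OperatorTransformation
                .bootstrapWith(bEnv.fromElements("a", "b", "c"))
                .keyBy(v -> v)
                .transform(new SeedFn());

        Savepoint
                // Fix 1: FsStateBackend instead of MemoryStateBackend
                .create(new FsStateBackend("file:///tmp/state-backend"), 128)
                .withOperator("my-operator-uid", transformation)
                .write("file:///tmp/new-savepoint");

        // Fix 2: without this call the savepoint job never actually runs
        bEnv.execute("bootstrap savepoint");
    }
}
```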

The question about the FLIP-45

2020-03-19 Thread LakeShen
Hi community, now I am reading FLIP-45 (Reinforce Job Stop Semantic) and I have three questions about it: 1. Which command should be used to stop a Flink task, stop or cancel? 2. If we use the stop command to stop a Flink task, I see in the Flink source code that the stop command lets us set the savepoint dir

Re: Need help on timestamp type conversion for Table API on Pravega Connector

2020-03-19 Thread Jark Wu
This maybe a similar issue to [1], we continue the discussion there. Best, Jark [1]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-Timetamp-types-incompatible-after-migration-to-1-10-td33784.html#a33791 On Tue, 17 Mar 2020 at 18:05, Till Rohrmann wrote: > Thanks for

Re: SQL Timetamp types incompatible after migration to 1.10

2020-03-19 Thread Jark Wu
Hi Brian, Could you share the full exception stack of `Unsupport cast from LocalDateTime to Long` in the PR? In 1.10 DDL, the conversion class or TypeInformation for TIMESTAMP becomes `LocalDateTime`. Maybe your problem is related to this change? If the connector doesn't support `LocalDateTime`,
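A minimal sketch of the bridging approach discussed in this thread (Flink 1.10 Table descriptor API; the field name is hypothetical):

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.descriptors.Schema;

// TIMESTAMP(3) is backed by LocalDateTime by default in 1.10; bridgedTo
// asks the planner to use java.sql.Timestamp as the conversion class,
// which keeps connectors that expect Timestamp/long working.
Schema schema = new Schema()
        .field("event_time",
               DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class));
```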

RE: SQL Timetamp types incompatible after migration to 1.10

2020-03-19 Thread B.Zhou
Hi Jark, I saw this mail and found it is similar to an issue I raised to the community several days ago. [1] Can you have a look to see whether it's the same issue? If yes, there is a further question: from the Pravega connector side, the issue is raised in our Batch Table API, which means users

Re: SQL Timetamp types incompatible after migration to 1.10

2020-03-19 Thread Jark Wu
Hi Paul, Are you using old planner? Did you try blink planner? I guess it maybe a bug in old planner which doesn't work well on new types. Best, Jark On Thu, 19 Mar 2020 at 16:27, Paul Lam wrote: > Hi, > > Recently I upgraded a simple application that inserts static data into a > table from 1.

Re: Timestamp Erasure

2020-03-19 Thread Jark Wu
Hi Dom, Output elements from a ProcessingTime timer in BroadcastProcessFunction or KeyedCoProcessFunction have their timestamps erased. So you have to assign new timestamps with `assignTimestampsAndWatermarks` after that, or use an EventTime timer. Best, Jark On Thu, 19 Mar 2020 at 16:40, Dominik Wosiński wrote
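A sketch of the workaround described here (stream names, the `Output` type and the five-second bound are hypothetical):

```java
// Elements emitted from a processing-time timer carry no timestamp, so
// downstream event-time operators cannot use them. Re-assigning
// timestamps *after* the co-process step restores them.
DataStream<Output> result = mainStream
        .connect(broadcastStream)
        .process(new MyBroadcastFunction())
        .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Output>(Time.seconds(5)) {
                    @Override
                    public long extractTimestamp(Output element) {
                        return element.eventTime; // hypothetical field
                    }
                });
```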

Re: FlinkCEP - Detect absence of a certain event

2020-03-19 Thread Humberto Rodriguez Avila
Thanks David, I will look at your references 👍🏻 > On 19 Mar 2020, at 09:51, David Anderson wrote: > > Humberto, > > Although Flink CEP lacks notFollowedBy at the end of a Pattern, there is a > way to implement this by exploiting the timeout feature. > > The Flink training includes an exercis

Re: FlinkCEP - Detect absence of a certain event

2020-03-19 Thread David Anderson
Humberto, Although Flink CEP lacks notFollowedBy at the end of a Pattern, there is a way to implement this by exploiting the timeout feature. The Flink training includes an exercise [1] where the objective is to identify taxi rides with a START event that is not followed by an END event within tw
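The timeout trick David describes can be sketched roughly like this (the `TaxiRide` event shape follows the training exercise; field names and the two-hour window are illustrative):

```java
// Match START followed by END within 2 hours. Partial matches that
// exceed the window land in the timeout side output -- exactly the
// rides whose END event never arrived.
Pattern<TaxiRide, TaxiRide> completedRides = Pattern
        .<TaxiRide>begin("start")
        .where(new SimpleCondition<TaxiRide>() {
            @Override public boolean filter(TaxiRide ride) { return ride.isStart; }
        })
        .next("end")
        .where(new SimpleCondition<TaxiRide>() {
            @Override public boolean filter(TaxiRide ride) { return !ride.isStart; }
        })
        .within(Time.hours(2));

OutputTag<TaxiRide> timedOut = new OutputTag<TaxiRide>("timed-out"){};

SingleOutputStreamOperator<TaxiRide> matched = CEP
        .pattern(rides.keyBy(r -> r.rideId), completedRides)
        .flatSelect(
                timedOut,
                new PatternFlatTimeoutFunction<TaxiRide, TaxiRide>() {
                    @Override
                    public void timeout(Map<String, List<TaxiRide>> pattern,
                                        long timeoutTimestamp,
                                        Collector<TaxiRide> out) {
                        out.collect(pattern.get("start").get(0));
                    }
                },
                new PatternFlatSelectFunction<TaxiRide, TaxiRide>() {
                    @Override
                    public void flatSelect(Map<String, List<TaxiRide>> pattern,
                                           Collector<TaxiRide> out) {
                        // completed rides: nothing to report here
                    }
                });

DataStream<TaxiRide> noEndWithinTwoHours = matched.getSideOutput(timedOut);
```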

Re: Timestamp Erasure

2020-03-19 Thread Dominik Wosiński
Yes, I understand this completely, but my question is a little bit different. The issue is that if I have something like: val firstStream = dataStreamFromKafka .assignTimestampsAndWatermarks(...) val secondStream = otherStreamFromKafka .assignTimestampsAndWatermarks(...) .broadcast(...)

SQL Timetamp types incompatible after migration to 1.10

2020-03-19 Thread Paul Lam
Hi, Recently I upgraded a simple application that inserts static data into a table from 1.9.0 to 1.10.0, and encountered a timestamp type incompatibility problem during the table sink validation. The SQL is like: ``` insert into kafka.test.tbl_a # schema: (user_name STRING, user_id INT, login

Re: FlinkCEP - Detect absence of a certain event

2020-03-19 Thread Humberto Rodriguez Avila
Hello Fuji and Zhijiang, thanks for your time! Fuji, thanks for the links 👍🏻 I hope that in the future similar scenarios can be implemented in FlinkCEP. Other CEP engines like Esper support this particular type of negated pattern. Best regards, Humberto > On 19 Mar 2020, at 04:55, Shuai Xu wrote: > >