Re: RocksDB local snapshot sliently disappears and cause checkpoint to fail

2019-03-27 Thread Paul Lam
Hi, It turns out that under certain circumstances rocksdb statebackend mistakenly uses the default filesystem scheme, which is specified to hdfs in the new cluster in my case. I’ve filed a Jira to track this[1]. [1] https://issues.apache.org/jira/browse/FLINK-12042

questions regarding offset

2019-03-27 Thread Avi Levi
Hi Guys, I understood that offset is kept as part of the checkpoint and persisted in the state (please correct me if I'm wrong) 1. If I copy my persisted state to another cluster (different kafka servers as well) how is the offset handled ? 2. In a stateless job how is the offset managed ? since t

Re: Do we have an example of setting up Queryable state ( proxies, client etc ) on k8s ?

2019-03-27 Thread Vishal Santoshi
I think I got a handle on this. For those who might want to do this Here are the steps ( I could share the Jetty/Jersey REST code too is required ) *1.* Create a side car container on each pod that has a TM. I wrote a simple Jetty/Jersey REST based server that execute queries against the loca

Re: Help debugging Kafka connection leaks after job failure/cancelation

2019-03-27 Thread Fritz Budiyanto
Thank you ! > On Mar 26, 2019, at 6:51 PM, Steven Wu wrote: > > it might be related to this issue > https://issues.apache.org/jira/browse/FLINK-10774 > > > On Tue, Mar 26, 2019 at 4:35 PM Fritz Budiyanto > wrote:

What are savepoint state manipulation support plans

2019-03-27 Thread Sergei Poganshev
What are the plans to support savepoint state manipulation with batch jobs natively in core Flink? I've tried using the bravo tool [1]. It's pretty good at reading savepoints, but writing seems hacky. For example I wonder what exactly happens with the following lines: val newOpState = writer.writ

Calcite SQL Map to Pojo Map

2019-03-27 Thread shkob1
Im trying to convert a SQL query that has a select map[..] into a pojo with Map (using tableEnv.toRestractedStream ) It seems to fail when the field requestedTypeInfo is GenericTypeInfo with GenericType while the field type itself is MapTypeInfo with Map Exception in thread "main" org.apache.flin

Support for Parquet schema evolution (a.k.a mergeSchema)

2019-03-27 Thread Rafi Aroch
Hi, In my job I want to read Parquet files from buckets by a date range. For that i'm using the Hadoop Compatibility features to use *ProtoParquetInputFormat*. If in the processed date range the Parquet schema underwent changes (even valid ones). Job fails with *IncompatibleSchemaModificationExcep

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-27 Thread Rafi Aroch
Thanks Piotr & Kostas. Really looking forward to this :) Rafi On Wed, Mar 27, 2019 at 10:58 AM Piotr Nowojski wrote: > Hi Rafi, > > There is also an ongoing effort to support bounded streams in DataStream > API [1], which might provide the backbone for the functionalists that you > need. > >

RocksDB local snapshot sliently disappears and cause checkpoint to fail

2019-03-27 Thread Paul Lam
Hi,I’m using Flink 1.6.4 and recently I ran into a weird issue of rocksdb statebackend. A job that runs fine on a YARN cluster keeps failing on checkpoint after migrated to a new one (with almost everything the same but better machines), and even a clean restart doesn’t help. The root cause is Ille

Re: Setting source vs sink vs window parallelism with data increase

2019-03-27 Thread Piotr Nowojski
No problem and it’s good to hear that you managed to solve the problem. Piotrek > On 23 Mar 2019, at 12:49, Padarn Wilson wrote: > > Well.. it turned out I was registering millions of timers by accident, which > was why garbage collection was blowing up. Oops. Thanks for your help again. > >

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-27 Thread Piotr Nowojski
Hi Rafi, There is also an ongoing effort to support bounded streams in DataStream API [1], which might provide the backbone for the functionalists that you need. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-11875 > On 25 Mar 2019

Details on Checkpointing if there are multiple/different sources grouped together.

2019-03-27 Thread anaray
Hi, Please help me to understand checkpointing if there multiple/different sources and which is grouped together. | || KAFKA TOPIC 1 | SOURCE1(DataStream1) |