date:20210317

How does Flink SQL read Avro union?

2021-03-17 Thread Vincent Dong

Hi All, How does Flink SQL read Kafka Avro message which has union field? For me, avro schema is defined as following, ``` record ItemRow { string num_iid; string has_showcase; string jdp_created; } record RefundRow { string refund_id; string status; string jd

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-17 Thread Alexey Trenikhun

Hi Yun, How underlying storage explains fact that without re-scale I can restore from savepoint? Does Flink write file once or many times, if many times, then potentially could be problem with 50,000 blocks per blob limit, I'm I right? Should I try block blob with compaction like described in [1

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-17 Thread Yun Tang

Hi Alexey, I am not familiar with azure blob storage and I cannot load the "_metadata" with your given file locally. Currently, I highly suspect this strange rescaling behavior is related with your underlying storage, could you try to use block blob instead of page blob [1] to see whether this

Re: Python UDF environment variables

2021-03-17 Thread Dian Fu

AFAIK, the system environment variables will be passed to the Python process by default. See [1] for more details. [1] https://github.com/apache/flink/blob/97bfd049951f8d52a2e0aed14265074c4255ead0/flink-python/src/main/java/org/apache/flink/streaming/api/operators/python/AbstractPythonFunctionO

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-17 Thread Yun Tang

Hi Alexey, I tried to load your _metadata as checkpoint via Checkpoints#loadCheckpointMetadata [1] but found this file is actually not a savepoint meta, have you ever uploaded the correct files? Moreover, I noticed that both size of 77e77928-cb26-4543-bd41-e785fcac49f0 and _metadata are 128MB w

Re: Get side output (events where join didn't succeed) for IntervalJoin

2021-03-17 Thread Guowei Ma

Hi, AFAIK, the `intervalJoin` of the DataStream could not do that. But I think you could try the SQL's intervalJoin[1] and you could find some examples in the `IntervalJoinITCase`[2]. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/sql/queries.html#joins [2] https://gith

Re: Handle late message with flink SQL

2021-03-17 Thread Yi Tang

Thanks Timo. The whole idea is also based on the side output and output tag. Let me explain it in detail: 1. Introduce a VirtualTableScan(or SideOutputTableScan), which can be optimized as Physical RelNode. Then we can create a source catalog table which will be converted to a VirtualTableScan, a

Flink History server ( running jobs )

2021-03-17 Thread Vishal Santoshi

Hello folks, Does fliink server not provide for running jobs ( like spark history does ) ? Regards.

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-17 Thread Alexey Trenikhun

Hi Yun, I've copied 77e77928-cb26-4543-bd41-e785fcac49f0 and _metadata to Google drive: https://drive.google.com/drive/folders/1J3nwvQupLBT5ZaN_qEmc2y_-MgFz0cLb?usp=sharing Compression was never enabled (docs says that RocksDB's incremental checkpoints always use snappy compression, not sure doe

Python UDF environment variables

2021-03-17 Thread Soren Macbeth

Hello, It seems that the environment of my Task Manager container is not available to my python process with executing a python api UDF. Is there some way that I can allow it to access them or some configuration I can set to enable this? I'm testing this my printing out `os.environ` in my UDF. TI

Re: Evenly Spreading Out Source Tasks

2021-03-17 Thread Aeden Jameson

There may be a slight misunderstanding: all the FlinkSql tasks _were_ set at a parallelism of 72 -- 18 nodes 4 slots. I was hoping that the setting cluster.evenly-spread-out-slots would spread out the active kafka consumers evenly among the TM's given the topic has 36 partitions, but I now realize

Re: Extracting state keys for a very large RocksDB savepoint

2021-03-17 Thread Andrey Bulgakov

I guess there's no point in making it a KeyedProcessFunction since it's not going to have access to context, timers or anything like that. So it can be a simple InputFormat returning a DataSet of key and value tuples. On Wed, Mar 17, 2021 at 8:37 AM Andrey Bulgakov wrote: > Hi Gordon, > > I thin

Re: Checkpoint fail due to timeout

2021-03-17 Thread Alexey Trenikhun

According to [1] checkpoints do not support Flink specific features like rescaling, but I can try. Thank you for suggestions [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#difference-to-savepoints Apache Flink 1.12 Documentation: Checkpoints

Re: Extracting state keys for a very large RocksDB savepoint

2021-03-17 Thread Andrey Bulgakov

Hi Gordon, I think my current implementation is very specific and wouldn't be that valuable for the broader public. But I think there's a potential version of it that could also retrieve values from a savepoint in the same efficient way and that would be something that other people might need. I'

Re: Saved State in FSstate Backend

2021-03-17 Thread Yun Tang

Hi You could refer to [1] to know more details about checkpoint directory structure. If you are using FsStateBackend, all checkpointed data should be found under 'exclusive' folder, and nothing would exist if keyed state handle smaller than memory threshold [2] (checkpointed data would be sent

Re: Application cluster - Best Practice

2021-03-17 Thread Tamir Sagi

Hey Chesnay 1. Would you please explain what are the business considerations for making ApplicationClusterDeployer/ApplicationDeployer Internal? 2. May we provide a public client implementation that allows developers to run the cluster programmatically. [https://my-email-signature.link/s

Re: Application cluster - Best Practice

2021-03-17 Thread Tamir Sagi

Hey Till, Since the client provides a way to instantiate the ApplicationClusterDeployer its already considered 'public'. IMHO, as long as it's achievable, it must be added to the documentations, because they are incomplete. Right now, we can proceed with the ApplicationDeployer. I really hope t

Re: Application cluster - Best Practice

2021-03-17 Thread Till Rohrmann

Concerning making the ApplicationDeployer interface public, I think we need a community discussion. At the moment this interface is marked as internal. However, I can see the benefits of exposing this interface and respective implementation. I guess the main question is up to which level do we want

Re: [Flink SQL] Leniency of JSON parsing

2021-03-17 Thread Magri, Sebastian

Thanks a lot Timo, I will check those links out and create an issue with more information. Best Regards, Sebastian From: Timo Walther Sent: Tuesday, March 16, 2021 15:29 To: Magri, Sebastian ; ro...@apache.org Cc: user Subject: Re: [Flink SQL] Leniency of JSON

Saved State in FSstate Backend

2021-03-17 Thread Abdullah bin Omar

Hi, I used the FSstate backend to save the state. I just got a folder named similar to JobID (attached image). Inside the folder, there are two more folders named by shared and task owned. However, there is nothing in those folders. How can I see the saved state? or, where is the state saved? Th

Get side output (events where join didn't succeed) for IntervalJoin

2021-03-17 Thread Daksh Talwar

Hi, I'm exploring Interval Join as a probable solution for joins between two streams of data, with a long lookback period. While going through the official documentation, I couldn't figure out

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-17 Thread Yun Tang

Hi Alexey, Thanks for your quick response. I have checked two different logs and still cannot understand why this could happen. Take "wasbs://gsp-st...@gspstatewestus2dev.blob.core.windows.net/gsp/savepoints/savepoint-00-67de6690143a/77e77928-cb26-4543-bd41-e785fcac49f0" for example, the k

Re: Checkpoint fail due to timeout

2021-03-17 Thread 陳昌倬

On Wed, Mar 17, 2021 at 05:45:38AM +, Alexey Trenikhun wrote: > In my opinion looks similar. Were you able to tune-up Flink to make it work? > I'm stuck with it, I wanted to scale up hoping to reduce backpressure, but to > rescale I need to take savepoint, which never completes (at least take

How does Flink SQL read Avro union?

Re: EOFException on attempt to scale up job with RocksDB state backend

Re: EOFException on attempt to scale up job with RocksDB state backend

Re: Python UDF environment variables

Re: EOFException on attempt to scale up job with RocksDB state backend

Re: Get side output (events where join didn't succeed) for IntervalJoin

Re: Handle late message with flink SQL

Flink History server ( running jobs )

Re: EOFException on attempt to scale up job with RocksDB state backend

Python UDF environment variables

Re: Evenly Spreading Out Source Tasks

Re: Extracting state keys for a very large RocksDB savepoint

Re: Checkpoint fail due to timeout

Re: Extracting state keys for a very large RocksDB savepoint

Re: Saved State in FSstate Backend

Re: Application cluster - Best Practice

Re: Application cluster - Best Practice

Re: Application cluster - Best Practice

Re: [Flink SQL] Leniency of JSON parsing

Saved State in FSstate Backend

Get side output (events where join didn't succeed) for IntervalJoin

Re: EOFException on attempt to scale up job with RocksDB state backend

Re: Checkpoint fail due to timeout

23 matches

Site Navigation

Mail list logo

Footer information