Re: FlinkSQL submit query and then the jobmanager failed.

2021-01-24 Thread Matthias Pohl
Hi, thanks for reaching out to the community. I'm not an Hive nor Orc format expert. But could it be that this is a configuration problem? The error is caused by an ArrayIndexOutOfBounds exception in ValidReadTxnList.readFromString on an array generated by splitting a String using colons as separat

Re:Re: Re: Test failed in flink-end-to-end-tests/flink-end-to-end-tests-common-kafka

2021-01-24 Thread Smile@LETTers
Hi Matthias, Sorry for my miss leading. I mean kafka-schema-serializer rather than kafka-avro-serializer. io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe is in kafka-schema-serializer and kafka-schema-serializer should be a dependency of kafka-avro-serializer according to their pom.xml

Re: Initializing broadcast state

2021-01-24 Thread Guowei Ma
Hi,Nick Normally you could not iterate all the keyed states, but the `BroadCastState` & `applyTokeyedState` could do that. For example, before you get the broadcast side elements you might choose to cache the non-broadcast element to the keyed state. After the broadcast elements arrive you need to

Re: Initializing broadcast state

2021-01-24 Thread Nick Bendtner
Thanks Guowei. Another question I have is, what is the use of a broadcast state when I can update a map state or value state inside of the process broadcast element method and use that state to do a lookup in the process element method like this example https://stackoverflow.com/questions/58307154/

Unable to query/print the incomplete bucket state

2021-01-24 Thread Falak Kansal
Hi, greetings I am applying window operations on a datastream. Then I apply some transformation (it could be anything). Let's say I keep the window size to 1 minute and data is coming in a strictly increasing timestamp and let's say watermark is 1 ms (checkpointing is also enabled). There would be

Re: Initializing broadcast state

2021-01-24 Thread Guowei Ma
Hi, Nick You might need to handle it yourself If you have to process an element only after you get the broadcast state. For example, you could “cache” the element to the state and handle it when the element from the broadcast side elements are arrived. Specially if you are using the `KeyedBroad

Initializing broadcast state

2021-01-24 Thread Nick Bendtner
Hi guys, What is the way to initialize broadcast state(say with default values) before the first element shows up in the broadcasting stream? I do a lookup on the broadcast state to process transactions which come from another stream. The problem is the broadcast state is empty until the first elem

Re: Where should a secondary flow for late events processing be defined?

2021-01-24 Thread Guowei Ma
Hi, Jose What I understand your question is Your job has two stages. You want to handle the first stage differently according to the event time of the Stream A. It means that if the event time of Stream A is “too late” then you would enrich Stream A with the external system and or you would en

Re: Flink 1.11 checkpoint compatibility issue

2021-01-24 Thread Lu Niu
Hi, Thanks all for replying. 1. The code uses data stream api only. In the code, we use env.setMaxParallsim() api but not use any operator.setMaxParallsim() api. We do use setParallsim() on each operator. 2. We did set uids for each operator and we can find uids match in two savepoints. 3. quote

Re: Job execution graph state - INITIALIZING

2021-01-24 Thread Chesnay Schepler
INITIALIZING is the very first state a job is in. It is the state of a job that has been accepted by the JobManager, but the processing of said job has not started yet. In other words, INITIALIZING = submitted job, CREATED = data-structures and components required for scheduling have been create

Where should a secondary flow for late events processing be defined?

2021-01-24 Thread Jose Velasco
Hi everybody! I'm so excited to be here asking a first question about Flink DataStream API. I have a basic enrichment pipeline (event time). Basically, there's a main stream A (Kafka source) being enriched with the info of 2 other streams: B and C. (Kafka sources as well). Basically, the enrichme

Re: Flink to BigTable

2021-01-24 Thread Niels Basjes
Hi, I haven't tried it myself yet but there is a Flink connector for HBase and I remember someone telling me that Google has made a library available which is effectively the HBase client which talks to BigTable in the backend. Like I said: I haven't tried this yet myself. Niels Basjes Op zo 24

Flink to BigTable

2021-01-24 Thread Pierre Oberholzer
Dear Community, I would like to use BigTable as a sink for a Flink job: 1) Is there a connector out-of-the-box ? 2) Can I use Datastream API ? 3) How can I optimally pass a sparse object (99% sparsity), i.e. ensure no key/value are created in BigTable for nulls ? I have searched the documentation

Job execution graph state - INITIALIZING

2021-01-24 Thread Nikola Hrusov
Hello, I have looked into this issue: https://issues.apache.org/jira/browse/FLINK-16866 which supposedly adds "INITIALIZING" state. I tried to find the documentation here: - https://ci.apache.org/projects/flink/flink-docs-release-1.12/internals/job_scheduling.html#jobmanager-data-structures - htt

Re: unsubscribe

2021-01-24 Thread Matthias Pohl
Hi Abhishek, unsubscribing works by sending an email to user-unsubscr...@flink.apache.org as stated in [1]. Best, Matthias [1] https://flink.apache.org/community.html#mailing-lists On Sun, Jan 24, 2021 at 3:06 PM Abhishek Jain wrote: > unsubscribe >

unsubscribe

2021-01-24 Thread Abhishek Jain
unsubscribe

FlinkSQL submit query and then the jobmanager failed.

2021-01-24 Thread 赵一旦
As the title, my query sql is very simple, it just select all columns from a hive table(version 1.2.1; orc format). When the sql is submitted, after several seconds, the jobmanager is failed. Here is the Jobmanager's log. Does anyone can help to this problem? 2021-01-24 04:41:24,952 ERROR org.apa

Re: Flink 1.11 checkpoint compatibility issue

2021-01-24 Thread 赵一旦
I think you need provide all the parallelism information, such like the operator info 'Id: b0936afefebc629e050a0f423f44e6ba, maxparallsim: 4096'. What is the parallelism, the maxparallism maybe be generated from the parallelism you have set. Arvid Heise 于2021年1月22日周五 下午11:03写道: > Hi Lu, > > if y