Hi,
thanks for reaching out to the community. I'm neither a Hive nor an ORC format
expert, but could it be that this is a configuration problem? The error is
caused by an ArrayIndexOutOfBoundsException in
ValidReadTxnList.readFromString, on an array generated by splitting a String
using colons as separators
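The failure mode can be reproduced in plain Java, independent of Hive or ORC. This is a hypothetical sketch of the pattern (the actual field layout ValidReadTxnList expects is not shown here): indexing into the result of `split(":")` throws when the input has fewer colon-separated fields than the reader assumes.

```java
public class SplitDemo {
    // Returns the field at `index` of a colon-separated string -- the kind of
    // access that throws ArrayIndexOutOfBoundsException when the input string
    // has fewer fields than the reader expects.
    static String field(String encoded, int index) {
        return encoded.split(":")[index];
    }

    public static void main(String[] args) {
        System.out.println(field("5:3:2", 1)); // well-formed input: prints 3
        field("5", 1); // malformed input: throws ArrayIndexOutOfBoundsException
    }
}
```

If the string being parsed comes from configuration or table metadata, a malformed value there would explain the stack trace.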
Hi Matthias,
Sorry for being misleading. I meant kafka-schema-serializer rather than
kafka-avro-serializer.
io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe is in
kafka-schema-serializer, and kafka-schema-serializer should be a dependency of
kafka-avro-serializer according to their pom.xml.
Hi Nick,
Normally you cannot iterate over all the keyed states, but
`BroadcastState` & `applyToKeyedState` make that possible.
For example, before you get the broadcast side elements you might choose to
cache the non-broadcast element to the keyed state. After the broadcast
elements arrive you need to
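The buffering idea above can be sketched in plain Java (this is a simulation of the pattern, not the actual Flink `KeyedBroadcastProcessFunction` API; all names here are hypothetical): cache keyed elements until the broadcast side delivers its value, then replay them, which is the role `applyToKeyedState` plays in Flink.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of "cache the non-broadcast element to the keyed
// state, then process everything once the broadcast element arrives".
public class BufferUntilBroadcast {
    private final Map<String, List<String>> buffered = new HashMap<>();
    private String broadcastRule;                    // null until broadcast side delivers
    private final List<String> output = new ArrayList<>();

    void processElement(String key, String value) {
        if (broadcastRule == null) {
            // broadcast state not ready yet: buffer per key
            buffered.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        } else {
            output.add(key + ":" + value + "/" + broadcastRule);
        }
    }

    void processBroadcastElement(String rule) {
        broadcastRule = rule;
        // analogous to ctx.applyToKeyedState: visit every key's buffered state
        buffered.forEach((key, values) ->
            values.forEach(v -> output.add(key + ":" + v + "/" + rule)));
        buffered.clear();
    }

    List<String> output() { return output; }
}
```

In a real Flink job the buffer would live in a `MapState`/`ListState` so it is checkpointed along with the rest of the keyed state.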
Thanks Guowei. Another question I have is: what is the use of a broadcast
state when I can update a map state or value state inside the
processBroadcastElement method and use that state to do a lookup in the
processElement method, like this example:
https://stackoverflow.com/questions/58307154/
Hi, greetings
I am applying window operations on a DataStream, then some
transformation (it could be anything). Let's say I keep the window size at
1 minute, data arrives with strictly increasing timestamps, and let's
say the watermark is 1 ms (checkpointing is also enabled). There would be
Hi Nick,
You might need to handle it yourself if you have to process an element
only after you get the broadcast state.
For example, you could “cache” the element in state and handle it
when the elements from the broadcast side arrive. Especially if
you are using the `KeyedBroad
Hi guys,
What is the way to initialize broadcast state (say, with default values)
before the first element shows up in the broadcast stream? I do a lookup
on the broadcast state to process transactions which come from another
stream. The problem is that the broadcast state is empty until the first elem
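One common workaround for the empty-until-first-element problem can be sketched in plain Java (this is not Flink API; the key, value, and default here are made up for illustration): keep a hard-coded default table and read through to it whenever the broadcast state has no entry yet.

```java
import java.util.HashMap;
import java.util.Map;

// Read-through lookup: broadcast state first, hard-coded defaults as a
// fallback until the broadcast side has delivered its first element.
public class DefaultingLookup {
    private static final Map<String, String> DEFAULTS = Map.of("EUR", "1.0");
    private final Map<String, String> broadcastState = new HashMap<>();

    String lookup(String key) {
        return broadcastState.getOrDefault(key, DEFAULTS.get(key));
    }

    void onBroadcastElement(String key, String value) {
        broadcastState.put(key, value);
    }
}
```

In a Flink job the fallback map could come from a config file loaded in `open()`, since `processBroadcastElement` is the only place allowed to write the actual broadcast state.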
Hi Jose,
What I understand from your question is:
Your job has two stages. You want to handle the first stage differently
according to the event time of the Stream A. It means that if the event
time of Stream A is “too late” then you would enrich Stream A with the
external system and or you would en
Hi,
Thanks all for replying.
1. The code uses the DataStream API only. In the code, we use the
env.setMaxParallelism() API but not any operator.setMaxParallelism() API.
We do use setParallelism() on each operator.
2. We did set uids for each operator and we can find uids match in two
savepoints.
3. quote
INITIALIZING is the very first state a job is in.
It is the state of a job that has been accepted by the JobManager, but
the processing of said job has not started yet.
In other words, INITIALIZING = submitted job, CREATED = data-structures
and components required for scheduling have been created
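The distinction in point 1 can be sketched as job-setup code (this needs a Flink runtime on the classpath and is only a sketch; the operator, its uid, and the numbers are hypothetical). `env.setMaxParallelism()` sets the job-wide default max parallelism, which fixes the key-group count used to map keyed state in savepoints, while `setParallelism()` on an operator only sets its runtime parallelism.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setMaxParallelism(4096);     // job-wide max parallelism (key-group count)
        env.fromElements(1, 2, 3)
           .map(x -> x * 2).returns(Types.INT)
           .uid("double")                // stable uid so state maps across savepoints
           .setParallelism(2)            // runtime parallelism only; must be <= max
           .print();
        env.execute("parallelism-demo");
    }
}
```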
Hi everybody!
I'm so excited to be here asking a first question about Flink DataStream
API.
I have a basic enrichment pipeline (event time). Basically, there's a main
stream A (Kafka source) being enriched with the info of 2 other streams: B
and C. (Kafka sources as well).
Basically, the enrichme
Hi,
I haven't tried it myself yet, but there is a Flink connector for HBase, and
I remember someone telling me that Google has made a library available
which is effectively the HBase client but talks to BigTable in the
backend.
Like I said: I haven't tried this yet myself.
Niels Basjes
On Sun 24
Dear Community,
I would like to use BigTable as a sink for a Flink job:
1) Is there an out-of-the-box connector?
2) Can I use the DataStream API?
3) How can I optimally pass a sparse object (99% sparsity), i.e. ensure no
key/value pairs are created in BigTable for nulls?
I have searched the documentation
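For question 3, one plain-Java sketch of the "skip nulls" idea (the BigTable/HBase mutation API is deliberately not shown, and the field names are hypothetical): materialize only non-null fields as cells, so a 99%-sparse object produces roughly 1% of the cells.

```java
import java.util.HashMap;
import java.util.Map;

// Filter a sparse object (column qualifier -> possibly-null value) down to
// the cells that should actually be written; null values produce no cell.
public class SparseRow {
    static Map<String, String> nonNullCells(Map<String, String> sparse) {
        Map<String, String> cells = new HashMap<>();
        sparse.forEach((qualifier, value) -> {
            if (value != null) cells.put(qualifier, value);
        });
        return cells;
    }
}
```

With the HBase-compatible client, the equivalent is simply calling `Put.addColumn` only for the non-null fields, since BigTable rows have no schema and absent cells cost nothing.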
Hello,
I have looked into this issue:
https://issues.apache.org/jira/browse/FLINK-16866 which supposedly adds
the "INITIALIZING" state.
I tried to find the documentation here:
-
https://ci.apache.org/projects/flink/flink-docs-release-1.12/internals/job_scheduling.html#jobmanager-data-structures
-
htt
Hi Abhishek,
unsubscribing works by sending an email to user-unsubscr...@flink.apache.org
as stated in [1].
Best,
Matthias
[1] https://flink.apache.org/community.html#mailing-lists
On Sun, Jan 24, 2021 at 3:06 PM Abhishek Jain wrote:
> unsubscribe
>
unsubscribe
As the title says, my SQL query is very simple: it just selects all columns
from a Hive table (version 1.2.1; ORC format). When the SQL is submitted,
after several seconds the JobManager fails. Here is the JobManager's log.
Can anyone help with this problem?
2021-01-24 04:41:24,952 ERROR
org.apa
I think you need to provide all the parallelism information, such as the
operator info 'Id: b0936afefebc629e050a0f423f44e6ba, maxparallsim: 4096'.
What is the parallelism? The max parallelism may be derived from the
parallelism you have set.
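The derivation hinted at above can be sketched in plain Java. This mirrors, as far as I know, what Flink's `KeyGroupRangeAssignment.computeDefaultMaxParallelism` does when no max parallelism is set explicitly; treat the exact constants (128 floor, 32768 ceiling, the 1.5x factor) as assumptions to verify against your Flink version.

```java
// Sketch of Flink's default max-parallelism derivation: round 1.5x the
// operator parallelism up to a power of two, then clamp to [128, 32768].
public class DefaultMaxParallelism {
    static int compute(int parallelism) {
        int target = parallelism + parallelism / 2;          // 1.5x headroom
        int powerOfTwo = Integer.highestOneBit(target);
        if (powerOfTwo < target) powerOfTwo <<= 1;           // round up to power of two
        return Math.min(Math.max(powerOfTwo, 128), 32768);   // clamp to bounds
    }
}
```

This is why two savepoints taken with different parallelism (and no explicit `setMaxParallelism`) can end up with incompatible max parallelism values.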
Arvid Heise wrote on Fri, Jan 22, 2021 at 11:03 PM:
> Hi Lu,
>
> if y