Hi,
What Fabian mentioned is true. The Flink Kafka Consumer’s exactly-once guarantee
relies on offsets being checkpointed as Flink state, and doesn’t rely on the
committed offsets in Kafka.
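In code terms, the offsets become part of Flink state as soon as checkpointing is
enabled. A minimal sketch (the topic name, schema and properties below are just
placeholders):

import java.util.Properties;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class OffsetsInCheckpointsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // With checkpointing enabled, the consumer snapshots its partition offsets
        // into Flink state; committed offsets in Kafka are not used for recovery.
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-group");                // placeholder

        env.addSource(new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props))
           .print();

        env.execute("offsets-in-checkpoints-sketch");
    }
}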
What we found is that Flink acks Kafka immediately before even writing to S3.
What you mean by ack here is the offs
Our parser.parse() function has a one-to-one mapping from an input byte[]
to a List
Ideally, this should be handled within the KeyedDeserializationSchema passed to
your Kafka consumer. That would then avoid the need for an extra “parser map
function” after the source.
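A minimal sketch of that approach (MyEvent and MyParser are placeholders for your
own record type and parser.parse()):

import java.io.IOException;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

// Parses the raw Kafka bytes directly in the source, so no extra map operator is needed.
public class ParsingDeserializationSchema implements KeyedDeserializationSchema<MyEvent> {

    @Override
    public MyEvent deserialize(byte[] messageKey, byte[] message,
                               String topic, int partition, long offset) throws IOException {
        return MyParser.parse(message); // placeholder for your parser.parse()
    }

    @Override
    public boolean isEndOfStream(MyEvent nextElement) {
        return false; // the stream never ends
    }

    @Override
    public TypeInformation<MyEvent> getProducedType() {
        return TypeExtractor.getForClass(MyEvent.class);
    }
}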
Were you suggesting a fl
Glad to hear it’s working!
Yes, normally you should avoid using the lib folder to resolve these dependency
issues and rely only on user jar packaging when working with Flink connectors.
- Gordon
On 17 July 2017 at 9:44:20 PM, Fabian Wollert (fabian.woll...@zalando.de) wrote:
TL;DR: remove all
Hi,
To expand on Fabian's answer, there are a few APIs for joins.
* connect - you have to provide a CoProcessFunction.
* window join/cogroup - you provide key selector functions, a time window and
a join/cogroup function (see the sketch below).
With the first method, you have to write more code, in exchange for much mo
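For the second method, a rough sketch of a window join (the element types, key
selectors and window size are placeholders; assumed imports are listed in the
comments):

// Assumed imports:
// org.apache.flink.api.common.functions.JoinFunction
// org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
// org.apache.flink.streaming.api.windowing.time.Time

DataStream<Result> joined = streamA
    .join(streamB)
    .where(a -> a.getKey())                                // key selector for streamA
    .equalTo(b -> b.getKey())                              // key selector for streamB
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))  // join within 5-minute windows
    .apply(new JoinFunction<A, B, Result>() {
        @Override
        public Result join(A first, B second) {
            return new Result(first, second);              // placeholder join logic
        }
    });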
Hi Pedro,
Seems like a memory leak. The only issue I’m currently aware of that may be
related is [1]. Could you tell if this JIRA relates to what you are bumping
into?
The JIRA mentions Kafka 0.9, but a fix is only available for Kafka 0.10 once we
bump our Kafka 0.10 dependency to the latest versi
No. This is the thread that answers my question -
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Does-FlinkKafkaConsumer010-care-about-consumer-group-td14323.html
Moiz
—
sent from phone
On 19-Jul-2017, at 10:04 PM, Ted Yu wrote:
This was the other thread, right?
http://
Hi Fran,
does the DateTimeBucketer act like a memory buffer that isn't managed by Flink's
state? If so, then I think the problem is not about Kafka, but about the
DateTimeBucketer. Flink won't take a snapshot of the DateTimeBucketer if it isn't
kept in any state.
Best,
Sihua Zhou
At 2017-07-20 03
Hi,
Gordon (in CC) knows the details of Flink's Kafka consumer.
He might know how to solve this issue.
Best, Fabian
2017-07-19 20:23 GMT+02:00 PedroMrChaves :
> Hello,
>
> Whenever I submit a job to Flink that retrieves data from Kafka the memory
> consumption continuously increases. I've chang
Yes, it was fixed in FLINK-5109, but re-introduced in FLINK-5705.
I will fix this again in FLINK-7226 and add tests to prevent it from
appearing again.
On 19.07.2017 20:18, Will Du wrote:
Thanks. Seems this has been fixed before in FLINK-5109
Sent from my iPhone
On Jul 19, 2017, at 0
Hi,
unfortunately, it is not possible to convert a DataStream into a DataSet.
Flink's DataSet and DataStream APIs are distinct APIs that cannot be used
together.
The FlinkML library is only available for the DataSet API.
There is some ongoing work to add a machine learning library for streaming
u
Hi Fran,
did you observe actual data loss due to the problem you are describing or
are you discussing a possible issue based on your observations?
AFAIK, Flink's Kafka consumer keeps track of the offsets itself and
includes these in the checkpoints. In case of a recovery, it does not rely
on the
Hi,
there are basically two operations to merge streams.
1. Union simply merges the input streams such that the resulting stream has
the records of all input streams. Union is a built-in operator in the
DataStream API. For that, all streams must have the same data type (see the
sketch below).
2. Join connects records of
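A minimal union sketch (the stream names, the AppEvent type, the key selector and
the window size are placeholders):

// Assumed imports:
// org.apache.flink.streaming.api.functions.windowing.WindowFunction
// org.apache.flink.streaming.api.windowing.time.Time
// org.apache.flink.streaming.api.windowing.windows.TimeWindow
// org.apache.flink.util.Collector

// All three streams must carry the same type, e.g. a common AppEvent POJO.
DataStream<AppEvent> all = clickStream.union(downloadStream, installStream);

all.keyBy(e -> e.getAppId())
   .timeWindow(Time.minutes(10))
   .apply(new WindowFunction<AppEvent, Stats, String, TimeWindow>() {
       @Override
       public void apply(String appId, TimeWindow window,
                         Iterable<AppEvent> events, Collector<Stats> out) {
           out.collect(Stats.from(appId, events)); // placeholder aggregation
       }
   });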
Hello,
Whenever I submit a job to Flink that retrieves data from Kafka the memory
consumption continuously increases. I've changed the max heap memory from
2gb, to 4gb, even to 6gb but the memory consumption keeps reaching the
limit.
An example of a simple Job that shows this behavior is depicte
Thanks. Seems this has been fixed before in FLINK-5109
Sent from my iPhone
> On Jul 19, 2017, at 07:47, Chesnay Schepler wrote:
>
> Hello,
>
> Looks like you stumbled upon a bug in our REST API and are using a client that
> is stricter than others.
>
> I will create a JIRA for this.
>
> Regards,
Hello -
I've been successful working with Flink in Java, but have some trouble trying
to leverage the ML library, specifically with KNN.
From my understanding, this is easier in Scala [1], so I've been converting my
code.
One issue I've encountered is - How do I get a DataSet[Vector] from a
Da
This was the other thread, right?
http://search-hadoop.com/m/Flink/VkLeQ0dXIf1SkHpY?subj=Re+Does+job+restart+resume+from+last+known+internal+checkpoint+
On Wed, Jul 19, 2017 at 9:02 AM, Moiz Jinia wrote:
> Yup! Thanks.
>
> Moiz
>
> —
> sent from phone
>
> On 19-Jul-2017, at 9:21 PM, Aljoscha K
Yup! Thanks.
Moiz
—
sent from phone
On 19-Jul-2017, at 9:21 PM, Aljoscha Krettek [via Apache Flink User Mailing
List archive.] wrote:
This was now answered in your other Thread, right?
Best,
Aljoscha
>> On 18. Jul 2017, at 11:37, Moiz Jinia <[hidden email]> wrote:
>>
>> Aljoscha Krettek wr
This was now answered in your other Thread, right?
Best,
Aljoscha
> On 18. Jul 2017, at 11:37, Moiz Jinia wrote:
>
> Aljoscha Krettek wrote
>> Hi,
>> zero-downtime updates are currently not supported. What is supported in
>> Flink right now is a savepoint-shutdown-restore cycle. With this, you
Hi Timo,
I just modified AvroOutputFormatTest to test this and it works fine! I
don't plan to use it to key by, but it is a good point. Thanks.
Regards,
Vishnu
On Wed, Jul 19, 2017 at 10:57 AM, Timo Walther wrote:
> We have similar checks in our KafkaAvroTableSource, but I could not find
> su
Hi,
We have a Flink job running on AWS EMR sourcing a Kafka topic and
persisting the events to S3 through a DateTimeBucketer. We configured the
bucketer to flush to S3 with an inactivity period of 5 mins. The rate at
which events are written to Kafka in the first place is very low so it is
easy for
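For reference, a rough sketch of the setup described (assuming the Flink 1.3
BucketingSink; the S3 path, date format and intervals are placeholders):

// Assumed imports:
// org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink
// org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer

BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/events");
sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
sink.setInactiveBucketThreshold(5 * 60 * 1000L);   // flush buckets idle for 5 minutes
sink.setInactiveBucketCheckInterval(60 * 1000L);   // check for idle buckets every minute

stream.addSink(sink);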
Raw state can only be used when implementing an operator, not a function.
For functions you have to use Managed Operator State. Your function will
have to implement
the CheckpointedFunction interface, and create a ValueStateDescriptor
that you register in initializeState.
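A rough sketch of that pattern in Java, reusing the types from the function quoted
below (the state name and the scoring logic are placeholders):

// Assumed imports:
// org.apache.flink.api.common.state.ValueState
// org.apache.flink.api.common.state.ValueStateDescriptor
// org.apache.flink.runtime.state.FunctionInitializationContext
// org.apache.flink.runtime.state.FunctionSnapshotContext
// org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
// org.apache.flink.streaming.api.functions.co.CoProcessFunction
// org.apache.flink.util.Collector

public class DataProcessorKeyed extends CoProcessFunction<WineRecord, ModelToServe, Double>
        implements CheckpointedFunction {

    private transient ValueState<ModelToServe> modelState;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // Register the state here; Flink then snapshots it with every checkpoint.
        ValueStateDescriptor<ModelToServe> descriptor =
            new ValueStateDescriptor<>("current-model", ModelToServe.class);
        modelState = context.getKeyedStateStore().getState(descriptor);
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) {
        // Nothing to do: managed state is written out automatically.
    }

    @Override
    public void processElement1(WineRecord record, Context ctx, Collector<Double> out) throws Exception {
        ModelToServe model = modelState.value();
        if (model != null) {
            out.collect(0.0); // placeholder: score the record with the current model
        }
    }

    @Override
    public void processElement2(ModelToServe model, Context ctx, Collector<Double> out) throws Exception {
        modelState.update(model); // keep the latest model in managed state
    }
}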
On 19.07.2017 15:28,
Thanks for the reply, but I am not using it for managed state, but rather for
raw state.
In my implementation I have the following:
class DataProcessorKeyed extends CoProcessFunction[WineRecord, ModelToServe,
Double]{
// The managed keyed state see
https://ci.apache.org/projects/flink/flin
Hello,
Looks like you stumbled upon a bug in our REST API and are using a client
that is stricter than others.
I will create a JIRA for this.
Regards,
Chesnay
On 19.07.2017 13:31, Will Du wrote:
Hi folks,
I am using a java rest client - unirest lib to GET from flink rest API to get a
Job status
Hi folks,
I am using a Java REST client (the Unirest lib) to GET a job status from the
Flink REST API. I got an exception: unsupported content encoding - UTF8.
Do you guys know how to resolve it? The Postman client works fine for me.
Thanks,
Will
Hello,
I assume you're passing the class of your serializer in a
StateDescriptor constructor.
If so, you could add a breakpoint in
StateDescriptor#initializeSerializerUnlessSet,
and check what typeInfo is created and which serializer is created as a
result.
One thing you could try right aw
Hello,
this problem is described in
https://issues.apache.org/jira/browse/FLINK-6689.
Basically, if you want to use the LocalFlinkMiniCluster you should use a
TestStreamEnvironment instead.
The RemoteStreamEnvironment only works with a proper Flink cluster.
Regards,
Chesnay
On 14.07.2017 1
Hi Vishnu,
I took a look into the code. Actually, we should support it. However,
those types might be mapped to Java Objects that will be serialized with
our generic Kryo serializer. Have you tested it?
Regards,
Timo
On 19.07.17 at 06:30, Martin Eden wrote:
Hey Vishnu,
For those of us on
Great thanks that was very helpful.
One last question -
> If your job code hasn’t changed across the restores, then it should be
> fine even if you didn’t set the UID.
What kind of code change? What if the operator pipeline is still the same
but there's some business logic change?
On Wed, J
Does this mean I can use the same consumer group G1 for the newer version A'?
And in spite of the same consumer group, will A' receive messages from all
partitions when it's started from the savepoint?
Yes. That’s true. Flink internally uses static partition assignment, and the
clients are assigned whate
Does this mean I can use the same consumer group G1 for the newer version
A'? And in spite of the same consumer group, will A' receive messages from all
partitions when it's started from the savepoint?
I am using Flink 1.2.1. Does the above plan require setting uid on the
Kafka source in the job?
Thanks,
M
Hi!
The only occasions in which the consumer group is used are:
1. When committing offsets back to Kafka. Since Flink 1.3, this can be disabled
completely (whether checkpointing is enabled or disabled). See [1] for details
about that, and the sketch below.
2. When starting fresh (not starting from some savepoint), if yo
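For point 1, a minimal sketch of disabling the offset commits (Flink 1.3; topic,
schema and properties are placeholders):

// Assumed imports:
// java.util.Properties
// org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
// org.apache.flink.streaming.util.serialization.SimpleStringSchema

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");  // placeholder
props.setProperty("group.id", "G1");                       // placeholder
// Without checkpointing, committing is governed by this Kafka property instead:
props.setProperty("enable.auto.commit", "false");

FlinkKafkaConsumer010<String> consumer =
    new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);

// With checkpointing enabled: don't commit offsets back to Kafka on checkpoints.
consumer.setCommitOffsetsOnCheckpoints(false);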
Below is a plan for downtime-free upgrade of a Flink job. The downstream
consumer of the Flink job is duplicate proof.
Scenario 1 -
1. Start Flink job A with consumer group G1 (12 slot job)
2. While job A is running, take a savepoint AS.
3. Start newer version of Flink job A' from savepoint AS wit
I have three data streams
1. app exposed and click
2. app download
3. app install
How can I merge the streams to create a unified stream, then compute it on
time-based windows?
Thanks