Hello,
I am working on a ParquetSink writer, which will convert a Kafka stream to
Parquet format. I am having some weird issues deploying this application
to a YARN cluster. I am not 100% sure this is a Flink-related
error, but I wanted to reach out to folks here in case it might be.
I
Hi,
I'm not a Kafka expert, but I think you need more than one Kafka
partition to process multiple documents at the same time. Also make sure
to send the documents to different partitions.
Regards,
Timo
On 10/2/17 at 6:46 PM, r. r. wrote:
Hello
I'm running a job with "flink run -p5"
Hi,
I would recommend implementing a custom trigger in this case. You can
override the default trigger of your window:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/windows.html#triggers
This is the interface where you can control the triggering:
https://
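To make this concrete, here is a rough sketch of such a custom trigger (hypothetical; the class name, generics and the lifetime cap are made up for illustration, against the Trigger/TimeWindow API of Flink 1.3/1.4): it fires like the default event-time trigger, but additionally fires and purges any window that has stayed open longer than a processing-time cap.

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// Hypothetical trigger: default event-time firing plus a processing-time cap
// on how long a window may stay open.
public class CappedEventTimeTrigger<T> extends Trigger<T, TimeWindow> {

    private final long maxLifetimeMs;

    public CappedEventTimeTrigger(long maxLifetimeMs) {
        this.maxLifetimeMs = maxLifetimeMs;
    }

    @Override
    public TriggerResult onElement(T element, long timestamp, TimeWindow window, TriggerContext ctx) {
        // fire when the watermark passes the end of the window (default behaviour) ...
        ctx.registerEventTimeTimer(window.maxTimestamp());
        // ... and additionally cap the window's lifetime in processing time
        ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + maxLifetimeMs);
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        // the cap was reached: emit whatever the window holds and drop its contents
        return TriggerResult.FIRE_AND_PURGE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
    }
}

For merging (session) windows a real implementation would also need canMerge()/onMerge(), and it would track the registered processing-time timer so that clear() can delete it; the sketch leans on the fact that timers firing for an already purged window have no further effect.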
Hi Björn,
I don't know if I understand your example correctly, but I think your
explanation "All events up to and equal to the watermark should be handled
in the previous window" is not 100% correct. Watermarks just indicate
progress ("up to here we have seen all events with a lower timestamp
than X"
We are planning to work on this clean shutdown after releasing Flink 1.4.
Implementing this properly would require some work, for example:
- adding some checkpoint options to carry information about a “closing”/“shutting
down” event
- adding a clean shutdown to the source functions API
- implementing handling o
Hello
I'm running a job with "flink run -p5" and additionally set
env.setParallelism(5).
The source of the stream is Kafka; the job uses FlinkKafkaConsumer010.
In the Flink UI, though, I notice that if I send 3 documents to Kafka, only one
'instance' of the consumer seems to receive Kafka's records and
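For context, a minimal sketch of such a setup (topic name, group id and broker address are placeholders): the key point is that each Kafka partition is read by at most one consumer subtask, so a parallelism of 5 only pays off if the topic has at least 5 partitions.

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(5);   // 5 consumer subtasks ...

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");   // placeholder
        props.setProperty("group.id", "my-consumer-group");         // placeholder

        // ... but each partition of "my-topic" is assigned to exactly one subtask,
        // so the topic needs at least 5 partitions for all subtasks to get data.
        FlinkKafkaConsumer010<String> consumer =
            new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);

        DataStream<String> docs = env.addSource(consumer);
        docs.print();

        env.execute("kafka parallelism sketch");
    }
}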
Thanks Timo, watching the video now.
I did try out the iteration-based method in a simple prototype and
it works. But you are right: combining it with the other
requirements into a single process function has so far resulted in
more complexity than I'd like, and it'
Thanks Piotr for your answer; sadly, we can't use Kafka 0.11 for now (and
won't be for a while).
We cannot afford tens of thousands of duplicated messages for each
application upgrade. Can I help by working on this feature?
Do you have any hints or details on this part of that "todo list"?
On Mon, 2 O
Hi,
For failure recovery with Kafka 0.9 it is not possible to avoid duplicated
messages. Using Flink 1.4 (not yet released) combined with Kafka 0.11, it will be
possible to achieve exactly-once end-to-end semantics when writing to Kafka.
However, this is still a work in progress:
https://issues.apach
Is it possible to limit session windows to a max of n seconds/hours, etc.?
I.e., I would like a session window, but if a session runs for an
unacceptable amount of time, I would like to close it.
Thanks,
I have a complex algorithm implemented using the DataSet API, and by default it
runs with parallelism 90 for good performance. At the end I want to perform a
clustering of the resulting data, and to do that correctly I need to pass
all the data through a single thread/process.
I read in the docs that as long
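A small sketch of the pattern being asked about, with a generated sequence and a counting GroupReduceFunction standing in for the real algorithm and clustering step: the job runs with parallelism 90, but setParallelism(1) on the final operator forces all data through a single task.

import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.util.Collector;

public class SingleTaskFinalStep {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(90);   // default parallelism for the heavy part of the job

        DataSet<Long> data = env.generateSequence(1, 1_000_000);   // stand-in for the real pipeline

        data.reduceGroup(new GroupReduceFunction<Long, Long>() {
                @Override
                public void reduce(Iterable<Long> values, Collector<Long> out) {
                    long count = 0;
                    for (Long ignored : values) {
                        count++;               // stand-in for the real clustering
                    }
                    out.collect(count);
                }
            })
            .setParallelism(1)   // this operator sees all records in a single task
            .print();
    }
}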
Hi,
When taking a savepoint on AWS EMR I get the following error:
[hadoop@ip-10-12-169-172 ~]$ flink savepoint e14a6402b6f1e547c4adf40f43861c27
Retrieving JobManager.
The program finished with the following exception:
org.apache.flink
Hi,
I'm working on a Flink streaming app with a Kafka 0.9 to Kafka 0.9 use case
which handles around 100k messages per second.
To upgrade our application, we usually run a flink cancel with savepoint
command, followed by a flink run with the previously saved savepoint and the
new application fat jar as
Nice idea. Actually, we are looking at connect for other parts of our solution
in which latency is less critical.
A few considerations for not using 'connect' in this case were:
1. To isolate the two streams from each other to reduce complexity,
simplify debugging, etc. – since we are
Hello all,
This is Bo. I ran into some problems when I tried to use Flink in my Mesos
cluster (1 master, 2 slaves, each with a 32-core CPU).
I tried to start mesos-appmaster.sh in Marathon; the job manager
started without problems.
mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Dtaskmanager.heap.mb=1024
Thanks, Chesnay, that was indeed the problem.
It also explains why -p5 was not working for me from the command line.
Best regards
Robert
> Original message
>From: Chesnay Schepler ches...@apache.org
>Subject: Re: how many 'run -c' commands to start?
>To: "r. r."
>Sent
How about connecting the two streams of data, one from the reference data and
one from the main data (I assume using keyed streams, since you mention
QueryableState), and keeping state locally within the operator?
The idea is to have a local sub-copy of the reference data within the
operator that is updated from
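A hedged sketch of that idea (toy element types and inline sources instead of the real inputs): both streams are keyed by the same field, and a RichCoFlatMapFunction keeps the latest reference value per key in local keyed state.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

public class ConnectedReferenceDataSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // toy stand-ins for the real sources (e.g. two Kafka topics)
        DataStream<Tuple2<String, Double>> mainStream =
            env.fromElements(Tuple2.of("sensorA", 1.0), Tuple2.of("sensorB", 2.0));
        DataStream<Tuple2<String, String>> referenceStream =
            env.fromElements(Tuple2.of("sensorA", "calibration-v1"));

        mainStream.keyBy(0)
            .connect(referenceStream.keyBy(0))
            .flatMap(new RichCoFlatMapFunction<Tuple2<String, Double>, Tuple2<String, String>, String>() {

                private transient ValueState<String> reference;

                @Override
                public void open(Configuration parameters) {
                    reference = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("reference", String.class));
                }

                @Override
                public void flatMap1(Tuple2<String, Double> event, Collector<String> out) throws Exception {
                    // enrich the main event with the latest reference value seen for this key
                    out.collect(event.f0 + "=" + event.f1 + " (ref: " + reference.value() + ")");
                }

                @Override
                public void flatMap2(Tuple2<String, String> update, Collector<String> out) throws Exception {
                    reference.update(update.f1);   // keep only the newest reference entry per key
                }
            })
            .print();

        env.execute("connected reference data sketch");
    }
}

Until the first update for a key has arrived, reference.value() returns null, so the main path has to tolerate missing reference data.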
Thank you.
"If SourceFunction.run methods returns without an exception Flink assumes
that it has cleanly shutdown and that there were simply no more elements to
collect/create by this task. "
This sentence solve my confusion.
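To illustrate the quoted sentence, a minimal sketch of a source whose run() eventually returns on its own, and returns early when cancel() is called (the counting loop is just a placeholder for real input):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Minimal sketch of a source whose run() returns cleanly once it is done or cancelled.
public class CountingSource implements SourceFunction<Long> {

    private volatile boolean running = true;
    private long counter = 0;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running && counter < 1_000) {          // finite input: run() eventually returns
            synchronized (ctx.getCheckpointLock()) {  // emit under the checkpoint lock, per the SourceFunction contract
                ctx.collect(counter++);
            }
        }
        // Returning here without an exception: Flink treats the task as cleanly finished.
    }

    @Override
    public void cancel() {
        running = false;
    }
}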
Hi Timo,
thank you for your response. Just yesterday I tried using the JDBC
connector, and unfortunately I found out that the HivePreparedStatement and
HiveStatement implementations still don't implement addBatch,
which the connector relies on. The first dirty solution
tha
Hi,
I have a question regarding the timing of events.
According to
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/event_time.html#event-time-and-watermarks
all events up to and equal to the watermark should be handled in "the previous
window".
In my case I use event timestamps.
I'm t
We have an operator in our streaming application that needs to access
'reference data' that is updated by another Flink streaming application. This
reference data has ~10,000 entries and a small footprint. This
reference data needs to be updated roughly every 100 ms. The required latency for
Hi Federico,
would it help to buffer events first and perform insertions in batches
for better throughput? I saw some similar work recently here:
https://tech.signavio.com/2017/postgres-flink-sink
But I would first try the AsyncIO approach, because this is actually a
use case it was made fo
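A rough sketch of the buffering idea, assuming a plain RichSinkFunction over pre-formatted rows (BATCH_SIZE and flushBatch() are placeholders for the real Hive/JDBC write path):

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Hypothetical buffering sink: rows are collected and written in batches instead
// of one blocking call per event; flushBatch() stands in for the real Hive write.
public class BufferingHiveSink extends RichSinkFunction<String> {

    private static final int BATCH_SIZE = 1_000;   // example value, tune for the target system

    private transient List<String> buffer;

    @Override
    public void open(Configuration parameters) {
        buffer = new ArrayList<>(BATCH_SIZE);
    }

    @Override
    public void invoke(String row) throws Exception {
        buffer.add(row);
        if (buffer.size() >= BATCH_SIZE) {
            flushBatch();
        }
    }

    @Override
    public void close() throws Exception {
        if (buffer != null && !buffer.isEmpty()) {
            flushBatch();   // do not lose the tail of the stream on shutdown
        }
    }

    private void flushBatch() {
        // hypothetical: write all buffered rows in a single round trip
        // (e.g. one multi-row INSERT), then clear the buffer
        buffer.clear();
    }
}

Note that rows still sitting in the buffer are not part of Flink's checkpointed state in this sketch, so they can be lost on failure; integrating the flush with checkpointing, or the AsyncIO approach mentioned above, would be needed for stronger guarantees.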
Hi Derek,
maybe the following talk can inspire you on how to do this with joins and
async I/O: https://www.youtube.com/watch?v=Do7C4UJyWCM (around the 17th
minute). Basically, you split the stream and wait for an async I/O result in
a downstream operator.
But I think having a transient Guava cache
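A small sketch of the transient-cache idea, using Guava's LoadingCache in front of a slow lookup (the key/value types, cache sizing and the lookup() method are made up for illustration); building the cache in open() keeps it on the task manager and out of the serialized function:

import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Hypothetical enrichment function with a transient Guava cache in front of a slow lookup.
public class CachedLookup extends RichMapFunction<String, String> {

    private transient LoadingCache<String, String> cache;

    @Override
    public void open(Configuration parameters) {
        cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)                              // example sizing
            .expireAfterWrite(100, TimeUnit.MILLISECONDS)     // expire entries so updates are picked up
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) {
                    return lookup(key);                       // only called on a cache miss
                }
            });
    }

    @Override
    public String map(String key) throws Exception {
        return cache.get(key);
    }

    private String lookup(String key) {
        return "value-for-" + key;   // stand-in for the real database / service call
    }
}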
As a follow-up:
the Flink job currently has an uptime of almost 24 hours, with no
failed checkpoints or restarts, whereas with async snapshots it would have
already crashed 50 or so times.
Regards,
Federico
2017-09-30 19:01 GMT+02:00 Federico D'Ambrosio <
federico.dambro...@smartlab.ws>:
> Thank
Hi, I've implemented a sink for Hive as a RichSinkFunction, but once I
integrated it into my current Flink job, I noticed that the processing of the
events slowed down really badly, I guess because of some blocking calls that
need to be made when interacting with the Hive streaming API.
So, what can be done
Hi Marcus,
at first glance, your pipeline looks correct. It should not be
executed with a parallelism of one unless specified explicitly. Which
time semantics are you using? If it is event time, I would check your
timestamp and watermark assignment. Maybe you can also check in the
web f