Hi Fabian,
AsyncFunction and ProcessFunction do help!
I assume the per-event timers I create in my RichProcessFunction
implementation will be part of the key-grouped state and cached in memory at
runtime, right? I am interested in this because we are targeting a large
deployment with an event source of a million TPS. I w
Issue reported:
https://issues.apache.org/jira/browse/FLINK-5633
Sorry for taking so long
Hey Cliff
You can upload a jar file using an HTTP POST with the file data sent under the
form field 'jarfile'.
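For reference, a minimal upload call could look like the sketch below; the
/jars/upload path and the jobmanager host/port are assumptions based on the
default web frontend setup, not something spelled out in this thread:

curl -X POST \
  -F "jarfile=@/path/to/my-job.jar" \
  http://<jobmanager-host>:8081/jars/upload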
Can you also please open a jira for fixing the documentation?
- Sachin
On Jan 25, 2017 06:55, "Cliff Resnick" wrote:
> The 1.2 release documentation (https://ci.apache.org/projects/fli
Hi,
I am trying to set up Flink with YARN on a MapR cluster. I built Flink
(flink-1.3-SNAPSHOT) as follows:
mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.7.0-mapr-1607
The build is successful. Then I try to run ./bin/yarn-session.sh -n 4
(without changing any config whatsoever
The 1.2 release documentation
(https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/rest_api.html)
states "It is possible to upload, run, and list Flink programs via the REST APIs and
web frontend". However, there is no documentation about uploading a jar via the
REST API. Does this
Hi @Fabian, @Gabor, and @Aljoscha,
Thank you for your help! It works as expected.
Best regards,
Ivan.
On Tue, 24 Jan 2017 at 17:04 Fabian Hueske wrote:
> Aljoscha, you are right.
> The second mapPartition() needs to have parallelism(1), but the
> sortPartition() as well:
>
>
> dataset // assum
The performance hit due to decoding the JSON is expected, and there is not a
lot I can do about that (except for changing the encoding). Alright.
When joining the above stream with another stream I get another performance
hit of ~80%, so in the end I have only 1k msgs/s remaining. Do you k
Hi Aljoscha,
Thanks.
Yes, we are using Event Time.
Yes, the Flink program is kept running in the IDE (i.e. Eclipse) and not closed
after the first batch of events and when adding the second batch.
Yes, we do have a custom timestamp/watermark assigner, implemented as
BoundedOutOfOrdernessGenerator2
A
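For context, such a periodic bounded-out-of-orderness assigner typically looks
roughly like the sketch below (modelled on the Flink docs example); MyEvent, its
getCreationTime() accessor, and the 3.5 second bound are placeholders, not taken
from this thread:

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class BoundedOutOfOrdernessGenerator2
        implements AssignerWithPeriodicWatermarks<MyEvent> {

    private final long maxOutOfOrderness = 3500; // 3.5 seconds
    private long currentMaxTimestamp;

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        long timestamp = element.getCreationTime();
        currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
        return timestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // the watermark trails the largest timestamp seen so far by the allowed lateness
        return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
    }
}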
Hi Stephan,
Thank you for answering my question.
I tried option 2 and it gives me correct results reading several sources,
while using ParallelSourceFunction gives 4 times redundancy (the same as my
number of threads).
Can I ask what would be the reason for the difference? I think I don't
under
Thanks for the clarification. I'm not familiar enough with the
internals of Flink to offer any technical suggestions, but it'd be
nice to have some more documentation around testing Flink and possible
pitfalls like this.
For anybody with the same issue, note that IngestionTime also works,
and is s
I am running 1.1.4. It does look like there were problems with the connection
to Zookeeper due to overworking the network. I'm not sure what I can do about
it (not sure what happens when a JM loses leadership), but ideally a
cluster-wide failure would not result in losing all the jobs, checkpoin
One thing you can try and do is to enable object reuse in the execution
config.
That should get rid of the overhead when passing the JSON objects from
function to function.
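For reference, it is a one-liner on the execution config; env is assumed to be
the job's StreamExecutionEnvironment:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// hand objects from one chained function to the next without defensive copies
env.getConfig().enableObjectReuse();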
On Tue, Jan 24, 2017 at 6:00 PM, Aljoscha Krettek
wrote:
> Hi,
> I think MyJsonDecoder is the bottleneck and I'm also afrai
I haven't seen it yet, I'll let you know if I do.
My last whole-cluster failure seems to have been caused by placing too much
load on the cluster. We had a job that got up to 12GB in checkpoint size.
Current cluster is 6x c3.2xlarge. The logs show a lot of
"java.net.SocketException: Connection
Hi,
a bit more information would be useful. Are you using event-time? Is the
Flink program kept running after adding the first batch of events and when
adding the second batch, or is it two invocations of your Flink program? Do
you have a custom timestamp/watermark assigner?
Cheers,
Aljoscha
On Tue
Aljoscha, you are right.
The second mapPartition() needs to have parallelism(1), but the
sortPartition() as well:
dataset // assuming some partitioning that can be reused to avoid a shuffle
.sortPartition(1, Order.DESCENDING)
.mapPartition(new ReturnFirstTen())
.sortPartition(1, Order.DESCENDING).parallelism(1)
.mapPartition(new ReturnFirstTen()).parallelism(1)
Hi,
I think MyJsonDecoder is the bottleneck and I'm also afraid there is
nothing to be done about it, because parsing Strings to JSON is simply slow.
I think you would see the biggest gains if you had a binary representation
that can quickly be serialised/deserialised to objects and you use that
instead of Strin
@Fabian, I think there's a typo in your code, shouldn't it be
dataset // assuming some partitioning that can be reused to avoid a shuffle
.sortPartition(1, Order.DESCENDING)
.mapPartition(new ReturnFirstTen())
.sortPartition(1, Order.DESCENDING)
.mapPartition(new ReturnFirstTen()).parallelism(1)
Hi,
I'm afraid there is no way of making this work with the current
implementation. Especially getting this to work in a distributed setting
seems hard.
I'm very open for suggestions on this topic, though. :-)
Cheers,
Aljoscha
On Mon, 23 Jan 2017 at 23:19 Steven Ruppert wrote:
> Hi,
>
> I'm at
Hello Fabian,
First, I would like to thank you for your suggestion and the additional
information on determinism and partition policies. As I mentioned in my initial
email, I am new to Flink and every additional piece of advice makes my
“learning curve” less steep. In addition, I am aware that
Hi,
that wording is from a time when no one thought about VMs with virtual
cores. IMHO this maps directly to virtual cores, so you should set it
according to the number of virtual cores of your VMs.
Cheers,
Aljoscha
On Mon, 23 Jan 2017 at 11:51 Nancy Estrada
wrote:
> Hi all,
>
> I have been read
+Till Rohrmann, do you know what can be used to
access an HA cluster from that setting?
Adding Till since he probably knows the HA stuff best.
On Sun, 22 Jan 2017 at 15:58 Maciek Próchniak wrote:
> Hi,
>
> I have a standalone Flink cluster configured with an HA setting (i.e. with
> zookeeper recover
Hi Shannon!
I was wondering if you still see this issue in Flink 1.1.4?
Just thinking that another possible cause for the issue could be that there
is a connection leak somewhere (Flink code or user code or vendor library)
and thus the S3 connector's connection pool starves.
For Flink 1.2, there
Hi!
I think there were some issues in the HA recovery of 1.1.3 that were fixed
in 1.1.4 and 1.2.0.
What version are you running on?
Stephan
On Sat, Jan 21, 2017 at 4:58 PM, Ufuk Celebi wrote:
> Hey Shannon,
>
> the final truth for recovery is in ZooKeeper. Can you check whether
> there also r
Hi!
I think the best way to get away from Kryo is to write types that go
through Flink's own serialization stack:
Have a look here for a bit of background:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/types_serialization.html#flinks-typeinformation-class
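As a rough illustration, a type that follows the POJO rules described there
(public class, public no-argument constructor, and public fields or
getters/setters) is handled by Flink's own PojoSerializer instead of falling
back to Kryo; the class and field names below are made up:

// Recognized as a Flink POJO, so no Kryo fallback is needed for this type.
public class SensorReading {
    public String sensorId;
    public long timestamp;
    public double value;

    // public no-argument constructor required by the POJO rules
    public SensorReading() {}
}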
BTW: Is the "hot map" i
Hi,
in fact, this was just merged:
https://issues.apache.org/jira/browse/FLINK-5582. It will be released as
part of Flink 1.3 in roughly 4 months. The feature is still a bit rough
around the edges and needs some follow-up work, however.
Cheers,
Aljoscha
On Thu, 12 Jan 2017 at 11:10 Xingcan wrote
Hi Billy,
the stack trace seems to indicate that there is a problem at the point
where the data sink is trying to read the input elements, so it doesn't seem
to be related to the source. Could you also post what sinks you have and
what the type of the input elements of these sinks is?
Cheers,
Aljo
Hi,
We are using a sliding window function to process data read from a Kafka
stream. We are using FlinkKafkaConsumer09 to read the data. The window
function and sink are running correctly.
To test the program, we are generating a stream of data from the command line.
This works when we add a set of recor
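For reference, a stripped-down version of such a pipeline might look like the
sketch below; the topic name, the "key,value" line format, and the window sizes
are placeholders, and event-time specifics (timestamp/watermark assignment) are
left out:

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class SlidingWindowJob {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "flink-test");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.addSource(new FlinkKafkaConsumer09<>("events", new SimpleStringSchema(), props))
            // parse "key,value" lines into (key, value) tuples
            .map(new MapFunction<String, Tuple2<String, Long>>() {
                @Override
                public Tuple2<String, Long> map(String line) {
                    String[] parts = line.split(",");
                    return new Tuple2<>(parts[0], Long.parseLong(parts[1]));
                }
            })
            .keyBy(0)
            // sliding window: 10 minutes long, evaluated every minute
            .timeWindow(Time.minutes(10), Time.minutes(1))
            .sum(1)
            .print();

        env.execute("Sliding window over Kafka");
    }
}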
Hi Till,
thank you for the very helpful hints. You are right, I already see
backpressure. In my case, that’s ok because it throttles the Kafka source.
Speaking of which: You mentioned putting the rate limiting mechanism into the
source. How can I do this with a Kafka source? Just extend the Pro
Hi,
I've just added my custom MsgPack serializers, hoping to see a performance
increase. I covered all data types in between chains.
However, this Kryo method still takes a lot of CPU: IdentityObjectIntMap.get
Is there something else that should be configured?
Or is there no way to get away from Kryo ove
Hello! I'm reposting this since the other thread had some formatting issues
apparently. I hope this time it works. I'm having performance problems with a
Flink job. If there is anything valuable missing, please ask and I will try
to answer ASAP. My job looks like this: First, I read data from Kafka. T
Hi !
I have the same problem on my laptop and on my desktop at work.
I have also tested it in private browsing, and the problem appears there as well.
Regards,
Guillaume
2017-01-24 11:49 GMT+01:00 Stephan Ewen :
> Hi!
>
> Is this a dashboard caching issue? Can you try to "force refresh" the
> dashboard?
>
> Please let us
RC1 creation is in progress ...
On Mon, Jan 23, 2017 at 10:33 AM, Robert Metzger
wrote:
> Hi all,
>
> I would like to create a proper RC1 for voting early this week.
> From the issues mentioned here, most of them have pull requests or were
> changed to a lower priority.
> Once we've merged all outstandi
I don't even have images in there :O Will delete this thread and create a new
one.
You are of course right, Gabor.
@Ivan, you can use a heap in the MapPartitionFunction to collect the top 10
elements (note that you need to create deep-copies if object reuse is
enabled [1]).
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/batch/index.html#operati
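For illustration, such a MapPartitionFunction could look roughly like the sketch
below; Tuple2<Long, Double> with the rating in field f1 is assumed from the
original question, and the class name just mirrors the earlier snippets:

import java.util.Comparator;
import java.util.PriorityQueue;

import org.apache.flink.api.common.functions.MapPartitionFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class ReturnFirstTen
        implements MapPartitionFunction<Tuple2<Long, Double>, Tuple2<Long, Double>> {

    @Override
    public void mapPartition(Iterable<Tuple2<Long, Double>> values,
                             Collector<Tuple2<Long, Double>> out) {
        // min-heap ordered by rating: the smallest of the current top 10 sits at the head
        PriorityQueue<Tuple2<Long, Double>> heap =
                new PriorityQueue<>(10, Comparator.comparingDouble((Tuple2<Long, Double> t) -> t.f1));

        for (Tuple2<Long, Double> value : values) {
            // deep-copy, because the heap outlives the current record if object reuse is enabled
            heap.add(value.copy());
            if (heap.size() > 10) {
                heap.poll(); // drop the smallest of the eleven
            }
        }
        // emit the (at most) 10 kept elements; their order does not matter here
        for (Tuple2<Long, Double> t : heap) {
            out.collect(t);
        }
    }
}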
Hi!
Is this a dashboard caching issue? Can you try to "force refresh" the
dashboard?
Please let us know if that solves the issue
(+Chesnay)
Stephan
On Tue, Jan 24, 2017 at 11:08 AM, Salou Guillaume wrote:
> Hello flink users !
>
> I'm using Apache Flink 1.2.0 RC0; the task statuses in
Hello,
Btw. there is a Jira about this:
https://issues.apache.org/jira/browse/FLINK-2549
Note that the discussion there suggests a more efficient approach,
which doesn't involve sorting the entire partitions.
And if I remember correctly, this question comes up from time to time
on the mailing lis
Hi Ivan,
I think you can use MapPartition for that.
So basically:
dataset // assuming some partitioning that can be reused to avoid a shuffle
.sortPartition(1, Order.DESCENDING)
.mapPartition(new ReturnFirstTen())
.sortPartition(1, Order.DESCENDING).parallelism(1)
.mapPartition(new ReturnFirstTen())
Hi Chen,
if you plan to implement your application on top of the upcoming Flink
1.2.0 release, you might find the new AsyncFunction [1] and the
ProcessFunction [2] helpful.
AsyncFunction can be used for non-blocking calls to external services and
maintains the checkpointing semantics.
ProcessFunct
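To make the ProcessFunction part concrete, a per-key event-time timer looks
roughly like the sketch below, assuming the Flink 1.2 API (RichProcessFunction);
the tuple input type and the 60 second timeout are placeholders:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.RichProcessFunction;
import org.apache.flink.util.Collector;

public class TimeoutFunction extends RichProcessFunction<Tuple2<String, Long>, String> {

    @Override
    public void processElement(Tuple2<String, Long> value, Context ctx, Collector<String> out)
            throws Exception {
        // register a timer 60 seconds after this element's timestamp;
        // the timer is scoped to the current key and kept in keyed state
        // (requires event time and assigned timestamps)
        ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60_000L);
        out.collect(value.f0);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out)
            throws Exception {
        out.collect("timer fired at " + timestamp);
    }
}

It is applied on a keyed stream, e.g. stream.keyBy(0).process(new TimeoutFunction()).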
Hi Nikos,
Flink's windows require a KeyedStream because they use the keys to manage
their internal state (each in-progress window has some state that needs to
be persisted and checkpointed).
Moreover, Flink's event-time window operators return a deterministic
result. In your use-case, the result o
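As a tiny illustration of the pattern described above, the window only exists per
key once the stream has been keyed; the field name, window size, and reduce
function below are made up:

stream
    .keyBy("deviceId") // window state is created, persisted and checkpointed per key
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .reduce(new MyReduceFunction()); // hypothetical reduce function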
Hello flink users !
I'm using Apache Flink 1.2.0 RC0; the task statuses in the Apache web
dashboard are CANCELED, but my tasks are still running and doing their job.
The status in the main page is RUNNING
I have the same problem with 2 different jobs.
Regards,
Guillaume
Hi,
I have a dataset of tuples with two fields, ids and ratings, and I need to
find the 10 elements with the highest rating in this dataset. I found a
solution, but I think it's suboptimal and there should be a better
way to do it.
The best thing that I came up with is to partition the dataset by r