I have a couple more questions regarding Flink's JVM memory.
In a streaming application, what is managed memory used for? I read in a
blog that all objects created inside a user function go into
unmanaged memory. Where does the managed keyed/operator state reside?
Also when does the st
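For reference, the knobs that control the managed memory pool live in flink-conf.yaml. A sketch of the relevant Flink 1.x keys (the values below are examples, not recommendations):

```yaml
# Fraction of free memory (after network buffers) reserved for Flink's
# managed memory pool, which the batch runtime uses for sorting,
# hashing and caching. Pure streaming operators largely bypass it.
taskmanager.memory.fraction: 0.7

# Alternatively, pin an absolute size in MB (overrides the fraction):
# taskmanager.memory.size: 2048

# Whether to allocate the managed pool eagerly at TaskManager startup:
taskmanager.memory.preallocate: false

# Keep managed memory off-heap to reduce GC pressure:
taskmanager.memory.off-heap: false
```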
Hi there,
I'm using a custom writer with an hourly rolling bucketing sink. I'm seeing two
issues.
First one: if I write the files to S3, all the files
get committed; however, when I write the same to HDFS, they remain in
.pending state. This could be related to the second problem below.
Second issue : My c
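One common cause of part files stuck in .pending (not necessarily yours): the bucketing sink only promotes files from .pending to their final state when a checkpoint completes, so with checkpointing disabled they stay pending forever. A minimal sketch assuming Flink 1.3's BucketingSink; paths, the source, and the intervals are made-up placeholders:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class PendingFilesSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Without checkpointing, finished part files are never moved
        // from .pending to final -- enable it so the sink can commit.
        env.enableCheckpointing(60_000);  // every 60s (example value)

        BucketingSink<String> sink = new BucketingSink<>("hdfs:///base/path");
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));  // hourly buckets

        env.socketTextStream("localhost", 9999)  // placeholder source
           .addSink(sink);

        env.execute("bucketing-sink-sketch");
    }
}
```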
Hi Ted,
I believe HDFS-6584 is more of an HDFS feature supporting the archive use case
through some policy configurations.
My ask is that I have two distinct HCFS file systems which are independent, but
the Flink job will decide which one to use for the sink, while the Flink
infrastructure is by default c
Hi Team,
I am using Apache Flink with Java for the below problem statement:
1. read a CSV file with the field delimiter character ';'
2. transform the fields
3. write the data back to CSV
My doubts are as below:
1. if I need to read a CSV file of size above 50 GB, what would be the app
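On the delimiter part: independent of Flink, the per-record work boils down to splitting on ';', transforming the fields, and re-joining. A stand-alone sketch of that step (the upper-casing is just a placeholder transform); in the DataSet API the split/write would instead be done via readCsvFile(...).fieldDelimiter(";") and writeAsCsv, and a 50 GB file is not a problem in itself because Flink reads it in parallel input splits:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class CsvTransform {
    // Parse one ';'-delimited record, transform each field
    // (upper-casing as a stand-in), and re-join with ';'.
    static String transform(String line) {
        return Arrays.stream(line.split(";", -1))
                .map(String::trim)
                .map(String::toUpperCase)
                .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        System.out.println(transform("alice; bob ;carol"));  // ALICE;BOB;CAROL
    }
}
```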
Would HDFS-6584 help with your use case ?
On Wed, Aug 23, 2017 at 11:00 AM, Vijay Srinivasaraghavan <
vijikar...@yahoo.com.invalid> wrote:
> Hello,
> Is it possible for a Flink cluster to use multiple HDFS repositories (HDFS-1
> for managing the Flink state backend, HDFS-2 for syncing results from user
Hello,
Is it possible for a Flink cluster to use multiple HDFS repositories (HDFS-1 for
managing the Flink state backend, HDFS-2 for syncing results from user jobs)?
The scenario can be viewed in the context of running jobs that are meant
to push their results to an archive repository (cold storage).
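Fully qualified URIs should make this possible without special support: the cluster's default filesystem only applies to paths without a scheme/authority, so the state backend and the sink can point at different namenodes. A sketch (hostnames, ports and paths are made up):

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TwoHdfsClusters {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // HDFS-1: state backend / checkpoints, addressed explicitly.
        env.setStateBackend(
                new FsStateBackend("hdfs://namenode1:8020/flink/checkpoints"));

        // HDFS-2: job results (cold storage), via a fully qualified URI.
        env.socketTextStream("localhost", 9999)  // placeholder source
           .writeAsText("hdfs://namenode2:8020/archive/results");

        env.execute("two-hdfs-sketch");
    }
}
```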
Hi Steven,
Yes, GC is a big overhead; it can cause your CPU utilization to reach
100% and every process to stop working. We ran into this a while ago too.
How much memory did you assign to the TaskManager? How high was your CPU
utilization when your TaskManager was considered 'killed'?
Bowen
O
Till,
Once our job was restarted for some reason (e.g. the TaskManager container got
killed), it could get stuck in a continuous restart loop for hours. Right now, I
suspect this is caused by GC pauses during restart; our job has very high
memory allocation in steady state. High GC pauses then caused akka timeouts
Hi!
The sink is merely a union of the result of the co-group and the one data
source.
Can't you just make two distinct pipelines out of that? One with a co-group ->
data sink pipeline and one with the source -> sink pipeline?
They could even be part of the same job...
Best,
Stephan
On Wed, Aug 23, 2
Hi,
The reason is that there are two (or more) different threads doing the reading.
As an illustration, consider first this case:
DataSet input = ...
input.map(new MapA()).map(new MapB())
Here, MapB is technically "wrapped" by MapA, and when MapA emits data it is
directly going to the map()
You could enable object reuse [0] if your application allows that. Also,
adjusting the managed memory size [1] can help.
Are you using Flink's graph library Gelly?
[0]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html#object-reuse-enabled
[1]
https://ci.apache.org
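The object-reuse switch referenced in [0] is a one-liner on the execution config; it is only safe if your user functions neither cache nor modify input records after emitting them. A minimal sketch:

```java
import org.apache.flink.api.java.ExecutionEnvironment;

public class ObjectReuseSketch {
    public static void main(String[] args) {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Reuse record instances between chained operators instead of
        // copying -- safe only if no function holds on to input objects.
        env.getConfig().enableObjectReuse();
    }
}
```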
Hi Anugrah,
you can track the progress at the accompanying JIRA issue:
https://issues.apache.org/jira/browse/FLINK-6916
Currently, roughly half of the tasks are done, with a few remaining in the PR
review stage. Note that the actual implementation differs a bit from what was
proposed in FLIP-19, thoug
Thanks Aljoscha for the prompt response.
Can you explain the technical reason for the single-predecessor rule? This
makes what we are trying to do much more expensive. Really, what we're doing is
reading a Parquet file, doing several maps/filters on the records, and writing
back to Parquet. There
Does anyone have a current performance test based on PageRank, or an idea why
Flink lost the comparison?
> On 18.08.2017 at 19:51, Kaepke, Marc wrote:
>
> Hi everyone,
>
> I compared Flink and Spark by using PageRank. I guessed Flink would beat Spark
> or be on the same level. But Spark is up
I think Nico in CC knows more about it.
On 23.08.17 at 14:34, anugrah nayar wrote:
Hey,
I was reading through the Flink documentation and found that there
were some plans around a standalone blob server. I found the details
here
https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Imp
Hey,
I was reading through the Flink documentation and found that there were
some plans around a standalone blob server. I found the details here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture.
Just wanted to know whether this was done.
regards,
Anugr
Hi Tony,
Thank you for this thorough explanation. Helps a lot!
Kind Regards,
Tomasz
On 23 August 2017 at 11:30, Tony Wei wrote:
> Hi Tomasz,
>
> Actually, window is not a real operator shared by the operators created by
> your reduce() and apply() functions.
> Flink implemented WindowOperator by bin
Hi Tomasz,
Actually, window is not a real operator shared by the operators created by
your reduce() and apply() functions.
Flink implements WindowOperator by binding window(), trigger() and
evictor() together with the WindowFunction.
It is more like the prior operator sent elements to two following ope
Hi Tony,
Won't that increase the amount of processing Flink has to do? It would
have to window twice, right?
Thanks,
Tomasz
On 23 August 2017 at 11:02, Tony Wei wrote:
> Hi Tomasz,
>
> In my opinion, I would move .window() function down to these two DataStream.
> (rawEvent.window().reduce().map
Hi Tomasz,
In my opinion, I would move the .window() function down to these two
DataStreams (rawEvents.window().reduce().map(), and the same for metrics).
That makes sure they won't share the same WindowedStream object.
Regards,
Tony Wei
2017-08-23 17:51 GMT+08:00 Tomasz Dobrzycki :
> Hi Tony,
>
> Thank you for
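Spelled out, the restructuring Tony suggests would look roughly like this; it is only a fragment, and the key selectors, window size, and function names are placeholders, not from the original thread:

```java
// rawEvents branch: its own window -> reduce -> map
rawEvents
    .keyBy(event -> event.getKey())          // placeholder key selector
    .timeWindow(Time.minutes(1))             // example window size
    .reduce((a, b) -> a.merge(b))
    .map(e -> e.toOutputFormat());

// metrics branch: a separate window -> apply, so no shared WindowedStream
metrics
    .keyBy(event -> event.getKey())
    .timeWindow(Time.minutes(1))
    .apply(new MyWindowFunction());          // placeholder WindowFunction
```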
Hi Tony,
Thank you for your answer, it definitely helps with understanding this
situation.
Is there any reliable way to split the stream so I get 2 outputs that
avoids this behaviour? Eventually I want to have 2 sinks that output
different data (one being just a copy of the stream, but organised i
Hi Tomasz,
I think this is because .window() is a lazy operation.
It just creates a WindowedStream object but does not create a corresponding
operator.
The operator will be created after you call .reduce() or .apply().
rawEvents and metrics actually shared the same object to create their own
operators
Thanks James for sharing your experience. I find it very interesting :-)
On Tue, Aug 22, 2017 at 9:50 PM, Hao Sun wrote:
> Great suggestions, the etcd operator is very interesting, thanks James.
>
>
> On Tue, Aug 22, 2017, 12:42 James Bucher wrote:
>>
>> Just wanted to throw in a couple more de
Hi Jerry,
You can learn about Flink's windowing mechanics in this blog post (
https://flink.apache.org/news/2015/12/04/Introducing-windows.html).
To my understanding, window() defines how Flink uses a WindowAssigner to
insert an element into the right windows, trigger() defines when to fire a
window, and e
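The three pieces compose like this on a keyed stream; a fragment only, and the particular assigner, trigger, evictor, and function are arbitrary examples, not anything from this thread:

```java
keyedStream
    // WindowAssigner: decides which window(s) each element lands in
    .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
    // Trigger: decides when a window's contents are evaluated
    .trigger(CountTrigger.of(100))
    // Evictor: decides which elements to drop around evaluation
    .evictor(CountEvictor.of(10))
    .apply(new MyWindowFunction());  // placeholder WindowFunction
```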