Hi Stephan, this looks like a bug to me. Shouldn't the memory manager
switch to allocating outside the managed area when it runs out of memory space?
- Henry
On Thu, Aug 20, 2015 at 3:09 AM, Stephan Ewen wrote:
> Actually, you ran out of "Flink Managed Memory", not user memory. User
> memory shortage manifests itself as a Java OutOfMemoryError.
Hi Arnaud,
I suspect the "HdfsTools" are something internal from your company?
Are they doing any kerberos-related operations?
Is the local cluster mode also reading files from the secured HDFS cluster?
Flink is taking care of sending the authentication tokens from the client
to the jobManager a
Hi Gyula,
I have a couple of operators in the pipeline: filter, mapper, flatMap, and
each of these operators contains some cached data.
So I think that means that for every operator in the pipeline, I will need
to add a new stream to update its cached data (see the sketch after this message).
Cheers
On Thu, Aug 20, 2015 at 5:33 PM
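A minimal sketch of the pattern under discussion, assuming the Flink streaming API: each operator that holds cached data is connected to a second stream carrying the updates. All type choices and names here are illustrative, not taken from the thread.

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// The main stream carries (key, payload) events; the second stream carries
// (key, value) cache updates. Each cached operator gets such an update stream.
public class CachedFilter
        implements CoFlatMapFunction<Tuple2<String, String>, Tuple2<String, String>, Tuple2<String, String>> {

    private final Map<String, String> cache = new HashMap<>();

    @Override
    public void flatMap1(Tuple2<String, String> event, Collector<Tuple2<String, String>> out) {
        // Forward the event only if the cache currently knows its key.
        if (cache.containsKey(event.f0)) {
            out.collect(event);
        }
    }

    @Override
    public void flatMap2(Tuple2<String, String> update, Collector<Tuple2<String, String>> out) {
        // Update stream: replace the cached value for this key.
        cache.put(update.f0, update.f1);
    }
}

Wiring it up would then look like events.connect(cacheUpdates).flatMap(new CachedFilter()).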
Hello,
My application reads and writes HDFS files both in the jobs and in the
driver application.
It works in local cluster mode, but when I submit it to a YARN client
and try to use a HadoopInputFormat (that comes from an HCatalog
request), I get the following error:
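For reference, a minimal sketch of how a HadoopInputFormat is typically wired into a Flink batch job; the TextInputFormat and the HDFS path are illustrative stand-ins for the HCatalog request in the original setup.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class HadoopInputExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        Job job = Job.getInstance();
        FileInputFormat.addInputPath(job, new Path("hdfs:///path/to/input")); // placeholder path

        // Wrap the Hadoop input format so Flink can use it as a data source.
        DataSet<Tuple2<LongWritable, Text>> input = env.createInput(
                new HadoopInputFormat<>(new TextInputFormat(), LongWritable.class, Text.class, job));

        input.first(5).print();
    }
}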
Hi,
I don't think I fully understand your question, could you please try to be
a little more specific?
I assume that by caching you mean that you keep the current model as
operator state. Why would you need to add new streams in this case?
I might be slow to answer as I am currently on vacation wi…
Using Stephan's advice got things going! Then I had another exception
about a non-existent vertex, but that is another story :)
Thanks to all for the support!
On Thu, Aug 20, 2015 at 12:09 PM, Stephan Ewen wrote:
> Actually, you ran out of "Flink Managed Memory", not user memory. User
> memory shortage manifests itself as a Java OutOfMemoryError.
Because the broadcast variable is stored completely at each operator.
If you use a hash join instead, both inputs can be hash partitioned. This reduces
the amount of memory needed at each operator, I think.
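A minimal sketch of that alternative using Flink's DataSet join with an explicit join hint (toy data; field 0 is the join key):

import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class HashJoinExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> left =
                env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2));
        DataSet<Tuple2<String, Integer>> right =
                env.fromElements(Tuple2.of("a", 10), Tuple2.of("b", 20));

        // REPARTITION_HASH_FIRST hash-partitions both inputs by key, so no
        // side is fully replicated to every task the way a broadcast is.
        left.join(right, JoinHint.REPARTITION_HASH_FIRST)
            .where(0)
            .equalTo(0)
            .print();
    }
}

With this strategy each task only holds its own partition in memory, rather than a complete copy of one input.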
> On 20.08.2015 at 12:14, hagersaleh wrote:
>
> Why is using a broadcast variable not good for big data?
Why is using a broadcast variable not good for big data?
As you can see from the exceptions, your broadcast variable is too large to fit
into main memory.
I don't think storing that amount of data in a broadcast variable is the best
approach. I would suggest using a DataSet for this instead.
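For reference, a minimal sketch of the broadcast-variable pattern being discussed (toy data, hypothetical names); the comment marks the step where the whole set is materialized on every parallel task, which is what fails for big data:

import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class BroadcastExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Integer> data = env.fromElements(1, 2, 3);
        DataSet<Integer> toBroadcast = env.fromElements(10, 20, 30);

        data.map(new RichMapFunction<Integer, Integer>() {
            private List<Integer> broadcastSet;

            @Override
            public void open(Configuration parameters) {
                // The ENTIRE broadcast set is materialized in memory at every
                // parallel task instance; replicating a huge DataSet this way
                // is what exhausts the memory.
                broadcastSet = getRuntimeContext().getBroadcastVariable("bcast");
            }

            @Override
            public Integer map(Integer value) {
                return value + broadcastSet.get(0);
            }
        }).withBroadcastSet(toBroadcast, "bcast")
          .print();
    }
}

Rewriting this as a join between the two DataSets lets Flink partition the data across the cluster instead of replicating it.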
> On 20.08.2015 at 11:56, hagersaleh wrote:
>
> please help
BTW: This is becoming a dev discussion; maybe we should move it to that list...
On Thu, Aug 20, 2015 at 12:12 PM, Stephan Ewen wrote:
> Yes, one needs exactly such a mechanism to seek the output stream back to the
> last checkpointed position, in order to overwrite duplicates.
>
> I think HDFS is not going to make this easy; there is basically no seek for
> write.
Yes, one needs exactly such a mechanism to seek the output stream back to the
last checkpointed position, in order to overwrite duplicates.
I think HDFS is not going to make this easy; there is basically no seek for
write. Not sure how to solve this, other than writing to tmp files and
copying upon success.
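A minimal sketch of the tmp-file idea using the Hadoop FileSystem API; the paths and the record content are placeholders. A rename only publishes the data once it is complete, so a failure in between leaves no partially visible duplicates.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TmpFileCommit {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        Path tmp = new Path("/output/.part-0.tmp"); // hypothetical paths
        Path fin = new Path("/output/part-0");

        // Write the pending records to the tmp file first...
        try (FSDataOutputStream out = fs.create(tmp, true)) {
            out.writeBytes("records since the last checkpoint\n");
        }

        // ...and only after a successful checkpoint, atomically publish it.
        // On failure before this point, the tmp file is simply discarded.
        fs.rename(tmp, fin);
    }
}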
Actually, you ran out of "Flink Managed Memory", not user memory. User
memory shortage manifests itself as a Java OutOfMemoryError.
At this point, the delta iterations cannot spill. They additionally suffer
a bit from memory fragmentation.
A possible workaround is to use the option "setSolutionSetUnManaged(true)".
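A minimal sketch of that workaround on a plain delta iteration; the data and the trivial step function are illustrative, and for a Gelly job the same switch would have to be applied to the iteration that Gelly builds internally.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DeltaIteration;
import org.apache.flink.api.java.tuple.Tuple2;

public class UnmanagedSolutionSet {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Long, Double>> initial =
                env.fromElements(Tuple2.of(1L, 1.0), Tuple2.of(2L, 2.0));

        // Delta iteration keyed on field 0, at most 10 supersteps.
        DeltaIteration<Tuple2<Long, Double>, Tuple2<Long, Double>> iteration =
                initial.iterateDelta(initial, 10, 0);

        // Keep the solution set in a regular heap hash table instead of
        // Flink managed memory, avoiding the compaction that failed above.
        iteration.setSolutionSetUnManaged(true);

        DataSet<Tuple2<Long, Double>> delta = iteration.getWorkset();
        iteration.closeWith(delta, delta).print();
    }
}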
Hi Flavio,
These kinds of exceptions generally arise from the fact that you ran out of
`user` memory. You can try to increase that a bit.
In your flink-conf.yaml, try adding:
# The fraction of memory allocated to Flink's managed memory; the rest is left to user code
taskmanager.memory.fraction: 0.4
This will give 0.6 of the memory to the user code.
Hi to all,
I tried to run my Gelly job on Flink 0.9-SNAPSHOT and was getting an
EOFException, so I tried 0.10-SNAPSHOT and now I get the following
error:
Caused by: java.lang.RuntimeException: Memory ran out. Compaction failed.
numPartitions: 32 minPartition: 73 maxPartition: 80 number of ov
please help
My ideas for checkpointing:
I think writing to the destination should not depend on the checkpoint
mechanism (otherwise the output would never be written to the destination if
checkpointing is disabled). Instead I would keep the offsets of written and
checkpointed records. When recovering, you would seek back to the last
checkpointed offset and overwrite any duplicate records.
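A minimal sketch of that bookkeeping on a local file, since plain HDFS offers no seek-for-write; all names are hypothetical, and the checkpointed offset itself would also need to be stored as part of the checkpoint.

import java.io.IOException;
import java.io.RandomAccessFile;

public class OffsetTrackingWriter {
    private final RandomAccessFile file;
    private long checkpointedOffset; // offset confirmed by the last checkpoint

    public OffsetTrackingWriter(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    public void write(byte[] record) throws IOException {
        file.write(record); // written offset now runs ahead of the checkpointed one
    }

    public void notifyCheckpointComplete() throws IOException {
        checkpointedOffset = file.getFilePointer(); // everything up to here is safe
    }

    public void recover() throws IOException {
        // Drop the uncheckpointed tail; the replay will re-emit those records
        // and write them again, so no duplicates remain in the output.
        file.setLength(checkpointedOffset);
        file.seek(checkpointedOffset);
    }
}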
Yes, this seems like a good approach. We should probably not reuse the
KeySelector for this, but maybe add a more use-case-specific type of function
that can create a desired filename from an input object.
This is only the first part, though. The hard bit would be implementing
rolling files and also int…
I'm thinking about implementing this.
After looking into the Flink code, I would basically subclass FileOutputFormat
into, let's say, a KeyedFileOutputFormat that gets an additional KeySelector object.
The path in the file system is then appended with the string the KeySelector
returns.
Do you think this is a good approach?
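A minimal sketch of the proposal. Flink's FileOutputFormat manages a single output stream, so this sketch builds on RichOutputFormat instead and keeps one stream per key; the class shape and names are illustrative assumptions, not existing Flink API.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.io.RichOutputFormat;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.flink.core.fs.Path;

public class KeyedFileOutputFormat<T> extends RichOutputFormat<T> {

    private final Path basePath;
    private final KeySelector<T, String> keySelector;
    private transient Map<String, FSDataOutputStream> streams;

    public KeyedFileOutputFormat(Path basePath, KeySelector<T, String> keySelector) {
        this.basePath = basePath;
        this.keySelector = keySelector;
    }

    @Override
    public void configure(Configuration parameters) {}

    @Override
    public void open(int taskNumber, int numTasks) {
        streams = new HashMap<>();
    }

    @Override
    public void writeRecord(T record) throws IOException {
        String key;
        try {
            key = keySelector.getKey(record);
        } catch (Exception e) {
            throw new IOException("Could not extract key", e);
        }
        FSDataOutputStream out = streams.get(key);
        if (out == null) {
            // Append the key string to the base path, as proposed above.
            Path target = new Path(basePath, key);
            out = target.getFileSystem().create(target, true);
            streams.put(key, out);
        }
        out.write((record.toString() + "\n").getBytes("UTF-8"));
    }

    @Override
    public void close() throws IOException {
        for (FSDataOutputStream out : streams.values()) {
            out.close();
        }
    }
}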