Re: Gelly ran out of memory

2015-08-20 Thread Henry Saputra
Hi Stephan, this looks like a bug to me. Shouldn't the memory manager switch to allocating outside the managed area when it runs out of memory space? - Henry On Thu, Aug 20, 2015 at 3:09 AM, Stephan Ewen wrote: > Actually, you ran out of "Flink Managed Memory", not user memory. User > memory shortage manifests itse…

Re: Using HadoopInputFormat files from Flink/Yarn in a secure cluster gives an error

2015-08-20 Thread Robert Metzger
Hi Arnaud, I suspect the "HdfsTools" are something internal from your company? Are they doing any Kerberos-related operations? Is the local cluster mode also reading files from the secured HDFS cluster? Flink is taking care of sending the authentication tokens from the client to the JobManager a…
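
For context, a Kerberos-related operation in such a helper would typically be a keytab login before any HDFS access; a minimal sketch, assuming a placeholder principal and keytab path (neither comes from this thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginSketch {
        public static void main(String[] args) throws Exception {
            // Tell the Hadoop security layer that Kerberos is in use.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // Log in from a keytab; principal and path are placeholders.
            UserGroupInformation.loginUserFromKeytab(
                "someuser@EXAMPLE.COM", "/path/to/someuser.keytab");
        }
    }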

Re: Keep Model in Operator instance up to date

2015-08-20 Thread Welly Tambunan
Hi Gyula, I have a couple of operators in the pipeline (filter, mapper, flatMap), and each of these operators contains some cached data. So I think that means for every such operator in the pipeline, I will need to add a new stream to update each operator's cached data. Cheers On Thu, Aug 20, 2015 at 5:33 PM…
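
One common way to do this without a separate stream per cache is to connect a single control stream to each stateful operator; a minimal sketch, assuming an illustrative model kept as a Map (none of the names come from this thread):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
    import org.apache.flink.util.Collector;

    // flatMap1 serves events using the cached model; flatMap2 consumes the
    // connected control stream and refreshes the cache in place.
    public class ModelUpdatingFlatMap
            implements CoFlatMapFunction<String, Tuple2<String, Double>, Double> {

        private final Map<String, Double> model = new HashMap<>();

        @Override
        public void flatMap1(String event, Collector<Double> out) {
            Double weight = model.get(event);
            out.collect(weight == null ? 0.0 : weight);
        }

        @Override
        public void flatMap2(Tuple2<String, Double> update, Collector<Double> out) {
            model.put(update.f0, update.f1);
        }
    }

It would be wired up as events.connect(updates).flatMap(new ModelUpdatingFlatMap()), so the model-holding operator gets its updates on a second input instead of a new pipeline stream.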

Using HadoopInputFormat files from Flink/Yarn in a secure cluster gives an error

2015-08-20 Thread LINZ, Arnaud
Hello, My application handles some HDFS files as input and output, both in the jobs and in the driver application. It works in local cluster mode, but when I try to submit it to a YARN client and use a HadoopInputFormat (which comes from an HCatalog request), I get the following error: …
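
For reference, wiring a Hadoop input format into a Flink job usually follows this pattern; a sketch with a plain TextInputFormat standing in for the HCatalog format (path and types are placeholders, and the wrapper's import path can differ per Flink version):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class HadoopInputSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            Job job = Job.getInstance();
            FileInputFormat.addInputPath(job, new Path("hdfs:///path/to/input"));
            // Wrap the Hadoop format so Flink can read from it.
            DataSet<Tuple2<LongWritable, Text>> input = env.createInput(
                new HadoopInputFormat<>(new TextInputFormat(),
                    LongWritable.class, Text.class, job));
            input.print(); // triggers execution on recent Flink versions
        }
    }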

Re: Keep Model in Operator instance up to date

2015-08-20 Thread Gyula Fóra
Hi, I don't think I fully understand your question; could you please try to be a little more specific? I assume that by caching you mean that you keep the current model as operator state. Why would you need to add new streams in this case? I might be slow to answer as I am currently on vacation wi…

Re: Gelly ran out of memory

2015-08-20 Thread Flavio Pompermaier
Using Stephan's advice got things going! Then I had another exception about a non-existing vertex, but that is another story :) Thanks to all for the support! On Thu, Aug 20, 2015 at 12:09 PM, Stephan Ewen wrote: > Actually, you ran out of "Flink Managed Memory", not user memory. User > memory…

Re: when use broadcast variable and run on bigdata display this error please help

2015-08-20 Thread Rico Bergmann
Because the broadcast variable is stored completely at each operator. If you use a hash join, then both inputs can be hash-partitioned. This reduces the amount of memory needed for each operator, I think. > On 20.08.2015 at 12:14, hagersaleh wrote: > > why this is not good broadcast v…
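
A minimal DataSet sketch of the join alternative, with made-up toy data; joining hash-partitions both inputs by key instead of copying one side to every operator:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class JoinInsteadOfBroadcast {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<Tuple2<Long, String>> big = env.fromElements(
                Tuple2.of(1L, "a"), Tuple2.of(2L, "b"));
            DataSet<Tuple2<Long, Integer>> other = env.fromElements(
                Tuple2.of(1L, 10), Tuple2.of(2L, 20));
            // Each side is partitioned by its key; no full copy per operator.
            big.join(other)
               .where(0)    // key field of the first input
               .equalTo(0)  // key field of the second input
               .print();
        }
    }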

Re: when use broadcast variable and run on bigdata display this error please help

2015-08-20 Thread hagersaleh
Why is this not a good use of a broadcast variable with big data?

Re: when use broadcast variable and run on bigdata display this error please help

2015-08-20 Thread Rico Bergmann
As you can see from the exceptions, your broadcast variable is too large to fit into main memory. I think storing that amount of data in a broadcast variable is not the best approach. Try to use a DataSet for this, I would suggest. > On 20.08.2015 at 11:56, hagersaleh wrote: > > pleas…
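
For context, this is the broadcast-variable pattern under discussion; the whole broadcast set is materialized in every parallel task, which is what fails for big data (names and data are illustrative):

    import java.util.List;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class BroadcastSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<String> lookup = env.fromElements("a", "b");
            DataSet<String> data = env.fromElements("x", "y");
            data.map(new RichMapFunction<String, String>() {
                    private List<String> smallTable;

                    @Override
                    public void open(Configuration parameters) {
                        // The complete broadcast set is loaded here, once per task.
                        smallTable = getRuntimeContext().getBroadcastVariable("lookup");
                    }

                    @Override
                    public String map(String value) {
                        return value + "/" + smallTable.size();
                    }
                })
                .withBroadcastSet(lookup, "lookup")
                .print();
        }
    }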

Re: Flink to ingest from Kafka to HDFS?

2015-08-20 Thread Stephan Ewen
BTW: This is becoming a dev discussion; maybe we should move it to that list... On Thu, Aug 20, 2015 at 12:12 PM, Stephan Ewen wrote: > Yes, one needs exactly a mechanism to seek the output stream back to the > last checkpointed position, in order to overwrite duplicates. > > I think HDFS is not going…

Re: Flink to ingest from Kafka to HDFS?

2015-08-20 Thread Stephan Ewen
Yes, one needs exactly a mechanism to seek the output stream back to the last checkpointed position, in order to overwrite duplicates. I think HDFS is not going to make this easy; there is basically no seek-for-write. Not sure how to solve this, other than writing to tmp files and copying upon suc…
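
The tmp-file idea could look roughly like this with the Hadoop FileSystem API; a sketch assuming placeholder paths, where the tmp file is only renamed into place once its contents are covered by a checkpoint:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TmpFileCommitSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path tmp = new Path("/data/out/part-0.tmp");
            Path finalPath = new Path("/data/out/part-0");
            // HDFS has no seek-for-write, so write a fresh tmp file instead.
            try (FSDataOutputStream out = fs.create(tmp, true)) {
                out.writeBytes("records up to the last checkpoint\n");
            }
            // Promote the tmp file once its contents are checkpointed.
            fs.rename(tmp, finalPath);
        }
    }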

Re: Gelly ran out of memory

2015-08-20 Thread Stephan Ewen
Actually, you ran out of "Flink Managed Memory", not user memory. User memory shortage manifests itself as a Java OutOfMemoryError. At this point, the delta iterations cannot spill. They additionally suffer a bit from memory fragmentation. A possible workaround is to use the option "setSolutionSetUn…
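
The truncated option is presumably DeltaIteration#setSolutionSetUnManaged, which keeps the solution set on the plain Java heap instead of in managed memory; a minimal sketch with a trivial placeholder iteration:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.operators.DeltaIteration;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class UnmanagedSolutionSetSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<Tuple2<Long, Double>> initial =
                env.fromElements(Tuple2.of(1L, 1.0));
            DeltaIteration<Tuple2<Long, Double>, Tuple2<Long, Double>> iteration =
                initial.iterateDelta(initial, 10, 0);
            // Keep the solution set out of Flink's managed memory.
            iteration.setSolutionSetUnManaged(true);
            // Trivial body: the workset doubles as delta and next workset.
            DataSet<Tuple2<Long, Double>> result = iteration.closeWith(
                iteration.getWorkset(), iteration.getWorkset());
            result.print();
        }
    }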

Re: Gelly ran out of memory

2015-08-20 Thread Andra Lungu
Hi Flavio, These kinds of exceptions generally arise from the fact that you ran out of `user` memory. You can try to increase that a bit. In your flink-conf.yaml, try adding:

# The fraction of memory allocated to Flink's managed (system) memory,
# as opposed to user memory
taskmanager.memory.fraction: 0.4

This will give 0.6 of the memory to th…

Gelly ran out of memory

2015-08-20 Thread Flavio Pompermaier
Hi to all, I tried to run my Gelly job on Flink 0.9-SNAPSHOT and I was getting an EOFException, so I tried 0.10-SNAPSHOT and now I have the following error: Caused by: java.lang.RuntimeException: Memory ran out. Compaction failed. numPartitions: 32 minPartition: 73 maxPartition: 80 number of ov…

Re: when use broadcast variable and run on bigdata display this error please help

2015-08-20 Thread hagersaleh
please help

Re: Flink to ingest from Kafka to HDFS?

2015-08-20 Thread Rico Bergmann
My ideas for checkpointing: I think writing to the destination should not depend on the checkpoint mechanism (otherwise the output would never be written to the destination if checkpointing is disabled). Instead, I would keep the offsets of written and checkpointed records. When recovering, you w…
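
Read as code, the idea might look like this sketch against the old Checkpointed interface: remember the offset reached at the last checkpoint, look up how far the destination actually got, and skip the replayed records in between (the actual write and the offset lookup are stubbed out):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.checkpoint.Checkpointed;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class OffsetTrackingSink extends RichSinkFunction<String>
            implements Checkpointed<Long> {

        private long currentOffset;  // offset of the last record seen
        private long writtenOffset;  // how far the destination already got

        @Override
        public void open(Configuration parameters) {
            // Stub: in reality, query the destination, e.g. the output
            // file's length, to learn what was written before the failure.
            writtenOffset = 0L;
        }

        @Override
        public void invoke(String record) throws Exception {
            currentOffset++;
            if (currentOffset <= writtenOffset) {
                return; // replayed duplicate, already in the destination
            }
            // Stub: write the record to the destination here.
        }

        @Override
        public Long snapshotState(long checkpointId, long checkpointTimestamp) {
            return currentOffset; // offset covered by this checkpoint
        }

        @Override
        public void restoreState(Long state) {
            currentOffset = state; // the source replays from this point
        }
    }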

Re: Flink to ingest from Kafka to HDFS?

2015-08-20 Thread Aljoscha Krettek
Yes, this seems like a good approach. We should probably not reuse the KeySelector for this, but rather a more use-case-specific type of function that can create a desired filename from an input object. This is only the first part, though. The hard bit would be implementing rolling files and also int…
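
Such a use-case-specific function could be as small as the following; the interface name is made up for illustration:

    import java.io.Serializable;

    // Hypothetical interface: unlike a KeySelector, it maps an input record
    // directly to the filename it should be written to.
    public interface FileNameSelector<T> extends Serializable {
        String getFileNameFor(T element);
    }

    // Example: bucket string records by their first character.
    class FirstCharFileName implements FileNameSelector<String> {
        @Override
        public String getFileNameFor(String element) {
            return element.isEmpty() ? "empty" : element.substring(0, 1);
        }
    }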

Re: Flink to ingest from Kafka to HDFS?

2015-08-20 Thread Rico Bergmann
I'm thinking about implementing this. After looking into the Flink code, I would basically subclass FileOutputFormat into, let's say, a KeyedFileOutputFormat that gets an additional KeySelector object. The string the KeySelector returns is then appended to the path in the file system. U think thi…
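
A rough sketch of that idea, implemented against the plain OutputFormat interface rather than by subclassing FileOutputFormat (whose internals differ across versions), with one lazily opened stream per key path:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.flink.api.common.io.OutputFormat;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.core.fs.FSDataOutputStream;
    import org.apache.flink.core.fs.Path;

    public class KeyedFileOutputFormat<T> implements OutputFormat<T> {

        private final String basePath;
        private final KeySelector<T, String> keySelector;
        private transient Map<String, FSDataOutputStream> streams;

        public KeyedFileOutputFormat(String basePath,
                                     KeySelector<T, String> keySelector) {
            this.basePath = basePath;
            this.keySelector = keySelector;
        }

        @Override
        public void configure(Configuration parameters) {}

        @Override
        public void open(int taskNumber, int numTasks) {
            streams = new HashMap<>();
        }

        @Override
        public void writeRecord(T record) throws IOException {
            String key;
            try {
                key = keySelector.getKey(record);
            } catch (Exception e) {
                throw new IOException("Could not extract key", e);
            }
            FSDataOutputStream out = streams.get(key);
            if (out == null) {
                // The key string is appended to the base path, one file per key.
                Path path = new Path(basePath + "/" + key);
                out = path.getFileSystem().create(path, true);
                streams.put(key, out);
            }
            out.write((record.toString() + "\n").getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            for (FSDataOutputStream out : streams.values()) {
                out.close();
            }
        }
    }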