Hi,
@Ted:
> Is it possible to prune (unneeded) field(s) so that heap requirement is
> lower ?
The XmlInputFormat [0] splits the raw data into smaller chunks, which
are then processed further. I don't think I can reduce the size of the
fields (Tuple2). The major difference to Mahout's
XmlInputFormat i[...]
I think the only way is adding more managed memory.
The large record handler only takes effect on the reduce side, where it is
used by the merge sorter. According to
the exception, the error is thrown during the combining phase, which only
uses an in-memory sorter that doesn't
have a large record handling mechanism.
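To make the explanation above concrete, here is a toy Java model of the two code paths. It is NOT Flink's sorter implementation; the class and method names (`SortBufferSketch`, `FixedSortBuffer`, `combineSide`, `mergeSideWithLargeRecordHandler`) are hypothetical and only illustrate the idea: an in-memory buffer with fixed capacity can flush and retry, but a single record larger than an empty buffer cannot be written at all, whereas a path with a large-record handler diverts such records instead of failing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT Flink's sorter implementation. All names are hypothetical.
public class SortBufferSketch {

    // A fixed-capacity buffer, standing in for the memory an in-memory sorter is given.
    static final class FixedSortBuffer {
        private final int capacityBytes;
        private int usedBytes = 0;
        private int count = 0;

        FixedSortBuffer(int capacityBytes) {
            this.capacityBytes = capacityBytes;
        }

        // Returns false if the record does not fit into the remaining space.
        boolean tryWrite(byte[] record) {
            if (usedBytes + record.length > capacityBytes) {
                return false;
            }
            usedBytes += record.length;
            count++;
            return true;
        }

        boolean isEmpty() {
            return count == 0;
        }

        // Stands in for "sort, combine and emit the buffer contents, then reuse it".
        void flush() {
            usedBytes = 0;
            count = 0;
        }
    }

    // Combine-side path: only an in-memory buffer, no large-record handling.
    // Returns the number of flushes; fails if a record does not fit into a fresh buffer.
    public static int combineSide(List<byte[]> input, int capacityBytes) throws IOException {
        FixedSortBuffer buffer = new FixedSortBuffer(capacityBytes);
        int flushes = 0;
        for (byte[] record : input) {
            if (buffer.tryWrite(record)) {
                continue;
            }
            if (!buffer.isEmpty()) {
                buffer.flush();
                flushes++;
            }
            if (!buffer.tryWrite(record)) {
                throw new IOException("Cannot write record to fresh sort buffer. Record too large.");
            }
        }
        return flushes;
    }

    // Merge-sort path with a large-record handler: oversized records are diverted
    // to a separate path instead of failing. Returns the number of diverted records.
    public static int mergeSideWithLargeRecordHandler(List<byte[]> input, int capacityBytes) {
        FixedSortBuffer buffer = new FixedSortBuffer(capacityBytes);
        List<byte[]> largeRecords = new ArrayList<>();
        for (byte[] record : input) {
            if (record.length > capacityBytes) {
                largeRecords.add(record);
            } else if (!buffer.tryWrite(record)) {
                buffer.flush();
                buffer.tryWrite(record); // always fits after a flush
            }
        }
        return largeRecords.size();
    }

    public static void main(String[] args) {
        List<byte[]> input = Arrays.asList(new byte[10], new byte[10], new byte[100]);
        System.out.println(mergeSideWithLargeRecordHandler(input, 32)); // prints 1
        try {
            combineSide(input, 32);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // the 100-byte record never fits
        }
    }
}
```

In this model, giving the buffer more capacity is the only way the combine path survives a large record, which matches the suggestion to add more managed memory.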
Here are some pointers:
- You would rather need MORE managed memory, not less, because the sorter
uses that.
- We added the "large record handler" to the sorter for exactly these use
cases. Can you check in the code whether it is enabled? You'll have to go
through a bit of the code to see that.
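Why "more managed memory" matters per task can be shown with some back-of-the-envelope arithmetic. The sketch below assumes the Flink 1.x (pre-1.10) memory model as we understand it: `taskmanager.memory.fraction` (default 0.7) of the heap left after network buffers becomes managed memory, and that managed memory is split equally across `taskmanager.numberOfTaskSlots`. The heap and network-buffer sizes are hypothetical, and the helper `managedMemoryPerSlotMb` is an illustration, not Flink's exact formula.

```java
// Back-of-the-envelope sketch, assuming Flink 1.x's pre-1.10 memory model:
// managed memory = fraction * (heap - network buffers), split equally across slots.
public class ManagedMemorySketch {

    // Approximate managed memory per slot in MB (a simplification, not Flink's exact formula).
    public static long managedMemoryPerSlotMb(long heapMb, long networkMb, double fraction, int slots) {
        if (slots <= 0) {
            throw new IllegalArgumentException("slots must be positive");
        }
        long managedTotalMb = (long) ((heapMb - networkMb) * fraction);
        return managedTotalMb / slots;
    }

    public static void main(String[] args) {
        // Hypothetical TaskManager: 8192 MB heap, 1024 MB reserved for network buffers.
        long before = managedMemoryPerSlotMb(8192, 1024, 0.7, 4); // default fraction, 4 slots
        long after = managedMemoryPerSlotMb(8192, 1024, 0.8, 2);  // higher fraction, fewer slots
        System.out.println(before + " MB per slot -> " + after + " MB per slot");
    }
}
```

With these made-up numbers, each slot's sorter goes from 1254 MB to 2867 MB, which is consistent with fewer slots and a higher fraction letting the job get further before a record no longer fits.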
For #2, XmlInputFormat was involved.
Is it possible to prune (unneeded) field(s) so that the heap requirement is
lower?
On Wed, Jun 14, 2017 at 8:47 AM, Sebastian Neef <gehax...@mailbox.tu-berlin.de> wrote:
Hi Ted,
sure.
Here's the stack trace with .distinct() with the Exception in the
'SortMerger Reading Thread': [1]
Here's the stack trace without .distinct() and the 'Requested array
size exceeds VM limit' error: [2]
If you need anything else, I can more or less reliably reproduce the issue.
For the 'Requested array size exceeds VM limit' error, can you pastebin the
full stack trace?
Thanks
On Wed, Jun 14, 2017 at 3:22 AM, Sebastian Neef <gehax...@mailbox.tu-berlin.de> wrote:
Hi,
I removed the .distinct() and ran another test.
Without filtering duplicate entries, the job processes more data and
runs much longer, but eventually fails with the following error:
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Even then, playing around with the aforeme[...]
> [...]aSource (at
> createInput(ExecutionEnvironment.java:552)
> (org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat)) ->
> Combine(Distinct at parseDumpData(SkipDumpParser.java:43)) (28/40)
> java.io.IOException: Cannot write record to fresh sort buffer. Record too
> large.
Best,
Sebastian
Hi,
Can you paste some code snippet to show how you use the DataSet API?
Best,
Kurt
On Tue, Jun 13, 2017 at 4:29 PM, Sebastian Neef <gehax...@mailbox.tu-berlin.de> wrote:
Hi Flavio,
thanks for pointing me to your old thread.
I don't have administrative rights on the cluster, but from what dmesg
reports, I could not find anything that looks like an OOM message.
So no luck for me, I guess...
Best,
Sebastian
Hi Ted,
thanks for bringing this to my attention.
I just rechecked my Java version and it is indeed version 8. Both the
code and the Flink environment run that version.
Cheers,
Sebastian
Hi Kurt,
thanks for the input.
What do you mean by "try to disable your combiner"? Any tips on how I
can do that?
I don't actively use any combine* DataSet API functions, so the calls to
the SynchronousChainedCombineDriver come from Flink.
Kind regards,
Sebastian
Sebastian:
Are you using JDK 7 or JDK 8?
For JDK 7, there was a bug w.r.t. the code cache getting full, which affects
performance.
https://bugs.openjdk.java.net/browse/JDK-8051955
https://bugs.openjdk.java.net/browse/JDK-8074288
http://blog.andresteingress.com/2016/10/19/java-codecache
Cheers
Try to check whether the output of the dmesg command contains any log
entries about an OOM; the OS logs such information there. I had a similar
experience recently... see [1]
Best,
Flavio
[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-swapping-question-td13284.html
Hi Stefan,
thanks for the answer and the advice, which I've already seen in another
email.
Anyway, I played around with the taskmanager.numberOfTaskSlots and
taskmanager.memory.fraction options. I noticed that decreasing the
former and increasing the latter led to longer execution and more
proce[...]
Hi,
when I'm running my Flink job on a small dataset, it successfully
finishes. However, when a bigger dataset is used, I get multiple exceptions:
- Caused by: java.io.IOException: Cannot write record to fresh sort
buffer. Record too large.
- Thread 'SortMerger Reading Thread' te