Sorry, I had a second look and your stack trace does not even point to
the spilling channel - it reads from the memory segment directly.
-> Setting the temp dirs will therefore not make a difference.
I'm wondering why your deserializer eventually reads from a file on
gs:// directly, instead of, for examp

Hi Nico,
Unfortunately I can't share any of the data, but it is not even data being
processed at the point of failure - it is still in the
matching-files-from-GCS phase.
I am using Apache Beam's FileIO to match files and during one of those
match-files steps I get the failure above.
Currently I run

Hi Encho,
the SpillingAdaptiveSpanningRecordDeserializer that you see in your
stack trace is executed while reading input records from another task.
If the (serialized) records are too large (> 5 MiB), it will write and
assemble them in a spilling channel, i.e. on disk, instead of using
memory. This
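To illustrate the threshold behaviour described for the SpillingAdaptiveSpanningRecordDeserializer, here is a minimal Python sketch. It is not Flink's implementation (which is Java); the class name `SpillingRecordBuffer`, its methods and the `spill_dir` parameter are hypothetical, and the only detail taken from the explanation is the 5 MiB limit. Chunks of one serialized record stay in memory until the record grows past the limit, after which everything is moved to a temporary file on disk.

```python
import io
import tempfile

# Limit taken from the explanation: records larger than 5 MiB are spilled to disk.
SPILL_THRESHOLD = 5 * 1024 * 1024


class SpillingRecordBuffer:
    """Hypothetical illustration, not Flink's API: assembles the chunks of one
    serialized record in memory, switching to a temp file once the record
    exceeds `threshold` bytes."""

    def __init__(self, threshold=SPILL_THRESHOLD, spill_dir=None):
        self.threshold = threshold
        self.spill_dir = spill_dir  # stands in for a configured temp dir (assumption)
        self._memory = io.BytesIO()
        self._file = None
        self.size = 0

    @property
    def spilled(self) -> bool:
        return self._file is not None

    def append(self, chunk: bytes) -> None:
        self.size += len(chunk)
        if self._file is None and self.size > self.threshold:
            # Record became too large: copy what was buffered so far to disk.
            self._file = tempfile.TemporaryFile(dir=self.spill_dir)
            self._file.write(self._memory.getvalue())
            self._memory = None
        if self._file is not None:
            self._file.write(chunk)
        else:
            self._memory.write(chunk)

    def read_all(self) -> bytes:
        # Return the fully assembled record, from memory or from the spill file.
        if self._file is None:
            return self._memory.getvalue()
        self._file.seek(0)
        return self._file.read()

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
```

With a tiny threshold the switch is easy to observe: `buf = SpillingRecordBuffer(threshold=10)`, then appending `b"hello"` keeps the record in memory, while a further `b" world!"` (12 bytes in total) moves it to disk. Either way `read_all()` returns the same bytes, which is why only the storage location, not the record content, depends on the size.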