Re: Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-09-13 Thread Nico Kruber
Sorry, I had a second look and your stack trace does not even point to the spilling channel - it reads from the memory segment directly, so setting the temp dirs will not make a difference. I'm wondering why your deserializer eventually reads from a file on gs:// directly, instead of, for examp

Re: Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-09-13 Thread Encho Mishinev
Hi Nico, unfortunately I can't share any of the data, but it is not even data being processed at the point of failure - it is still in the matching-files-from-GCS phase. I am using Apache Beam's FileIO to match files, and during one of those match-files steps I get the failure above. Currently I run

Re: Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-09-13 Thread Nico Kruber
Hi Encho, the SpillingAdaptiveSpanningRecordDeserializer that you see in your stack trace is executed while reading input records from another task. If the (serialized) records are too large (> 5 MiB), it will write and assemble them in a spilling channel, i.e. on disk, instead of in memory. This