Re: Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-09-13 Thread Nico Kruber
Sorry, I had a second look, and your stack trace does not even point to the spilling channel; it reads from the memory segment directly. Setting the temp dirs will therefore not make a difference. I'm wondering why your deserializer eventually reads from a file on gs:// directly, instead of, for examp

Re: Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-09-13 Thread Encho Mishinev
Hi Nico, unfortunately I can't share any of the data, but no data is even being processed at the point of failure; the job is still in the matching-files-from-GCS phase. I am using Apache Beam's FileIO to match files, and during one of those match-files steps I get the failure above. Currently I run

Re: Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-09-13 Thread Nico Kruber
Hi Encho, the SpillingAdaptiveSpanningRecordDeserializer that you see in your stack trace is executed while reading input records from another task. If the (serialized) records are too large (> 5 MiB), it will write and assemble them in a spilling channel, i.e. on disk, instead of using memory. This
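To make the size-based behaviour described in this reply concrete, here is a minimal, hypothetical Java sketch. The class `SpillThresholdSketch`, its methods and the `Target` enum are illustrative names, not Flink API; the only thing taken from the thread is the rule that serialized records larger than 5 MiB are assembled in a spilling channel on disk rather than in memory. The actual constant and logic inside Flink may differ by version.

```java
// Hypothetical sketch, NOT Flink's SpillingAdaptiveSpanningRecordDeserializer.
// It only models the size-based decision from the thread: records whose
// serialized length exceeds a threshold are assembled on disk (a "spilling
// channel") instead of in memory.
public final class SpillThresholdSketch {

    // Threshold as stated in the thread (5 MiB); assumed, may differ in Flink.
    static final long SPILL_THRESHOLD_BYTES = 5L * 1024 * 1024;

    // Where a record of a given size would be assembled.
    enum Target { MEMORY, SPILLING_CHANNEL }

    // True when a record of this serialized length is strictly larger than
    // the threshold and would therefore be spilled to disk.
    static boolean shouldSpill(long serializedLengthBytes) {
        if (serializedLengthBytes < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        return serializedLengthBytes > SPILL_THRESHOLD_BYTES;
    }

    static Target targetFor(long serializedLengthBytes) {
        return shouldSpill(serializedLengthBytes) ? Target.SPILLING_CHANNEL : Target.MEMORY;
    }

    public static void main(String[] args) {
        long[] lengths = {1024L, SPILL_THRESHOLD_BYTES, SPILL_THRESHOLD_BYTES + 1};
        for (long len : lengths) {
            // Print the chosen target for each sample record size.
            System.out.println(len + " bytes -> " + targetFor(len));
        }
    }
}
```

Per the follow-up message in this thread, the reported stack trace reads from the memory segment directly, so this spilling path is not what triggers the gs:// failure.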

Filescheme GS not found sometimes - inconsistent exceptions for reading from GCS

2018-08-29 Thread Encho Mishinev
Hello, I am using Flink 1.5.3 and executing jobs through Apache Beam 2.6.0. One of my jobs involves reading from Google Cloud Storage, which uses the file scheme "gs://". Everything was fine, but once in a while I would get an exception that the scheme is not recognised. Now I've started seeing them