Re: One corrupt gzip in a directory of 100s

2015-04-02 Thread Ted Yu
it with Exception on the entire job. I like SPARK-6593, since it can cover also additional cases, not just in case of corrupted zip files. From: Dale Richardson

Re: One corrupt gzip in a directory of 100s

2015-04-02 Thread Romi Kuntsman
I like SPARK-6593, since it can cover also additional cases, not just in case of corrupted zip files. From: Dale Richardson To: "dev@spark.apache.org" Date: 29/0

Re: One corrupt gzip in a directory of 100s

2015-04-01 Thread Ted Yu
6593, since it can cover also additional cases, not just in case of corrupted zip files. From: Dale Richardson To: "dev@spark.apache.org" Date: 29/03/2015 11:48 PM Subject: One corrupt gzip in a

Re: One corrupt gzip in a directory of 100s

2015-04-01 Thread Romi Kuntsman
job. I like SPARK-6593, since it can cover also additional cases, not just in case of corrupted zip files. From: Dale Richardson To: "dev@spark.apache.org" Date: 29/03/2015 11:48 PM Subject: One corrupt gzip in a directory of

Re: One corrupt gzip in a directory of 100s

2015-04-01 Thread Gil Vernik
Richardson To: "dev@spark.apache.org" Date: 29/03/2015 11:48 PM Subject: One corrupt gzip in a directory of 100s Recently had an incident reported to me where somebody was analysing a directory of gzipped log files, and was struggling to load them into Spark because one of the

One corrupt gzip in a directory of 100s

2015-03-29 Thread Dale Richardson
Recently had an incident reported to me where somebody was analysing a directory of gzipped log files, and was struggling to load them into Spark because one of the files was corrupted - calling sc.textFile('hdfs:///logs/*.gz') caused an IOException on the particular executor that was reading
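
For illustration only, the failure mode described above can be reproduced and worked around outside Spark with the Python standard library. The sketch below is an assumption-laden local example, not the approach proposed in SPARK-6593 and not a Spark API: the helper name read_gzip_lines_skipping_corrupt is hypothetical, it reads from local paths rather than HDFS, and it assumes UTF-8 text. It reads each gzip file in full, and if decompression fails it records the path and moves on instead of failing the whole batch.

```python
import gzip
import zlib
from typing import Iterable, List, Tuple


def read_gzip_lines_skipping_corrupt(
    paths: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (lines, bad_paths) for a set of local gzip files.

    Hypothetical helper for illustration: a file that cannot be opened or
    decompressed is skipped as a whole and its path is reported.
    """
    lines: List[str] = []
    bad_paths: List[str] = []
    for path in paths:
        try:
            with gzip.open(path, "rt", encoding="utf-8") as handle:
                # Read the whole file before keeping anything, so a failure
                # part-way through leaves no partial lines in the result.
                file_lines = handle.read().splitlines()
        except (OSError, EOFError, zlib.error):
            # OSError covers a bad gzip header or CRC (and missing files),
            # EOFError a truncated stream, zlib.error damaged deflate data.
            bad_paths.append(path)
            continue
        lines.extend(file_lines)
    return lines, bad_paths
```

Returning the list of skipped paths alongside the data is a deliberate choice in this sketch: silently dropping a corrupt file would hide data loss, whereas reporting it lets the caller decide whether to continue or abort.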