Check out the xcievers setting (dfs.datanode.max.xcievers): http://tech.backtype.com/the-dark-side-of-hadoop
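
For reference, the xcievers limit is configured on each DataNode in hdfs-site.xml. The fragment below is only a sketch: it assumes the Hadoop 0.20-era property name dfs.datanode.max.xcievers, and the value 4096 is an illustrative figure, not something taken from the linked post or verified against this cluster.

```xml
<!-- hdfs-site.xml on every DataNode (sketch; the value is an assumption) -->
<property>
  <!-- Upper bound on concurrent block transfer threads per DataNode.
       The property name really is spelled "xcievers" in these releases. -->
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

The DataNodes would need a restart to pick up the change. Whether this limit is actually behind the EOFException is not confirmed; a query that scans six months of partitions opens far more files at once than one that scans a week, which is why it is worth ruling out.
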
Tom

On Sun, May 22, 2011 at 9:38 AM, Guy Doulberg <guy.doulb...@conduit.com> wrote:
> Hey,
>
> I am dealing with a problem I just don't understand; maybe you could help
> me…
>
> I have an external table that is partitioned by date (for example
> '20110511').
>
> I assembled a Hive query that contains multiple insertions.
>
> When I run this query on a small number of partitions, let's say 5 days or
> a week, the query completes successfully.
>
> However, I need to run this query over 6 months of partitions, and when I
> tried that, the query failed.
>
> In the cluster log I could see that one mapper got an EOFException; the
> location of the file was not specified.
>
> I looked for a partition that points to an empty file, but all the
> partition locations are bound to directories that contain data.
>
> I looked at the Hadoop configuration XML and saw that in the
> mapred.input.dir configuration one of the input files looks like this:
>
> hdfs://master001/tmp/hive-hive/hive_2011-05-22_12-16-46_680_69545572900013773/-mr-10009/1
>
> I looked at what is under this location and found a file called
> emptyFile with a size of 0 KB.
>
> When running the short query (over 5 days), this location doesn't exist in
> the mapred.input.dir configuration.
>
> What do you think?
>
> Thanks,
> Guy Doulberg