Hey,

I am dealing with a problem I just don't understand; maybe you could help me.

I have an external table that is partitioned by date (for example, '20110511').

I assembled a Hive query that contains multiple inserts.

When I run this query on a small number of partitions, say 5 days or a week,
the query completes successfully.

However, I need to run this query over 6 months of partitions, and when I tried
that, the query failed.

In the cluster log I could see that one mapper got an EOFException; the location
of the file was not specified.

I looked for a partition that points to an empty file, but all the partition
locations point to directories that contain data.

I looked at the Hadoop job configuration XML and saw that one of the input paths
in the mapred.input.dir property looks like this:
hdfs://master001/tmp/hive-hive/hive_2011-05-22_12-16-46_680_69545572900013773/-mr-10009/1

I looked at what is under this location and found a file called emptyFile with
a size of 0 KB.

When I run the short query (over 5 days), this location does not appear in the
mapred.input.dir configuration.

What do you think?


Thanks,
Guy Doulberg
