Hey, I am dealing with a problem I just don't understand; maybe you could help me.
I have an external table that is partitioned by date (for example '20110511'). I assembled a Hive query that contains multiple inserts. When I run this query over a small number of partitions, say 5 days or a week, it completes successfully. However, I need to run it over 6 months of partitions, and when I tried that, the query failed.

In the cluster log I could see that one mapper got an EOFException; the location of the file was not specified. I looked for a partition that points to an empty file, but the locations of all the partitions are bound to directories that contain data.

I then looked at the Hadoop job configuration XML and saw that one of the input paths in the mapred.input.dir setting looks like this:

hdfs://master001/tmp/hive-hive/hive_2011-05-22_12-16-46_680_69545572900013773/-mr-10009/1

I checked what is under this location and found a file called emptyFile with a size of 0 KB. When I run the short query (over 5 days), this location does not appear in mapred.input.dir.

What do you think?

Thanks,
Guy Doulberg
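For anyone who wants to repeat the comparison between the short and the long run, here is a minimal sketch in Python of pulling the entries out of mapred.input.dir and flagging the ones that look like Hive intermediate locations. It rests on several assumptions: the job configuration is the usual Hadoop property XML, the entries are separated by plain commas, and a "/-mr-" path component marks an intermediate directory, as in the path quoted above. The function names and the first sample input path (hdfs://master001/data/events/20110511) are hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET


def input_dirs(job_xml_text):
    """Return the entries of mapred.input.dir from a job configuration XML string."""
    root = ET.fromstring(job_xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "mapred.input.dir":
            value = prop.findtext("value") or ""
            # Assumption: entries are separated by plain (unescaped) commas.
            return [p.strip() for p in value.split(",") if p.strip()]
    return []


def scratch_inputs(paths, marker="/-mr-"):
    """Keep only the paths containing the marker assumed to denote intermediate dirs."""
    return [p for p in paths if marker in p]


# Sample configuration: the first input path is hypothetical,
# the second is the one reported in the post.
SAMPLE = """<configuration>
  <property>
    <name>mapred.input.dir</name>
    <value>hdfs://master001/data/events/20110511,hdfs://master001/tmp/hive-hive/hive_2011-05-22_12-16-46_680_69545572900013773/-mr-10009/1</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for path in scratch_inputs(input_dirs(SAMPLE)):
        print(path)
```

Each flagged path can then be inspected with `hadoop fs -ls` to see whether it holds only a zero-length file.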