Viral,

Regarding file handles, you were most likely running into this:
https://issues.apache.org/jira/browse/HIVE-1508

-Vinithra
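If you want to verify the file-handle theory on a box, one quick check is to watch the open-descriptor count of the suspect process (the Hive CLI doing the loads, or the DataNode/NameNode) and see whether it climbs with every load. Below is a minimal sketch, not anything from the thread: it assumes Linux and permission to read the target process's /proc entry, and the pid argument is just whatever jps reports for the process you suspect.

import java.io.File;

// Minimal sketch: count the open file descriptors of a process by listing
// /proc/<pid>/fd (Linux only). Pass the pid of, e.g., the DataNode or the
// Hive CLI process; with no argument it inspects itself via /proc/self/fd.
public class FdCount {
    public static void main(String[] args) {
        String pid = args.length > 0 ? args[0] : "self";
        File fdDir = new File("/proc/" + pid + "/fd");
        String[] fds = fdDir.list();   // null if the pid is wrong or we lack permission
        if (fds == null) {
            System.err.println("cannot read " + fdDir + " (wrong pid, no permission, or not Linux?)");
            return;
        }
        System.out.println("pid " + pid + " has " + fds.length + " open file descriptors");
    }
}

The same number is available with ls /proc/<pid>/fd | wc -l or lsof -p <pid>; the class is only a convenience if you want to log the count from the same cron job that drives the hourly loads.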
On Fri, Feb 11, 2011 at 7:08 PM, Viral Bajaria <viral.baja...@gmail.com> wrote:

> Are you running out of open file handles? You should look into that
> because you are running everything on one node. You should look at your
> namenode/datanode logs to make sure that's not the case.
>
> The error sounds like a block exception on HDFS, and the Hive CopyTask is
> just moving the file from your LOCAL mount point to a tmp HDFS location
> before it loads into the correct partition.
>
> I have emailed the group before, but with no luck in getting a reply: my
> script, which continuously loads files into Hive, uses a lot of file handles.
> I don't know if it's Hive that leaves a file handle open or if it's some
> other process. My script does not run on the same box, so it's definitely not
> my script that is holding onto the file handles.
>
> -Viral
>
> On Fri, Feb 11, 2011 at 5:20 PM, Cam Bazz <camb...@gmail.com> wrote:
>
>> Yes, I have a lot of small files. This is because I wanted to process
>> hourly instead of daily.
>>
>> I will be checking whether this is the case. I am now re-running the
>> process, and I see:
>>
>> 332 files and directories, 231 blocks = 563 total. Heap Size is 119.88 MB / 910.25 MB (13%)
>> Configured Capacity : 140.72 GB
>> DFS Used : 6.63 MB
>> Non DFS Used : 8.76 GB
>> DFS Remaining : 131.95 GB
>> DFS Used% : 0 %
>> DFS Remaining% : 93.77 %
>>
>> I do not think this is the case, but I will be monitoring, and will
>> see in half an hour.
>>
>> Best regards, and thanks a bunch.
>>
>> -cam
>>
>> On Sat, Feb 12, 2011 at 3:00 AM, Christopher, Pat
>> <patrick.christop...@hp.com> wrote:
>> > If you're running with the defaults, I think it's around 20 GB. If you're
>> > processing a couple hundred MB, you could easily hit this limit between
>> > desired outputs and any intermediate files created. HDFS allocates the
>> > available space in blocks, so if you have a lot of small files, you'll run
>> > out of blocks before you run out of space. This is one reason why HDFS/Hadoop
>> > is 'bad' for dealing with lots of small files.
>> >
>> > You can check here: localhost:50070; that's the web page for your HDFS
>> > namenode. It has status information on your HDFS, including size.
>> >
>> > Pat
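As a cross-check of the numbers from the namenode page, the same capacity figures can be read with the FileSystem API. This is a minimal sketch rather than anything from the thread: it assumes the Hadoop client jars plus a core-site.xml/hdfs-site.xml pointing at the namenode are on the classpath, and that the client is Hadoop 0.21 or later (older clients lack FileSystem.getStatus(); there you can use the namenode UI or hadoop dfsadmin -report instead).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

// Minimal sketch: print the capacity/used/remaining numbers that the
// namenode web UI (localhost:50070) shows, via the FileSystem API.
// Assumes core-site.xml/hdfs-site.xml on the classpath point at the namenode.
public class HdfsCapacityCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FsStatus status = fs.getStatus();
        double gb = 1024.0 * 1024.0 * 1024.0;
        System.out.printf("capacity : %.2f GB%n", status.getCapacity() / gb);
        System.out.printf("used     : %.2f GB%n", status.getUsed() / gb);
        System.out.printf("remaining: %.2f GB%n", status.getRemaining() / gb);
        fs.close();
    }
}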
>> >
>> > -----Original Message-----
>> > From: Cam Bazz [mailto:camb...@gmail.com]
>> > Sent: Friday, February 11, 2011 4:55 PM
>> > To: user@hive.apache.org
>> > Subject: Re: error out of all sudden
>> >
>> > But is there a ridiculously low default for HDFS space limits? I
>> > looked everywhere in the configuration files, but could not find
>> > anything that limits the size of HDFS.
>> >
>> > I think I am running on a 150 GB hard drive, and the data I am
>> > processing is a couple of hundred megabytes at most.
>> >
>> > Best regards,
>> >
>> > -cam
>> >
>> > On Sat, Feb 12, 2011 at 2:44 AM, Christopher, Pat
>> > <patrick.christop...@hp.com> wrote:
>> >> Is your hdfs hitting its space limits?
>> >>
>> >> Pat
>> >>
>> >> -----Original Message-----
>> >> From: Cam Bazz [mailto:camb...@gmail.com]
>> >> Sent: Friday, February 11, 2011 4:38 PM
>> >> To: user@hive.apache.org
>> >> Subject: error out of all sudden
>> >>
>> >> Hello,
>> >>
>> >> I set up my one-node pseudo-distributed system and left it with a cron job
>> >> copying data from a remote server, loading it into Hadoop, and doing some
>> >> calculations every hour.
>> >>
>> >> It stopped working today, giving me this error. I deleted everything
>> >> and made it reprocess from the beginning, and I still get the same error
>> >> in the same place.
>> >>
>> >> Is there a limit on how many partitions there can be in a table?
>> >>
>> >> So, I tried for a couple of hours to solve the problem, but now my Hive
>> >> fun is over...
>> >>
>> >> Any ideas as to why this might be happening, or what I should do to try
>> >> to debug it?
>> >>
>> >> Best regards,
>> >> -c.b.
>> >>
>> >> 11/02/12 01:27:47 INFO ql.Driver: Starting command: load data local
>> >> inpath '/var/mylog/hourly/log.CAT.2011021119' into table cat_raw
>> >> partition(date_hour=2011021119)
>> >> Copying data from file:/var/mylog/hourly/log.CAT.2011021119
>> >>
>> >> 11/02/12 01:27:47 INFO exec.CopyTask: Copying data from
>> >> file:/var/mylog/hourly/log.CAT.2011021119 to
>> >> hdfs://darkstar:9000/tmp/hive-cam/hive_2011-02-12_01-27-47_415_7165217842693560517/10000
>> >>
>> >> 11/02/12 01:27:47 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> >> 11/02/12 01:27:47 INFO hdfs.DFSClient: Abandoning block blk_6275225343572661963_1859
>> >> 11/02/12 01:27:53 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> >> 11/02/12 01:27:53 INFO hdfs.DFSClient: Abandoning block blk_2673116090916206836_1859
>> >> 11/02/12 01:27:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> >> 11/02/12 01:27:59 INFO hdfs.DFSClient: Abandoning block blk_5414825878079983460_1859
>> >> 11/02/12 01:28:05 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> >> 11/02/12 01:28:05 INFO hdfs.DFSClient: Abandoning block blk_6043862611357349730_1859
>> >> 11/02/12 01:28:11 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
>> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>> >>
>> >> 11/02/12 01:28:11 WARN hdfs.DFSClient: Error Recovery for block blk_6043862611357349730_1859 bad datanode[0] nodes == null
>> >> 11/02/12 01:28:11 WARN hdfs.DFSClient: Could not get block locations. Source file
>> >> "/tmp/hive-cam/hive_2011-02-12_01-27-47_415_7165217842693560517/10000/log.CAT.2011021119" - Aborting...
>> >> Failed with exception null
>> >> 11/02/12 01:28:11 ERROR exec.CopyTask: Failed with exception null
>> >> java.io.EOFException
>> >>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>> >>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>> >>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>> >>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
>> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
>> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>> >>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
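For what it's worth, the failure above happens inside Hive's CopyTask, which is nothing more than an HDFS write of the local file, so Hive can be taken out of the picture while debugging. Here is a minimal sketch, not the thread's own code: it repeats the same copy with the plain FileSystem API, using the source path from the log above and an arbitrary HDFS scratch destination. If it fails with the same EOFException in createBlockOutputStream, the problem is on the HDFS side (check the datanode log, free space on the datanode volume, dfs.datanode.max.xcievers, and that the client and cluster are running the same Hadoop version) rather than in Hive.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: repeat what Hive's CopyTask does (copy the local hourly
// log file into HDFS) without Hive, to see whether the plain HDFS write
// path fails the same way. The source path is the one from the log above;
// the destination is an arbitrary scratch location chosen for this test.
public class CopyProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path src = new Path("file:///var/mylog/hourly/log.CAT.2011021119");
        Path dst = new Path("/tmp/copy-probe/log.CAT.2011021119");
        fs.copyFromLocalFile(false, true, src, dst);  // delSrc=false, overwrite=true
        System.out.println("copied " + fs.getFileStatus(dst).getLen() + " bytes to " + dst);
        fs.close();
    }
}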