You could create a table having a schema similar to your files' structure, and later add the files as partitions of that table -
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatements
Then you can query your files using a WHERE clause. This seems to be a
time-consuming alternative and I have never tried it, so try it at your
own risk :)

-- Ravi.

*''We do not inherit the earth from our ancestors, we borrow it from our
children.'' PROTECT IT !*

On Tue, Jul 31, 2012 at 11:04 AM, Vinod Singh <vi...@vinodsingh.com> wrote:

> I believe Hive does not have any feature which can provide this
> information. You may like to write a custom Map/Reduce program and get
> the file name being processed as shown below-
>
> ((FileSplit) context.getInputSplit()).getPath()
>
> and then emit the file name when an occurrence of the word is found.
>
> Thanks,
> Vinod
>
>
> On Tue, Jul 31, 2012 at 9:41 AM, Techy Teck <comptechge...@gmail.com> wrote:
>
>> I have around 100 files, and each file is around 1 GB in size. I need
>> to find a string in all these 100 files and also determine which
>> files contain that particular string. I am working with the Hadoop
>> File System, and all those 100 files are stored there.
>>
>> All 100 files are under the "real" folder, so if I run the command
>> below, I get all 100 files. I need to find which files under the real
>> folder contain a particular string, *hello*.
>>
>> bash-3.00$ hadoop fs -ls /technology/dps/real
>>
>> And this is my data structure in HDFS-
>>
>> row format delimited
>> fields terminated by '\29'
>> collection items terminated by ','
>> map keys terminated by ':'
>> stored as textfile
>>
>> How can I write a MapReduce job for this particular problem so that I
>> can find which files contain a particular string? Any simple example
>> would be of great help to me.
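Ravi's partition idea could look roughly like the sketch below. This is untested, as he says; the table name, column names, and paths here are made up for illustration, and each partition's LOCATION would typically need to point at a directory holding the file rather than the file itself.

```sql
-- Hypothetical external table using the delimiters from the question;
-- one partition per source file, so the partition column records origin.
CREATE EXTERNAL TABLE real_data (line STRING)
PARTITIONED BY (src_file STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\29'
STORED AS TEXTFILE;

-- Register each existing file as a partition (repeat per file;
-- LOCATION usually must be a directory containing the file).
ALTER TABLE real_data ADD PARTITION (src_file = 'file001')
  LOCATION '/technology/dps/real/file001';

-- Then the partition column tells you which files matched.
SELECT DISTINCT src_file
FROM real_data
WHERE line LIKE '%hello%';
```

Depending on the Hive version, the virtual column INPUT__FILE__NAME may give the source file directly and make the per-file partition bookkeeping unnecessary.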
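Outside Hadoop, the core logic Vinod describes - scan each line and, when the word is found, report the file it came from - can be sketched in plain Java as below. This is a local stand-in, not the actual MapReduce job: in a real mapper the file name would come from ((FileSplit) context.getInputSplit()).getPath(), and the class and method names here are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Local stand-in for the grep-style MapReduce job: for each line, record
// the source file's name when the search string occurs in it.
public class FileGrep {

    // Returns the names of the files that contain the needle.
    static Set<String> filesContaining(List<Path> files, String needle)
            throws IOException {
        Set<String> matches = new TreeSet<>();
        for (Path file : files) {
            for (String line : Files.readAllLines(file)) {
                if (line.contains(needle)) {                    // "map": test each line
                    matches.add(file.getFileName().toString()); // "emit" the file name
                    break;                                      // one hit per file suffices
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Demo on two temporary files; contents are made up.
        Path dir = Files.createTempDirectory("grepdemo");
        Path a = Files.write(dir.resolve("a.txt"), List.of("hello world"));
        Path b = Files.write(dir.resolve("b.txt"), List.of("nothing here"));
        System.out.println(filesContaining(List.of(a, b), "hello"));
    }
}
```

In the Hadoop version, `filesContaining` would become the mapper's `map()` method, the inner `contains` check would emit `(fileName, 1)` pairs, and a reducer (or simply distinct output keys) would collapse them into the list of matching files.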