You could create a table whose schema matches your files' structure, and
later add the files as partitions of that table -

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable%2FPartitionStatements

then you can query your files using a WHERE clause.
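A rough sketch of the idea (the table name, column names, and partition layout are hypothetical - the columns and the '\29' delimiter must match the files' actual structure):

```sql
-- Hypothetical external table over the existing files; adjust columns,
-- types, and the field delimiter to match the real data.
CREATE EXTERNAL TABLE real_files (col1 STRING, col2 STRING)
PARTITIONED BY (src STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\29'
STORED AS TEXTFILE;

-- Register each file's location as a partition, tagged with its name.
ALTER TABLE real_files ADD PARTITION (src='part-00000')
  LOCATION '/technology/dps/real/part-00000';

-- The partition column then tells you which file matched.
SELECT DISTINCT src FROM real_files WHERE col1 LIKE '%hello%';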

This seems to be a time-consuming alternative and I have never tried it, so
try it at your own risk :)

--
Ravi.
*''We do not inherit the earth from our ancestors, we borrow it from our
children.'' PROTECT IT !*



On Tue, Jul 31, 2012 at 11:04 AM, Vinod Singh <vi...@vinodsingh.com> wrote:

> I believe Hive does not have any feature which can provide this
> information. You may like to write a custom Map / Reduce program and get
> the file name being processed as shown below-
>
> ((FileSplit) context.getInputSplit()).getPath()
>
> and then emit the file name when an occurrence of the word is found.
>
> Thanks,
> Vinod
>
>
> On Tue, Jul 31, 2012 at 9:41 AM, Techy Teck <comptechge...@gmail.com> wrote:
>
>> I have around 100 files, each about 1 GB in size. I need to find a String
>> in all these 100 files and also which files contain that particular
>> String. I am working with the Hadoop File System and all those 100 files
>> are in the Hadoop File System.
>>
>> All 100 files are under the real folder, so if I list it as below, I get
>> all 100 files. I need to find which files contain a particular String
>> *hello* under the real folder.
>>
>> bash-3.00$ hadoop fs -ls /technology/dps/real
>>
>>
>>
>>
>> And this is my data structure in hdfs-
>>
>> row format delimited
>> fields terminated by '\29'
>> collection items terminated by ','
>> map keys terminated by ':'
>> stored as textfile
>>
>>
>>
>> How can I write a MapReduce job for this particular problem so that I can
>> find which files contain a particular string? Any simple example would be
>> of great help to me.
>
>
>
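The approach Vinod describes boils down to: per input line, test for the search term, and on a match emit the file name (which, in a real job, comes from `((FileSplit) context.getInputSplit()).getPath()`). A minimal in-memory stand-in for that mapper logic, runnable without a cluster - the class name, method name, and sample file names are all illustrative:

```java
import java.util.*;

// In-memory stand-in for the mapper Vinod describes: in a real job the file
// name would come from ((FileSplit) context.getInputSplit()).getPath(), and
// each match would be emitted as (fileName, 1) for the reducer to collect.
public class GrepFiles {

    // Return the names of all "files" whose lines contain the search term.
    static SortedSet<String> filesContaining(Map<String, List<String>> files,
                                             String term) {
        SortedSet<String> hits = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : files.entrySet()) {
            for (String line : e.getValue()) {
                if (line.contains(term)) {  // same per-line test the mapper runs
                    hits.add(e.getKey());   // emit the file name once
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new HashMap<>();
        files.put("/technology/dps/real/part-00000",
                  Arrays.asList("foo\u001Dbar", "hello\u001Dworld"));
        files.put("/technology/dps/real/part-00001",
                  Arrays.asList("no match here"));
        System.out.println(filesContaining(files, "hello"));
    }
}
```

In the actual Mapper, the inner loop runs once per `map()` call, and the reducer simply collapses duplicate file names; since the search ignores the `'\29'`-delimited column structure, a plain `TextInputFormat` line scan is enough.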
