Hive can handle a sequence file just like a text file; it ignores the key completely and uses only the value part of each record. Other than that, you won't notice any difference between a sequence file and a plain text file.
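Since only the value reaches Hive, one possible workaround is to rewrite the records so that the key becomes an extra leading column inside the value. The sketch below only illustrates that row layout with plain strings. It assumes the value is a text row using Hive's default field delimiter (Ctrl-A), and the class and method names (HiveRowSketch, foldKeyIntoValue, columns) are made up for illustration; they are not part of Hive or Hadoop.

```java
import java.util.Arrays;
import java.util.List;

public class HiveRowSketch {
    // Assumption: the table uses Hive's default field delimiter, Ctrl-A (0x01).
    static final char FIELD_DELIMITER = '\u0001';

    // Prepends the key as an extra leading column of the value row.
    static String foldKeyIntoValue(String key, String valueRow) {
        return key + FIELD_DELIMITER + valueRow;
    }

    // Splits a row back into columns; the -1 limit keeps trailing empty columns.
    static List<String> columns(String row) {
        return Arrays.asList(row.split(String.valueOf(FIELD_DELIMITER), -1));
    }

    public static void main(String[] args) {
        // Hypothetical record: the key is a user id, the value holds two columns.
        String row = foldKeyIntoValue("user42", "2012-04-19" + FIELD_DELIMITER + "login");
        System.out.println(columns(row));
    }
}
```

With a layout like this, the table definition would declare one more column at the front than the original value had, so the former key is queryable like any other field.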
From: David Kulp [mailto:dk...@fiksu.com]
Sent: Thursday, April 19, 2012 2:13 PM
To: user@hive.apache.org
Subject: Re: using the key from a SequenceFile

I'm trying to achieve something very similar. I want to write an MR program that writes results to a record-based SequenceFile that would be directly readable from Hive as though it were created using "STORED AS SEQUENCEFILE" with, say, BinarySortableSerDe.

From this discussion it seems that Hive does not / cannot take advantage of the key/values in a SequenceFile, but rather requires a value that is serialized using a SerDe. Is that right?

If so, does that mean that the right approach is to use the BinarySortableSerDe to pass the collector a row's worth of data as the Writable value? And would Hive "just work" on such data?

If SequenceFileOutputFormat is used, will it automatically place sync markers in the file to allow for file splitting?

Thanks!

(PS. As an aside, Avro would be better. Wouldn't it be a huge win for MapReduce to have an AvroOutputFileFormat and for Hive to have a SerDe that reads such files? It seems like there's a natural correspondence between the richer data representations of an SQL schema and an Avro schema, and there's already code for working with Avro in MR as input.)

On Apr 19, 2012, at 6:15 AM, madhu phatak wrote:

A SerDe will allow you to create custom data from your sequence file:
https://cwiki.apache.org/confluence/display/Hive/SerDe

On Thu, Apr 19, 2012 at 3:37 PM, Ruben de Vries <ruben.devr...@hyves.nl> wrote:

I'm trying to migrate part of our current Hadoop jobs from normal MapReduce jobs to Hive. Previously the data was stored in sequence files with the keys containing valuable data! However, if I load the data into a table I lose that key data (or at least I can't access it with Hive). I want to somehow use the key from the sequence file in Hive.
I know this has come up before, since I can find some hints of people needing it, but I can't seem to find a working solution, and since I'm not very good with Java I really can't get it done myself :(. Does anyone have a snippet of something like this working?

I get errors like:

../hive/mapred/CustomSeqRecordReader.java:14: cannot find symbol
[javac] symbol : constructor SequenceFileRecordReader()
[javac] location: class org.apache.hadoop.mapred.SequenceFileRecordReader<K,V>
[javac] public class CustomSeqRecordReader<K, V> extends SequenceFileRecordReader<K, V> implements RecordReader<K, V> {

I hope someone has a snippet or can help me out; I would really love to be able to switch part of our jobs to Hive.

Ruben de Vries

--
https://github.com/zinnia-phatak-dev/Nectar
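A note on the javac error quoted above: "cannot find symbol ... constructor SequenceFileRecordReader()" is what the compiler reports when a subclass declares no constructor that calls super(...) explicitly and the parent class has no zero-argument constructor. As far as I know, the old-API SequenceFileRecordReader is constructed from a Configuration and a FileSplit, so a subclass would need a constructor that forwards those two arguments. The sketch below reproduces the pattern with stand-in classes only, so it compiles without Hadoop. BaseReader, KeyAwareReader and nextRow are hypothetical names, not Hadoop or Hive API, and joining key and value with Ctrl-A is just one assumed way of keeping the key visible.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class KeyAwareReaderSketch {

    // Hypothetical stand-in for a reader base class that, like the one in the
    // error message, offers no zero-argument constructor.
    static class BaseReader {
        private final Iterator<Map.Entry<String, String>> records;

        BaseReader(List<Map.Entry<String, String>> split) {
            this.records = split.iterator();
        }

        // Returns the next key/value pair, or null when the input is exhausted.
        Map.Entry<String, String> next() {
            return records.hasNext() ? records.next() : null;
        }
    }

    // Subclass that keeps the key by folding it into the row it hands out.
    static class KeyAwareReader extends BaseReader {
        KeyAwareReader(List<Map.Entry<String, String>> split) {
            // Without this explicit call javac looks for BaseReader() and fails
            // with "cannot find symbol ... constructor".
            super(split);
        }

        // Assumption: key and value are joined with Ctrl-A (0x01).
        String nextRow() {
            Map.Entry<String, String> kv = next();
            return kv == null ? null : kv.getKey() + '\u0001' + kv.getValue();
        }
    }

    // Drains the reader and collects every combined row.
    static List<String> readAll(List<Map.Entry<String, String>> split) {
        KeyAwareReader reader = new KeyAwareReader(split);
        List<String> rows = new ArrayList<>();
        for (String row = reader.nextRow(); row != null; row = reader.nextRow()) {
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> split = new ArrayList<>();
        split.add(new SimpleEntry<>("k1", "v1"));
        split.add(new SimpleEntry<>("k2", "v2"));
        System.out.println(readAll(split).size() + " rows read");
    }
}
```

In a real job the same shape would apply to the Hadoop classes: the custom reader's constructor takes whatever the parent constructor needs and passes it to super(...), and the custom input format is the place that creates the reader with those arguments.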