To answer my own question -- so that someone else may benefit some day -- I've
found that there is nothing special about key or value formats in a
SequenceFile. As has been noted, keys are ignored. Each new key/value pair is
seen as a new row from Hive's perspective. There's no concept of using the key.
Check out these links:
http://stackoverflow.com/questions/2763112/opening-lucene-index-stored-in-hdfs
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5575646&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5575646
http://www.drdobbs.com/article/print?articleID=226300
You're a lifesaver!
From: Dilip Joseph [mailto:dilip.antony.jos...@gmail.com]
Sent: Thursday, April 19, 2012 5:47 PM
To: user@hive.apache.org
Subject: Re: using the key from a SequenceFile
An example input format for using SequenceFile keys in Hive is at
https://gist.github.com/2421795 . The code just reverses how the key and
value are accessed in the standard SequenceFileRecordReader and
SequenceFileInputFormat that come with Hadoop.
You can use this custom input format by specifying it in the INPUTFORMAT clause of your CREATE TABLE statement.
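For readers without access to the gist, here is a minimal sketch of the same idea, assuming BytesWritable keys and values and the old mapred API that Hive tables require; the class name KeyAsValueSequenceFileInputFormat is illustrative, and the gist's actual code may differ:

    import java.io.IOException;

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.SequenceFileRecordReader;

    public class KeyAsValueSequenceFileInputFormat
        extends SequenceFileInputFormat<BytesWritable, BytesWritable> {

      @Override
      public RecordReader<BytesWritable, BytesWritable> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        final SequenceFileRecordReader<BytesWritable, BytesWritable> inner =
            new SequenceFileRecordReader<BytesWritable, BytesWritable>(job, (FileSplit) split);
        return new RecordReader<BytesWritable, BytesWritable>() {
          // Swap the slots so the file's key lands where Hive reads the value.
          public boolean next(BytesWritable key, BytesWritable value) throws IOException {
            return inner.next(value, key);
          }
          public BytesWritable createKey() { return inner.createValue(); }
          public BytesWritable createValue() { return inner.createKey(); }
          public long getPos() throws IOException { return inner.getPos(); }
          public float getProgress() throws IOException { return inner.getProgress(); }
          public void close() throws IOException { inner.close(); }
        };
      }
    }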
In fact, it's just not a reasonable thing to do partition pruning on.
Imagine a situation where you had:
WHERE partition_column = f(unix_timestamp()) AND ordinary_column = f(unix_timestamp())
The right-hand side of that predicate has to be evaluated at map time,
whereas you're assuming that the left-hand side can be evaluated at compile
time for pruning; evaluating unix_timestamp() at two different times would
give the two sides different values.
I don't know what the state of Hive's partition pruning is, but I
would imagine that the problem is that the two examples you're giving
are fundamentally different.
1) WHERE local_date = date_add('2011-12-07',3)
the UDF is a function of constants only, so it gets evaluated at compile
time and the matching partitions can be selected before the job runs.
Hi Anand
The row group size of an RCFile is defined
by hive.io.rcfile.record.buffer.size. The default value is 4 MB.
It is good to set it to a higher value such as 32 MB:
SET hive.io.rcfile.record.buffer.size=33554432;
Regards
Bejoy KS
From: "Ladda, Anand"
Essentially just take the file and split on ';'. The only exception
is that the CLI allows ; to be escaped by \;
Edward
On Thu, Apr 19, 2012 at 11:17 AM, Chandan B.K wrote:
> Hello users,
>
> Thanks Bhavesh. As Bhavesh said, I completely agree.
Hello users,
Thanks Bhavesh. As Bhavesh said, I completely agree.
For that I need to parse the file, extract the queries line by line, and
execute them.
If $bin/hive -f '/path/to/query/file' can execute an entire file without any
overhead (manual parsing etc.), there should be some way to do the same from a program.
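One way to approximate that is to combine Edward's note above about splitting on ';' with the Hive 0.7-era JDBC driver. A hedged sketch; the class name, connection URL, and the unhandled \; escape are all simplifications:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RunHiveScript {
      public static void main(String[] args) throws Exception {
        // Read the whole query file into one string.
        StringBuilder script = new StringBuilder();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        for (String line; (line = in.readLine()) != null; ) {
          script.append(line).append('\n');
        }
        in.close();

        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();
        // Naive split on ';' (does NOT honor the CLI's \; escape).
        for (String sql : script.toString().split(";")) {
          if (sql.trim().length() > 0) {
            stmt.execute(sql.trim());  // one statement at a time
          }
        }
        stmt.close();
        conn.close();
      }
    }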
Hello All,
I second this question. I have an MS SQL "rank" function which I would
like to run; the results it gives appear to suggest it is executed on the
mapper side as opposed to the reducer side, even when run with "cluster by"
constraints.
-Justin
On Thu, Apr 19, 2012 at 1:21 AM, Ranjan Bagchi wrote:
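For context, the rank-style UDF pattern that commonly circulated for Hive at the time looks roughly like the sketch below (not Justin's actual function). It keeps per-key state, so it only behaves correctly when the query forces reducer-side evaluation with DISTRIBUTE BY key SORT BY key; if the plan evaluates it map-side, the state is meaningless, which matches the symptom above.

    import org.apache.hadoop.hive.ql.exec.UDF;

    // Stateful rank: restarts the counter whenever the key changes.
    // Only meaningful after DISTRIBUTE BY key SORT BY key.
    public final class Rank extends UDF {
      private long counter;
      private String lastKey;

      public long evaluate(String key) {
        if (lastKey == null || !lastKey.equals(key)) {
          counter = 0;   // new key: restart the rank
          lastKey = key;
        }
        return ++counter;
      }
    }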
On Thu, Apr 19, 2012 at 3:07 AM, Ruben de Vries wrote:
> I’m trying to migrate a part of our current Hadoop jobs from normal
> MapReduce jobs to Hive.
>
> Previously the data was stored in sequencefiles with the keys containing
> valuable data!
I think you'll want to define your table using a custom InputFormat, such
as the one in the gist posted elsewhere in this thread.
But I'm not clear on how to write a single row of multiple values in my MR
program, since my only way to output data is to send values to the collector.
Are you saying that there's no row delimiter and I simply make repeated calls
to the collector, e.g.
output.collect(null, row1col1)
output.collect(null, row1col2)
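As the replies in this thread indicate, there is no row delimiter to emit: each collect() call produces one row, and the columns are packed into the single value. A hedged sketch (the emitRow helper is illustrative; \001 is Hive's default field terminator):

    import java.io.IOException;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.OutputCollector;

    // Inside your reducer: one collect() per Hive row; columns live in the
    // single value, joined with Hive's default field delimiter \001 (ctrl-A).
    void emitRow(OutputCollector<NullWritable, Text> output,
                 String col1, String col2, String col3) throws IOException {
      output.collect(NullWritable.get(),
                     new Text(col1 + "\001" + col2 + "\001" + col3));
    }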
Hive can handle a sequence file just like a text file; it simply omits the key
completely and only uses the value part. Other than that you won't notice
the difference between a sequence file and a plain text file.
From: David Kulp [mailto:dk...@fiksu.com]
Sent: Thursday, April 19, 2012 2:13 PM
To: user@hive.apache.org
Hi All,
We did a successful setup of hadoop-0.20.203.0 and hive-0.7.1. We also loaded a
large number of CSV files into HDFS successfully. We can query through the Hive CLI.
Now we want to search for a keyword in any of the columns of a particular
table. Any link/thread will be helpful.
Thanks & Regards
I'm trying to achieve something very similar. I want to write an MR program
that writes results in a record-based sequencefile that would be directly
readable from Hive as though it were created using "STORED AS SEQUENCEFILE"
with, say, BinarySortableSerDe.
From this discussion it seems that Hive will ignore the keys and read only
the values, so the SerDe only has to understand the value bytes.
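A hedged sketch of the old-API job settings whose output a table declared STORED AS SEQUENCEFILE should be able to read, using Hive's default delimited SerDe rather than BinarySortableSerDe; the MyJob class is a placeholder:

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;

    public class MyJob {
      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(MyJob.class);
        job.setOutputFormat(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(NullWritable.class);  // Hive discards the key
        job.setOutputValueClass(Text.class);        // one \001-delimited row per value
        // ... set mapper/reducer and input/output paths, then JobClient.runJob(job);
      }
    }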
http://grokbase.com/p/hive/user/111gqvs0g0/%E2%80%8Fsequence-file-custom-serdes-question
According to this, Hive ignores the key part. Maybe you have to write a custom
InputFormat which combines both key and value.
On Thu, Apr 19, 2012 at 4:43 PM, Ruben de Vries wrote:
Afaik SerDe only serializes / deserializes the value part of the sequencefile :(
From: madhu phatak [mailto:phatak@gmail.com]
Sent: Thursday, April 19, 2012 12:16 PM
To: user@hive.apache.org
Subject: Re: using the key from a SequenceFile
Check out this link
https://cwiki.apache.org/Hive/hiveclient.html
Regards
Shashwat Shriparv
On Thu, Apr 19, 2012 at 10:18 AM, Bhavesh Shah wrote:
> Hello Andes,
> I don't know about the HBASE,
>
> And about your ResultSet :
> You can traverse your ResultSet as usual, like:
>
> ResultSet r
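The quoted reply trails off; the usual Hive 0.7-era JDBC loop it appears to lead into looks like this (a sketch; the driver class is the pre-HiveServer2 one, and the table and column names are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveSelect {
      public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
            "jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT col1, col2 FROM my_table");
        while (rs.next()) {                  // advance row by row
          System.out.println(rs.getString(1) + "\t" + rs.getString(2));
        }
        rs.close();
        stmt.close();
        con.close();
      }
    }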
As per my understanding, in this case Hive needs to look at all the
partitions because it does not have the value for the partition check
beforehand; note that the UDFs are executed in the MapReduce job, not on
the Hive client side.
I would suggest you write the Hive query in a file and replace the partition
value before executing it.
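A hedged sketch of that suggestion (class and table names are illustrative): compute the date on the client and splice it into the query as a literal, so the partition predicate becomes a constant that Hive can prune on:

    import java.text.SimpleDateFormat;
    import java.util.Calendar;

    public class PruneFriendlyQuery {
      public static void main(String[] args) {
        // Compute date_sub(today, 3) on the client instead of inside the query.
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -3);
        String targetDate = new SimpleDateFormat("yyyy-MM-dd").format(cal.getTime());
        // The predicate is now a plain constant, so partition pruning applies.
        String sql = "SELECT ... FROM my_table WHERE local_date = '" + targetDate + "'";
        System.out.println(sql);
      }
    }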
SerDe will allow you to create custom data from your sequence file:
https://cwiki.apache.org/confluence/display/Hive/SerDe
On Thu, Apr 19, 2012 at 3:37 PM, Ruben de Vries wrote:
> I’m trying to migrate a part of our current Hadoop jobs from normal
> MapReduce jobs to Hive.
I'm trying to migrate a part of our current Hadoop jobs from normal MapReduce
jobs to Hive.
Previously the data was stored in sequencefiles with the keys containing
valuable data!
However if I load the data into a table I lose that key data (or at least I
can't access it with Hive); I want to be able to access the key data from
Hive as well.
Hi,
I have a table partitioned by local_date. When I write a query with
WHERE local_date = date_add('2011-12-07',3)
Hive executes the UDF ahead of time and looks only into the specific partitions. But
when the UDF becomes more complex, like
WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())), 3)
Hive scans all the partitions.