Re: using the key from a SequenceFile

2012-04-19 Thread David Kulp
To answer my own question -- so that someone else may benefit some day -- I've found that there is nothing special about key or value formats in a SequenceFile. As has been noted, keys are ignored. Each new key/value pair is seen as a new row from Hive's perspective. There's no concept of usi

Re: Any column search in HIVE

2012-04-19 Thread shashwat shriparv
Check out these links : http://stackoverflow.com/questions/2763112/opening-lucene-index-stored-in-hdfs http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5575646&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5575646 http://www.drdobbs.com/article/print?articleID=226300

RE: using the key from a SequenceFile

2012-04-19 Thread Ruben de Vries
You're a lifesaver! From: Dilip Joseph [mailto:dilip.antony.jos...@gmail.com] Sent: Thursday, April 19, 2012 5:47 PM To: user@hive.apache.org Subject: Re: using the key from a SequenceFile An example input format for using SequenceFile keys in hive is at https://gist.github.com/2421795 . The co

Re: using the key from a SequenceFile

2012-04-19 Thread Dilip Joseph
An example input format for using SequenceFile keys in Hive is at https://gist.github.com/2421795 . The code just reverses how the key and value are accessed in the standard SequenceFileRecordReader and SequenceFileInputFormat that come with Hadoop. You can use this custom input format by spec

Re: nested UDFs on Partition column

2012-04-19 Thread Philip Tromans
In fact, it's just not a reasonable thing to do partition pruning on. Imagine a situation where you had: WHERE partition_column = f(unix_timestamp()) AND ordinary_column = f(unix_timestamp()). The right-hand side of the predicate has to be evaluated at map-time, whereas you're assuming that left ha

Re: nested UDFs on Partition column

2012-04-19 Thread Philip Tromans
I don't know what the state of Hive's partition pruning is, but I would imagine that the problem is that the two examples you're giving are fundamentally different. 1) WHERE local_date = date_add('2011-12-07',3): the UDF is a function of some constants, so the constant gets evaluated at compile

Re: Row Group Size of RCFile

2012-04-19 Thread Bejoy Ks
Hi Anand, The row group size of an RCFile is defined by hive.io.rcfile.record.buffer.size . The default value is 4 MB. It is good to set it to a higher value, such as 32 MB. SET hive.io.rcfile.record.buffer.size = 33554432 ; Regards Bejoy KS From: "Ladda, Anand"
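
The byte value in the SET statement above is just 32 MB expressed in bytes. A minimal sketch of that arithmetic follows; the helper names mib_to_bytes and set_statement are hypothetical and only illustrate the conversion, they are not part of Hive.

```python
# Hypothetical helpers: convert a size in MB (1 MB = 1024 * 1024 bytes)
# into the byte count used by hive.io.rcfile.record.buffer.size.

def mib_to_bytes(mib):
    """Return the number of bytes in `mib` mebibytes."""
    return mib * 1024 * 1024


def set_statement(size_bytes):
    """Build the SET statement quoted in the message for a given byte size."""
    return "SET hive.io.rcfile.record.buffer.size = {};".format(size_bytes)


DEFAULT_BYTES = mib_to_bytes(4)     # the 4 MB default mentioned in the message
SUGGESTED_BYTES = mib_to_bytes(32)  # the 32 MB value suggested in the message

if __name__ == "__main__":
    print(set_statement(SUGGESTED_BYTES))
```

The printed statement can be pasted into a Hive session before writing to the RCFile table.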

Re: Execute query file

2012-04-19 Thread Edward Capriolo
Essentially just take the file and split on ';' . The only exception is that the CLI allows ; to be escaped as \; Edward On Thu, Apr 19, 2012 at 11:17 AM, Chandan B.K wrote: > Hello users, > > Thanks Bhavesh, well as Bhavesh said, I completely agree. > For that I need to parse the fi
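
The splitting rule described above can be sketched roughly as follows. This is only an illustration of "split on ; unless it is written as \;" and not the CLI's actual code; the function name split_statements is hypothetical, and the sketch assumes an escaped \; should become a plain ; in the resulting statement. It does not treat semicolons inside quotes or comments specially.

```python
def split_statements(script):
    """Split a query script on ';', treating '\\;' as a literal semicolon.

    Assumption: an escaped '\\;' is kept in the statement as ';'.
    Empty statements (for example after a trailing ';') are dropped.
    """
    statements = []
    current = []
    i = 0
    while i < len(script):
        ch = script[i]
        # A backslash directly followed by ';' is an escaped semicolon.
        if ch == "\\" and i + 1 < len(script) and script[i + 1] == ";":
            current.append(";")
            i += 2
            continue
        if ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    # Keep any trailing statement that lacks a terminating ';'.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


if __name__ == "__main__":
    for stmt in split_statements("SELECT 1; SELECT 'a\\;b';\n"):
        print(stmt)
```

Each returned string could then be executed one at a time through whatever client is in use.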

Re: Execute query file

2012-04-19 Thread Chandan B.K
Hello users, Thanks Bhavesh, well as Bhavesh said, I completely agree. For that I need to parse the file, extract the file line-by-line and execute it. If $bin/hive -f '/path/to/query/file' can execute an entire file without any overhead (manual parsing etc.), there should be some wa

Re: Lifecycle and Configuration of a hive UDF

2012-04-19 Thread Justin Coffey
Hello All, I second this question. I have an MS SQL "rank" function which I would like to run; the results it gives appear to suggest it is executed mapper side as opposed to reducer side, even when run with "cluster by" constraints. -Justin On Thu, Apr 19, 2012 at 1:21 AM, Ranjan Bagchi wrot

Re: using the key from a SequenceFile

2012-04-19 Thread Owen O'Malley
On Thu, Apr 19, 2012 at 3:07 AM, Ruben de Vries wrote: > I’m trying to migrate a part of our current hadoop jobs from normal > mapreduce jobs to hive, > > Previously the data was stored in sequencefiles with the keys containing > valuable data! I think you'll want to define your table using a cu

Re: using the key from a SequenceFile

2012-04-19 Thread David Kulp
But I'm not clear on how to write a single row of multiple values in my MR program, since my only way to output data is to send values to the collector. Are you saying that there's no row delimiter and I simply make repeated calls to the collector, e.g. output.collect(null, row1col1) output.co

RE: using the key from a SequenceFile

2012-04-19 Thread Ruben de Vries
Hive can handle a sequence file just like a text file, except that it omits the key completely and only uses the value part. Other than that you won't notice the difference between a sequence file and a plain text file. From: David Kulp [mailto:dk...@fiksu.com] Sent: Thursday, April 19, 2012 2:13 PM To: use

Any column search in HIVE

2012-04-19 Thread Garg, Rinku
Hi All, We did a successful setup of hadoop-0.20.203.0 and hive-0.7.1. We also loaded a large number of CSV files into HDFS successfully. We can query through the Hive CLI. Now we want to search for a keyword in any of the columns of a particular table. Any link/thread will be helpful. Thanks & R

Re: using the key from a SequenceFile

2012-04-19 Thread David Kulp
I'm trying to achieve something very similar. I want to write an MR program that writes results in a record-based sequencefile that would be directly readable from hive as though it were created using "STORED AS SEQUENCEFILE" with, say, BinarySortableSerDe. From this discussion it seems that H

Re: using the key from a SequenceFile

2012-04-19 Thread madhu phatak
http://grokbase.com/p/hive/user/111gqvs0g0/%E2%80%8Fsequence-file-custom-serdes-question According to this, Hive ignores the key part. Maybe you have to write a custom InputFormat which combines both key and value. On Thu, Apr 19, 2012 at 4:43 PM, Ruben de Vries wrote: > Afaik SerDe only serializes / deserial

RE: using the key from a SequenceFile

2012-04-19 Thread Ruben de Vries
AFAIK SerDe only serializes / deserializes the value part of the sequencefile :( ? From: madhu phatak [mailto:phatak@gmail.com] Sent: Thursday, April 19, 2012 12:16 PM To: user@hive.apache.org Subject: Re: using the key from a SequenceFile Serde will allow you to create custom data from your

Re: how to manage the result set?

2012-04-19 Thread shashwat shriparv
Check out this link https://cwiki.apache.org/Hive/hiveclient.html Regards Shashwat Shriparv On Thu, Apr 19, 2012 at 10:18 AM, Bhavesh Shah wrote: > Hello Andes, > I don't know about HBase, > > And about your ResultSet : > You can traverse your ResultSet as usual, like: > > ResultSet r

Re: nested UDFs on Partition column

2012-04-19 Thread Nitin Pawar
As per my understanding, in this case Hive needs to look at all the partitions because it does not have the value beforehand for the partition check. Note that the UDFs are executed in the MapReduce job and not on the Hive client side. I would suggest you write a hive query in a file and replace the partitio
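
The suggestion above is cut off, so the following is only one possible reading of it: compute the partition value on the client and substitute it into the query text as a constant before running the query, so that Hive sees a literal in the partition predicate. The table name my_table, the template, the helper render_query, and the 3-day offset are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical query template; only local_date comes from the thread.
QUERY_TEMPLATE = "SELECT * FROM my_table WHERE local_date = '{partition_value}';"


def render_query(template, today, days_back):
    """Fill the template with the date `days_back` days before `today`.

    The value is formatted as YYYY-MM-DD, the same format as the
    '2011-12-07' literal used in the thread.
    """
    partition_value = (today - timedelta(days=days_back)).isoformat()
    return template.format(partition_value=partition_value)


if __name__ == "__main__":
    # date.today() is the client's local date, which may differ from the
    # date the cluster would compute.
    print(render_query(QUERY_TEMPLATE, date.today(), 3))
```

The rendered text could be written to a file and run with hive -f, as mentioned elsewhere in this thread listing.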

Re: using the key from a SequenceFile

2012-04-19 Thread madhu phatak
Serde will allow you to create custom data from your sequence File https://cwiki.apache.org/confluence/display/Hive/SerDe On Thu, Apr 19, 2012 at 3:37 PM, Ruben de Vries wrote: > I’m trying to migrate a part of our current hadoop jobs from normal > mapreduce jobs to hive, > > Previously the d

using the key from a SequenceFile

2012-04-19 Thread Ruben de Vries
I'm trying to migrate a part of our current Hadoop jobs from normal MapReduce jobs to Hive. Previously the data was stored in sequencefiles with the keys containing valuable data! However, if I load the data into a table I lose that key data (or at least I can't access it with Hive). I want to

nested UDFs on Partition column

2012-04-19 Thread Ramkumar
Hi, I have a table partitioned by local_date. When I write a query with WHERE local_date = date_add('2011-12-07',3) , Hive executes the UDF ahead of time and looks only into the specific partitions. But when the UDF becomes more complex like WHERE local_date = date_sub(to_date(from_unixtime(unix_t