Re: READING FILE FROM MONGO DB

2014-03-26 Thread Shrikanth Shankar
https://github.com/mongodb/mongo-hadoop is from the Mongo folks themselves.

Shrikanth

On Wed, Mar 26, 2014 at 10:01 AM, Nitin Pawar wrote:
> take a look at https://github.com/yc-huang/Hive-mongo
>
> On Wed, Mar 26, 2014 at 10:29 PM, Swagatika Tripathy <swagatikat...@gmail.com> wrote:
>> H

Re: Hive Row number

2012-12-14 Thread shrikanth shankar
See http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive

thanks,
Shrikanth

On Dec 14, 2012, at 8:18 AM, maya bhardwaj wrote:
> I am converting the Netezza query in Hive. How can I achieve the following query
> in Hive --> row_number() over (partition by record_id, event_id
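To make the requested semantics concrete, here is a minimal in-memory Python sketch of what row_number() over (partition by record_id, event_id order by ...) computes. The ORDER BY clause is cut off in the quoted question, so the `ts` ordering column below is a hypothetical stand-in, as is the `row_number` helper itself.

```python
from itertools import groupby
from operator import itemgetter


def row_number(rows, partition_by, order_by):
    """Emulate ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) on a list of dicts.

    Rows are sorted by (partition key, order key), then numbered 1, 2, 3, ...
    within each partition. Returns copies with an extra 'row_number' key.
    """
    part_key = itemgetter(*partition_by)
    order_key = itemgetter(*order_by)
    ordered = sorted(rows, key=lambda r: (part_key(r), order_key(r)))
    out = []
    # groupby only merges consecutive rows, which is why we sorted by partition first.
    for _, group in groupby(ordered, key=part_key):
        for n, row in enumerate(group, start=1):
            out.append({**row, "row_number": n})
    return out


# Usage; 'ts' is a hypothetical ordering column.
rows = [
    {"record_id": 1, "event_id": "a", "ts": 30},
    {"record_id": 1, "event_id": "a", "ts": 10},
    {"record_id": 2, "event_id": "b", "ts": 20},
]
numbered = row_number(rows, ["record_id", "event_id"], ["ts"])
print([(r["ts"], r["row_number"]) for r in numbered])  # [(10, 1), (30, 2), (20, 1)]
```

Keeping only rows with row_number == 1 gives the usual "first event per key" deduplication.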

Re: Writing Custom Serdes for Hive

2012-10-16 Thread shrikanth shankar
I think what you need is a custom InputFormat/RecordReader. By the time the SerDe is called, the row has already been fetched. I believe the record reader can get access to predicates. The code that accesses HBase from Hive needs this for the same reasons you would with Mongo, and might be a good pla
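The reason the filter has to live in the record reader rather than the SerDe can be sketched in Python. `FakeCollection`, `read_then_filter` and `reader_with_pushdown` are hypothetical stand-ins, not Hive or MongoDB APIs. The collection counts how many documents it "ships" so the cost of filtering after the fetch is visible.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in for a remote store such as MongoDB.

    'shipped' counts documents handed to the client, i.e. what a real
    reader would pull over the network.
    """

    def __init__(self, docs):
        self.docs = docs
        self.shipped = 0

    def find(self, predicate=None):
        for doc in self.docs:
            # With a predicate, the store filters before shipping anything.
            if predicate is None or predicate(doc):
                self.shipped += 1
                yield doc


def read_then_filter(coll, predicate):
    """Filtering after the fetch (where a SerDe sits): every document is shipped first."""
    return [d for d in coll.find() if predicate(d)]


def reader_with_pushdown(coll, predicate):
    """A record reader that hands the predicate to the store: only matches are shipped."""
    return list(coll.find(predicate))


docs = [{"day": d} for d in range(10)]
wanted = lambda d: d["day"] == 3
late, early = FakeCollection(docs), FakeCollection(docs)
print(read_then_filter(late, wanted), late.shipped)        # [{'day': 3}] 10
print(reader_with_pushdown(early, wanted), early.shipped)  # [{'day': 3}] 1
```

Both paths return the same rows; only the amount of data fetched differs, which is the point of pushing predicates into the reader.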

Re: View Partition Pruning not Occurring during transform

2012-10-10 Thread shrikanth shankar
I assume the reason for this is that the Hive compiler has no way of determining that the 'day' that is input into the transform script is the same 'day' that is output from the transform script. Even if it did, it's unclear whether pushing the predicate down would be legal without knowing the semantics of the tra
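A small Python sketch of why pushdown through an opaque transform can be illegal. The transform below is hypothetical (the original script isn't shown): it re-derives 'day' by shifting a UTC hour to UTC-8, so its output 'day' is not its input 'day'. Filtering before the transform then gives a different answer than filtering after it.

```python
def transform(rows):
    """Hypothetical transform script: converts a UTC hour to UTC-8 and
    recomputes 'day', so a row can move to the previous day."""
    out = []
    for r in rows:
        local_hour = r["hour_utc"] - 8
        # Floor division sends negative hours to the previous day.
        day = r["day"] + local_hour // 24
        out.append({"day": day, "hour": local_hour % 24})
    return out


def filter_day(rows, day):
    return [r for r in rows if r["day"] == day]


rows = [{"day": 1, "hour_utc": 12}, {"day": 2, "hour_utc": 3}]
after = filter_day(transform(rows), 1)   # what the query asks for
before = transform(filter_day(rows, 1))  # what "pruning" the input would compute
print(len(after), len(before))           # 2 1
```

Because the two orders disagree, the compiler can only prune on the transform's input if it knows the script passes 'day' through unchanged.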

Re: Nested Select Statements

2012-08-09 Thread shrikanth shankar
This should work:

  Select ts, id, sum(metric/usage_count)
  from usage
  join (select count(*) usage_count from usage) V on (1 = 1)
  group by ts, id;

thanks,
Shrikanth

On Aug 9, 2012, at 1:33 PM, wrote:
> Hi (vers),
>
> This might be a very basic question for most of you but I am stuck at it for
>
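What the query computes can be checked with an in-memory Python equivalent. This is a sketch, not Hive code: the one-row subquery V becomes the total row count, the join on 1 = 1 attaches that count to every row, and the GROUP BY becomes a dictionary keyed by (ts, id).

```python
from collections import defaultdict


def per_key_share(rows):
    """In-memory equivalent of the HiveQL above.

    rows: list of dicts with 'ts', 'id' and a numeric 'metric'.
    Returns {(ts, id): sum(metric / usage_count)}.
    """
    usage_count = len(rows)  # the single-row subquery V
    sums = defaultdict(float)
    for r in rows:
        # Joining on (1 = 1) gives every row the same usage_count.
        sums[(r["ts"], r["id"])] += r["metric"] / usage_count
    return dict(sums)


rows = [
    {"ts": 1, "id": "a", "metric": 2.0},
    {"ts": 1, "id": "a", "metric": 6.0},
    {"ts": 2, "id": "b", "metric": 4.0},
]
print(per_key_share(rows))
```

Summing the per-group values gives sum(metric) / count(*), i.e. the overall average, which is a quick sanity check on the output.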

Re: Parse Error with '-' in Hive

2012-05-31 Thread shrikanth shankar
I believe Hive column names can't contain '-'. From what I know, this JSON SerDe uses column names as JSON expressions, so if you renamed the column you would end up with a null value for it. You might want to try a different JSON SerDe (e.g. the one Amazon use
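The null-after-rename effect can be illustrated with a hypothetical Python sketch of a SerDe that looks each column up as a JSON key. The `key_mapping` argument is an invented illustration of the kind of column-to-key mapping a different SerDe might offer; it is not the configuration of any specific SerDe.

```python
import json


def deserialize(line, columns, key_mapping=None):
    """Hypothetical JSON SerDe: each column name is used as a top-level JSON key.

    key_mapping (illustrative only) maps a legal column name to the real
    JSON key, e.g. 'user_id' -> 'user-id'.
    """
    doc = json.loads(line)
    key_mapping = key_mapping or {}
    # A missing key yields None, which Hive would show as NULL.
    return [doc.get(key_mapping.get(col, col)) for col in columns]


line = '{"user-id": 42, "name": "x"}'
print(deserialize(line, ["user_id", "name"]))                           # [None, 'x']
print(deserialize(line, ["user_id", "name"], {"user_id": "user-id"}))  # [42, 'x']
```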

Re: Hive 'rest' column

2012-05-30 Thread shrikanth shankar
I believe the default LazySerDe takes a parameter called 'serialization.last.column.takes.rest'. Setting it to true might solve your issue (restoMsg would then become a string, and you might have to parse it into an array in the query).

thanks,
Shrikanth

On May 30, 2012, at 9:27 AM, wrote:
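A Python sketch of the splitting behaviour the property controls, under the assumption that extra fields are otherwise ignored. `split_row` is a hypothetical helper, and the comma delimiter in the usage is chosen for readability (Hive's default field delimiter is \x01).

```python
def split_row(line, num_columns, delimiter="\x01", last_column_takes_rest=False):
    """Sketch of splitting a delimited line into num_columns fields.

    Without last_column_takes_rest, this sketch drops fields beyond num_columns.
    With it, the final column keeps the remainder of the line, delimiters included.
    """
    if last_column_takes_rest:
        fields = line.split(delimiter, num_columns - 1)
    else:
        fields = line.split(delimiter)[:num_columns]
    # Missing trailing fields become None (NULL in Hive).
    return fields + [None] * (num_columns - len(fields))


line = "1,hello,a,b,c"
print(split_row(line, 3, ","))  # ['1', 'hello', 'a']
row = split_row(line, 3, ",", last_column_takes_rest=True)
print(row)                       # ['1', 'hello', 'a,b,c']
# The rest column (restoMsg in the question) is now a string; split it into an array.
print(row[2].split(","))         # ['a', 'b', 'c']
```

In HiveQL the last step would be done with the built-in split() function on the string column.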

Re: What's the right data storage/representation?

2012-05-15 Thread shrikanth shankar
-----Original Message-----
> From: shrikanth shankar [mailto:sshan...@qubole.com]
> Sent: Tuesday, May 15, 2012 1:14 PM
> To: user@hive.apache.org
> Subject: Re: What's the right data storage/representation?
>
> I would agree on keeping track of the history of updates in a separate t

Re: how to select without Mapreduce after index build?

2012-05-15 Thread shrikanth shankar
o I think I should choose the right cols to create the index, and the index size
> will be smaller, is that right?
> And is the index sorted?
> What's the difference between a bitmap index, a compact index and an aggregate index?
>
> Best regards
> Ransom.
>
> From: shrikanth

Re: What's the right data storage/representation?

2012-05-15 Thread shrikanth shankar
I would agree on keeping track of the history of updates in a separate table in Hive (you may not need to maintain it in the application tier). This looks like the "Slowly Changing Dimension" pattern used in other (more traditional) data warehouses. I suspect the challenge here would
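Here is one common variant of the pattern (a "type 2" slowly changing dimension, where each change adds a new version with validity dates) sketched in Python. In Hive this would be a separate history table. The `HistoryRow` fields and the two helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryRow:
    key: str
    value: str
    valid_from: str           # ISO date string, so string comparison orders correctly
    valid_to: Optional[str]   # None marks the current version


def apply_update(history: List[HistoryRow], key: str, value: str, as_of: str) -> List[HistoryRow]:
    """Close the current version of key (if the value changed) and append a new one."""
    for row in history:
        if row.key == key and row.valid_to is None:
            if row.value == value:
                return history      # unchanged, nothing to record
            row.valid_to = as_of    # end-date the old version
            break
    history.append(HistoryRow(key, value, as_of, None))
    return history


def value_on(history: List[HistoryRow], key: str, date: str) -> Optional[str]:
    """Return the value current for key on date (validity is [valid_from, valid_to))."""
    for row in history:
        if row.key == key and row.valid_from <= date and (row.valid_to is None or date < row.valid_to):
            return row.value
    return None


h: List[HistoryRow] = []
apply_update(h, "cust1", "Pune", "2012-01-01")
apply_update(h, "cust1", "Mumbai", "2012-05-01")
print(value_on(h, "cust1", "2012-03-15"))  # Pune
print(value_on(h, "cust1", "2012-06-01"))  # Mumbai
```

The current state is just the rows with valid_to unset, so the application tier only ever needs to write updates, not maintain history itself.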

Re: how to select without Mapreduce after index build?

2012-05-11 Thread shrikanth shankar
My understanding is that the index scan is used to eliminate splits that are known not to contain matching data. If enough splits are removed, the second MR task will run much faster. The index should also be much smaller than the base table, so the MR task that scans it should be much cheaper.

Shrikanth

O
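The two-phase idea can be sketched in Python: first scan a small index mapping each value to the splits that contain it, then scan only those splits. This is a simplification, assuming an index of value to split ids; the helper names are hypothetical, and Hive's actual index tables store file names and offsets.

```python
def build_compact_index(splits, column):
    """Map each distinct value of column to the set of split ids containing it."""
    index = {}
    for split_id, rows in splits.items():
        for row in rows:
            index.setdefault(row[column], set()).add(split_id)
    return index


def query(splits, index, column, value):
    """Phase 1: look up candidate splits in the index.
    Phase 2: scan only those splits; all others are skipped entirely."""
    candidate_splits = set(index.get(value, ()))
    matches = [row
               for sid in sorted(candidate_splits)
               for row in splits[sid]
               if row[column] == value]
    return candidate_splits, matches


splits = {0: [{"k": 1}, {"k": 2}], 1: [{"k": 3}], 2: [{"k": 1}, {"k": 4}]}
index = build_compact_index(splits, "k")
print(query(splits, index, "k", 1))  # ({0, 2}, [{'k': 1}, {'k': 1}])
```

Split 1 is never read here; the saving grows with the fraction of splits the index rules out.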