https://github.com/mongodb/mongo-hadoop is from the mongo folks themselves
Shrikanth
On Wed, Mar 26, 2014 at 10:01 AM, Nitin Pawar wrote:
> take a look at https://github.com/yc-huang/Hive-mongo
>
>
> On Wed, Mar 26, 2014 at 10:29 PM, Swagatika Tripathy <
> swagatikat...@gmail.com> wrote:
>
>> H
See
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive
thanks,
Shrikanth
On Dec 14, 2012, at 8:18 AM, maya bhardwaj wrote:
> I am converting a Netezza query to Hive. How can I achieve the following
> in Hive --> row_number() over (partition by record_id, event_id
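As the linked post describes, older Hive versions needed a custom rank UDF; Hive 0.11 and later support windowing functions natively, so the Netezza expression ports directly. A minimal sketch (the table name and the ORDER BY column are hypothetical placeholders, only record_id/event_id come from the question):

```sql
-- ROW_NUMBER() needs an ORDER BY inside the window to be deterministic;
-- 'event_ts' is a hypothetical ordering column
SELECT record_id, event_id,
       ROW_NUMBER() OVER (PARTITION BY record_id, event_id
                          ORDER BY event_ts) AS rn
FROM events;
```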
I think what you need is a custom InputFormat/RecordReader. By the time the
SerDe is called, the row has already been fetched. I believe the record reader
can get access to predicates. The code to access HBase from Hive needs this for
the same reasons as you would with Mongo and might be a good pla
I assume the reason for this is that the Hive compiler has no way of
determining that the 'day' that is input into the transform script is the same
'day' that is output from the transform script. Even if it did, it's unclear
whether pushing down would be legal without knowing the semantics of the
tra
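A minimal sketch of the situation (script and column names are hypothetical, not from the thread): the filter sits on the transform's output, and the compiler cannot see inside the script to prove the push-down is safe:

```sql
-- The predicate is on the transform OUTPUT column. Pushing 'day = ...'
-- below the TRANSFORM would only be safe if my_script.py passes 'day'
-- through unchanged, which the compiler cannot verify.
SELECT day, metric_out
FROM (
  SELECT TRANSFORM (day, metric)
         USING 'my_script.py'
         AS (day, metric_out)
  FROM usage
) t
WHERE day = '2012-08-01';
```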
This should work
  SELECT ts, id, SUM(metric / usage_count)
  FROM usage
  JOIN (SELECT COUNT(*) AS usage_count FROM usage) V ON (1 = 1)
  GROUP BY ts, id;
thanks,
Shrikanth
On Aug 9, 2012, at 1:33 PM, wrote:
> Hi (vers),
>
> This might be a very basic question for most of you but I am stuck at it for
>
I believe Hive column names can't have '-' in them. From what I know, this JSON
serde uses column names as JSON expressions, which means that if you renamed the
column you would end up with a null value for that column. You might want
to try a different JSON serde (for e.g. the one Amazon use
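One commonly used alternative (my suggestion, not from the thread) is rcongiu's openx JsonSerDe, which lets a Hive-legal column name map to a JSON field containing a dash via a 'mapping.*' serde property. A sketch with hypothetical table, column, and path names:

```sql
-- 'mapping.user_id' tells the serde that the Hive column user_id
-- corresponds to the JSON field "user-id"
CREATE EXTERNAL TABLE events_json (user_id STRING, msg STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('mapping.user_id' = 'user-id')
LOCATION '/data/events_json';
```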
I believe the default LazySerDe takes a parameter called
'serialization.last.column.takes.rest'. Setting this to true might solve your
issue (restoMsg would become a string then and you might have to parse it in
the query into an array)
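A sketch of setting that property at table-creation time (table and other column names are hypothetical; restoMsg is from the question):

```sql
-- With this property set, everything after the last declared column is
-- concatenated into the final STRING column instead of being dropped.
CREATE TABLE raw_logs (ts STRING, id STRING, restoMsg STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.last.column.takes.rest' = 'true');
```

restoMsg could then be turned into an array in the query, e.g. with split(restoMsg, ' ').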
thanks,
Shrikanth
On May 30, 2012, at 9:27 AM,
wrote:
-----Original Message-----
> From: shrikanth shankar [mailto:sshan...@qubole.com]
> Sent: Tuesday, May 15, 2012 1:14 PM
> To: user@hive.apache.org
> Subject: Re: What's the right data storage/representation?
>
> I would agree on keeping track of the history of updates in a separate t
So I think I should choose the right cols to create the index, and the index
> size will be much smaller, is that right?
> And is the index sorted?
> What's the difference between a bitmap index, a compact index and an
> aggregate index?
>
>
> Best regards
> Ransom.
>
> From: shrikanth
I would agree on keeping track of the history of updates in a separate table in
Hive (you may not need to maintain it in the application tier). This pattern
seems to be the "Slowly Changing Dimension" pattern used in other (more
traditional) Data Warehouses... I suspect the challenge here would
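A minimal sketch of such a history table, following the type-2 slowly-changing-dimension pattern of closing out the old row and inserting a new one (all names are hypothetical):

```sql
-- Each change inserts a new row; the current row has valid_to = NULL.
-- Point-in-time queries filter on valid_from <= d AND (valid_to IS NULL
-- OR valid_to > d).
CREATE TABLE customer_history (
  customer_id BIGINT,
  address     STRING,
  valid_from  STRING,   -- e.g. 'yyyy-MM-dd'
  valid_to    STRING    -- NULL while the row is current
);
```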
My understanding is that the scan of the index is used to remove splits that
are known not to contain matching data. If you remove enough splits the second
MR task will run much faster. The index should also be much smaller than the
base table and that MR task should be much cheaper
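For reference, a compact index of that era was built roughly like this (table and column names are hypothetical); the separate REBUILD step materializes the index table that is scanned to prune splits:

```sql
CREATE INDEX usage_id_idx ON TABLE usage (id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

ALTER INDEX usage_id_idx ON usage REBUILD;

-- enable automatic use of indexes to filter input splits
SET hive.optimize.index.filter = true;
```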
Shrikanth