Sort by Avro field

2014-02-20 Thread Software Dev
Is it possible to sort by a field within an Avro struct? Am I doing
something wrong?


hive> describe logs;
OK
requestheader struct from deserializer
year int
month int
day int

hive> select * from logs where year = 2014 order by requestheader.timestamp;
FAILED: ParseException line 1:68 mismatched input 'timestamp' expecting
Identifier near '.' in expression specification
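
A likely cause is that `timestamp` is a keyword in Hive's grammar, so the parser rejects it as a bare field name after the dot. A minimal sketch of a possible workaround, assuming the Hive version in use accepts a backtick-quoted identifier for a struct field (not verified against this table):

```sql
-- Quote the field name so the parser reads it as an identifier
-- rather than as the TIMESTAMP keyword.
SELECT *
FROM logs
WHERE year = 2014
ORDER BY requestheader.`timestamp`;
```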


Output Avro result as JSON

2014-02-20 Thread Software Dev
When I run a query in the hive shell for an Avro field, it is displayed as
JSON. How can I get the same output when saving to a file?


Presplitting on NGinx HttpUseridModule

2014-05-28 Thread Software Dev
Any suggestions on pre-splitting on a uid generated from NGinx? They
look like the following as a base64 encoded cookie... (CgAQS1NhGAE
DATNBBikAg==, CgAQNVNhGZsePgTdBB/KAg==, ...)

http://wiki.nginx.org/HttpUseridModule
http://www.lexa.ru/programs/mod-uid-eng.html


Help with query

2014-05-29 Thread Software Dev
We have a table of user-entered queries and the IP address each one came
from. How could we write a query that counts the queries and orders them
by that count, keeping only those with a unique IP count > X? For example,
if the same IP entered the same query Y times, we wouldn't want to include
it in the final result unless X-Y other IPs had also searched for that
query.

Is this perhaps better suited for Pig?

Thanks
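
The requirement above can probably be expressed in plain HiveQL with a GROUP BY and a HAVING clause on the distinct IP count. A minimal sketch, assuming a hypothetical table `searches` with columns `query` and `ip`, and using 10 as a stand-in for the threshold X:

```sql
-- Hypothetical names: searches(query, ip). 10 stands in for X.
SELECT query,
       COUNT(*)           AS total_searches,  -- every search, repeats included
       COUNT(DISTINCT ip) AS unique_ips       -- each IP counted once per query
FROM searches
GROUP BY query
HAVING COUNT(DISTINCT ip) > 10                -- keep queries seen from more than X IPs
ORDER BY total_searches DESC;
```

Repeated searches from a single IP raise total_searches but add only one to unique_ips, so a query entered many times by one address is filtered out unless enough other addresses also searched for it.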


Help with Query

2014-05-30 Thread Software Dev
We have a table of user-entered queries and the IP address each one came
from. How could we write a query that counts the queries and orders them
by that count, keeping only those with a unique IP count > X? For example,
if the same IP entered the same query Y times, we wouldn't want to include
it in the final result unless X-Y other IPs had also searched for that
query.

Is this perhaps better suited for Pig?

Thanks