Re: Rolling MAU computation

2012-10-10 Thread Igor Tatarinov
If you have a lot of data, you might have to write a custom reducer (in python) to keep track of the moving date window. If you don't have that much data, you might want to use a temp table such that datediff(end_date, start_date) < 30. To create such a table, you can self-join a table of unique
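A minimal sketch of the moving-date-window idea mentioned above, written as plain Python rather than as an actual Hive streaming reducer. The function name, the (day, userid) input shape, and the assumption that the input arrives sorted by day are illustrative and not taken from the thread. The window follows the datediff(end_date, start_date) < 30 condition, so it covers the current day and the 29 days before it.

```python
from collections import Counter, deque
from datetime import date, timedelta


def rolling_active_users(logins, start, end, window_days=30):
    """Yield (day, distinct_user_count) for each day in [start, end].

    `logins` is an iterable of (day, userid) pairs sorted by day
    (hypothetical input shape, as a reducer might see after a sort).
    """
    counts = Counter()   # userid -> number of logins currently in the window
    window = deque()     # (day, userid) pairs in the window, oldest first
    it = iter(logins)
    pending = next(it, None)
    day = start
    while day <= end:
        # Pull in every login up to and including the current day.
        while pending is not None and pending[0] <= day:
            window.append(pending)
            counts[pending[1]] += 1
            pending = next(it, None)
        # Evict logins that are window_days or more days old.
        cutoff = day - timedelta(days=window_days)
        while window and window[0][0] <= cutoff:
            _, old_user = window.popleft()
            counts[old_user] -= 1
            if counts[old_user] == 0:
                del counts[old_user]
        yield day, len(counts)
        day += timedelta(days=1)
```

Each login is added once and evicted once, so the cost grows with the number of login rows rather than with rows times window length. The per-user counter is what lets a user with several logins in the window stay counted until the last of them expires.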

Re: Rolling MAU computation

2012-10-10 Thread MiaoMiao
How about SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day - logins.day < 30 GROUP BY day; On Thu, Oct 11, 2012 at 6:05 AM, Tom Hubina wrote: > I'm trying to compute the number of active users in the previous 30 days for > each day over a date range. I can't think of any way to do it dir

Re: Creating a table with a custom InputFormat

2012-10-10 Thread Josh Spiegel
I want to specify the configuration properties when the table is created. This bug looks like it enables specifying a configuration when the table is referenced. Thanks again, Josh On Wed, Oct 10, 2012 at 8:33 PM, Edward Capriolo wrote: > This is something similar. It may be the issue I remember

Re: Creating a table with a custom InputFormat

2012-10-10 Thread Edward Capriolo
This is something similar. It may be the issue I remember. https://reviews.facebook.net/D2499 In any case, we could pass ALL the information from the properties and input paths, which would be a way to make Hive semi Pig-like. On Wed, Oct 10, 2012 at 11:26 PM, Josh Spiegel wrote: > Thanks. Do you have

Re: Creating a table with a custom InputFormat

2012-10-10 Thread Josh Spiegel
Thanks. Do you have the bug number? I couldn't find it. On Wed, Oct 10, 2012 at 7:49 PM, Edward Capriolo wrote: > No, this is not. We have a ticket open to propagate tbl properties matching > a pattern to the input format, but it is not complete yet. > > On Wed, Oct 10, 2012 at 10:21 PM, Josh Spiegel wrot

Re: Creating a table with a custom InputFormat

2012-10-10 Thread Edward Capriolo
No, this is not. We have a ticket open to propagate tbl properties matching a pattern to the input format, but it is not complete yet. On Wed, Oct 10, 2012 at 10:21 PM, Josh Spiegel wrote: > Hi, > > I want to create a table with a custom InputFormat. For example, something > like this: > > CREATE TABLE xxx

Creating a table with a custom InputFormat

2012-10-10 Thread Josh Spiegel
Hi, I want to create a table with a custom InputFormat. For example, something like this: CREATE TABLE xxx (blah string) STORED AS INPUTFORMAT 'org.foo.MyInputFormat' OUTPUTFORMAT 'org.foo.MyOutputFormat' ; I also want to be able to specify some configuration values that will be available to

Re: View Partition Pruning not Occurring during transform

2012-10-10 Thread John Omernik
Agreed. That's the conclusion we came to as well. So it's less of a bug and more of a feature request. I think one of the main advantages of hive is the flexibility in allowing non-technical users to run basic queries without having to think about the transform stuff. (i.e. we in the IT shop can se

Re: Rolling MAU computation

2012-10-10 Thread Tom Hubina
An example would be awesome .. I've never used a map-side join (though I'm searching on that now .. ) Tom On Wed, Oct 10, 2012 at 3:59 PM, Roberto Sanabria wrote: > I've done this with a map-side join using a table that stores days of the > week. I use that to drive the day I'm calculating the co

Re: Rolling MAU computation

2012-10-10 Thread Roberto Sanabria
I've done this with a map-side join using a table that stores days of the week. I use that to drive the day I'm calculating the count for. Let me know if you need an example. Cheers, R On Wed, Oct 10, 2012 at 3:05 PM, Tom Hubina wrote: > I'm trying to compute the number of active users in the pr

Rolling MAU computation

2012-10-10 Thread Tom Hubina
I'm trying to compute the number of active users in the previous 30 days for each day over a date range. I can't think of any way to do it directly within Hive so I'm wondering if you guys have any ideas. Basically the algorithm is something like: For each day in date range: SELECT day, COUNT(
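For reference, the "for each day in the date range" algorithm described in this message can be written directly as a brute-force loop. This is only a sketch of the definition, not something from the thread: the function name and the in-memory list of (day, userid) pairs are hypothetical, and it assumes the 30-day window includes the day itself plus the 29 days before it.

```python
from datetime import date, timedelta


def mau_by_day(logins, start, end, window_days=30):
    """Return {day: number of distinct users active in the window ending on day}.

    `logins` is a list of (day, userid) pairs in any order (hypothetical shape).
    """
    result = {}
    day = start
    while day <= end:
        # Assumed window: the current day and the window_days - 1 days before it.
        window_start = day - timedelta(days=window_days - 1)
        users = {
            user
            for login_day, user in logins
            if window_start <= login_day <= day
        }
        result[day] = len(users)
        day += timedelta(days=1)
    return result
```

This rescans every login for every day, so it is only practical on small data, but it is a convenient way to check the output of a faster implementation.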

Re: View Partition Pruning not Occurring during transform

2012-10-10 Thread shrikanth shankar
I assume the reason for this is that the Hive compiler has no way of determining that the 'day' that is input into the transform script is the same 'day' that is output from the transform script. Even if it did, it's unclear if pushing down would be legal without knowing the semantics of the tra

View Partition Pruning not Occurring during transform

2012-10-10 Thread John Omernik
Greetings all, I am trying to incorporate a TRANSFORM into a view (so we can abstract the transform script away from the user). As a test, I have a table partitioned on day (in YYYY-MM-DD format) with lots of partitions, and I tried this: CREATE VIEW view_transform as Select TRANSFORM (day, ip)

error in log file

2012-10-10 Thread Ajit Kumar Shreevastava
Hi All, When I run a query through the Hive shell, the query runs perfectly, but the hive.log file records errors like: 2012-10-10 15:26:33,226 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be re sol

Re: Multiple Hive Connection Issues

2012-10-10 Thread Ruslan Al-Fakikh
Just a note: There is also hiveserver2, which fixes connection issues, and it is included in CDH 4. On Tue, Oct 9, 2012 at 4:24 PM, nagarjuna kanamarlapudi wrote: > Hi, > > > I have a requirement of using multiple hive connections simultaneously to > run multiple queries in parallel. I use JDBC Cl

Re: Book 'Programming Hive' from O'Reilly now available!

2012-10-10 Thread Alexander Lorenz
Awesome! Order placed. Great stuff Ed and Mark! On Oct 10, 2012, at 11:06 AM, Navis류승우 wrote: > Great work! > > And I've heard our team will translate your book into Korean. Let's sell it > a lot. ^^ > > Regards, > Navis > > 2012/10/1 Aniket Mokashi > >> +1. Great work guys. Congrats! >>

Re: Help in hive query

2012-10-10 Thread Manu A
Thanks Jan. It worked! Regards, Manu On Wed, Oct 10, 2012 at 12:00 PM, Jan Dolinár wrote: > Hi Manu, > > I believe the last "group by q2.auth_count" is wrong, because it > causes computing average only across lines with same value of > q2.auth_count, which is of course equal to its value. > >

Re: Book 'Programming Hive' from O'Reilly now available!

2012-10-10 Thread Navis류승우
Great work! And I've heard our team will translate your book into Korean. Let's sell it a lot. ^^ Regards, Navis 2012/10/1 Aniket Mokashi > +1. Great work guys. Congrats! > I just placed an order. > > ~Aniket > > > On Sun, Sep 30, 2012 at 11:37 AM, varun kumar wrote: > >> Hi Edward, >> >> Ma