If you have a lot of data, you might have to write a custom reducer (in
Python) to keep track of the moving date window.
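Here is a minimal sketch of such a reducer (Python 3.7+). It assumes the rows reach the script as tab-separated day and userid lines, which is Hive's default for TRANSFORM/REDUCE scripts. It also assumes they are sorted by day, all go to a single reducer, and use days in YYYY-MM-DD form. The 30-day window (the day itself plus the 29 before it) and all function names are illustrative assumptions, not code from this thread. It keeps one set of users per day in a deque plus a per-user counter, so each day is added and evicted once instead of rescanning 30 days of data for every output row.

```python
import sys
from collections import Counter, deque
from datetime import date, timedelta

WINDOW_DAYS = 30  # assumption: the window is the current day plus the 29 days before it


def rolling_active_users(rows, window_days=WINDOW_DAYS):
    """Yield (day, distinct_user_count) for each day that appears in rows.

    rows must be (datetime.date, userid) pairs already sorted by day.
    """
    window = deque()   # (day, set of users) for days still inside the window
    seen = Counter()   # userid -> number of window days on which the user logged in

    def close_day(day, users):
        # Add the finished day, then evict days older than the window start.
        window.append((day, users))
        for user in users:
            seen[user] += 1
        window_start = day - timedelta(days=window_days - 1)
        while window and window[0][0] < window_start:
            _, old_users = window.popleft()
            for user in old_users:
                seen[user] -= 1
                if seen[user] == 0:
                    del seen[user]
        return day, len(seen)

    current_day, current_users = None, set()
    for day, userid in rows:
        if day != current_day:
            if current_day is not None:
                yield close_day(current_day, current_users)
            current_day, current_users = day, set()
        current_users.add(userid)
    if current_day is not None:
        yield close_day(current_day, current_users)


def main():
    # Hive streams rows to TRANSFORM/REDUCE scripts as tab-separated lines.
    fields = (line.rstrip("\n").split("\t") for line in sys.stdin if line.strip())
    rows = ((date.fromisoformat(day), userid) for day, userid in fields)
    for day, count in rolling_active_users(rows):
        print(f"{day.isoformat()}\t{count}")


if __name__ == "__main__":
    main()
```

It only emits days that have at least one login. Days with no logins would have to be supplied separately, for example from a table of days.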
If you don't have that much data, you might want to use a temp table of
(start_date, end_date) pairs such that datediff(end_date, start_date) < 30.
To create such a table, you can self-join a table of unique
How about:
SELECT day, COUNT(DISTINCT(userid))
FROM logins
WHERE day - logins.day < 30
GROUP BY day;
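To make the intended semantics concrete, here is a brute-force Python sketch of what joining a table of distinct days against the login rows should compute. For each distinct day d, it counts the distinct users who logged in on one of the 30 days ending at d. The key point is that the outer day and the login day come from two different row sources, a separate table or subquery of unique days paired with every login row. The 30-day boundary convention and the function name are assumptions for illustration. The work grows as days × logins, which is why this approach only suits smaller data sets.

```python
from datetime import date

WINDOW_DAYS = 30  # assumption: the window is day d plus the 29 days before it


def active_users_by_join(logins, window_days=WINDOW_DAYS):
    """Brute-force version of joining distinct days against the login rows.

    logins is a list of (datetime.date, userid) pairs. For each distinct
    day d, count distinct users with a login day l where
    0 <= (d - l).days < window_days.
    """
    days = sorted({day for day, _ in logins})  # the "table of unique days"
    result = []
    for d in days:
        # Pair d with every login row (a cross join), keep rows in d's window.
        users = {user for login_day, user in logins
                 if 0 <= (d - login_day).days < window_days}
        result.append((d, len(users)))
    return result


logins = [
    (date(2012, 10, 1), "a"),
    (date(2012, 10, 1), "b"),
    (date(2012, 10, 30), "c"),
    (date(2012, 10, 31), "c"),
]
for day, count in active_users_by_join(logins):
    print(day, count)  # prints 2012-10-01 2, then 2012-10-30 3, then 2012-10-31 1
```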
On Thu, Oct 11, 2012 at 6:05 AM, Tom Hubina wrote:
> I'm trying to compute the number of active users in the previous 30 days for
> each day over a date range. I can't think of any way to do it dir
I want to specify the configuration properties when the table is created.
This bug looks like it enables specifying a configuration when the table is
referenced.
Thanks again,
Josh
On Wed, Oct 10, 2012 at 8:33 PM, Edward Capriolo wrote:
This is something similar. It may be the issue I remember.
https://reviews.facebook.net/D2499
In any case, we could pass ALL the information from the properties and
input paths; that would be a way to make Hive semi-Pig-like.
On Wed, Oct 10, 2012 at 11:26 PM, Josh Spiegel wrote:
Thanks. Do you have the bug number? I couldn't find it.
On Wed, Oct 10, 2012 at 7:49 PM, Edward Capriolo wrote:
No, this is not supported. We have a ticket open to propagate table
properties matching a pattern to the input format, but it is not complete yet.
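Here is a rough sketch of the idea behind that ticket, under stated assumptions. Table properties whose keys match a pattern are copied into the configuration that the InputFormat sees. Plain Python dicts stand in for the table properties and the job configuration. The property names, the pattern, and the function are hypothetical illustrations, and this is not Hive's implementation.

```python
import re


def propagate_matching_properties(table_props, job_conf, pattern):
    """Return a copy of job_conf plus every table property whose key
    fully matches pattern. Hypothetical sketch; not Hive's code."""
    key_regex = re.compile(pattern)
    merged = dict(job_conf)  # leave the caller's configuration untouched
    for key, value in table_props.items():
        if key_regex.fullmatch(key):
            merged[key] = value  # table property wins on conflict
    return merged


# Hypothetical properties that an org.foo.MyInputFormat might read.
table_props = {"org.foo.batch.size": "500", "comment": "login data"}
job_conf = {"existing.setting": "1"}
print(propagate_matching_properties(table_props, job_conf, r"org\.foo\..*"))
# prints {'existing.setting': '1', 'org.foo.batch.size': '500'}
```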
On Wed, Oct 10, 2012 at 10:21 PM, Josh Spiegel wrote:
Hi,
I want to create a table with a custom InputFormat. For example, something
like this:
CREATE TABLE xxx (blah string)
STORED AS
INPUTFORMAT 'org.foo.MyInputFormat'
OUTPUTFORMAT 'org.foo.MyOutputFormat'
;
I also want to be able to specify some configuration values that will be
available to
Agreed. That's the conclusion we came to as well. So it's less of a bug and
more of a feature request. I think one of the main advantages of Hive is
the flexibility in allowing non-technical users to run basic queries
without having to think about the transform stuff. (i.e. we in the IT shop
can se
An example would be awesome. I've never used a map-side join (though I'm
searching on that now).
Tom
On Wed, Oct 10, 2012 at 3:59 PM, Roberto Sanabria wrote:
I've done this with a map-side join using a table that stores days of the
week. I use that to drive the day I'm calculating the count for. Let me know
if you need an example.
Cheers,
R
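Since an example was asked for, here is a Python sketch of the general idea. It is not Roberto's actual query. Every mapper holds the small table of days in memory and pairs each login row with every day whose 30-day window contains it. A reducer then counts distinct users per day. The sketch assumes the driving table holds calendar dates and that each window is the 30 days ending on that date. The function names are made up for illustration. Because the days table is tiny, the pairing happens on the map side, and only the (day, userid) pairs need to be shuffled.

```python
from collections import defaultdict
from datetime import date

WINDOW_DAYS = 30  # assumption: each day's window is that day plus the 29 before it


def map_side_join(driving_days, login_rows, window_days=WINDOW_DAYS):
    """Mapper: driving_days is the small table every mapper keeps in memory.

    Each streamed (login_day, userid) row is emitted once for every driving
    day whose window contains login_day, so no shuffle is needed to join.
    """
    for login_day, userid in login_rows:
        for day in driving_days:
            if 0 <= (day - login_day).days < window_days:
                yield day, userid


def count_distinct_per_day(pairs):
    """Reducer: number of distinct users for each driving day."""
    users = defaultdict(set)
    for day, userid in pairs:
        users[day].add(userid)
    return {day: len(ids) for day, ids in users.items()}


driving_days = [date(2012, 10, 1), date(2012, 10, 30), date(2012, 10, 31)]
logins = [
    (date(2012, 10, 1), "a"),
    (date(2012, 10, 1), "b"),
    (date(2012, 10, 30), "c"),
    (date(2012, 10, 31), "c"),
]
print(count_distinct_per_day(map_side_join(driving_days, logins)))
```

Days on which nobody falls inside the window simply do not appear in the result.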
On Wed, Oct 10, 2012 at 3:05 PM, Tom Hubina wrote:
I'm trying to compute the number of active users in the previous 30 days
for each day over a date range. I can't think of any way to do it directly
within Hive so I'm wondering if you guys have any ideas.
Basically the algorithm is something like:
For each day in date range:
SELECT day, COUNT(
I assume the reason for this is that the Hive compiler has no way of
determining that the 'day' that is input into the transform script is the same
'day' that is output from the transform script. Even if it did, it's unclear if
pushing down would be legal without knowing the semantics of the
tra
Greetings all, I am trying to incorporate a TRANSFORM into a view (so we
can abstract the transform script away from the user).
As a test, I have a table partitioned on day (in YYYY-MM-DD format) with
lots of partitions, and I tried this:
CREATE VIEW view_transform as
Select TRANSFORM (day, ip)
Hi All,
When I run a query through the Hive shell, the query runs perfectly, but the
hive.log file records errors like:
2012-10-10 15:26:33,226 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115))
- Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it
cannot be resolved
Just a note:
There is also HiveServer2, which fixes the problems with multiple concurrent
connections, and it is included in CDH 4.
On Tue, Oct 9, 2012 at 4:24 PM, nagarjuna kanamarlapudi wrote:
> Hi,
>
>
> I have a requirement of using multiple hive connections simultaneously to
> run multiple queries in parallel. I use JDBC Cl
Awesome! Order placed.
Great stuff Ed and Mark!
On Oct 10, 2012, at 11:06 AM, Navis류승우 wrote:
> Great works!
>
> And I've heard our team will translate your book into Korean. Let's sell it
> a lot. ^^
>
> Regards,
> Navis
>
> 2012/10/1 Aniket Mokashi
>
>> +1. Great work guys. Congrats!
>>
Thanks Jan. It worked!
Regards,
Manu
On Wed, Oct 10, 2012 at 12:00 PM, Jan Dolinár wrote:
> Hi Manu,
>
> I believe the last "group by q2.auth_count" is wrong, because it
> causes computing average only across lines with same value of
> q2.auth_count, which is of course equal to its value.
>
>
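Jan's point can be checked with a small Python sketch. The values are made up, since the full query is not shown here. Averaging a column while grouping by that same column gives one group per distinct value, and each group's average is just that value. Dropping the GROUP BY averages across all rows instead.

```python
from collections import defaultdict
from statistics import mean


def avg_grouped_by_same_column(values):
    """Mimics AVG(x) ... GROUP BY x: returns {group key: group average}."""
    groups = defaultdict(list)
    for v in values:
        groups[v].append(v)
    # Every group contains only copies of its key, so each average equals the key.
    return {key: mean(members) for key, members in groups.items()}


def avg_without_group_by(values):
    """Mimics AVG(x) with no GROUP BY: one average over all rows."""
    return mean(values)


auth_counts = [3, 5, 5, 10]  # hypothetical auth_count values
print(avg_grouped_by_same_column(auth_counts))  # {3: 3, 5: 5, 10: 10}
print(avg_without_group_by(auth_counts))        # 5.75
```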
Great works!
And I've heard our team will translate your book into Korean. Let's sell it
a lot. ^^
Regards,
Navis
2012/10/1 Aniket Mokashi
> +1. Great work guys. Congrats!
> I just placed an order.
>
> ~Aniket
>
>
> On Sun, Sep 30, 2012 at 11:37 AM, varun kumar wrote:
>
>> Hi Edward,
>>
>> Ma