Re: [GENERAL] Statistics query

Steve Crawford Wed, 10 Apr 2013 10:52:41 -0700

On 04/10/2013 10:31 AM, Chris Curvey wrote:

On Wed, Apr 10, 2013 at 12:30 PM, Steve Crawford<[email protected]<mailto:[email protected]>> wrote:


    I'm seeking ideas on the best way to craft the following query.
    I've stripped everything down to the bare essentials and
    simplified it below.

    Input data has a timestamp (actually an int received from the
    system in the form of a Unix epoch), a unit identifier and a status:

     event_time | unit_id | status
    ------------+---------+--------
     1357056011 |      60 |      1
     1357056012 |     178 |      0
     1357056019 |     168 |      0
     1357056021 |       3 |      0
     1357056021 |       4 |      1
     1357056021 |     179 |      0
     1357056022 |       0 |      1
     1357056022 |       1 |      0
     1357056023 |       2 |      0
     1357056024 |       9 |      0
     1357056025 |       5 |      0
     1357056025 |       6 |      0
     1357056026 |       7 |      1
    ...

    A given unit_id cannot have two events at the same time (enforced
    by constraints).

    Given a point in time I would like to:

    1. Identify all distinct unit_ids with an entry that exists in the
    preceding hour then

    2. Count both the total events and sum the status=1 events for the
    most recent 50 events for each unit_id that fall within a limited
    period (e.g. don't look at data earlier than midnight). So unit_id
    60 might have 50 events in the last 15 minutes while unit_id 4
    might have only 12 events after midnight.

    The output would look something like:

     unit_id | events | status_1_count
    ---------+--------+----------------
          1  |     50 |             34
          2  |     27 |             18
          1  |     50 |             34
          1  |      2 |              0
    ...

    Each sub-portion is easy and while I could use external processing
    or set-returning functions I was hoping first to find the
    secret-sauce to glue everything together into a single query.

    Cheers,
    Steve


something like

select unit_id, count(*), sum(status)
from mytable a
where event_time >= [whatever unix epoch translates to "last midnight"]
and exists
(  select *
   from mytable b
   where b.unit_id = a.unit_id
   and b.epoch >= [unix epoch that translates to "one hour ago"])
group by unit _id;

1) I think figuring out the unix epoch should be reasonable...but Idon't know how to do it off the top of my head.2) I could completely be misunderstanding this. I'm not sure why theexample results would have unit id 1 repeated. (which my suggestionWON'T do)

Because I screwed up cutting and pasting to make an example. The unit_idin the output should, in fact, be distinct:


 unit_id | events | status_1_count
---------+--------+----------------
      1  |     50 |             34
      2  |     27 |             18
      3  |     50 |             34
      4  |      2 |              0

You are correct, epoch is easy:
abstime(epoch)
or
extract(epoch from timestamptz)

depending on which direction you are going or for an hour differencejust subtract 3600.

The solution, however, misses the important complicating gotcha. Theunits I want listed are only those that have had at least one event inthe last hour. But for each such unit, I only want the statistics toreflect the most-recent 50 events (even if those events occurred earlierthan the current hour) provided the event occurred on the current date.So the events column can never be less than 1 nor more than 50.


For example...

One unit might have a single event at the start of the last hour but 49more in the preceding 10 minutes. I want to see that unit and the statsfor those 50 events.

Same thing if a unit has 50 events clustered at the end of an hour - Idon't want the earlier ones.

Another might have 50 events early in the day but none this hour. Idon't want to see that one.

But I do want to see the one that had an event in the last hour late inthe day along with the 48 other events that have accumulated since midnight.


Cheers,
Steve

Re: [GENERAL] Statistics query

Reply via email to