Re: Efficient counting of results

Israel Brewster Thu, 19 Oct 2017 11:08:21 -0700

> On Oct 19, 2017, at 9:40 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
> 
> I am working on developing a report that groups data into a two-dimensional 
> array based on date and time. More specifically, date is grouped into 
> categories:
> 
> day, week-to-date, month-to-date, and year-to-date
> 
> Then, for each of those categories, I need to get a count of records that 
> fall into the following categories:
> 
> 0 minutes late, 1-5 minutes late, and 6-15 minutes late
> 
> where minutes late will be calculated based on a known scheduled time and the 
> time in the record. To further complicate things, there are actually two 
> times in each record, so under the day, week-to-date, month-to-date etc 
> groups, there will be two sets of "late" bins, one for each time. In table 
> form it would look  something like this:
> 
>                    | day  |  week-to-date | month-to-date |  year-to-date  |
> ----------------------------------------------------------------------------------------
> t1 0min        | <counts for each time group>
> t1 1-5 min    | ...
> t1 6-15 min  | ...
> t2 0min        | ...
> t2 1-5 min    | ...
> t2 6-15 min  | ...
> 
> So in the extreme scenario of a record that is for the current day, it will 
> be counted into 8 bins: once each for day, week-to-date, month-to-date and 
> year-to-date under the proper "late" bin for the first time in the record, 
> and once each into each of the time groups under the proper "late" bin for 
> the second time in the record. An older record may only be counted twice, 
> under the year-to-date group. A record with no matching schedule is 
> discarded, as is any record that is "late" by more than 15 minutes (those are 
> gathered into a separate report)
> 
> My initial approach was to simply make dictionaries for each "row" in the 
> table, like so:
> 
> t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
> t15 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
> .
> .
> t25 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
> t215 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
> 
> then loop through the records, find the schedule for that record (if any, if 
> not move on as mentioned earlier), compare t1 and t2 against the schedule, 
> and increment the appropriate bin counts using a bunch of if statements. 
> Functional, if ugly. But then I got to thinking: I keep hearing about all 
> these fancy numerical analysis tools for python like pandas and numpy - could 
> something like that help? Might there be a way to simply set up a table with 
> "rules" for the columns and rows, and drop my records into the table, having 
> them automatically counted into the proper bins or something? Or am I over 
> thinking this, and the "simple", if ugly approach is best?


I suppose I should mention: my data source is the results of a psycopg2 query, 
so a "record" is a tuple or dictionary (depending on how I want to set up the 
cursor)

> 
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
> 
> 
> 
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Efficient counting of results

Reply via email to