> On Oct 19, 2017, at 9:40 AM, Israel Brewster <isr...@ravnalaska.net> wrote: > > I am working on developing a report that groups data into a two-dimensional > array based on date and time. More specifically, date is grouped into > categories: > > day, week-to-date, month-to-date, and year-to-date > > Then, for each of those categories, I need to get a count of records that > fall into the following categories: > > 0 minutes late, 1-5 minutes late, and 6-15 minutes late > > where minutes late will be calculated based on a known scheduled time and the > time in the record. To further complicate things, there are actually two > times in each record, so under the day, week-to-date, month-to-date etc > groups, there will be two sets of "late" bins, one for each time. In table > form it would look something like this: > > | day | week-to-date | month-to-date | year-to-date | > ---------------------------------------------------------------------------------------- > t1 0min | <counts for each time group> > t1 1-5 min | ... > t1 6-15 min | ... > t2 0min | ... > t2 1-5 min | ... > t2 6-15 min | ... > > So in the extreme scenario of a record that is for the current day, it will > be counted into 8 bins: once each for day, week-to-date, month-to-date and > year-to-date under the proper "late" bin for the first time in the record, > and once each into each of the time groups under the proper "late" bin for > the second time in the record. An older record may only be counted twice, > under the year-to-date group. A record with no matching schedule is > discarded, as is any record that is "late" by more than 15 minutes (those are > gathered into a separate report) > > My initial approach was to simply make dictionaries for each "row" in the > table, like so: > > t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,} > t15 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,} > . > . > t25 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,} > t215 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,} > > then loop through the records, find the schedule for that record (if any, if > not move on as mentioned earlier), compare t1 and t2 against the schedule, > and increment the appropriate bin counts using a bunch of if statements. > Functional, if ugly. But then I got to thinking: I keep hearing about all > these fancy numerical analysis tools for python like pandas and numpy - could > something like that help? Might there be a way to simply set up a table with > "rules" for the columns and rows, and drop my records into the table, having > them automatically counted into the proper bins or something? Or am I over > thinking this, and the "simple", if ugly approach is best?
I suppose I should mention: my data source is the results of a psycopg2 query, so a "record" is a tuple or dictionary (depending on how I want to set up the cursor) > > ----------------------------------------------- > Israel Brewster > Systems Analyst II > Ravn Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7293 > ----------------------------------------------- > > > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list