That's an interesting data structure, Dennis. I will actually be running this type of query many times, preferably in an ad-hoc environment. That makes it tough for sqlite3, since there will be several hundred thousand tuples.
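
A rough sketch of that structure, for reference: a list of (datetime, dict) tuples kept sorted by timestamp, so that a span such as "3 weeks ago till today" can be sliced with bisect rather than scanned. The function names (parse_counts, load_file, build_index, query) are made up for illustration, and the filename layout (4-digit year, then 2-digit month, day, hour and minute, e.g. 201207192128.gz) is an assumption, not something stated in the thread.

```python
import bisect
import gzip
import os
from datetime import datetime


def parse_counts(text):
    """Turn lines like 'red 34' into a dict such as {'red': 34}."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            counts[parts[0]] = int(parts[1])
    return counts


def load_file(path):
    """Return (datetime, counts) for one YearMonthDayHourMinute.gz file."""
    stamp = os.path.basename(path)[:-len(".gz")]
    # Assumed layout: YYYYMMDDHHMM, e.g. 201207192128
    when = datetime.strptime(stamp, "%Y%m%d%H%M")
    with gzip.open(path, "rb") as handle:
        return when, parse_counts(handle.read().decode("ascii"))


def build_index(records):
    """Sort records by timestamp and return (stamps, records)."""
    records = sorted(records, key=lambda rec: rec[0])
    stamps = [when for when, _ in records]
    return stamps, records


def query(stamps, records, start, end):
    """Return the records with start <= timestamp <= end."""
    lo = bisect.bisect_left(stamps, start)
    hi = bisect.bisect_right(stamps, end)
    return records[lo:hi]
```

The index would be built once per session from load_file() over the file paths and then queried repeatedly. Whether this holds up better than sqlite3 at several hundred thousand tuples is untested here.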

On Fri, Jul 20, 2012 at 12:18 AM, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:

> {NOTE: preferences for comp.lang.python are to follow the RFC on
> "netiquette" -- that is, post comments /under/ quoted material, trimming
> what is not relevant... I've restructured this reply to match}
>
> On Thu, 19 Jul 2012 21:28:12 -0400, Rita <rmorgan...@gmail.com>
> declaimed the following in gmane.comp.python.general:
>
> > On Thu, Jul 19, 2012 at 8:52 PM, Dave Angel <d...@davea.name> wrote:
> >
> > > On 07/19/2012 07:51 PM, Rita wrote:
> > > > Hello,
> > > >
> > > > I have data in many files (/data/year/month/day/) which are named like
> > > > YearMonthDayHourMinute.gz.
> > > >
> > > > I would like to build a data structure which can easily handle querying the
> > > > data. So for example, if I want to query data from 3 weeks ago till today,
> > > > i can do it rather quickly.
> > > >
> > > > each YearMonthDayHourMinute.gz file look like this and they are about 4 to
> > > > 6kb
> > > > red 34
> > > > green 44
> > > > blue 88
> > > > orange 4
> > > > black 3
> > > > while 153
> > > >
> > > > I would like to query them so I can generate a plot rather quickly but not
> > > > sure what is the best way to do this.
> > >
> > > What part of your code is giving you difficulty? You didn't post any
> > > code. You don't specify the OS, nor version of your Python, nor what
> > > other programs you expect to use along with Python.
> >
> > Using linux 2.6.31; Python 2.7.3.
> > I am not necessary looking for code just a pythonic way of doing it.
> > Eventually, I would like to graph the data using matplotlib
>
> Which doesn't really answer the question. After all, since the
> source data is already in date/time-stamped files, a simple, sorted,
> "glob" of files within a desired span would answer the requirement.
>
> But -- it would mean that you reparse the files for each processing
> run.
>
> An alternative would be to run a pre-processor that parses the files
> into, say, an SQLite3 database (and which can determine, from the
> highest datetime entry in the database, which /new/ files need to be
> parsed on subsequent runs). Then do the query/plotting from a second
> program which retrieves data from the database.
>
> But if this is a process that only needs to be run once, or at rare
> intervals, maybe you only need to parse the files into an in-memory data
> structure... Say a list of tuples of the form:
>
>     [ (datetime, {color: value, color2: value2, ...}),
>       (datetime2, ...) ]
>
> --
> Wulfraed                 Dennis Lee Bieber         AF6VN
> wlfr...@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
>
> --
> http://mail.python.org/mailman/listinfo/python-list

--
--- Get your facts first, then you can distort them as you please.--