Hi,

I currently have around 3 years' worth of files like

    home.20210527
    home.20210526
    home.20210525
    ...

so around 1000 files, each of which contains information about data
usage in lines like

    name      kb
    alice     123
    bob       4
    ...
    zebedee   9999999

(there are actually more columns). I have about 400 users and the
individual files are around 70 KB in size.

Once a month I want to plot the historical usage as a line graph for
the whole period for which I have data for each user.

I already have some code to extract the current usage for a single user
from the most recent file:

    for line in open(file, "r"):
        columns = line.split()
        # columns[data_column] needs at least data_column + 1 columns
        if len(columns) <= data_column:
            logging.debug("no. of cols.: %i too few for data col", len(columns))
            continue
        regex = re.compile(user)
        if regex.match(columns[user_column]):
            usage = columns[data_column]
            logging.info(usage)
            return usage
    logging.error("unable to find %s in %s", user, file)
    return "none"

Obviously I will want to extract all the data for all users from a file
once I have opened it. After looping over all files I would naively end
up with, say, a nested dict like

    {"20210527": { "alice" : 123,            ..., "zebedee": 9999999},
     "20210526": { "alice" : 123, "bob" : 3, ..., "zebedee": 9},
     "20210525": { "alice" : 123, "bob" : 1, ..., "zebedee": 9999999},
     "20210524": { "alice" : 123,            ..., "zebedee": 9},
     "20210523": { "alice" : 123,            ..., "zebedee": 9999999},
     ...}

where the user keys would vary over time as accounts, such as 'bob', are
added and later deleted.

Is creating a potentially rather large structure like this the best way
to go (I obviously could limit the size by, say, only considering the
last 5 years)? Or is there some better approach for this kind of
problem? For plotting I would probably use matplotlib.

Cheers,

Loris

-- 
This signature is currently under construction.
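
For illustration, here is a minimal sketch of the "read every file once, keep all users" approach described above. It is not taken from the original code: the function names (read_usage, collect_usage, per_user_series) are hypothetical, and it assumes that the files are named home.YYYYMMDD, that the user name is in column 0 and the usage in column 1, and that lines whose usage value is not an integer (such as a "name kb" header) can be skipped.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumptions for this sketch: column positions and the file-name pattern.
USER_COLUMN = 0
DATA_COLUMN = 1
FILE_PATTERN = re.compile(r"home\.(\d{8})$")


def read_usage(path, user_column=USER_COLUMN, data_column=DATA_COLUMN):
    """Return {user: kb} for every parsable line of a single file."""
    usage = {}
    with open(path, "r") as handle:
        for line in handle:
            columns = line.split()
            if len(columns) <= max(user_column, data_column):
                continue  # too few columns on this line
            try:
                usage[columns[user_column]] = int(columns[data_column])
            except ValueError:
                continue  # e.g. a "name kb" header or non-numeric data
    return usage


def collect_usage(directory):
    """Return {date_string: {user: kb}} for all home.YYYYMMDD files."""
    history = {}
    for path in sorted(Path(directory).iterdir()):
        match = FILE_PATTERN.match(path.name)
        if match:
            history[match.group(1)] = read_usage(path)
    return history


def per_user_series(history):
    """Pivot to {user: [(date_string, kb), ...]}, ordered by date."""
    series = defaultdict(list)
    for date in sorted(history):
        for user, kb in history[date].items():
            series[user].append((date, kb))
    return dict(series)
```

With roughly 1000 files and 400 users, the nested dict would hold on the order of 400,000 integers, which should be manageable in memory on typical hardware. The per-user pivot produces one list of (date, kb) pairs per account, so a user such as 'bob' who exists only for part of the period simply has a shorter list; each list could then be split into x and y values for a matplotlib line plot.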