On Thursday, 27 May 2021 at 11:28:31 UTC+2, Loris Bennett wrote:
> Hi,
>
> I currently have around 3 years' worth of files like
>
>   home.20210527
>   home.20210526
>   home.20210525
>   ...
>
> so around 1000 files, each of which contains information about data
> usage in lines like
>
>   name      kb
>   alice     123
>   bob       4
>   ...
>   zebedee   9999999
>
> (there are actually more columns). I have about 400 users and the
> individual files are around 70 KB in size.
>
> Once a month I want to plot the historical usage as a line graph for the
> whole period for which I have data for each user.
>
> I already have some code to extract the current usage for a single user
> from the most recent file:
>
>   for line in open(file, "r"):
>       columns = line.split()
>       if len(columns) < data_column:
>           logging.debug("no. of cols.: %i less than data col", len(columns))
>           continue
>       regex = re.compile(user)
>       if regex.match(columns[user_column]):
>           usage = columns[data_column]
>           logging.info(usage)
>           return usage
>   logging.error("unable to find %s in %s", user, file)
>   return "none"
>
> Obviously I will want to extract all the data for all users from a file
> once I have opened it. After looping over all files I would naively end
> up with, say, a nested dict like
>
>   {"20210527": { "alice" : 123, ..., "zebedee": 9999999},
>    "20210526": { "alice" : 123, "bob" : 3, ..., "zebedee": 9},
>    "20210525": { "alice" : 123, "bob" : 1, ..., "zebedee": 9999999},
>    "20210524": { "alice" : 123, ..., "zebedee": 9},
>    "20210523": { "alice" : 123, ..., "zebedee": 9999999},
>    ...}
>
> where the user keys would vary over time as accounts, such as 'bob', are
> added and later deleted.
>
> Is creating a potentially rather large structure like this the best way
> to go (I obviously could limit the size by, say, only considering the
> last 5 years)? Or is there some better approach for this kind of
> problem? For plotting I would probably use matplotlib.
>
> Cheers,
>
> Loris
>
> --
> This signature is currently under construction.

Have you tried reading the data with pandas? You could add a column
holding the date to the data from each file and then combine the
per-file datasets into one.
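
A minimal sketch of that idea, in case it helps. It assumes that each
file starts with a header line containing at least the columns "name"
and "kb", that the columns are separated by whitespace, and that the
date is the part of the file name after the dot. The function names
read_usage_file and load_usage are made up for this example.

```python
from pathlib import Path

import pandas as pd


def read_usage_file(path):
    """Read one usage file and tag every row with the date from its name."""
    # Assumption: whitespace-separated columns with a header line that
    # includes "name" and "kb"; any further columns are ignored.
    df = pd.read_csv(path, sep=r"\s+", usecols=["name", "kb"])
    # "home.20210527" has the suffix ".20210527", which encodes the date.
    date_text = Path(path).suffix.lstrip(".")
    df["date"] = pd.to_datetime(date_text, format="%Y%m%d")
    return df


def load_usage(directory, pattern="home.*"):
    """Combine all matching files into a table indexed by date."""
    frames = [read_usage_file(p) for p in sorted(Path(directory).glob(pattern))]
    combined = pd.concat(frames, ignore_index=True)
    # One row per date, one column per user. A user with no entry on a
    # given date (account not yet created or already deleted) gets NaN.
    return combined.pivot(index="date", columns="name", values="kb")
```

With around 1000 files and 400 users the resulting table has roughly
400,000 cells, which should fit in memory comfortably. Since pandas
plots through matplotlib, something like load_usage("/some/dir")["alice"].plot()
would draw the line graph for one user (the directory is a placeholder),
and the NaN entries simply show up as gaps in the line.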