Alex Jillard wrote:
> I've been working with Python and Django for the past week or so and thus
> far it's been great. My question is more of a general Python question than
> it is a Django question, but hopefully I can get some help or a link to an
> appropriate doc/website. So far I've not been able to dig up much useful
> information on Google.
>
> The application that I'm working on checks a CVS repository for all active
> projects and returns a list of the corresponding project names. I'd then
> like to put these names into a database. We start a few new projects every
> week, so our CVS repo grows fairly quickly and I'd like to be able to have
> the database stay current, either with a cron job or by a user starting the
> processes.
>
> So far all that works, but since CVS will always return a full list of
> projects, and I only want one entry in the database per project, I need to
> filter out the new projects. Right now I'm just comparing what the CVS
> return list contains with what is in the database.
>
> Here is the code that I use. output_list is from CVS and current_projects
> is from the database. Is there a faster way to do this, or a built-in
> method in Python? This code works fine now, but I can see it getting slow.
>

I'm not sure of a pre-built way of doing this, but if your data is guaranteed to be sorted, you can take advantage of that to improve the performance of the algorithm: use a binary search (O(log n) per lookup) rather than iterating through the whole current_projects list (O(n) per lookup).
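For what it's worth, Python's standard library does include a binary search in the bisect module, so it doesn't have to be written by hand. A minimal sketch, assuming current_projects is a sorted list of name strings; the helper names contains_sorted and find_new_projects and the sample data are hypothetical:

```python
from bisect import bisect_left

def contains_sorted(sorted_items, value):
    """Return True if value is in sorted_items, using binary search (O(log n))."""
    # bisect_left gives the index where value would be inserted to keep order;
    # if value is present, it sits exactly at that index.
    i = bisect_left(sorted_items, value)
    return i < len(sorted_items) and sorted_items[i] == value

def find_new_projects(output_list, current_projects):
    """Return names from output_list that are not in current_projects.

    Assumes current_projects is already sorted in ascending order.
    """
    new_projects = []
    for name in output_list:
        if not contains_sorted(current_projects, name):
            new_projects.append(name)
    return new_projects

current_projects = ["alpha", "bravo", "delta"]  # hypothetical database contents, sorted
output_list = ["alpha", "bravo", "charlie", "delta", "echo"]  # hypothetical CVS output
print(find_new_projects(output_list, current_projects))  # ['charlie', 'echo']
```

If current_projects comes back from the database unsorted, calling sort() on it once before the loop keeps the per-lookup cost logarithmic.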
You can also use the Python keyword 'in' instead of comparing each term:

    for b in output_list:
        if b not in current_projects:
            new_projects.append(b)

Even this may give you a bit of a performance boost: on a list, 'in' checks the items one at a time and stops at the first match, so it doesn't iterate through the whole list when the item turns up early. It is still a linear scan, though, and it does not take advantage of a sorted list, so if it really matters to you, you could implement your own function that does.

Another thing that may speed things up: right now you are building up output_list, then building up current_projects, and then comparing them one by one. You could build current_projects first, and then, instead of putting the items from CVS into output_list, do the comparison as each item comes in:

    if item not in current_projects:
        new_projects.append(item)

Of course, this won't work if you are getting a list as input, and not the individual items.

Cheers!
Jeff Anderson
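The suggestion to compare items as they arrive, rather than collecting output_list first, could be sketched with a generator. This assumes the CVS-reading code can hand over project names one at a time; cvs_project_names and find_new_projects_streaming are hypothetical names, and the sample data is made up:

```python
def find_new_projects_streaming(project_names, current_projects):
    """Yield each name from project_names that isn't already in current_projects.

    project_names can be any iterable, e.g. a generator that yields names as
    they are parsed from CVS output, so no intermediate output_list is built.
    """
    for item in project_names:
        if item not in current_projects:
            yield item

def cvs_project_names():
    # Hypothetical stand-in for code that reads project names from CVS one at a time.
    for name in ["alpha", "bravo", "charlie"]:
        yield name

current_projects = ["alpha"]  # hypothetical contents of the database
new_projects = list(find_new_projects_streaming(cvs_project_names(), current_projects))
print(new_projects)  # ['bravo', 'charlie']
```

The membership test here is still the plain 'in' on a list; the saving is only that the full CVS list never has to be held in memory at once.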