Alex Jillard wrote:
> I've been working with Python and Django for the past week or so and thus
> far it's been great.  My question is more of a general Python question than
> it is a Django question, but hopefully I can get some help or a link to an
> appropriate doc/website.  So far I've not been able to dig up much useful
> information on Google.
>
> The application that I'm working on checks a CVS repository for all active
> projects and returns a list of the corresponding project names.  I'd then
> like to put these names into a database.  We start a few new projects every
> week, so our CVS repo grows fairly quickly and I'd like to be able to have
> the database stay current, either with a cron job or by a user starting the
> processes.
>
> So far all that works, but since CVS will always return a full list of
> projects, and I only want one entry in the database per project, I need to
> filter out the new projects.  Right now I'm just comparing what the CVS
> return list contains with what is in the database.
>
> Here is the code that I use.  Output_list is from CVS and current_projects
> is from the database.  Is there a faster way to do this, or a built in
> method in Python?  This code works fine now, but I can see it getting slow.
>   
I'm not sure of a pre-built way of doing this, but if your data is
guaranteed to be sorted, you can take advantage of that to improve the
performance of the algorithm: use a binary search rather than iterating
through the whole current_projects list.
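Here is a minimal sketch of that binary search using the standard library's bisect module. It assumes current_projects is already sorted ascending; contains_sorted is just a name I made up for illustration, and the project names are invented sample data.

```python
from bisect import bisect_left


def contains_sorted(sorted_items, item):
    """Return True if item is in sorted_items, using binary search.

    Assumes sorted_items is sorted ascending; on unsorted input the
    answer is meaningless.
    """
    # bisect_left finds where item would be inserted to keep order;
    # if item is present, that index points at it.
    i = bisect_left(sorted_items, item)
    return i < len(sorted_items) and sorted_items[i] == item


current_projects = ["alpha", "beta", "delta"]  # hypothetical, already sorted
output_list = ["alpha", "beta", "gamma", "delta", "epsilon"]  # hypothetical CVS output
new_projects = [p for p in output_list if not contains_sorted(current_projects, p)]
print(new_projects)  # ['gamma', 'epsilon']
```

Each lookup is then O(log n) instead of a scan of the whole list, but you have to keep current_projects sorted for it to be valid.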

You can also use the Python keyword 'in' instead of comparing each term:

    new_projects = []
    for b in output_list:
        if b not in current_projects:
            new_projects.append(b)

Even this will give you a bit of a performance boost, because 'in' stops
scanning as soon as it finds a match instead of always walking the whole
list. It is still a linear scan, though, and it won't take advantage of a
sorted list, so if it really matters to you, you could implement your own
function that does.
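One way such a function could look, as a sketch only: if BOTH lists are sorted (an assumption you'd need to guarantee), you can walk them together in a single pass, a merge-style comparison that costs about len(output_list) + len(current_projects) steps. new_items_sorted is a hypothetical name.

```python
def new_items_sorted(output_list, current_projects):
    """Return items of output_list that are not in current_projects.

    Assumes both lists are sorted ascending; each list is walked once.
    """
    new_projects = []
    j = 0  # position in current_projects
    for item in output_list:
        # Skip database entries that sort before the current CVS item.
        while j < len(current_projects) and current_projects[j] < item:
            j += 1
        # Either we ran off the end, or the next entry isn't this item.
        if j == len(current_projects) or current_projects[j] != item:
            new_projects.append(item)
    return new_projects


print(new_items_sorted(["alpha", "beta", "delta", "epsilon", "gamma"],
                       ["alpha", "delta"]))  # ['beta', 'epsilon', 'gamma']
```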

Another thing that may speed things up: right now you are building the
output_list up, and then building the current_projects list up, and then
comparing them one by one.

You could build the current_projects list first, and then, instead of
putting the items from CVS into 'output_list', do the comparison as each
item comes in:

    if item not in current_projects:
        new_projects.append(item)

Of course, this won't work if you are getting a list as input, and not
the individual items.
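A rough sketch of that, assuming you can get the project names from CVS one at a time. iter_cvs_projects below is a hypothetical stand-in that just yields fixed strings; it is not a real CVS API, and collect_new is likewise a made-up name.

```python
def iter_cvs_projects():
    # Hypothetical stand-in for reading project names from CVS one by one.
    for name in ["alpha", "beta", "gamma"]:
        yield name


def collect_new(items, current_projects):
    """Compare each item as it is produced, without building output_list first."""
    new_projects = []
    for item in items:
        if item not in current_projects:
            new_projects.append(item)
    return new_projects


current_projects = ["alpha"]  # hypothetical database contents
print(collect_new(iter_cvs_projects(), current_projects))  # ['beta', 'gamma']
```

This mostly saves holding the full CVS list in memory; the per-item lookup cost is the same as before.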

Cheers!


Jeff Anderson