On May 25, 2:37 am, [EMAIL PROTECTED] wrote:
> im writing a webcrawler.
> after visiting a new site i want to store it in alphabetical order.
>
> so obv i want fast insert. i want to delete duplicates too.
>
> which datastructure is best for this?

I think you ought to re-examine your requirements.  Why do you need to
store in alphabetical order?  (And, what does that even mean for a web
site?)  Personally, I'd probably just use a simple dict for in-memory
storage, and use a database for external storage.  In that case,
adding a site or page to the "seen" structure is a simple matter of
doing

urls[key] = url

Dicts are basically hash tables, so this insert is amortized O(1) and
you get dupe-detection for free, provided the key is unique.  If you
want to display the contents in alphabetical order by "key," you can
do this by:

keyz = sorted (urls.keys())
for key in keyz:
    print urls[key]

The problem with this is that you pay the sorting cost every time you
want to display the results.  If that is a problem, you can maintain
an in-memory index (and rely on the database to do it for you in the
case of externally stored results).
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to