Re: matching strings in a large set of strings

2010-05-06 Thread M.-A. Lemburg
Dennis Lee Bieber wrote: > On Thu, 29 Apr 2010 11:38:28 +0200, "Karin Lagesen" > declaimed the following in comp.lang.python: > >> Hello. >> >> I have approx 83 million strings, all 14 characters long. I need to be >> able to take another string and find out whether this one is present >> within

Re: matching strings in a large set of strings

2010-05-03 Thread Bryan
Karin Lagesen wrote: > I have approx 83 million strings, all 14 characters long. I need to be > able to take another string and find out whether this one is present > within the 83 million strings. [...] > I run out of memory building both the set and the dictionary, so > what I seem to be left wit

Re: matching strings in a large set of strings

2010-05-02 Thread Albert van der Horst
In article <877hnpjtdw@rudin.co.uk>, Paul Rudin wrote: >"Karin Lagesen" writes: > >> Hello. >> >> I have approx 83 million strings, all 14 characters long. I need to be >> able to take another string and find out whether this one is present >> within the 83 million strings. >> >> Now, I have

Re: matching strings in a large set of strings

2010-05-01 Thread News123
Dennis Lee Bieber wrote: > On Sat, 01 May 2010 13:48:02 +0200, News123 declaimed > the following in gmane.comp.python.general: > >> Dennis Lee Bieber wrote: >>> That lets you do a binary search on the file. Much faster than a >>> linear search (linear search will average out to 41.5M read ope
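The binary search on a file mentioned in this message can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes the 14-character strings are stored sorted, back to back, with no separators, and the names `contains` and `RECORD_LEN` are made up for the example.

```python
import io

RECORD_LEN = 14  # assumption: fixed-width records, no newline between them


def contains(f, key, record_len=RECORD_LEN):
    """Binary search a seekable binary file of sorted fixed-width records."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // record_len  # search window, in record numbers
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * record_len)
        rec = f.read(record_len)
        if rec == key:
            return True
        if rec < key:
            lo = mid + 1  # key can only be in the upper half
        else:
            hi = mid      # key can only be in the lower half
    return False


# Small in-memory stand-in for the real file (hypothetical data).
demo = io.BytesIO(b"".join(b"%014d" % n for n in range(0, 100, 5)))
print(contains(demo, b"%014d" % 35), contains(demo, b"%014d" % 36))
```

With 83 million records, the loop needs at most 27 seek-and-read steps per lookup, since 2**27 is larger than 83 million.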

Re: matching strings in a large set of strings

2010-05-01 Thread Stefan Behnel
Duncan Booth, 30.04.2010 10:20: So more than 3GB just for the strings (and that's for Python 2.x; on Python 3.x you'll need nearly 5GB). Running on a 64-bit version of Python should be fine, but for a 32-bit system a naive approach just isn't going to work. Option 1: use a trie. That should redu
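For readers unfamiliar with the trie suggested here, the structure can be sketched with nested dictionaries. This only shows how shared prefixes are stored once; it is an assumption-laden illustration, not the implementation the poster had in mind, and a dict-per-node trie in pure Python would need a far more compact node representation to actually save memory. The function names are hypothetical.

```python
def trie_insert(root, word):
    """Add word to the trie rooted at the dict `root`."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})  # reuse the child if the prefix exists
    node[""] = True  # end-of-word marker; "" cannot clash with a 1-char key


def trie_contains(root, word):
    """Return True only if exactly `word` was inserted."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return "" in node


root = {}
for w in ("ACGTACGTACGTAC", "ACGTACGTACGTAA"):
    trie_insert(root, w)
print(trie_contains(root, "ACGTACGTACGTAA"), trie_contains(root, "ACGT"))
```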

Re: matching strings in a large set of strings

2010-05-01 Thread News123
Dennis Lee Bieber wrote: > On Thu, 29 Apr 2010 11:38:28 +0200, "Karin Lagesen" > declaimed the following in comp.lang.python: > >> Hello. >> >> I have approx 83 million strings, all 14 characters long. I need to be >> able to take another string and find out whether this one is present >> within

Re: External Hashing [was Re: matching strings in a large set of strings]

2010-04-30 Thread Dave Angel
Helmut Jarausch wrote: I think one could apply an external hashing technique which would require only very few disk accesses per lookup. Unfortunately, I'm not aware of an implementation in Python. Does anybody know about a Python implementation of external hashing? Thanks, Helmut. That's wh

Re: External Hashing [was Re: matching strings in a large set of strings]

2010-04-30 Thread Tim Chase
On 04/30/2010 12:51 PM, Helmut Jarausch wrote: I think one could apply an external hashing technique which would require only very few disk accesses per lookup. Unfortunately, I'm not aware of an implementation in Python. Does anybody know about a Python implementation of external hashing? Whil

External Hashing [was Re: matching strings in a large set of strings]

2010-04-30 Thread Helmut Jarausch
I think one could apply an external hashing technique which would require only very few disk accesses per lookup. Unfortunately, I'm not aware of an implementation in Python. Does anybody know about a Python implementation of external hashing? Thanks, Helmut. -- Helmut Jarausch Lehrstuhl fuer N

Re: matching strings in a large set of strings

2010-04-30 Thread Christian Heimes
s = "12345678901234" assert len(s) == 14 import sys sys.getsizeof(s) 38 So a single 14 char string takes 38 bytes. Make that at least 40 bytes. You have to take memory alignment into account. So a set with 83000 such strings takes approximately 1 MB. So far fairly trivial. But that's just th
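The per-string arithmetic in this message can be repeated on any interpreter with a few lines. This is a rough sketch under stated assumptions: the 8-byte alignment is a guess about the allocator, `estimate_string_bytes` is a name invented here, and `sys.getsizeof` returns different values on different Python versions and builds, so the 38 bytes quoted above will not necessarily be reproduced.

```python
import sys


def estimate_string_bytes(sample, count, alignment=8):
    """Estimate memory for `count` string objects the size of `sample`.

    Counts only the string objects themselves, not the set or dict
    that would hold them.
    """
    size = sys.getsizeof(sample)
    aligned = -(-size // alignment) * alignment  # round up to the alignment
    return aligned * count


total = estimate_string_bytes("12345678901234", 83_000_000)
print(f"{total / 2**30:.2f} GiB for the string objects alone")
```

The container adds its own overhead on top of this figure, which is the point the message goes on to make.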

Re: matching strings in a large set of strings

2010-04-30 Thread Paul Rudin
Duncan Booth writes: > Paul Rudin wrote: > >> Shouldn't a set with 83 million 14 character strings be fine in memory >> on a stock PC these days? I suppose if it's low on ram you might start >> swapping which will kill performance. Perhaps the method you're using >> to build the data structures

Re: matching strings in a large set of strings

2010-04-30 Thread Steven D'Aprano
On Fri, 30 Apr 2010 08:23:39 +0100, Paul Rudin wrote: > "Karin Lagesen" writes: > >> Hello. >> >> I have approx 83 million strings, all 14 characters long. I need to be >> able to take another string and find out whether this one is present >> within the 83 million strings. >> >> Now, I have tri

Re: matching strings in a large set of strings

2010-04-30 Thread Duncan Booth
Paul Rudin wrote: > Shouldn't a set with 83 million 14 character strings be fine in memory > on a stock PC these days? I suppose if it's low on ram you might start > swapping which will kill performance. Perhaps the method you're using > to build the data structures creates lots of garbage? How m

Re: matching strings in a large set of strings

2010-04-30 Thread Paul Rudin
"Karin Lagesen" writes: > Hello. > > I have approx 83 million strings, all 14 characters long. I need to be > able to take another string and find out whether this one is present > within the 83 million strings. > > Now, I have tried storing these strings as a list, a set and a dictionary. > I kn

Re: matching strings in a large set of strings

2010-04-29 Thread Terry Reedy
On 4/29/2010 5:38 AM, Karin Lagesen wrote: Hello. I have approx 83 million strings, all 14 characters long. I need to be able to take another string and find out whether this one is present within the 83 million strings. If the 'other string' is also 14 chars, so that you are looking for exac

Re: matching strings in a large set of strings

2010-04-29 Thread Miki
> I have approx 83 million strings, all 14 characters long. I need to be > able to take another string and find out whether this one is present > within the 83 million strings. Have a look at the shelve module. If you want to write the algorithm yourself, I suggest http://en.wikipedia.org/wiki/Tr

Re: matching strings in a large set of strings

2010-04-29 Thread Duncan Booth
MRAB wrote: > Karin Lagesen wrote: >> Hello. >> >> I have approx 83 million strings, all 14 characters long. I need to >> be able to take another string and find out whether this one is >> present within the 83 million strings. >> >> Now, I have tried storing these strings as a list, a set and

Re: matching strings in a large set of strings

2010-04-29 Thread MRAB
Karin Lagesen wrote: Hello. I have approx 83 million strings, all 14 characters long. I need to be able to take another string and find out whether this one is present within the 83 million strings. Now, I have tried storing these strings as a list, a set and a dictionary. I know that finding t

Re: matching strings in a large set of strings

2010-04-29 Thread Mark Tolonen
"Karin Lagesen" wrote in message news:416f727c6f5b0edb932b425db9579808.squir...@webmail.uio.no... Hello. I have approx 83 million strings, all 14 characters long. I need to be able to take another string and find out whether this one is present within the 83 million strings. Now, I have trie

Re: matching strings in a large set of strings

2010-04-29 Thread Stefan Behnel
Karin Lagesen, 29.04.2010 11:38: I have approx 83 million strings, all 14 characters long. I need to be able to take another string and find out whether this one is present within the 83 million strings. Now, I have tried storing these strings as a list, a set and a dictionary. I know that findi

Re: matching strings in a large set of strings

2010-04-29 Thread Peter Otten
Karin Lagesen wrote: > I have approx 83 million strings, all 14 characters long. I need to be > able to take another string and find out whether this one is present > within the 83 million strings. > > Now, I have tried storing these strings as a list, a set and a dictionary. > I know that findin

matching strings in a large set of strings

2010-04-29 Thread Karin Lagesen
Hello. I have approx 83 million strings, all 14 characters long. I need to be able to take another string and find out whether this one is present within the 83 million strings. Now, I have tried storing these strings as a list, a set and a dictionary. I know that finding things in a set and a di

Re: Matching Strings

2007-02-09 Thread Steven D'Aprano
On Fri, 09 Feb 2007 16:17:31 -0800, James Stroud wrote: > Assuming item is "(u'ground water',)" > > import re > item = re.compile(r"\(u'([^']*)',\)").search(item).group(1) Using a regex is a lot of overhead for a very simple operation. If item is the string "(u'ground water',)" then item[3:-3]
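The two approaches compared in this message can be checked side by side. The sketch below uses the example string and the regular expression quoted above; the variable names `by_regex` and `by_slice` are added here for illustration.

```python
import re

item = "(u'ground water',)"

# Regex from the quoted message: capture the text between the quotes.
by_regex = re.compile(r"\(u'([^']*)',\)").search(item).group(1)

# Slice: drop the leading "(u'" and the trailing "',)".
by_slice = item[3:-3]

print(by_regex == by_slice, by_slice)
```

The slice only works because the wrapper is always exactly three characters on each side; the regex tolerates nothing more than that either, so the simpler form loses no generality for this input.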

Re: Matching Strings

2007-02-09 Thread John Machin
On Feb 10, 11:58 am, Larry Bates <[EMAIL PROTECTED]> wrote: > rshepard-at-appl-ecosys.com wrote: > > On 2007-02-10, [EMAIL PROTECTED] wrote: > > >> if item == selName: > > > Slicing doesn't seem to do anything -- if I've done it correctly. I > > changed the above to read, > > >if item[2

Re: Matching Strings

2007-02-09 Thread John Machin
On Feb 10, 12:01 pm, rshepard-at-appl-ecosys.com wrote: > On 2007-02-10, James Stroud <[EMAIL PROTECTED]> wrote: > > > Assuming item is "(u'ground water',)" > > > import re > > item = re.compile(r"\(u'([^']*)',\)").search(item).group(1) > > James, > > I solved the problem when some experimentatio

Re: Matching Strings

2007-02-09 Thread John Machin
On Feb 10, 11:03 am, [EMAIL PROTECTED] wrote: > I'm not sure how to change a string so that it matches another one. > > My application (using wxPython and SQLite3 via pysqlite2) needs to compare > a string selected from the database into a list of tuples with another > string selected in a disp

Re: Matching Strings

2007-02-09 Thread rshepard-at-appl-ecosys . com
minded me that 'item' is a list index and not a string variable. By changing the line to, if item[0] == selName: I get the matches correctly. Now I need to extract the proper matching strings from the list of tuples, and I'm working on that. Many thanks, Rich -- http://mail.python.org/mailman/listinfo/python-list
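The fix described here follows from how database rows come back: each row is a tuple, so it never equals a bare string. A minimal sketch, assuming an in-memory SQLite database; the table name `components`, the column `name`, and the sample values are hypothetical, and the standard-library `sqlite3` module stands in for pysqlite2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE components (name TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO components VALUES (?)",
    [("ground water",), ("surface water",)],
)

rows = conn.execute("SELECT name FROM components").fetchall()
sel_name = "ground water"  # stands in for the widget selection

# Each row is a 1-tuple, so compare its first element with the string.
matches = [row[0] for row in rows if row[0] == sel_name]
print(matches)
conn.close()
```

Comparing `row == sel_name` directly is always false here, which is why slicing the printed representation appeared to do nothing.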

Re: Matching Strings

2007-02-09 Thread Gabriel Genellina
En Fri, 09 Feb 2007 21:03:32 -0300, <[EMAIL PROTECTED]> escribió: > I'm not sure how to change a string so that it matches another one. > > My application (using wxPython and SQLite3 via pysqlite2) needs to > compare > a string selected from the database into a list of tuples with another

Re: Matching Strings

2007-02-09 Thread Larry Bates
rshepard-at-appl-ecosys.com wrote: > On 2007-02-10, [EMAIL PROTECTED] wrote: > >> if item == selName: > > Slicing doesn't seem to do anything -- if I've done it correctly. I > changed the above to read, > > if item[2:-2] == selName: > > but the output's the same. > > Rich Use th

Re: Matching Strings

2007-02-09 Thread Paul McGuire
On Feb 9, 6:03 pm, [EMAIL PROTECTED] wrote: > I'm not sure how to change a string so that it matches another one. > > My application (using wxPython and SQLite3 via pysqlite2) needs to compare > a string selected from the database into a list of tuples with another > string selected in a displa

Re: Matching Strings

2007-02-09 Thread rshepard-at-appl-ecosys . com
On 2007-02-10, [EMAIL PROTECTED] wrote: > if item == selName: Slicing doesn't seem to do anything -- if I've done it correctly. I changed the above to read, if item[2:-2] == selName: but the output's the same. Rich

Re: Matching Strings

2007-02-09 Thread James Stroud
[EMAIL PROTECTED] wrote: > I'm not sure how to change a string so that it matches another one. > > My application (using wxPython and SQLite3 via pysqlite2) needs to compare > a string selected from the database into a list of tuples with another > string selected in a display widget. > > A

Matching Strings

2007-02-09 Thread rshepard
I'm not sure how to change a string so that it matches another one. My application (using wxPython and SQLite3 via pysqlite2) needs to compare a string selected from the database into a list of tuples with another string selected in a display widget. An extract of the relevant code is: