On 3/6/2023 12:49 PM, avi.e.gr...@gmail.com wrote:
Thomas,
I may have missed the discussion where the OP explained more about his proposed
usage. If the program is designed to load the full data once, never get updates
except by re-reading some file, and then handle multiple requests, then some
optimizations may be worth doing.
It looked to me, and I may well be wrong, like he wanted to search for a string anywhere
in the text, so a grep-like solution is a reasonable start, with the actual data
stored as something like a list of character strings you can search one line
at a time. I suspect a numpy variant may work faster.
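That grep-like approach might be sketched like this (the data lines and function names are invented for illustration; the real program would read its lines from the data file once at startup):

```python
# Load the file once into a list of lines, then a search is a simple
# case-insensitive substring scan over that list.
def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def search(lines, needle):
    needle = needle.lower()
    return [line for line in lines if needle in line.lower()]

lines = ["2019 Honda Accord V6", "2021 Volvo V60", "2020 Acura TLX"]
print(search(lines, "v6"))  # ['2019 Honda Accord V6', '2021 Volvo V60']
```

For a few hundred thousand short lines, a scan like this typically finishes in tens of milliseconds, which may already be fast enough without any indexing.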
And of course any search function he builds can be made to remember some or all
previous searches using a cache decorator. That generally uses a dictionary for
the search keys internally.
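A sketch of that caching idea using the standard-library decorator (the data and function are invented; note the function returns a tuple so the result is hashable and safely immutable in the cache):

```python
from functools import lru_cache

LINES = ["2021 Volvo V60", "2019 Honda Accord V6", "2020 Acura TLX"]

@lru_cache(maxsize=1024)
def cached_search(needle):
    # lru_cache keys on the argument string internally, so repeated
    # queries for the same term are answered from its dictionary.
    needle = needle.lower()
    return tuple(line for line in LINES if needle in line.lower())

cached_search("honda")  # first call scans the list
cached_search("honda")  # second call is served from the cache
```

cached_search.cache_info() reports hits and misses, which makes it easy to check whether the cache is actually earning its keep for the real query mix.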
But using lots of dictionaries strikes me as helping only if you are searching for text
anchored to the start of a line, so if you ask for "Honda" you instead go to the
dictionary called "h" and search perhaps just for "onda", then recombine the prefix
in any results. But the example given wanted to match something like "V6" in the
middle of the text, and I do not see how that would work, as you would now need to
search all 26 dictionaries.
Well, that's the question, isn't it? Just how is this expected to be
used? I didn't read the initial posting that carefully, and I may have
missed something that makes a difference.
The OP gives as an example a user entering a string ("v60"). The
example is a model designation. If we know that this entry box will
only receive models, then I would populate a dictionary using the model
numbers as keys. The number of distinct keys will probably not be that
large.
For example, highly simplified of course:
>>> models = {'v60': 'Volvo', 'GV60': 'Genesis', 'cl': 'Acura'}
>>> entry = '60'
>>> candidates = (m for m in models.keys() if entry in m)
>>> list(candidates)
['v60', 'GV60']
The keys would be lower-cased. A separate dictionary would give the
complete string with the desired casing. The values could be object
references to the complete information. If there might be several
different models with the same key, then the values could be
lists or dictionaries and one would need to do some disambiguation, but
that should be simple and quick.
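That several-models-per-key case might look like this (a sketch; the model data is invented, with list values holding the full records):

```python
# Keys are lower-cased model names; values are lists because one key
# can map to several distinct entries needing disambiguation.
models = {
    "v60": [{"make": "Volvo", "model": "V60"},
            {"make": "Volvo", "model": "V60 Cross Country"}],
    "cl":  [{"make": "Acura", "model": "CL"}],
}

entry = "v60"
candidates = [rec
              for key, recs in models.items() if entry in key
              for rec in recs]
print([c["model"] for c in candidates])  # ['V60', 'V60 Cross Country']
```

The user would then pick from the candidate list, or the form could show all of them at once if the list is short.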
It all depends on the planned access patterns. If the OP really wants
full-text search in the complete unstructured data file, then yes, a
full text indexer of some kind will be useful. Whoosh certainly looks
good though I have not used it. But for populating dropdown lists in
web forms, most likely the design of the form will provide a structure
for the various searches.
-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail....@python.org> On
Behalf Of Thomas Passin
Sent: Monday, March 6, 2023 11:03 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)
On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:
Not sure if this is what Thomas meant, but I was also thinking dictionaries.
Dino could build a set of dictionaries with keys “a” through “z” that contain
data with those letters in them. (I’m assuming case insensitive search) and
then just search “v” if that’s what the user starts with.
Increased performance may be achieved by building dictionaries “aa”, “ab” ...
“zz”. And so on.
Of course, it’s trading CPU for memory usage, and there’s likely a point at
which the cost of building dictionaries exceeds the savings in searching.
Chances are it would only be seconds at most to build the data cache,
and then subsequent queries would respond very quickly.
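That letter-keyed cache could be sketched like this (single letters only, with invented data; each line is indexed under every letter it contains, so a search only scans the bucket for the query's first letter):

```python
from collections import defaultdict

lines = ["2021 Volvo V60", "2019 Honda Accord V6", "2020 Acura TLX"]

# Build the buckets once at startup.
index = defaultdict(list)
for line in lines:
    for letter in set(line.lower()):
        if letter.isalpha():
            index[letter].append(line)

# A query for "v6" only has to scan the "v" bucket.
needle = "v6"
hits = [line for line in index[needle[0].lower()]
        if needle in line.lower()]
print(hits)  # ['2021 Volvo V60', '2019 Honda Accord V6']
```

As noted, this trades memory for CPU: each line appears in as many buckets as it has distinct letters, so the two-letter variant multiplies the memory cost further.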
From: Python-list <python-list-bounces+gweatherby=uchc....@python.org> on behalf of
Thomas Passin <li...@tompassin.net>
Date: Sunday, March 5, 2023 at 9:07 PM
To: python-list@python.org <python-list@python.org>
Subject: Re: Fast full-text searching in Python (job for Whoosh?)
I would probably ingest the data at startup into a dictionary - or
perhaps several depending on your access patterns - and then you will
only need to do a fast lookup in one or more dictionaries.
If your access pattern would be easier with SQL queries, load the data
into an SQLite database on startup.
IOW, do the bulk of the work once at startup.
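That SQLite variant might look like this (a sketch with invented data, using an in-memory database loaded once at startup; LIKE with surrounding wildcards gives a substring match and is case-insensitive for ASCII by default in SQLite):

```python
import sqlite3

# Load once at startup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (description TEXT)")
conn.executemany("INSERT INTO cars VALUES (?)",
                 [("2021 Volvo V60",),
                  ("2019 Honda Accord V6",),
                  ("2020 Acura TLX",)])

# Each user query is then a single parameterized LIKE lookup.
rows = conn.execute(
    "SELECT description FROM cars WHERE description LIKE ?",
    ("%v6%",)).fetchall()
print([r[0] for r in rows])  # ['2021 Volvo V60', '2019 Honda Accord V6']
```

If the data outgrows LIKE scans, SQLite's FTS5 extension provides a real full-text index without adding an external dependency.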
--
https://mail.python.org/mailman/listinfo/python-list