Thomas,

I may have missed a discussion where the OP explained more about the proposed 
usage. If the program is designed to load the full data once, receive no updates 
except by re-reading some file, and then handle multiple requests, then some 
optimizations may be worth doing.

It looked to me, and I may well be wrong, as though he wanted to search for a 
string anywhere in the text, so a grep-like solution is a reasonable start, with 
the actual data stored as something like a list of character strings you can 
search "one line" at a time. I suspect a numpy variant might be faster.
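A minimal sketch of that grep-like approach, holding the data as a list of strings and scanning with a substring test. The names (search, CARS) and the sample data are hypothetical, not from the OP's post:

```python
# Sketch: case-insensitive substring search over a list of lines.
# CARS is made-up sample data for illustration only.
def search(lines, needle):
    """Return every line containing needle, case-insensitively."""
    needle = needle.lower()
    return [line for line in lines if needle in line.lower()]

CARS = ["Honda Accord V6", "Toyota Camry", "Honda Civic"]
print(search(CARS, "v6"))  # -> ['Honda Accord V6']
```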

And of course any search function he builds can be made to remember some or all 
previous searches using a cache decorator, which generally uses a dictionary 
keyed on the search arguments internally.
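For instance, functools.lru_cache keeps that internal dictionary for you. This is only a sketch under stated assumptions: LINES and the search logic are hypothetical stand-ins, and the line data lives at module level because a list argument would not be hashable as a cache key:

```python
from functools import lru_cache

# Hypothetical sample data; a tuple so results can also be hashable.
LINES = ("Honda Accord V6", "Toyota Camry", "Honda Civic")

@lru_cache(maxsize=None)
def cached_search(needle):
    # A repeated query for the same needle returns the stored result
    # from the cache's internal dictionary without re-scanning LINES.
    needle = needle.lower()
    return tuple(line for line in LINES if needle in line.lower())
```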

But using lots of dictionaries strikes me as helpful only if you are searching 
for text anchored to the start of a line: if you ask for "Honda", you instead go 
to the dictionary called "h" and search perhaps just for "onda", then recombine 
the prefix in any results. But the example given wanted to match something like 
"V6" in the middle of the text, and I do not see how that would work, as you 
would now need to search all 26 dictionaries completely.
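A small illustration of that limitation, using made-up data and a plain first-letter bucketing (the two-letter "aa".."zz" scheme mentioned above would behave the same way):

```python
from collections import defaultdict

# Hypothetical data; bucket each line by its first letter, lowercased.
lines = ["Honda Accord V6", "Toyota Camry", "honda civic"]
buckets = defaultdict(list)
for line in lines:
    buckets[line[0].lower()].append(line)

# Anchored search for "Honda": one bucket suffices.
anchored = [l for l in buckets["h"] if l.lower().startswith("honda")]

# Mid-string search for "V6": every bucket must still be scanned fully,
# so the bucketing bought nothing.
mid = [l for b in buckets.values() for l in b if "v6" in l.lower()]
```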



-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail....@python.org> On 
Behalf Of Thomas Passin
Sent: Monday, March 6, 2023 11:03 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:
> Not sure if this is what Thomas meant, but I was also thinking dictionaries.
> 
> Dino could build a set of dictionaries with keys “a” through “z” that contain 
> data with those letters in them. (I’m assuming case insensitive search) and 
> then just search “v” if that’s what the user starts with.
> 
> Increased performance may be achieved by building dictionaries “aa”, ”ab” ... 
> “zz”. And so on.
> 
> Of course, it’s trading CPU for memory usage, and there’s likely a point at 
> which the cost of building dictionaries exceeds the savings in searching.

Chances are it would only be seconds at most to build the data cache, 
and then subsequent queries would respond very quickly.

> 
> From: Python-list <python-list-bounces+gweatherby=uchc....@python.org> on 
> behalf of Thomas Passin <li...@tompassin.net>
> Date: Sunday, March 5, 2023 at 9:07 PM
> To: python-list@python.org <python-list@python.org>
> Subject: Re: Fast full-text searching in Python (job for Whoosh?)
> 
> I would probably ingest the data at startup into a dictionary - or
> perhaps several depending on your access patterns - and then you will
> only need to to a fast lookup in one or more dictionaries.
> 
> If your access pattern would be easier with SQL queries, load the data
> into an SQLite database on startup.
> 
> IOW, do the bulk of the work once at startup.
> --
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list
