Ah, thanks Dino. Autocomplete within a web page can be an interesting scenario but also a daunting one.
Now, do you mean you have a web page with a text field, initially I suppose empty, and the user types a single character and rapidly a drop-down list or something is created and shown? And as they type, it may shrink? And as soon as they select one, it is replaced in the text field and done?

If your form has an attached function written in JavaScript, some might load your data into the browser and do all that work from within. No Python needed.

Now if your scenario is similar to the above, or perhaps the user needs to ask for autocompletion by using tab or something, and you want to keep sending requests to a server, you can of course use any language on the server. BUT I would be cautious in such a design. My guess is you autocomplete on every keystroke and the user may well type multiple characters, resulting in multiple requests for your program. Is a new one called every time, or is it a running service? If the latter, it pays to read in the data once and then carefully serve it.

But when you get just the letter "h" you may not want to send and process a thousand results but limit it to, say, the first N. If they then add an "o" to make "ho", you may not need to do much, if the search is anchored to the start, except to search in the results of the previous search rather than the whole data.

But have you done some searching on how autocomplete from a fixed corpus is normally done? It is quite a common thing.

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail....@python.org> On Behalf Of Dino
Sent: Monday, March 6, 2023 7:40 AM
To: python-list@python.org
Subject: Re: RE: Fast full-text searching in Python (job for Whoosh?)

Thank you for taking the time to write such a detailed answer, Avi. And apologies for not providing more info from the get go. What I am trying to achieve here is supporting autocomplete (no pun intended) in a web form field, hence the -i case insensitive example in my initial question.

Your points are all good, and my original question was a bit rushed. I guess that the problem was that I saw this video:

https://www.youtube.com/watch?v=gRvZbYtwTeo&ab_channel=NextDayVideo

The idea that someone types into an input field and matches start dancing in the browser made me think that this was exactly what I needed, and hence I figured that asking here about Whoosh would be a good idea. I now realize that Whoosh would be overkill for my use-case, as a simple (case insensitive) query substring would get me 90% of what I want. Speed is in the order of a few milliseconds out of the box, which is chump change in the context of a web UI.

Thank you again for taking the time to look at my question

Dino

On 3/5/2023 10:56 PM, avi.e.gr...@gmail.com wrote:
> Dino, Sending lots of data to an archived forum is not a great idea. I
> snipped most of it out below as not to replicate it.
>
> Your question does not look difficult unless your real question is about
> speed. Realistically, much of the time spent generally is in reading in a
> file and the actual search can be quite rapid with a wide range of methods.
>
> The data looks boring enough and seems to not have much structure other than
> one comma possibly separating two fields. Do you want the data as one wide
> field or perhaps in two parts, which a CSV file is normally used to
> represent? Do you ever have questions like tell me all cars whose name
> begins with the letter D and has a V6 engine? If so, you may want more than
> a vanilla search.
>
> What exactly do you want to search for? Is it a set of built-in searches or
> something the user types in?
>
> The data seems to be sorted by the first field and then by the second and I
> did not check if some searches might be ambiguous. Can there be many entries
> containing III? Yep. Can the same words like Cruiser or Hybrid appear?
>
> So is this a one-time search or multiple searches once loaded as in a
> service that stays resident and fields requests? The latter may be worth
> speeding up.
>
> I don't NEED to know any of this but want you to know that the answer may
> depend on this and similar factors. We had a long discussion lately on
> whether to search using regular expressions or string methods. If your data
> is meant to be used once, you may not even need to read the file into
> memory, but read something like a line at a time and test it. Or, if you end
> up with more data like how many cylinders a car has, it may be time to read
> it in not just to a list of lines or such data structures, but get
> numpy/pandas involved and use their many search methods in something like a
> data.frame.
>
> Of course if you are worried about portability, keep using Get Regular
> Expression Print.
>
> Your example was:
>
> $ grep -i v60 all_cars_unique.csv
> Genesis,GV60
> Volvo,V60
>
> You seem to have wanted case folding and that is NOT a normal search. And
> your search is matching anything on any line. If you wanted only a complete
> field, such as all text after a comma to the end of the line, you could use
> grep specifications to say that.
>
> But once inside python, you would need to make choices depending on what
> kind of searches you want to allow but also things like do you want all
> matching lines shown if you search for say "a" ...
>
--
https://mail.python.org/mailman/listinfo/python-list
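
For anyone reading this thread in the archive, the approach discussed above (a case-insensitive substring match, capped at the first N results, and narrowing the previous matches as the user types more) might look roughly like the sketch below. It is only an illustration and not code from either poster: the function names, the limit of 2, and every sample row other than the two taken from the grep output are made up for the example.

```python
# Hypothetical sample data. Only "Genesis,GV60" and "Volvo,V60" come from
# the grep output quoted in the thread; the other rows are invented.
CARS = [
    "Genesis,GV60",
    "Honda,Civic",
    "Hyundai,Ioniq 5",
    "Toyota,Highlander Hybrid",
    "Volvo,V60",
]


def find_matches(entries, query):
    """Return every entry containing `query`, ignoring case."""
    needle = query.casefold()
    if not needle:
        return []
    return [entry for entry in entries if needle in entry.casefold()]


def narrow_matches(previous_matches, longer_query):
    """Re-filter an earlier, untruncated match list.

    This is only valid when `longer_query` extends the earlier query
    (for example "h" followed by "ho"), because every entry containing
    the longer string also contains the shorter one.
    """
    return find_matches(previous_matches, longer_query)


def first_n(matches, limit):
    """Cap what is sent back to the browser at `limit` entries."""
    return matches[:limit]


if __name__ == "__main__":
    h_matches = find_matches(CARS, "h")             # search the whole list once
    ho_matches = narrow_matches(h_matches, "ho")    # then only the survivors
    print(first_n(h_matches, 2))
    print(ho_matches)
    print(find_matches(CARS, "v60"))                # same idea as grep -i v60
```

Note that the narrowing step works on the full match list, not on the truncated one that was sent to the browser; otherwise entries cut off by the limit could never reappear as the query gets longer.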