Dino,

Sending lots of data to an archived forum is not a great idea. I snipped most of it out below so as not to replicate it.

Your question does not look difficult unless your real question is about speed. Realistically, much of the time is generally spent reading in the file, and the actual search can be quite rapid with a wide range of methods.

The data looks boring enough and seems not to have much structure other than one comma possibly separating two fields. Do you want the data as one wide field or in two parts, which is what a CSV file is normally used to represent? Do you ever have questions like "tell me all cars whose name begins with the letter D and that have a V6 engine"? If so, you may want more than a vanilla search.

What exactly do you want to search for? Is it a set of built-in searches or something the user types in?

The data seems to be sorted by the first field and then by the second, and I did not check whether some searches might be ambiguous. Can there be many entries containing III? Yep. Can the same words like Cruiser or Hybrid appear?

So is this a one-time search, or multiple searches once loaded, as in a service that stays resident and fields requests? The latter may be worth speeding up.

I don't NEED to know any of this but want you to know that the answer may depend on this and similar factors. We had a long discussion lately on whether to search using regular expressions or string methods. If your data is meant to be used once, you may not even need to read the file into memory, but can read something like a line at a time and test it. Or, if you end up with more data, like how many cylinders a car has, it may be time to read it in not just to a list of lines or similar data structures, but to get numpy/pandas involved and use their many search methods on something like a DataFrame.

Of course, if you are worried about portability, keep using grep (Global Regular Expression Print).

Your example was:

$ grep -i v60 all_cars_unique.csv
Genesis,GV60
Volvo,V60

You seem to have wanted case folding and that is NOT a normal search. And your search is matching anything on any line.
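
To make the line-at-a-time idea concrete, here is a minimal sketch of what a rough Python stand-in for that grep -i call might look like. The function name grep_i and the tiny in-memory sample are my own inventions for illustration, the sample rows are just a few lines from your file, and I am assuming plain substring matching is what you want. It is one way to do it, not a recommendation over the other methods discussed.

```python
import io


def grep_i(lines, needle):
    """Yield each line that contains needle, ignoring case (roughly `grep -i`).

    `lines` can be any iterable of strings, including an open file,
    so the whole file never has to be held in memory.
    """
    folded = needle.casefold()  # fold the search term once, outside the loop
    for line in lines:
        text = line.rstrip("\n")
        # Substring test anywhere on the line, like the grep example
        if folded in text.casefold():
            yield text


# Hypothetical stand-in for all_cars_unique.csv, using a few rows from the post
sample = io.StringIO(
    "Acura,CL\n"
    "Genesis,GV60\n"
    "Volvo,V60\n"
    "smart,fortwo electric drive\n"
)

matches = list(grep_i(sample, "v60"))
print(matches)
```

Since an open file is also an iterable of lines, the same function should work unchanged if you pass it the result of open() on the real file instead of the sample. Whether that is fast enough depends on the questions above, especially whether you search once or many times.
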

If you wanted only a complete field, such as all text after a comma to the end of the line, you could use grep specifications to say that.

But once inside Python, you would need to make choices depending on what kind of searches you want to allow, but also things like: do you want all matching lines shown if you search for, say, "a" ...

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail....@python.org> On Behalf Of Dino
Sent: Saturday, March 4, 2023 10:47 PM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

Here's the complete data file should anyone care.

Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend
<SNIP>
smart,fortwo electric drive
smart,fortwo electric drive cabrio

--
https://mail.python.org/mailman/listinfo/python-list