That’s about what I got using a Python dictionary on random data on a high-memory machine.
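A minimal sketch of the kind of measurement behind that claim (the key format, record shape, and sizes are illustrative, chosen to match the numbers in this thread, not taken from the linked repository):

```python
import random
import string
import time

# Build a dictionary of 100k random keys, each mapping to a record of
# ~40 attributes (sizes are illustrative, matching the thread's numbers).
def random_key(n=16):
    return "".join(random.choices(string.ascii_lowercase, k=n))

records = {random_key(): {f"attr{i}": i for i in range(40)}
           for _ in range(100_000)}

# Time 10k lookups of existing keys.
keys = random.sample(list(records), 10_000)
start = time.perf_counter()
for k in keys:
    _ = records[k]
elapsed = time.perf_counter() - start

print(f"{elapsed / len(keys) * 1e6:.2f} microseconds per lookup")
```

On typical hardware a plain dict lookup lands in the single-digit-microsecond range, which is why it is hard to beat in-process.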
https://github.com/Gerardwx/database_testing.git

It’s not obvious to me how to get it much faster than that.

From: Python-list <python-list-bounces+gweatherby=uchc....@python.org> on behalf of Dino <d...@no.spam.ar>
Date: Sunday, January 15, 2023 at 1:29 PM
To: python-list@python.org <python-list@python.org>
Subject: Re: Fast lookup of bulky "table"

Thank you for your answer, Lars. Just a clarification: I am already doing a rough measurement of my queries.

A fresh query without any caching: < 4 s.
Cached full query: < 5 microseconds (i.e. six orders of magnitude faster).
Desired speed for my POC: < 10 ms.

Also, I didn't want to ask a question with way too many "moving parts", but when I talked about the "table", it's actually a 100k-long list of IDs. I can then use each ID to invoke an API that will return those 40 attributes. The API is fast, but I am still bound to loop through the whole thing to respond to the query, unless I pre-load the data into something that allows faster access.

Also, as you correctly observed, "looking good with my colleagues" is a nice-to-have feature at this point, not really an absolute requirement :)

Dino

On 1/15/2023 3:17 AM, Lars Liedtke wrote:
> Hey,
>
> Before you start optimizing, I would suggest that you measure response
> times, query times, data search times and so on. To save time, you have
> to know where you "lose" time.
>
> Does your service really have to load the whole table at once? Yes, that
> might lead to quicker response times on requests, but databases are
> often very good at caching themselves, so the first request might be
> slower than subsequent requests with similar parameters. Do you use a
> database, or are you reading from a file? Are you perhaps looping through
> your whole dataset on every request, instead of asking for the specific
> data?
>
> Before you start introducing a cache and its added complexity, do you
> really need that cache?
>
> You are talking about saving microseconds; that sounds a bit as if you
> might be "overdoing" it. How many requests will you have in the future,
> at least in which order of magnitude, and how fast do they have to be?
> You write about 1-4 seconds on your laptop, but that does not really
> tell you much, because most probably the service will run on a server.
> I am not saying that you should get a server or a cloud instance to
> test against, but talk with your architect about that.
>
> I totally understand your impulse to look as good as you can, but you
> have to know where you really need to debug and optimize. It will not
> be advantageous for you if you start to optimize for optimizing's sake.
> Additionally, if your service is a PoC, optimizing might not be the
> first thing to worry about; rather, make everything as simple and
> readable as possible, and do not spend too much time just showing how
> it could work.
>
> But of course, I do not know the tasks given to you or the expectations
> you have to fulfil. All I am trying to say is to reconsider where you
> really could improve and how far you have to improve.

--
https://mail.python.org/mailman/listinfo/python-list
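For the scenario Dino describes (a 100k-long list of IDs, each resolvable via an API into ~40 attributes), the usual pre-loading approach is a one-time pass that fills an in-memory dict keyed by ID, so that each query becomes a constant-time lookup instead of a scan. A hypothetical sketch; `fetch_attributes` is a stand-in for the real API, which is not shown in the thread:

```python
# Hypothetical sketch: pre-load all IDs into an in-memory dict so a
# query becomes an O(1) lookup instead of a loop over the whole list.

def fetch_attributes(record_id):
    # Placeholder for the real API call; returns ~40 attributes per ID.
    return {f"attr{i}": f"{record_id}-{i}" for i in range(40)}

def preload(ids):
    # One pass over the IDs at startup; afterwards every query is a
    # constant-time dictionary access.
    return {record_id: fetch_attributes(record_id) for record_id in ids}

ids = [f"id-{n}" for n in range(100_000)]
table = preload(ids)

# A "query" is now a dict access (microseconds) rather than a full scan.
row = table["id-42"]
```

The trade-off is startup cost and memory: the pre-load still has to call the API 100k times once, and the whole table must fit in RAM, which matches the "high-memory machine" caveat earlier in the thread.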