On 12/15/23 05:41, Vince McMahon wrote:
Ishan, you are right. Multithreaded indexing is much faster.
I found out when the remote machine quickly became unresponsive; it
crashed. lol.
FWIW I got better results posting docs in batches from a single thread.
The work is in a "private org" on GitLab, so I can't post a link to the
code, but the basic layout is a DB reader that yields rows and a writer
that does requests.post() of a list of JSON docs, with a DB row ->
JSON doc transformer in between.
I played with the batch size as well as an async/await queue before
settling on single-threaded with a batch size of 5K docs: I saw no speed
advantage with larger batches in our setup. And it doesn't DDoS the
index. ;)
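
A minimal sketch of that layout, for illustration only; it is not the
actual code. Assumptions: sqlite3 stands in for whatever DB is really
used, the column names and the fields produced by row_to_doc() are
made up, and the update URL is a placeholder. The post callable is
passed in so that requests.post can be swapped for a stub when testing.

```python
import sqlite3
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List


def read_rows(conn: sqlite3.Connection, query: str) -> Iterator[Dict[str, Any]]:
    """DB reader: yield one row at a time as a dict keyed by column name."""
    cur = conn.execute(query)
    cols = [d[0] for d in cur.description]
    for row in cur:
        yield dict(zip(cols, row))


def row_to_doc(row: Dict[str, Any]) -> Dict[str, Any]:
    """Transformer: DB row -> JSON doc. Field names here are hypothetical."""
    return {"id": str(row["id"]), "title": row["title"]}


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_all(rows: Iterable[Dict[str, Any]], url: str,
              post: Callable[..., Any], batch_size: int = 5000) -> int:
    """Writer: post docs in batches from a single thread.

    `post` is expected to behave like requests.post, i.e. accept
    (url, json=...) and return an object with raise_for_status().
    Returns the number of docs sent.
    """
    total = 0
    docs = (row_to_doc(r) for r in rows)
    for batch in batched(docs, batch_size):
        resp = post(url, json=batch)   # one request per batch, sent sequentially
        resp.raise_for_status()        # stop on the first failed batch
        total += len(batch)
    return total
```

With the requests library installed, this would be called along the
lines of index_all(read_rows(conn, "SELECT id, title FROM docs"),
"http://index.example/update", requests.post), where both the query and
the URL are placeholders. Because each batch is sent only after the
previous response comes back, the index never sees more than one
request in flight.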
Dima