I am impressed, Gus. Does it handle incremental changes from the source DB tables, such as inserts, updates, and deletes?
On Fri, Dec 15, 2023 at 12:58 PM Gus Heck <gus.h...@gmail.com> wrote:
> Have you considered trying an existing document ingestion framework? I
> wrote this one: https://github.com/nsoft/jesterj It already has a database
> connector. If you do check it out and find difficulty please let me know by
> leaving bug reports (if bug) or feedback (if confusion) in the discussions
> section here: https://github.com/nsoft/jesterj/discussions
>
> As Mikhail noted, it's not easy to build a robust ingestion system from
> scratch.
>
> -Gus
>
> On Fri, Dec 15, 2023 at 11:11 AM Dmitri Maziuk <dmitri.maz...@gmail.com>
> wrote:
>
> > On 12/15/23 05:41, Vince McMahon wrote:
> > > Ishan, you are right. Doing multithreaded Indexing is going much faster.
> > > I found out after the remote machine became unresponsive very quickly; it
> > > crashed. lol.
> >
> > FWIW I got better results posting docs in batches from a single thread.
> > Work is in a "private org" on gitlab so I can't post the link to the
> > code, but the basic layout is a DB reader that yields rows and a writer
> > that does requests.post() of a list of JSON docs. With the DB row ->
> > JSON doc transformer in-between.
> >
> > I played with the size of the batch as well as async/await queue before
> > leaving it single-threaded w/ batch size of 5K docs: I had no speed
> > advantage with larger batches in our setup. And it doesn't DDoS the
> > index. ;)
> >
> > Dima
>
> --
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
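For reference, Dima's layout (a DB reader that yields rows, a row -> JSON doc transformer, and a writer that posts lists of JSON docs from a single thread in batches of 5K) might look roughly like the sketch below. This is not Dima's code. It assumes SQLite as a stand-in for the source database and uses the standard library's urllib.request instead of requests.post() to avoid a third-party dependency. The URL, table, and field names (SOLR_UPDATE_URL, items, title_t) are hypothetical.

```python
import json
import sqlite3
import urllib.request
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, object]

# Hypothetical Solr update endpoint; adjust host, port and collection name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def read_rows(conn: sqlite3.Connection, query: str) -> Iterator[sqlite3.Row]:
    """DB reader: stream rows one at a time instead of loading the whole table."""
    cursor = conn.cursor()
    cursor.row_factory = sqlite3.Row  # lets the transformer access columns by name
    cursor.execute(query)
    for row in cursor:
        yield row


def row_to_doc(row: sqlite3.Row) -> Doc:
    """Transformer: map one DB row to one JSON doc (field names are hypothetical)."""
    return {"id": str(row["id"]), "title_t": row["title"]}


def batched(items: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Group docs into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def post_batch(docs: List[Doc], url: str = SOLR_UPDATE_URL) -> None:
    """Writer: POST one JSON array of docs to Solr's /update handler."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    # urlopen raises HTTPError on 4xx/5xx responses, so a bad batch fails loudly.
    with urllib.request.urlopen(req) as resp:
        resp.read()


def index_table(
    conn: sqlite3.Connection,
    query: str,
    post: Callable[[List[Doc]], None] = post_batch,
    batch_size: int = 5000,
) -> int:
    """Single-threaded pipeline: read -> transform -> post. Returns number of docs sent."""
    sent = 0
    docs = (row_to_doc(r) for r in read_rows(conn, query))
    for batch in batched(docs, batch_size):
        post(batch)  # blocks until Solr answers, so only one request is in flight
        sent += len(batch)
    return sent
```

Because each post() call blocks, the indexer never has more than one update request outstanding, which is the property that keeps it from overwhelming the index. The writer is passed in as a parameter so it can be swapped for a stub when testing. Something still has to commit once the load finishes, for example a final request with commit=true or an autoCommit setting on the Solr side.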