I am impressed, Gus.  Does it handle incremental changes from the source db
tables, such as inserts, updates, and deletes?

On Fri, Dec 15, 2023 at 12:58 PM Gus Heck <gus.h...@gmail.com> wrote:

> Have you considered trying an existing document ingestion framework? I
> wrote this one: https://github.com/nsoft/jesterj It already has a database
> connector. If you do check it out and run into difficulty, please let me
> know by leaving bug reports (if it's a bug) or feedback (if it's confusion)
> in the discussions section here: https://github.com/nsoft/jesterj/discussions
>
> As Mikhail noted, it's not easy to build a robust ingestion system from
> scratch.
>
> -Gus
>
> On Fri, Dec 15, 2023 at 11:11 AM Dmitri Maziuk <dmitri.maz...@gmail.com>
> wrote:
>
> > On 12/15/23 05:41, Vince McMahon wrote:
> > > Ishan, you are right.  Doing multithreaded indexing is going much
> > > faster. I found out after the remote machine became unresponsive very
> > > quickly; it crashed.  lol.
> > FWIW I got better results posting docs in batches from a single thread.
> > Work is in a "private org" on gitlab so I can't post the link to the
> > code, but the basic layout is a DB reader that yields rows and a writer
> > that does requests.post() of a list of JSON docs. With the DB row ->
> > JSON doc transformer in-between.
> >
> > I played with the size of the batch as well as async/await queue before
> > leaving it single-threaded w/ batch size of 5K docs: I had no speed
> > advantage with larger batches in our setup. And it doesn't DDoS the
> > index. ;)
> >
> > Dima
> >
> >
>
> --
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>
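
For anyone following along, the layout Dima describes above (a DB reader
that yields rows, a row -> JSON doc transformer, and a writer that does
requests.post() of a list of docs, single-threaded, 5K docs per batch) might
look roughly like the sketch below. This is not Dima's code, which was not
posted. The sqlite3 source, the id/title fields, and all function names are
assumptions made for illustration only.

```python
import sqlite3
from itertools import islice


def read_rows(conn, query):
    """DB reader: yield rows one at a time as dicts keyed by column name."""
    cur = conn.execute(query)
    cols = [c[0] for c in cur.description]
    for row in cur:
        yield dict(zip(cols, row))


def row_to_doc(row):
    """Transformer: DB row -> JSON-serializable doc (hypothetical fields)."""
    return {"id": str(row["id"]), "title": row["title"]}


def batches(items, size):
    """Group any iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def post_json(url):
    """Writer: return a function that POSTs a list of docs as one JSON body."""
    import requests  # third-party; imported lazily so the rest runs without it

    def send(docs):
        resp = requests.post(url, json=docs, timeout=60)
        resp.raise_for_status()  # fail loudly instead of silently dropping a batch

    return send


def index_all(conn, query, send, batch_size=5000):
    """Single-threaded loop: read, transform, and send one batch at a time."""
    docs = (row_to_doc(r) for r in read_rows(conn, query))
    total = 0
    for batch in batches(docs, batch_size):
        send(batch)
        total += len(batch)
    return total
```

To use it, pass index_all() an open connection, a SELECT statement, and
post_json(<your index's JSON update URL>). Because the reader is a generator
and only one request is in flight at a time, memory stays bounded by the
batch size and the index never sees more than one batch at once.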
