Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-19 Thread Vince McMahon
Mike, it is very nice. Thank you. On Sun, Dec 17, 2023 at 4:49 PM Mike Drob wrote: > You can! > https://lists.apache.org/thread/brw7r0cf0t0m1wltxg5sky6t6d9crgxm > > On Sun, Dec 17, 2023 at 3:12 PM Vince McMahon < > sippingonesandze...@gmail.com> > wrote: > > > Thanks, Gus. I wish I can "bookma

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-17 Thread Mike Drob
You can! https://lists.apache.org/thread/brw7r0cf0t0m1wltxg5sky6t6d9crgxm On Sun, Dec 17, 2023 at 3:12 PM Vince McMahon wrote: > Thanks, Gus. I wish I could "bookmark" this reply. lol. > > On Sat, Dec 16, 2023 at 11:10 PM Gus Heck wrote: > > > Yes. see the detectChangesViaHashing option here: > >

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-17 Thread Vince McMahon
Thanks, Gus. I wish I could "bookmark" this reply. lol. On Sat, Dec 16, 2023 at 11:10 PM Gus Heck wrote: > Yes. see the detectChangesViaHashing option here: > https://github.com/nsoft/jesterj/wiki/Scanners > > In any Lucene index there's not really such a thing as incremental update. > When you wa

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-16 Thread Gus Heck
Yes, see the detectChangesViaHashing option here: https://github.com/nsoft/jesterj/wiki/Scanners In any Lucene index there's not really such a thing as an incremental update. When you want to do an "update" you send the whole document, and it's really a delete/insert under the covers (there's som
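
The hashing idea mentioned above can be illustrated with a short Python sketch. This is not JesterJ's implementation; `row_hash`, `changed_rows`, and the `id` field name are hypothetical and chosen only for illustration. The sketch remembers a content hash per id and yields only rows that are new or whose content changed, so that only those whole documents need to be re-sent. Deletes are not covered here.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Return a stable SHA-256 digest of a row's content."""
    # sort_keys makes the digest independent of key order;
    # default=str covers values such as dates that JSON cannot encode.
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_rows(rows, seen_hashes: dict):
    """Yield rows that are new or modified since the last scan.

    seen_hashes maps id -> digest and is updated in place; a real
    setup would persist it between runs (an assumption of this sketch).
    """
    for row in rows:
        digest = row_hash(row)
        if seen_hashes.get(row["id"]) != digest:
            seen_hashes[row["id"]] = digest
            yield row
```

On a second scan of unchanged data, `changed_rows` yields nothing, so nothing is re-indexed.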

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Vince McMahon
Nice insight, Dima. Happy Friday. On Fri, Dec 15, 2023 at 11:11 AM Dmitri Maziuk wrote: > On 12/15/23 05:41, Vince McMahon wrote: > > Ishan, you are right. Doing multithreaded Indexing is going much faster. > > I found out after the remote machine became unresponsive very quickly ; > it > > cr

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Vince McMahon
I am impressed, Gus. Does it handle incremental changes from the source db tables, such as insert, update, and delete? On Fri, Dec 15, 2023 at 12:58 PM Gus Heck wrote: > Have you considered trying an existing document ingestion framework? I > wrote this one: https://github.com/nsoft/jesterj It

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Dmitri Maziuk
On 12/15/23 13:26, Mikhail Khludnev wrote: FYI, going by the attached logs, the code already sends docs in 1K batches. That's not obvious to me: the docs are being fetched in 1K batches from the DB and passed on to some "solr" library that may or may not be doing "helpful" stuff under the hoo

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Mikhail Khludnev
FYI, going by the attached logs, the code already sends docs in 1K batches. On Fri, Dec 15, 2023 at 7:11 PM Dmitri Maziuk wrote: > On 12/15/23 05:41, Vince McMahon wrote: > > Ishan, you are right. Doing multithreaded Indexing is going much faster. > > I found out after the remote machine became

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Gus Heck
Have you considered trying an existing document ingestion framework? I wrote this one: https://github.com/nsoft/jesterj It already has a database connector. If you do check it out and find difficulty please let me know by leaving bug reports (if bug) or feedback (if confusion) in the discussions se

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Dmitri Maziuk
On 12/15/23 05:41, Vince McMahon wrote: Ishan, you are right. Doing multithreaded Indexing is going much faster. I found out after the remote machine became unresponsive very quickly ; it crashed. lol. FWIW I got better results posting docs in batches from a single thread. Work is in a "privat

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Vince McMahon
Ishan, you are right. Multithreaded indexing goes much faster. I found out after the remote machine became unresponsive very quickly; it crashed. lol. On Fri, Dec 15, 2023 at 4:35 AM Vince McMahon wrote: > Ishan, > > How do you multi-thread? > > Secondly, could you please tell me wh

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Vince McMahon
Oh, Mikhail. Thanks for your questions. There are just 5000 distinct ids. I'll check during the office hours with the source side to fix their problem and try again. Thank you and you are a very wise man. On Fri, Dec 15, 2023 at 4:32 AM Vince McMahon wrote: > Great questions. Here are some
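
Since Solr overwrites an existing document when a new one arrives with the same uniqueKey, the number of distinct ids in the source puts a ceiling on the document count. A minimal sketch for checking this before indexing, assuming rows are dicts with an `id` field (`summarize_ids` is a hypothetical helper, not part of any library):

```python
from collections import Counter


def summarize_ids(rows, key="id"):
    """Return (total rows, distinct ids, {id: count} for repeated ids)."""
    counts = Counter(row[key] for row in rows)
    duplicates = {value: n for value, n in counts.items() if n > 1}
    return len(rows), len(counts), duplicates


# Illustrative data only: 15 rows that share just 5 distinct ids.
sample = [{"id": i % 5} for i in range(15)]
total, distinct, duplicates = summarize_ids(sample)
print(total, distinct)  # 15 5
```

If `distinct` is much smaller than `total`, the index will end up with `distinct` documents no matter how many rows are sent.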

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Vince McMahon
Ishan, How do you multi-thread? Secondly, could you please tell me what to look for in the log to confirm that indexing is committing 1000 documents at a time after executing the code *solr.commit()*? Do you see anything that tells why it stops after 5000 rows, while there are about 15 rows fetched?

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-15 Thread Vince McMahon
Great questions. Here are some of the answers. " Exit condition `if Not rows:\n break` is not clear to me. Why should it work? " The exit condition: when the postgres_query fetches nothing, "if not rows" breaks out of the while loop and the cursor is closed. " Also, how many distinct ids
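
The exit condition described above can be sketched as follows. This is not the original program: it uses an in-memory SQLite table as a stand-in for the Postgres source so it runs on its own, and `fetch_in_batches` and the `docs` table are hypothetical names. The loop relies on the DB-API behaviour that `fetchmany` returns an empty sequence once the result set is exhausted.

```python
import sqlite3


def fetch_in_batches(cursor, batch_size=1000):
    """Yield lists of rows until the cursor has nothing left."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # empty list: the result set is exhausted
            break
        yield rows


# Stand-in data source (assumption): 2500 rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO docs (title) VALUES (?)",
    [(f"doc {i}",) for i in range(2500)],
)

cur = conn.execute("SELECT id, title FROM docs ORDER BY id")
batch_sizes = [len(batch) for batch in fetch_in_batches(cur)]
cur.close()
conn.close()

print(batch_sizes)  # [1000, 1000, 500]
```

Note that Python is case-sensitive: the condition must be written `if not rows:`, not `if Not rows:`.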

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-14 Thread Mikhail Khludnev
NB: it's not easy to build a robust ETL from scratch (btw, have you asked Copilot or ChatGPT for it?). I spot a few oddities in the code, but they are not critical. From the log I see (fwiw, you still have DEBUG log enabled) that 1000 recs were added in 17 or something secs. It makes some sense. But

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-14 Thread Ishan Chattopadhyaya
If you're able to do multithreaded indexing, it will go much faster. On Thu, 14 Dec, 2023, 6:51 pm Vince McMahon, wrote: > Hi, > > I have written a custom python program to Index which may provide a > better control than DIH. > > But, it is still doing at most 5000 documents. I have enable
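
One way to try multithreaded indexing from Python is a small, bounded thread pool. The sketch below is only an assumption about how it might be structured: `send_batch` is a hypothetical placeholder that returns the batch size instead of talking to Solr, and `max_workers=4` is an arbitrary starting point. Keeping the worker count low is one way to avoid overwhelming the Solr host.

```python
from concurrent.futures import ThreadPoolExecutor


def send_batch(batch):
    """Placeholder for the real call that posts one batch to Solr.

    Hypothetical: a real version would send `batch` with whatever
    Solr client the program already uses.
    """
    return len(batch)


def index_concurrently(batches, max_workers=4):
    """Send batches from a bounded pool; return the number of docs sent."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in input order and re-raises any
        # exception from a worker when its result is consumed.
        return sum(pool.map(send_batch, batches))
```

Usage would look like `index_concurrently(list_of_batches, max_workers=2)`, raising the worker count only while the server stays responsive.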