Mike, it is very nice.
Thank you.
On Sun, Dec 17, 2023 at 4:49 PM Mike Drob wrote:
> You can!
> https://lists.apache.org/thread/brw7r0cf0t0m1wltxg5sky6t6d9crgxm
You can!
https://lists.apache.org/thread/brw7r0cf0t0m1wltxg5sky6t6d9crgxm
On Sun, Dec 17, 2023 at 3:12 PM Vince McMahon
wrote:
> Thanks, Gus. I wish I could bookmark this reply. lol.
Thanks, Gus. I wish I could bookmark this reply. lol.
On Sat, Dec 16, 2023 at 11:10 PM Gus Heck wrote:
> Yes, see the detectChangesViaHashing option here:
> https://github.com/nsoft/jesterj/wiki/Scanners
Yes, see the detectChangesViaHashing option here:
https://github.com/nsoft/jesterj/wiki/Scanners
In any Lucene index there's not really such a thing as an incremental
update. When you want to do an "update" you send the whole document, and
it's really a delete/insert under the covers.
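Gus's delete/insert point can be shown with a toy in-memory model (purely illustrative, not Solr code; the `add` function and field names here are invented for the example):

```python
# Toy model of Lucene/Solr "update" semantics: every update re-sends the
# whole document and is really a delete-then-insert keyed by the uniqueKey.
def add(index, docs, key="id"):
    for doc in docs:
        index.pop(doc.get(key), None)  # delete any existing version
        index[doc[key]] = doc          # insert the new full document

index = {}
add(index, [{"id": "doc-1", "title": "old", "views": 1}])
# To "update" doc-1 you must send the complete replacement document;
# fields you omit are simply gone after the delete/insert.
add(index, [{"id": "doc-1", "title": "new"}])
assert "views" not in index["doc-1"]
```

This is why partial updates against Lucene aren't free: the whole document is rewritten either way.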
Nice insight, Dima. Happy Friday.
On Fri, Dec 15, 2023 at 11:11 AM Dmitri Maziuk
wrote:
I am impressed, Gus. Does it handle incremental changes from the source db
tables, such as inserts, updates, and deletes?
On Fri, Dec 15, 2023 at 12:58 PM Gus Heck wrote:
> Have you considered trying an existing document ingestion framework? I
> wrote this one: https://github.com/nsoft/jesterj
On 12/15/23 13:26, Mikhail Khludnev wrote:
> FYI, per the attached logs, the code already sends docs in 1K batches.
That's not obvious to me: the docs are being fetched in 1K batches from
the DB and passed on to some "solr" library that may or may not be doing
"helpful" stuff under the hood.
FYI, per the attached logs, the code already sends docs in 1K batches.
On Fri, Dec 15, 2023 at 7:11 PM Dmitri Maziuk
wrote:
Have you considered trying an existing document ingestion framework? I
wrote this one: https://github.com/nsoft/jesterj It already has a database
connector. If you do check it out and run into difficulty, please let me
know by leaving bug reports (if bug) or feedback (if confusion) in the
discussions section.
On 12/15/23 05:41, Vince McMahon wrote:
> Ishan, you are right. Doing multithreaded Indexing is going much faster.
> I found out after the remote machine became unresponsive very quickly; it
> crashed. lol.
FWIW I got better results posting docs in batches from a single thread.
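Dmitri's single-thread batching might look like the sketch below. The client is anything with a pysolr-style `add()`/`commit()`; the library, the field names, and the batch size of 1000 are assumptions, since the thread doesn't show his code.

```python
def batched(rows, size):
    """Yield rows in lists of at most `size` items."""
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) >= size:
            yield buf
            buf = []
    if buf:
        yield buf

def post_all(solr, rows, size=1000):
    """Post every batch from one thread, then commit exactly once."""
    for batch in batched(rows, size):
        solr.add(batch, commit=False)  # one HTTP POST per batch
    solr.commit()                      # single commit at the very end
```

Batching amortizes the per-request HTTP overhead, and committing once at the end avoids paying the cost of a Solr commit per batch.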
Ishan, you are right. Doing multithreaded indexing is going much faster.
I found out after the remote machine became unresponsive very quickly; it
crashed. lol.
On Fri, Dec 15, 2023 at 4:35 AM Vince McMahon
wrote:
> Ishan,
>
> How do you multi-thread?
Oh, Mikhail. Thanks for your questions. There are just 5000 distinct ids.
I'll check with the source side during office hours to fix their problem
and try again.
Thank you and you are a very wise man.
On Fri, Dec 15, 2023 at 4:32 AM Vince McMahon
wrote:
> Great questions. Here are some of the answers.
Ishan,
How do you multi-thread?
Secondly, could you please tell me what to look for in the log to confirm
that the indexing is committing 1000 documents at a time after executing
*solr.commit()?* Do you see anything that tells why it stops after 5000
rows, while about 15 rows were fetched?
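One way to get the visibility asked about here is to log from the client side instead of digging through Solr's output. A minimal sketch, assuming a pysolr-style client with `add()`/`commit()` (the helper names are invented for this example):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("indexer")

def add_batch(solr, batch, total_so_far):
    """Post one batch and log its size, so the batching cadence is visible."""
    solr.add(batch, commit=False)
    log.info("posted batch of %d docs (%d sent so far)", len(batch), total_so_far)

def commit_run(solr, total):
    """Commit once and record how many docs this run actually sent."""
    solr.commit()
    log.info("committed after %d docs", total)
```

If the client-side log says 5000 were sent but fewer show up in Solr, the gap is on the Solr side (e.g. duplicate ids collapsing on the uniqueKey) rather than in the fetch loop.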
Great questions. Here are some of the answers.
"Exit condition `if not rows:\n break` is not clear to me. Why should it
work?"
The exit condition is hit when postgres_query fetches nothing: "if not
rows" then breaks out of the while loop and the cursor is closed.
" Also, how many distinct ids
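The loop described in that answer can be sketched like so. This is a guess at the shape of the original code; `fetchmany` and the cursor follow the DB-API style of a driver such as psycopg2, which the thread implies but never shows:

```python
def fetch_all(cursor, batch_size=1000):
    """Drain a DB-API cursor in batches; exit when a fetch returns nothing."""
    docs = []
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:   # the exit condition: the fetch returned no rows
            break
        docs.extend(rows)
    cursor.close()     # close the cursor once the loop ends
    return docs
```

An empty list is falsy in Python, so `if not rows` fires exactly when the query is exhausted.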
NB: it's not easy to build a robust ETL from scratch (btw, have you asked
Copilot or ChatGPT for it?).
I spot a few oddities in the code, but they are not critical.
From the log I see (fwiw, you still have DEBUG logging enabled) that 1000
recs were added in 17 or so seconds. It makes some sense.
If you're able to do multithreaded indexing, it will go much faster.
On Thu, 14 Dec, 2023, 6:51 pm Vince McMahon,
wrote:
> Hi,
>
> I have written a custom Python program to index which may provide
> better control than DIH.
>
> But it is still indexing at most 5000 documents. I have enabled
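Ishan's multithreading suggestion could be sketched as below. Everything here is an assumption about the setup: the client factory stands in for a pysolr-style object, and each worker builds its own client in case the underlying HTTP session is not thread-safe.

```python
from concurrent.futures import ThreadPoolExecutor

def index_parallel(make_client, batches, workers=4):
    """Split pre-built batches across worker threads.

    make_client: factory returning a Solr-like client with add()/commit().
    batches: iterable of lists of docs.
    """
    def worker(batch):
        client = make_client()           # per-call client, avoids sharing
        client.add(batch, commit=False)
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent = sum(pool.map(worker, batches))
    make_client().commit()               # one commit after all threads finish
    return sent
```

As the thread also warns, more workers means more load on the Solr host, so the worker count should be raised gradually rather than set to "as many as possible".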