Hello,

1. Start with a small scope.
2. The UI may crash if you ask debug mode to send back a lot of verbose data; that is not the way to go.
3. Use curl or Postman to receive big responses (see the example below).
4. You can open the log file with less, which is a perfect tool for large logs. Reading logs in the UI is hardly an option.
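For example, something like this keeps a big debug response out of the browser entirely (the core name and log path below are just typical defaults; adjust for your install):

# fetch the verbose debug response into a file instead of rendering it in the UI
curl "http://localhost:8983/solr/mycore/dataimport?command=full-import&debug=true&verbose=true" > dih-debug.json

# page through a large solr.log without loading it all at once
less /var/solr/logs/solr.log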
On Mon, Dec 11, 2023 at 1:25 PM Vince McMahon <sippingonesandze...@gmail.com> wrote:

> Thanks for pointing that out.
>
> I work with relatively large data volumes. The last time I used verbose
> logging via the Solr UI, it crashed. What is a better way to gather the
> logs without crashing it?
>
> On Fri, Dec 8, 2023 at 5:11 PM Mikhail Khludnev <m...@apache.org> wrote:
>
> > Vince,
> > Regardless of DIH, LogUpdateProcessorFactory
> > <https://solr.apache.org/guide/solr/latest/configuration-guide/update-request-processors.html#update-processor-factories-you-should-not-modify-or-remove>
> > should log the deleteQuery which wiped the docs. You can enable verbose
> > logging and find out what happened.
> >
> > On Fri, Dec 8, 2023 at 4:29 PM Vince McMahon <sippingonesandze...@gmail.com> wrote:
> >
> > > Hi, ufuk
> > >
> > > I was thinking along the same lines, to broaden the choice of tools
> > > for handling the delta-load. Flume looks like an interesting option.
> > >
> > > I'm so blessed to be working with so many smart and kind people on
> > > this mailing list.
> > >
> > > Thank you. Happy Friday.
> > >
> > > On Fri, Dec 8, 2023 at 1:48 AM ufuk yılmaz <uyil...@vivaldi.net.invalid> wrote:
> > >
> > > > Hi Vince,
> > > >
> > > > It shouldn't take too much time to write a simple loop in your
> > > > favorite language which fetches rows from the db and sends them to
> > > > Solr over http to the /update handler. IMO it's easier than trying
> > > > to figure out DIH's particularities, especially if, in the future,
> > > > you need to modify the documents based on some logical conditions
> > > > before indexing.
> > > >
> > > > If you don't mind learning yet another tool, we used Apache Flume to
> > > > index data to Solr. It supports moving data from various sources
> > > > into various destinations. For your use case, maybe you can use SQL
> > > > as the source and MorphlineSolrSink as the destination (ctrl+f here:
> > > > https://flume.apache.org/releases/content/1.11.0/FlumeUserGuide.html).
> > > > There is an SQL source plugin here which looks a bit old but may work:
> > > > https://github.com/keedio/flume-ng-sql-source
> > > > You can also write your own source plugin. Flume just helps with
> > > > guaranteed delivery, if you understand its way of working.
> > > >
> > > > I don't know your business case, but I'd prefer the first option
> > > > most of the time.
> > > >
> > > > -ufuk yilmaz
> > > >
> > > > —
> > > >
> > > > > On 8 Dec 2023, at 02:22, Vince McMahon <sippingonesandze...@gmail.com> wrote:
> > > > >
> > > > > Thanks, Shawn.
> > > > >
> > > > > DIH full-import, by itself, works very well. It is a bummer that
> > > > > my incremental load itself is in the millions. When specifying
> > > > > batchSize on the data source, delta-import will honor that batch
> > > > > size once, for the first fetch, then will loop through the rest at
> > > > > hundreds per second. That doesn't help get all the indexing done
> > > > > in a day, which is what I need.
> > > > >
> > > > > I hope this finding may help the maintainers of the code improve
> > > > > it. It took me days to realize it.
> > > > >
> > > > > Thanks, again.
> > > > > On Thu, Dec 7, 2023, 4:49 PM Shawn Heisey <apa...@elyograg.org.invalid> wrote:
> > > > >
> > > > > > On 12/7/23 07:56, Vince McMahon wrote:
> > > > > > > {
> > > > > > >   "responseHeader": {
> > > > > > >     "status": 0,
> > > > > > >     "QTime": 0
> > > > > > >   },
> > > > > > >   "initArgs": [
> > > > > > >     "defaults",
> > > > > > >     [
> > > > > > >       "config",
> > > > > > >       "db-data-config.xml"
> > > > > > >     ]
> > > > > > >   ],
> > > > > > >   "command": "status",
> > > > > > >   "status": "idle",
> > > > > > >   "importResponse": "",
> > > > > > >   "statusMessages": {
> > > > > > >     "Total Requests made to DataSource": "1",
> > > > > > >     "Total Rows Fetched": "915000",
> > > > > > >     "Total Documents Processed": "915000",
> > > > > > >     "Total Documents Skipped": "0",
> > > > > > >     "Full Dump Started": "2023-12-07 02:54:29",
> > > > > > >     "": "Indexing completed. Added/Updated: 915000 documents. Deleted 0 documents.",
> > > > > > >     "Committed": "2023-12-07 02:54:51",
> > > > > > >     "Time taken": "0:0:21.831"
> > > > > > >   }
> > > > > > > }
> > > > > >
> > > > > > There's no way Solr can index 915000 docs in 21 seconds without a
> > > > > > LOT of threads in the indexing program, and DIH is single-threaded.
> > > > > > As you've already noted, it didn't actually index most of the
> > > > > > documents. I don't have an answer as to why it didn't work.
> > > > > >
> > > > > > DIH lacks decent logging, error handling, and multi-threading. It
> > > > > > is not the most reliable way to index. This is why it was
> > > > > > deprecated a while back and then removed from 9.x. You would be
> > > > > > far better off writing your own indexing program rather than
> > > > > > using DIH.
> > > > > >
> > > > > > I have an idea for a multi-threaded database->solr indexing
> > > > > > program, but haven't had much time to spend on it. If I can ever
> > > > > > get it done, it will be freely available.
> > > > > >
> > > > > > On the entity, "rows" is not a valid attribute. To control how
> > > > > > many DB rows are fetched at a time, set batchSize on the
> > > > > > dataSource element. The default batchSize is 500.
> > > > > >
> > > > > > Thanks,
> > > > > > Shawn
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev
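To make ufuk's first suggestion concrete, here is a minimal sketch of such a loop in Python. The table, columns, collection name, and connection string are all made up for illustration, and the commit strategy is deliberately simple:

import psycopg2  # assumption: the source database is PostgreSQL
import requests

SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"  # hypothetical collection
BATCH_SIZE = 1000

conn = psycopg2.connect("dbname=mydb user=indexer")  # hypothetical connection string
cur = conn.cursor(name="export")  # named (server-side) cursor, so the whole table is not pulled into memory
cur.execute("SELECT id, title FROM docs")  # hypothetical query

while True:
    rows = cur.fetchmany(BATCH_SIZE)
    if not rows:
        break
    docs = [{"id": r[0], "title": r[1]} for r in rows]
    # POST one batch of documents to the /update handler as a JSON array;
    # commitWithin lets Solr batch up commits instead of committing per request
    resp = requests.post(SOLR_UPDATE_URL, json=docs, params={"commitWithin": 60000})
    resp.raise_for_status()

# final hard commit so the last batch becomes searchable
requests.post(SOLR_UPDATE_URL, params={"commit": "true"}, json=[])
conn.close()

This loop is also the natural place to add the per-document transformations ufuk mentions, which is hard to do cleanly inside DIH.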
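And for anyone who stays on DIH for now, Shawn's batchSize tip goes on the dataSource element in db-data-config.xml, something like this (the driver, URL, and credentials are placeholders):

<dataSource type="JdbcDataSource"
            driver="org.postgresql.Driver"
            url="jdbc:postgresql://dbhost:5432/mydb"
            user="indexer"
            password="secret"
            batchSize="10000"/>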