Hello,

1. Start with a small scope.
2. The UI may crash if you ask debug mode to send back a lot of verbose data; that is not the way to go.
3. Use curl or Postman to receive big responses (see the example below).
4. You can open the log file with less, which is a perfect tool for large logs. Reading logs in the UI is hardly an option.
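For example, something like this keeps a big debug response out of the browser entirely (the core name and log path below are just typical defaults; adjust for your install):

# fetch the verbose debug response into a file instead of rendering it in the UI
curl "http://localhost:8983/solr/mycore/dataimport?command=full-import&debug=true&verbose=true" > dih-debug.json

# page through a large solr.log without loading it all at once
less /var/solr/logs/solr.log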
On Mon, Dec 11, 2023 at 1:25 PM Vince McMahon <sippingonesandze...@gmail.com> wrote:

> Thanks for pointing that out.
>
> I work with relatively large data volumes. The last time I used verbose
> logging via the Solr UI, it crashed. What is a better way to gather the
> logs without crashing it?
>
> On Fri, Dec 8, 2023 at 5:11 PM Mikhail Khludnev <m...@apache.org> wrote:
>
> > Vince,
> > Regardless of DIH, LogUpdateProcessorFactory
> > <https://solr.apache.org/guide/solr/latest/configuration-guide/update-request-processors.html#update-processor-factories-you-should-not-modify-or-remove>
> > should log the deleteQuery which wiped the docs. You can enable verbose
> > logging and find out what happened.
> >
> > On Fri, Dec 8, 2023 at 4:29 PM Vince McMahon <sippingonesandze...@gmail.com> wrote:
> >
> > > Hi, ufuk
> > >
> > > I was thinking along the same lines, to broaden the choice of tools
> > > for handling the delta-load. Flume looks like an interesting option.
> > >
> > > I'm so blessed to be working with so many smart and kind people on
> > > this mailing list.
> > >
> > > Thank you. Happy Friday.
> > >
> > > On Fri, Dec 8, 2023 at 1:48 AM ufuk yılmaz <uyil...@vivaldi.net.invalid> wrote:
> > >
> > > > Hi Vince,
> > > >
> > > > It shouldn't take too much time to write a simple loop in your
> > > > favorite language which fetches rows from the db and sends them to
> > > > Solr over http to the /update handler. IMO it's easier than trying
> > > > to figure out DIH's particularities, especially if, in the future,
> > > > you need to modify the documents based on some logical conditions
> > > > before indexing.
> > > >
> > > > If you don't mind learning yet another tool, we used Apache Flume to
> > > > index data to Solr. It supports moving data from various sources
> > > > into various destinations. For your use case, maybe you can use SQL
> > > > as the source and MorphlineSolrSink as the destination (ctrl+f here:
> > > > https://flume.apache.org/releases/content/1.11.0/FlumeUserGuide.html).
> > > > There is an SQL source plugin here which looks a bit old but may work:
> > > > https://github.com/keedio/flume-ng-sql-source
> > > > You can also write your own source plugin. Flume just helps with
> > > > guaranteed delivery, if you understand its way of working.
> > > >
> > > > I don't know your business case, but I'd prefer the first option
> > > > most of the time.
> > > >
> > > > -ufuk yilmaz
> > > >
> > > > —
> > > >
> > > > > On 8 Dec 2023, at 02:22, Vince McMahon <sippingonesandze...@gmail.com> wrote:
> > > > >
> > > > > Thanks, Shawn.
> > > > >
> > > > > DIH full-import, by itself, works very well. It is a bummer that
> > > > > my incremental load itself is in the millions. When specifying
> > > > > batchSize on the data source, delta-import will honor that batch
> > > > > size once, for the first fetch, then will loop through the rest at
> > > > > hundreds per second. That doesn't help get all the indexing done
> > > > > in a day, which is what I need.
> > > > >
> > > > > I hope this finding may help the maintainers of the code improve
> > > > > it. It took me days to realize it.
> > > > >
> > > > > Thanks, again.
> > > > > On Thu, Dec 7, 2023, 4:49 PM Shawn Heisey <apa...@elyograg.org.invalid> wrote:
> > > > >
> > > > > > On 12/7/23 07:56, Vince McMahon wrote:
> > > > > > > {
> > > > > > >   "responseHeader": {
> > > > > > >     "status": 0,
> > > > > > >     "QTime": 0
> > > > > > >   },
> > > > > > >   "initArgs": [
> > > > > > >     "defaults",
> > > > > > >     [
> > > > > > >       "config",
> > > > > > >       "db-data-config.xml"
> > > > > > >     ]
> > > > > > >   ],
> > > > > > >   "command": "status",
> > > > > > >   "status": "idle",
> > > > > > >   "importResponse": "",
> > > > > > >   "statusMessages": {
> > > > > > >     "Total Requests made to DataSource": "1",
> > > > > > >     "Total Rows Fetched": "915000",
> > > > > > >     "Total Documents Processed": "915000",
> > > > > > >     "Total Documents Skipped": "0",
> > > > > > >     "Full Dump Started": "2023-12-07 02:54:29",
> > > > > > >     "": "Indexing completed. Added/Updated: 915000 documents. Deleted 0 documents.",
> > > > > > >     "Committed": "2023-12-07 02:54:51",
> > > > > > >     "Time taken": "0:0:21.831"
> > > > > > >   }
> > > > > > > }
> > > > > >
> > > > > > There's no way Solr can index 915000 docs in 21 seconds without a
> > > > > > LOT of threads in the indexing program, and DIH is single-threaded.
> > > > > > As you've already noted, it didn't actually index most of the
> > > > > > documents. I don't have an answer as to why it didn't work.
> > > > > >
> > > > > > DIH lacks decent logging, error handling, and multi-threading. It
> > > > > > is not the most reliable way to index. This is why it was
> > > > > > deprecated a while back and then removed from 9.x. You would be
> > > > > > far better off writing your own indexing program rather than
> > > > > > using DIH.
> > > > > >
> > > > > > I have an idea for a multi-threaded database->solr indexing
> > > > > > program, but haven't had much time to spend on it. If I can ever
> > > > > > get it done, it will be freely available.
> > > > > >
> > > > > > On the entity, "rows" is not a valid attribute. To control how
> > > > > > many DB rows are fetched at a time, set batchSize on the
> > > > > > dataSource element. The default batchSize is 500.
> > > > > >
> > > > > > Thanks,
> > > > > > Shawn
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev
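To make ufuk's first suggestion concrete, here is a minimal sketch of such a loop in Python. The table, columns, collection name, and connection string are all made up for illustration, and the commit strategy is deliberately simple:

import psycopg2  # assumption: the source database is PostgreSQL
import requests

SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"  # hypothetical collection
BATCH_SIZE = 1000

conn = psycopg2.connect("dbname=mydb user=indexer")  # hypothetical connection string
cur = conn.cursor(name="export")  # named (server-side) cursor, so the whole table is not pulled into memory
cur.execute("SELECT id, title FROM docs")  # hypothetical query

while True:
    rows = cur.fetchmany(BATCH_SIZE)
    if not rows:
        break
    docs = [{"id": r[0], "title": r[1]} for r in rows]
    # POST one batch of documents to the /update handler as a JSON array;
    # commitWithin lets Solr batch up commits instead of committing per request
    resp = requests.post(SOLR_UPDATE_URL, json=docs, params={"commitWithin": 60000})
    resp.raise_for_status()

# final hard commit so the last batch becomes searchable
requests.post(SOLR_UPDATE_URL, params={"commit": "true"}, json=[])
conn.close()

This loop is also the natural place to add the per-document transformations ufuk mentions, which is hard to do cleanly inside DIH.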
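And for anyone who stays on DIH for now, Shawn's batchSize tip goes on the dataSource element in db-data-config.xml, something like this (the driver, URL, and credentials are placeholders):

<dataSource type="JdbcDataSource"
            driver="org.postgresql.Driver"
            url="jdbc:postgresql://dbhost:5432/mydb"
            user="indexer"
            password="secret"
            batchSize="10000"/>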