Re: MultipleAdditiveTreeModel
Hi Spyros, Roopa, if you can create the Jira ticket with all the details you gathered, that would be much appreciated. If you tag me, Christine Poerschke, and Diego Ceccarelli at least, we'll take over from there! Thanks! -- Alessandro Benedetti Apache Lucene/Solr Committer Director, R&D Software Engineer, Search Consultant www.sease.io On Mon, 26 Jul 2021 at 21:29, Spyros Kapnissis wrote: > Hi Alessandro, Roopa, I also agree that this issue should be investigated further and fixed. Please let me know if you need any help opening the Jira ticket and providing more details. > On Mon, Jul 26, 2021, 21:04 Roopa Rao wrote: > > Hi Alessandro, I haven't created a JIRA for this. We solved it in a similar way to what Spyros described, by changing the threshold in the model. Yes, it would be good to understand why the SLACK is added. Thanks, Roopa > > On Mon, Jul 26, 2021 at 10:52 AM Alessandro Benedetti <a.benede...@sease.io> wrote: > > > I didn't get any additional notification (or maybe I missed it). Has the Jira been created yet? Boolean features are quite common in Learning To Rank use cases. I do believe this contribution can be useful. If you don't have time to create the Jira or contribute the pull request, no worries, just let us know and we (committers) will organize to do it. Thanks for your help. Without the effort of our users, Apache Solr wouldn't be the same. Cheers -- Alessandro Benedetti > > > On Fri, 16 Jul 2021 at 20:29, Roopa Rao wrote: > > > > Spyros, thank you for verifying this. We are planning to do something similar.
> > > > Thanks, Roopa > > > > On Fri, Jul 16, 2021 at 12:09 PM Spyros Kapnissis wrote: > > > > > Hello, Just to confirm this: we came across the exact same issue when converting an XGBoost model to MultipleAdditiveTrees. The issue was specific to categorical features that take integer values. We ended up subtracting 0.5 from the threshold value at every such split point in the converted model, so that it would output the same score as the input model. > > > > > On Fri, Jul 16, 2021, 18:19 Roopa Rao wrote: > > > > > > Okay, thank you for the input. Roopa > > > > > > On Fri, Jul 16, 2021 at 5:55 AM Alessandro Benedetti <a.benede...@sease.io> wrote: > > > > > > > Hi Roopa, I was not able to find out why that slack was added. I am not sure why we would want to change the threshold. I would recommend creating a Jira issue and tagging at least myself, Christine Poerschke and Diego Ceccarelli, so we can discuss it and potentially open a pull request. Cheers -- Alessandro Benedetti > > > > > > > On Thu, 15 Jul 2021 at 22:24, Roopa Rao wrote: > > > > > > > > Hi All, In LTR, what is the purpose of adding NODE_SPLIT_SLACK to the threshold in MultipleAdditiveTreesModel?
> > > > > > > > Reference: org.apache.solr.ltr.model.MultipleAdditiveTreesModel private static final float NODE_SPLIT_SLACK = 1E-6f; public void setThreshold(float threshold) { this.threshold = threshold + NODE_SPLIT_SLACK; } We have a feature which can return 0.0 or 1.0, and a model with this tree: is_xyz_feature,threshold=0.9994,left=0.0010180053,right=-0.0057609854 However, when Solr actually scores it, it evaluates it as follows: is_xyz_feature:1.0<= 1.01, Go Left So it always goes left, which is incorrect. Thanks, Roopa
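A minimal sketch of the mismatch described in this thread, written in plain Java rather than with Solr's actual classes. It assumes, as the quoted explain output ("<=", "Go Left") suggests, that the Solr-side node sends a value left when it is <= the stored threshold (the model's threshold plus NODE_SPLIT_SLACK, as in the quoted setThreshold), and that the trainer (for example XGBoost) sends a value left only when it is strictly < the split value. The class name, method names and the split value 1.0 are hypothetical.

```java
/**
 * Hypothetical sketch (not Solr's code) of how a "<=" comparison plus
 * NODE_SPLIT_SLACK treats a 0/1 feature, compared with a trainer that uses "<".
 */
public class SplitSlackDemo {
    // Value quoted in the thread from MultipleAdditiveTreesModel.
    static final float NODE_SPLIT_SLACK = 1E-6f;

    // Assumption: Solr-side node goes left when value <= (threshold + slack).
    static boolean solrGoesLeft(float value, float modelThreshold) {
        float threshold = modelThreshold + NODE_SPLIT_SLACK; // mirrors setThreshold
        return value <= threshold;
    }

    // Assumption: the trainer goes left only when value < split.
    static boolean trainerGoesLeft(float value, float split) {
        return value < split;
    }

    // Workaround described by Spyros: for integer-valued features, export the
    // split point 0.5 lower so both comparisons separate the values identically.
    static float adjustForIntegerFeature(float split) {
        return split - 0.5f;
    }

    public static void main(String[] args) {
        float split = 1.0f; // hypothetical trainer split on a 0/1 feature
        for (float value : new float[] {0.0f, 1.0f}) {
            System.out.printf("value=%.1f trainer=%s solr(raw)=%s solr(adjusted)=%s%n",
                value,
                trainerGoesLeft(value, split) ? "left" : "right",
                solrGoesLeft(value, split) ? "left" : "right",
                solrGoesLeft(value, adjustForIntegerFeature(split)) ? "left" : "right");
        }
    }
}
```

With the raw split, the value 1.0 goes right in the trainer but left in the sketch of Solr's node; after the 0.5 shift both agree. The shift works because any split strictly between two adjacent integers partitions them the same way under both < and <=, so it only makes sense for features that really take integer values.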
Configuring Solr JSON logs output to file using JsonLayout in log4j2.xml file
Hello, does anyone have a working Solr instance with JSON log format, using JsonLayout in log4j2.xml? Please share your configuration! I have jackson-core and the other Jackson packages inside the Solr folder:
./server/solr-webapp/webapp/WEB-INF/lib/jackson-core-2.11.2.jar
./server/solr-webapp/webapp/WEB-INF/lib/jackson-databind-2.11.2.jar
./server/solr-webapp/webapp/WEB-INF/lib/jackson-annotations-2.11.2.jar
./server/solr-webapp/webapp/WEB-INF/lib/jackson-dataformat-smile-2.11.2.jar
But the log file isn't even created, and I don't see any errors about this in the output! If I switch JsonLayout back to PatternLayout, everything works fine. I found a similar problem on the mailing list here https://www.mail-archive.com/solr-user@lucene.apache.org/msg152191.html but it is still without a solution. Can anybody help me with this? Maybe I need to add some dependencies manually in some Solr config file, or copy the library files to some other folder? Thanks! -- Best regards, Alexey Murz Korepov. E-mail: mur...@gmail.com Messengers: Matrix - https://matrix.to/#/@murz:ru-matrix.org Telegram - @MurzNN
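Log4j2's JSON layout depends on Jackson (jackson-core, jackson-databind, jackson-annotations) at runtime, and in Java those classes have to be resolvable from the classloader that loaded Log4j2 itself. Jars under the webapp's WEB-INF/lib are not necessarily visible to that loader. Whether this explains the missing log file is an assumption worth checking against your Solr layout; nothing in the thread confirms it. A small, hypothetical diagnostic: resolve JsonLayout, then ask its classloader whether one representative class per Jackson jar is visible. Run it with the same jars on the classpath that Solr uses to load Log4j2.

```java
/** Hypothetical diagnostic: can the loader that loaded Log4j2's JsonLayout see Jackson? */
public class JsonLayoutClasspathCheck {
    // One representative class per Jackson jar the JSON layout relies on.
    static final String[] JACKSON_CLASSES = {
        "com.fasterxml.jackson.core.JsonFactory",       // jackson-core
        "com.fasterxml.jackson.databind.ObjectMapper",  // jackson-databind
        "com.fasterxml.jackson.annotation.JsonProperty" // jackson-annotations
    };

    /** True if className resolves through the given loader (without initializing it). */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader log4jLoader;
        try {
            // Find the loader that actually owns the layout class.
            log4jLoader = Class.forName("org.apache.logging.log4j.core.layout.JsonLayout")
                .getClassLoader();
        } catch (ClassNotFoundException e) {
            System.out.println("log4j-core is not on this classpath");
            return;
        }
        for (String name : JACKSON_CLASSES) {
            System.out.println(name + " visible to Log4j2's loader: " + isVisible(name, log4jLoader));
        }
    }
}
```

If any line prints false, the layout cannot be built from that loader, which would fit a silently missing appender; the fix would then be about where the Jackson jars live, not about log4j2.xml itself.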
Re: How to integrate solr search with sharepoint
Hello Navnath, The Solr Users list (which I have redirected this message to) is a better place for this question, since you are asking about using Solr. The dev list is for developers to discuss releases, architecture issues, and things related to the Solr code. - Houston On Tue, Jul 27, 2021 at 12:40 AM Navnath Namde wrote: > Respected Sir/Madam, I am new to Solr search. I need to connect my SharePoint site to Solr and implement search. I have SharePoint Online and SharePoint on-premises 2013. How can I connect them? Can anyone please help me with this? Many thanks in advance! Regards, Navnath Namde > This message contains information that may be privileged or confidential and is the property of the KPIT Technologies Ltd. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. KPIT Technologies Ltd. does not accept any liability for virus infected mails.
RE: [EXTERNAL] Re: How to integrate solr search with sharepoint
I did something similar to this a few years ago for SharePoint 2016, so I'm not sure whether this applies. Anyway, the thing to keep in mind is that, as far as I know, there isn't a direct plugin connecting Solr and SharePoint. What I had to do was use a search crawler like Nutch to crawl the SharePoint site and index it into Solr. Once the data was in Solr, I could formulate my search queries.
Steps:
1. Create a Solr collection for the SharePoint data
2. Crawl the SharePoint site using a crawler such as Nutch
3. Verify that the crawler has indexed all of the SharePoint data you need
4. Formulate your search queries using the Solr APIs
Things to keep in mind:
- Depending on how locked down your SharePoint site is, you may struggle to give your crawler access to the data. For example, I had to create a proxy to allow the crawler to access the SharePoint data.
- You'll have to think about how you will restrict searches made by users if you are dealing with sensitive data or data that isn't supposed to be viewed by all searchers.
I hope this information helps! Good luck, Ryan Rosario -----Original Message----- From: Houston Putman Sent: Tuesday, July 27, 2021 11:51 AM To: users@solr.apache.org; navnath.na...@kpit.com Subject: [EXTERNAL] Re: How to integrate solr search with sharepoint Hello Navanth, The Solr Users list (which I have redirected this message to), is a better place for this question, since you are asking about using Solr. The dev list is for developers to discuss releases, architecture issues, and things related to the Solr code. - Houston On Tue, Jul 27, 2021 at 12:40 AM Navnath Namde wrote: > Respected Sir/Madam, > > > > I am new into this solr search, I need to connect my sharepoint site > with solr search and implement the search. > > > > I have sharepoint online and sharepoint on-premises 2013. > > > > How can I connect ? Can anyone please help me out into this? > > > > Many Thanks in advance!
> > > > Regards, > > Navnath Namde >
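Step 4 and the point about restricting searches can be combined in a single query. A minimal sketch, assuming the crawler stored each document's SharePoint permissions in a hypothetical multi-valued field named acl_groups, that the collection is called sharepoint, and that Solr runs at the default local URL (all three are assumptions, not from the thread). The user's text goes into q and the permission check into a filter query (fq), which restricts the result set without affecting scoring.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: build a /select URL that only returns docs the user's groups may see. */
public class SharePointQuerySketch {
    static URI buildSelectUri(String solrBaseUrl, String collection,
                              String userQuery, List<String> userGroups) {
        if (userGroups.isEmpty()) {
            // Refuse rather than emit an invalid "acl_groups:()" or an unrestricted query.
            throw new IllegalArgumentException("user has no groups");
        }
        // e.g. acl_groups:("hr" OR "finance"); "acl_groups" is a hypothetical field name.
        String aclFilter = userGroups.stream()
            .map(SharePointQuerySketch::quote)
            .collect(Collectors.joining(" OR ", "acl_groups:(", ")"));
        String params = "q=" + encode(userQuery)
            + "&fq=" + encode(aclFilter)
            + "&wt=json";
        return URI.create(solrBaseUrl + "/" + collection + "/select?" + params);
    }

    /** Wraps a value in double quotes, escaping backslashes and quotes for the query parser. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** URL-encodes a query parameter value (spaces become '+'). */
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        URI uri = buildSelectUri("http://localhost:8983/solr", "sharepoint",
            "quarterly report", List.of("hr", "finance"));
        System.out.println(uri);
    }
}
```

Running main prints http://localhost:8983/solr/sharepoint/select?q=quarterly+report&fq=acl_groups%3A%28%22hr%22+OR+%22finance%22%29&wt=json. Where the user's group list comes from (SharePoint itself, a directory service, etc.) is outside this sketch, and the approach only works if the crawler actually indexes permission data alongside the content.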
Help with unsubscribe because automated didn't work
Hello, I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu, and kmcc...@uw.cse.edu from the listserv (any address with "kmccok@"). I am no longer able to send email from those addresses, but I would like to leave forwarding turned on; however, the Solr listserv emails are being forwarded as well. I have been getting the Solr emails since 2014 and I am no longer interested. I tried to unsubscribe by emailing users-unsubscr...@solr.apache.org, but that did not work. Thank you so much, Katie
Re: Help with unsubscribe because automated didn't work
Hi Katie, I've unsubscribed those three addresses from the users@solr mailing list. Please reach out if you continue to receive emails. On Tue, Jul 27, 2021 at 12:17 PM kmccork wrote: > Hello, > > I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu, > kmcc...@uw.cse.edu from the listserv. Any email with "kmccok@". > > Hello, > > I am no longer able to send an email out from those emails, but I would > like to leave the forwarding turned on, however the solr listserv is > forwarding. I have been getting the solr emails since 2014 and I am no > longer interested. > > I tried unsubscribing from this email by emailing > users-unsubscr...@solr.apache.org, however that did not work. > > Thank you so much, > Katie >
Re: Help with unsubscribe because automated didn't work
Hi Anshum, Thank you so much!!! Literally have been trying to do this since 2014. (Obviously gave up for a few years there. haha) I'll miss y'all!! Katie On Tue, Jul 27, 2021 at 12:36 PM Anshum Gupta wrote: > Hi Katie, > > I've unsubscribed those three addresses from the users@solr mailing list. > Please reach out if you continue to receive emails. > > On Tue, Jul 27, 2021 at 12:17 PM kmccork wrote: > >> Hello, >> >> I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu, >> kmcc...@uw.cse.edu from the listserv. Any email with "kmccok@". >> >> Hello, >> >> I am no longer able to send an email out from those emails, but I would >> like to leave the forwarding turned on, however the solr listserv is >> forwarding. I have been getting the solr emails since 2014 and I am no >> longer interested. >> >> I tried unsubscribing from this email by emailing >> users-unsubscr...@solr.apache.org, however that did not work. >> >> Thank you so much, >> Katie >> >
Re: Commit strategy for Heavy Bulk Indexing into solr
So it looks like I have narrowed down where the problem is and have also found a workaround, but I would like to understand more. As I mentioned, we have two stages in our bulk indexing operation. Stage 1: index Article documents [A1, A2, ..., An]. Stage 2: index Article documents with children [A1 with children, A2 with children, ..., An with children]. We always ran into issues in stage 2. After some time in stage 2, *solrClient.add( , commitWithin )* starts to time out, and from then on the timeouts happen consistently. Even the socketTimeout of 30 mins was exceeded by the add call, and we got a SocketTimeoutException. We have set commitWithin to 6 hours to avoid unnecessary soft commits. The auto commit interval is 1 min with openSearcher=false, and the autoSoftCommit interval is 5 min. As mentioned above, we first index just the Articles in stage 1, and then in stage 2 the same set of Articles is indexed with children (block join). I suspected that the huge amount of time taken by the *solrClient.add* call had something to do with the *block join updates* that take place in stage 2. Adding fresh Article-with-children blocks to an empty collection was much faster and ran without a SocketTimeout. So I modified our indexing pipeline as follows:
1. Stage 1: index Article documents [A1, A2, ..., An]
2. Delete all the Article documents
3. Stage 2: index Article documents with children [A1 with children, A2 with children, ..., An with children]
With this change, stage 2 is a simple *add operation and not an update operation.* I tested the bulk indexing with this change, and it finished successfully, without any issues, in a shorter time!
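The reordered pipeline can be sketched as follows. IndexClient, Doc, FakeClient, the type field and the query type:Article are hypothetical placeholders so the example runs without SolrJ; in real SolrJ code the three calls would correspond to SolrClient's add (with a commitWithin argument), deleteByQuery and commit methods. Batches are sent sequentially here for simplicity, whereas the thread describes parallel RxJava calls, and the in-memory fake only counts how many adds would have replaced an existing id; it does not model Solr's performance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "index parents, delete them, re-index as blocks". */
public class StagedReindexSketch {
    /** Stand-in for a Solr client; names mirror SolrJ's add/deleteByQuery/commit. */
    interface IndexClient {
        void add(List<Doc> docs, int commitWithinMs);
        void deleteByQuery(String query);
        void commit();
    }

    /** Parent document with optional children (block-join style). */
    record Doc(String id, String type, List<Doc> children) {
        static Doc article(String id) { return new Doc(id, "Article", List.of()); }
        Doc withChildren(List<Doc> kids) { return new Doc(id, type, kids); }
    }

    static final int BATCH_SIZE = 1000; // roughly the per-add batch size in the thread

    static void run(IndexClient client, List<Doc> articles,
                    Map<String, List<Doc>> childrenById, int commitWithinMs) {
        addInBatches(client, articles, commitWithinMs);       // stage 1: bare Articles
        client.deleteByQuery("type:Article");                  // hypothetical field/value
        client.commit();
        List<Doc> blocks = new ArrayList<>();
        for (Doc a : articles) {                               // stage 2: Articles + children
            blocks.add(a.withChildren(childrenById.getOrDefault(a.id(), List.of())));
        }
        addInBatches(client, blocks, commitWithinMs);          // now plain adds, not replacements
        client.commit();
    }

    static void addInBatches(IndexClient client, List<Doc> docs, int commitWithinMs) {
        for (int i = 0; i < docs.size(); i += BATCH_SIZE) {
            client.add(docs.subList(i, Math.min(i + BATCH_SIZE, docs.size())), commitWithinMs);
        }
    }

    /** In-memory fake that counts adds which would have replaced an existing id. */
    static class FakeClient implements IndexClient {
        final Map<String, Doc> index = new LinkedHashMap<>();
        int replacements = 0;
        public void add(List<Doc> docs, int commitWithinMs) {
            for (Doc d : docs) {
                if (index.put(d.id(), d) != null) replacements++;
            }
        }
        public void deleteByQuery(String query) {
            // Only understands the single query used in run().
            if (query.equals("type:Article")) index.values().removeIf(d -> d.type().equals("Article"));
        }
        public void commit() { }
    }

    public static void main(String[] args) {
        FakeClient fake = new FakeClient();
        run(fake, List.of(Doc.article("A1"), Doc.article("A2")),
            Map.of("A1", List.of(new Doc("A1-k1", "ArticleKeyword", List.of()))),
            6 * 60 * 60 * 1000); // commitWithin of 6 hours, as in the thread
        System.out.println("docs=" + fake.index.size() + " replacements=" + fake.replacements);
    }
}
```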
It would be very helpful to know the difference between A: adding a document with children when the collection does not already have that document, and B: adding a document with children when the collection already has the same document without children. I understand that an *update* takes place in B, but how can we explain such a difference in performance between A and B? Please note that we use RxJava and call solrClient.add() in parallel threads with a set of Article documents, and the socketTimeout issue seems to pop up after we have already indexed about 90% of the documents. Some more clarity on what could be happening would be very useful. Thanks On Fri, Jul 23, 2021 at 2:31 PM Pratik Patel wrote: > Hi All, *tl;dr*: running into long GC pauses and Solr client socket timeouts when bulk indexing documents into Solr. The commit strategy, in essence, is to do hard commits every 50k documents (maxDocs=50k) and to disable soft commits altogether during bulk indexing. Simple SolrCloud setup with one node and one shard. *Details*: We have about 6 million documents which we are trying to index into Solr. Of these, about 500k documents have a text field which holds abstracts of scientific papers/Articles. We extract keywords from these abstracts and index these keywords into Solr as well. We have a many-to-many relationship between Articles and keywords. To store this, we have the following structure: Article documents, Keyword documents, and Article-Keyword Join documents. We use block join to index Articles with "Article-Keyword" join documents, and Keyword documents are indexed independently. In other words, we have blocks of "Article + Article-Keyword Joins", and we have Keyword documents (they hold some additional metadata about the keyword). We have a bulk processing operation which creates these documents and indexes them into Solr.
> During this bulk indexing, we don't need the documents to be searchable. We need to search against them only after ALL the documents are indexed. *Based on this, this is our current strategy:* Soft commits are disabled, and hard commits are done every 50k documents with openSearcher=false. Our code triggers explicit commits 4 times, after various stages of bulk indexing. Transaction logs are enabled and have default settings.
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:-1}</maxTime>
    <maxDocs>${solr.autoCommit.maxDocs:5}</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
Other environmental details: Xms=8g and Xmx=14g, Solr client socketTimeout=7 minutes and zkClientTimeout=2 mins. Our indexing operation triggers many "add" operations in parallel using RxJava (15 to 30 threads), and each "add" operation is passed about 1000 documents. Currently, when we run this indexing operation, we notice that after a while Solr goes into long GC pauses (longer than our socketTimeout of 7 minutes) and we get SocketTimeoutExceptions. *What could be causing such long GC pauses?* *Does this commit strategy make sense? If not, what is the recommended strategy that we can look into?* *Any help on this is much appreciated. Thanks.*
Re: Help with unsubscribe because automated didn't work
Can the same be done for my email address jagpreetkhan...@gmail.com as well please Thanks Sent from my iPhone On Jul 27, 2021, at 20:36, Anshum Gupta wrote: Hi Katie, I've unsubscribed those three addresses from the users@solr mailing list. Please reach out if you continue to receive emails. On Tue, Jul 27, 2021 at 12:17 PM kmccork wrote: > Hello, > > I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu, > kmcc...@uw.cse.edu from the listserv. Any email with "kmccok@". > > Hello, > > I am no longer able to send an email out from those emails, but I would > like to leave the forwarding turned on, however the solr listserv is > forwarding. I have been getting the solr emails since 2014 and I am no > longer interested. > > I tried unsubscribing from this email by emailing > users-unsubscr...@solr.apache.org, however that did not work. > > Thank you so much, > Katie >
Re: Help with unsubscribe because automated didn't work
Done. On Tue, Jul 27, 2021 at 1:36 PM Jagpreet Mahajan wrote: > Can the same be done for my email address jagpreetkhan...@gmail.com as > well please > > Thanks > > Sent from my iPhone > > On Jul 27, 2021, at 20:36, Anshum Gupta wrote: > > Hi Katie, > > I've unsubscribed those three addresses from the users@solr mailing list. > Please reach out if you continue to receive emails. > > On Tue, Jul 27, 2021 at 12:17 PM kmccork wrote: > > > Hello, > > > > I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu, > > kmcc...@uw.cse.edu from the listserv. Any email with "kmccok@". > > > > Hello, > > > > I am no longer able to send an email out from those emails, but I would > > like to leave the forwarding turned on, however the solr listserv is > > forwarding. I have been getting the solr emails since 2014 and I am no > > longer interested. > > > > I tried unsubscribing from this email by emailing > > users-unsubscr...@solr.apache.org, however that did not work. > > > > Thank you so much, > > Katie > > > -- Anshum Gupta