date:20210727

Re: MultipleAdditiveTreeModel

2021-07-27 Thread Alessandro Benedetti

Hi Spyros, Roopa,
if you can create the Jira ticket with all the details you gathered, that
would be much appreciated.
If you tag me, Christine Poerschke, and Diego Ceccarelli at least, we'll
take over from there!
Thanks!
--
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io
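
The behaviour reported in the quoted messages below can be reproduced in isolation. The following is only a minimal sketch, not the Solr class itself: `SplitNode` and its members are hypothetical names, the "values <= threshold go left" rule is an assumption taken from the "Go Left" line in the original report, and the threshold 0.9999995 is an assumed value just below 1.0 (the value in the report is truncated in this archive).

```java
/**
 * Illustration of one regression-tree split with and without the slack.
 * SplitNode is a hypothetical stand-in, not org.apache.solr.ltr code.
 */
final class SplitNode {
    // Same constant as in the snippet quoted in this thread.
    static final float NODE_SPLIT_SLACK = 1E-6f;

    final float threshold;
    final float left;
    final float right;

    SplitNode(float modelThreshold, float left, float right, boolean addSlack) {
        // With addSlack=true this mirrors the quoted setThreshold(...) body.
        this.threshold = addSlack ? modelThreshold + NODE_SPLIT_SLACK : modelThreshold;
        this.left = left;
        this.right = right;
    }

    // Assumption: feature values <= threshold take the left branch.
    float score(float featureValue) {
        return featureValue <= threshold ? left : right;
    }

    public static void main(String[] args) {
        float assumedThreshold = 0.9999995f; // assumed, just below 1.0
        float left = 0.0010180053f;          // leaf values from the report
        float right = -0.0057609854f;

        SplitNode withSlack = new SplitNode(assumedThreshold, left, right, true);
        SplitNode noSlack = new SplitNode(assumedThreshold, left, right, false);
        // Workaround described by Spyros: shift the split point down by 0.5.
        SplitNode shifted = new SplitNode(assumedThreshold - 0.5f, left, right, true);

        System.out.println("with slack, feature=1.0 -> " + withSlack.score(1.0f));
        System.out.println("no slack,   feature=1.0 -> " + noSlack.score(1.0f));
        System.out.println("shifted,    feature=1.0 -> " + shifted.score(1.0f));
        System.out.println("shifted,    feature=0.0 -> " + shifted.score(0.0f));
    }
}
```

With these assumed values, adding the slack pushes the effective threshold to at least 1.0, so a boolean feature value of 1.0 takes the left branch; without the slack, or with the split point shifted down by 0.5, it takes the right branch while 0.0 still goes left.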


On Mon, 26 Jul 2021 at 21:29, Spyros Kapnissis  wrote:

> Hi Alessandro, Roopa, I also agree that this issue should be further
> investigated and fixed. Please let me know if you need any help opening the
> Jira ticket and providing more details.
>
> On Mon, Jul 26, 2021, 21:04 Roopa Rao  wrote:
>
> > Hi Alessandro,
> > I haven't created a Jira ticket for this; we solved it in a similar way to
> > what Spyros described, by changing the threshold in the model.
> > Yes, it would be good to understand why the SLACK is added.
> >
> > Thanks,
> > Roopa
> >
> > On Mon, Jul 26, 2021 at 10:52 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> >
> > > I didn't get any additional notification (or maybe I missed it).
> > > Has the Jira been created yet?
> > > Boolean features are quite common around Learning To Rank use cases.
> > > I do believe this contribution can be useful.
> > > If you don't have time to create the Jira or contribute the pull request,
> > > no worries, just let us know and we (committers) will organize to do it.
> > > Thanks for your help. Without the effort of our users, Apache Solr wouldn't
> > > be the same.
> > > Cheers
> > > --
> > > Alessandro Benedetti
> > > Apache Lucene/Solr Committer
> > > Director, R&D Software Engineer, Search Consultant
> > >
> > > www.sease.io
> > >
> > >
> > > On Fri, 16 Jul 2021 at 20:29, Roopa Rao  wrote:
> > >
> > > > Spyros, thank you for verifying this, we are planning to do something
> > > > similar.
> > > >
> > > > Thanks,
> > > > Roopa
> > > >
> > > > On Fri, Jul 16, 2021 at 12:09 PM Spyros Kapnissis 
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > > Just to verify this, we came across the exact same issue when
> > > > > > converting an XGBoost model to MultipleAdditiveTrees. This was an issue
> > > > > > specifically with the categorical features that take on integer values. We
> > > > > > ended up subtracting 0.5 from the threshold value on any such split point
> > > > > > in the converted model, so that it would output the same score as the input
> > > > > > model.
> > > > >
> > > > > On Fri, Jul 16, 2021, 18:19 Roopa Rao  wrote:
> > > > >
> > > > > > Okay, thank you for the input
> > > > > >
> > > > > > Roopa
> > > > > >
> > > > > > > On Fri, Jul 16, 2021 at 5:55 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > > > >
> > > > > > > Hi Roopa,
> > > > > > > > I was not able to find why that slack was added.
> > > > > > > > I am not sure why we would want to change the threshold.
> > > > > > > > I would recommend creating a Jira issue and tagging at least myself,
> > > > > > > > Christine Poerschke and Diego Ceccarelli, so we can discuss and
> > > > > > > > potentially open a pull request.
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Alessandro Benedetti
> > > > > > > Apache Lucene/Solr Committer
> > > > > > > Director, R&D Software Engineer, Search Consultant
> > > > > > >
> > > > > > > www.sease.io
> > > > > > >
> > > > > > >
> > > > > > > > On Thu, 15 Jul 2021 at 22:24, Roopa Rao wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > > In LTR, for MultipleAdditiveTreesModel, what is the purpose of adding
> > > > > > > > > NODE_SPLIT_SLACK to the threshold?
> > > > > > > > >
> > > > > > > > > Reference: org.apache.solr.ltr.model.MultipleAdditiveTreesModel
> > > > > > > > >
> > > > > > > > > private static final float NODE_SPLIT_SLACK = 1E-6f;
> > > > > > > > >
> > > > > > > > > public void setThreshold(float threshold) {
> > > > > > > > >   this.threshold = threshold + NODE_SPLIT_SLACK;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > We have a feature which can return 0.0 or 1.0.
> > > > > > > > >
> > > > > > > > > And a model with this tree:
> > > > > > > > >
> > > > > > > > > is_xyz_feature,threshold=0.9994,left=0.0010180053,right=-0.0057609854
> > > > > > > > >
> > > > > > > > > However, when Solr actually scores it, it evaluates the split as follows:
> > > > > > > > > is_xyz_feature:1.0<= 1.01, Go Left
> > > > > > > > >
> > > > > > > > > So it always goes left, which is incorrect.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Roopa
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Configuring Solr JSON logs output to file using JsonLayout in log4j2.xml file

2021-07-27 Thread Alexey Murz Korepov

Hello, does anyone have a working Solr instance with JSON log format, using
JsonLayout in log4j2.xml? Please share your configuration!

I have jackson-core and other jackson packages inside the Solr folder:
./server/solr-webapp/webapp/WEB-INF/lib/jackson-core-2.11.2.jar
./server/solr-webapp/webapp/WEB-INF/lib/jackson-databind-2.11.2.jar
./server/solr-webapp/webapp/WEB-INF/lib/jackson-annotations-2.11.2.jar
./server/solr-webapp/webapp/WEB-INF/lib/jackson-dataformat-smile-2.11.2.jar

But the log file isn't even created, and I don't even see any errors about
this in the output!

If I switch back from JsonLayout to PatternLayout, everything works well.

I have found a similar problem on the mailing list here
https://www.mail-archive.com/solr-user@lucene.apache.org/msg152191.html but
it is still without a solution.

Can anybody help me with this? Maybe I need to add some dependencies
manually in some Solr config file, or copy the library files to some
other folder? Thanks!

-- 
Best regards,
Alexey Murz Korepov.
E-mail: mur...@gmail.com
Messengers: Matrix - https://matrix.to/#/@murz:ru-matrix.org Telegram -
@MurzNN

Re: How to integrate solr search with sharepoint

2021-07-27 Thread Houston Putman

Hello Navanth,

The Solr Users list (which I have redirected this message to) is a better
place for this question, since you are asking about using Solr. The dev
list is for developers to discuss releases, architecture issues, and things
related to the Solr code.

- Houston

On Tue, Jul 27, 2021 at 12:40 AM Navnath Namde 
wrote:

> Respected Sir/Madam,
>
>
>
> I am new to Solr search. I need to connect my SharePoint site with
> Solr search and implement the search.
>
> I have SharePoint Online and SharePoint 2013 on-premises.
>
> How can I connect them? Can anyone please help me with this?
>
>
>
> Many Thanks in advance!
>
>
>
> Regards,
>
> Navnath Namde
>
>
> This message contains information that may be privileged or confidential
> and is the property of the KPIT Technologies Ltd. It is intended only for
> the person to whom it is addressed. If you are not the intended recipient,
> you are not authorized to read, print, retain copy, disseminate,
> distribute, or use this message or any part thereof. If you receive this
> message in error, please notify the sender immediately and delete all
> copies of this message. KPIT Technologies Ltd. does not accept any
> liability for virus infected mails.
>

RE: [EXTERNAL] Re: How to integrate solr search with sharepoint

2021-07-27 Thread Rosario, Ryan C. (JSC-IO111)[MORI ASSOCIATES INC]

I did something similar to this a few years ago for SharePoint 2016, so I'm not
sure if this applies. Anyway, the thing to keep in mind is that there isn't
a direct plugin for Solr and SharePoint as far as I know. What I had to do was
use a search crawler like Nutch to crawl the SharePoint site and have it
indexed into Solr. Once I got the data into Solr, I could then formulate
my search queries.

Steps:
1. Create a Solr collection for the SharePoint data
2. Crawl the SharePoint site using a crawler such as Nutch
3. Verify that the crawler has indexed all of the SharePoint data you desire
4. Formulate your search queries using the Solr APIs

Things to keep in mind:
- Depending on how locked down your SharePoint site is, you may struggle to have
your crawler access the data. For example, I had to create a proxy to allow the
crawler to access the SharePoint data
- You'll have to think about how you will restrict searches made by users if
you are dealing with sensitive data or data that isn’t supposed to be viewed by
all searchers

I hope this information helps!

Good Luck,

Ryan Rosario

-Original Message-
From: Houston Putman  
Sent: Tuesday, July 27, 2021 11:51 AM
To: users@solr.apache.org; navnath.na...@kpit.com
Subject: [EXTERNAL] Re: How to integrate solr search with sharepoint

Hello Navanth,

The Solr Users list (which I have redirected this message to), is a better 
place for this question, since you are asking about using Solr. The dev list is 
for developers to discuss releases, architecture issues, and things related to 
the Solr code.

- Houston

On Tue, Jul 27, 2021 at 12:40 AM Navnath Namde 
wrote:

> Respected Sir/Madam,
>
>
>
> I am new to Solr search. I need to connect my SharePoint site
> with Solr search and implement the search.
>
> I have SharePoint Online and SharePoint 2013 on-premises.
>
> How can I connect them? Can anyone please help me with this?
>
>
>
> Many Thanks in advance!
>
>
>
> Regards,
>
> Navnath Namde
>
>
> This message contains information that may be privileged or 
> confidential and is the property of the KPIT Technologies Ltd. It is 
> intended only for the person to whom it is addressed. If you are not 
> the intended recipient, you are not authorized to read, print, retain 
> copy, disseminate, distribute, or use this message or any part 
> thereof. If you receive this message in error, please notify the 
> sender immediately and delete all copies of this message. KPIT 
> Technologies Ltd. does not accept any liability for virus infected mails.
>

Help with unsubscribe because automated didn't work

2021-07-27 Thread kmccork

Hello,

I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu,
kmcc...@uw.cse.edu from the listserv. Any email with "kmccok@".

I am no longer able to send email from those addresses, but I would
like to leave the forwarding turned on; however, the Solr listserv emails are
being forwarded as well. I have been getting the Solr emails since 2014 and I am
no longer interested.

I tried unsubscribing by emailing
users-unsubscr...@solr.apache.org, but that did not work.

Thank you so much,
Katie

Re: Help with unsubscribe because automated didn't work

2021-07-27 Thread Anshum Gupta

Hi Katie,

I've unsubscribed those three addresses from the users@solr mailing list.
Please reach out if you continue to receive emails.

On Tue, Jul 27, 2021 at 12:17 PM kmccork  wrote:

> Hello,
>
> I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu,
> kmcc...@uw.cse.edu from the listserv. Any email with "kmccok@".
>
> Hello,
>
> I am no longer able to send an email out from those emails, but I would
> like to leave the forwarding turned on, however the solr listserv is
> forwarding. I have been getting the solr emails since 2014 and I am no
> longer interested.
>
> I tried unsubscribing from this email by emailing
> users-unsubscr...@solr.apache.org, however that did not work.
>
> Thank you so much,
> Katie
>

Re: Help with unsubscribe because automated didn't work

2021-07-27 Thread kmccork

Hi Anshum, Thank you so much!!! Literally have been trying to do this since
2014. (Obviously gave up for a few years there. haha) I'll miss y'all!!

Katie

On Tue, Jul 27, 2021 at 12:36 PM Anshum Gupta  wrote:

> Hi Katie,
>
> I've unsubscribed those three addresses from the users@solr mailing list.
> Please reach out if you continue to receive emails.
>
> On Tue, Jul 27, 2021 at 12:17 PM kmccork  wrote:
>
>> Hello,
>>
>> I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu,
>> kmcc...@uw.cse.edu from the listserv. Any email with "kmccok@".
>>
>> Hello,
>>
>> I am no longer able to send an email out from those emails, but I would
>> like to leave the forwarding turned on, however the solr listserv is
>> forwarding. I have been getting the solr emails since 2014 and I am no
>> longer interested.
>>
>> I tried unsubscribing from this email by emailing
>> users-unsubscr...@solr.apache.org, however that did not work.
>>
>> Thank you so much,
>> Katie
>>
>

Re: Commit strategy for Heavy Bulk Indexing into solr

2021-07-27 Thread Pratik Patel

So it looks like I have narrowed down where the problem is and have also
found a workaround but I would like to understand more.

As I had mentioned, we have two stages in our bulk indexing operation.

stage 1 : index Article documents [A1, A2, ..., An]
stage 2 : index Article documents with children [A1 with children, A2 with
children, ..., An with children]

We were always running into issues in stage 2.
After some time in stage 2, *solrClient.add( , commitWithin )* starts to time
out, and then these timeouts happen consistently. Even the socketTimeout of
30 mins was exceeded by the add call, and we got a SocketTimeoutException.

We have set commitWithin to be 6 hours to avoid unnecessary soft commits.
Auto commit interval is 1 min with openSearcher=false and autoSoftCommit
interval is 5 min.

As mentioned above, we first index just the Articles in stage 1 and then in
stage 2, the same set of Articles is indexed with children (block join). I
had a suspicion that the huge amount of time taken by the *solrClient.add* call
might have something to do with the *block join updates* that take place in
stage 2. Adding fresh joins of Articles with children to an empty
collection was much faster and ran without socket timeouts. So I modified our
indexing pipeline to be as follows.

1. stage 1 : index Article documents [A1, A2, ..., An]
2. delete all the Article documents
3. stage 2 : index Article documents with children [A1 with children, A2
with children, ..., An with children]

With this change, stage 2 would be a simple *add operation and not an
update operation.* I tested the bulk indexing with this change and it
finished successfully without any issues in a shorter time period!
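
The effect of the extra delete step on the kind of operation stage 2 performs can be sketched with a tiny in-memory model. This is only bookkeeping under stated assumptions: `FakeCollection` is a hypothetical stand-in for a Solr collection keyed by document id, it says nothing about Solr's internals or performance, and the `-kw1` child ids are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReindexPipeline {
    /** Hypothetical in-memory stand-in for a collection keyed by document id. */
    static final class FakeCollection {
        final Map<String, List<String>> docs = new HashMap<>();
        int overwrites = 0; // adds that replaced an already indexed document

        void add(String id, List<String> children) {
            if (docs.put(id, new ArrayList<>(children)) != null) {
                overwrites++;
            }
        }

        void deleteAll() {
            docs.clear();
        }
    }

    /** Runs both stages and returns how many stage-2 adds were overwrites. */
    static int run(List<String> articleIds, boolean deleteBetweenStages) {
        FakeCollection collection = new FakeCollection();
        for (String id : articleIds) {
            collection.add(id, List.of());            // stage 1: Articles only
        }
        if (deleteBetweenStages) {
            collection.deleteAll();                   // step 2: delete all Articles
        }
        for (String id : articleIds) {
            collection.add(id, List.of(id + "-kw1")); // stage 2: Articles with children
        }
        return collection.overwrites;
    }
}
```

In this model, `run(ids, false)` counts one overwrite per Article, while `run(ids, true)` counts none, which matches the description of stage 2 becoming a plain add rather than an update.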

It would be very helpful to know what the difference is between:
A: adding a document with children when the collection does not already
have the same document
B: adding a document with children when the collection already has the
same document without children

I understand that an *update* takes place in B, but how can we explain such a
difference in performance between A and B?

Please note that we use RxJava and call solrClient.add() in parallel
threads with a set of Article documents and the socketTimeout issue seems
to pop up after we have already indexed about 90% of the documents.

Some more clarity on what could be happening will be very useful.

Thanks

On Fri, Jul 23, 2021 at 2:31 PM Pratik Patel  wrote:

> Hi All,
>
> *tl;dr* : running into long GC pauses and solr client socket timeouts
> when indexing documents in bulk into solr. Commit strategy in essence is to
> do hard commits at an interval of 50k documents (maxDocs=50k) and disable
> soft commits altogether during bulk indexing. Simple SolrCloud setup with
> one node and one shard.
>
> *Details*:
> We have about 6 million documents which we are trying to index into solr.
> From these, about 500k documents have a text field which holds Abstracts of
> scientific papers/Articles. We extract keywords from these Abstracts and we
> index these keywords as well into solr.
>
> We have a many to many kind of relationship between Articles and keywords.
> To store this, we have following structure.
>
> Article documents
> Keyword documents
> Article-Keyword Join documents
>
> We use block join to index Articles with "Article-Keyword" join documents
> and Keyword documents are indexed independently.
>
> In other words, we have blocks of "Article + Article-Keyword Joins" and we
> have Keyword documents (they hold some additional metadata about the keyword).
>
> We have a bulk processing operation which creates these documents and
> indexes them into solr. During this bulk indexing, we don't need documents
> to be searchable. We need to search against them only after ALL the
> documents are indexed.
>
> *Based on this, this is our current strategy. *
> Soft commits are disabled and Hard commits are done at an interval of 50k
> documents with openSearcher=false. Our code triggers explicit commits 4
> times after various stages of bulk indexing. Transaction logs are enabled
> and have default settings.
>
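> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:-1}</maxTime>
>   <maxDocs>${solr.autoCommit.maxDocs:5}</maxDocs>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>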
>
> Other Environmental Details:
> Xms=8g and Xmx=14g, solr client socketTimeout=7 minutes and
> zkClientTimeout=2 mins.
> Our indexing operation triggers many "add" operations in parallel using
> RxJava (15 to 30 threads); each "add" operation is passed about 1000
> documents.
>
> Currently, when we run this indexing operation, we notice that after a
> while solr goes into long GC pauses (longer than our socketTimeout of 7
> minutes) and we get SocketTimeoutExceptions.
>
> *What could be causing such long GC pauses?*
>
> *Does this commit strategy make sense ? If not, what is the recommended
> strategy that we can look into? *
>
> *Any help on this is much appreciated. Thanks.*
>
>

Re: Help with unsubscribe because automated didn't work

2021-07-27 Thread Jagpreet Mahajan

Can the same be done for my email address jagpreetkhan...@gmail.com as well,
please?

Thanks

Sent from my iPhone

On Jul 27, 2021, at 20:36, Anshum Gupta  wrote:

Hi Katie,

I've unsubscribed those three addresses from the users@solr mailing list.
Please reach out if you continue to receive emails.

On Tue, Jul 27, 2021 at 12:17 PM kmccork  wrote:

> Hello,
> 
> I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu,
> kmcc...@uw.cse.edu from the listserv. Any email with "kmccok@".
> 
> Hello,
> 
> I am no longer able to send an email out from those emails, but I would
> like to leave the forwarding turned on, however the solr listserv is
> forwarding. I have been getting the solr emails since 2014 and I am no
> longer interested.
> 
> I tried unsubscribing from this email by emailing
> users-unsubscr...@solr.apache.org, however that did not work.
> 
> Thank you so much,
> Katie
>

Re: Help with unsubscribe because automated didn't work

2021-07-27 Thread Anshum Gupta

Done.

On Tue, Jul 27, 2021 at 1:36 PM Jagpreet Mahajan 
wrote:

> Can the same be done for my email address jagpreetkhan...@gmail.com as
> well please
>
> Thanks
>
> Sent from my iPhone
>
> On Jul 27, 2021, at 20:36, Anshum Gupta  wrote:
>
> Hi Katie,
>
> I've unsubscribed those three addresses from the users@solr mailing list.
> Please reach out if you continue to receive emails.
>
> On Tue, Jul 27, 2021 at 12:17 PM kmccork  wrote:
>
> > Hello,
> >
> > I would like to unsubscribe kmcc...@u.washington.edu, kmcc...@uw.edu,
> > kmcc...@uw.cse.edu from the listserv. Any email with "kmccok@".
> >
> > Hello,
> >
> > I am no longer able to send an email out from those emails, but I would
> > like to leave the forwarding turned on, however the solr listserv is
> > forwarding. I have been getting the solr emails since 2014 and I am no
> > longer interested.
> >
> > I tried unsubscribing from this email by emailing
> > users-unsubscr...@solr.apache.org, however that did not work.
> >
> > Thank you so much,
> > Katie
> >
>


-- 
Anshum Gupta
