Solr and Keycloak?

2021-11-03 Thread Eric Pugh
Has anyone gone through integrating Solr with Keycloak? I’m trying to figure 
out how to map the Keycloak response to what Solr needs to identify the 
user.

Here is my security.json:
https://github.com/querqy/chorus/blob/75f153b699855e6e2862900bd4413764f7b6a01e/solr/security.json
 


And what I am getting back:

2021-11-02 21:03:27.805 INFO  (qtp332699949-17) [] 
o.a.s.s.RuleBasedAuthorizationPluginBase This resource is configured to have a 
permission {
  "name":"all",
  "role":"admin"}, The principal 
JWTPrincipalWithUserRoles{username='4a3d078b-418a-48fc-a26b-80d51f973084', 
token='*', claims={exp=1635887907, iat=1635887007, auth_time=1635887007, 
jti=cdab53d1-3dc2-4a7a-a98b-83b9b19257e6, 
iss=http://keycloak:9080/auth/realms/chorus, aud=account, 
sub=4a3d078b-418a-48fc-a26b-80d51f973084, typ=Bearer, azp=solr, 
nonce=tawciobxw3parxd0kyjw2p7r8sszymvdx, 
session_state=57f6aea7-f243-4fa3-a6e1-6e83926e65af, acr=1, 
allowed-origins=[http://localhost:8983], realm_access={roles=[offline_access, 
uma_authorization, default-roles-chorus]}, 
resource_access={account={roles=[manage-account, manage-account-links, 
view-profile]}}, scope=openid email profile, email_verified=false, name=bob 
dole, preferred_username=b...@dole.com, given_name=bob, family_name=dole, 
email=b...@dole.com}, roles=[profile, email]} does not have the right role 

___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com  | 
My Free/Busy   
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 


This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.



First cross-join faceting after a commit is slow

2021-11-03 Thread Andy Lester
I’m on Solr 8.10.1 and having a performance problem with my cross-core join 
facets.

Here’s my basic query. The interesting parts are the joins in the two 
facet.query parameters:


curl "$URL" --silent --show-error \
-X POST \
-F "q=($word AND -parent_tracings:($word))" \
-F 'df=title_tracings_t' \
-F 'fl=flrid,nodeid' \
\
-F 'fq=((grouping:"1" OR grouping:"2" OR grouping:"3") OR solrtype:"N")' \
-F 'fq={!tag=grouping}((grouping:"1" OR grouping:"2") OR solrtype:"N")' \
-F 'fq={!tag=languagecode}(languagecode:"eng" OR solrtype:"N")' \
-F 'fq=(tw_searchable:"Y")' \
-F 'facet=on' \
-F "facet.query=(solrtype:T AND !{!join fromIndex=collmatchagg from=flrid to=flrid}id:$ID)" \
-F "facet.query=(solrtype:T AND {!join fromIndex=collmatchagg from=flrid to=flrid}id:$ID)" \
-F 'rows=0' \
-F 'wt=json' \
-F 'debugQuery=on'


The faceting works wonderfully, except when I make a commit to the 
collmatchagg core, the fromIndex core of the join. The first query we make 
after a commit blocks for multiple seconds, depending on how many records are 
in the core. With 10,000 records, which is our standard load, that blocking is 
about 8-9 seconds, which is too long. With only ten records in the core, the 
blocking is about 1 second.

Here are some things I’ve observed about these slow first queries after a 
commit:

* I’ve tried many different combinations of autowarmCount in the caching for 
collmatchagg, from 0 to do no warming, to 100 to pull in a minimal number of 
records.  Changing these settings does not seem to have any impact on the 
length of the block.

* If q=… in the query matches nothing, then the join is not slow.

* Making a direct query against collmatchagg is not slow after the commit.

* The only thing that seems to control the length of the block on the query is 
the number of records in the collmatchagg core.  It feels like Solr is reading 
the entire core into RAM each time.
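One way to reproduce the measurement outside the shell is a minimal Python sketch that commits to collmatchagg and then times the same join-faceted query twice, so the first-after-commit penalty shows up as the gap between the two numbers. The endpoint URLs (including the main core name "main") are hypothetical placeholders for $URL, and the parameters mirror the curl command with debugQuery left out.

```python
import time
import urllib.parse
import urllib.request

# Hypothetical endpoints; substitute your own host, port, and core names.
SELECT_URL = "http://localhost:8983/solr/main/select"
COMMIT_URL = "http://localhost:8983/solr/collmatchagg/update?commit=true"


def join_facet_params(word: str, doc_id: str) -> list[tuple[str, str]]:
    """Build the same request parameters as the curl command (no debugQuery)."""
    join = "{!join fromIndex=collmatchagg from=flrid to=flrid}id:" + doc_id
    return [
        ("q", f"({word} AND -parent_tracings:({word}))"),
        ("df", "title_tracings_t"),
        ("fl", "flrid,nodeid"),
        ("fq", '((grouping:"1" OR grouping:"2" OR grouping:"3") OR solrtype:"N")'),
        ("fq", '{!tag=grouping}((grouping:"1" OR grouping:"2") OR solrtype:"N")'),
        ("fq", '{!tag=languagecode}(languagecode:"eng" OR solrtype:"N")'),
        ("fq", '(tw_searchable:"Y")'),
        ("facet", "on"),
        # Count of T docs NOT joined to the given collmatchagg id...
        ("facet.query", "(solrtype:T AND !" + join + ")"),
        # ...and of T docs that ARE joined to it.
        ("facet.query", "(solrtype:T AND " + join + ")"),
        ("rows", "0"),
        ("wt", "json"),
    ]


def timed_post(url: str, params: list[tuple[str, str]]) -> float:
    """POST form-encoded params and return the elapsed wall-clock seconds."""
    data = urllib.parse.urlencode(params).encode("utf-8")
    start = time.perf_counter()
    with urllib.request.urlopen(url, data=data) as resp:
        resp.read()
    return time.perf_counter() - start


def first_vs_second_after_commit(word: str, doc_id: str) -> tuple[float, float]:
    """Commit collmatchagg, then time two identical join-facet queries."""
    with urllib.request.urlopen(COMMIT_URL) as resp:
        resp.read()
    params = join_facet_params(word, doc_id)
    return timed_post(SELECT_URL, params), timed_post(SELECT_URL, params)

# Usage against a live Solr (hypothetical word and id):
#   first, second = first_vs_second_after_commit("apple", "123")
```

Repeating the run with different collmatchagg sizes would show whether the first-query time tracks the core's record count, as observed above.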

Any suggestions or pointers would be helpful, because this is blocking us from 
releasing the feature. 

Thanks,
Andy

tomcat integration?

2021-11-03 Thread cmarkwood
Hello,
I've never used Solr but successfully ran Exercise 1 here: 
https://solr.apache.org/guide/8_10/solr-tutorial.html

I don't see how to integrate a Solr search with my existing website (.war 
deployed to Tomcat 8.5). It looks like there are major changes in Solr 8 and I 
don't see a .war file I can deploy to Tomcat.
Can I integrate Solr with Tomcat, or do I run Solr standalone and call into it 
with curl, etc.?
Any help appreciated! Thanks,
Craig


Re: tomcat integration?

2021-11-03 Thread Furkan KAMACI
Hi,

Here is an explanation about it:
https://cwiki.apache.org/confluence/display/SOLR/WhyNoWar

Kind Regards,
Furkan KAMACI



Re: Solr and Keycloak?

2021-11-03 Thread Jan Høydahl
Try adding "rolesClaim" to JWTAuthPlugin to tell it which JWT claim to use for 
roles. 
E.g. if you pick the claim "roles", then your user would have the 
roles=[profile, email]. So try mapping the role "email" to the "all" permission, 
and your requests should be allowed.

Jan
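The log line in Eric's message reduces to a role check: the "all" permission requires "admin", while the principal carries only [profile, email]. A simplified Python model of that check (an illustration, not Solr's plugin code, and omitting the extra cases the real plugin handles) shows why Jan's mapping lets the request through:

```python
def has_required_role(principal_roles, permission):
    """Simplified model of a rule-based authorization check.

    The request is allowed when the principal holds at least one of the
    roles listed on the matching permission ("role" may be a string or a list).
    """
    required = permission["role"]
    if isinstance(required, str):
        required = [required]
    return bool(set(required) & set(principal_roles))


# Roles Solr derived for the Keycloak user in Eric's log.
principal_roles = ["profile", "email"]

# Current permission: rejected ("does not have the right role").
print(has_required_role(principal_roles, {"name": "all", "role": "admin"}))  # False

# Jan's suggestion: map the role "email" to the "all" permission.
print(has_required_role(principal_roles, {"name": "all", "role": "email"}))  # True
```

In security.json terms, Jan's suggestion amounts to setting "rolesClaim" in the JWTAuthPlugin section and changing the role on the "all" permission from "admin" to "email".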




Is there a way to set max segment count for builtin merge policy?

2021-11-03 Thread Michael Conrad

Is there a way to set max segment count for builtin merge policy?

I'm having a serious issue where I'm trying to reindex 75 million 
documents and the segment count skyrockets, with an associated significant 
drop in performance, to the point where we start getting lots of timeouts.


Is there a way to set the merge policy to try and keep the total segment 
count to around 16 or so? (This seems to be close to the max the hosts 
can manage without having serious performance issues.)


Solr 7.7.3.

Help appreciated.


Re: tomcat integration?

2021-11-03 Thread cmarkwood
I see, thanks. So I need to write a query client and a (JSON) parser for the 
response.



Re: tomcat integration?

2021-11-03 Thread Michael Conrad

I recommend using the SolrJ client libs for the client war.
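Inside a Java .war, SolrJ (as Michael suggests) takes care of the HTTP calls and response parsing. To show what such a client does underneath, here is a minimal sketch in Python using only the standard library: it calls a standalone Solr's /select handler and pulls numFound and the documents out of the JSON response. The base URL and the techproducts collection from the tutorial's Exercise 1 are assumptions.

```python
import json
import urllib.parse
import urllib.request

# Assumed local Solr started as in the tutorial; adjust to your install.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(collection: str, query: str, rows: int = 10) -> str:
    """Build a /select URL that asks Solr for a JSON response."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/{collection}/select?{params}"


def parse_select_response(body: bytes) -> tuple[int, list[dict]]:
    """Extract the hit count and the returned documents from a /select response."""
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], response["docs"]


def search(collection: str, query: str, rows: int = 10) -> tuple[int, list[dict]]:
    """Query a running Solr over HTTP and parse the JSON result."""
    with urllib.request.urlopen(select_url(collection, query, rows)) as resp:
        return parse_select_response(resp.read())

# Usage against a running Solr (hypothetical query):
#   num_found, docs = search("techproducts", "name:ipod")
```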



Re: Solr and Keycloak?

2021-11-03 Thread Eric Pugh
It’s all working! I still need to try out Jan’s “rolesClaim” suggestion.

If you want to see what I did, check out the PR 
https://github.com/querqy/chorus/pull/63 


Thanks Jan and Tim for the help!

Eric







Re: Child doc question

2021-11-03 Thread Stephen Lewis Bianamara
Hi Folks,

Going to give this one more shot. Is there anyone out there who understands
child docs well enough to answer this basic question?

Thanks,
Stephen

On Fri, Oct 29, 2021 at 2:56 PM Stephen Lewis Bianamara <
stephen.bianam...@gmail.com> wrote:

> Hi SOLR Community,
>
> Still hoping for help on this. Is there anyone out there who understands
> child docs well enough to answer the question of "yes this should work" or
> "no it cannot work"?
>
> Thanks,
> Stephen
>
> On Wed, Oct 27, 2021 at 8:51 AM Stephen Lewis Bianamara <
> stephen.bianam...@gmail.com> wrote:
>
>> Hi Folks,
>>
>> Wanted to follow up here. Can someone help me just answer whether what
>> I'm hoping for is feasible?
>>
>> (a) The desired outcome is supported with the right model/queries
>> (b) The desired outcome is not supported; maybe it could be in the future
>> (c) The desired outcome is fundamentally unsupportable for the
>> foreseeable future
>>
>> To summarize, the desired behavior is:
>>
>> - Query which can AND across child docs (i.e., return parents
>> whose children match an AND query even if the tokens are spread across children)
>> - Query where parents are returned ranked by the relevance of their children
>>
>> In my example, obviously there is parent data being queried ("post_en"),
>> but the parent data could easily be made child data if need be.
>>
>> Thanks!
>> Stephen
>>
>> On Mon, Oct 25, 2021 at 6:54 AM Stephen Lewis Bianamara <
>> stephen.bianam...@gmail.com> wrote:
>>
>>> Hi SOLR Community,
>>>
>>> I'm experimenting with Solr 8.10 and trying to get a query pattern with
>>> child docs to work. An example of a nested document structure I'd like to
>>> search is below. In this example, there will only be two levels: the parent
>>> of type:post and its /comments children.
>>> {
>>>   "id": "post1",
>>>   "type": "post",
>>>   "post_en": "I put lemon on my apple slices to keep them fresh",
>>>   "comments": [
>>>     {
>>>       "id": "comment1",
>>>       "type": "comment",
>>>       "comment_en": "Lime works too"
>>>     },
>>>     {
>>>       "id": "comment2",
>>>       "type": "comment",
>>>       "comment_en": "Does it work for pears?"
>>>     }
>>>   ]
>>> }
>>>
>>> What I'd like is to be able to do keyword search for /lemon apple/ and
>>> only return the parent; /lemon lime/ and return the parent and comment1;
>>> /lemon pear/ and return the parent and comment2; /lime pear/ and return the
>>> parent, comment1, and comment2. And /lime gum/ should return nothing (as if
>>> it were an AND query). Additionally, this should all be done with relevance.
>>>
>>> I've tried a few combinations of nested docs from this documentation,
>>> but am having trouble getting this to work. I wonder if I'm asking more
>>> from the block join and child doc transformer than they currently support, or
>>> perhaps I'm just missing something. Can someone familiar with nesting
>>> documents help me out? I've included my schema below as well.
>>>
>>> Thanks!
>>> Stephen
>>>
>>> [Schema XML not preserved in the archive. Surviving attribute fragments: omitNorms="true", docValues="false", stored="true", sortMissingLast="true", positionIncrementGap="100", autoGeneratePhraseQueries="true"; the uniqueKey appears to be id.]
>>>
>>
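To pin down the semantics Stephen asks for, the matching rule from his Oct 25 message can be written as a small plain-Python model: a post matches when every query term appears in the parent or in some comment, and a matching post comes back with each comment that contains at least one term. This is not how Solr's block join works and it ignores relevance; the crude plural-stripping tokenizer stands in for the English analysis on the *_en fields as an assumption. It could serve as a test oracle when trying query variants.

```python
import re

POST = {
    "id": "post1",
    "post_en": "I put lemon on my apple slices to keep them fresh",
    "comments": [
        {"id": "comment1", "comment_en": "Lime works too"},
        {"id": "comment2", "comment_en": "Does it work for pears?"},
    ],
}


def terms(text: str) -> set[str]:
    """Lowercase, split on non-letters, and strip a plural 's' (crude stemmer)."""
    out = set()
    for tok in re.findall(r"[a-z]+", text.lower()):
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]
        out.add(tok)
    return out


def match(post: dict, query: str) -> list[str]:
    """Return ids to show: the parent plus matching comments, or nothing."""
    wanted = terms(query)
    parent_terms = terms(post["post_en"])
    comment_terms = {c["id"]: terms(c["comment_en"]) for c in post["comments"]}
    # AND semantics across the whole block: every term must appear somewhere.
    block_terms = parent_terms.union(*comment_terms.values())
    if not wanted <= block_terms:
        return []
    # Return the parent plus each comment that contains at least one query term.
    hits = [cid for cid, ts in comment_terms.items() if ts & wanted]
    return [post["id"]] + hits


for q in ["lemon apple", "lemon lime", "lemon pear", "lime pear", "lime gum"]:
    print(q, "->", match(POST, q))
```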


Re: Is there a way to set max segment count for builtin merge policy?

2021-11-03 Thread Shawn Heisey

The way to reduce total segment count is to reduce the thresholds that 
used to be controlled by mergeFactor.  I don't know of a way to 
explicitly set the max total count, but the per-tier count will affect 
the total count.  There will be at least three tiers of merging on most 
Solr installs, so the max total segment count will be at least three 
times the per-tier setting.
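Shawn's rule of thumb, as quick arithmetic. This sketch assumes three tiers and treats segmentsPerTier as the per-tier setting; it yields a rough floor, not a prediction of Lucene's exact behavior, and the helper names are made up for illustration.

```python
def rough_min_total_segments(segments_per_tier: int, tiers: int = 3) -> int:
    """Rule of thumb: total segments is at least tiers x per-tier setting."""
    return tiers * segments_per_tier


def segments_per_tier_for_target(target_total: int, tiers: int = 3) -> int:
    """Largest per-tier setting whose rough floor stays within target_total.

    Lucene's TieredMergePolicy rejects segmentsPerTier values below 2.
    """
    return max(2, target_total // tiers)


print(rough_min_total_segments(10))      # defaults -> 30
print(rough_min_total_segments(35))      # Shawn's older servers -> 105
print(segments_per_tier_for_target(16))  # Michael's target of ~16 -> 5
```

By this estimate, Michael's target of about 16 total segments would mean a segmentsPerTier of around 5, the kind of reduction Shawn warns makes heavy merging (and paused indexing threads) more likely.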


This config represents the defaults for Solr's merging policy:


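<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>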


On some Solr servers that I used to manage, those numbers were set to 
35.  I regularly saw total segment counts larger than 100.  That did not 
affect performance in a significant way.


If you are seeing significant performance problems it is more likely one 
of two problems that have nothing to do with the segment count:


1) Your max heap size is not quite big enough and needs to be increased. 
 This can lead to severe GC pauses because Java will spend more time 
doing GC than running the application.


2) Your index is so big that the amount of free memory on the server 
cannot effectively cache it.  The fix for that is to add physical 
memory, so that more unallocated memory is available to the operating 
system.  Solr is absolutely reliant on effective index caching for 
performance.
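Point 2 comes down to simple arithmetic. The sketch assumes the memory available for caching the index is total RAM minus the Solr heap and whatever else runs on the host; all numbers in the example are hypothetical.

```python
GIB = 1024 ** 3


def cacheable_fraction(index_bytes: int, total_ram: int, heap: int, other: int = 0) -> float:
    """Fraction of the on-disk index that fits in memory left for the OS cache."""
    free_for_cache = max(0, total_ram - heap - other)
    if index_bytes <= 0:
        return 1.0
    return min(1.0, free_for_cache / index_bytes)


# Hypothetical host: 64 GiB RAM, 16 GiB heap, 4 GiB for other processes,
# and a 120 GiB index: 44 GiB of cache covers about 37% of the index.
print(f"{cacheable_fraction(120 * GIB, 64 * GIB, 16 * GIB, 4 * GIB):.0%}")
```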


More of a side note:  One problem that you might be having with indexing 
millions of documents is that the indexing thread can get paused when 
merging becomes heavy.  This will be even more likely to happen if you 
reduce the numbers in the config that I included above.  The fix for 
that is to fiddle with the mergeScheduler config.



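<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">6</int>
  <int name="maxThreadCount">1</int>
</mergeScheduler>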


Some notes:  Go with a maxMergeCount that's at least 6.  If your indexes 
are on spinning hard disks, leave maxThreadCount at 1.  If the indexes 
are on SSD, you can increase the thread count, but don't go too wild. 
Probably 3 or 4 max, and I would be more likely to choose 2.  I have 
never had indexes on SSD, so I do not know how many threads are too many.
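Shawn's mergeScheduler notes, captured as a small helper. The thresholds come from his message (maxMergeCount of at least 6, one thread on spinning disks, about 2 and at most 3 or 4 on SSD); the function name and the "hdd"/"ssd" labels are hypothetical.

```python
def merge_scheduler_settings(disk: str, max_merge_count: int = 6, ssd_threads: int = 2) -> dict:
    """Pick ConcurrentMergeScheduler values following Shawn's rules of thumb."""
    if max_merge_count < 6:
        raise ValueError("use a maxMergeCount of at least 6")
    if disk == "hdd":
        threads = 1  # spinning disks: keep merging single-threaded
    elif disk == "ssd":
        if not 1 <= ssd_threads <= 4:
            raise ValueError("keep SSD merge threads modest (about 2, at most 3 or 4)")
        threads = ssd_threads
    else:
        raise ValueError(f"unknown disk type: {disk!r}")
    # Lucene requires maxThreadCount <= maxMergeCount; threads <= 4 < 6 keeps that true.
    return {"maxMergeCount": max_merge_count, "maxThreadCount": threads}


print(merge_scheduler_settings("hdd"))  # {'maxMergeCount': 6, 'maxThreadCount': 1}
print(merge_scheduler_settings("ssd"))  # {'maxMergeCount': 6, 'maxThreadCount': 2}
```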


Thanks,
Shawn