Re: The logging of Solr queries

2022-09-23 Thread Anjali Maurya
We have a way to distinguish which query is a top-level query.
We are using an 8-node SolrCloud cluster; when a single request comes in,
between 9 and 17 queries are logged, which consumes a huge amount of disk
space. With standalone Solr, log files used to be about 30GB per day; with
SolrCloud, they have grown to about 162GB.

On Thu, Sep 22, 2022 at 10:31 PM Shawn Heisey wrote:

> On 9/22/22 09:17, Anjali Maurya wrote:
> > Thanks Shawn for the suggestion.
> > Can we make any change for the logging of only top level query not the
> > shard level?
>
> I just tried doing a query on my tiny 9.1.0-SNAPSHOT SolrCloud install
> that consists of one node, one shard, and one core, with ZK embedded.
> The query was to the collection, not the core.  I expected to see two
> queries logged ... one for the collection and one for the core.  I only
> got one query logged, on the core.
>
> I know that when doing a sharded query that does NOT involve SolrCloud,
> that both queries are logged.  I am pretty sure that when both queries
> are logged, it is not difficult to figure out which log entry is for a
> top level query.
>
> Thanks,
> Shawn
>
>


Re: The logging of Solr queries

2022-09-23 Thread Anjali Maurya
So we are looking for a solution that logs only top-level queries. Is
there any way, at the logging level, to log only the top-level query?


On Fri, Sep 23, 2022 at 2:09 PM Anjali Maurya wrote:


Re: The logging of Solr queries

2022-09-23 Thread Markus Jelsma
No, Solr can't do that unless you patch it to log only top-level queries.

As said, shard-level queries are easy to filter: look for the isShard=true
parameter to distinguish them. They usually also contain more parameters
than the original top-level query.
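The filtering idea above can be sketched as a small post-processing step over the log files. This is only a sketch with made-up log lines; it assumes the shard-level entries contain a literal isShard=true in the logged parameters, as described above:

```python
def top_level_entries(lines):
    """Yield only log lines that do not belong to shard-level subqueries."""
    for line in lines:
        if "isShard=true" not in line:
            yield line

# Hypothetical log excerpt: two shard subqueries and one top-level query.
log = [
    "path=/select params={q=*:*&isShard=true&shard.url=...} hits=10",
    "path=/select params={q=*:*&isShard=true&shard.url=...} hits=7",
    "path=/select params={q=*:*&rows=10} hits=17",
]
for entry in top_level_entries(log):
    print(entry)
```

Run periodically (or piped through during log rotation), this keeps only the top-level entries and would cut most of the logged volume, since each top-level request fans out into many shard-level requests.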



On Fri, Sep 23, 2022 at 10:42 AM Anjali Maurya wrote:



Atomic indexing as default indexing

2022-09-23 Thread gnandre
Is there a way to make atomic indexing the default?

Say, even if some clients send non-atomic indexing requests, could those be
converted to atomic indexing requests on the Solr end? Is that possible?

I am asking because we usually run into the following issue:
1. Client A is the major contributor of almost all the fields of a Solr
document. This is non-atomic indexing.
2. Client B contributes some additional fields to the same document and
does this with atomic indexing.
3. If Client A indexes again, the fields populated by Client B are wiped
out.

If we make all indexing atomic on the Solr end, we won't run into this
problem (except in the rare case where Client A deletes the document and
then indexes it back; that is fine and we can deal with it because it is
rare).


Re: Atomic indexing as default indexing

2022-09-23 Thread Shawn Heisey

On 9/23/22 09:51, gnandre wrote:



We would be surprising a LOT of users if we did that.  Right now they 
can simply reindex a document to delete fields that were indexed before 
but shouldn't be there.  If we made atomic indexing the default, we 
would definitely get complaints about the fact that these fields did not 
get removed.


And what about users that have a schema that is not appropriate for 
atomic indexing?  Quite a lot of users, me included, have fields that 
are indexed but not stored and have no docValues.  I can guarantee you 
that if we made atomic indexing the default, that users would assume 
that all their existing fields will be preserved, and that might not be 
the case.


It sounds like what you should do is have client A be aware that a 
document might have changes made after they indexed it. They should 
check whether a doc already exists, and if it does, switch 
their indexing to atomic.
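Switching to atomic on the client side can be as simple as rewrapping the payload. The {"set": ...} modifier below is Solr's documented atomic-update syntax; the helper name to_atomic, the id_field default, and the sample document are made up for illustration:

```python
import json


def to_atomic(doc, id_field="id"):
    """Convert a full document into Solr's atomic-update form: every
    field except the unique key is wrapped in a {"set": ...} modifier,
    so fields added by another client are left untouched."""
    return {
        field: (value if field == id_field else {"set": value})
        for field, value in doc.items()
    }


full_doc = {"id": "doc1", "title": "Report", "author": "Client A"}
print(json.dumps(to_atomic(full_doc)))
```

Note that this only behaves correctly if the schema supports atomic updates: as pointed out above, fields that are indexed but not stored and have no docValues cannot be reconstructed during an atomic update.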


It is extremely problematic to have one index be built by two different 
entities in this way.  Maybe instead you should have separate indexes 
for each client and use Solr's join capability to combine the info from 
both indexes into one result.  Just be aware that Solr's join capability 
will NOT do everything a relational database expert might expect.


Thanks,
Shawn



Re: Atomic indexing as default indexing

2022-09-23 Thread Thomas Corthals
On Fri, Sep 23, 2022 at 6:17 PM Shawn Heisey wrote:

Client A can use Optimistic Concurrency to check whether a document has
been updated by client B.

Use the /get handler from client A to get the _version_ after indexing and
store it locally. Use that _version_ for further updates from client A to
check whether the document was changed by client B.
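As a sketch of the payload such a versioned update would carry: the _version_ field and the HTTP 409 rejection on a mismatch are Solr's documented optimistic-concurrency behavior, while the helper versioned_update and the sample values are hypothetical:

```python
def versioned_update(doc_id, fields, last_known_version):
    """Build an atomic update that Solr will reject with an HTTP 409
    (version conflict) if the document's _version_ no longer matches,
    i.e. if another client modified the document in the meantime."""
    update = {"id": doc_id, "_version_": last_known_version}
    # Wrap each field in Solr's "set" atomic-update modifier.
    update.update({name: {"set": value} for name, value in fields.items()})
    return update


payload = versioned_update("doc1", {"title": "v2"}, 1715012345678901248)
print(payload)
```

On a 409 response, client A would re-fetch the document via /get, merge its changes, and retry with the fresh _version_.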

Thomas


Re: Atomic indexing as default indexing

2022-09-23 Thread L H
Hello dear colleagues,

I was using embedded Solr on Java 8 for caching some data; however, I am
now required to update Java to version 17.

I can see that the core container is not able to access the home directory.

Below is the exception I get; could someone please help me figure out how
to fix the issue?



The exception:


Caused by: org.apache.solr.common.SolrException: JVM Error creating core [invoiceconfig]: null
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:856)
  at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:494)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  at java.base/java.lang.Thread.run(Thread.java:889)
Caused by: java.lang.ExceptionInInitializerError
  at java.base/java.lang.J9VMInternals.ensureError(J9VMInternals.java:185)
  at java.base/java.lang.J9VMInternals.recordInitializationFailure(J9VMInternals.java:174)
  at org.apache.solr.core.MMapDirectoryFactory.init(MMapDirectoryFactory.java:51)
  at org.apache.solr.core.SolrCore.initDirectoryFactory(SolrCore.java:528)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:724)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:688)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:838)
  ... 6 more
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make public jdk.internal.ref.Cleaner java.nio.DirectByteBuffer.cleaner() accessible: module java.base does not "opens java.nio" to unnamed module @f0b0647f
  at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
  at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
  at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
  at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
  at org.apache.lucene.store.MMapDirectory.unmapHackImpl(MMapDirectory.java:345)
  at java.base/java.security.AccessController.doPrivileged(AccessController.java:692)
  at org.apache.lucene.store.MMapDirectory.<init>(MMapDirectory.java:326)
  ... 11 more


Exception with embedded Solr (was: Re: Atomic indexing as default indexing)

2022-09-23 Thread Shawn Heisey

On 9/23/22 12:07, L H wrote:



I have removed the email headers that would bury this message inside a 
thread that has nothing to do with it, which is where I found your 
message.  You didn't even change the subject.  Please do not reply to an 
existing message unless that message is directly related to what you are 
sending.  Start a brand new message with a new subject for a new topic.


https://www.dropbox.com/s/3avr9o03gpx7rko/solr-user-buried-thread-2022-09.png?dl=0

What version of Solr/SolrJ are you using?  I suspect that you're using a 
version that was not qualified with any Java version later than 8.  You 
might need to upgrade Solr to have it work right with Java 17.  In 
recent years Java has gotten a lot better at not introducing breaking 
changes, but you have just jumped NINE major versions.  Any software is 
likely to change in extreme ways across that many major versions.


The sweet spot for Solr 7 or 8 seems to be Java 11, but these Solr 
versions only require Java 8.  Solr 9.x *requires* Java 11, and it is 
the only version I personally would run with anything newer than Java 
11.  For Solr 6, I would not run anything newer than Java 8.  Solr 7.0 
was the first version that was qualified to run in Java 9, and I recall 
code changes being required to achieve that.


Thanks,
Shawn



Re: Exception with embedded Solr (was: Re: Atomic indexing as default indexing)

2022-09-23 Thread Shawn Heisey

On 9/23/22 15:08, Shawn Heisey wrote:
I have removed the email headers that would bury this message inside a 
thread that has nothing to do with it


I *thought* I had removed those headers.  But the message got buried anyway.

Shawn