Re: NullPointerException in UpdateLog.applyOlderUpdates under solr 8&9 involving partial updates and high update load

2024-01-18 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Calvin - thanks for opening the 
https://issues.apache.org/jira/browse/SOLR-17120 issue for this!

From: users@solr.apache.org At: 01/12/24 21:00:34 UTC To: users@solr.apache.org
Subject: Re: NullPointerException in UpdateLog.applyOlderUpdates under solr 8&9 
involving partial updates and high update load

Looking into the code referenced in the stack trace in my previous email,
it was the following code in UpdateLog.java:

  /** Add all fields from olderDoc into newerDoc if not already present in newerDoc */
  private void applyOlderUpdates(
      SolrDocumentBase<?, ?> newerDoc, SolrInputDocument olderDoc, Set<String> mergeFields) {
    for (String fieldName : olderDoc.getFieldNames()) {
      // if the newerDoc has this field, then this field from olderDoc can be ignored
      if (!newerDoc.containsKey(fieldName)
          && (mergeFields == null || mergeFields.contains(fieldName))) {
        for (Object val : olderDoc.getFieldValues(fieldName)) {
          newerDoc.addField(fieldName, val);
        }
      }
    }
  }

The `NullPointerException` is being thrown by the inner `for` statement
because the return value of `olderDoc.getFieldValues(fieldName)` is null. I
added some print statements and verified that when I saw the error it was
for a field named `camera_unit`, and that there had been a partial update of
the same document within the last second that included
`"camera_unit": {"set": null}`. There was a lot of other activity in between
those two, though, and I wasn't able to reproduce it with simple documents
making the same updates, so there's more to it than just a previous partial
update of the same doc setting a field to `null`.
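For context, the atomic update that set the field to null looked roughly
like this (the document id and request shape here are placeholders, not the
real data):

```json
[
  {
    "id": "doc-123",
    "camera_unit": {"set": null}
  }
]
```

i.e. a standard `{"set": null}` atomic update posted to the `/update`
handler, which removes the field from the stored document.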

I saw at least one other place in the (non-test) code where it's assumed
that the return value of `getFieldValues` can't be `null`, and which would
throw an exception if it were. That was the following code in
`DocumentAnalysisRequestHandler.java`:

  Collection<Object> fieldValues = document.getFieldValues(name);
  NamedList<NamedList<? extends Object>> indexTokens = new SimpleOrderedMap<>();
  for (Object fieldValue : fieldValues) {
    indexTokens.add(
        String.valueOf(fieldValue), analyzeValue(fieldValue.toString(), analysisContext));
  }

but in other places where I looked at how that method was used, there was
handling of the `null` return value case.

I'm not sure what the correct logic is in the `UpdateLog.java` case, but I
updated it to the following to test:

  /** Add all fields from olderDoc into newerDoc if not already present in newerDoc */
  private void applyOlderUpdates(
      SolrDocumentBase<?, ?> newerDoc, SolrInputDocument olderDoc, Set<String> mergeFields) {
    for (String fieldName : olderDoc.getFieldNames()) {
      // if the newerDoc has this field, then this field from olderDoc can be ignored
      if (!newerDoc.containsKey(fieldName)
          && (mergeFields == null || mergeFields.contains(fieldName))) {
        Collection<Object> values = olderDoc.getFieldValues(fieldName);
        if (values == null) {
          newerDoc.addField(fieldName, null);
        } else {
          for (Object val : values) {
            newerDoc.addField(fieldName, val);
          }
        }
      }
    }
  }

Does that seem like the right decision, to set the field to `null` in the
`newerDoc` in this case?
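For what it's worth, the intended behaviour of the patch can be sketched
standalone with plain maps in place of the Solr document classes (the class
and method names below are mine, purely for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-in for UpdateLog.applyOlderUpdates, using plain maps
// instead of SolrDocumentBase/SolrInputDocument. A field whose value list
// is null (e.g. after a {"set": null} atomic update) is copied as an
// explicit null entry rather than triggering a NullPointerException.
class MergeSketch {
    static void applyOlderUpdates(
            Map<String, List<Object>> newerDoc, Map<String, List<Object>> olderDoc) {
        for (Map.Entry<String, List<Object>> e : olderDoc.entrySet()) {
            // if the newerDoc already has this field, the older value is ignored
            if (newerDoc.containsKey(e.getKey())) {
                continue;
            }
            List<Object> values = e.getValue();
            if (values == null) {
                newerDoc.put(e.getKey(), null); // preserve the "set to null" state
            } else {
                newerDoc.put(e.getKey(), new ArrayList<>(values));
            }
        }
    }
}
```

The newer document wins for any field it already has; only fields missing
from it are pulled in from the older one, null value or not.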

Thanks for your time,
Calvin




SOLR data on ECS problem with write lock files

2024-01-18 Thread Darren Kukulka
Hi Everybody!

Has anybody had issues with write.lock files on AWS ECS SOLR instances
where data is stored on EFS? i.e. when the SOLR ECS task restarts, SOLR
thinks another process is holding the write.lock file to make updates.

But the truth is that the stopped ECS task has not been terminated before
the new one starts up automatically.

Our SOLR ECS tasks use individual solr-data locations on EFS, so they are
not sharing data, which makes this problem even more frustrating!

So far I have looked at solrconfig.xml changes like the lockType,
unlockOnStartup and writeLockTimeout directives, but I'm not sure any of
these would help with our scenario.
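For anyone following along, those directives live in the <indexConfig>
section of solrconfig.xml. A sketch (the values are illustrative, and
unlockOnStartup only exists in the 4.x line):

```xml
<indexConfig>
  <!-- "native" uses OS-level file locks; note that native locking is
       reportedly unreliable on NFS-backed storage such as EFS -->
  <lockType>${solr.lock.type:native}</lockType>
  <!-- Solr 4.x only: force-remove a stale write.lock at core startup -->
  <unlockOnStartup>true</unlockOnStartup>
  <!-- max time (ms) to wait for the write lock -->
  <writeLockTimeout>1000</writeLockTimeout>
</indexConfig>
```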

Unfortunately, we are stuck on SOLR 4.10.2, because going to 5 and above
would mean code changes to our product that uses SOLR, as the dataimport
handlers changed in version 5.

I'm also wondering if this could be an ECS issue, rather than SOLR itself.
Perhaps the version of Docker engine we use to build the SOLR images that
run in ECS (20.10.13) does not play well in ECS land when a task restarts?
I have not yet found out whether it is possible to modify the ECS lifecycle
behaviour for a specific ECS cluster.

Any suggestions or pointers would be greatly appreciated!

Cheers,
Daz


Re: SOLR data on ECS problem with write lock files

2024-01-18 Thread uyil...@vivaldi.net.INVALID
Hello Darren,

I had a very similar problem when running Solr on an EKS Kubernetes cluster.
The solution I found was to add a preStop shutdown hook to the Kubernetes
deployment, which runs the command "/opt/solr/bin/solr stop -k solrrocks -p
8983" to gracefully stop Solr before the pod is killed. I also added a
180-second grace period via "terminationGracePeriodSeconds". The downside is
that it now takes at least 3 minutes to shut down the pod.

That way the lock file gets cleared before Solr is restarted. I don't know
whether the same approach can be used in ECS, though.
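In Deployment-manifest terms, the relevant fields look roughly like this
(container name and image are placeholders):

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 180
      containers:
        - name: solr
          image: solr:9
          lifecycle:
            preStop:
              exec:
                # gracefully stop Solr so write.lock is released before the pod dies
                command: ["/opt/solr/bin/solr", "stop", "-k", "solrrocks", "-p", "8983"]
```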

-ufuk yilmaz



[ANNOUNCE] Apache Solr 9.4.1 released

2024-01-18 Thread David Smiley
The Solr PMC is pleased to announce the release of Apache Solr 9.4.1.

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Solr project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document handling, and geospatial search. Solr is highly
scalable, providing fault tolerant distributed search and indexing, and
powers the search and navigation features of many of the world's largest
internet sites.

Solr 9.4.1 is available for immediate download at:

  

### Solr 9.4.1 Release Highlights:

A significant regression in the JSON Query API introduced in 9.4 is
primarily what prompted this release. Additionally, some security-oriented
improvements/fixes have been added, and many transitive dependencies have
been upgraded.

Please refer to the Upgrade Notes in the Solr Ref Guide for information on
upgrading from previous Solr versions:

  <https://solr.apache.org/guide/solr/9_4/upgrade-notes/solr-upgrade-notes.html>

Please read CHANGES.txt for a full list of bugfixes:

  


Re: SOLR data on ECS problem with write lock files

2024-01-18 Thread uyil...@vivaldi.net.INVALID
I forgot to add that I also changed the deployment strategy to "Recreate"
rather than the default "RollingUpdate", so the Kubernetes cluster shuts
down old pods before creating the new ones, avoiding the lock file problem.
It might not suit you, though, if you need very high availability.
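The strategy change is a one-liner in the Deployment spec:

```yaml
spec:
  strategy:
    type: Recreate  # stop old pods before starting new ones (default: RollingUpdate)
```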

-ufuk yilmaz
