Shawn and Alexandre,
On 5/17/23 13:12, Shawn Heisey wrote:
> On 5/17/23 10:01, Christopher Schultz wrote:
>> The "all" field contains copies of the other fields values for each
>> record I've studied except for "identifier".
>> I have re-indexed the whole document set and the "all" field still
>> does not contain the values I can see (in the search results) for
>> "identifier".
> Can you share your schema? If you need to redact sensitive info from
> it, please do it in a way that ensures we can distinguish one bit of
> redacted data from other redacted bits.
> Part of my intent in asking is to find out the answer to the question
> that Alexandre asked. It will also provide data to determine what
> questions I will ask next.
All of my fields are stored (this is how I knew that other field values
were in fact available in the "all" field).
I think this might be a false alarm. As carefully as I tried to
describe the situation accurately and completely, I cannot
replicate it. I inserted a new document into the index (via my own
software) and the field values were copied as expected.
I wonder if my problem was a timing issue between inserting the document
and the server-specified soft-auto-commit value. We know there is a
delay between when the data are inserted into the index and when they
can be found successfully via a search. I did not take any screenshots
of the field values at the time, so I can't even be sure I wasn't just
having selective vision.
Thanks for your replies and I apologize for the noise. I'll pick this
thread back up if for some reason I am able to reproduce the issue.
Speaking of the lag-between-insert-and-searchability, is there any
information Solr is able to provide regarding a core's freshness? I have
an administrative interface in my application I've been building which
is able to provide some basic information about a core, "freshen" a core
schema, and re-index the core with data from my application. I would
love to be able to show "last data added to index today 13:34:46" and
"last soft commit/searcher-open (or whatever the right term is) today
13:32:00" so the admin can see "okay, we have a blind-spot which extends
00:02:46 into the past". Does the core metadata give that kind of info?
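To make the "blind-spot" arithmetic concrete, here is a minimal sketch of
what I'd like to display. It uses only java.time and does not touch Solr
at all; the class and method names are made up for illustration, and the
two timestamps are the example values from above. Where the real values
would come from is exactly my question.

```java
import java.time.Duration;
import java.time.LocalTime;

public class BlindSpot {
    /**
     * Formats the gap between the last searcher open and the last add
     * as HH:mm:ss. Both inputs are hypothetical values supplied by the
     * caller; nothing here is read from Solr.
     */
    static String blindSpot(LocalTime lastSearcherOpen, LocalTime lastAdd) {
        Duration gap = Duration.between(lastSearcherOpen, lastAdd);
        if (gap.isNegative()) {
            // The searcher is newer than the last add: nothing is hidden.
            gap = Duration.ZERO;
        }
        return String.format("%02d:%02d:%02d",
                gap.toHours(), gap.toMinutesPart(), gap.toSecondsPart());
    }

    public static void main(String[] args) {
        // Example values from the text: searcher opened 13:32:00, last add 13:34:46.
        System.out.println(blindSpot(LocalTime.of(13, 32, 0), LocalTime.of(13, 34, 46)));
    }
}
```

With the example times this prints 00:02:46.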
I'm currently using SolrJ's CoreAdminRequest.getStatus call to get the
metadata.
I can see this kind of data in there (this is old data in a
code-comment; please ignore the actual values):
// index={numDocs=85,maxDoc=90,deletedDocs=5,
// indexHeapUsageBytes=-1,
// version=2093,
// segmentCount=8,
// current=true,
// hasDeletions=true,
// directory=org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/path
//   lockFactory=org.apache.lucene.store.NativeFSLockFactory@52e9883c;
//   maxCacheMB=48.0 maxMergeSizeMB=4.0),
// segmentsFile=segments_az,
// segmentsFileSizeInBytes=650,
// userData={commitCommandVer=0, commitTimeMSec=1678132702948},
// lastModified=Mon Mar 06 14:58:22 EST 2023,sizeInBytes=56606,size=55.28 KB}}}
Presumably, lastModified gives me the timestamp the last document was
added. What about when the index was opened for searching?
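For what it's worth, decoding the commitTimeMSec value from the userData
in the old dump above gives the same second as lastModified, so in that
sample at least the two seem to describe the same event. I can't tell
from one sample whether that is always the case. Here is the small
java.time sketch I used to check; the class and method names are mine,
and the America/New_York zone is an assumption based on the "EST" in the
output.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class CommitTime {
    /**
     * Converts an epoch-milliseconds value (such as commitTimeMSec from
     * the index userData) into a date-time in the given zone.
     */
    static ZonedDateTime commitTime(long commitTimeMSec, ZoneId zone) {
        return Instant.ofEpochMilli(commitTimeMSec).atZone(zone);
    }

    public static void main(String[] args) {
        // Value copied from the dump above; the zone is assumed, not reported by Solr.
        ZonedDateTime t = commitTime(1678132702948L, ZoneId.of("America/New_York"));
        System.out.println(t); // compare against lastModified=Mon Mar 06 14:58:22 EST 2023
    }
}
```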
As always, thank you for your thoughts.
-chris