Hi,
I'm experimenting with Solr and indexing schemaless JSON content.
I'm using the latest docker image of Solr, and just testing various things.
Indexing and querying work as I would expect for documents of reasonable
size.
However, when I index a document that is ~100MB, I'm unable to get any query results
from that document.
Yet I can't find any indication that there was an error while indexing the
document.
Indexing:
curl -vv \
  'http://localhost:8983/solr/gettingstarted/update/json/docs?f=/docs/**&commit=true' \
  -H 'Content-type: application/json' \
  -d @837-10000-2022010415135.json
* Trying 127.0.0.1:8983...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8983 (#0)
> POST /solr/gettingstarted/update/json/docs?f=/docs/**&commit=true HTTP/1.1
> Host: localhost:8983
> User-Agent: curl/7.68.0
> Accept: */*
> Content-type:application/json
> Content-Length: 97581522
> Expect: 100-continue
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 'self';
form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 'self'; media-src
'self'; style-src '
self' 'unsafe-inline'; script-src 'self'; worker-src 'self';
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Content-Type: text/plain;charset=utf-8
< Vary: Accept-Encoding, User-Agent
< Content-Length: 57
<
{
"responseHeader":{
"status":0,
"QTime":376}}
* Connection #0 to host localhost left intact
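Since the request itself returns status 0, one thing worth ruling out is the input file. Below is a minimal sketch of a local sanity check, assuming the file is a single JSON object with a top-level "docs" key (implied by f=/docs/**, but not verified here). The helper names count_leaves and summarize_file are made up for illustration. It counts how many scalar values sit under each leaf key name, which, as I understand the f=/docs/** mapping, is roughly what would end up as values of each field.

```python
import json
import sys
from collections import Counter


def count_leaves(node, counts=None, key=None):
    """Count scalar (non-null) leaf values per leaf key name.

    Recurses through nested dicts and lists; a value inside a list is
    counted under the key of the enclosing list.
    """
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        for child_key, child in node.items():
            count_leaves(child, counts, child_key)
    elif isinstance(node, list):
        for item in node:
            count_leaves(item, counts, key)
    elif node is not None:
        counts[key] += 1
    return counts


def summarize_file(path, root_key="docs"):
    """Load the JSON file and count leaves under root_key.

    Assumption: the top level of the file is a JSON object.
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return count_leaves(data.get(root_key, {}))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_json.py 837-10000-2022010415135.json
    for name, n in summarize_file(sys.argv[1]).most_common(20):
        print(f"{name}: {n}")
```

If the counts are empty or far smaller than expected, the problem is probably in the file or in the path expression rather than in a Solr size limit.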
No errors are logged in the log file:
2022-03-03 22:57:00.412 INFO (searcherExecutor-26-thread-1-processing-x:gettingstarted) [ x:gettingstarted] o.a.s.c.QuerySenderListener QuerySenderListener done.
2022-03-03 22:57:00.414 INFO (searcherExecutor-26-thread-1-processing-x:gettingstarted) [ x:gettingstarted] o.a.s.c.SolrCore [gettingstarted] Registered new searcher autowarm time: 0 ms
2022-03-03 22:57:00.414 INFO (qtp1515403487-49) [ x:gettingstarted] o.a.s.u.p.LogUpdateProcessorFactory [gettingstarted] webapp=/solr path=/update/json/docs params={f=/docs/**&commit=true}{add=[fbb18697-d823-46e8-8571-6dde6750634b (1726321231525838848)],commit=} 0 369
The "Num Docs" count reported in the Solr Admin UI increases each time I do this.
A query for everything (*:*) gives me the correct doc count.
But no matter what I query for, I cannot get a match on anything inside the large
document. Am I hitting some limit that silently breaks the indexing and/or the query results?
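To narrow this down, here is a minimal sketch of how the large document could be fetched directly by the id shown in the update log, listing which fields it actually has and how many values each one holds. It assumes the core URL from the curl command above, that "id" is the unique key field, and that the fields are stored so they come back in the response. The function names build_select_url, summarize_fields and fetch_doc are hypothetical helpers, not part of any Solr client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: core URL taken from the curl command, id taken from the log line.
SOLR_CORE_URL = "http://localhost:8983/solr/gettingstarted"
DOC_ID = "fbb18697-d823-46e8-8571-6dde6750634b"


def build_select_url(core_url, doc_id, fl="*"):
    """Build a /select URL that looks up one document by its id."""
    params = {"q": f'id:"{doc_id}"', "fl": fl, "rows": 1, "wt": "json"}
    return f"{core_url}/select?{urlencode(params)}"


def summarize_fields(doc):
    """Map each field name to its number of values (1 for single-valued)."""
    return {
        name: len(value) if isinstance(value, list) else 1
        for name, value in doc.items()
    }


def fetch_doc(core_url, doc_id):
    """Return the stored document as a dict, or None if nothing matched."""
    with urlopen(build_select_url(core_url, doc_id)) as resp:
        body = json.load(resp)
    docs = body["response"]["docs"]
    return docs[0] if docs else None


if __name__ == "__main__":
    doc = fetch_doc(SOLR_CORE_URL, DOC_ID)
    if doc is None:
        print("No document found for that id")
    else:
        for name, n in sorted(summarize_fields(doc).items()):
            print(f"{name}: {n} value(s)")
```

If the document comes back with the expected fields, the content was indexed and the issue is more likely in which field the queries are searching. If it comes back with only id and _version_, the field mapping produced nothing for this file.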
Thanks,
Dan