Hi,
I could not reproduce any limit: uploading a 100MB file worked, and I was
able to search the relevant fields inside the doc too.

I hope that:
1. The JSON file you are trying to upload has a root-level key named
"docs".
2. You are not trying to fetch the entire document when using the Solr
admin UI.
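
For point 1, a quick check along these lines might help. It is only a
sketch: the helper name has_root_docs_key and the two sample payloads are
made up for illustration, and for the real file you would read its contents
from disk instead.

```python
import json

def has_root_docs_key(json_text):
    """Return True if the top-level JSON value is an object with a "docs" key."""
    data = json.loads(json_text)
    return isinstance(data, dict) and "docs" in data

# Hypothetical payloads: one wrapped in a root-level "docs" key, one bare array.
sample_ok = '{"docs": [{"id": "1", "name": "a"}]}'
sample_bad = '[{"id": "1", "name": "a"}]'

print(has_root_docs_key(sample_ok))   # True
print(has_root_docs_key(sample_bad))  # False
```

If this returns False for your file, the f=/docs/** mapping in your request
would have nothing to match.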

The reason for point 2 is that, AFAIK, Solr marks all fields as "stored"
in schemaless mode. Fetching the entire document therefore asks the admin
UI to pull roughly 100MB of payload, and it will time out before that
completes.
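
One way to avoid pulling the stored fields back is to restrict the field
list with the fl parameter. The sketch below only builds the query URL and
does not contact Solr. It assumes the unique key field is named "id" and
reuses the host and the "gettingstarted" collection from your curl command;
the function name is made up.

```python
from urllib.parse import urlencode

def build_id_only_query(base_url, collection, query, rows=10):
    """Build a select URL that returns only the id field of matching docs."""
    params = {
        "q": query,    # the search itself
        "fl": "id",    # field list: return only the id (assumed unique key)
        "rows": rows,  # cap the number of returned documents
        "wt": "json",
    }
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"

url = build_id_only_query("http://localhost:8983", "gettingstarted", "*:*")
print(url)
```

If a query built this way returns the ID of the large document, the indexing
worked and only the retrieval of the full stored content is the problem.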

That being said, using Solr in this form may not be the ideal way to
accomplish the task. (I am not a Solr expert, so feel free to disagree if
you know better.)
You could try using extractors to index only the key parts of the
information that your users are most likely to search. Solr would then
return the matching document IDs, and the actual document would be served
for rendering from outside Solr.
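
A rough sketch of that split, under stated assumptions: the field names
("title", "payer", "segments") and both helper names are hypothetical, the
full document is assumed to live in some store outside Solr keyed by the
same ID, and the response dict follows the usual shape of a Solr JSON
select response.

```python
def make_index_doc(doc_id, full_doc, key_fields):
    """Build a small document holding only the fields users are likely to search."""
    small = {"id": doc_id}
    for field in key_fields:
        if field in full_doc:
            small[field] = full_doc[field]
    return small

def ids_from_response(response):
    """Pull the matching document IDs out of a parsed select response."""
    return [doc["id"] for doc in response["response"]["docs"]]

# Hypothetical large document; only "title" and "payer" get indexed.
full_doc = {"title": "Claim batch", "payer": "ACME", "segments": ["s1", "s2"]}
index_doc = make_index_doc("doc-1", full_doc, ["title", "payer"])
print(index_doc)

# Hypothetical parsed response for a query that matched the document.
response = {"response": {"numFound": 1, "start": 0, "docs": [{"id": "doc-1"}]}}
print(ids_from_response(response))
```

The small document is what you would post to Solr. The IDs coming back from
a search are then used to look up the full document wherever you keep it.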


On Fri, Mar 4, 2022 at 4:35 AM Dan Armbrust <daniel.armbrust.l...@gmail.com>
wrote:

> Hi,
>
> I'm experimenting with Solr and indexing schemaless JSON content.
>
> I'm using the latest docker image of Solr, and just testing various things.
>
> The indexing and querying works as I would expect for documents of
> reasonable size.
>
> However, if I ask it to index a document that is ~100MB, I'm unable to
> query any results
> from this document.
>
> Yet, I can't find any indication that there was an error in indexing the
> document.
>
> Indexing:
>
> curl -vv 'http://localhost:8983/solr/gettingstarted/update/json/docs?f=/docs/**&commit=true' \
>   -H 'Content-type: application/json' -d @837-10000-2022010415135.json
> *   Trying 127.0.0.1:8983...
> * TCP_NODELAY set
> * Connected to localhost (127.0.0.1) port 8983 (#0)
>  > POST /solr/gettingstarted/update/json/docs?f=/docs/**&commit=true HTTP/1.1
>  > Host: localhost:8983
>  > User-Agent: curl/7.68.0
>  > Accept: */*
>  > Content-type:application/json
>  > Content-Length: 97581522
>  > Expect: 100-continue
>  >
> * Mark bundle as not supporting multiuse
> < HTTP/1.1 100 Continue
> * We are completely uploaded and fine
> * Mark bundle as not supporting multiuse
> < HTTP/1.1 200 OK
> < Content-Security-Policy: default-src 'none'; base-uri 'none'; connect-src 'self'; form-action 'self'; font-src 'self'; frame-ancestors 'none'; img-src 'self'; media-src 'self'; style-src 'self' 'unsafe-inline'; script-src 'self'; worker-src 'self';
> < X-Content-Type-Options: nosniff
> < X-Frame-Options: SAMEORIGIN
> < X-XSS-Protection: 1; mode=block
> < Content-Type: text/plain;charset=utf-8
> < Vary: Accept-Encoding, User-Agent
> < Content-Length: 57
> <
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":376}}
> * Connection #0 to host localhost left intact
>
>
> No errors are logged in the log file:
>
> 2022-03-03 22:57:00.412 INFO  (searcherExecutor-26-thread-1-processing-x:gettingstarted) [ x:gettingstarted] o.a.s.c.QuerySenderListener QuerySenderListener done.
> 2022-03-03 22:57:00.414 INFO  (searcherExecutor-26-thread-1-processing-x:gettingstarted) [ x:gettingstarted] o.a.s.c.SolrCore [gettingstarted] Registered new searcher autowarm time: 0 ms
> 2022-03-03 22:57:00.414 INFO  (qtp1515403487-49) [ x:gettingstarted] o.a.s.u.p.LogUpdateProcessorFactory [gettingstarted]  webapp=/solr path=/update/json/docs params={f=/docs/**&commit=true}{add=[fbb18697-d823-46e8-8571-6dde6750634b (1726321231525838848)],commit=} 0 369
>
> The "Num Docs" reported in the solr GUI increases each time I do this.
>
> A query for everything (*:*) gives me the correct doc count.
>
> But no matter what I query for, I cannot get a result from inside the
> large document.  Am
> I hitting some limit that is silently messing up the indexing and/or the
> query return?
>
> Thanks,
>
> Dan

-- 
6harat
[solr enthusiast, not affiliated to core dev team]
