[
https://issues.apache.org/jira/browse/SOLR-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karl Wright updated SOLR-6199:
------------------------------
Component/s: clients - java
> SolrJ, using SolrInputDocument methods, requires entire document to be loaded
> into memory
> -----------------------------------------------------------------------------------------
>
> Key: SOLR-6199
> URL: https://issues.apache.org/jira/browse/SOLR-6199
> Project: Solr
> Issue Type: Bug
> Components: clients - java
> Affects Versions: 4.7.3
> Reporter: Karl Wright
>
> ManifoldCF has historically used Solr's extracting update handler to
> transmit binary documents to Solr. Recently, we added Tika processing of
> binary documents to ManifoldCF, and we would now like to send a character
> stream (of a length not limited by ManifoldCF) to Solr as the primary
> content field. Unfortunately, it appears that the SolrInputDocument
> metaphor for receiving extracted content and metadata requires that all
> fields be completely converted to String objects. ManifoldCF will therefore
> certainly run out of memory at some point, when multiple ManifoldCF threads
> all try to convert large documents to in-memory strings at the same time.
> I looked into what would be needed to add streaming support to UpdateRequest
> and SolrInputDocument. Basically, a legal option would be to set a field
> value to a Reader or a Reader[]. This would be straightforward to
> implement, EXCEPT that SolrCloud apparently makes copies of UpdateRequest,
> and copying a Reader isn't going to work unless there's a solid backing
> object somewhere. Even then, I could have gotten this to work by using a
> temporary file for large streams, but SolrCloud gives no signal when it is
> done with its copies of UpdateRequest, so there's no place to free any
> backing storage.
> If anyone knows a good way to do non-extracting updates without loading
> entire documents into memory, please let me know.
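
To make the temporary-file idea from the description concrete, here is a rough sketch in plain Java (standard library only). It is not SolrJ API and not anything SolrJ does today: the class name SpooledContent and the methods openReader() and isBacked() are made up for illustration. It spools a one-shot Reader to a temp file so that each "copy" can open its own independent Reader, and it shows where the missing "done" signal would have to arrive.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of SolrJ): spools a one-shot Reader to a
 * temporary file so the content can be re-read any number of times without
 * holding the whole document in memory.
 */
final class SpooledContent implements AutoCloseable {
    private final Path file;

    SpooledContent(Reader source) throws IOException {
        file = Files.createTempFile("spooled-content", ".txt");
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            // Copy in fixed-size chunks; memory use is bounded by the buffer.
            char[] buf = new char[8192];
            int n;
            while ((n = source.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Do not leave the temp file behind if spooling fails on I/O.
            Files.deleteIfExists(file);
            throw e;
        }
    }

    /** Each call returns a fresh, independent Reader over the same content. */
    Reader openReader() throws IOException {
        return Files.newBufferedReader(file, StandardCharsets.UTF_8);
    }

    /** True while the backing temp file still exists. */
    boolean isBacked() {
        return Files.exists(file);
    }

    /**
     * Frees the backing storage. Something has to call this after the LAST
     * copy has been consumed; that is the signal described as missing above.
     */
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(file);
    }
}
```

The sketch only works if some owner knows when every copy has finished reading and then calls close(). Without that signal, the temp file either leaks or gets deleted while a copy still needs it.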