Hello Ryan,

Solr should have no trouble indexing documents like these; retrieving
them is another matter. For a multiValued field, a lot of data
structures are created when the document is serialized, and that is
probably the problem.

If you do not need the field returned, do not set stored="true" on it.
If you do need it, how about making it single-valued and storing all
values in one string, separated by a delimiter? Use analysis to split
on the delimiter so you can still search for individual values. When
retrieving the document, Solr should then just spit out one massive
field.
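
As a rough sketch of the client side, here is how the values could be
packed before indexing and unpacked after retrieval. This is Python,
and the field name "tickers_packed", the "|" delimiter and the second
ticker are my own assumptions for illustration; pick any delimiter
that can never occur inside a ticker. On the Solr side, the field type
would need a tokenizer that splits on that same character.

```python
# Assumed delimiter: must never occur inside a ticker value.
DELIMITER = "|"


def join_tickers(tickers):
    """Pack many ticker values into one string for a single-valued field."""
    for ticker in tickers:
        if DELIMITER in ticker:
            raise ValueError(f"ticker contains the delimiter: {ticker!r}")
    return DELIMITER.join(tickers)


def split_tickers(packed):
    """Recover the individual tickers from the stored field value."""
    return packed.split(DELIMITER) if packed else []


# Hypothetical document; "tickers_packed" is an illustrative field name.
doc = {
    "id": "company-1",
    "tickers_packed": join_tickers(["AAA NOE 12/31/21", "BBB NOE 06/30/22"]),
}

tickers = split_tickers(doc["tickers_packed"])
```

The check in join_tickers is there because a ticker containing the
delimiter would silently turn into two values when split again.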

Such a large field will still take some time, but it should take less
time and eat less memory than that many multiValued values.
Deserializing on the client should also take less time; imagine your
browser trying to neatly render that many objects in XML format.

Regards,
Markus

2022-07-12 15:23 GMT+02:00, Ryan Yacyshyn <ryan.yacys...@gmail.com>:
> Hi all,
>
> I have a data modelling question. A department of ours is indexing
> companies and including a multivalued field in the doc to store a set of
> tickers related to the company. Sounded fine at first until I saw one
> document where the multivalued field had over 50,000 elements in it. There
> are other documents that are storing even more than this because Solr will
> hang and won't return the document.
>
> This is a *_ss field, and their query would be an exact match to one of the
> values in this field to return the company name, eg:
> ticker_fieldname_ss:"AAA NOE 12/31/21".
>
> I'm wondering what would be the best solution to index this data and search
> for the ticker information on a company?
>
> One way I'm thinking of is rather than storing the ticker information as a
> field on the company, have each ticker its own document with the fieldname
> of the company.
>
> Would love to hear your thoughts on this.
>
> Thanks!
>
> Regards,
> Ryan
>
