Re: What's the best practices for indexing XML Content with dynamic XML Elements (SOLR 6.1) ?

Stan Lee Tue, 16 Aug 2016 12:06:16 -0700

Sorry for not being specific. I believe this SOLR plugin (LUX) may fit my
scenario (query without knowing the tag in advance).
http://luxdb.org/README.html


On Tue, Aug 16, 2016 at 12:18 PM, Erick Erickson <[email protected]>
wrote:

> You haven't really described the scenario you want
> to implement. I get that you have raw XML of an
> unknown structure. What do you want to _do_ with that?
>
> 1> If all you want to do is index the data (i.e. strip the tags),
> try HTMLStripCharFilterFactory.
> 2> If you want to intelligently take content of the XML
> and ingest it into specific Solr fields, I don't think you'll be
> able to do that without writing some specific code to
> parse the XML, explore it and "do the right thing" with it
> which will probably involve SolrJ, an XML parser and
> some programming.
>
> Best,
> Erick
>
> On Tue, Aug 16, 2016 at 6:15 AM, Stan Lee <[email protected]> wrote:
> > We currently have a Microsoft SQL table with an XML datatype. We use
> > DIH to import the XML content as is, that is, without using the
> > XPathEntityProcessor. If the elements of the XML content are known,
> > XPathEntityProcessor makes sense. Could someone kindly suggest the right
> > way of handling such a scenario without impacting search performance?
> > Which tokenizer should we be using?
> >
> >
> > Thanks.
>
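
For option 2 in Erick's reply (parse the XML yourself and map it to Solr
fields), the parsing step could be sketched roughly as below. This is only an
illustration in Python rather than SolrJ. The function name flatten_xml and
the "_txt" suffix are hypothetical; the suffix assumes the schema defines a
matching multi-valued dynamic field such as "*_txt". Sending the resulting
document to Solr is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def flatten_xml(xml_text, suffix="_txt"):
    """Map each element path in an arbitrary XML string to a field name.

    Hypothetical helper: the "_txt" suffix assumes a matching dynamic
    field (e.g. "*_txt", multi-valued) exists in the Solr schema.
    """
    root = ET.fromstring(xml_text)
    doc = defaultdict(list)

    def walk(elem, path):
        # Drop any "{namespace}" prefix that ElementTree puts on the tag.
        name = elem.tag.rsplit("}", 1)[-1]
        current = path + [name]
        # Only the element's leading text is kept; attributes and tail
        # text are ignored in this sketch.
        text = (elem.text or "").strip()
        if text:
            doc["_".join(current) + suffix].append(text)
        for child in elem:
            walk(child, current)

    walk(root, [])
    return dict(doc)


sample = (
    "<order><id>42</id>"
    "<item><name>pen</name></item>"
    "<item><name>ink</name></item></order>"
)
fields = flatten_xml(sample)
print(fields)
```

Because the field names are derived from the element paths, no tag has to be
known in advance; repeated elements simply become multiple values of the same
field.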
