You haven't really described the scenario you want to implement. I get that you have raw XML of an unknown structure. What do you want to _do_ with that?
1> If all you want to do is index the data (i.e. strip the tags), try HTMLStripCharFilterFactory.

2> If you want to intelligently take the content of the XML and ingest it into specific Solr fields, I don't think you'll be able to do that without writing some code to parse the XML, explore it, and "do the right thing" with it. That will probably involve SolrJ, an XML parser, and some programming.

Best,
Erick

On Tue, Aug 16, 2016 at 6:15 AM, Stan Lee <[email protected]> wrote:
> We currently have a Microsoft SQL table with a XML datatype. We use DIH to
> import the XML Content as is, that is not using the XPathEntityProcessor.
> If the elements of the XML content is known, XPathEntity make sense. Could
> someone kindly suggest the right way of handling such scenario, without
> impacting search performance?
> Which tokenizer should we be using?
>
> Thanks.
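
P.S. To make the second option a bit more concrete, here is a rough sketch of the "parse and explore" half, using only the JDK's built-in DOM parser. The class name XmlFlattener and the convention of joining element names with "_" to form a field name are my own assumptions for illustration, not anything Solr prescribes. It walks XML of unknown structure and collects each element's text under a path-based name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens arbitrary XML into fieldName -> values.
public class XmlFlattener {

    // Parse the XML string and return the text of every element, keyed by
    // its path from the root (element names joined with "_").
    public static Map<String, List<String>> flatten(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            Map<String, List<String>> fields = new LinkedHashMap<>();
            walk(root, root.getTagName(), fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Recurse through child elements; gather this element's own text nodes.
    private static void walk(Element element, String path,
                             Map<String, List<String>> fields) {
        StringBuilder text = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                Element e = (Element) child;
                walk(e, path + "_" + e.getTagName(), fields);
            }
        }
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            // Repeated elements end up as multiple values under one name.
            fields.computeIfAbsent(path, k -> new ArrayList<>()).add(value);
        }
    }
}
```

For example, XmlFlattener.flatten("<doc><title>Hello</title><tag>a</tag><tag>b</tag></doc>") gives doc_title=[Hello] and doc_tag=[a, b]. From there you would copy each entry into a SolrInputDocument with addField() and send it with a SolrJ client. That only works if your schema accepts the generated names, for instance through a multiValued dynamic field, which is something you would have to set up to match whatever naming scheme you pick.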
