Sorry for not being specific. I believe this Solr plugin (Lux) may fit my scenario, which is querying XML without knowing the tags in advance: http://luxdb.org/README.html
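For comparison, here is a minimal sketch of the other route Erick mentions in the quoted message (parsing the XML in client code). It only illustrates the "explore unknown XML" step using Python's standard library; it does not talk to Solr. The function names `local_name` and `flatten_xml` are hypothetical, and the `_txt` suffix assumes the schema has a matching dynamic field (`*_txt`), which may not be true of every schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    # ElementTree writes namespaced tags as '{uri}name'; keep only 'name'.
    return tag.rsplit("}", 1)[-1]


def flatten_xml(xml_text, suffix="_txt"):
    """Map each element path to the list of text values found at that path.

    The suffix is an assumption: it presumes a '*_txt' dynamic field exists.
    Attributes and tail text (mixed content) are ignored in this sketch.
    """
    fields = defaultdict(list)

    def walk(element, path):
        path = path + [local_name(element.tag)]
        text = (element.text or "").strip()
        if text:
            fields["_".join(path) + suffix].append(text)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), [])
    return dict(fields)


sample = "<order><id>42</id><item><sku>A1</sku></item><item><sku>B2</sku></item></order>"
print(flatten_xml(sample))
```

The resulting dictionary could then be sent to Solr as one document by whatever client is in use, so each tag path becomes a searchable field without being declared ahead of time.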
On Tue, Aug 16, 2016 at 12:18 PM, Erick Erickson <[email protected]> wrote:

> You haven't really described the scenario you want
> to implement. I get that you have raw XML of an
> unknown structure. What do you want to _do_ with that?
>
> 1> if all you want to do is index the data (i.e. strip the tags)
> try HtmlStripCharFilterFactory.
> 2> If you want to intelligently take content of the XML
> and ingest it into specific Solr fields, I don't think you'll be
> able to do that without writing some specific code to
> parse the XML, explore it and "do the right thing" with it
> which will probably involve SolrJ, an XML parser and
> some programming.
>
> Best,
> Erick
>
> On Tue, Aug 16, 2016 at 6:15 AM, Stan Lee <[email protected]> wrote:
>
> > We currently have a Microsoft SQL table with a XML datatype. We use DIH to
> > import the XML Content as is, that is not using the XPathEntityProcessor.
> > If the elements of the XML content is known, XPathEntity make sense. Could
> > someone kindly suggest the right way of handling such scenario, without
> > impacting search performance?
> > Which tokenizer should we be using?
> >
> > Thanks.
