Hi all,
I'm trying to understand how to "master" Morphline configuration files in
order to put some data into Solr, but I'm running into some problems with
TestMorphlineSolrSink. This is what I did:
1) Since I want to index the title of testXML.xml (i.e. "Tika test
document"), I commented out all the parsers
except org.apache.tika.parser.xml.DcXMLParser (which parses Dublin Core
metadata)
2) In schema.xml I added the following field:
<field name="title" type="text_en" indexed="true" stored="true"
multiValued="false" />
But:
- If I don't add anything to fmap or capture, everything works fine, but I
don't understand why (who fills that field?). If instead I add title to
capture and/or title: title (or dc_title: title) to fmap, Solr complains that 2
values are retrieved for 'title' (debugging the values, I see the title and
one empty value in the 'title' metadata array...).
Thus, the problem is that everything works magically if the field is named
title, but if I change its name to something like doc_title there's no way
to make it non-multivalued. Am I right? How can I fix this problem?
- I'd also like to handle JSON files. How can I map JSON fields to Solr fields?
Could someone give a simple example?
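For reference, the solrCell setup described in the first point would look roughly like the sketch below. It is an illustration only, not the exact file from the test: it assumes the usual solrCell parameter names (solrLocator, capture, fmap, lowernames, parsers) and a SOLR_LOCATOR variable defined elsewhere in the morphline file.

```
# Sketch only: SOLR_LOCATOR is assumed to be defined earlier in the file.
{
  solrCell {
    solrLocator : ${SOLR_LOCATOR}

    # Lowercase field names and replace non-alphanumerics with "_"
    # (so a "dc:title" metadata key would become "dc_title").
    lowernames : true

    # The two settings that trigger the "2 values for 'title'" error
    # when enabled, either alone or together:
    capture : [title]
    fmap : { dc_title : title }

    # All parsers commented out except the Dublin Core XML parser.
    parsers : [
      { parser : org.apache.tika.parser.xml.DcXMLParser }
    ]
  }
}
```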
Best,
Flavio