Hi, I have a large number of XML files to be indexed by Lucene. All the files share a similar structure, shown below:

<Group id="abc" member="cde" blah blah ....>
  <Subgroup id="abc1" member="fgh" blah blah ...>
  <Subgroup id="abc2" member="fgh" blah blah ...>
  <Subgroup id="abc3" member="fgh" blah blah ...>
  ......
</Group>

Things to note: the root Group element has 30 or so attributes, and it usually contains over 2000 Subgroup elements, each of which has more than 20 attributes.

I want to create one Document object that holds the contents of the Group element, and one Document object that holds all the Subgroup elements. Here are my challenges, however:

1. How many fields are advisable for a Document to be indexed by Lucene? Will over 30 fields (for the Group element) be too many?

2. How do I create a Document object and fields to hold all the Subgroup elements? Is this a sensible way to model the data?

3. How can I link the Document object of the Group element to the Document object of all the Subgroup elements?

Please note that I intend to use two such Document objects to represent the group, but I don't know whether that is a good solution. I am open to using more than two Documents to do the job, but I don't know how to connect all the objects in Lucene.

Many thanks!
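
P.S. To make the "more than two Documents" alternative concrete, here is a rough sketch of the layout I have in mind, written in plain Java with no Lucene calls. Each map stands in for one Lucene Document and each map entry for one field. The names GroupFlattener, docType and groupId are made up for illustration, and the assumption that the Group's id attribute is unique enough to serve as the link is mine. Whether a shared key like this is the right way to connect Documents in Lucene is exactly what I am unsure about.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns one Group file into flat records,
// one for the Group and one per Subgroup. Not a Lucene API.
final class GroupFlattener {

    // Copies every XML attribute of an element into a field map.
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> fields = new LinkedHashMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            fields.put(attr.getNodeName(), attr.getNodeValue());
        }
        return fields;
    }

    // Returns the Group record first, followed by one record per Subgroup.
    static List<Map<String, String>> flatten(String xml) {
        Element group;
        try {
            group = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> groupRecord = attributesOf(group);
        groupRecord.put("docType", "group"); // assumed marker field
        records.add(groupRecord);

        NodeList children = group.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && "Subgroup".equals(child.getNodeName())) {
                Map<String, String> record = attributesOf((Element) child);
                record.put("docType", "subgroup");
                // Assumed link: every Subgroup record carries its parent's id.
                record.put("groupId", group.getAttribute("id"));
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<Group id=\"abc\" member=\"cde\">"
                + "<Subgroup id=\"abc1\" member=\"fgh\"/>"
                + "<Subgroup id=\"abc2\" member=\"fgh\"/>"
                + "</Group>";
        for (Map<String, String> record : flatten(xml)) {
            System.out.println(record);
        }
    }
}
```

With this layout, each Subgroup record could be found by searching on groupId, and the Group record by its own id. Note that the sketch needs well-formed input, so the Subgroup elements are written as self-closing tags here. It would also overwrite any real attribute that happened to be named docType or groupId.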