> So I've decided I'm going to simply have empty fields, and that
> brought up several other questions.
>
> First, is there a limit on the number of fields per document?

I don't think so.

> Secondly, why are fields in Document implemented with a Vector instead
> of a HashSet or similar? Wouldn't retrieval be faster without
> iterating through a list?

Field order may be important. Maybe it could be done with Lists instead
of Vectors, though.

> Lastly, how difficult (or possible) is it to do something like extend
> the Document class to have the functionality I want?

I'm not sure; I've never had to do it. I see the class is final, so you
won't be able to extend it. Maybe the final modifier could be removed, if
you present a good use case.

Otis

> I know I'm likely missing a simple solution but I just can't see it.
>
> Chris
>
> On 8/10/05, Chris D <[EMAIL PROTECTED]> wrote:
> > I'm adding files to an index over time, so after some time I'm likely
> > to see the same file more than once. I would like to be able to search
> > for the information about that particular instance of the file
> > (Filename, date etc.). For instance, if I index File1 and then File2
> > (which are identical) at different times, I want to be able to search
> > for the contents and retrieve all the Filenames and MIME.
> >
> > The first way I did it was to add a separate doc for every instance,
> > as follows:
> >
> > DOC 1
> > FILEID 123
> > MIME text/html
> > CONTENT blam blam blam etc.
> >
> > DOC 2
> > FILEID 123
> > FILENAME File1
> > DATE 090909
> >
> > DOC 3
> > FILEID 123
> > FILENAME File2
> > DATE 101010
> > AUTH Jim Jones
> >
> > The problem with this was that if the user needed all of the Filenames
> > that are associated with content:blam I would have to search for
> > fileID:123 to retrieve them. This gets slow with several thousand hits
> > because I have to do a search for every hit.
> >
> > I solved that by using multiple fields of the same name.
> >
> > DOC 1
> > FILEID 123
> > MIME text/html
> > CONTENT blam blam blam etc.
> > FILENAME File1
> > DATE 090909
> > FILENAME File2
> > DATE 101010
> > AUTH Jim Jones
> >
> > But now I have a problem where I can't retrieve specific information
> > about an instance of the file. I tried using getFields(String), but if
> > I wanted the author for instance 2 I have a problem: it should be Jim
> > Jones, but in the index it looks like he's the author for instance 1.
> >
> > One solution I see would be to fill all of the fields for each
> > instance with empty strings, but that seems like a bit of a hack.
> >
> > Another that fell apart fairly quickly was to have a reference table.
> >
> > DOCID 1
> > FILEID 123abd321
> > MIME/TYPE text/html
> > INSTANCE uri1 collectiondate1
> > URI1 http://blam.com/
> > COLLECTIONDATE1 12355
> > INSTANCE uri2 collectiondate2 author2
> > URI2 http://google.ca/
> > COLLECTIONDATE2 12356
> > AUTHOR2 Jim Brown
> >
> > Now I can't search for URI without having to search for URI1:foo +
> > URI2:foo ...
> >
> > How can I make specific attributes of an instance of the file
> > searchable without having to do a search for every hit?
> >
> > Thanks,
> > Chris
> >
> > ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
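
For what it's worth, here is a rough sketch of why the empty-field padding
discussed at the top of this thread keeps per-instance values lined up. It
is only a model: ToyDocument is a hypothetical stand-in (an ordered list of
name/value pairs), not Lucene's Document class, and addInstance is a made-up
helper. The field names and values come from the example in the quoted
message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a document: an ordered list of (name, value) pairs. */
class ToyDocument {
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    /** Returns the values stored under a name, in the order they were added. */
    List<String> getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                values.add(field[1]);
            }
        }
        return values;
    }
}

class InstanceFieldsDemo {
    /** Adds one file instance; a missing author is stored as "" to hold its position. */
    static void addInstance(ToyDocument doc, String filename, String date, String author) {
        doc.add("FILENAME", filename);
        doc.add("DATE", date);
        doc.add("AUTH", author == null ? "" : author);
    }

    /** The layout from the quoted message: instance 1 simply has no AUTH field. */
    static ToyDocument buildUnpadded() {
        ToyDocument doc = new ToyDocument();
        doc.add("FILENAME", "File1");
        doc.add("DATE", "090909");
        doc.add("FILENAME", "File2");
        doc.add("DATE", "101010");
        doc.add("AUTH", "Jim Jones");
        return doc;
    }

    /** The same data, but every instance carries every field. */
    static ToyDocument buildPadded() {
        ToyDocument doc = new ToyDocument();
        addInstance(doc, "File1", "090909", null);
        addInstance(doc, "File2", "101010", "Jim Jones");
        return doc;
    }

    public static void main(String[] args) {
        ToyDocument unpadded = buildUnpadded();
        // Only one AUTH value exists, so it sits at index 0 next to File1.
        System.out.println(unpadded.getValues("FILENAME") + " " + unpadded.getValues("AUTH"));

        ToyDocument padded = buildPadded();
        // Index i of every field now refers to the same instance.
        System.out.println(padded.getValues("FILENAME") + " " + padded.getValues("AUTH"));
    }
}
```

With padding, the i-th FILENAME, DATE and AUTH values all describe the same
instance, at the cost of storing empty values. Whether that trade-off is
acceptable in a real index is a separate question.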