Hi! I'm a bit lost.
Short version: Is it possible to index a large number of files, run a query for the word 'foo', then look at *each* hit in the 10 best files and get some meta information for it?

Extended version: I have HTML-like files that I want to index with Lucene.

FileA:
  <tag1>
    <tag2 attr1=a attr2=b> foo </tag2>
    <tag2 attr1=c attr2=d> bar </tag2>
    <tag2 attr1=e attr2=e> foo </tag2>
  </tag1>

FileB:
  <tag1>
    <tag2 attr1=a attr2=d> foo </tag2>
    <tag2 attr1=c attr2=d> bar </tag2>
  </tag1>

How can I build the index so that, when I search for 'foo', Lucene returns the corresponding attributes for each hit, while the ranking is calculated per file?

  FileA a b foo [site ranking value for FileA]
  FileA e e foo [site ranking value for FileA]
  FileB a d foo [site ranking value for FileB]

First, I tried creating one Document per file with two fields: one for the filename, the other for the tokenized file content:

  doc.add(new Field("filename", "FileA"));
  doc.add(new Field("content", new FileReader(FileA)));

With this, the ranking is fine (each file has its own value), but how can I now find the individual 'foo' hits in the file together with their attributes?

Second, I tried adding one Document for each word in the file:

  doc.add(new Field("filename", "FileA"));
  doc.add(new Field("attr1", "a"));
  doc.add(new Field("attr2", "b"));
  doc.add(new Field("content", "foo"));

Now a query returns each hit of 'foo' with the corresponding attributes, but the ranking is calculated per hit, not for the whole file.

What is the Lucene way to solve my problem?

Thanks.

Best regards,
Tomas
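
P.S. To make the result I am after concrete, here is a small plain-Java sketch. It does not use Lucene at all. The class name HitDemo, the Entry type and the search method are made up for illustration, and the "score" is only a stand-in (the number of matching words in the file), not Lucene's ranking value. It only shows the shape of the output: one row per hit, ordered by a value computed per file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: no Lucene classes are used here.
class HitDemo {

    /** One word from a file, with the attributes of its enclosing <tag2>. */
    static final class Entry {
        final String file, attr1, attr2, word;

        Entry(String file, String attr1, String attr2, String word) {
            this.file = file;
            this.attr1 = attr1;
            this.attr2 = attr2;
            this.word = word;
        }
    }

    // FileA and FileB from the example above, flattened to one entry per word.
    static final List<Entry> ENTRIES = List.of(
            new Entry("FileA", "a", "b", "foo"),
            new Entry("FileA", "c", "d", "bar"),
            new Entry("FileA", "e", "e", "foo"),
            new Entry("FileB", "a", "d", "foo"),
            new Entry("FileB", "c", "d", "bar"));

    /** Returns rows of the form "file attr1 attr2 word [fileScore]", best file first. */
    static List<String> search(String term) {
        List<Entry> hits = new ArrayList<>();
        // Stand-in for the per-file ranking value: count of matching words in that file.
        Map<String, Integer> fileScore = new LinkedHashMap<>();
        for (Entry e : ENTRIES) {
            if (e.word.equals(term)) {
                hits.add(e);
                fileScore.merge(e.file, 1, Integer::sum);
            }
        }
        // List.sort is stable, so hits within one file keep their document order.
        hits.sort(Comparator.comparing((Entry e) -> fileScore.get(e.file)).reversed());

        List<String> rows = new ArrayList<>();
        for (Entry e : hits) {
            rows.add(e.file + " " + e.attr1 + " " + e.attr2 + " " + e.word
                    + " [" + fileScore.get(e.file) + "]");
        }
        return rows;
    }

    public static void main(String[] args) {
        search("foo").forEach(System.out::println);
    }
}
```

Every row carries the attributes of its own hit, but the bracketed value is the same for all hits from the same file. That is the behaviour I would like to get from a Lucene index.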