meta information of hits

Tomas Fischer Thu, 18 Jan 2007 06:12:11 -0800

Hi!

I got lost.


Short version:
Is it possible to index tons of files, run a query
for the word 'foo', then look at *each* hit in the
10 best files and retrieve some meta information for it?


Extended version:
I have HTML-like files that I want to index with
Lucene.

FileA:
<tag1>
<tag2 attr1=a attr2=b> foo </tag2>
<tag2 attr1=c attr2=d> bar </tag2>
<tag2 attr1=e attr2=e> foo </tag2>
</tag1>

FileB:
<tag1>
<tag2 attr1=a attr2=d> foo </tag2>
<tag2 attr1=c attr2=d> bar </tag2>
</tag1>

How can I build the index so that, when I search for 'foo',
Lucene returns the corresponding attributes for each hit,
but the ranking is calculated per file, like this:
FileA a b foo [site ranking value for FileA]
FileA e e foo [site ranking value for FileA]
FileB a d foo [site ranking value for FileB]
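
To make the desired result concrete, here is a minimal plain-Java sketch of the output shape. It does not use any Lucene API: the HitReport and Hit classes, the report method, and the score values are all made up for illustration. It only shows per-hit rows that carry their attributes, ordered by a per-file score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: no Lucene calls, all names are assumptions.
class HitReport {

    // One occurrence of the search term plus the attributes of its enclosing tag.
    static final class Hit {
        final String file, attr1, attr2, term;

        Hit(String file, String attr1, String attr2, String term) {
            this.file = file;
            this.attr1 = attr1;
            this.attr2 = attr2;
            this.term = term;
        }
    }

    // Orders hits by the score of the file they belong to (highest first).
    // List.sort is stable, so hits from the same file keep their original order.
    static List<String> report(List<Hit> hits, Map<String, Double> fileScores) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingDouble((Hit h) -> fileScores.getOrDefault(h.file, 0.0))
                .reversed());

        List<String> lines = new ArrayList<>();
        for (Hit h : sorted) {
            double score = fileScores.getOrDefault(h.file, 0.0);
            lines.add(h.file + " " + h.attr1 + " " + h.attr2 + " " + h.term
                    + " [" + score + "]");
        }
        return lines;
    }

    public static void main(String[] args) {
        // The 'foo' hits from FileA and FileB above, deliberately out of order.
        List<Hit> hits = Arrays.asList(
                new Hit("FileB", "a", "d", "foo"),
                new Hit("FileA", "a", "b", "foo"),
                new Hit("FileA", "e", "e", "foo"));

        // Placeholder per-file ranking values, not real Lucene scores.
        Map<String, Double> fileScores = new HashMap<>();
        fileScores.put("FileA", 0.8);
        fileScores.put("FileB", 0.5);

        for (String line : report(hits, fileScores)) {
            System.out.println(line);
        }
    }
}
```

The open question is how to get Lucene to produce both halves of this: the per-file score and the per-hit attributes.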


First I tried to create one Document per file with two
fields: one field for the filename and the other
for the tokenized file content:

doc.add(new Field("filename", "FileA"));
doc.add(new Field("content", new FileReader(FileA)));

With that, the ranking is fine (each file has its own
value), but how can I now find the
specific hits of 'foo' in the file, together with the
corresponding attributes?

Second, I tried to add a Document for each word in the
file:

doc.add(new Field("filename", "FileA"));
doc.add(new Field("attr1", "a"));
doc.add(new Field("attr2", "b"));
doc.add(new Field("content", "foo"));

Now a query returns each hit of 'foo' with the
corresponding attributes, but the ranking is
calculated for each hit and not for the whole file.


What is the Lucene way to solve my problem?

Thanks.

Best regards,
Tomas




        
                