When you are indexing the file and adding the Document, you will need to parse out your filename per your regular expression, and then create the appropriate field:

Document doc = new Document();
String cat = getCategoryFromFileName(inputFileName);
doc.add(new Field("category", cat, ...));
// do the rest of your adds

Just locate where in the demo the Document add is taking place (I forget the exact spot) and then add in the appropriate stuff from above. Obviously, you need to implement the method I stubbed called getCategoryFromFileName.
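
For what it's worth, here is a minimal sketch of how getCategoryFromFileName might look, based only on the naming rules described in the question. The class name CategoryExtractor, the KNOWN_SUBJECTS set and its two entries ("cal", "qv"), and the choice to return null for unrecognized names are all assumptions for illustration; substitute your real list of known strings and whatever fallback suits your index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CategoryExtractor {
    // Hypothetical list of known subject strings; replace with your own list.
    private static final Set<String> KNOWN_SUBJECTS =
            new HashSet<String>(Arrays.asList("cal", "qv"));

    // Exactly twelve digits followed by the .html extension.
    private static final Pattern TROUBLESHOOTING =
            Pattern.compile("[0-9]{12}\\.html");

    // Two or three lowercase letters, a minus, then a run of lowercase
    // letters (the candidate subject), followed by anything else.
    private static final Pattern SUBJECT =
            Pattern.compile("[a-z]{2,3}-([a-z]+).*");

    // Expects the bare file name (e.g. File.getName()), not a full path.
    // Returns null when the name matches neither rule (an assumption).
    static String getCategoryFromFileName(String fileName) {
        if (TROUBLESHOOTING.matcher(fileName).matches()) {
            return "troubleshooting";
        }
        Matcher m = SUBJECT.matcher(fileName);
        if (m.matches() && KNOWN_SUBJECTS.contains(m.group(1))) {
            return m.group(1);
        }
        return null;
    }
}
```

With this sketch, a caller would want to check for null before adding the field, since a file that matches neither rule gets no category.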

HTH,
Grant

On Oct 29, 2007, at 1:06 PM, KR wrote:


I've been using the Lucene demo from
http://lucene.apache.org/java/2_1_0/demo.html

I have a set of documents
with filenames that give a good indication of content.

A filename of 12 digits (I think this is [0-9]{12} as a regular
expression) with the extension html is a troubleshooting guide, the
number being an error code. A filename with two or three letters, then a
minus (which would be [a-z]{2,3}- I think), then a known string means the
document is about a particular subject; I have a list of the known strings
matched to subjects.

What I would like to do is have my indexer create a field named
"category", populated with either the string "troubleshooting" or with the
known string extracted from the filename.

Examples:
For a file named 0000000000111.html the indexer adds the field "category" with the value "troubleshooting".
For a file named xxx-cal-123.html the indexer adds the field "category" with the value "cal".
For a file named xx-qv-(9).html the indexer adds the field "category" with the value "qv".

Is there a way to do that?

Beef.
--
View this message in context: 
http://www.nabble.com/Create-and-populate-a-field-when-indexing-tf4713018.html#a13471852
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!  http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ


