Karl,
Thanks for your tips. I have considered DOM processing, but it seemed to take a
hell of a long time to process all the documents (12,125).
Malcolm Clark
Grant,
Thanks for your tips. I have considered DOM processing, but it seemed to take a
hell of a long time to process all the documents (12,125).
Grant,
Thanks for your help with the problem I was experiencing. I split it all down
and realised the problem was the location of the IndexWriting (it was not in the
correct place within the SAX processing) and also some poor error
handling on my part.
Kind thanks,
Malcolm
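
The fix described above (doing the index write at the right point in the SAX callbacks) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the thread: the element names article, fno and ti are hypothetical, and a plain list of maps stands in for the Lucene index so the sketch runs with the JDK alone. The point it shows is that a record is started when an article element opens and written when that same element closes, so each article becomes its own document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects one record per <article> element.
class ArticleSplitter extends DefaultHandler {
    // Stand-in for the index. With Lucene, the add in endElement would be
    // the place to hand the finished document to the index writer.
    final List<Map<String, String>> docs = new ArrayList<>();
    private Map<String, String> current;              // null outside <article>
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("article")) {
            current = new LinkedHashMap<>();          // start a fresh record
        }
        text.setLength(0);                            // reset the text buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);               // may be called in chunks
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null) {
            return;                                   // not inside an article
        }
        if (qName.equals("article")) {
            docs.add(current);                        // write once per article
            current = null;
        } else {
            current.put(qName, text.toString().trim());
        }
    }

    // Parses an XML string and returns one map of fields per article.
    static List<Map<String, String>> split(String xml) {
        ArticleSplitter handler = new ArticleSplitter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return handler.docs;
    }
}
```

Calling ArticleSplitter.split on a string holding two article elements inside one root element returns two records. If the write were done in endDocument() instead, everything would collapse into a single record.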
Hi there Malcolm!
I can't see any place in your source where you add the document id of
the document you are parsing. startDocument() should at least add a
sys-id field for the XML document being parsed:
public void startDocument() {
mDocument = new Document();
mDocument.add(new Field(
Sounds like you need to make your articles XML or stop trying to use an
XML parser to process the file, whichever is easier for you. I don't
think your issues are Lucene related. I think you need to get a better
handle on the XML processing. As I suggested on your Digester thread
before, I w
I'm not in any way an expert, in fact far from it, but when I try to reference
each article separately it complains of entities, as the XML articles are
not well-formed.
Thanks,
MC
Hi Grant,
A heavily shortened version of the volume file looks like this:
]>
IEEE Annals of the History of Computing
Spring 1995 (Vol. 17, No. 1)
Published by the IEEE Computer Society
About this Issue
&A1003;
Comments, Queries, and Debate
&A1004;
Articles
&A1006;
From what I can see, you are only passing volume.xml to your parser.
If I understand your code and questions correctly, the Volume file
simply points to the actual articles that you want to parse. Seems like
you need to parse the Volume file, get the name/location of the article
file and then
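
The first half of that suggestion (read the volume file and find out which article files it points to) might look something like the sketch below. It rests on an assumption the thread does not confirm: that the volume file's DOCTYPE declares each article as an external entity of the form <!ENTITY A1003 SYSTEM "a1003.xml">. The class name, method name and file names are hypothetical. The declarations are read with a regular expression so that the volume file itself never has to go through the XML parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the article files a volume file refers to.
class VolumeEntities {
    // Assumed declaration form: <!ENTITY name SYSTEM "path">
    private static final Pattern ENTITY =
        Pattern.compile("<!ENTITY\\s+(\\S+)\\s+SYSTEM\\s+\"([^\"]+)\"\\s*>");

    // Returns entity name -> declared file path, in declaration order.
    static Map<String, String> articleFiles(String volumeXml) {
        Map<String, String> files = new LinkedHashMap<>();
        Matcher m = ENTITY.matcher(volumeXml);
        while (m.find()) {
            files.put(m.group(1), m.group(2));
        }
        return files;
    }
}
```

Each path in the returned map could then be parsed and indexed on its own, which gives one index document per article rather than one per volume.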
It's XML like this. It has 120-ish volumes with references to 12,107 articles,
which look like this:
A1003
10.1041/A1003s-1995
IEEE Annals of the History of Computing
1058-6180/95/$4.00 © 1995
IEEE
Vol. 17, No. 1
Spring 1995
pp. 3-3
About this Issue pp. 3-3
J.A.N. Lee, Editor‐in‐Chief
The firs
I am not familiar with the INEX collection, could you post a sample?
Malcolm Clark wrote:
Hi again,
I am desperately asking for aid!!
I have used the sandbox demo to parse the INEX collection. The problem
is that it points to a volume file which references 50 other XML
articles. Lucene only tre
Hi again,
I am desperately asking for aid!!
I have used the sandbox demo to parse the INEX collection. The problem is that
it points to a volume file which references 50 other XML articles. Lucene
only treats this as one document. Is there any method that I'm
overlooking that halts after each r