from:"Malcolm Clark"

RE: Index XML file

2006-12-14 Thread MALCOLM CLARK

Hi, I've sent you a private email with some code attached ;-) Malcolm yeohwm <[EMAIL PROTECTED]> wrote: Hi, Thanks for the help. Please do let me know which jar files I need and where I can find them. Regards, Wooi Meng

Re: Index XML file

2006-12-14 Thread MALCOLM CLARK

Malcolm Clark

Output of index

2006-07-27 Thread MALCOLM CLARK

Hi, I'm going to attempt to output several thousand documents from a 3+ million document collection into a CSV file. What is the most efficient method of retrieving all the text from the fields of each document one by one? Please help! Thanks, Malcolm
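Only fields that were stored at index time can be read back from an index, so the export loop is essentially: for each i below IndexReader.maxDoc() that is not isDeleted(i), call document(i) and write its stored values out as one row. The Lucene part is left out of the sketch below because it needs the Lucene jar; this stdlib-only sketch covers the CSV half, quoting values RFC 4180 style so that commas, quotes and line breaks inside field text don't split a row. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns one document's stored field values into a CSV row.
class CsvRows {

    // Quote a value only when it contains a comma, double quote, CR or LF;
    // embedded double quotes are doubled, as RFC 4180 describes.
    static String escape(String value) {
        if (value == null) {
            return ""; // field not stored (or absent) in this document
        }
        boolean needsQuotes = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    // Join the values in a fixed field order so every row has the same columns.
    static String toRow(List<String> fieldValues) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fieldValues.size(); i++) {
            if (i > 0) {
                row.append(',');
            }
            row.append(escape(fieldValues.get(i)));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // Made-up values standing in for doc.get("id"), doc.get("title"), doc.get("body").
        List<String> values = Arrays.asList("42", "Lucene, in action", "He said \"hi\"");
        System.out.println(toRow(values));
        // prints: 42,"Lucene, in action","He said ""hi"""
    }
}
```

Writing rows through a single BufferedWriter, rather than opening the file per document, keeps the output side cheap; reading stored fields is usually the slower half.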

Re: Indexing large sets of documents?

2006-07-27 Thread MALCOLM CLARK

Is this the W3 Ent collection you are indexing? MC

Re: HTML text extraction

2006-06-29 Thread MALCOLM CLARK

Hi, Would you please send me your parser too? Thanks! Malcolm - Original Message From: Liao Xuefeng <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, June 23, 2006 12:54:29 AM Subject: RE: HTML text extraction hi, all, I wrote my own html parser because it just meets

Re: Lucene in Action

2006-06-06 Thread Malcolm Clark

Try http://www.abebooks.co.uk; they may have a cheaper copy. Malcolm - Original Message - From: "digby" <[EMAIL PROTECTED]> To: Sent: Tuesday, June 06, 2006 11:55 AM Subject: Re: Lucene in Action Thanks everyone, although now I'm not sure what to! Blackwells quicker but

Scoring

2006-05-23 Thread Malcolm Clark

Hi experts, I'm currently indexing the new INEX collection using Lucene and pondering this question. When searching, how do I retrieve the score for a section, paragraph, etc., rather than the document score, when the documents are indexed as multiple fields (XML)? Can anyone point me in the correc
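One common way to get a score per section or paragraph, rather than per article, is to index each element of interest as its own Lucene document and store its parent article's id and element path alongside the text, so every hit already identifies the passage. A minimal stdlib sketch of the splitting step, assuming sections are tagged `<sec>` (INEX-style tag names; treat them as an assumption) and building positional paths like `/article[1]/bdy[1]/sec[2]`. Each `Passage` would become one Lucene document; that part is left out, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical splitter: one retrievable unit per <sec> (or other) element.
class PassageSplitter {

    static final class Passage {
        final String path; // e.g. /article[1]/bdy[1]/sec[2]
        final String text; // all text inside the element
        Passage(String path, String text) {
            this.path = path;
            this.text = text;
        }
    }

    static List<Passage> split(String xml, String unitTag) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        NodeList units = dom.getElementsByTagName(unitTag); // document order
        List<Passage> passages = new ArrayList<>();
        for (int i = 0; i < units.getLength(); i++) {
            Element unit = (Element) units.item(i);
            passages.add(new Passage(pathOf(unit), unit.getTextContent().trim()));
        }
        return passages;
    }

    // Positional path: each step counts earlier siblings with the same name (1-based).
    static String pathOf(Element element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            int position = 1;
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(n.getNodeName())) {
                    position++;
                }
            }
            path.insert(0, "/" + n.getNodeName() + "[" + position + "]");
        }
        return path.toString();
    }

    public static void main(String[] args) {
        String xml = "<article><bdy><sec><p>Intro text</p></sec><sec><p>Method text</p></sec></bdy></article>";
        for (Passage p : split(xml, "sec")) {
            System.out.println(p.path + " -> " + p.text);
        }
        // prints:
        // /article[1]/bdy[1]/sec[1] -> Intro text
        // /article[1]/bdy[1]/sec[2] -> Method text
    }
}
```

Because each passage is then a separate Lucene document, the normal document score becomes the passage score; a per-article score, if still needed, has to be aggregated from its passages.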

Recommendations please

2006-05-13 Thread Malcolm Clark

Hi everyone, I am about to index the INEX collection (22 files with 3 files in each-ish) using Java Lucene. I am undecided about the approach to indexing and have left my LIA book at uni :-/ Would you recommend: 1. indexing all files into one big index? (would this be inefficient to sear

Re: Reuters

2006-04-21 Thread Malcolm Clark

Okay, converting to XML sounds like a great option. Thanks, Malcolm

Reuters

2006-04-21 Thread Malcolm Clark

Hi all, I didn't know whether to add this to the thread asking about TREC indexing or start a new one. Anyway, has anyone attempted to index/search the Reuters collection which consists of SGML? Mine seems to run through the process okay but alas I'm left with nothing in the index when I check w
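Assuming this is the Reuters-21578 distribution (several `.sgm` files, each holding many `<REUTERS ...> ... </REUTERS>` records with a `NEWID` attribute and `<TITLE>`/`<BODY>` elements), one possible cause of an empty index is that such files are not well-formed XML (many records and no single root element), so an XML parser gives up before producing any documents. Splitting each file into stories before indexing, or before converting to XML, can be sketched with regular expressions alone. This is a rough pre-processing sketch, not an SGML parser: entity references such as `&lt;` are left undecoded, and the tag and attribute names are assumptions about that distribution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processor for Reuters-21578-style .sgm files.
class ReutersSplitter {

    static final class Story {
        final String newId, title, body;
        Story(String newId, String title, String body) {
            this.newId = newId;
            this.title = title;
            this.body = body;
        }
    }

    // One <REUTERS ...> ... </REUTERS> record; DOTALL lets '.' span line breaks.
    private static final Pattern RECORD =
            Pattern.compile("<REUTERS\\b([^>]*)>(.*?)</REUTERS>", Pattern.DOTALL);
    private static final Pattern NEW_ID = Pattern.compile("NEWID=\"(\\d+)\"");

    // Text of the first <name>...</name> in the record, or "" if absent
    // (some records have a title but no body).
    static String firstTag(String record, String name) {
        Matcher m = Pattern.compile("<" + name + ">(.*?)</" + name + ">", Pattern.DOTALL)
                .matcher(record);
        return m.find() ? m.group(1).trim() : "";
    }

    static List<Story> split(String sgml) {
        List<Story> stories = new ArrayList<>();
        Matcher record = RECORD.matcher(sgml);
        while (record.find()) {
            Matcher id = NEW_ID.matcher(record.group(1));
            String newId = id.find() ? id.group(1) : "";
            String inner = record.group(2);
            stories.add(new Story(newId, firstTag(inner, "TITLE"), firstTag(inner, "BODY")));
        }
        return stories;
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed layout.
        String sgml = "<REUTERS TOPICS=\"YES\" NEWID=\"1\">\n<TEXT>\n<TITLE>COCOA UP</TITLE>\n"
                + "<BODY>Prices rose.</BODY>\n</TEXT>\n</REUTERS>\n"
                + "<REUTERS TOPICS=\"NO\" NEWID=\"2\">\n<TEXT><TITLE>BRIEF ITEM</TITLE></TEXT>\n</REUTERS>\n";
        for (Story s : split(sgml)) {
            System.out.println(s.newId + " | " + s.title + " | " + s.body);
        }
    }
}
```

Each `Story` then maps naturally onto one Lucene document, or onto one element of a generated XML file if converting first.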

Re: search pdf

2006-04-16 Thread Malcolm Clark

URL for all the source code: http://www.lucenebook.com/LuceneInAction.zip

Re: search pdf

2006-04-16 Thread Malcolm Clark

Hi, You have to parse and index the PDF files first; then you can search the index with Lucene. Have a look at Lucene in Action and the source code that comes with it. There is a good demo which parses common formats such as PDF, Word, XML etc. Cheers, MC

Lucene probabilistic

2006-04-14 Thread Malcolm Clark

Hi all, I came across an old mailing-list thread from 2003 exploring the possibility of a more probabilistic approach to using Lucene. Do the online experts know whether anyone has achieved this since? Thanks for any advice, Malc
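For reference, the best-known probabilistic ranking function is Okapi BM25. Lucene's default scoring at the time (DefaultSimilarity) was a tf-idf vector-space variant; a built-in `BM25Similarity` only arrived with Lucene 4.0. The per-term weight can be sketched in plain Java as below; k1 = 1.2 and b = 0.75 are the usual defaults, and the idf uses the `ln(1 + ...)` form, which stays non-negative. This illustrates the formula only and is not a drop-in Lucene Similarity; the class name is hypothetical.

```java
// Illustrative Okapi BM25 term weight; not Lucene's API.
class Bm25 {

    static final double K1 = 1.2;  // term-frequency saturation
    static final double B = 0.75;  // strength of document-length normalisation

    // idf in the ln(1 + ...) form, which never goes negative for very common terms.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Contribution of one query term to one document's score.
    static double termScore(double tf, long docCount, long docFreq,
                            double docLength, double avgDocLength) {
        double lengthNorm = K1 * (1 - B + B * docLength / avgDocLength);
        return idf(docCount, docFreq) * tf * (K1 + 1) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        // Made-up statistics: 1,000,000 documents, term in 1,000 of them, average length 300.
        for (int tf : new int[] {1, 2, 5, 20}) {
            System.out.printf("tf=%d score=%.3f%n", tf, termScore(tf, 1_000_000, 1_000, 300, 300));
        }
    }
}
```

A document's score for a query is the sum of `termScore` over the query terms it contains; the printed values show term frequency saturating towards `idf * (k1 + 1)` rather than growing without bound.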

TREC and INEX

2006-03-25 Thread Malcolm Clark

Hi all, Are any of you planning on using Lucene in any way for the NLP track in INEX this year, or the Enterprise track in TREC? Thanks, MC

TREC,INEX and Lucene

2006-02-22 Thread Malcolm Clark

Hi all, I am planning on participating in INEX and, hopefully passively, in a couple of TREC tracks, mainly using the Lucene API. Is anyone else on this list planning on using Lucene in their participation? I am particularly interested in the Spam, Blog and ad hoc tracks. Malcolm Clark

Re: IndexReader.open crashes JVM

2005-12-15 Thread Malcolm Clark

Hi, Maybe post some of the code that's giving you problems, so people can look at it and try to see what's wrong. Cheers, MC

Re: Commit changes

2005-11-28 Thread MALCOLM CLARK

Okay. Thanks to you both. Malcolm

Re: Commit changes

2005-11-28 Thread MALCOLM CLARK

Hi, thanks for your reply. So when I delete a document, writer.close() actually commits the deletion to the index, and that is not reversible? I have a facility which deletes but leaves the deletion 'undoable' until the change is committed by closing the reader. I cannot access the doCommit o

Re: Commit changes

2005-11-28 Thread MALCOLM CLARK

Hi Oren, In the grand scheme of things, and compared with the knowledge of some of the participants on here, I am fairly new to and inexperienced with Java and Lucene. I thought my way might be the most effective method of implementing the commit. I am using many methods of searching/reading the index for a

Commit changes

2005-11-25 Thread Malcolm Clark

using? My class is this: public abstract class commitDelete extends IndexReader { protected final void commitIndex() { try { super.commit(); } catch (IOException e) { e.printStackTrace(); } } } Incidentally, if I close the index, does that commit anyway? Please help, as I'm stumped. Thanks in advance, Malcolm Clark
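In the Lucene of that period, IndexReader.deleteDocument() only marked documents as deleted and the marks were written out when the reader was closed. undeleteAll() could clear them beforehand, but as far as I know it clears every deletion still recorded in the index, not just the most recent one. An alternative that avoids subclassing IndexReader is to keep pending deletions in the application and only hand them to the reader on an explicit commit. A minimal sketch of that buffer, with a hypothetical class name; the `IntConsumer` stands in for a lambda around `reader.deleteDocument(id)`, which declares IOException, so the real lambda would need to handle it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntConsumer;

// Hypothetical application-side buffer: deletes stay undoable until commit().
class PendingDeletes {

    private final Deque<Integer> pending = new ArrayDeque<>(); // most recent first
    private final IntConsumer applyDelete;                     // the real deletion call

    PendingDeletes(IntConsumer applyDelete) {
        this.applyDelete = applyDelete;
    }

    // Record a delete without touching the index; duplicates are ignored.
    void delete(int docId) {
        if (!pending.contains(docId)) {
            pending.push(docId);
        }
    }

    // Undo the most recent pending delete; false if nothing is pending.
    boolean undoLast() {
        return pending.pollFirst() != null;
    }

    void undoAll() {
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    // Apply the deletes in the order they were requested; returns how many were applied.
    int commit() {
        int applied = 0;
        while (!pending.isEmpty()) {
            applyDelete.accept(pending.pollLast());
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        // Stand-in for the real deletion call; here it only prints.
        PendingDeletes deletes = new PendingDeletes(id -> System.out.println("delete doc " + id));
        deletes.delete(12);
        deletes.delete(40);
        deletes.undoLast(); // the user changes their mind about doc 40
        deletes.commit();   // prints "delete doc 12"
    }
}
```

Document numbers can change when segments are merged, so a buffer like this is only safe while the same reader stays open; keying pending deletes on a unique id field avoids that limitation.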

Memory fault

2005-11-15 Thread MALCOLM CLARK

I'm currently trying to index another collection and am having a problem with writer.close(). Basically, at the end of indexing it only works if I remove the writer.close() call. It simply can't find the routine, despite being able to find writer.optimize(). Has anyone else come across this problem, and what

Re: Extract term and its frequency from the index and file?

2005-11-14 Thread MALCOLM CLARK

cheers

Re: Extract term and its frequency from the index and file?

2005-11-14 Thread MALCOLM CLARK

Hi, Could you send me the URL for HighFreqTerms.java in CVS? Thanks, Malcolm
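HighFreqTerms (in Lucene's contrib area) works by walking every term in the index and keeping only the top N, ranked by document frequency, in a bounded priority queue. The same top-N technique over a plain map of term counts (a hypothetical input standing in for the term enumeration) can be sketched with the standard library; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical top-N helper: same bounded-heap idea as HighFreqTerms, over a map.
class TopTerms {

    static List<String> topN(Map<String, Integer> freqs, int n) {
        // Min-heap: the root is the weakest of the current candidates.
        // Ties are broken so that the alphabetically later term counts as weaker.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<Map.Entry<String, Integer>>((a, b) -> {
                    int byFreq = Integer.compare(a.getValue(), b.getValue());
                    return byFreq != 0 ? byFreq : b.getKey().compareTo(a.getKey());
                });
        for (Map.Entry<String, Integer> entry : freqs.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the weakest, keeping memory at O(n)
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // weakest first
        }
        Collections.reverse(result); // strongest first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("lucene", 12);
        freqs.put("index", 30);
        freqs.put("inex", 7);
        System.out.println(topN(freqs, 2)); // prints [index, lucene]
    }
}
```

The heap never holds more than N + 1 entries, which is what makes this practical over a term dictionary with millions of entries.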

Re: Lucene and Sax

2005-10-31 Thread MALCOLM CLARK

Karl, Thanks for your tips. I have considered DOM processing, but it seemed to take a hell of a long time to process all the documents (12,125). Malcolm Clark

Re: Lucene and SAX

2005-10-31 Thread MALCOLM CLARK

Grant, Thanks for your tips. I have considered DOM processing, but it seemed to take a hell of a long time to process all the documents (12,125).

Re: Lucene and Sax

2005-10-31 Thread MALCOLM CLARK

Grant, Thanks for your help with the problem I was experiencing. I broke it all down and realised the problem was the location of the index-writing code (it was not in the correct place within the SAX processing), and also some poor error handling on my part. Kind thanks, Malcolm

Lucene and SAX

2005-10-25 Thread Malcolm Clark

Hi again, I am desperately asking for aid!! I have used the sandbox demo to parse the INEX collection. The problem is that it points to a volume file which references 50 other XML articles, and Lucene treats this as only one document. Is there any method I'm overlooking that halts after each r
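The usual way out of "the whole volume becomes one document" is to pick one element as the record boundary: start a fresh buffer when it opens, and when it closes build a Document and call writer.addDocument(), with the IndexWriter opened once before parsing and closed once afterwards. The stdlib SAX sketch below collects each record's text into a list instead of indexing it; the record tag (`article` here) is an assumption about the collection, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: one record per <recordTag> element instead of one per file.
class RecordSplitter extends DefaultHandler {

    private final String recordTag;
    private final List<String> records = new ArrayList<>();
    private StringBuilder current; // null while outside a record

    RecordSplitter(String recordTag) {
        this.recordTag = recordTag;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(recordTag)) {
            current = new StringBuilder(); // a new record begins
        } else if (current != null) {
            current.append(' ');           // keep words from adjacent elements apart
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(recordTag) && current != null) {
            // In a real indexer: build a Document here and call writer.addDocument(doc).
            records.add(current.toString().replaceAll("\\s+", " ").trim());
            current = null;
        } else if (current != null) {
            current.append(' ');
        }
    }

    static List<String> split(String xml, String recordTag) {
        RecordSplitter handler = new RecordSplitter(recordTag);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.records;
    }

    public static void main(String[] args) {
        String xml = "<volume><article><atl>First</atl><p>Body one</p></article>"
                + "<article><atl>Second</atl></article></volume>";
        System.out.println(split(xml, "article")); // prints [First Body one, Second]
    }
}
```

If the volume file pulls its articles in as external entities, parsing it with `parse(File, handler)` should let the parser resolve them, so each included article still produces its own record.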

Re: indexwriter and index searcher

2005-10-24 Thread MALCOLM CLARK

Hi all, I am relatively new to (and scared by) Lucene, so please don't flame me. I have abandoned Digester and am now just using plain SAX. I have used the sandbox code to parse an XML file with SAX, which then bungs it into a document in a Lucene index. The bit I'm stuck on is how is an element

Lucene and Digester

2005-10-20 Thread MALCOLM CLARK

Hi, I have tried as suggested and isolated Digester from Lucene. Digester doesn't trigger an element-matching pattern for each element, only for the last one of each repeating tag. My XML (trimmed a bit) looks like this: IEEE Annals of the History of Computing Spring 1995 (Vol. 17, No. 1) Pub

Re: Lucene/Digester

2005-10-19 Thread MALCOLM CLARK

Okay I'll do that. Thanks very much for the advice as it's much appreciated. Malcolm Clark

Re: Lucene/Digester

2005-10-19 Thread MALCOLM CLARK

Hi, I used Luke to check the contents of the index, and they are not there. Cheers, MC

Re: Lucene in Action : example code -> document-parsing framework ...

2005-10-18 Thread MALCOLM CLARK

Hi, Could somebody please help me with Lucene and Digester? I discovered this problem while indexing the INEX XML collection for my MSc project. When parsing the XML files (all named Volume.xml), the parser only indexes the last XML element in any repeated list. For ex

Lucene/Digester

2005-10-16 Thread Malcolm Clark

Hi all, I'm using Lucene/Digester etc. for my MSc, and I'm quite new to these APIs. I'm trying to get advice, but it's hard to say whether the problem lies with Lucene or Digester. Firstly: I am trying to index the INEX collection, but when I try to index repeated elements only the last one is indexed. F
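"Only the last repeated element is kept" is the usual symptom of mapping each element onto a single-valued property: a setter called once per `<au>` overwrites the previous value every time. Lucene itself accepts several fields with the same name on one Document, so the fix is to accumulate every occurrence and add each one as its own field. A stdlib SAX sketch of the accumulating side, handling leaf elements only; tag names such as `au` are placeholders and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: every occurrence of a repeated leaf element is kept.
class RepeatedFields extends DefaultHandler {

    private final Map<String, List<String>> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting this element's own text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            // Append, never overwrite: all <au> values survive, not just the last.
            fields.computeIfAbsent(qName, k -> new ArrayList<>()).add(value);
        }
        text.setLength(0); // container elements end with no text of their own
    }

    static Map<String, List<String>> parse(String xml) {
        RepeatedFields handler = new RepeatedFields();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.fields;
    }

    public static void main(String[] args) {
        String xml = "<article><fm><au>Smith</au><au>Jones</au><atl>A title</atl></fm></article>";
        System.out.println(parse(xml)); // prints {au=[Smith, Jones], atl=[A title]}
    }
}
```

Each list entry would then become its own `doc.add(...)` call under the same field name, so a search on that field matches any of the values.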

33 matches
