Re: parsing Java log file with Lucene 3.0.3

2011-01-01 Thread Hasan Diwan
On 1 January 2011 21:47, Benzion G wrote: > But I'm afraid it will make my index files much bigger. Since I'm indexing > log files the index will be anyway too big so I can't make it even bigger. Have you tried it out? How large are your log files and how large do you expect them to get? -- Sent

Re: parsing Java log file with Lucene 3.0.3

2010-12-31 Thread Hasan Diwan
On 31 December 2010 11:12, Benzion G wrote: > I need to parse the Java log files with Lucene 3.0.3. The StandardAnalyzer is > OK, except its handling of dots. > > E.g. it handles "java.lang.NullPointerException" as one word and searching for > "NullPointerException" will bring nothing. > > I need
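
For illustration only, here is a minimal plain-Java sketch of the behaviour being asked for: keep the full dotted name and also emit each dot-separated part, so that a search for "NullPointerException" can match. It does not use any Lucene API; the class DottedTokens and the method expand are hypothetical names, and in a real index this logic would have to live inside a custom analyzer or token filter. Emitting the extra parts also adds terms to the index, which bears on the size concern raised later in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: expands a dotted token into
// the original token plus each of its non-empty dot-separated parts.
public class DottedTokens {
    public static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        out.add(token); // keep the full name searchable as-is
        if (token.indexOf('.') >= 0) {
            for (String part : token.split("\\.")) {
                if (!part.isEmpty()) {
                    out.add(part); // e.g. "NullPointerException"
                }
            }
        }
        return out;
    }
}
```

Calling DottedTokens.expand("java.lang.NullPointerException") returns the full name followed by "java", "lang" and "NullPointerException"; a token without dots comes back unchanged as a single-element list.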

Re: Email Indexing

2010-10-27 Thread Hasan Diwan
On 27 October 2010 18:16, Troy Wical wrote: > Depends on what you're trying to index, I suppose. Maildir or mbox? For some > time now, off and on, I have been working to index an ezmlm mailing list > archive. In the end, I went with Swish-E and have made quite a bit of > progress. I am short of m

Email Indexing

2010-10-27 Thread Hasan Diwan
I'd like to provide myself with a searchable index of email. I'm familiar with the JavaMail library, so will use this to fetch the mail. Anyone out there done any indexing of email? On Sourceforge, there's zoe[1], which hasn't had a release since 2004, and a couple of other projects. I'm also seein

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread Hasan Diwan
2009/5/24 KK : > There is one more mail I found in the archive[3/4 days old] where someone > asked about extracting 3 neighboring words around the match. I think once you > have the position of matching term/phrase then extracting 3 or 30 neighbors > won't be different, right? because you just have to
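
The point quoted above, that a window of 3 words and a window of 30 differ only in a parameter once the match position is known, can be sketched in plain Java. This is an assumption-laden illustration: it works on whitespace-split raw text rather than on Lucene term positions, and ContextWindow, around and radius are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: returns up to `radius` words on
// each side of the first case-insensitive occurrence of `term` in `text`.
public class ContextWindow {
    public static String around(String text, String term, int radius) {
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(term)) {
                // Clamp the window to the bounds of the text.
                int from = Math.max(0, i - radius);
                int to = Math.min(words.length, i + radius + 1);
                return String.join(" ", Arrays.copyOfRange(words, from, to));
            }
        }
        return ""; // term not present
    }
}
```

With this sketch, ContextWindow.around("a b c d e f g", "d", 2) yields "b c d e f"; changing the last argument is all it takes to widen or narrow the window.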

Re: How to use regexQuery along with fuzzy logic capabilities

2008-10-22 Thread Hasan Diwan
I seem to recall running the SimpleQueryParser first. If that throws an Exception, I then ran it with the RegexQueryParser with a reduced score. HTH Sent via BlackBerry by AT&T -Original Message- From: "Agrawal, Aashish (IT)" <[EMAIL PROTECTED]> Date: Thu, 23 Oct 2008 12:48:46 To: Su

Re: Hiring etiquette

2008-10-19 Thread Hasan Diwan
2008/10/19 Mark Miller <[EMAIL PROTECTED]>: > You might instead limit your email to those that have agreed to be contacted > at http://wiki.apache.org/lucene-java/Support FWIW, the page indicated is immutable. -- Cheers, Hasan Diwan <[

Re: Using lucene as a database... good idea or bad idea?

2008-07-28 Thread Hasan Diwan
Check the nutch or solr projects, both of which are subprojects of lucene. Feel free to drop me a line if you should run into difficulties. Sent via BlackBerry by AT&T -Original Message- From: "John Evans" <[EMAIL PROTECTED]> Date: Mon, 28 Jul 2008 18:53:08 To: Subject: Using lucene as

Re: Does Lucene Java 2.3.2 supports parsing of Microsoft office 2007 documents...

2008-06-27 Thread Hasan Diwan
ed by POI. However, you could write a JNI wrapper around OpenOffice, which does have this support. -- Cheers, Hasan Diwan <[EMAIL PROTECTED]> - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: All results

2008-05-16 Thread Hasan Diwan
writer.print("" + document.get("all") + ""); } // iterated through every hit if (searcher != null) searcher.close();

Re: All results

2008-05-15 Thread Hasan Diwan
same. However, it's a different document. How do I get lucene to reflect this? -- Cheers, Hasan Diwan <[EMAIL PROTECTED]> 1. http://luke.getopt.org

Re: All results

2008-05-15 Thread Hasan Diwan
this minimum score is beyond a cursory glance at the source code -- I'd like lucene to return all matching documents, irrespective of hit score. Many thanks for the help. -- Cheers, Hasan Diwan <[EMAIL PROTECTED]> 1. Search.java, 2.3.1 2. http://lucene.apache.org/java/2_3_1/api/cor

All results

2008-05-15 Thread Hasan Diwan
It would appear that to see all results (including low scoring) I need to pass a different Filter to Searcher.search[1]. If filter is null, only the highest-scoring results are returned. How do I change the threshold for hits returned? -- Cheers, Hasan Diwan <[EMAIL PROTECTED]> 1.

Re: How to Uniquely Identify Documents in a Lucene Index

2008-04-29 Thread Hasan Diwan
ying to figure out if there's a builtin way to retrieve a unique document from the index. -- Cheers, Hasan Diwan <[EMAIL PROTECTED]>

How to Uniquely Identify Documents in a Lucene Index

2008-04-27 Thread Hasan Diwan
rieved, then it can be edited and put back in using the updateDocument method. I'm not quite sure how to decorate the data in the document within textarea tags on a click of a button for edit. Many thanks for any help you can provide. -- Cheers, Hasan

Re: Unable to add more than 1 document to Index

2008-04-24 Thread Hasan Diwan
The problem was that I was using the 3-parameter constructor for IndexWriter when I should have been using the 2-parameter one. It works fine now, many thanks for your kind assistance. -- Cheers, Hasan Diwan <[EMAIL PROTEC

Re: Unable to add more than 1 document to Index

2008-04-23 Thread Hasan Diwan
't provide the last argument, it does what it logically should (create new index unless one exists). -- Cheers, Hasan Diwan <[EMAIL PROTECTED]>

Unable to add more than 1 document to Index

2008-04-23 Thread Hasan Diwan
o a lucene index (and does) in a new document (and does not). This is apparent when I query it using luke and the default lucene web application. Any suggestions or pointers to what to do about this would be eternally helpful? Thanks in

How to create segments files?

2008-03-06 Thread Hasan Diwan
Ladies and Gentlemen: Below is an exception and the source code that generates it: ERROR opening the Index - contact sysadmin! Error message: no segments* file found in org.apache.lucene.store.FSDirectory@/home/hdiwan/public_html/Q4D: files: Stack Trace follows... org.apache.lucene.index.Segme

Re: Help needed

2007-11-23 Thread Hasan Diwan
static methods of the Field class have gone away. I'd use the following in your case: document.add(new Field("fieldname", text, Field.Store.YES, Field.Index.TOKENIZED)); to do what you wish. -- Cheers, Hasan Diwan <[EMAIL PROTECTED]> --

Re: Index Stat Functions

2006-08-26 Thread Hasan Diwan
lar to a unix "ls -la /path/to/file") You can get all this using the stat() system call. There's a sample of designing a JNI wrapper at http://java.sun.com/developer/onlineTraining/Programming/JDCBook/jniexamp.html. -- Cheers, Hasan Diwan

Re: Document Get question

2006-08-26 Thread Hasan Diwan
oc.get("path").split("/").length - 1] -- Cheers, Hasan Diwan <[EMAIL PROTECTED]>
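
The fragment above indexes the last element of a path split on "/". A self-contained sketch of that idea follows, assuming the stored "path" field holds a "/"-separated string; LastSegment and of are hypothetical names, and the guard for an empty split result is an addition that the fragment itself does not show.

```java
// Hypothetical helper: returns the last "/"-separated component of a path,
// mirroring the split("/")[length - 1] idiom from the message above.
public class LastSegment {
    public static String of(String path) {
        String[] parts = path.split("/");
        // "/".split("/") yields an empty array, so guard before indexing.
        return parts.length == 0 ? "" : parts[parts.length - 1];
    }
}
```

For example, LastSegment.of("/home/user/notes.txt") gives "notes.txt". Because String.split drops trailing empty strings, a path ending in "/" returns the directory name before the slash.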

Re: Tomcat Simple Example

2006-08-23 Thread Hasan Diwan
On 22/08/06, Mag Gam <[EMAIL PROTECTED]> wrote: Does anyone have a simple Tomcat search/result example? I have 4 text files, i would like to index. There's a demonstration war file included with lucene. -- Cheers, Hasan Diwan <[E

Re: EMAIL ADDRESS: Tokenize (i.e. an EmailAnalyzer)

2006-07-28 Thread Hasan Diwan
vamail/javadocs/javax/mail/internet/InternetAddress.html#parse(java.lang.String) and use as: InternetAddress valid = InternetAddress.parse(string)[0]; // far simpler than rewriting it -- Cheers, Hasan Diwan <[EMAIL PROTECTED]>

Re: extract data from mpg/avi etc

2005-04-21 Thread Hasan Diwan
On 21/04/05, Peter Veentjer - Anchor Men <[EMAIL PROTECTED]> wrote: > Does anyone know of a library that can extract metadata from movie > formats? http://computing.ee.ethz.ch/sepp/jmf-1.0-to.html That's advertised to be able to. -- Cheers, Hasan Diwan

Luceneweb.war

2005-03-14 Thread Hasan Diwan
I just checked out a copy of the svn sources and was wondering what the difference is between luceneweb.war and nutch. I'm certain there must be differences, else there wouldn't be two different projects. -- Cheers, Hasan Diwan <[EM