Re: Index maintaining/updating

2009-11-09 Thread Anshum
Hi Wenhao, It's generally better to build your index incrementally and at the same time. Considering that by this time you'd be a little familiar with implementing/using the Lucene API, here is what you could do. Open the existing index with 'createnew' set to false *IndexWriter(Directory d, Analyzer a, boolean

Re: Index maintaining/updating

2009-11-09 Thread hyj
Hello, Wenhao Xu! IndexWriter.addDocument(Document doc) can do your work. After the previous action, remember to commit the index. === 2009-11-10 14:40:03 You wrote in your message: === >Hi, everybody, > I am new to Lucene and have a question about how to update my index. The >following is my situation: >

Index maintaining/updating

2009-11-09 Thread Wenhao Xu
Hi, everybody, I am new to Lucene and have a question about how to update my index. The following is my situation: 1) I create indexes for each text (or varchar) field of a relational database; 2) New records will be continuously inserted into this database; and I need to add indexes of

Re: building lucene-core from source

2009-11-09 Thread Mark Miller
Right - I followed the release wiki and took it out for 2.9.0 - but then before 2.9.1 some discussion arose about not taking it out. Peter Keegan wrote: > I get your points (btw, I built with 1.6), and I like the easy override. > But, my build of 2.9.0 didn't produce a dev jar, which is inconsiste

Re: building lucene-core from source

2009-11-09 Thread Peter Keegan
I get your points (btw, I built with 1.6), and I like the easy override. But, my build of 2.9.0 didn't produce a dev jar, which is inconsistent with 2.9.1. I guess that's the flux you referred to. Peter On Mon, Nov 9, 2009 at 8:13 PM, Mark Miller wrote: > Yeah - its a debatable point. You can

Re: Questions about SEN patch submissions

2009-11-09 Thread Robert Muir
well i suppose we should do this as a last resort. the sen code is pretty nice, its a lot less complex than smartcn for example. also, if you can't modify the internals (just linking to a lib) you are limited in some regard, like smartcn it looks like this one represents the hmm with an object gr

Re: building lucene-core from source

2009-11-09 Thread Mark Miller
Yeah - it's a debatable point. You can have issues when building though - did you build with java 1.5? Then it's not like the official build. This keeps you from confusing yourself about what artifacts are what. You can override it, but this way you know what you have done. Just because you have the

Re: Questions about SEN patch submissions

2009-11-09 Thread Mark Miller
Marvin Humphrey wrote: > On Mon, Nov 09, 2009 at 04:07:55PM -0500, Robert Muir wrote: > >> Mark, I think my concern is that Sen itself is LGPL ( >> https://sen.dev.java.net/). >> >> this lucene-ja is just a lucene interface to this LGPL library. >> >> I think this dependency might be a problem,

Re: building lucene-core from source

2009-11-09 Thread Peter Keegan
The -dev version is confusing when it's the target of a build from an official release. A build with patches from an official release might warrant a '-dev' version, I suppose. (just my 2 cents.) Peter On Mon, Nov 9, 2009 at 7:57 PM, Mark Miller wrote: > The build/release formula is always in f

Re: Questions about SEN patch submissions

2009-11-09 Thread Marvin Humphrey
On Mon, Nov 09, 2009 at 07:30:40PM -0500, Robert Muir wrote: > Marvin, in this case its the same folks: > https://sen.dev.java.net/servlets/ProjectDocumentList?folderID=755&expandFolder=755&folderID=0 > ... dunno if that matters Not much -- my example still stands. We still can't distribute code

Re: building lucene-core from source

2009-11-09 Thread Mark Miller
The build/release formula is always in flux - we likely hard coded the change in 2.9.0 when releasing - we likely won't again in the future. Some discussion about it came up recently on the list. -- - Mark http://www.lucidimagination.com Peter Keegan wrote: > OK. I just downloaded the 2.9.0 s

Re: building lucene-core from source

2009-11-09 Thread Peter Keegan
OK. I just downloaded the 2.9.0 sources from http://mirror.candidhosting.com/pub/apache/lucene/java/lucene-2.9.0-src.zip to a clean directory. 'ant jar-core' produced: 'build/lucene-core-2.9.jar' (no -dev version suffix and I changed nothing). Are you saying that it should have produced 'build/lucen

Re: Directory.list() deprecation

2009-11-09 Thread Daniel Noll
On Tue, Nov 10, 2009 at 00:44, Michael McCandless wrote: > Stepping back, since presumably your app knows what it's storing in > the directory, can't you filter for files you know you've created? > What's the larger use case here? The exact use case where we were using list() is to determine whet

Re: Questions about SEN patch submissions

2009-11-09 Thread Robert Muir
Marvin, in this case its the same folks: https://sen.dev.java.net/servlets/ProjectDocumentList?folderID=755&expandFolder=755&folderID=0 ... dunno if that matters On Mon, Nov 9, 2009 at 7:02 PM, Marvin Humphrey wrote: > On Mon, Nov 09, 2009 at 04:07:55PM -0500, Robert Muir wrote: > > Mark, I think

RE: building lucene-core from source

2009-11-09 Thread Uwe Schindler
If you build from sources, it automatically assumes a dev version (you could have changed it). If you want to override the automatically set version (as we do it during build), use "ant -Dversion=2.9.1" Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u..

Re: Questions about SEN patch submissions

2009-11-09 Thread Marvin Humphrey
On Mon, Nov 09, 2009 at 04:07:55PM -0500, Robert Muir wrote: > Mark, I think my concern is that Sen itself is LGPL ( > https://sen.dev.java.net/). > > this lucene-ja is just a lucene interface to this LGPL library. > > I think this dependency might be a problem, but I am not the expert: > http://

building lucene-core from source

2009-11-09 Thread Peter Keegan
I know this has been asked before, but I couldn't find the thread. The jar file produced from a build of 2.9.0 is 'lucene-core-2.9.jar'. For 2.9.1, it is 'lucene-core-2.9.1-dev.jar'. When does the '-dev' get removed? Peter

Re: Questions about SEN patch submissions

2009-11-09 Thread Robert Muir
I think this entire thread is welcome/best-served on the dev list, because you are talking about submitting a patch to change the internals of lucene. On Mon, Nov 9, 2009 at 6:16 PM, Mark Bennett wrote: > Thanks Robert, > > At what point would this whole subject be better served on the dev list?

Re: Questions about SEN patch submissions

2009-11-09 Thread Mark Bennett
Thanks Robert, At what point would this whole subject be better served on the dev list? I've been a bit confused about that in the past (on the similar named Solr lists) Mark -- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408

Re: Questions about SEN patch submissions

2009-11-09 Thread Robert Muir
Mark, If he agrees, maybe you can bring this up on the java-dev list? I think other lucene developers could assist to make sure we do the proper procedures with minimal hassle. On Mon, Nov 9, 2009 at 6:05 PM, Mark Bennett wrote: > I have emailed one of the authors. I have also asked about the

Re: Questions about SEN patch submissions

2009-11-09 Thread Mark Bennett
I have emailed one of the authors. I have also asked about the other authors and the other packages you mentioned. What is the procedure for him, assuming he agrees? Does he have to sign physical paper, or can this be done electronically? Also, I suspect he doesn't reside in the US, I don't kno

Re: Questions about SEN patch submissions

2009-11-09 Thread Robert Muir
if he is ok with it i think we need to set up a software grant, etc. in my opinion though, this would be a great feature to have in lucene. (we have similar support for chinese now, but no japanese) On Mon, Nov 9, 2009 at 5:51 PM, Mark Bennett wrote: > I'll ask the author. > > -- > Mark Ben

Re: Questions about SEN patch submissions

2009-11-09 Thread Mark Bennett
I'll ask the author. -- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513 On Mon, Nov 9, 2009 at 2:49 PM, Robert Muir wrote: > Hi Mark, > > I think apache 2.0 would be easiest. But I think BSD also works. > Its a lit

Re: Questions about SEN patch submissions

2009-11-09 Thread Robert Muir
Hi Mark, I think apache 2.0 would be easiest. But I think BSD also works. It's a little strange that Sen is LGPL when the underlying dictionaries, chasen, mecab (what it was ported from), are all BSD/BSD-like, or they are multi-licensed with BSD being one of them. I also agree and hear you about how the gl

Re: Questions about SEN patch submissions

2009-11-09 Thread Mark Bennett
Hi Robert, Thank you for helping sort through this, and for the Wiki link. A few thoughts here: 1: I think the author will change the license if I ask, he's been very supportive (though seems to be working on other things these days). If you ran the Universe, which specific license would yo

Re: Questions about SEN patch submissions

2009-11-09 Thread Robert Muir
Mark, I think my concern is that Sen itself is LGPL ( https://sen.dev.java.net/). this lucene-ja is just a lucene interface to this LGPL library. I think this dependency might be a problem, but I am not the expert: http://www.apache.org/legal/resolved.html#category-a On Mon, Nov 9, 2009 at 4:01

Re: Questions about SEN patch submissions

2009-11-09 Thread Mark Bennett
Hello Robert, On Mon, Nov 9, 2009 at 12:34 PM, Robert Muir wrote: > Mark, has there been any change to the LGPL dependency? > > On Mon, Nov 9, 2009 at 2:55 PM, Mark Bennett wrote: > > The only code I'm modifying at the moment is the lucene-ja section, which is the integration between core SEN a

Re: Questions about SEN patch submissions

2009-11-09 Thread Robert Muir
Mark, has there been any change to the LGPL dependency? On Mon, Nov 9, 2009 at 2:55 PM, Mark Bennett wrote: > As some of you may recall I've been working on getting the SEN Japanese > morphological analyzer working with 2.9. (and also with Solr 1.4, but > that's not for this list) > > I'm getti

Questions about SEN patch submissions

2009-11-09 Thread Mark Bennett
As some of you may recall I've been working on getting the SEN Japanese morphological analyzer working with 2.9. (and also with Solr 1.4, but that's not for this list) I'm getting close to having a patch for JIRA. However, a couple items: 1: The code is not currently hosted on Apache (it's over

Re: How to use Lucene to support quick search on huge databases where the primary content is of non textual format ?

2009-11-09 Thread Chris Lu
If all you do is exact match, you can create non-unique indexes on columns, or functional indexes. If the database index is optimal, there should not be much performance difference between the database approach and the Lucene approach. Lucene's inverted index is just one kind of data structure for qui

Re: Change norm encoding

2009-11-09 Thread Michael McCandless
On Mon, Nov 9, 2009 at 12:19 PM, Benjamin Heilbrunn wrote: > After making my post i found this (without taking a deeper look): > > http://issues.apache.org/jira/browse/LUCENE-1260 > > Looks like a solution for that problem. Indeed the most recent patch there looks almost exactly like what you're

Re: Change norm encoding

2009-11-09 Thread Benjamin Heilbrunn
Hi Mike, thanks for your reply. After making my post i found this (without taking a deeper look): http://issues.apache.org/jira/browse/LUCENE-1260 Looks like a solution for that problem. Why wasn't it applied to lucene? Benjamin -

Re: Change norm encoding

2009-11-09 Thread Michael McCandless
On Mon, Nov 9, 2009 at 11:04 AM, Benjamin Heilbrunn wrote: > i've got a problem concerning encoding of norms. > I want to use int values (0-255) instead of float interpreted bytes. > > In my own Similarity-Class, which I use for indexing and searching, I > implemented the static methods encodeNor

Change norm encoding

2009-11-09 Thread Benjamin Heilbrunn
Hi, i've got a problem concerning encoding of norms. I want to use int values (0-255) instead of float interpreted bytes. In my own Similarity-Class, which I use for indexing and searching, I implemented the static methods encodeNorms, decodeNorms and getNormDecoder. But because they are static a
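
The byte packing asked about above (an int from 0 to 255 held in a norm's single byte) can be sketched in plain Java. This is only an illustration of the arithmetic: the class and method names are hypothetical, nothing here is Lucene API, and it does not address how to plug such a codec into a Similarity.

```java
/**
 * Illustrative only: packs an int in the range 0-255 into the single byte
 * that a norm occupies, and unpacks it again. The class and method names
 * are hypothetical and are not part of Lucene's Similarity API.
 */
public final class IntNormCodec {
    private IntNormCodec() {}

    /** Stores 0..255 in one byte; values above 127 become negative bytes. */
    public static byte encode(int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("norm out of range 0-255: " + value);
        }
        return (byte) value;
    }

    /** Recovers the original 0..255 value by masking off the sign extension. */
    public static int decode(byte stored) {
        return stored & 0xFF;
    }

    public static void main(String[] args) {
        // Round-trip a few boundary values to show the mapping.
        for (int v : new int[] {0, 127, 128, 255}) {
            byte b = encode(v);
            System.out.println(v + " -> byte " + b + " -> " + decode(b));
        }
    }
}
```

The mask in decode matters because Java bytes are signed; without it, any stored value above 127 would read back as a negative number.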

RE: Lucene - Text Classification.

2009-11-09 Thread Lukas, Ray
There is one on Salmon Run that I am using.. it seems to work pretty well.. add the words "Salmon Run" to your Google search.. -Original Message- From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi Kant Sent: Monday, November 09, 2009 10:41 AM To: java-user@lucene

Solr Training in Europe

2009-11-09 Thread Uri Boness
Hi All, For those who are interested, the official Lucid Solr trainings are now available in Europe. The first training - "Introduction to Solr" is a 3-day training covering the basics and some of the more advanced features of Solr. It is scheduled for 30th November (till 2nd December) and wil

Re: Lucene - Text Classification.

2009-11-09 Thread Shashi Kant
Take a look at Bayesian text classification, which might be more efficient for your needs. Google it. There are several other text classification methods - depending on your needs, you can dig into them. On Mon, Nov 9, 2009 at 10:33 AM, lucenenew wrote: > > i want to classify sentences stored as s

Re: Directory.list() deprecation

2009-11-09 Thread Michael McCandless
On Sun, Nov 8, 2009 at 4:58 PM, Daniel Noll wrote: >> Well... you can use oal.index.IndexFileNameFilter.getFilter() to >> filter for only the Lucene index files, or, you could filter for the >> additional files you know you've placed in the index directory? > > This is the workaround we're curren
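
The second suggestion quoted above, filtering for the additional files you know you've placed in the index directory, might look roughly like the following in plain Java. This is a sketch under an assumed naming convention: the "app-" prefix and the class name are hypothetical, and it does not use IndexFileNameFilter or any other Lucene class.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: picks out application-owned files from a directory
 * listing by a naming convention. The prefix passed in (e.g. "app-") is a
 * hypothetical convention, not something Lucene defines.
 */
public final class OwnFilesFilter implements FilenameFilter {
    private final String prefix;

    public OwnFilesFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean accept(File dir, String name) {
        // Only the name is inspected; the directory argument is unused.
        return name.startsWith(prefix);
    }

    /** Applies the filter to an already-obtained array of file names. */
    public List<String> select(String[] names) {
        List<String> own = new ArrayList<String>();
        for (String name : names) {
            if (accept(null, name)) {
                own.add(name);
            }
        }
        return own;
    }

    public static void main(String[] args) {
        // Sample listing mixing index-style names with one app-owned file.
        String[] listing = {"segments_2", "_0.cfs", "app-settings.properties"};
        System.out.println(new OwnFilesFilter("app-").select(listing));
    }
}
```

Because the filter works on names alone, it can be applied to whatever array of names a directory listing returns, without depending on how that listing was produced.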

Re: How to use Lucene to support quick search on huge databases where the primary content is of non textual format ?

2009-11-09 Thread mark harwood
So many questions.. >>Which one will be better As in. * Faster to implement? * Faster to search? * Faster to update? * Cheaper in licenses? * More robust? * Easier to maintain? * Easier to backup? Are results sorted by : * quality (e.g. when using fuzzy text matching)? * distance? * pric

Re: IndexWriter.close() no longer seems to close everything

2009-11-09 Thread Michael McCandless
Does this look like a real leak, John? You're definitely closing every reader you get back from getReader? Mike On Sun, Nov 8, 2009 at 10:41 PM, John Wang wrote: > I am seeing the same thing, but only when IndexWriter.getReader is called at > a high rate. > > from lsof, I see file handles growing