Re: problem about backup index file

2010-02-25 Thread luocan19826164
Thanks for your paper, Michael McCandless. I have one question about this: For all other files, Lucene is "write once." This makes doing incremental backups very easy: Simply compare the file names. Once a file is written, it will never change; therefore, if you've already backed up that file, there's no
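The file-name comparison described in the quoted passage can be sketched in a few lines. This is only an illustrative sketch, not code from the paper: the class `IncrementalBackupPlan`, the method `filesToCopy`, and the sample file names are hypothetical. It assumes the set of index file names comes from a consistent view of the index rather than from files that are still being written, and it does not handle whatever exception the phrase "all other files" implies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Lucene): decides which files still need copying.
final class IncrementalBackupPlan {

    // Because the files are write-once, a file whose name already exists in the
    // backup has not changed and can be skipped. Only new names are returned.
    static Set<String> filesToCopy(Set<String> indexFiles, Set<String> backedUp) {
        Set<String> missing = new TreeSet<>(indexFiles);
        missing.removeAll(backedUp);
        return missing;
    }

    public static void main(String[] args) {
        // Sample names are made up for illustration.
        Set<String> indexFiles = new HashSet<>(Arrays.asList("_0.cfs", "_1.cfs", "_2.cfs"));
        Set<String> backedUp = new HashSet<>(Arrays.asList("_0.cfs", "_1.cfs"));
        System.out.println("Files to copy: " + filesToCopy(indexFiles, backedUp));
    }
}
```

The comparison is by name only, which is safe solely because of the write-once property; for files that can be rewritten in place, a name match would say nothing about whether the content is current.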

Re: NAS vs SAN vs Server Disk RAID

2010-02-25 Thread Chris Lu
In my experience, some customers have used SAN to store the index. It's pretty good and fast. This may be a good choice for you, but it's costly. -- -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.db

Re: NAS vs SAN vs Server Disk RAID

2010-02-25 Thread Andrew Bruno
Katta looks interesting. I have also been looking at SOLR, but both of these require reworking the application, and possibly re-indexing the world again. Do you know if Katta supports Compass/Lucene v2.0 migration? Also, when I say 1T, what I really mean is that we have about 1200 different inde

Re: Fuzzy membership of a term to the document

2010-02-25 Thread Robert Muir
Hello Reza, I've seen some similar stuff to what you mention, such as http://ece.ut.ac.ir/dbrg/Hamshahri/Papers/FuFaIR.ppt In that experiment, the membership was calculated with tf/idf parameters (it looks like that gave best results). I am scratching my head as to how this model could be easily

Re: If you could have one feature in Lucene...

2010-02-25 Thread Thomas Guttesen
RefCount on the IndexWriter, manually controlled but also controlled by background merges. 2010/2/24 Grant Ingersoll > What would it be? > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional co

Re: If you could have one feature in Lucene...

2010-02-25 Thread Mark Miller
It's also not really the case that committers are mainly here to do work and push the project forward. It's an open source project - it's up to the community to push the project as they see fit and have the time. Committers are simply past contributors that have proven trustworthy and capable i

Re: If you could have one feature in Lucene...

2010-02-25 Thread N. Hira
I think it speaks to the maturity of the project ... Lucene has solved some of the easier problems in the problem space and the ones that remain are ... difficult. I recently introduced Lucene/Nutch to a group of ~10 relatively capable Java developers. While they find it easy to use, they

Re: If you could have one feature in Lucene...

2010-02-25 Thread Jason Rutherglen
> Who the heck is in charge here? Maybe it's Colonel Walter E. Kurtz? Intuitively perhaps people expect the committers to drive the project? When they don't see this are they less likely to contribute? On Thu, Feb 25, 2010 at 10:33 AM, Mark Miller wrote: > Hahaha - you have a sly humor. > > I

Re: Why is frequency a float number

2010-02-25 Thread Marek Rei
Not sure about the implementation in Lucene but term frequency is usually normalized. Wikipedia: http://en.wikipedia.org/wiki/Tf%E2%80%93idf#Mathematical_details Marek PlusPlus wrote: > Hi, > >I was wondering why TF method gets a float parameter. Isn't frequency > always considered to be int
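To make the normalization point concrete, here is a small sketch of a length-normalized term frequency in the sense of the linked Wikipedia article: the raw count of a term divided by the number of terms in the document. It is not Lucene's implementation; the class `NormalizedTf` and its `tf` method are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration (not Lucene code): length-normalized term frequency.
final class NormalizedTf {

    // Returns (occurrences of term) / (total terms in the document).
    // An empty document yields 0 to avoid dividing by zero.
    static float tf(String term, List<String> docTerms) {
        if (docTerms.isEmpty()) {
            return 0f;
        }
        int count = 0;
        for (String t : docTerms) {
            if (t.equals(term)) {
                count++;
            }
        }
        return (float) count / docTerms.size();
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("lucene", "is", "a", "search", "library", "lucene");
        // "lucene" occurs 2 times out of 6 terms, so the result is about one third.
        System.out.println(tf("lucene", doc));
    }
}
```

A normalized frequency like this is generally fractional, which is one reason a frequency value may be carried as a float rather than an int.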

Why is frequency a float number

2010-02-25 Thread PlusPlus
Hi, I was wondering why TF method gets a float parameter. Isn't frequency always considered to be integer? public abstract float tf(float freq) Best, Reza -- View this message in context: http://old.nabble.com/Why-is-frequency-a-float-number-tp27714523p27714523.html Sent from the Lucen

Re: IndexWriter.getReader.getVersion behavior

2010-02-25 Thread Peter Keegan
I'm pretty sure this output occurred when the version number skipped +1. The line containing ''. separates the close/open IndexWriter. IFD [Indexer]: setInfoStream deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@646f9dd9 IW 9 [Indexer]: setInfoStream: dir=org.ap

Re: IndexWriter.getReader.getVersion behavior

2010-02-25 Thread Michael McCandless
Do you know the place in the infoStream output where you got a reader with the wrong (unexplained extra +1) version? If so, can you post the infoStream output up to that point? Mike On Thu, Feb 25, 2010 at 10:22 AM, Peter Keegan wrote: > I've reproduced this and I have a bunch of infoStream log

Re: If you could have one feature in Lucene...

2010-02-25 Thread Mark Miller
Hahaha - you have a sly humor. I totally agree though. Features are long overdue, and the committers are lazy. I call for a cancellation of all of their paychecks and a stern warning about slacking off in Lucene land. There are dozens of features that are just taking way too long - whatever

Re: If you could have one feature in Lucene...

2010-02-25 Thread Grant Ingersoll
Yeah, there's an open issue in Solr for this one. It's non-trivial and I would love to have it too. On Feb 24, 2010, at 3:23 PM, Marcelo Ochoa wrote: >> What would it be? > An extended query parser syntax > (http://lucene.apache.org/java/2_9_1/queryparsersyntax.html) including > geo-location s

Re: If you could have one feature in Lucene...

2010-02-25 Thread Grant Ingersoll
On Feb 24, 2010, at 4:22 PM, Paul Libbrecht wrote: > I would wish a highlighting feature that's fully integrated. That's what Solr does. Lucene is still, at the end of the day, a library of APIs for people to build things. Solr/Nutch are the Lucene TLP way of expressing these sentiments. ---

Re: If you could have one feature in Lucene...

2010-02-25 Thread Grant Ingersoll
On Feb 25, 2010, at 12:41 AM, Ganesh wrote: > > 1. Payload per document which could be updated without a need to update the > entire document. > Usecase: The state of our indexed content will change based on the User > action (Created/ Viewed/Deleted etc) and we are using Lucene as our databa

Re: IndexWriter.getReader.getVersion behavior

2010-02-25 Thread Peter Keegan
I've reproduced this and I have a bunch of infoStream log files. Since the messages have no timestamps, it's hard to tell where the relevant entries are. What should I be looking for? Peter On Mon, Feb 22, 2010 at 3:58 PM, Peter Keegan wrote: > I'm pretty sure there are flushes and segment merge

Re: problem about backup index file

2010-02-25 Thread Michael McCandless
This is likely happening because you're attempting to copy a file that IndexWriter is currently writing? You shouldn't do that (copy files that are still being written) -- that just wastes bytes (they aren't used by the index), and causes this failure on Windows. Instead, you should use SnapshotD

Re: how to display text with \n,\r in jsp from textarea?

2010-02-25 Thread Erick Erickson
Uhhhmmm, I admit I just scanned the first part of this e-mail, but is the Lucene users list an appropriate venue for this? Erick On Thu, Feb 25, 2010 at 7:01 AM, tejz wrote: > > I am wondering how all these sites (like this Expert-Exchange, hotmail, > etc etc) work which are able to show al

RE: Phrase Search and NOT_ANALYZED

2010-02-25 Thread Murdoch, Paul
I would still be interested in knowing why the combination of the StandardAnalyzer, a phrase built using double quotes with no stop words, and the QueryParser doesn't return hits while building the same query with the StandardAnalyzer and a PhraseQuery does. Thanks, Paul -Original Message-

how to display text with \n,\r in jsp from textarea?

2010-02-25 Thread tejz
I am wondering how all these sites (like this Expert-Exchange, hotmail, etc etc) work which are able to show all kinds of chars (KEEPING format) in your mail/postings. All this data is entered in the TextField (like this one, where I am typing this content), which goes to some database

Re: problem about backup index file

2010-02-25 Thread luocanrao
Thanks, Uwe Schindler. On Linux, it works fine! I -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: February 25, 2010 16:30 To: java-user@lucene.apache.org Subject: RE: problem about backup index file In Windows you have no chance to do that without closing all IndexWriters and IndexReaders

AnalyzerUtils.getLoggingAnalyzer changing the way the inner analyzer works?

2010-02-25 Thread jm
I have an issue with my custom analyzer...see the following code: public static Analyzer getAnalyzer() { // cache the analyzer if (analyzer == null) { analyzer = new CustomStopAnalyzer(); //does some basic customization, nothing too fancy //test

Re: NAS vs SAN vs Server Disk RAID

2010-02-25 Thread Ian Lea
We've run Lucene on NAS, although not with indexes anything like as large as 1 TB, and gave up because NFS and Lucene don't really work very well together. Google for "lucene nfs" for some details, and some workarounds. I'd second Kay Kay's suggestion to look at a distributed solution such as Katta

Re: If you could have one feature in Lucene...

2010-02-25 Thread Avi Rosenschein
> > Similarity can only be set per index, but I want to adjust scoring > behaviour at a field level. To facilitate this, could we make the field name > available to all score methods? > Currently it is only passed to some, such as lengthNorm(), but not others > such as tf() > > +1 -- Avi

RE: problem about backup index file

2010-02-25 Thread Uwe Schindler
In Windows you have no chance to do that without closing all IndexWriters and IndexReaders that modify indexes. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: luocan19826...@sohu.com [mailto:luocan19826

Re: If you could have one feature in Lucene...

2010-02-25 Thread Paul Taylor
Grant Ingersoll wrote: What would it be? Similarity can only be set per index, but I want

problem about backup index file

2010-02-25 Thread luocan19826164
I want to back up my index file, but I get the following error. java.io.IOException: another program lock the file! at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(Unknown Source) at com.common.Utils.copyDirectory(Utils.java:149) at com.common.Utils.copyDirectory(Uti

Re: Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

2010-02-25 Thread Bradford Stephens
Thanks for coming, everyone! We had around 25 people. A *huge* success, for Seattle. And a big thanks to 10gen for sending Richard. Can't wait to see you all next month. On Wed, Feb 24, 2010 at 2:15 PM, Bradford Stephens wrote: > The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) mee