Thanks for your paper, Michael McCandless. I have one question about this: For all
other files, Lucene is "write once." This makes doing
incremental backups very easy: Simply compare the file names. Once a
file is written, it will never change; therefore, if you've already
backed up that file, there's no
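The file-name comparison described in the quoted passage could be sketched roughly as follows. This is only an illustration, not Lucene code: the class BackupPlan, its methods, and the example file names are hypothetical. The files that are not write-once (the exception implied by "all other files") are passed in as a set, because the excerpt does not name them.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: plans an incremental backup
// purely from file names, relying on the write-once property.
class BackupPlan {

    // Files to copy: everything in the index that the backup lacks, plus
    // any file that is NOT write-once (assumed to be supplied by the caller).
    static Set<String> filesToCopy(Set<String> indexFiles,
                                   Set<String> backupFiles,
                                   Set<String> rewritableFiles) {
        Set<String> result = new TreeSet<>(indexFiles);
        result.removeAll(backupFiles);
        for (String name : rewritableFiles) {
            if (indexFiles.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    // Files to drop from the backup: names the index no longer contains.
    static Set<String> filesToDelete(Set<String> indexFiles,
                                     Set<String> backupFiles) {
        Set<String> result = new TreeSet<>(backupFiles);
        result.removeAll(indexFiles);
        return result;
    }

    public static void main(String[] args) {
        // Illustrative names only; real index file names will differ.
        Set<String> index = new TreeSet<>(Arrays.asList("a.dat", "c.dat", "current.meta"));
        Set<String> backup = new TreeSet<>(Arrays.asList("a.dat", "b.dat", "current.meta"));
        Set<String> rewritable = new TreeSet<>(Arrays.asList("current.meta"));

        System.out.println("copy:   " + filesToCopy(index, backup, rewritable));
        System.out.println("delete: " + filesToDelete(index, backup));
    }
}
```

The sketch only decides which names to copy or delete. It says nothing about whether a file is safe to read while an IndexWriter is still open on the index.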
In my experience, some customers have used a SAN to store the index. It's
pretty good and fast. This may be a good choice for you, but it's costly.
--
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.db
Katta looks interesting. I have also been looking at Solr, but both of
these require reworking the application, and possibly re-indexing the world
again.
Do you know if Katta supports Compass/Lucene v2.0 migration?
Also, when I say 1T, what I really mean is that we have about 1200 different
inde
Hello Reza,
I've seen some similar stuff to what you mention, such as
http://ece.ut.ac.ir/dbrg/Hamshahri/Papers/FuFaIR.ppt
In that experiment, the membership was calculated with tf/idf parameters (it
looks like that gave best results).
I am scratching my head as to how this model could be easily
RefCount on the IndexWriter, manually controlled but also controlled by
background merges.
2010/2/24 Grant Ingersoll
> What would it be?
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional co
It's also not really the case that committers are mainly here to do work
and push the project forward. It's an open source project - it's up to
the community to push the project as they see fit and have the time.
Committers are simply past contributors that have proven trustworthy and
capable i
I think it speaks to the maturity of the project ... Lucene has
solved some of the easier problems in the problem space and the ones
that remain are ... difficult.
I recently introduced Lucene/Nutch to a group of ~10 relatively
capable Java developers. While they find it easy to use, they
> Who the heck is in charge here?
Maybe it's Colonel Walter E. Kurtz?
Intuitively perhaps people expect the committers to drive the project?
When they don't see this are they less likely to contribute?
On Thu, Feb 25, 2010 at 10:33 AM, Mark Miller wrote:
> Hahaha - you have a sly humor.
>
> I
Not sure about the implementation in Lucene but term frequency is
usually normalized.
Wikipedia: http://en.wikipedia.org/wiki/Tf%E2%80%93idf#Mathematical_details
Marek
PlusPlus wrote:
> Hi,
>
>I was wondering why TF method gets a float parameter. Isn't frequency
> always considered to be int
Hi,
I was wondering why TF method gets a float parameter. Isn't frequency
always considered to be integer?
public abstract float tf(float freq)
Best,
Reza
--
View this message in context:
http://old.nabble.com/Why-is-frequency-a-float-number-tp27714523p27714523.html
Sent from the Lucen
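One likely reason the frequency is a float, as far as I know, is sloppy phrase matching: a phrase match that needs some slop counts as less than one full occurrence, and those partial counts are summed before tf() is applied. The sketch below is standalone and does not call Lucene. The class TfSketch is hypothetical, and the two formulas (square root for tf, 1/(distance+1) for a sloppy match) are my recollection of the DefaultSimilarity defaults, so please check them against your Lucene version.

```java
// Standalone illustration; TfSketch is a hypothetical class, not Lucene's API.
class TfSketch {

    // Assumed default: tf grows with the square root of the frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed default: a phrase match at edit distance d counts as 1/(d+1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Sums the partial counts of several phrase matches in one document.
    static float phraseFreq(int[] distances) {
        float freq = 0f;
        for (int d : distances) {
            freq += sloppyFreq(d);
        }
        return freq;
    }

    public static void main(String[] args) {
        // One exact match, one match at distance 1, one at distance 3.
        int[] distances = {0, 1, 3};
        float freq = phraseFreq(distances);
        System.out.println("freq = " + freq + ", tf = " + tf(freq));
    }
}
```

With three matches at distances 0, 1 and 3, the summed frequency is 1 + 0.5 + 0.25 = 1.75, which an int parameter could not represent.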
I'm pretty sure this output occurred when the version number skipped +1.
The line containing ''. separates the close/open IndexWriter.
IFD [Indexer]: setInfoStream
deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@646f9dd9
IW 9 [Indexer]: setInfoStream:
dir=org.ap
Do you know the place in the infoStream output where you got a reader
with the wrong (unexplained extra +1) version? If so, can you post
the infoStream output up to that point?
Mike
On Thu, Feb 25, 2010 at 10:22 AM, Peter Keegan wrote:
> I've reproduced this and I have a bunch of infoStream log
Hahaha - you have a sly humor.
I totally agree though. Features are long overdue, and the committers are
lazy.
I call for a cancellation of all of their paychecks and a stern warning
about slacking off in Lucene land.
There are dozens of features that are just taking way too long - whatever
Yeah, there's an open issue in Solr for this one. It's non-trivial and I would
love to have it too.
On Feb 24, 2010, at 3:23 PM, Marcelo Ochoa wrote:
>> What would it be?
> An extended query parser syntax
> (http://lucene.apache.org/java/2_9_1/queryparsersyntax.html) including
> geo-location s
On Feb 24, 2010, at 4:22 PM, Paul Libbrecht wrote:
> I would wish a highlighting feature that's fully integrated.
That's what Solr does. Lucene is still, at the end of the day, a library of
APIs for people to build things. Solr/Nutch are the Lucene TLP way of
expressing these sentiments.
---
On Feb 25, 2010, at 12:41 AM, Ganesh wrote:
>
> 1. Payload per document which could be updated without a need to update the
> entire document.
> Use case: The state of our indexed content will change based on the User
> action (Created/ Viewed/Deleted etc) and we are using Lucene as our databa
I've reproduced this and I have a bunch of infoStream log files. Since the
messages have no timestamps, it's hard to tell where the relevant entries
are. What should I be looking for?
Peter
On Mon, Feb 22, 2010 at 3:58 PM, Peter Keegan wrote:
> I'm pretty sure there are flushes and segment merge
This is likely happening because you're attempting to copy a file that
IndexWriter is currently writing?
You shouldn't do that (copy files that are still being written) --
that just wastes bytes (they aren't used by the index), and causes
this failure on Windows.
Instead, you should use SnapshotD
Uhhhmmm, I admit I just scanned the first part of this e-mail, but is
the Lucene users list an appropriate venue for this?
Erick
On Thu, Feb 25, 2010 at 7:01 AM, tejz wrote:
>
> I am wondering how all these sites (like this Expert-Exchange, Hotmail,
> etc.) work, which are able to show al
I would still be interested in knowing why the combination of the
StandardAnalyzer, a phrase built using double quotes with no stop words, and
the QueryParser doesn't return hits while building the same query with the
StandardAnalyzer and a PhraseQuery does?
Thanks,
Paul
-Original Message-
I am wondering how all these sites (like this Expert-Exchange, Hotmail,
etc.) work, which are able to show all
kinds of characters (KEEPING format) in your mail/postings. All this data is
entered in a TextField (like this one,
where I am typing this content), which goes to some database
Thanks, Uwe Schindler
In Linux, it works fine!
I
-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: February 25, 2010, 16:30
To: java-user@lucene.apache.org
Subject: RE: problem about backup index file
In Windows you have no chance to do that without closing all IndexWriters and
IndexReaders
I have an issue with my custom analyzer...see the following code:
public static Analyzer getAnalyzer() {
    // cache the analyzer
    if (analyzer == null) {
        analyzer = new CustomStopAnalyzer(); // does some basic customization, nothing too fancy
        //test
We've run Lucene on NAS, although not with indexes anything like as
large as 1 TB, and gave up because NFS and Lucene don't really work
very well together. Google for "lucene nfs" for some details, and some
workarounds.
I'd second Kay Kay's suggestion to look at a distributed solution such as Katta
>
> Similarity can only be set per index, but I want to adjust scoring
> behaviour at a field level. To facilitate this, could we make the field name
> available to all scoring methods?
> Currently it is only passed to some such as lengthNorm() but not others
> such as tf()
>
> +1
-- Avi
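What the request amounts to could be sketched roughly like this. It is purely hypothetical and is not Lucene's Similarity API: the class FieldAwareTf, its methods, and the field names "title" and "body" are invented for illustration. The idea is only that tf() receives the field name and can dispatch on it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the request; this is NOT Lucene's Similarity API.
class FieldAwareTf {
    private final Map<String, Function<Float, Float>> perField = new HashMap<>();
    private final Function<Float, Float> fallback;

    FieldAwareTf(Function<Float, Float> fallback) {
        this.fallback = fallback;
    }

    // Registers a field-specific tf function.
    void setForField(String field, Function<Float, Float> tf) {
        perField.put(field, tf);
    }

    // The field name is the extra argument the request asks for.
    float tf(String field, float freq) {
        return perField.getOrDefault(field, fallback).apply(freq);
    }

    public static void main(String[] args) {
        // Fallback: square-root tf. "title": only presence matters.
        FieldAwareTf scoring = new FieldAwareTf(f -> (float) Math.sqrt(f));
        scoring.setForField("title", f -> f > 0 ? 1.0f : 0.0f);

        System.out.println("title: " + scoring.tf("title", 9f));
        System.out.println("body:  " + scoring.tf("body", 9f));
    }
}
```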
In Windows you have no chance to do that without closing all IndexWriters and
IndexReaders that modify indexes.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: luocan19826...@sohu.com [mailto:luocan19826
Grant Ingersoll wrote:
What would it be?
Similarity can only be set per index, but I want
I want to back up my index files, but I get the following error.
java.io.IOException: another program lock the file!
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(Unknown Source)
    at com.common.Utils.copyDirectory(Utils.java:149)
    at com.common.Utils.copyDirectory(Uti
Thanks for coming, everyone! We had around 25 people. A *huge*
success, for Seattle. And a big thanks to 10gen for sending Richard.
Can't wait to see you all next month.
On Wed, Feb 24, 2010 at 2:15 PM, Bradford Stephens
wrote:
> The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) mee