Update Document based on Query instead of Term

2011-04-13 Thread Pulkit Singhal
Lucene's IndexWriter allows users to update documents by Term via this method signature: void updateDocument(Term term, Document doc) But what about updating them by Query? Like so: void updateDocument(Query query, Document doc) 1) How can this be done? As far as I know there is no such method si

Lucene index limit

2011-03-24 Thread Pulkit Singhal
Is there some sort of default limit imposed on Lucene indexes? I try to index 50k or 60k documents, but when I use Luke to inspect the index and check the total # of entries indexed, it shows only 32768 entries. It seems like some sort of limit ... what should I look at to adjus

Works on Windows, crashes on Linux

2011-02-07 Thread Pulkit Singhal
Hello Folks, I'm using Lucene 3.0, my code runs fine on Windows but when I test it on Linux, I run into the following stack trace: java.io.FileNotFoundException: /opt/apache/tomcat/webapps/myapp/luceneData/backend_IP/en_US/_1.fdt (No such file or directory) at java.io.RandomAccessFile.ope

Where to find non-English dictionaries, thesaurus, synonyms

2011-01-06 Thread Pulkit Singhal
Hello, What's a good source for dictionaries (for spelling corrections) and/or thesauri (for synonyms) that can be used with Lucene for non-English languages such as French, Chinese, Korean, etc.? For example, the wordnet contrib module is based on the data set provided by the Princeton-based wordne

Spell Checker for Non English languages

2011-01-06 Thread Pulkit Singhal
Hello, I was wondering if anyone on this mailing list has ever compiled a list of algorithms for various non-English languages that work well with the lucene-spellchecker contrib module? For example, with English using a spellchecker index built using ngrams and then searched using LevensteinDi

Dismax in Lucene

2010-11-20 Thread Pulkit Singhal
Hello, I heard Yonik talk about a better dismax query parser for Solr so I was wondering if Lucene already has this functionality contributed to its contrib modules? - Pulkit

Re: uncorrect results

2010-11-18 Thread Pulkit Singhal
e a match at all and therefore present it in the results? Just a theory (a bad one perhaps) ... but one which can be easily blown away by using ANALYZED in your indexer and then trying again. - Pulkit On Thu, Nov 18, 2010 at 12:55 PM, Pulkit Singhal wrote: > Wow, you live in a really gr

Re: uncorrect results

2010-11-18 Thread Pulkit Singhal
Wow, you live in a really great country and attend an awesome university where they have classes like "Text Analytics" I'm gonna send my kid there to study :) In all seriousness I think the problem may be with how you are collecting your results. I find this very amusing: > 80. 896889 phrase occu

How to combine QueryParser and Wildcard search

2010-11-18 Thread Pulkit Singhal
Hello, I was wondering if there is any API call in Lucene that allows something like the following: Step 1: Take the user input "hello world" you are beautiful Step 2: QueryParser does its thing defaultField:hello world defaultField:you defaultField:are defaultField:beautiful Step 3: And someho

Re: KeywordAnalyzer and Boosting

2010-11-18 Thread Pulkit Singhal
, 2010 at 4:10 AM, Ian Lea wrote: > Have you tried explicitly setting norms on/off the way you want with > Field.setOmitNorms(boolean)? > > > -- > Ian. > > On Thu, Nov 18, 2010 at 12:54 AM, Pulkit Singhal > wrote: >> Based on my experimentation and what it sa

Re: KeywordAnalyzer and Boosting

2010-11-17 Thread Pulkit Singhal
rdAnalyzer during indexing and get NORMS. So much for being elegant, if someone has some way to make it happen, please let me know. Thanks. On Wed, Nov 17, 2010 at 7:09 PM, Pulkit Singhal wrote: > Greetings! > > When using KeywordAnalyzer for indexing a field which has the > Field

KeywordAnalyzer and Boosting

2010-11-17 Thread Pulkit Singhal
Greetings! When using KeywordAnalyzer for indexing a field which has the Field.Index.ANALYZED option selected. Does the use of KeywordAnalyzer automatically mean that there is no point in trying to set the index-time boosts on that field in the document because it will be treated as a full token

Re: Delete Document from Index. How?

2010-11-12 Thread Pulkit Singhal
I looked at the 2.2 API and those methods should be there, so the NoSuchMethodException makes no sense. Are you absolutely sure that your integration between PHP & Java is set up properly and you really are using 2.2? Could there be multiple versions of lucene jars in your classpath, such that older ones

Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
> > > > -Original Message- > > From: Pulkit Singhal [mailto:pulkitsing...@gmail.com] > > Sent: Wednesday, November 10, 2010 2:55 PM > > To: java-user@lucene.apache.org > > Subject: Re: IndexWriters and write locks > > > > You know that really

Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
, Uwe Schindler wrote: > Are you using NFS as filesystem? NFS is incompatible to lucene :-) > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -----Original Message- > > From: Pu

Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
k file itself > is just a placeholder which is not cleaned up on Ctrl-C. The lock is not the > file itself, its *on* the file. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -
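
The point quoted above, that the lock is held *on* the file rather than being the file itself, can be illustrated with plain java.nio. This is only a sketch of the general idea, not Lucene's own locking code; the class name LockFileDemo and the method canLock are made up for illustration. A leftover lock file that no live process holds can still be locked again, so its mere presence on disk does not mean the index is locked.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of Lucene.
class LockFileDemo {

    // Returns true if an exclusive OS-level lock could be taken on the file.
    // The file is created if missing and is NOT deleted afterwards, which
    // mirrors a placeholder lock file being left behind after Ctrl-C.
    static boolean canLock(Path lockFile) throws IOException {
        try (FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock;
            try {
                lock = channel.tryLock();
            } catch (OverlappingFileLockException e) {
                // Another thread in this same JVM already holds the lock.
                return false;
            }
            if (lock == null) {
                // Another process holds the lock.
                return false;
            }
            lock.release();
            return true;
        }
    }
}
```

Calling LockFileDemo.canLock on a stale placeholder file returns true when no live process holds a lock on it, and the file is still on disk afterwards.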

Re: IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
[correctly] release the lock on process > exit. > > Mike > > On Wed, Nov 10, 2010 at 9:38 AM, Pulkit Singhal > wrote: > > Hello, > > > > 1) On Windows, I often shut down my application server (which has active > > IndexWriters open) using the ctrl+c keys.

IndexWriters and write locks

2010-11-10 Thread Pulkit Singhal
Hello, 1) On Windows, I often shut down my application server (which has active IndexWriters open) using the ctrl+c keys. 2) When I inspect my directories on the file system, I see that the write.lock file is still there. 3) I start the app server again, and do some operations that would require IndexWr

Re: Implementing indexing of Versioned Document Collections

2010-11-10 Thread Pulkit Singhal
1) You can attach byte array "Payloads" for every occurrence of a term during indexing. It will be stored at each term position, during indexing, and then can be retrieved during searching. You may want to consider taking this approach rather than writing bitvectors to a text file. If you feel that

Re: Lucene index update

2010-10-27 Thread Pulkit Singhal
or that document then it would list fields > from index1 and index2. > > On Wed, Oct 27, 2010 at 9:04 PM, Pulkit Singhal wrote: > > > Looks interesting, what is the merit in having a second index in order to > > keep the document id the same? Perhaps I have misund

Re: Lucene index update

2010-10-27 Thread Pulkit Singhal
Looks interesting. What is the merit in having a second index in order to keep the document id the same? Perhaps I have misunderstood. Just want to understand your motivation here. On Wed, Oct 20, 2010 at 2:57 PM, Nilesh Vijaywargiay wrote: > I've written a blog regarding a work around for updati

Re: Use of Lucene to store data from RSS feeds

2010-10-15 Thread Pulkit Singhal
When you ask: a) will each feed form a Lucene document, or b) will each database row form a Lucene document, I'm inclined to say that really depends on what type of aggregation tool or logic you are using. I don't know if "Tika" does it but if there is a tool out there that can be point

Re: Checksum and transactional safety for lucene indexes

2010-09-24 Thread Pulkit Singhal
dexExceptions, NPE or > array out-of-bounds exceptions. There is no checksumming of the index files. > > Lance > > Pulkit Singhal wrote: >> >> Hello Everyone, >> >> What happens if: >> a) lucene index gets written half-way to the disk and then somethin
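
Since the reply above says the index files carry no checksums, one option is to record checksums externally and compare them later. The following is a minimal sketch under that assumption, using only the JDK; the class name IndexChecksums and its methods are hypothetical and not part of Lucene. It would only detect that a file's bytes changed between two snapshots, not repair anything, and files that Lucene legitimately rewrites will of course change as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; not part of Lucene.
class IndexChecksums {

    // Computes the CRC32 of one file by streaming its bytes.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                crc.update(buffer, 0, read);
            }
        }
        return crc.getValue();
    }

    // Maps each regular file name in the directory to its CRC32.
    static Map<String, Long> snapshot(Path dir) throws IOException {
        Map<String, Long> sums = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    sums.put(file.getFileName().toString(), crc32(file));
                }
            }
        }
        return sums;
    }
}
```

A snapshot taken right after a successful commit can be compared with a later one via Map.equals; a mismatch on a file that should not have changed suggests corruption.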

Re: How to count entries in an index file?

2010-09-24 Thread Pulkit Singhal
Is using IndexReader.numDocs() on the Directory instance the only way to count the indexed entries? On Fri, Sep 24, 2010 at 9:40 AM, Pulkit Singhal wrote: > Hello Everyone, > > I want to load the indexed data from the file system using FSDirectory. > But I also want to be sure if s

How to count entries in an index file?

2010-09-24 Thread Pulkit Singhal
Hello Everyone, I want to load the indexed data from the file system using FSDirectory. But I also want to be sure if something was actually loaded or if a new empty directory was created and returned to me. How can I count the # of entries in the Directory object returned to me? Thanks! - Pulkit

Checksum and transactional safety for lucene indexes

2010-09-20 Thread Pulkit Singhal
Hello Everyone, What happens if: a) lucene index gets written half-way to the disk and then something goes wrong? b) the index gets corrupted on the file system? When we open that directory location again using FSDirectory implementations: a) Is there any provision for the code to clean out the p

How to close the wrapped directory implementation

2010-09-17 Thread Pulkit Singhal
With RAMDirectory we have the option of providing another Directory implementation such as FSDirectory that can be wrapped and loaded into memory: Directory directory = new RAMDirectory(FSDirectory.open(new File(fileDirectoryName))); But after building the index, if I close the IndexWriter then t