Using Lucene to Query File properties in Windows

2010-05-17 Thread vijay reddy
Hi, I am planning to use Apache Lucene in one of my projects. I want to index files based on their file properties (I won’t be indexing the data), and I want Lucene to query the index so that I can quickly find list

Lock obtain timed out

2010-05-17 Thread Saurabh Agarwal
Hi, I am using Lucene 3.0.0 to index (with the demo application IndexFiles) a 6 GB corpus which is on NFS; moreover, I am storing my index on NFS too. But when I run the program I get the following exception: caught a class org.apache.lucene.store.LockObtainFailedException with message: Lock obtain

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Shay Banon
Right, makes sense. On Tue, May 18, 2010 at 4:23 AM, Yonik Seeley wrote: > On Mon, May 17, 2010 at 9:14 PM, Shay Banon wrote: > > Oh, and one more thing. Deleted docs is a sub case, with NRT, most people > > will almost always add docs as well... . So it is still not really usable > > for field c

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
On Mon, May 17, 2010 at 9:14 PM, Shay Banon wrote: > Oh, and one more thing. Deleted docs is a sub case, with NRT, most people > will almost always add docs as well... . So it is still not really usable > for field cache, right? FieldCache should be fine for the general cases - the same entry wil

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
On Mon, May 17, 2010 at 9:12 PM, Shay Banon wrote: > Just saw that you opened a case for that. I think that it's important in your > test case to also test for object identity, not just equals. This is because > the IndexReader (or the FieldCacheKey) are used as keys in weak hash maps, > which uses

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Shay Banon
Oh, and one more thing. Deleted docs is a sub case; with NRT, most people will almost always add docs as well. So it is still not really usable for field cache, right? On Tue, May 18, 2010 at 4:12 AM, Shay Banon wrote: > Just saw that you opened a case for that. I think that it's important in

Deciding memory requirements for Lucene indexes proactively -- How to?

2010-05-17 Thread Maduranga Kannangara
Hi guys, is there a way (perhaps a formula) to accurately judge the memory requirement for a Lucene index (maybe based on the number of documents, index size, etc.)? The reason I am asking is that we had two indexes running on separate Tomcat instances and we decided to move both these webapps (Solr)

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Shay Banon
Just saw that you opened a case for that. I think that it's important in your test case to also test for object identity, not just equals. This is because the IndexReader (or the FieldCacheKey) is used as a key in weak hash maps, which use identity (==) equality for keys. If FieldCacheKey is suppo
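
The identity concern raised in this message can be illustrated without Lucene. The following is a minimal sketch, not Lucene code: FakeReader and ReaderKeyedCache are hypothetical names, and FakeReader is assumed not to override equals/hashCode, so a java.util.WeakHashMap lookup on it behaves as an identity lookup. A copy of the key with the same contents then misses the cache.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical cache keyed on a reader-like object (not a Lucene class).
class ReaderKeyedCache {
    // Stand-in for a reader. It inherits Object's equals/hashCode,
    // so two instances are equal only if they are the same object.
    static final class FakeReader {
        final String segment;
        FakeReader(String segment) { this.segment = segment; }
        // Simulates "cloning": same content, new instance.
        FakeReader copy() { return new FakeReader(segment); }
    }

    private final Map<FakeReader, String> cache = new WeakHashMap<>();
    int misses = 0; // counts how often the cached value had to be rebuilt

    String get(FakeReader reader) {
        String value = cache.get(reader);
        if (value == null) {
            misses++;
            value = "cached data for " + reader.segment;
            cache.put(reader, value);
        }
        return value;
    }

    public static void main(String[] args) {
        ReaderKeyedCache cache = new ReaderKeyedCache();
        FakeReader reader = new FakeReader("_0");
        cache.get(reader);
        cache.get(reader);        // same instance: served from the cache
        cache.get(reader.copy()); // new instance: rebuilt despite equal content
        System.out.println("misses = " + cache.misses);
    }
}
```

If a key type is handed out as a fresh instance on every refresh, a cache like this rebuilds its entries each time, which is the behavior the thread is trying to confirm or rule out for NRT readers.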

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
On Mon, May 17, 2010 at 9:00 PM, Shay Banon wrote: > Great, so I am not imagining things this late into the night ... ;), not so > great, since using NRT with field cache (like sorting) or caching filters, > or anything that caches based on IndexReader not really an option. This > makes NRT very p

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Shay Banon
Great, so I am not imagining things this late into the night ... ;). Not so great, though, since using NRT with field cache (like sorting), caching filters, or anything that caches based on IndexReader is not really an option. This makes NRT very problematic to use in a real application. -shay.banon On Tu

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
Yep, confirmed what you are seeing. I'll check into it and open an issue. -Yonik http://www.lucidimagination.com On Mon, May 17, 2010 at 5:54 PM, Shay Banon wrote: > Yea, I noticed that ;). Even so, I think that with NRT, even the lower level > readers are cloned, meaning that you always get a

Re: Reverse Searching

2010-05-17 Thread Siraj Haider
Hi Steven, Thanks for the information, it's very useful. I am definitely going to give it a try and will ask if I run into any problems. thanks -siraj On 5/17/2010 5:59 PM, Steven A Rowe wrote: Hi Siraj, The usual answer to questions like yours ("Will performance of Lucene component X against

RE: Reverse Searching

2010-05-17 Thread Steven A Rowe
Hi Siraj, The usual answer to questions like yours ("Will performance of Lucene component X against my N records be good enough?") is "It depends": on the nature of the queries, the nature of the documents, the hardware you run on, etc. That said, if you construct your query objects once and r

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Shay Banon
Yea, I noticed that ;). Even so, I think that with NRT, even the lower level readers are cloned, meaning that you always get a new instance... . Here is a sample program that tests this behavior, am I doing something wrong? By the way, if what I say is correct, it affects field cache as well p

Re: Reverse Searching

2010-05-17 Thread Siraj Haider
Hi Steven, Thanks for the quick reply. I checked the documentation of MemoryIndex and it seems that you have to create an in-memory index with one document and run the queries against that single document. But my dilemma is that I might have up to 100,000 queries to run against it.

Re: NRT and Caching based on IndexReader

2010-05-17 Thread Yonik Seeley
On Mon, May 17, 2010 at 5:00 PM, Shay Banon wrote: > I wanted to verify if my understanding is correct. Assuming that I use > NRT, and refresh, say, every 1 second, caching based on IndexReader, such as > what is used in the CachingWrapperFilter, is basically useless No, it's fine. Searching in

RE: Reverse Searching

2010-05-17 Thread Steven A Rowe
Hi Siraj, Lucene's MemoryIndex can be used to serve this purpose. From its documentation: [T]his class targets fulltext search of huge numbers of queries over comparatively small transient r

NRT and Caching based on IndexReader

2010-05-17 Thread Shay Banon
Hi, I wanted to verify if my understanding is correct. Assuming that I use NRT, and refresh, say, every 1 second, caching based on IndexReader, such as what is used in the CachingWrapperFilter, is basically useless, since even if there is an open sub reader, it gets cloned, meaning there is a ne

Reverse Searching

2010-05-17 Thread Siraj Haider
Hello there, In Oracle Text search there is a feature to reverse search using ctxrule. What it does is, you create an index (ctxrule) on a column holding your search criteria and then throw a document at it, and it tells you which criteria the document satisfies. Is there something in Lucene that
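
To make the idea of reverse searching concrete, here is a rough plain-Java sketch of the pattern described above: the criteria are stored up front, and each incoming document is tested against all of them. It does not use Lucene or Oracle Text. The class name, the rule names, and the simplification that a criterion is just a set of required lowercase terms are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: stored criteria are matched against one document.
class ReverseSearchSketch {
    // Returns the names of all rules whose required terms all occur in the document.
    static List<String> matchingRules(Map<String, Set<String>> rules, String document) {
        // Naive tokenization: lowercase, split on non-word characters.
        Set<String> tokens = new HashSet<>(Arrays.asList(document.toLowerCase().split("\\W+")));
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, Set<String>> rule : rules.entrySet()) {
            if (tokens.containsAll(rule.getValue())) {
                matched.add(rule.getKey());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> rules = new LinkedHashMap<>();
        rules.put("lucene-on-nfs", new HashSet<>(Arrays.asList("lucene", "nfs")));
        rules.put("oracle-text", new HashSet<>(Arrays.asList("oracle", "ctxrule")));
        System.out.println(matchingRules(rules, "Storing a Lucene index on NFS"));
    }
}
```

A real setup would replace the term sets with parsed query objects built once and reused for every document, which is where the cost of running many criteria per document is decided.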

Re: Problem of getTermFrequencies()

2010-05-17 Thread Grant Ingersoll
Note, depending on your downstream use, you may consider using a TermVectorMapper that allows you to construct your own data structures as needed. -Grant On May 17, 2010, at 3:16 PM, Ian Lea wrote: > terms and freqs are arrays. Try terms[i] and freqs[i]. > > > -- > Ian. > > > On Mon, May

Re: Storing The content

2010-05-17 Thread Saurabh Agarwal
ok thanks :) Saurabh Agarwal On Mon, May 17, 2010 at 8:57 PM, Uwe Schindler wrote: > No, it's not possible, as for storing+indexing the content must be read > twice, > which is not possible with a Reader. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > e

RE: Storing The content

2010-05-17 Thread Uwe Schindler
No, it's not possible, as for storing+indexing the content must be read twice, which is not possible with a Reader. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Saurabh Agarwal [mailto:srbh.g...@gmail.com
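
The point about a Reader being readable only once can be seen with the standard library alone. The sketch below is an illustration under assumptions: ReaderOnce and readAll are hypothetical names, and StringReader stands in for whatever Reader the application has. It drains the Reader into a String; a second pass over the same Reader yields nothing.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper showing that a Reader is consumed by the first full read.
class ReaderOnce {
    // Drains the Reader into a String. The Reader is exhausted afterwards.
    static String readAll(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("some file content");
        String first = readAll(reader);
        String second = readAll(reader); // nothing left to read
        System.out.println("first pass length:  " + first.length());
        System.out.println("second pass length: " + second.length());
    }
}
```

One workaround, if the content fits in memory, is to read it into a String once and pass that String to a Field constructor that accepts a String value with Field.Store.YES, so the same value can be both stored and indexed.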

Re: Storing The content

2010-05-17 Thread Anshum
Hi Saurabh, I don't think there's a way to do that? Why not use other constructs? -- Anshum Gupta http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Mon, May 17, 2010 at 8:04 PM, Saurabh Agarwal wrote: >

Storing The content

2010-05-17 Thread Saurabh Agarwal
Hi, I want to store the Content field through the constructor Field(String, Reader). Is there any possible way of doing it? Regards Saurabh Agarwal

Re: Problem of getTermFrequencies()

2010-05-17 Thread manjula wijewickrema
Dear Ian, I changed it as you said and now it is working nicely. Thanks a lot for your kind help. Manjula On Mon, May 17, 2010 at 6:46 PM, Ian Lea wrote: > terms and freqs are arrays. Try terms[i] and freqs[i]. > > > -- > Ian. > > > On Mon, May 17, 2010 at 12:23 PM, manjula wijewickrema > wr

Re: Problem of getTermFrequencies()

2010-05-17 Thread Ian Lea
terms and freqs are arrays. Try terms[i] and freqs[i]. -- Ian. On Mon, May 17, 2010 at 12:23 PM, manjula wijewickrema wrote: > Hi, > > I wrote a code with a view to display the indexed terms and get their term > frequencies of a single document. Although it displays those terms in the > index,
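
The advice above applies to any Java array: concatenating an array with a String prints its type tag and hash code rather than its elements. A minimal sketch with hypothetical sample data (the arrays stand in for whatever the term vector returns; TermFreqPrinter is an illustrative name):

```java
import java.util.Arrays;

// Hypothetical helper for printing parallel term and frequency arrays.
class TermFreqPrinter {
    // Formats the arrays as "term=freq" pairs by indexing both with the same i.
    static String format(String[] terms, int[] freqs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(terms[i]).append('=').append(freqs[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data, assumed for illustration only.
        String[] terms = {"apache", "lucene"};
        int[] freqs = {2, 5};

        // Prints the array's type tag and hash code, not its contents.
        System.out.println("frequencies are:" + freqs);
        // Prints the elements.
        System.out.println("frequencies are:" + Arrays.toString(freqs));
        // Prints each term next to its frequency.
        System.out.println(format(terms, freqs));
    }
}
```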

CFP for Lucene Revolution Conference, Boston, MA October 7 & 8 2010

2010-05-17 Thread Grant Ingersoll
Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & 8, 2010 The first US conference dedicated to Apache Lucene and Solr is coming to Boston, October 7 & 8, 2010. The conference is sponsored by Lucid Imagination with additional support from community and other commercia

Problem of getTermFrequencies()

2010-05-17 Thread manjula wijewickrema
Hi, I wrote some code to display the indexed terms and their term frequencies for a single document. Although it displays those terms in the index, it does not give the term frequencies. Instead it displays ' frequencies are:[...@80fa6f '. What's the reason for this? The code I have wri

Re: How to call high fre. terms using HighFreTerms class

2010-05-17 Thread manjula wijewickrema
Hi Erick, Thanks. On Sat, May 15, 2010 at 5:37 PM, Erick Erickson wrote: > It looks like a stand-alone program, so you don't call it. > You probably want to get the source code and take a look at > how that program works to get an idea of how to do what you want. > > See the instructions here for g

Fwd: [Travel Assistance] - Applications Open for ApacheCon NA 2010

2010-05-17 Thread Grant Ingersoll
Begin forwarded message: > The Travel Assistance Committee is now taking in applications for those > wanting to attend ApacheCon North America (NA) 2010, which is taking place > between the 1st and 5th November in Atlanta. > > The Travel Assistance Committee is looking for people who would like