Re: Per user data store

2008-08-05 Thread Antony Bowesman
Ganesh - yahoo wrote: Hello all, Documents corresponding to multiple users are to be indexed. Each user is going to search only his documents. Only Administrator could search all users data. Is it good to have one database for each User or to have only one database for all Users? Which will be

Re: Paging & Sorting

2008-08-05 Thread Yonik Seeley
On Tue, Aug 5, 2008 at 6:03 PM, Neeraj Gupta <[EMAIL PROTECTED]> wrote: > It means before Iteration Lucene has already spent time and memory in > finding all the 50k documents and sorting them Lucene uses a priority queue to only sort the top results, not all matching results. To more precisely sp
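
Yonik's point can be shown in plain Java. A bounded priority queue keeps only the best n hits seen so far, so memory grows with n (here 100) rather than with the 50K matches, even though every match is still visited once. This is a minimal sketch that assumes the sort key is simply the document id. `TopNSketch` and `topN` are hypothetical names, not Lucene's internal collector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of top-N selection with a bounded heap (not Lucene code). */
class TopNSketch {

    /** Returns the n smallest doc ids from matchingDocs, in ascending order. */
    static List<Integer> topN(int[] matchingDocs, int n) {
        // Max-heap holding at most n ids: its head is the worst (largest) id kept so far.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(Math.max(1, n), Collections.reverseOrder());
        for (int doc : matchingDocs) {
            if (heap.size() < n) {
                heap.add(doc);            // queue not full yet: keep everything
            } else if (n > 0 && doc < heap.peek()) {
                heap.poll();              // evict the current worst hit
                heap.add(doc);
            }
            // otherwise this doc cannot be in the top n and is dropped immediately
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result);         // only n elements are fully sorted at the end
        return result;
    }
}
```

For example, `TopNSketch.topN(new int[]{42, 7, 19, 3, 88, 5}, 3)` yields `[3, 5, 7]`. In Lucene 2.x the corresponding request is, as far as I know, `Searcher.search(query, filter, n, sort)` with `n = 100`, which returns a `TopFieldDocs` holding only the top n hits.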

Re: Paging & Sorting

2008-08-05 Thread Neeraj Gupta
Thank you for the reply. As you said, "Sure, just iterate over the first 100 entries in your Hits object." It means that before iteration Lucene has already spent time and memory finding all the 50k documents and sorting them, and then I will retrieve 100 documents. Hence it is costing me a search for 5

Re: Paging & Sorting

2008-08-05 Thread Erick Erickson
Sure, just iterate over the first 100 entries in your Hits object (or topdocs). If you're asking how to ignore 49,900 of your documents (that is, not even consider them at all), you're asking the impossible because you can't know whether to ignore those other docs unless you sort them first. If y

Paging & Sorting

2008-08-05 Thread Neeraj Gupta
Hi, I need the first 100 documents in sorted order, let's say sorted on the document id, and there are more than 50K documents in the index. My search query matches all those 50K documents. Is there any way to get only the first 100 documents, and in sorted order of document id? I mean Lucene

Re: Sorting

2008-08-05 Thread Erick Erickson
Your sort object has no relation to your query in terms of fields. You can search on anything you want, but construct your Sort object with your untokenized field. Best Erick On Tue, Aug 5, 2008 at 4:12 PM, Andre Rubin <[EMAIL PROTECTED]> wrote: > Sounds like a good alternative. But how do I per

Re: Sorting

2008-08-05 Thread Andre Rubin
Sounds like a good alternative. But how do I perform the search on the tokenized field and sort on the un_tokenized one? Thanks, Andre On Tue, Aug 5, 2008 at 12:51 PM, <[EMAIL PROTECTED]> wrote: > This is what I did and it works fine. My untokenized fields were named: > "__AMSUNTOK__" + fiel

Re: Sorting

2008-08-05 Thread Robert . Hastings
This is what I did and it works fine. My untokenized fields were named: "__AMSUNTOK__" + fieldName, where fieldName was the name of the tokenized field. Bob Hastings Ancept Inc. Mark Miller <[EMAIL PROTECTED]> 08/05/2008 02:38 PM Please respond to java-user@lucene.apache.org To java-use
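
Bob's convention (search the tokenized field, sort on an untokenized copy named "__AMSUNTOK__" + fieldName) can be modelled without Lucene. This is a minimal sketch in which documents are plain maps: the tokenized copy holds several terms, which is fine for matching but gives no single sort key, while the untokenized copy holds exactly one value per document. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical model of "search tokenized, sort untokenized"; documents are plain maps. */
class UntokenizedSortSketch {
    static final String UNTOK_PREFIX = "__AMSUNTOK__";

    /** Builds a toy document holding both copies of one field. */
    static Map<String, Object> makeDoc(String field, String value) {
        Map<String, Object> doc = new HashMap<>();
        // Tokenized copy: several terms per document, good for matching, useless as a sort key.
        doc.put(field, Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+")));
        // Untokenized copy: exactly one value per document, which is what sorting needs.
        doc.put(UNTOK_PREFIX + field, value);
        return doc;
    }

    /** Matches term against the tokenized field, then orders hits by the untokenized copy. */
    static List<Map<String, Object>> searchAndSort(List<Map<String, Object>> docs,
                                                   String field, String term) {
        String sortKey = UNTOK_PREFIX + field;
        List<Map<String, Object>> hits = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            List<?> terms = (List<?>) doc.get(field);
            if (terms.contains(term)) {
                hits.add(doc);                         // search on the tokenized field
            }
        }
        hits.sort(Comparator.comparing(d -> (String) d.get(sortKey))); // sort on the untokenized one
        return hits;
    }
}
```

In Lucene 2.3 terms, this corresponds to adding a second `Field` with `Field.Index.UN_TOKENIZED` under the prefixed name and passing `new Sort(new SortField("__AMSUNTOK__" + fieldName, SortField.STRING))` to the search. As Erick notes, the sort field does not have to appear in the query.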

Re: Sorting

2008-08-05 Thread Mark Miller
Hey Andre, The reason the javadoc says the field should not be tokenized stems from the issue you point out. What you want to do is possible of course, but making the Lucene code change would complicate a process that can be quite memory and cpu intensive on large collections. Done right, it

Sorting

2008-08-05 Thread Andre Rubin
Hi there! I'm new to Lucene, so forgive any misconceptions on my part. I created an Index and now I want to search on it based on a field. The field is a String field and Field.Store.YES and Field.Index.TOKENIZED. No problems with the search. Now, I wanted to sort the results, and according to t

Re: Term Based Meta Data

2008-08-05 Thread Martin Owens
Thank you very much. I'm using Solr, so it's very relevant to me: even though the indexing is done by a smaller RMI method (since Solr doesn't support streaming of very large files and has term limits), all the searching is done through Solr. Thanks again, Best Regards, Martin Owens On T

Re: Term Based Meta Data

2008-08-05 Thread Tricia Williams
Hi Martin, Take a look at what I've done with SOLR-380 (https://issues.apache.org/jira/browse/SOLR-380). It might solve your problem, or at least give you a good starting point. Tricia Michael McCandless wrote: I think you could use payloads (= arbitrary/opaque byte[]) for this? You ca

Re: Term Based Meta Data

2008-08-05 Thread Michael McCandless
I think you could use payloads (= arbitrary/opaque byte[]) for this? You can attach a payload to each term occurrence during tokenization (indexing), and then retrieve the payload during searching. Mike Martin Owens wrote: Hello Users, I'm working on a project which attempts to store dat
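
Since a payload is just an opaque `byte[]` per term occurrence, the OCR coordinates only need a fixed binary layout. This is a minimal sketch that assumes a hypothetical layout of four big-endian ints (x, y, width, height), 16 bytes per occurrence. Lucene does not prescribe this layout.

```java
import java.nio.ByteBuffer;

/** Hypothetical payload layout for one term occurrence: x, y, width, height as 4 ints. */
class CoordinatePayload {
    static final int SIZE = 4 * Integer.BYTES; // 16 bytes per occurrence

    /** Packs one occurrence's bounding box into a payload byte[]. */
    static byte[] encode(int x, int y, int width, int height) {
        return ByteBuffer.allocate(SIZE)
                .putInt(x).putInt(y).putInt(width).putInt(height)
                .array();
    }

    /** Unpacks a payload produced by encode() into {x, y, width, height}. */
    static int[] decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt() };
    }
}
```

On the Lucene 2.3 side, to my knowledge, the bytes would be wrapped in an `org.apache.lucene.index.Payload` and attached with `Token.setPayload` inside the analyzer's token stream. At search time they would be read back from `TermPositions`. A fixed-size layout keeps decoding trivial. A variable-length encoding would save space if most terms have small coordinates.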

Term Based Meta Data

2008-08-05 Thread Martin Owens
Hello Users, I'm working on a project which attempts to store data that comes from an OCR process which describes the pixel co-ordinates of each term in the document. It's used for hit highlighting. What I would like to do is store this co-ordinate information alongside the terms. I know there is

Re: how to get all unique documents based on keyword field

2008-08-05 Thread Ian Lea
I've never used it, but from my reading of the thread referenced below, that is exactly what DuplicateFilter does: remove duplicates at query (i.e. retrieval) time, based on whatever you tell it. -- Ian. On Tue, Aug 5, 2008 at 11:54 AM, sandyg <[EMAIL PROTECTED]> wrote: > > Hi Ian, > Tnx for the
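
Query-time de-duplication of the kind Ian describes amounts to building a filter that keeps one document per distinct key. This is a minimal sketch, not DuplicateFilter itself. It assumes the keyword-field value of each document is available in an array indexed by doc id, and it keeps the first occurrence of each key. The class name and the handling of documents without a key are assumptions.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a "first occurrence per key" de-duplication filter. */
class DedupSketch {
    /** Returns a bit set with exactly one doc set per distinct key (first occurrence wins). */
    static BitSet firstOccurrences(String[] keyByDoc) {
        BitSet keep = new BitSet(keyByDoc.length);
        Set<String> seen = new HashSet<>();
        for (int doc = 0; doc < keyByDoc.length; doc++) {
            // Set.add returns false when the key was already seen, i.e. this doc is a duplicate.
            // Documents with no key (null) are simply excluded in this sketch.
            if (keyByDoc[doc] != null && seen.add(keyByDoc[doc])) {
                keep.set(doc);
            }
        }
        return keep;
    }
}
```

ANDing this bit set with the query's hits returns unique documents without changing what is stored in the index, which matches sandyg's situation of an index that already contains duplicates.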

RE: folder path prefix filtering

2008-08-05 Thread Steven A Rowe
Hi Nico, On 08/05/2008 at 9:44 AM, Nico Krijnen wrote: > On 5 aug 2008, at 11:11, Karsten F. wrote: > > Can't you store only the relevant path in an extra lucene > > field and set the maximum of query-terms to e.g. 2048 ? > > @Karsten: We did think about simplifying permissions to just top-level >

Re: folder path prefix filtering

2008-08-05 Thread Nico Krijnen
Thanks for the replies, We'll try the filters then, possibly with cache if required for performance. @Karsten: We did think about simplifying permissions to just top-level folders, which is probably suitable for 80% of our clients. If the filter is too slow we may have to. In that case it

Re: Per user data store

2008-08-05 Thread Erick Erickson
I'd start out with one index, if for no other reason than keeping track of one index for each user would be a royal pain in the neck. You haven't told us how many users or documents you expect, so that's just a guess. There's one answer perhaps if you wind up with a 10M index, another if it's 10T..
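
With one index for all users, each document carries an owner value and every search is restricted to that user's documents. The administrator skips the restriction. This is a minimal sketch of that restriction as a bit set over doc ids. The owner array, the `isAdmin` flag and the class name are hypothetical. In Lucene this would typically become a filter on an untokenized owner field.

```java
import java.util.BitSet;

/** Hypothetical sketch of per-user visibility in a single shared index. */
class OwnerFilterSketch {
    /** Docs a user may see: every doc for an administrator, otherwise only the user's own. */
    static BitSet visibleDocs(String[] ownerByDoc, String user, boolean isAdmin) {
        BitSet visible = new BitSet(ownerByDoc.length);
        if (isAdmin) {
            visible.set(0, ownerByDoc.length);   // administrators search all users' data
            return visible;
        }
        for (int doc = 0; doc < ownerByDoc.length; doc++) {
            if (user.equals(ownerByDoc[doc])) {
                visible.set(doc);
            }
        }
        return visible;
    }

    /** Intersects the query's hits with the user's visible docs (inputs are not modified). */
    static BitSet restrict(BitSet hits, BitSet visible) {
        BitSet result = (BitSet) hits.clone();
        result.and(visible);
        return result;
    }
}
```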

Re: folder path prefix filtering

2008-08-05 Thread Erick Erickson
This situation is pretty much the kind of thing PrefixFilters were written for, so I'd certainly try those first, with or without caching. I was surprised at how fast filters get constructed, so I'd just try it and take a few measurements. Best Erick On Tue, Aug 5, 2008 at 3:40 AM, Nico Krijnen <
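
A prefix filter avoids the max clause count problem because it never expands the prefix into query clauses. It walks the sorted term dictionary from the prefix onward and ORs each matching term's postings into a bit set. This is a minimal sketch of that idea. A `TreeMap` from path term to doc ids stands in for Lucene's term enumeration, and the paths used with it are hypothetical.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of prefix filtering over a sorted term dictionary. */
class PrefixFilterSketch {
    /** Sets a bit for every doc whose path term starts with prefix; no clause limit applies. */
    static BitSet prefixDocs(TreeMap<String, List<Integer>> postings, String prefix, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // Terms sharing a prefix are contiguous in sorted order and begin at the prefix itself,
        // so scan from the prefix and stop at the first term that no longer matches.
        for (Map.Entry<String, List<Integer>> e : postings.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            for (int doc : e.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

The cost is proportional to the number of terms under the prefix rather than to a clause limit. That is consistent with Erick's observation that filters are quick to construct and worth measuring before adding caching.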

Per user data store

2008-08-05 Thread Ganesh - yahoo
Hello all, Documents corresponding to multiple users are to be indexed. Each user is going to search only his documents. Only Administrator could search all users data. Is it good to have one database for each User or to have only one database for all Users? Which will be better? My opinion i

[IMPORTANT] Fieldable and LUCENE-1349

2008-08-05 Thread Grant Ingersoll
Per https://issues.apache.org/jira/browse/LUCENE-1349, we have made an exception to Lucene's backward compatibility rules and marked Fieldable as "changeable", meaning we will allow, on a case-by-case basis, changes to the interface, meaning anyone who implements their own Fieldable

Re: how to get all unique documents based on keyword field

2008-08-05 Thread sandyg
Hi Ian, Thanks for the reply. But I already have duplicate records, and the issue is not about controlling duplicate documents: when I retrieve documents I should get unique documents based on a keyword field. Ian Lea wrote: > > Sounds like a thread from a few weeks ago. > http://www.gossamer-th

Re: failed to open an indexer after about 20 queries

2008-08-05 Thread Grant Ingersoll
You say the stack trace is null: does that mean you are getting a NullPointerException, or that you aren't getting any exception but the reader is still null? Are you sure indexName isn't changing? Also, it's not very good to open up the Searcher/Reader for every query anyway, not that solves th
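
Grant's advice to avoid opening a Searcher/Reader per query usually means opening it once and sharing it across requests. This is a minimal sketch of that pattern with lazy, thread-safe initialization. The `Supplier` is a hypothetical stand-in for whatever actually opens the index (for example a call to `IndexReader.open(indexName)`), and the class name is invented for illustration.

```java
import java.util.function.Supplier;

/** Hypothetical holder that opens an expensive resource once and shares it. */
class SharedSearcherHolder<T> {
    private final Supplier<T> opener;
    private volatile T instance;

    SharedSearcherHolder(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, opening it only on first use (double-checked locking). */
    T get() {
        T result = instance;
        if (result == null) {
            synchronized (this) {
                result = instance;
                if (result == null) {
                    instance = result = opener.get();
                }
            }
        }
        return result;
    }
}
```

This sketch never reopens or closes the shared instance. A real web application would also need to swap in a fresh reader after the index changes and close the old one once in-flight searches finish.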

Re: folder path prefix filtering

2008-08-05 Thread Karsten F.
Hi Nico Krijnen, I think it is OK to store a filter for each user session in memory. And I think that a cached filter is the correct approach for permissions (extra memory usage = one bit for each user and each document). Hopefully someone with more experience will also answer your question. B
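
Karsten's estimate of one bit per user per document can be made concrete with a per-user filter cache. Each user's permission bit set is computed once and reused by later searches. This is a minimal sketch in which the permission computation is a hypothetical function supplied by the caller and the cache is keyed by user name.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-user permission filter cache. */
class PermissionFilterCache {
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    private final Function<String, BitSet> computePermittedDocs;

    PermissionFilterCache(Function<String, BitSet> computePermittedDocs) {
        this.computePermittedDocs = computePermittedDocs;
    }

    /** Returns the user's bit set, computing it only on the first request. */
    BitSet forUser(String user) {
        return cache.computeIfAbsent(user, computePermittedDocs);
    }

    /** Drops all cached filters, e.g. after the index or the permissions change. */
    void invalidateAll() {
        cache.clear();
    }

    /** Rough memory estimate in bytes: one bit per document for each cached user. */
    static long estimateBytes(int maxDoc, int cachedUsers) {
        return ((long) maxDoc + 7) / 8 * cachedUsers;
    }
}
```

For example, with 1,000,000 documents and 10 cached users, `estimateBytes` gives 1,250,000 bytes, about 125 KB per user.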

Re: next release

2008-08-05 Thread Michael McCandless
Cam Bazz wrote: yes, that's why I asked for any news on the release of 2.3.3. Ahh, OK, but: 2.3.3 won't have deletion by query because it's just another point release after 2.3.2, i.e. it will only have bug fixes specifically backported from the trunk. 2.4 is the next release off the trunk (that

Re: failed to open an indexer after about 20 queries

2008-08-05 Thread xh sun
Thanks, John and Marcus. Below is the related code in the JSP file, and the stack trace is null even though it failed to open the index. try { out.print("To open index"); reader = IndexReader.open(indexName); out.print("have opened the index");

folder path prefix filtering

2008-08-05 Thread Nico Krijnen
Hello, Need some help with prefix filtering... We ran into the max clause count problem with our usage of the wildcard query. Essentially what we are trying to do is: One of the fields in our index contains a 'path' representing a file system location. For example: /folder A/subfolder/doc

Re: failed to open an indexer after about 20 queries

2008-08-05 Thread Marcus Herou
Hi. And some exception stack trace would be nice as well. Kindly //Marcus On Tue, Aug 5, 2008 at 4:58 AM, John Griffin <[EMAIL PROTECTED]> wrote: > Xh, > > Sorry about those questions. I received two copies of your email. The first > was corrupt. > > We still need to see more code. No there isn'