Ganesh - yahoo wrote:
Hello all,
Documents corresponding to multiple users are to be indexed. Each user is
going to search only his own documents. Only the Administrator can search all
users' data.
Is it good to have one database for each User or to have only one database
for all Users? Which will be
On Tue, Aug 5, 2008 at 6:03 PM, Neeraj Gupta <[EMAIL PROTECTED]> wrote:
> It means before Iteration Lucene has already spent time and memory in
> finding all the 50k documents and sorting them
Lucene uses a priority queue to only sort the top results, not all
matching results.
To more precisely sp
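
A minimal sketch of the bounded priority queue idea mentioned in the reply above, written in plain Java rather than with Lucene's own classes. The names TopNSketch and topN are made up for illustration, and sorting by ascending integer id is an assumption taken from the question in this thread. The point it tries to show is that only the best n candidates are held and ordered at any time, so the full set of matches is never sorted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene.
class TopNSketch {
    // Returns the n smallest ids in ascending order without sorting all matches.
    static List<Integer> topN(int[] matchingIds, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Max-heap: the head is the worst (largest) id among the current best n.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n, Collections.reverseOrder());
        for (int id : matchingIds) {
            if (heap.size() < n) {
                heap.add(id);
            } else if (id < heap.peek()) {
                // The new id beats the current worst entry, so replace it.
                heap.poll();
                heap.add(id);
            }
        }
        // Only the n survivors are sorted at the end.
        result.addAll(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        int[] hits = {42, 7, 19, 3, 88, 15};
        System.out.println(topN(hits, 3)); // prints [3, 7, 15]
    }
}
```

Every match is still visited once, which fits the later reply in this thread: the other documents cannot be ignored outright, but they do not all have to be sorted.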
Thank you for the reply. As you said
Sure, just iterate over the first 100 entries in your Hits object
It means before Iteration Lucene has already spent time and memory in
finding all the 50k documents and sorting them, then I will retrieve 100
documents. Hence it is costing me a search for 5
Sure, just iterate over the first 100 entries in your Hits object
(or topdocs).
If you're asking how to ignore 49,900 of your documents (that
is, not even consider them at all), you're asking the impossible
because you can't know whether to ignore those other docs
unless you sort them first.
If y
Hi,
I need the first 100 documents in sorted order, let's say sorted on the
document id, and there are more than 50K documents in the index. My search
query matches all those 50K documents. Is there any way to get only the
first 100 documents, also sorted by document id? I mean
Lucene
Your sort object has no relation to your query in terms of fields. You can
search on anything you want, but construct your Sort object with your
untokenized field.
Best
Erick
On Tue, Aug 5, 2008 at 4:12 PM, Andre Rubin <[EMAIL PROTECTED]> wrote:
> Sounds like a good alternative. But how do I per
Sounds like a good alternative. But how do I perform the search on the
tokenized field and sort on the un_tokenized one?
Thanks,
Andre
On Tue, Aug 5, 2008 at 12:51 PM, <[EMAIL PROTECTED]> wrote:
> This is what I did and it works fine. My untokenized fields were named:
> "__AMSUNTOK__" + fiel
This is what I did and it works fine. My untokenized fields were named:
"__AMSUNTOK__" + fieldName.
Where fieldName was the name of the tokenized field.
Bob Hastings
Ancept Inc.
Mark Miller <[EMAIL PROTECTED]>
08/05/2008 02:38 PM
Please respond to
java-user@lucene.apache.org
To
java-use
Hey Andre,
The reason the javadoc says the field should not be tokenized stems from
the issue you point out. What you want to do is possible of course, but
making the Lucene code change would complicate a process that can be
quite memory and cpu intensive on large collections. Done right, it
Hi there!
I'm new to Lucene, so forgive any misconceptions on my part.
I created an Index and now I want to search on it based on a field.
The field is a String field and Field.Store.YES and
Field.Index.TOKENIZED. No problems with the search.
Now, I wanted to sort the results, and according to t
Thank you very much. I'm using Solr, so it's very relevant to me. The
indexing is done by a smaller RMI method (since Solr doesn't support
streaming of very large files and has term limits), but all the searching
is done through Solr.
Thanks again,
Best Regards, Martin Owens
On T
Hi Martin,
Take a look at what I've done with SOLR-380
(https://issues.apache.org/jira/browse/SOLR-380). It might solve your
problem, or at least give you a good starting point.
Tricia
Michael McCandless wrote:
I think you could use payloads (= arbitrary/opaque byte[]) for this?
You ca
I think you could use payloads (= arbitrary/opaque byte[]) for this?
You can attach a payload to each term occurrence during tokenization
(indexing), and then retrieve the payload during searching.
Mike
Martin Owens wrote:
Hello Users,
I'm working on a project which attempts to store dat
Hello Users,
I'm working on a project which attempts to store data that comes from an
OCR process which describes the pixel co-ordinates of each term in the
document. It's used for hit highlighting.
What I would like to do is store this co-ordinate information alongside
the terms. I know there is
I've never used it, but from my reading of the thread referenced
below, that is exactly what DuplicateFilter does: remove duplicates at
query time (i.e. retrieval time), based on whatever you tell it.
--
Ian.
On Tue, Aug 5, 2008 at 11:54 AM, sandyg <[EMAIL PROTECTED]> wrote:
>
> Hi Ian,
> Thanks for the
Hi Nico,
On 08/05/2008 at 9:44 AM, Nico Krijnen wrote:
> On 5 aug 2008, at 11:11, Karsten F. wrote:
> > Can't you store only the relevant path in an extra lucene
> > field and set the maximum of query-terms to e.g. 2048 ?
>
> @Karsten: We did think about simplifying permissions to just top-level
>
Thanks for the replies,
We'll try the filters then, possibly with cache if required for
performance.
@Karsten: We did think about simplifying permissions to just top-level
folders, which is probably suitable for 80% of our clients. If the
filter is too slow we may have to. In that case it
I'd start out with one index, if for no other reason
than keeping track of one index for each user would
be a royal pain in the neck. You haven't told us
how many users or documents you expect,
so that's just a guess. There's one answer perhaps
if you wind up with a 10M index, another if it's 10T.
This situation is pretty much the kind of thing PrefixFilters
were written for, so I'd certainly try those first, with or
without caching. I was surprised at how fast filters
get constructed, so I'd just try it and take a few measurements.
Best
Erick
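
A rough sketch of the idea behind a prefix filter, in plain Java rather than with Lucene's PrefixFilter class. The names PrefixFilterSketch and docsWithPrefix are hypothetical, and the sorted map from path to document number is only an assumed stand-in for the sorted term dictionary of a real index. Because the keys are sorted, all paths sharing a prefix are contiguous, so the scan can start at the prefix and stop at the first key that no longer matches.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
class PrefixFilterSketch {
    // Marks every document whose path starts with the given prefix.
    static BitSet docsWithPrefix(NavigableMap<String, Integer> pathToDoc,
                                 String prefix, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // Start at the first key >= prefix; matching keys are contiguous from there.
        for (Map.Entry<String, Integer> e : pathToDoc.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // Past the last path with this prefix.
            }
            bits.set(e.getValue());
        }
        return bits;
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> paths = new TreeMap<>();
        paths.put("/a/x", 0);
        paths.put("/a/y", 1);
        paths.put("/b/z", 2);
        System.out.println(docsWithPrefix(paths, "/a/", 3)); // prints {0, 1}
    }
}
```

The result is a set of bits rather than a list of query clauses, which is why this approach does not run into a maximum clause count.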
On Tue, Aug 5, 2008 at 3:40 AM, Nico Krijnen <
Hello all,
Documents corresponding to multiple users are to be indexed. Each user is going
to search only his own documents. Only the Administrator can search all users' data.
Is it good to have one database for each User or to have only one database for
all Users? Which will be better?
My opinion i
Per https://issues.apache.org/jira/browse/LUCENE-1349, we have made an
exception to Lucene's backward compatibility rules and marked
Fieldable as "changeable", meaning we will allow changes to the interface
on a case-by-case basis, which means anyone who implements
their own Fieldable
Hi Ian,
Thanks for the reply.
But I already have duplicate records, and the issue is not controlling
duplicate documents: when I retrieve documents,
I should get unique documents based on the keyword field.
Ian Lea wrote:
>
> Sounds like a thread from a few weeks ago.
> http://www.gossamer-th
You say the stack trace is null. Does that mean you are getting a
NullPointerException, or that you aren't getting any exception but the reader
is still null? Are you sure indexName isn't changing?
Also, it's not very good to open up the Searcher/Reader for every
query anyway, not that solves th
Hi Nico Krijnen,
I think it is OK to store a filter for each user session in memory.
And I think that a cached filter is the correct approach for permissions
(extra memory usage = one bit for each user and each document).
Hopefully someone with more experience will also answer your question.
B
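
A small sketch of the cached per-user filter suggested above, in plain Java rather than with Lucene's filter classes. The class PermissionFilterCacheSketch, its docOwners array and the filterFor method are all hypothetical, and "one owner per document" is an assumed permission model that the thread does not confirm. It illustrates the memory estimate given above: each cached filter costs one bit per document.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class PermissionFilterCacheSketch {
    // docOwners[docId] is the user allowed to see that document (assumed model).
    private final String[] docOwners;
    private final Map<String, BitSet> cache = new HashMap<>();

    PermissionFilterCacheSketch(String[] docOwners) {
        this.docOwners = docOwners;
    }

    // Builds the user's filter on first use and reuses it afterwards.
    BitSet filterFor(String user) {
        return cache.computeIfAbsent(user, u -> {
            BitSet bits = new BitSet(docOwners.length);
            for (int docId = 0; docId < docOwners.length; docId++) {
                if (u.equals(docOwners[docId])) {
                    bits.set(docId);
                }
            }
            return bits;
        });
    }

    public static void main(String[] args) {
        PermissionFilterCacheSketch filters =
            new PermissionFilterCacheSketch(new String[] {"alice", "bob", "alice"});
        System.out.println(filters.filterFor("alice")); // prints {0, 2}
    }
}
```

A cache like this would need to be invalidated when documents or permissions change, which the sketch leaves out.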
Cam Bazz wrote:
Yes, that's why I asked for any news on the release of 2.3.3.
Ahh, OK, but: 2.3.3 won't have deletion by query because it's just
another point release after 2.3.2. I.e., it will only have bug fixes
specifically backported from the trunk.
2.4 is the next release off the trunk (that
Thanks, John and Marcus.
Below is the related code in the JSP file. The stack trace is null even though
it failed to open the index.
try
{
    out.print("To open index");
    reader = IndexReader.open(indexName);
    out.print("have opened the index");
Hello,
Need some help with prefix filtering...
We ran into the max clause count problem with our usage of the
wildcard query. Essentially what we are trying to do is:
One of the fields in our index contains a 'path' representing a file
system location. For example:
/folder A/subfolder/doc
Hi.
And some exception stacktrace would be nice as well.
Kindly
//Marcus
On Tue, Aug 5, 2008 at 4:58 AM, John Griffin <[EMAIL PROTECTED]> wrote:
> Xh,
>
> Sorry about those questions. I received two copies of your email. The first
> was corrupt.
>
> We still need to see more code. No there isn'
28 matches