How can I sort just the results I get back (i.e. once the hits are there,
sort on the hits), instead of the sort loading all of the values before the
results are returned?
Chris Lu wrote:
>
> This is because sorting will load all values in that sort field into
> memory.
>
> If it's an integer, you will need 4*N bytes, which is an additional 52M
> for you.
Thanks for the reply.
Actually I am sorting on a specific field, a keyword field whose values are
unique, and I have 1 GB of RAM.
markrmiller wrote:
>
> To sort on 13 million docs will take at least something like 400 MB for
> the field cache. That's if you only sort on one field... it can grow fast
> if you allow multi-field sorting.
lucene-seme1 s wrote:
Can you please share the custom Analyzer you have?
Unfortunately it's not mine to share but see the Lucene Token and
Analyzer classes - it's not particularly hard to code.
See https://issues.apache.org/jira/browse/LUCENE-794
Spencer Tickner wrote:
Hi List,
Thanks in advance for any help. I'm working with the contrib
highlighting class and am having issues when doing searches with a
phrase. I've been able to duplicate this behaviour in the
HighlighterTest class.
You're going to want to change your TokenFilter so that it emits the split
pieces as tokens immediately after the original token, each with a
positionIncrement of "0" .. don't buffer them up and wait for the entire
stream to finish first; what matters is the true order of the tokens in the
token stream and the posit
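For illustration, a filter along those lines might look roughly like this
(only a sketch against the Lucene 2.x Token API; the hyphen split rule and
the class name are made up, not from this thread):

import java.io.IOException;
import java.util.LinkedList;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Sketch: emit the split pieces right after the original token, each at the
// same position (positionIncrement = 0), instead of buffering to end of stream.
public class SplitPiecesFilter extends TokenFilter {
  private final LinkedList pending = new LinkedList();

  public SplitPiecesFilter(TokenStream input) {
    super(input);
  }

  public Token next() throws IOException {
    if (!pending.isEmpty()) {
      return (Token) pending.removeFirst();     // queued pieces come out first
    }
    Token t = input.next();
    if (t == null) {
      return null;
    }
    String[] pieces = t.termText().split("-");  // hypothetical split rule
    if (pieces.length > 1) {
      for (int i = 0; i < pieces.length; i++) {
        Token piece = new Token(pieces[i], t.startOffset(), t.endOffset());
        piece.setPositionIncrement(0);          // same position as the original
        pending.add(piece);
      }
    }
    return t;                                   // original first, pieces follow
  }
}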
: Ah, thanks. So since solr-1.2.0 is using
: lucene-*-2007-05-20_00_04-53.jar in its distribution,
: is this why SOLR-261 is still open?
SOLR-261 was left open because it hadn't been verified yet -- I just did
that and resolved the issue against the trunk.
: I thought that maybe it would be a simple drop in replacement
On Wednesday 19 March 2008 01:44:33 Ramdas M Ramakrishnan wrote:
> I am using a MultiFieldQueryParser to parse and search the index. Once I
> have the Hits and iterate through them, I need to know the following:
>
> For every hit document I need to know which indexed field this
> hit originated from.
Thanks, I'll give that a try.
Cheers,
Spencer
On Tue, Mar 18, 2008 at 1:50 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
> The contrib Highlighter is not position sensitive. You can try out the
> patch I have been working here if you are interested:
> https://issues.apache.org/jira/browse/LUCENE-794
Ah, thanks. So since solr-1.2.0 is using
lucene-*-2007-05-20_00_04-53.jar in its distribution,
is this why SOLR-261 is still open?
I thought that maybe it would be a simple drop in replacement, but when I
tossed in
lucene-*-2.3.1.jar to solr, it didn't fix the problem, so maybe something in
solr n
Ian, can you attach your version of SegmentMerger.java? Somehow my line
numbers are off from yours.
Mike
Ian Lea wrote:
Mike
Latest patch produces similar exception:
Exception in thread "Lucene Merge Thread #0"
org.apache.lucene.index.MergePolicy$MergeException:
java.lang.AssertionError: after
The contrib Highlighter is not position sensitive. You can try out the
patch I have been working here if you are interested:
https://issues.apache.org/jira/browse/LUCENE-794
Spencer Tickner wrote:
Hi List,
Thanks in advance for any help. I'm working with the contrib
highlighting class and am
Hi Jake, yes it was committed in Lucene - this is visible in the JIRA issue
if you switch to the "Subversion Commits" tab, where you can also see the
actual diffs that took place.
Best,
Doron
On Tue, Mar 18, 2008 at 7:14 PM, Jake Mannix <[EMAIL PROTECTED]> wrote:
> Hey folks,
> I was wonder
Hi List,
Thanks in advance for any help. I'm working with the contrib
highlighting class and am having issues when doing searches with a
phrase. I've been able to duplicate this behaviour in the
HighlighterTest class.
When calling the testGetBestFragmentsPhrase() method I get the correct:
John K
Can you please share the custom Analyzer you have? In particular, I am
interested in knowing how to get access to the position and offset values
for each token.
Regards,
JK
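Not the analyzer itself, but just to show where the position and offset
live: a throwaway sketch like this (StandardAnalyzer used purely for
illustration, 2.x Token API) prints them for each token:

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Sketch: dump position increment and offsets for every token an analyzer emits.
public class TokenDump {
  public static void main(String[] args) throws Exception {
    TokenStream ts = new StandardAnalyzer()
        .tokenStream("f", new StringReader("Fred Smith works for Microsoft"));
    for (Token t = ts.next(); t != null; t = ts.next()) {
      System.out.println(t.termText()
          + " posIncr=" + t.getPositionIncrement()
          + " offsets=" + t.startOffset() + "-" + t.endOffset());
    }
    ts.close();
  }
}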
On Tue, Mar 18, 2008 at 10:48 AM, mark harwood <[EMAIL PROTECTED]>
wrote:
> I've used a custom analyzer before now to "blend
Hi Ian,
Sheesh that's odd. The SegmentMerger produced an .fdx file that is
one document too short.
Can you run with this patch now, again applied to head of 2.3
branch? I just added another assert inside the loop that does the
field merging.
I will scrutinize this code...
Mike
Hey folks,
I was wondering what the status of LUCENE-933 is (stop words can cause the
QueryParser to end up with no results, due to e.g. a +(the) clause in the
resultant BooleanQuery). According to the tracking bug, it's resolved, and
there's a patch, but where has that patch been applied? I trie
Whoops... 10 times too much there, more like 40 MB I think. A string sort
could be a bit higher though, since you also need to store all of the terms
to index into.
sandyg wrote:
this is my search content
QueryParser parser = new QueryParser("keyword",new StandardAnalyzer());
Query query = parser.parse
To sort on 13 million docs will take at least something like 400 MB for the
field cache. That's if you only sort on one field... it can grow fast if you
allow multi-field sorting.
How much RAM are you giving your app?
sandyg wrote:
this is my search content
QueryParser parser = new QueryParser("keyword",new
Ian,
Could you apply the attached patch applied to the head of the 2.3
branch?
It only adds more asserts, to try to pinpoint where exactly this
corruption starts.
Then, re-run the test with asserts enabled and infoStream turned on
and post back. Thanks.
Mike
Ian Lea wrote:
It'
This is because sorting will load all values in that sort field into memory.
If it's an integer, you will need 4*N bytes, which is an additional 52M for you.
There is no programmatic way to increase the memory size.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Databa
Hi
I am using a MultiFieldQueryParser to parse and search the index. Once I
have the Hits and iterate through them, I need to know the following:
For every hit document I need to know which indexed field this hit
originated from. Say I have indexed 2 fields, how will I know from the
Hit whi
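One way people approach this (just a sketch, not an answer from this thread;
the field names and the helper are made up) is to re-check each field's query
against the hit, for example via IndexSearcher.explain:

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Sketch: report which of the indexed fields a given hit document matched on.
public class FieldOfHit {
  public static void printMatchingFields(IndexSearcher searcher, String queryText,
      int docId) throws IOException, ParseException {
    String[] fields = {"title", "body"};          // hypothetical field names
    for (int i = 0; i < fields.length; i++) {
      Query perField = new QueryParser(fields[i], new StandardAnalyzer())
          .parse(queryText);
      Explanation exp = searcher.explain(perField, docId);
      if (exp.getValue() > 0.0f) {                // non-zero score: this field matched
        System.out.println("doc " + docId + " matched on field '" + fields[i] + "'");
      }
    }
  }
}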
It's failed on servers running SuSE 10.0 and 8.2 (ancient!)
$ uname -a shows
Linux phoebe 2.6.13-15-smp #1 SMP Tue Sep 13 14:56:15 UTC 2005 x86_64
x86_64 x86_64 GNU/Linux
and
Linux phobos 2.4.20-64GB-SMP #1 SMP Mon Mar 17 17:56:03 UTC 2003 i686
unknown unknown GNU/Linux
The first one has a 2.8G
On Tue, Mar 18, 2008 at 7:38 AM, Ian Lea <[EMAIL PROTECTED]> wrote:
> Hi
>
>
> When bulk loading into a new index I'm seeing this exception
>
> Exception in thread "Thread-1"
> org.apache.lucene.index.MergePolicy$MergeException:
> org.apache.lucene.index.CorruptIndexException: doc counts differ
I don't see an attachment here -- maybe the mailing list software
stripped it off. If so can you send directly to me? Thanks.
Mike
Ian Lea wrote:
Documents are biblio records. All have title, author etc. stored,
some have a few extra fields as well. Typically around 25 fields per
doc.
I came across an interesting quirk when using Lucene's PriorityQueue.
It's not a bug per se but I thought it might be worth logging here if anyone
else experiences it.
I was using a PriorityQueue to support a GUI that pages through the top terms
in an index. It was observed that terms were ofte
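For context, a typical top-terms pass with Lucene's PriorityQueue looks
roughly like this (a sketch with made-up names; this is not the code that
showed the quirk):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.util.PriorityQueue;

// Sketch: collect the N terms with the highest document frequency.
public class TopTerms {
  static class TermFreq {
    final Term term;
    final int freq;
    TermFreq(Term term, int freq) { this.term = term; this.freq = freq; }
  }

  static class TermFreqQueue extends PriorityQueue {
    TermFreqQueue(int size) { initialize(size); }
    protected boolean lessThan(Object a, Object b) {
      return ((TermFreq) a).freq < ((TermFreq) b).freq;  // smallest freq sits at the top
    }
  }

  public static TermFreq[] top(IndexReader reader, int n) throws IOException {
    TermFreqQueue queue = new TermFreqQueue(n);
    TermEnum terms = reader.terms();
    while (terms.next()) {
      // insert() keeps the element only if the queue isn't full or it beats the top
      queue.insert(new TermFreq(terms.term(), terms.docFreq()));
    }
    terms.close();
    TermFreq[] result = new TermFreq[queue.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = (TermFreq) queue.pop();                // pop ascending, fill descending
    }
    return result;
  }
}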
Documents are biblio records. All have title, author etc. stored,
some have a few extra fields as well. Typically around 25 fields per
doc. The index is created with compound format, everything else as
default.
I've rerun the job until failure. Different numbers this time, but
basically the sa
The data is loaded in chunks of up to 100K docs in separate runs of
the program if that helps answer the first question. All buffers have
default values, docs are small but not tiny, JVM is running with
default settings.
Answers to previous questions, and infostream, will follow once the
job has
One question: do you know whether 67,861 docs "feels like" a newly
flushed segment, or, the result of a merge?
Ie, roughly how many docs are you buffering in IndexWriter before it
flushes? Are they very small documents and your RAM buffer is large?
Mike
Ian Lea wrote:
Hi
When bulk l
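For reference, the flush behaviour being asked about is controlled by these
IndexWriter settings (the values here are only illustrative, not a
recommendation from the thread):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

// Sketch: flush by RAM usage rather than by buffered document count.
public class FlushSettings {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);
    writer.setRAMBufferSizeMB(32.0);                           // flush when buffered docs use ~32 MB
    writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH); // don't also flush by doc count
    writer.close();
  }
}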
this is my search code
QueryParser parser = new QueryParser("keyword", new StandardAnalyzer());
Query query = parser.parse("1");
Sort sort = new Sort(new SortField(sortField));
Hits hits = searcher.search(query, sort);
And I have huge data, about 13 million records.
i am not
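On the memory side, the sort type matters: new SortField(sortField) with no
type auto-detects, and a string sort also has to cache the terms themselves.
If the field's values really are integers, something like this (a sketch,
not from the thread) keeps the field cache at 4 bytes per document, roughly
52 MB for 13 million docs:

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

// Sketch: sort on an integer-valued field with an explicit sort type.
public class IntSortExample {
  public static Hits search(IndexSearcher searcher, Query query, String sortField)
      throws Exception {
    Sort sort = new Sort(new SortField(sortField, SortField.INT));
    return searcher.search(query, sort);
  }
}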
OK, opening two writers at once is definitely a recipe for disaster.
Please post back on whether this does or doesn't resolve it.
Previous versions of Lucene didn't write the fdt/fdx files until a
segment is flushed, so it's possible you escaped index corruption
(but, lost documents) before.
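For what it's worth, the write lock should normally refuse a second writer
on the same directory; an illustrative sketch (not from the thread):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;

// Sketch: a second IndexWriter on the same directory fails on the write lock
// instead of silently corrupting the index.
public class TwoWriters {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter first = new IndexWriter(dir, new StandardAnalyzer(), true);
    try {
      new IndexWriter(dir, new StandardAnalyzer(), false);   // second writer, same dir
    } catch (LockObtainFailedException e) {
      System.out.println("second writer refused: " + e.getMessage());
    } finally {
      first.close();
    }
  }
}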
Michael McCandless wrote:
Yes fdt/fdx hold stored fields. When the first buffered document is
added these files are created.
The only way they disappear (through Lucene's APIs) is if a writer is
opened on that directory, and, those files are not referenced by the
current segments file. Th
Yes fdt/fdx hold stored fields. When the first buffered document is
added these files are created.
The only way they disappear (through Lucene's APIs) is if a writer is
opened on that directory, and, those files are not referenced by the
current segments file. This is why I'm concerned
Can you call IndexWriter.setInfoStream(...) and get the error to
happen and post back the resulting output? And, turn on assertions
(java -ea) since that may catch the issue sooner.
Can you describe how you are setting up IndexWriter (autoCommit,
compound, etc.), and what your documents are
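In case it's useful, enabling the infoStream looks like this (the path and
setup are made up; run the JVM with "java -ea" so the assertions are active):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Sketch: turn on IndexWriter's verbose flush/merge diagnostics.
public class DebugIndexing {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.getDirectory("/tmp/index");   // hypothetical path
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
    writer.setInfoStream(System.out);
    // ... add documents here ...
    writer.close();
  }
}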
It looks like you ignore any IOException coming out of
IndexWriter.close? Can you put some code in the catch clause around
writer.close to see if you are hitting some exception there?
Also, you forcefully remove the write lock if it's present. But are
you absolutely certain there isn't
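For example, logging rather than swallowing the exception from close (a
small sketch, names made up):

import java.io.IOException;
import org.apache.lucene.index.IndexWriter;

// Sketch: surface any IOException from IndexWriter.close() so a failure
// during the final flush/merge isn't silently ignored.
public class CloseCarefully {
  public static void close(IndexWriter writer) {
    try {
      writer.close();
    } catch (IOException e) {
      System.err.println("IndexWriter.close() failed: " + e);
      e.printStackTrace();
    }
  }
}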
<[EMAIL PROTECTED]> wrote:
Does each searchable have its own copy of Term and TermInfo
arrays? So the amount in memory would grow with each new
Searchable instance? If so, it might be worthwhile to implement a
singleton MultiSearcher that is closed and re-opened periodically.
What d
Hi
When bulk loading into a new index I'm seeing this exception
Exception in thread "Thread-1"
org.apache.lucene.index.MergePolicy$MergeException:
org.apache.lucene.index.CorruptIndexException: doc counts differ for
segment _4l: fieldsReader shows 67861 but segmentInfo shows 67862
at
or
Does each searchable have its own copy of Term and TermInfo arrays? So the
amount in memory would grow with each new Searchable instance? If so, it might
be worthwhile to implement a singleton MultiSearcher that is closed and
re-opened periodically. What do you think?
Thanks again,
Rich
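Something along the lines the question suggests might look like this (a
rough sketch; the reopen interval, index path and class name are made up,
and real code would have to avoid closing a searcher that is still being
used by in-flight searches):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

// Sketch: one shared IndexSearcher, reopened periodically, so term/TermInfo
// arrays are held once instead of once per Searchable instance.
public class SearcherHolder {
  private static final String INDEX_PATH = "/path/to/index";  // hypothetical
  private static final long MAX_AGE_MS = 10 * 60 * 1000;      // reopen every 10 minutes
  private static IndexSearcher current;
  private static long openedAt;

  public static synchronized IndexSearcher get() throws IOException {
    long now = System.currentTimeMillis();
    if (current == null || now - openedAt > MAX_AGE_MS) {
      IndexSearcher old = current;
      current = new IndexSearcher(INDEX_PATH);
      openedAt = now;
      if (old != null) {
        old.close();   // NOTE: unsafe if searches are still running against it
      }
    }
    return current;
  }
}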
Hi Michael
Sorry for the late reply. As you guessed, it missed my attention.
Michael McCandless wrote:
Hi,
Can you describe what led up to this?
My application indexes emails. In this particular instance, I had
reindexed all emails from their original sources. The error occurred
while I w
I've used a custom analyzer before now to "blend in" GATE annotations as tokens
at the same position as the words they relate to.
E.g.
Fred Smith works for Microsoft
would be tokenized ordinarily as the following tokens:
position  offset  text
========  ======  ====
1