Sil,
When you switched over to using the Fast Vector Highlighter, did you
change your schema so that the fields that you want to highlight provide
term vector information, and reindex your documents? Term vectors are
necessary when using the Fast Vector Highlighter. Posting your schema may
show va
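For reference, this is roughly what such a field declaration looks like in schema.xml (a minimal sketch; the field and type names here are hypothetical, not taken from Sil's schema):

```xml
<!-- FastVectorHighlighter requires all three term vector options on the
     highlighted field, and documents must be reindexed after the change. -->
<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```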
Eric,
Your example document is quite long. Are you setting hl.maxAnalyzedChars?
If you don't, the highlighter you appear to be using will not look past
the first 51,200 characters of the document for snippet candidates.
http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars
-- Br
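A sketch of raising that limit per request (the parameter is documented on the wiki page above; the host, core, and field names here are made up for illustration):

```
http://localhost:8983/solr/select?q=grandfather&hl=true&hl.fl=content&hl.maxAnalyzedChars=1000000
```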
>> I’m having some issues with Solr search results (using Solr 1.4). I
>> have enabled highlighting of searched text (hl=true) and set the
>> fragment size to 500 (hl.fragsize=500) in the search query.
>> Below are the (screenshot) results shown when I searched for the term
>> ‘grandfather’ (2 results are
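As an aside, a query string with those highlighting parameters can be assembled like this (stdlib only; the query term is from the message above, everything else is illustrative):

```python
from urllib.parse import urlencode

# Build the highlight query described above; parameter values are
# taken from the message, the rest is illustrative.
params = {
    "q": "grandfather",
    "hl": "true",
    "hl.fragsize": 500,
}
query = urlencode(params)
print(query)  # q=grandfather&hl=true&hl.fragsize=500
```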
> Hey Bryan, Thanks for the response! To make use of the
> FastVectorHighlighter
> you need to enable termVectors, termPositions, and termOffsets correct?
> Which takes a considerable amount of space, but is good to know, and I
> may possibly pursue this solution as well. Just starting to look at
> I'm trying to find a way to best highlight search results even though
> those
> results are not stored in my index. Has anyone been successful in
> reusing the SOLR highlighting logic on non-stored data?
I was able to do this by slightly modifying the FastVectorHighlighter so
that it returned b
> > when I turn on highlighting that I take the huge performance hit.
> >
> Again, I'm using the FastVectorHighlighting. The hl.fl is set to "name
> name_par description description_par content content_par" so that it
> returns highlights for full and partial word matches. All of those
> fields have indexed, stored, termPositions, termVectors, and termOffsets
> set to "true".
>
> It all seems redundant just to allow for partial
> matches/highlighting. I have set up another request handler that
> only searches the whole-word fields and it returns in 850 ms with
> highlighting.
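For illustration, a request handler restricted to the whole-word fields might be declared along these lines (a sketch only; the handler name and field list are assumptions, not Andy's actual config):

```xml
<!-- Hypothetical handler that searches and highlights only the
     whole-word fields, skipping the *_par partial-match fields. -->
<requestHandler name="/wholeword" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">name description content</str>
    <str name="hl">true</str>
    <str name="hl.fl">name description content</str>
  </lst>
</requestHandler>
```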
>
> Any ideas?
>
> - Andy
>
>
> -Original Message-
> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic
My guess is that the problem is those 200M documents.
FastVectorHighlighter is fast at deciding whether a match, especially a
phrase, appears in a document, but it still starts out by walking the
entire list of term vectors, and ends by breaking the document into
candidate-snippet fragments, both p
I’m using Solr/Lucene 3.6 under Tomcat 6.
When shutting down an indexing server after much indexing activity,
occasionally, I see the following NullPointerException trace from Tomcat:
INFO: Stopping Coyote HTTP/1.1 on http-1800
Exception in thread "Lucene Merge Thread #1"
org.apache.lucene.i
estion.
>
> I haven't used VisualVM before but I am going to use it to see where CPU
> is
> going. I saw that CPU is overly used. I haven't seen so much CPU use in
> testing.
> Although I think GC is not a problem, splitting the JVM per shard would
> be a good idea.
>
5 min is ridiculously long for a query that used to take 65ms. That ought
to be a great clue. The only two things I've seen that could cause that
are thrashing, or GC. Hard to see how it could be thrashing, given your
hardware, so I'd initially suspect GC.
Aim VisualVM at the JVM. It shows how muc
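Alongside VisualVM, GC logging on the Tomcat JVM gives a record you can check after the fact (flags valid for the HotSpot JVMs contemporary with Solr 3.x; the log path is illustrative):

```shell
# Append GC logging flags to the Tomcat JVM options to confirm or rule
# out long collection pauses.
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/tomcat/gc.log"
```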
Regarding the large number of files, even after optimize, we found that
when rebuilding a large, experimental 1.7TB index on Solr 3.5, instead of
Solr 1.4.1, there were a ton of index files, thousands, in 3.5, when there
used to be just 10 (or 11?) segments worth (as expected with mergeFactor
set t
ieve it was a different exception, just brainstorming. (it was
> a null reference iirc)
>
> Does a *:* query with no sorting work?
>
> Cody
>
> -Original Message-
> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> Sent: Thursday, June 21, 2012 1
indexed.
> I've had a problem with distributed not working when the uniqueKey field
> was indexed but not stored.
Was it the same exception I'm seeing?
-- Bryan
>
> -Original Message-
> From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
> Sent
> Hi Bryan,
>
> What is the fieldtype of the groupField? You can only group by a field
> that is of type string, as described in the wiki:
> http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters
>
> When you group by another field type, an HTTP 400 should be returned
> instead of this error.
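For illustration, grouping on a string-typed field looks like this (host, core, and field name are hypothetical):

```
http://localhost:8983/solr/select?q=*:*&group=true&group.field=author_s
```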
I am doing a search on three shards with identical schemas (I
double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is
giving me back the exception listed at the bottom of this email:
Other information:
My schema uses the following field types: StrField, DateField,
TrieDateFiel
Apologies. I meant to type “1.4 TB” and somehow typed “1.4 GB.” Little
wonder that no one thought the question was interesting, or figured I must
be using Sneakernet to run my searches.
-- Bryan Loofbourrow
--
*From:* Bryan Loofbourrow [mailto:bloofbour
A couple of thoughts:
We wound up doing a bunch of tuning on the Java garbage collection.
However, the pattern we were seeing was periodic very extreme slowdowns,
because we were then using the default garbage collector, which blocks
when it has to do a major collection. This doesn't sound like yo
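The usual fix for those blocking major collections, on the HotSpot JVMs of that era, was switching to the concurrent collector (a sketch; the original message does not say which flags were actually used):

```shell
# Replace the default stop-the-world collector with CMS so major
# collections run mostly concurrently with the application.
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
```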
.
Thanks,
-- Bryan Loofbourrow
> > OK, I think I see what you're up to. Might be pretty viable
> > for me as well.
> > Can you talk about anything in your mappings.txt files that
> > is an
> > important part of the solution?
>
> It is not important. I just copied it. Plus, the HTML strip char filter
> does not have a mappings parameter.
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Wednesday, June 08, 2011 11:56 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Displaying highlights in formatted HTML document
>
>
>
> --- On Thu, 6/9/11, Bryan Loofbourrow
Ludovic,
>> how do you index your html files ? I mean do you create fields for
different parts of your document (for different stop words lists,
stemming, etc) ? with DIH or solrj or something else ? <<
We are sending them over http, and using Tika to strip the HTML, at
present.
We do not split
Here is my use case:
I have a large number of HTML documents, sizes in the 0.5K-50M range, most
around, say, 10M.
I want to be able to present the user with the formatted HTML document, with
the hits tagged, so that he may iterate through them, and see them in the
context of the document, wit