: I'll try to get some graphs committed and linked to from the javadocs that
: make it clearer how tweaking the settings affects the formula
http://svn.apache.org/viewvc?rev=1294920&view=rev
-Hoss
: A picture -- or more precisely a graph -- would be worth a 1000 words.
Fair enough. I think the reason I never committed one initially was
because the formula in the javadocs was trivial to plot in gnuplot...
gnuplot> min=0
gnuplot> max=2
gnuplot> base=1.3
gnuplot> xoffset=10
gnuplot> set
Hi all,
I have a question. Is there a way to distinguish queries like 'hotel' and
'hotel restaurant', queries with overlapping patterns, effectively?
For example, if I want the search to return 'hotel' in the top 100 results
while 'hotel restaurant' results come after those of 'hotel', when I sear
> Wow, that was quick! Thanks!
The power of open source and coffee break, combined...
> I don't think we'll have too many terms per query term - as I said earlier,
> we're restricting the expansions to those with an edit distance of 1. But
> this looks cool anyway.
Shouldn't make much of a d
Wow, that was quick! Thanks!
I don't think we'll have too many terms per query term - as I said earlier,
we're restricting the expansions to those with an edit distance of 1. But this
looks cool anyway.
On 28 Feb 2012, at 16:01, Dawid Weiss wrote:
> The issue has a patch -- feel free to try
The issue has a patch -- feel free to try it out.
Dawid
On Tue, Feb 28, 2012 at 4:48 PM, Dawid Weiss wrote:
> I filed an issue for that.
> https://issues.apache.org/jira/browse/LUCENE-3832
>
> I'll try to port it myself actually. It shouldn't be a big problem.
>
> Dawid
>
> On Tue, Feb 28, 2012
Dear List,
I need, for example, to know the frequency of the phrase "phd finger
protein 6", not only the number of documents where this phrase appears.
With a SimpleAnalyzer or another analyzer, must I parse each hit, each
document, and each position for each term and compute all these data, or is
there
I filed an issue for that.
https://issues.apache.org/jira/browse/LUCENE-3832
I'll try to port it myself actually. It shouldn't be a big problem.
Dawid
On Tue, Feb 28, 2012 at 2:31 PM, Michael McCandless
wrote:
> Neat :) It's like a FuzzyQuery w/ a custom (binary?) cost matrix for
> the insert/
> For steps 2 and 3 you shouldn't use FST at all. Instead, for 2) use
> BasicAutomata.makeString(String) on each of your expanded terms, then
> BasicOperations.union on all of those automata to make a single
How many input strings do you have? The API Mike mentioned is from a
port of the Brics li
>>
>> We're only allowing expansions within an edit distance of 1, which should
>> keep the numbers of terms down.
>
> Ahh, ok. So even if the term has two occurrences of cl, only one of
> them is allowed to substitute d?
Yes, exactly - "cloocl" will be expanded to "doocl" and "clood" only. I
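
A minimal sketch of the edit-distance-1 expansion described above, in plain Python rather than Lucene code. The function name expand_once and the CONFUSIONS list are hypothetical names chosen for illustration; the only confusion pair taken from the thread is 'cl' misread as 'd'. Each variant applies exactly one substitution at exactly one position.

```python
def expand_once(term, confusions):
    """Return the variants of `term` that differ by exactly one substitution.

    `confusions` is a list of (original, misread) string pairs.
    The original term itself is not included in the result.
    """
    variants = set()
    for original, misread in confusions:
        start = term.find(original)
        while start != -1:
            # Replace only this one occurrence, leaving all others untouched.
            variants.add(term[:start] + misread + term[start + len(original):])
            start = term.find(original, start + 1)
    return sorted(variants)


# Hypothetical confusion list; only the 'cl' -> 'd' pair comes from the thread.
CONFUSIONS = [("cl", "d")]

print(expand_once("cloocl", CONFUSIONS))  # ['clood', 'doocl']
print(expand_once("clog", CONFUSIONS))    # ['dog']
```

A query would presumably search for the original term together with these variants, so the number of terms grows with the number of matching positions rather than with their combinations.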
On Tue, Feb 28, 2012 at 8:42 AM, Alan Woodward
wrote:
>
> On 28 Feb 2012, at 13:31, Michael McCandless wrote:
>
>> Neat :) It's like a FuzzyQuery w/ a custom (binary?) cost matrix for
>> the insert/delete/transposition changes...
>>
>> Is the number of edits smallish? Ie you're not concerned abo
On 28 Feb 2012, at 13:31, Michael McCandless wrote:
> Neat :) It's like a FuzzyQuery w/ a custom (binary?) cost matrix for
> the insert/delete/transposition changes...
>
> Is the number of edits smallish? Ie you're not concerned about
> combinatoric explosion of step 1?
We're only allowing ex
Neat :) It's like a FuzzyQuery w/ a custom (binary?) cost matrix for
the insert/delete/transposition changes...
Is the number of edits smallish? Ie you're not concerned about
combinatoric explosion of step 1?
For steps 2 and 3 you shouldn't use FST at all. Instead, for 2) use
BasicAutomata.mak
Hello,
I'm trying to create a Lucene Query that will take a term and expand it to
include common OCR errors (for example, 'cl' is often misread as 'd', so a
search for 'clog' should also hit 'dog'). My plan is to do this by generating
all the possible variants of a term, using an existing list
Thanks. I use this field for range queries and sorting. I think it is best to
use int to save some heap.
Regards
Ganesh
- Original Message -
From: "Uwe Schindler"
To:
Sent: Tuesday, February 28, 2012 5:08 PM
Subject: [Bulk] RE: RE: Date time as String or Numeric field
> Hi,
>
> The long
Hi,
The long or int size mostly only affects the size of e.g. FieldCache during
sorting (which doubles its size). The term dictionary's size depends on the
number of unique terms and that does not really change by the data type. The
size of the values is of minor importance because how the data is
I tried NumericField with an Integer value and a Long value. There is no
difference in space and heap utilization. Should there be? Are both the same?
Regards
Ganesh
- Original Message -
From: "Uwe Schindler"
To:
Sent: Tuesday, February 28, 2012 3:52 PM
Subject: [Bulk] RE: Date time as String
Then I don't know. Something trivial like white space? What does
line.equals("Jesus Christ") say?
--
Ian.
On Mon, Feb 27, 2012 at 7:42 PM, Damerian wrote:
> On 27/2/2012 11:45 AM, Ian Lea wrote:
>>
>> Does your analyzer look for a field called content, not contents?
>>
>>
>> --
>> Ian
Hi,
NumericField takes more space on disk (and possibly more heap, because the term
dictionary is larger), but is much faster on RANGE searches
(NumericRangeQuery). Depending on index size this can be hundreds of times
faster.
If you don't want to do numeric searches (like range from...to) but only
so
Hello all,
I was using DateTime as String and now I am using NumericField. Using
NumericField takes more heap and storage space than the earlier String version.
Is it good to move to NumericField or stick with String? I am using this field
for search and sort.
Regards
Ganesh
If I understand 'group search' correctly, you mean grouping search results
by some criteria?
The main difference between grouping search results and faceted search is
that when you group search results by some criteria, your request is
something like "give me the top 3 results from each movie categ