Wow Karl, thank you so much for writing this up! It was a great help!
I have the ngram tokenizing working as you described. Searches are
very good!
In order to verify the hits are of high quality, I use the
Smith-Waterman algorithm. Other approximate string comparisons I
evaluated didn't work well
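
For readers who have not met it, here is a minimal sketch of Smith-Waterman local alignment scoring as it might be used to re-check a hit. The class name, the scoring constants (match +2, mismatch -1, gap -1) and the normalization are illustrative assumptions, not the values or code used in the message above.

```java
// Hypothetical sketch: Smith-Waterman local alignment score between two strings.
// The scoring constants are assumptions chosen for illustration only.
final class SmithWatermanSketch {
    static final int MATCH = 2;
    static final int MISMATCH = -1;
    static final int GAP = -1;

    // Returns the best local alignment score (0 if nothing aligns).
    static int score(String a, String b) {
        int[][] h = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH);
                int up = h[i - 1][j] + GAP;    // gap in b
                int left = h[i][j - 1] + GAP;  // gap in a
                // Local alignment: a cell never drops below zero.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    // Score scaled to [0, 1] relative to a perfect match of the shorter string.
    static double normalized(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        return shorter == 0 ? 0.0 : (double) score(a, b) / (MATCH * shorter);
    }
}
```

A hit could then be kept only when `SmithWatermanSketch.normalized(query, hitText)` is above some threshold; a suitable threshold would have to be tuned on real data.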
This usually means that your index was created using a newer version
of Lucene than is bundled with Luke. You will need to get the Luke
minimal jars (no Lucene) and use that along with the Lucene versions
you have.
On May 8, 2009, at 12:42 PM, Timon Roth wrote:
hello list
i am using luc
Which version of luke are you using?
Timon Roth wrote:
hello list
i am using lucene 2.9. when i try to open the index with luke i got an error:
unknown format version: -8
any hints?
-
To unsubscribe, e-mail: java-user-unsubs
hello list
i am using lucene 2.9. when i try to open the index with luke i got an error:
unknown format version: -8
any hints?
Thank you for the reply, I have got it.
Kamal.
Original Message:
What does the searcher.explain() method say?
-Grant
On May 6, 2009, at 2:18 AM, Kamal Najib wrote:
> hi,
> thanks for the reply.see:
http://lucene.apache.org/java/2_4_1/api/index.html
> you will find there the Similarity have cr
You are correct; other vulnerabilities will of course be the swap file,
which is much easier to dump than the memory contents, since it may
persist even when the process dies or the machine is turned off, and of
course a process dump or snapshot file.
In either case, those cracks would be on a sys
There will always be levels at which data is insecurely available, most
notably within the memory of an application once it's running, unless you
want to go down the path of encrypting and decrypting each and every string.
At which point you lose dictionary functionality and, well, any useful
e
I might be missing something here, but why not just store the index on
a cryptographic virtual file system?
karl
On 8 May 2009 at 19:09, > wrote:
Michael,
Thanks for the comments; they are very insightful.
I hadn't thought about the Random Access issues until you brought it
up.
T
Michael,
Thanks for the comments; they are very insightful.
I hadn't thought about the Random Access issues until you brought it up.
This makes the project a little tougher, but not impossible.
I was searching last night and there have been a couple of papers
written on the topic of Encrypt
On 8 May 2009 at 13:13, Nate wrote:
Is it possible to get a count for how many terms a result matched?
Currently I think you can only do that by using Searcher.explain().
But that is not a very nice solution. A better solution is being
worked on and might be available in a few months or so.
Ngrams can be used for lots of stuff. In your case it has nothing to do
with spellchecking; it was the "until" vs. "'till" that made me think
of them, as they would allow you to get at least partial matching of
the text. Also, ngrams give you a bit of phrase functionality.
Create the grams
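
To make the "until" vs. "'till" point concrete, here is a rough sketch of character ngrams in plain Java. The class and method names are hypothetical and this is not Lucene's own ngram tokenizer; it only shows why the two spellings still share a gram.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: character n-grams and the overlap between two strings.
final class CharNGramSketch {
    // All contiguous substrings of length n, in order of appearance.
    static List<String> grams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    // Number of distinct n-grams that occur in both strings.
    static int shared(String a, String b, int n) {
        Set<String> common = new HashSet<>(grams(a, n));
        common.retainAll(new HashSet<>(grams(b, n)));
        return common.size();
    }
}
```

With trigrams, "until" yields unt, nti, til and "'till" yields 'ti, til, ill, so the two share the gram "til" and a query on one can still partially match the other.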
On 5/8/2009 at 9:13 AM, Ian Lee wrote:
> I'm surprised that it matches either - don't you need ".*in" where .*
> means match any character zero or more times? See the javadoc for
> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
> package.
>
> Unless you're an expert in regex
I'm surprised that it matches either - don't you need ".*in" where .*
means match any character zero or more times? See the javadoc for
java.util.regex.Pattern, or for Jakarta Regexp if you are using that
package.
Unless you're an expert in regexps it is probably worth playing with
them outside y
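
One way to play with the patterns outside of Lucene is a small java.util.regex check like the sketch below. The class and helper names are made up for illustration, and it only shows java.util.regex semantics; how RegexQuery applies a pattern to indexed terms is not covered here and should be checked separately.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: compare ".in." with ".*in.*" using java.util.regex only.
final class RegexContainsSketch {
    // True when the whole string matches the pattern.
    static boolean wholeMatch(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    // True when the pattern matches anywhere inside the string.
    static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {"in", "pint", "raining", "cat"};
        for (String s : samples) {
            System.out.println(s
                    + "  .in.=" + wholeMatch(".in.", s)
                    + "  .*in.*=" + wholeMatch(".*in.*", s)
                    + "  find(in)=" + containsMatch("in", s));
        }
    }
}
```

As a whole-string match, ".in." accepts only four-character strings with "in" in the middle (for example "pint"), while ".*in.*" accepts any string containing "in", including "in" itself.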
I don't understand your regex at all. Isn't it looking for in with any
*single* character in front and back? Given your example, I don't
see how you're getting anything back at all. Is this code you're
actually executing or just an example?
What does toString and/or Explain show? Think about getti
Ganesh wrote:
My opinion is Stemming process is to get the base word. Here it is not
doing so.
Unfortunately this is where your problem lies: stemming doesn't do this.
It reduces words that are almost lexically equivalent to a similar
root word, thus cat = cats.
From the wiki: "*Stemm
Hi,
I am using RegexQuery for searching in a set of records which are phrases of
several words each. My aim is to find any phrase that contains the given
group of letters (e.g. "in"). For that case, I am building the query with
the regular expression ".in.", so it should return all phrases with co
Hello all,
I am using Lucene 2.4.1 and Snowball Analyzer for my indexing.
I am facing some issues with stemming.
Raining stemmed to Rain
cats stemmed to cat
but
Harder is not stemmed to hard
Stronger is not stemmed to Strong.
Even the Keyword and Standard analyzers do the same. My opinion is Stemm
Is it possible to get a count for how many terms a result matched?
Googling, it doesn't appear to be done easily. I tried it out by
breaking my query into words myself, then doing a search for each one
and keeping track of the results and counts. This way I know if 4 out
of 5 terms matched a docume
Thanks Erick
it solves
On Thu, May 7, 2009 at 8:13 PM, Erick Erickson wrote:
> You haven't forced the double quotes through to the parser. Try
> Query query = qp.parse("\"word1 word2\"");
>
> On Thu, May 7, 2009 at 11:14 AM, Seid Mohammed wrote:
>
>> I have set the slop for my search to be some