Hi,
The package org.apache.lucene.index.memory belongs to a contrib jar. Try to
add lucene-memory-*.jar to your classpath.
Regards,
Adriano Crestani
On Thu, Jul 16, 2009 at 9:23 PM, prashant ullegaddi <
prashullega...@gmail.com> wrote:
> Hi
>
> I'm unable to find this class in lucene-core-2.4.1.
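For reference, a minimal sketch of using the contrib MemoryIndex class once the
memory jar is on the classpath; the field name and text here are made up:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryParser.QueryParser;

public class MemoryIndexDemo {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        // MemoryIndex lives in contrib/memory, not in lucene-core.
        MemoryIndex index = new MemoryIndex();
        index.addField("body", "jakarta apache lucene", analyzer);
        // Returns a score > 0 if the query matches the single virtual document.
        float score = index.search(new QueryParser("body", analyzer).parse("lucene"));
        System.out.println("score: " + score);
    }
}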
Hi
I'm unable to find this class in lucene-core-2.4.1.jar. Is there another jar
file I need to download to get it?
Regards,
Prashant.
OK, I'm feeling old today. But do any of you kids out there have any
idea how miraculous this thread is? In "the bad old days", or "when
I was your age", getting to the bottom of a problem like this would
have involved on-site consultants at $150/hour and about 6 months.
Assuming that the product
The first thing I'd do is get a copy of Luke and look in my index
to see exactly what's there. Nothing in your e-mails indicates that you
*should* get any hits. Although I admit not getting jakarta lucene in
50M pages seems unlikely.
But Ian's suggestion that you start with a smaller index is spot on.
Well, if the .net port mimics the java library, look at the Analyzer class.
There
you'll see a bunch of different language analyzers. Also, look in the
contrib
section for others. The trick is that you must know what language you're
using. Indexing multiple languages in a single index is difficult.
: The same here, even with trunk from yesterday. If you create a field, it
: stays there forever, even after deleting *all* documents from index,
: reindexing without the field and optimizing.
Uwe: if you have a quick test case already written, can you try it against
2.4 (and maybe 2.3)? Because I'd like to know whether this is a regression
or long-standing behavior.
The same here, even with trunk from yesterday. If you create a field, it
stays there forever, even after deleting *all* documents from index,
reindexing without the field and optimizing.
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
: After deleting documents from the index it can happen that fields become
: unused (i.e. no document has this field anymore). And
: IndexReader.getFieldNames() still returns these unused fields, even
: after optimizing the index. Is there any chance to get rid of these
: unused fields?
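A minimal sketch of the kind of test case being asked for above, against the
Lucene 2.4 API; the field name "ghost" is made up:

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class GhostFieldCheck {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add(new Field("ghost", "boo", Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
        // Delete every document that had the field, then optimize.
        writer.deleteDocuments(new Term("ghost", "boo"));
        writer.optimize();
        writer.close();
        IndexReader reader = IndexReader.open(dir);
        // Reported behavior: "ghost" still shows up here even though no
        // document carries it any more.
        System.out.println(reader.getFieldNames(IndexReader.FieldOption.ALL));
        reader.close();
    }
}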
that's odd
> > How do you handle stop words in phrase queries?
ok, good point! You found another item for the list of BADs... but not for me,
as we do not use phrase Qs. To be honest, I do not even know how they are
implemented... but no, there are no positions in such a cache...
well, they remain slower
> caching them (as OpenBitSet)
How do you handle stop words in phrase queries?
On Thu, Jul 16, 2009 at 11:30 AM, eks dev wrote:
>
> Sure, If you have enough memory to do postings caching, with or without P4...
> I see P4 as a generally faster postings format, with stopwords or not.
>
> I wouldn'
Sure, if you have enough memory to do postings caching, with or without P4... I
see P4 as a generally faster postings format, with stopwords or not.
I wouldn't blow up the term dictionary; that just moves the problem to another place.
What I am thinking of is quite simple, probably not the most elegant
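A minimal sketch of the postings-caching idea under discussion, assuming one
OpenBitSet per cached term; note it keeps no positions, which is exactly the
phrase-query limitation raised elsewhere in this thread:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.util.OpenBitSet;

public class StopwordPostingsCache {
    // Materialize a stopword's postings once as a bit set so later queries
    // can test doc membership without re-reading the posting list.
    public static OpenBitSet cache(IndexReader reader, String field, String word)
            throws IOException {
        OpenBitSet bits = new OpenBitSet(reader.maxDoc());
        TermDocs td = reader.termDocs(new Term(field, word));
        try {
            while (td.next()) {
                bits.set(td.doc());
            }
        } finally {
            td.close();
        }
        // Positions are not kept, so phrase queries cannot run off this cache.
        return bits;
    }
}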
Another approach could be splitting the text into chars and returning each
char as a token (in a custom analyzer).
For example: for the document [some text],
tokens would be [s] [o] [m] [e] [t] [e] [x] [t], and searches such as
[ome] or [ex] would get hits.
Sample code written in C# is below:
http
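The poster's sample is in C#; a rough Java sketch of the same idea against the
Lucene 2.4 TokenStream API, one token per non-whitespace character, might look
like this:

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;

// Emits every non-whitespace character of the input as its own token,
// so "some text" yields [s][o][m][e][t][e][x][t].
public class SingleCharTokenizer extends Tokenizer {
    private int offset = 0;

    public SingleCharTokenizer(Reader input) {
        super(input);
    }

    public Token next(Token reusableToken) throws IOException {
        int c = input.read();
        while (c != -1 && Character.isWhitespace((char) c)) {
            offset++;
            c = input.read();
        }
        if (c == -1) {
            return null;
        }
        reusableToken.clear();
        reusableToken.setTermBuffer(new char[] { (char) c }, 0, 1);
        reusableToken.setStartOffset(offset);
        reusableToken.setEndOffset(++offset);
        return reusableToken;
    }
}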
Do we think that we'll be able to support indexing stop words
using PFOR (with relaxation on the compression to gain
performance)? Today it seems like the best approach to indexing
stop words is to use shingles. However, this blows up the term
dict because shingles concatenate adjacent terms together.
On
I figured "c++." would be a problem. Here's what I did to get around it:
value.toLowerCase().replaceAll("\\.( ?\t?\n?\r?)+", " ")
I'm not escaping +'s from the query so I should be good there.
thanks a lot.
Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
PCRecruiter.net -
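A possibly simpler variant of that replace, assuming the goal is only to drop a
period that ends a token, so "c++." becomes "c++" while "3.5" is untouched; an
untested sketch, not the poster's exact code:

// Drop any period immediately followed by whitespace or end of input.
String cleaned = value.toLowerCase().replaceAll("\\.(?=\\s|$)", "");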
If you escape the character + or #, the sentence
"I know java + c++" would not skip the +; furthermore, an unescaped + breaks
query parsing, where + is reserved.
-John
On Thu, Jul 16, 2009 at 9:04 AM, John Wang wrote:
> This runs into problems when you have such following sentence:
> "I dislike c++."
>
> If y
This runs into problems when you have a sentence like the following:
"I dislike c++."
If you use WSA, then the last token is "c++.", not "c++", and the query would
not find this document.
-John
On Thu, Jul 16, 2009 at 8:29 AM, Chris Salem wrote:
> That seems to be working. you don't have to escape the plus
That seems to be working. You don't have to escape the pluses though. Also,
it appears that the WhitespaceAnalyzer is case sensitive, but I guess I could
lowercase everything that gets indexed.
thanks a lot for your help.
Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
We did it for us, gave something back to the community... all happy... open source
works just fine here in lucene land :)
Re: the 10%,
I did not expect that much, but our index is quite dense, a lot of documents
and not too many unique terms, omitTf ... so it is really hard pressure on
DocIdSetIterator
Try WhitespaceAnalyzer for both indexing and searching.
At search time you may also need to escape "+", "(", ")" with "\".
"#" shouldn't need escaping.
On Thu, Jul 16, 2009 at 17:23, Chris Salem wrote:
> I'm using the StandardAnalyzer for both searching and indexing.
> Here's the code to parse the
Super, thanks for testing!
And, the 10% speedup overall is good progress...
Mike
On Thu, Jul 16, 2009 at 9:16 AM, eks dev wrote:
>
> and one final touch, 4X slow down does not exist with new Lucene...
> I did not verify it again on the old one, but hey, who cares. Trunk is clean
> and, at least
I'm using the StandardAnalyzer for both searching and indexing.
Here's the code to parse the query:
Searcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer(stopwords);
System.out.println(queryString);
QueryParser qp = new QueryParser(searchField, analyzer);
Query query = qp.parse(queryString);
No, but I recall some discussion about moving it up out of Analysis into a
more generally useful place, as it can be appropriate for autosuggest
and other things.
On Jul 14, 2009, at 7:27 PM, Jason Rutherglen wrote:
Just wondering if it works and if it's a good fit for autosuggest?
They are upgrading our mail servers here, so if you are seeing.. many
MANY duplicates of things I posted.. I'm really sorry about that. T_T
Matt
--
Matthew Hall
Software Engineer
Mouse Genome Informatics
mh...@informatics.jax.org
Assuming your dataset isn't incredibly large, I think you could... cheat
here, and optimize your data for searching.
Am I correct in assuming that BC should also match on ABCD?
If so, then yes, your current thoughts on the problems that you face are
correct, and everything you do will be turning
take a look at WordDelimiterFilter from Solr [you can use it in your
lucene app too]
On Thu, Jul 16, 2009 at 9:04 AM, JesL wrote:
>
> Hello,
> Are there any suggestions / best practices for using Lucene for searching
> non-linguistic text? What I mean by non-linguistic is that it's not English
>
Hi Jes, good to see you here. You could try something like an n-gram
analyzer. You'd have to explore, though; I'm assuming it'd be helpful for
you.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw.
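A minimal sketch of the n-gram suggestion using the contrib NGramTokenizer
(org.apache.lucene.analysis.ngram); the gram sizes 2-4 are an arbitrary choice:

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.ngram.NGramTokenizer;

public class NGramDemo {
    public static void main(String[] args) throws Exception {
        // Index-time idea: emit all 2- to 4-char grams of a product code,
        // so a substring query like "BC" can hit "ABCD".
        NGramTokenizer tokens = new NGramTokenizer(new StringReader("ABCD"), 2, 4);
        Token t;
        while ((t = tokens.next()) != null) {
            // e.g. AB, BC, CD, ABC, BCD, ABCD (order may vary by version)
            System.out.println(t.term());
        }
    }
}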
and one final touch, the 4X slowdown does not exist with new Lucene...
I did not verify it again on the old one, but hey, who cares. Trunk is clean
and, at least so far, our favourite QA team has nothing to complain about ...
They will keep it under stress for a while... so if something comes up
Hello,
Are there any suggestions / best practices for using Lucene for searching
non-linguistic text? What I mean by non-linguistic is that it's not English
or any other language, but rather product codes. This is presenting some
interesting challenges. Among them is the need for pretty lax wildcard matching.
You might like to start with a smaller index ...
There are many suggestions in the "Why am I getting no hits /
incorrect hits?" section of the Lucene FAQ at
http://wiki.apache.org/lucene-java/LuceneFAQ. Maybe if you work
through those you'll find the problem.
--
Ian.
On Thu, Jul 16, 2009 at 1:42 PM,
50 million HTML pages (part of the ClueWeb09 dataset for TREC) were indexed
using Hadoop into 56 indexes, and those 56 indexes were merged into a single
index. The analyzer is StandardAnalyzer.
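For reference, a rough sketch of merging several indexes into one with
IndexWriter.addIndexesNoOptimize (Lucene 2.4); the paths and index count here
are made up to match the description:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIndexes {
    public static void main(String[] args) throws Exception {
        Directory merged = FSDirectory.getDirectory("/data/merged-index");
        IndexWriter writer = new IndexWriter(merged, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Directory[] parts = new Directory[56];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = FSDirectory.getDirectory("/data/part-" + i);
        }
        // Copies the segments over without forcing a full optimize.
        writer.addIndexesNoOptimize(parts);
        writer.optimize(); // optional final optimize into fewer segments
        writer.close();
    }
}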
On Thu, Jul 16, 2009 at 6:07 PM, Anshum wrote:
> Hi Prashant,
>
> What did you index? how did you index? what anal
ok, new facts, less chaos :)
- LUCENE-1744 definitely fixed it; I have it confirmed
Also, we found another example of a Query that was stuck, (t1 t2 t3)~2 ...
this is also fixed by LUCENE-1744
Re: "some queries are 4X slower than before". Was that a different issue?
(Because this issu
Hi Prashant,
What did you index? How did you index it? What analyzer did you use? Without
all of these, it'd be difficult to figure out the issue.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw.
Sorry, subject should have been: Unable to do proximity search.
Also, how do I do an exact search in Lucene?
~
Prashant
On Thu, Jul 16, 2009 at 6:04 PM, prashant ullegaddi <
prashullega...@gmail.com> wrote:
> Hi,
>
> I tried searching:
> "Apache Jakarta"~10
>
> Nothing was returned. What might be w
Hi,
I tried searching:
"Apache Jakarta"~10
Nothing was returned. What might be wrong?
Regards,
Prashant.
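For reference, "Apache Jakarta"~10 parses to a sloppy PhraseQuery. A rough
programmatic equivalent, assuming a field named "contents" and lowercasing at
index time:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

PhraseQuery query = new PhraseQuery();
query.add(new Term("contents", "apache"));
query.add(new Term("contents", "jakarta"));
query.setSlop(10); // ~10: allow up to 10 position moves between the terms
// With the default slop of 0 this is an exact phrase match, i.e. "Apache Jakarta".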
On Thu, Jul 16, 2009 at 6:38 AM, eks dev wrote:
> and this String has exactly that form
> (x OR y OR z) OR (a OR b OR c),
> That is exactly how I construct the Query; have a look at the brackets on
> this toString result.
Duh! OK, I had missed that your large query actually had 2 clauses at
the top level.
I am getting lost as well; maybe I managed to confuse myself and everybody else
here.
But we all agree it would be good to know why it works now.
Re: Query rewriting.
This Query gets printed with
///
BooleanQuery q;
q.toString();
search(q, null, 200);
///
=> this is the Query that enters the search
Ok, thanks, I will try the Spring users mailing list
2009/7/16 Simon Willnauer
> I guess you will get much more help on the Spring mailing list than you
> will get from java-users.
> Your problem is related to your configuration and not to Lucene, as far
> as I can tell.
>
> simon
>
> On Thu, Ju
I guess you will get much more help on the Spring mailing list than you
will get from java-users.
Your problem is related to your configuration and not to Lucene, as far
as I can tell.
simon
On Thu, Jul 16, 2009 at 12:20 PM, Pablo Mosquera
Saenz wrote:
> Hi, I have downloaded the springmodule for lu
Hi, I have downloaded the Spring module for Lucene, version 0.9, and tried to
test the sample.
I have used the Lucene core library 2.4.1.
The first problem I found is that, with the initial configuration using
SingleSearcherFactory, at startup I get an error because there
On Thu, Jul 16, 2009 at 5:21 AM, eks dev wrote:
> Trace taken on trunk version (with Yonik's bug fixed and LUCENE-1744, which
> fixed the problem somehow)
Whoa, so LUCENE-1744 did in fact fix the problem? (I thought you had
accidentally failed to setAllowDocsOutOfOrder(true) and that made us
falsely think it was fixed.)
Trace taken on trunk version (with Yonik's bug fixed and LUCENE-1744, which
fixed the problem somehow).
The full trace is too big for this list (3.5MB), therefore only the beginning and end:
Query: +(((NAME:maria NAME:marae^0.25171682 NAME:marai^0.2365632
NAME:marao^0.2365632 NAME:marau^0.2365632 NAME:mar
Hi
Escaping should work. See
http://lucene.apache.org/java/2_4_1/queryparsersyntax.html and
QueryParser.escape(). And you need to be sure that your analyzer
isn't removing the plus signs and that you use the same analyzer for
indexing and searching.
Googling for something like "lucene escape" should turn up more examples.
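A minimal sketch of that escaping route, assuming a field named "resume" (made
up for this example) and WhitespaceAnalyzer on both the indexing and search side:

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

String raw = "java + c++";
// Backslash-escapes every character QueryParser treats as syntax (+, -, (, ), etc.).
String escaped = QueryParser.escape(raw);
QueryParser parser = new QueryParser("resume", new WhitespaceAnalyzer());
Query query = parser.parse(escaped);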