If I recall correctly the highlighter also has an analyzer passed to
it. Ensure that this is the same one as well.
Matt
m.harig wrote:
Thanks Erick,
It works fine if I use the same analyzer for both indexing & querying
(code snippet found on Nabble).
But the highlighter has gone f
hem.
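For reference, a minimal sketch of that pattern (assuming the Lucene 2.x contrib Highlighter API; "contents", fieldText and query are placeholders for your own field name, stored text and parsed query):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

// use the SAME analyzer instance (or at least the same class/config) everywhere
Analyzer analyzer = new StandardAnalyzer();
Highlighter highlighter = new Highlighter(new QueryScorer(query));
TokenStream tokens = analyzer.tokenStream("contents", new StringReader(fieldText));
String snippet = highlighter.getBestFragment(tokens, fieldText);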
Is there a way to handle this situation such that at index time I can turn
SAP.EM.FIN.AM into something that will be found with a query for "SAP EM
FIN AM"?
Thanks for any pointers
Donna
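One hedged option (a sketch, assuming a Lucene 2.x-era analyzer; the class name is made up): tokenize on anything that is not a letter, so SAP.EM.FIN.AM is indexed as sap / em / fin / am and a query for "SAP EM FIN AM" run through the same analyzer will match. Note that LowerCaseTokenizer also breaks on digits, so subclass CharTokenizer instead if digits matter to you.

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.TokenStream;

public class DotSplittingAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // breaks on every non-letter character, so '.' acts as a separator
        return new LowerCaseTokenizer(reader);
    }
}

Use the same instance both when building the index and when parsing queries.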
alyzer results could differ dramatically.
* then, if a query succeeds in matching one or more documents, open
this document and view its fields using "Reconstruct & edit",
especially the "Tokenized" version of the field. At this point any
potential mismatch in query term
emove the space from "about us" and search on that.
--
Ian.
going to end up
doing here anyhow) When you do, just don't include stop word removal in
the processing of your token stream.
Matt
Phil Whelan wrote:
Hi Matthew / Paul,
On Thu, Jul 30, 2009 at 4:32 PM, Paul Cowan wrote:
Matthew Hall wrote:
Place a delimiter between the email addr
123 c...@bar.foo
Matt
Phil Whelan wrote:
On Thu, Jul 30, 2009 at 11:22 AM, Matthew Hall
wrote:
1. Sure, just have an analyzer that splits on all non-letter characters.
2. Phrase queries keep the order intact. (And yes, the positional information
for the terms is kept, which is what allows
n the positional information of tokens in the
index? Knowing this will help me answer question 1.
Thanks,
Phil
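To illustrate the positional point with a hedged sketch (Lucene 2.x-era API; the field name and terms are just examples), a PhraseQuery only matches when the stored positions line up, which is what keeps the order intact:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

PhraseQuery pq = new PhraseQuery();
pq.add(new Term("contents", "foo"));   // some position i
pq.add(new Term("contents", "bar"));   // must appear at position i + 1
pq.setSlop(0);                         // 0 = exact adjacency; raise to allow gaps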
t; should have been false, and
because I am using an AND)?
Like I said, maybe I've been staring at this too long, and need to do some more
structured testing :)...
Sorry.
Later,
Jim
Matthew Hall wrote:
You can choose to do either,
Having items in multiple fields allows you to
'm talking about
(starting with the Lucene demo and demo webapp, and trying to be able to index and search more than
just the "contents" field), do I not need to use MultiFieldQueryParser.parse() or do
what they call "create a synthetic content"?
Thanks,
Jim
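For what it's worth, a hedged sketch of the MultiFieldQueryParser route (Lucene 2.x-era constructor; the field names are just examples, and parse() throws ParseException):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

String[] fields = { "contents", "summary", "title" };
MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer());
Query query = parser.parse(userInput);   // one query string, searched across every listed field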
Oh, also check to see which Analyzer the demo webapp/indexer is using.
It's entirely possible the analyzer that has been chosen isn't
lowercasing input, which could also cause you issues.
I'd be willing to bet your issue lies in one of these two problems I've
mentioned ^^
Matt
as "summary", "path", etc.?
Can anyone explain what else I need to do, esp. in the luceneweb web
app, to be able to search these other fields?
Thanks!
Jim
ay at the bottom of your list score-wise.
It might be worth investigating something like this, where you cut off
displaying documents that don't meet a certain score threshold, thus
cutting out the matches that you don't want (the Term3-only ones).
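A hedged sketch of that kind of cutoff (assuming an IndexSearcher named searcher; the 0.2f threshold is purely illustrative and would need tuning against your own data):

import org.apache.lucene.document.Document;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

TopDocs top = searcher.search(query, null, 100);
float threshold = 0.2f;
for (ScoreDoc sd : top.scoreDocs) {
    if (sd.score < threshold) break;     // results come back sorted by score
    Document doc = searcher.doc(sd.doc);
    // ... display doc
}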
nce issues caused by searching twice.
Is there a way to search on a subset of documents and then combine the hits
for the document? For example, if Term 1 and Term 2 are found in Document1,
and Term3 is also later found in Document1, I want to be able
This was at least one of the threads that was bouncing around... I'm
fairly sure there were others as well.
Hopefully it's worth the read to you ^^
http://www.opensubscriber.com/message/java-...@lucene.apache.org/11079539.html
Phil Whelan wrote:
On Wed, Jul 22, 2009 at 12:28 PM, Matthew
better performance than having
a ton of simultaneous searches making HDFS seek all over the place.
ng an Analyzer that already uses WhitespaceTokenizer... but you
likely are)
OBender wrote:
Hi All,
I need to make the ? and ! characters separate tokens, e.g. to split [how
are you?] into 4 tokens: [how], [are], [you] and [?]. What would be the best
way to do this?
Thanks
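One hedged, low-tech option (assuming you can pre-process the text before it reaches a whitespace-based analyzer): pad those characters with spaces so the tokenizer sees them as their own tokens. A custom Tokenizer would be cleaner, but this keeps the example short.

// "how are you?" -> "how are you ? " -> tokens: [how] [are] [you] [?]
String padded = text.replaceAll("([?!])", " $1 ");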
They are upgrading our mail servers here, so if you are seeing.. many
MANY duplicates of things I posted.. I'm really sorry about that. T_T
Matt
search, with rules to treat all special
characters as single-character, optional wildcards. I'm concerned that the
performance of this will be disappointing, though.
Any help would be much appreciated. Thanks!
- Jes
ving.
Cheers
Mark
is
gets boosted even if the same word matches in multiple fields (say, obama is
present in both title: and content:).
Searching for solutions, I have not found any results that talk about a similar
requirement... I guess I am not using the right keywords.
Thanks
Chandrakant K.
Are there any problems?
> Thanks in advance
e
quotes with the name in query using lowercase analyzer?
thanks,
Vanshi
Matthew Hall-7 wrote:
Yeah, he's gotta be.
You might be better off using something like a lowercase analyzer here,
since punctuation in a name is likely important.
Matt
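A hedged sketch of such a "lowercase analyzer" (Lucene 2.x API): keep the whole name as a single token and only lowercase it, so punctuation inside names survives.

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

public class LowerCaseKeywordAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(new KeywordTokenizer(reader));
    }
}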
Sudarsan, Sithu D. wrote:
Do
a:619)
this changes every time: one time it is "no segments* file found in
org.apache.lucene.store.ramdirect...@1c2ec05",
the second time it is "no segments* file found in
org.apache.lucene.store.ramdirect...@170b819".
On the standalone it works perfectly.
Marco Lazzara
2009/5/22 Matthew Hall
hum
to solve the problem... I've tried all the rational
things. Maybe the last thing is to try to index not with FSDirectory but with
something else. I have to peruse the API documentation.
But... what if it is a Lucene bug???
2009/5/22 Matthew Hall
because that's the default index
;RDFIndexLucene/" but the folder doesn't exist, Lucene
creates an empty folder named "RDFIndexLucene" in my home folder... WHY???
MARCO LAZZARA
2009/5/22 Matthew Hall
For writing indexes?
Well I guess it depends on what you want.. but I personally use this:
(2.3.2 API)
File INDEX_DIR = new File("/data/searchtool/thisismyindexdirectory");
Analyzer analyzer = new WhateverConcreteAnalyzerYouWant();
IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);  // true = create a new index
Your best bet w
ble to use 4GB though!
If there is any setting that will let us use 4GB do let me know.
Thanks,
Sithu D Sudarsan
-Original Message-
From: Matthew Hall [mailto:mh...@informatics.jax.org]
Sent: Friday, May 22, 2009 8:59 AM
To: java-user@lucene.apache.org
Subject: Re: Parsing large xml files
2G... should not be a maximum for any JVM that I know of.
Assuming you are running a 32-bit JVM you are actually able to address a
bit under 4G of memory; I've always used around 3.6G when trying to max
out a 32-bit JVM. Technically speaking it should be able to address 4G
under a 32 bit or,
It's been a few days and we haven't heard back about this issue, so can we
assume that you fixed it by using fully qualified paths?
Matt
Ian Lea wrote:
Marco
You haven't answered Matt's question about where you are running it
from. Tomcat's default directory may well not be the same as y
Right, so again, you are opening your index by reference there. Your
application has to assume that the index it's looking for exists in
the same directory as the application itself. Since you are
deploying this application as a deployable war file, that's not going to
work really well
Since everyone else seems to be trying to start these up, I figured I
would poll the community and see if there is any interest in the greater
New England area for a Lucene users group. Searching over on Google
leads me to believe that such a group doesn't currently exist, and I
think it would
Sorry, anyhow looking over this quickly here's a summarization of what I
see:
You have documents in your index that look like the following:
name which is indexed and stored.
synonyms which are indexed and stored
path, which is stored but not indexed
propin, which is stored and indexed
propinnu
Things that could help us immensely here.
Can you post your indexReader/Searcher initialization code from your
standalone app, as well as your webapp.
Could you further post your Analyzer Setup/Query Building code from both
apps.
Could you further post the document creation code used at ind
t this effect
you will need to either change the snowball algorithm, or process your
words into a more base form before they go into the stemmed, which is a
hairy road indeed ^^
Hope this helps.
Matt
Same here, sadly there isn't much call for Lucene user groups in Maine.
It would be nice though ^^
Matt
Amin Mohammed-Coleman wrote:
I would love to come but I'm afraid I'm stuck in rainy old England :(
Amin
On 18 Apr 2009, at 01:08, Bradford Stephens
wrote:
OK, we've got 3 people... t
Erm, I likely should have mentioned that this technique requires the use
of a MultiFieldQueryParser.
Matt
Matthew Hall wrote:
If you can build an analyzer that tokenizes the second field so that
it filters out the words you don't want, you can then take advantage
of more intelligent qu
n I make an analyzer that ignores the numbers in
the text, the way stop words are ignored??? For example, so that terms like
3.8, 100, 4.15, 4,33 are not added to the index.
How can I do that???
Regards
Ariel
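One hedged way to do it (a sketch against the older TokenStream.next() API of Lucene 2.x; the regex is illustrative): wrap your analyzer's stream in a filter that skips purely numeric tokens, much like StopFilter skips stop words.

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class NumberDroppingFilter extends TokenFilter {
    public NumberDroppingFilter(TokenStream input) { super(input); }

    public Token next() throws IOException {
        for (Token t = input.next(); t != null; t = input.next()) {
            if (!t.termText().matches("\\d+([.,]\\d+)?")) {
                return t;        // keep anything that isn't a bare number
            }
        }
        return null;             // end of stream
    }
}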
... erm.. I'm still not quite sure what you are talking about.
But what you are trying to do really isn't that hard. Here's some
sample code that should get you to where you want to be:
During document creation time do something like this:
doc.add(new Field("data", dataValue, Field.Store.YES, Field.Index.TOKENIZED));
// dataValue = the string you are indexing; the store/index flags are illustrative
Which analyzer are you using here? Depending on your choice the comma
separated values might be being kept together in your index, rather than
tokenized as you expected.
Secondly, you should get Luke, and take a look into your index, this
should give you a much better idea of what's going on
Perhaps this is a simple question, but looking at your stack trace, I'm
not seeing where it was set during the tomcat initialization, so here goes:
Are you setting up the jvm's heap size during your Tomcat initialization
somewhere?
If not, that very well could be part of your issue, as the st
Do you NEED to be using 7 fields here?
Like Erick said, if you could give us an example of the types of data
you are trying to search against, it would be quite helpful.
It's possible that you might be able to, say, collapse your 7 fields down
to a single field, which would likely reduce the ove
If you are constrained in such a way as to not use the French Analyzer
you might instead consider transforming the input as an additional step
at both search/indexing time.
Use something like a regex that looks for é and always replaces it with
e in the index, and at search time. (expand this
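A hedged alternative to the regex approach: Lucene 2.2+ ships an ISOLatin1AccentFilter that folds accented characters inside an analyzer, which keeps index-time and search-time behaviour in sync automatically. A sketch (the class name below is made up):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class AccentFoldingAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // é -> e, à -> a, etc., applied identically at index and query time
        return new ISOLatin1AccentFilter(new LowerCaseFilter(new StandardTokenizer(reader)));
    }
}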
query parser to make a phrase query.
Why not?
Are you absolutely, 100% sure that the -2 token has actually made it
into your index?
As a VERY basic way to check this try something like this:
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;
public class IndexTerms {
    public static void main(String[] args) throws Exception {
        TermEnum terms = IndexReader.open(args[0]).terms();       // args[0] = path to your index
        while (terms.next()) System.out.println(terms.term());    // dump every indexed term
    }
}
Which Analyzer have you assigned per field?
The PerFieldAnalyzerWrapper uses a default analyzer (the one you passed
during its construction), and then you assign specific analyzers to each
field that you want to have special treatment.
For example:
PerFieldAnalyzerWrapper aWrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
aWrapper.addAnalyzer("name", new KeywordAnalyzer());  // field name and analyzer choice are illustrative
Another thing you could consider is that rather than meshing all this
data into a single index, logically break out the data you need for
searching into one index, and the data you need for display into another
index.
This is the technique we use here and it's been wildly successful for us,
as
We have a similar requirement here at our work.
In order to get around it we create two indexes, one in which
punctuation is relevant, and one in which all punctuation is treated as
a place to break tokens.
We then do a search against both indexes and merge the results; it seems
that such a
get Luke, it will help you
tremendously ^^
Matthew Hall wrote:
The reason the wildcard is being dropped is because you have wrapped
it in a phrase query. Wildcards are not supported in Phrase Queries.
At least not in any Analyzers that I'm aware of.
A really good tool to see the transformat
The reason the wildcard is being dropped is because you have wrapped it
in a phrase query. Wildcards are not supported in Phrase Queries. At
least not in any Analyzers that I'm aware of.
A really good tool to see the transformations that happen to a query is
Luke, open it up against your index
to be in the query string,
you cannot make "ll" appear as a token in documentTokenStream.
Actually the Highlighter logic is a fair bit more involved than this
(especially when using SpanQueryScorer) but the basis of it is there in the
above pseudo code.
- Original Message
Well, you could certainly manipulate your search string, removing the
wildcard punctuations, and then use that for what you pass to the
highlighter.
That should give you the functionality you are looking for.
-Matt
mark harwood wrote:
Is this possible?
Not currently, the highlighter
JavaCC camp.
Steve
series of changes, etc.
Any advice/input/theories anyone can contribute would be greatly
appreciated.
Thanks,
-
John
Erm.. if it's not tokenized, that's your problem.
You are setting up an Analyzer when indexing.. but then not actually
USING it.
Whereas when you are searching you are running your query through the
analyzer, which transforms your text in such a way that it no longer
matches against your untok
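As a hedged illustration of the distinction (2.x-era field flags; the field names are examples): the analyzer only runs over fields indexed as TOKENIZED, while an UN_TOKENIZED field goes into the index as one unanalyzed term.

doc.add(new Field("title", titleText, Field.Store.YES, Field.Index.TOKENIZED));    // analyzed
doc.add(new Field("id",    idText,    Field.Store.YES, Field.Index.UN_TOKENIZED)); // stored as-is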
_1a5d.cfs _1a7n.cfs _1ahf.cfs _1ahh.cfs
_qzl.cfs segments.gen
_1993.cfs _1a0w.cfs _1a7c.cfs _1a9m.cfs _1ahg.cfs _1ahi.cfs
segments_158j
Aside from Luke (which requires a GUI), is there a command line
utility that can check the integrity of the index?
Jamie
Matthew Hall wrote
Did you try to open the index using Luke?
Luke will be able to tell you whether or not the index is in fact
corrupted, but looking at your stack trace, it almost looks like the
file.. simply isn't there?
Matt
Jamie wrote:
Hi Everyone
I am getting the following error when executing Hi
ow can that be proved?
Is there any official Apache Lucene document that says that?
I hope somebody can help me.
Thanks.
Ariel
ormation in its explain method,
but the api call is currently eluding me.
Thanks,
Matt
ibed above. We have had to get a little creative
with other documents and fields in order for it to work correctly. I'd be happy
to elaborate if anybody is interested. There may be better ways to do it.
Like I said, I'm fairly new to Lucene. I was just trying to keep it simple.
--
Bill
-Original Message-
From: Matthew Hall [mailto:[EMAIL PROTECTED]
Sent: Friday, June 27, 2008 3:33 PM
To: java-user@lucene.apache.org
Subject: Re: Can you cre
Then my query for all Properties would be:
+data:South
My query for only 'City' Properties would be:
+data:South +data_type:City
Is that right?
I think that would work. Very nice. Thank you very much
ve so ^^
Anyhow, best of luck!
Matt
renou oki wrote:
Thanks for the reply.
I will try to add an other data field.
I thought about this solution but I was not very sure. I thought there was an
easier solution than that...
best regards
Renou
2008/6/26 Matthew Hall <[EMAIL PROTECTED]>:
You
y to do this, I mean to search for an exact word in a stemmed
index?
I suppose that I have to use the same analyzer for indexing and searching.
I tried with a PhraseQuery, with quotes...
PS: I use Lucene 1.9.1
Thanks
Renald
there is need for fetching the hits.
Best.
-C.B.
On Thu, Jun 12, 2008 at 8:47 PM, Matthew Hall <[EMAIL PROTECTED]>
wrote:
I assume you want all of your queries to function in this way?
If so, you could just translate the * character into a ? at search time,
which should give you the f
I assume you want all of your queries to function in this way?
If so, you could just translate the * character into a ? at search time,
which should give you the functionality you are asking for.
Unless I'm missing something.
Matt
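A hedged one-liner version of that idea (assuming the rewrite happens before the string reaches your QueryParser instance):

String rewritten = userQuery.replace('*', '?');   // each '?' matches exactly one character
Query query = queryParser.parse(rewritten);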
Cam Bazz wrote:
Hello,
Imagine I have the following documen
literal * character
here which I would have assumed would be a completely fine thing to do
in a search, but somehow it's triggering the leading wildcard checking
logic.
Well, anyhow thanks much for the suggestion, things are working properly
now.
Matt
Karl Wettin wrote:
15 May 2008, at 18:33, Matthew Hall wrote:
12:23:05,602 INFO [STDOUT]
org.apache.lucene.queryParser.ParseException: Cannot parse '\*ache*':
'*' not allowed as first character in PrefixQuery
12:23:05,602 INFO [STDOUT] Failure in QS_MarkerSearch.searchMarkerNomen
1
Greetings,
I'm searching against a data set using lucene that contains searches
such as the following:
*ache*
*aChe*
etc and so forth, sadly this part of the dataset is imported via an
external client, so we have no real way of controlling how they format it.
Now, to make matters a bit mor
Does anyone know how to set the MaxClauseCount in Luke?
I'm in a situation where I've had to override it when searching against
my indexes, but now I can't use Luke to examine what's going on with my
queries anymore.
Any help would be appreciated.
Matt
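For reference, the override mentioned above is the static BooleanQuery setting; the value below is just an example:

import org.apache.lucene.search.BooleanQuery;

BooleanQuery.setMaxClauseCount(4096);   // default is 1024; set this before running the query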
This is more of a trying-to-understand-the-design sort of question, but
it's still something I need to be able to succinctly express to my project
manager.
I know that Lucene by design does not allow us to see which fields were
hit for a given document in an easy manner. Instead it presents us w
would be much easier to assemble
document, one line instead of three. Chaining rules.
I suspect you are using a different analyzer to highlight than you are
using to search.
A couple of things you can check:
Immediately after your query, simply print out hits.length; this should
conclusively tell you that your query is in fact working. After that,
ensure that you are using the sa
Fellows,
I'm working on a project here where we are trying to use our lucene
indexes to return concrete objects. One of the things we want to be
able to match by is by vocabulary terms annotated to that object, as
well as all of the child vocabulary terms of that annotated term.
So, what I
What you need is to set the allow leading wildcard flag.
qp.setAllowLeadingWildcard(true);
(where qp is a query parser instance)
That will let you do it, be warned however there is most definitely a
significant performance degradation associated with doing this.
Matt
[EMAIL PROTECTED] wrote
Also, ensure that you didn't inadvertently add an older version of your
Jar file somewhere in your classpath. Eclipse will take the first one it
comes to, and skip any others found later on in the path.
Right-click on your Project -> Properties -> Java Build Path and ensure
you don't have an olde