1. A per-document payload that can be updated without having to update the
entire document.
Use case: the state of our indexed content changes based on user actions
(Created/Viewed/Deleted, etc.). We are using Lucene as our database and
cannot use a relational database only for this
Hi,
I want to change Lucene's similarity so that I can add fuzzy
memberships to the terms of a document. Thus, the TF value of a term in a
document is not always 1; a term can add, say, 0.7 to the value of the TF. (In my
application, each term is contained in a document at most once.) This
memb
It might be useful to check out Katta, from an infrastructure perspective.
On 2/24/10 3:54 PM, Andrew Bruno wrote:
Hello,
I am working with an application that offers its customers their own index,
primarily two indexes for different needs per customer.
As our business is growing and growing,
Hello,
I am working with an application that offers its customers their own index,
primarily two indexes for different needs per customer.
As our business is growing and growing, I now have a situation where the web
application has its customers' indexes on one volume, and it's getting close to
1 TB
When I use a WildcardQuery with the highlighter, I don't get any fragments
back; null is returned to strBetText. If I just use a term query, then it
works.
TokenStream tokenStream = TokenSources.getTokenStream(indexReader, docId,
strFieldName);
QueryScorer scorer = new QueryScorer(query, s
The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) meetup
is tonight! We're going to have a guest speaker from MongoDB :)
As always, it's at the University of Washington, Allen Computer
Science building, Room 303 at 6:45pm. You can find a map here:
http://www.washington.edu/home/maps/s
PhraseQuery appears to be working. Thanks to all.
Paul
-Original Message-
From: java-user-return-45155-paul.b.murdoch=saic@lucene.apache.org
[mailto:java-user-return-45155-paul.b.murdoch=saic@lucene.apache.org] On
Behalf Of Murdoch, Paul
Sent: Wednesday, February 24, 2010 5:0
Thanks,
I've been looking at that one too. I'm trying to make it happen with the
StandardAnalyzer. Unfortunately, I think I see some redesign for more
robustness in the future.
Cheers,
Paul
-Original Message-
From: java-user-return-45154-paul.b.murdoch=saic@lucene.apache.org
check out KeywordAnalyzer!
On Wed, Feb 24, 2010 at 4:51 PM, Murdoch, Paul wrote:
> It still happens if there are no stop words in the fieldValue. For
> instance if fieldValue was "paul murdoch", Luke would show the query as
> name:"paul murdoch" but no hits are returned. If I change to
> Field.I
It still fails even when there are no stop words. I'm going to try a
PhraseQuery instead of relying on the QueryParser.
Regards,
Paul
-Original Message-
From: java-user-return-45151-paul.b.murdoch=saic@lucene.apache.org
[mailto:java-user-return-45151-paul.b.murdoch=saic@lucene
It still happens if there are no stop words in the fieldValue. For
instance if fieldValue was "paul murdoch", Luke would show the query as
name:"paul murdoch" but no hits are returned. If I change to
Field.Index.ANALYZED it works. The problem with ANALYZED is that there
is a possibility of pickin
Since it is not analyzed, your text is stored as a single term in the index:
[something in the index].
But the query
name:"something in the index"
is translated as:
find four consecutive terms with the values "something", "in", "the" and
"index" respectively.
or if stop words are removed
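
The mismatch described above can be sketched in plain Java without Lucene. This is only an illustration under loud assumptions: the class and method names are hypothetical, the "analyzer" is a toy that lowercases and splits on whitespace, and the phrase check ignores term positions, which a real PhraseQuery does not.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy model, NOT Lucene code: an "index" is just a set of terms.
final class NotAnalyzedSketch {

    // Models Field.Index.NOT_ANALYZED: the whole value becomes one term.
    static Set<String> indexNotAnalyzed(String value) {
        return new HashSet<>(Collections.singletonList(value));
    }

    // Toy analyzer (an assumption): lowercase, then split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Models Field.Index.ANALYZED: each token becomes its own term.
    static Set<String> indexAnalyzed(String value) {
        return new HashSet<>(analyze(value));
    }

    // Simplified phrase match: every query token must exist as a term.
    // Positions are ignored here, unlike a real phrase query.
    static boolean phraseMatches(Set<String> terms, String queryText) {
        return terms.containsAll(analyze(queryText));
    }

    // Exact single-term lookup, the only thing a NOT_ANALYZED field supports.
    static boolean termMatches(Set<String> terms, String exactValue) {
        return terms.contains(exactValue);
    }
}
```

In this model the un-analyzed index holds the single term "something in the index", so the tokenized phrase finds nothing, while an exact single-term lookup succeeds.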
On Wed, Feb 24, 2010 at 11:20 PM, Aaron Lav wrote:
> On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote:
> > On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll wrote:
> >
> > > What would it be?
> > >
> >
> > For scoring to take into account the non-analyzed token stream.
> >
> > Tha
What does Luke's explain show you? That'll show you a lot about how
the query gets transformed.
My first guess is that stop words are messing you up.
Erick
On Wed, Feb 24, 2010 at 3:51 PM, Murdoch, Paul wrote:
> Hi,
>
>
>
> I'm indexing a field using the StandardAnalyzer 2.9.
>
>
>
> fi
I would wish for a highlighting feature that's fully integrated.
paul
On 24-févr.-10, at 14:42, Grant Ingersoll wrote:
What would it be?
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands,
On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote:
> On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll wrote:
>
> > What would it be?
> >
>
> For scoring to take into account the non-analyzed token stream.
>
> That is, if a field is analyzed (stemmed, lowercased, maybe even stop wor
Hi,
I'm indexing a field using the StandardAnalyzer 2.9.
field = new Field(fieldName, fieldValue, Field.Store.YES,
Field.Index.NOT_ANALYZED);
Let's say fieldName is "name" and fieldValue is "something in the
index". When I perform the query...
name:"something in the index"
...
On 2010/02/24 03:42 PM, Grant Ingersoll wrote:
What would it be?
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Stop words counting when i
> What would it be?
An extended query parser syntax
(http://lucene.apache.org/java/2_9_1/queryparsersyntax.html) including
geo-location search.
For example:
hsin (great circle): name:Minneapolis
AND _val_:"recip(hsin(0.78, -1.6, lat_rad, lon_rad, 3963.205), 1, 1
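
For readers unfamiliar with the great-circle (haversine) distance that hsin refers to, here is a minimal plain-Java sketch of the formula. It is not the query parser's implementation: the class and method names are hypothetical, the argument order simply mirrors the example above, and I am assuming that 3963.205 is the Earth's radius in miles and that all coordinates are in radians.

```java
// Hypothetical illustration of the haversine formula, not parser code.
final class GreatCircleSketch {

    // Assumed meaning of the 3963.205 argument: Earth's radius in miles.
    static final double EARTH_RADIUS_MILES = 3963.205;

    // All latitudes and longitudes are in radians; the result is in the
    // same unit as the radius.
    static double haversine(double lat1, double lon1,
                            double lat2, double lon2, double radius) {
        double sinHalfLat = Math.sin((lat2 - lat1) / 2);
        double sinHalfLon = Math.sin((lon2 - lon1) / 2);
        double a = sinHalfLat * sinHalfLat
                + Math.cos(lat1) * Math.cos(lat2) * sinHalfLon * sinHalfLon;
        // Clamp to 1.0 to guard against rounding slightly above 1.
        return 2 * radius * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }
}
```

A call such as GreatCircleSketch.haversine(0.78, -1.6, latRad, lonRad, GreatCircleSketch.EARTH_RADIUS_MILES) would give the distance in miles from the fixed point to a document's stored coordinates, where latRad and lonRad stand in for the lat_rad and lon_rad field values.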
On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll wrote:
> What would it be?
>
For scoring to take into account the non-analyzed token stream.
That is, if a field is analyzed (stemmed, lowercased, maybe even stop words
removed), that is fine for indexing. But tokens in the query matching the
orig
It sounds to me like you'll have to pre-process your text, then use something
like KeywordAnalyzer. The idea here is to do something like lowercase the
strings (both index and query), and remove all non-letter (or whatever)
characters, normalize whitespace (e.g. remove leading and trailing, turn
al
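
The pre-processing idea above could look roughly like the following plain-Java sketch, applied to both the indexed value and the query string before they reach something like KeywordAnalyzer. The class and method names are hypothetical, and two details are my assumptions rather than anything stated in the message: "non-letter" is taken to mean anything that is neither a Unicode letter nor whitespace (so digits are dropped too), and runs of whitespace are collapsed to a single space.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
final class TextNormalizer {

    static String normalize(String text) {
        // Lowercase with a fixed locale so results do not depend on the JVM default.
        String lower = text.toLowerCase(Locale.ROOT);
        // Drop every character that is neither a letter nor whitespace
        // (assumption: punctuation and digits are both removed).
        String lettersOnly = lower.replaceAll("[^\\p{L}\\s]", "");
        // Remove leading/trailing whitespace, then collapse inner runs
        // to a single space (assumption).
        return lettersOnly.trim().replaceAll("\\s+", " ");
    }
}
```

Because the same function runs at index time and at query time, a value such as "groupC, night" and a query typed with different spacing or case end up as the same single term.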
A pluggable scoring model that can incorporate BM25, TF/IDF and other variants
of scoring.
-Original Message-
From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
Sent: Wednesday, February 24, 2010 3:42 PM
To: java-user@lucene.apache.org
Subject: If you could have
On Wed, Feb 24, 2010 at 08:42:02AM -0500, Grant Ingersoll said:
> What would it be?
Adding, deleting and updating of individual fields in a document.
>
> I tried the WhitespaceAnalyzer and liked the way the comma (among other
> punctuation) was preserved. I'm running tests with that right now.
> Unfortunately, if I want to look for "groupC" I have to append the comma
> which won't make sense to a user. Also the query choice:"groupC, night"
> d
I manually change all indexed and searched content to lowercase. The
whole groupC thing was just for the example...sorry. My main problem is
with the comma and whitespace. I would like to query for "night" and
only get the one hit. The only reason changing StandardAnalyzer "may"
:-) not be an o
OK, I'm confused. In your original message, you said that
changing analyzers is NOT an option. Then you said you'll
give WhitespaceAnalyzer a shot.
Assuming your original constraint is accurate,
why isn't changing analyzers an option? Are you aware of
PerFieldAnalyzerWrapper which allows you to
Thanks for the input. I'll give the WhitespaceAnalyzer a shot. Also,
AFAIK, Field.Index.NOT_ANALYZED means that the content you index is not
split into separate tokens so it is searchable, but only for exact
matches. I may be able to get what I want with the WhitespaceAnalyzer
and Field.Index.NO
Two features: Search and a serializable Query class, in Java
serializable object format, XML, or JSON.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search
Personally, punctuation matters in my queries, so I use WhitespaceAnalyzer. I
also only want exact hits, so that analyzer works well for me.
Also, AFAIK you don't set NOT_ANALYZED if you want to search through it.
On Wed, Feb 24, 2010 at 10:33 AM, Murdoch, Paul wrote:
> I'm using Lucene 2.9. How
I'm using Lucene 2.9. How do I make a comma behave like a regular
character using the StandardAnalyzer? Example:
I have a field called "choice" and some field values:
groupA, morning
groupB, noon
groupC, night
morning
noon
night
So a query choice:night returns "groupC, night" an
What would it be?
Hi,
I'm using Lucene 3.0.0 and have large documents to search (log files of
0.5-20 MB). For better search results, the query tokens are wildcarded on the
left and right: a search for "user" becomes "*user*". The performance of
searching, even for complex queries with more than one search term, is quite good. But
h
halbtuerderschwarze wrote:
query.rewrite() didn't help; for queries like ipod* or *ipod I still didn't
get fragments.
Arne
You're right. This is still an open issue:
https://issues.apache.org/jira/browse/LUCENE-1889
Koji
--
http://www.rondhuit.com/en/
--
query.rewrite() didn't help; for queries like ipod* or *ipod I still didn't
get fragments.
Arne
chrislusf wrote:
>
> This should be a common wildcard query highlighting problem.
> You will need to query.rewrite() first, and pass the result to the
> highlighter.
>
> --
> Chris Lu
> -