Hi Chris,
Answering your question depends in part on whether your kind of scalability
depends on sharding (your index is expected to grow very large)
or just replication (your query load is large, and you need failover). It
sounds like you're mostly thinking about the latter.
1) Each
Have you investigated using Terracotta / Compass? We need real-time updates
across the index using multiple web servers. I recently got this up and
running and we're going to be doing some performance testing. It's very
easy: essentially you just replace your FSDirectoryProvider with a
Terracott
There are many uses for shingles.
I've used them to find common phrases in text, which is my
understanding of what you try to achieve. It works rather well, is a
very simple solution and easy on resources compared to real semantic
analysis.
You'll be getting a lot of shingles such as "the
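
A minimal sketch of the phrase-discovery idea described above: count every run of consecutive words (a word shingle) and keep the ones that repeat. This is plain Java, not the contrib/analyzers shingle package; the names ShingleCounter and countShingles are hypothetical, and the second sample sentence is made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ShingleCounter {

    /** Counts every run of `size` consecutive words (a word shingle) in the text. */
    public static Map<String, Integer> countShingles(String text, int size) {
        // Lowercase, then split on anything that is not a letter or a digit.
        String[] raw = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        List<String> words = new ArrayList<>();
        for (String w : raw) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }

        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + size <= words.size(); i++) {
            String shingle = String.join(" ", words.subList(i, i + size));
            counts.merge(shingle, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample text; only the repeated two-word shingles are printed.
        String text = "The NBA announced social media guidelines. "
                + "Teams must follow the social media guidelines.";
        countShingles(text, 2).forEach((shingle, n) -> {
            if (n > 1) {
                System.out.println(shingle + " = " + n);
            }
        });
    }
}
```

Shingles made mostly of stop words will dominate the counts on real text, so a frequency threshold or a stop-word filter is usually needed before treating a shingle as a phrase.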
Hi Erick,
If you want to query, you need to know the "phrase" already, right? But I want to
discover the phrases: find which words occur together so often and so naturally
that we can treat them as a phrase.
On Tue, Oct 6, 2009 at 8:12 PM, Erick Erickson wrote:
> Maybe I'm missing the problem entirely, but can you
Hi Karl,
I think shingles are designed to make phrase search faster. They'll generate
a lot of apparent phrases by position only and completely disregard the
meaning, which is not good enough.
Regards,
Andrew
On Tue, Oct 6, 2009 at 11:51 PM, Karl Wettin wrote:
> Hi Andrew,
>
> I think you are looki
Right, Vasu, I think NLP is good, I should take some time to look at that.
Thanks.
On Tue, Oct 6, 2009 at 8:10 PM, Vasudevan Comandur wrote:
> Hi,
>
> Take the NLP route and use modules like POS tagger and NP chunker.
>
> OpenNLP has a stack for English language. Try to use them.
>
> Regards
Chris,
It sounds like you're on the right track. Have you looked at
Solr which uses the rsync/Java replication method you mentioned?
Replication and near realtime in Solr aren't quite there yet;
however, it wouldn't be too hard to add it.
-J
On Tue, Oct 6, 2009 at 3:57 PM, Chris Were wrote:
> Hi
Yes, I'm injecting the service now and it works fine. My head is not
completely around struts2 yet but there would seem to be considerable
advantage to the interceptor/plug-in approach, not the least of which is
you wouldn't have to write an action class each time you need to drop
search resu
Hi,
I've been using lucene for a project and it works great on the one dev.
machine. Next step is to investigate the best method of deploying lucene so
that multiple web servers can access the lucene directory of indexes.
I see four potential options:
1) Each web server indexes the content separa
Hi,
A call to IndexSearcher.doc(docId) will load the document. Internally
this call forwards to IndexReader.document(docId), which can be very
expensive because this method loads all stored document fields.
I would recommend having a look at IndexSearcher.doc(docId,
FieldSelector). This meth
Michael,
This sounds like a pretty good use case for CustomScoreQuery
(http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/function/CustomScoreQuery.html)
The org.apache.lucene.search.function package provides flexible
programmatic control over document scores. You boost up documen
Just as you can add a query that will boost better things with a
higher quality, you can add a query for a higher revenue.
Basically, the default operator "should" in boolean-clauses can be
used exactly for that: do not force this query to be matched but raise
boost if there's something tha
My initial description may have been a little abstract. Maybe I should
explain exactly what I'm trying to do. My company has various revenue
channels, one of which is per click. If a user does a search, we would
like to show results with the greatest revenue, although we don't want
people to be abl
: I figured it might be less expensive if search() (I have extended
: IndexSearcher) were to check that the underlying IndexReader is still
If you're extending IndexSearcher anyway, you can override the close()
method to update a boolean and then add your own isClosed() method.
: open - and re
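
A minimal sketch of the close()/isClosed() suggestion above. To keep it compilable without Lucene on the classpath, BaseSearcher is a hypothetical stand-in for IndexSearcher, and TrackingSearcher is an illustrative name; in a real application the subclass would extend IndexSearcher directly.

```java
import java.io.Closeable;
import java.io.IOException;

public class ClosedFlagSketch {

    /** Hypothetical stand-in for IndexSearcher so the sketch compiles without Lucene. */
    static class BaseSearcher implements Closeable {
        @Override
        public void close() throws IOException {
            // A real searcher would release its underlying reader here.
        }
    }

    /** Remembers whether close() has been called. */
    static class TrackingSearcher extends BaseSearcher {
        // volatile so that other threads see the flag change promptly.
        private volatile boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }

        public boolean isClosed() {
            return closed;
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingSearcher searcher = new TrackingSearcher();
        System.out.println("closed before: " + searcher.isClosed());
        searcher.close();
        System.out.println("closed after: " + searcher.isClosed());
    }
}
```

The flag only records calls made through this subclass; it says nothing about an underlying reader that was closed by other code.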
> I'm porting one of my Struts1 Lucene search apps to Struts2. The basics are
> working but I need to remove the Lucene search service out of the action
> classes. I'm ready to write an interceptor but can perhaps also see using a
> plug-in like is done with Tiles. As I'm a Struts2 newbie
6 okt 2009 kl. 18.54 skrev David Causse:
David, your timing couldn't be better. Just the other day I proposed
that we deprecate InstantiatedIndexWriter. The sum of my reasons for
this is that I'm a bit lazy. Your mail makes me reconsider.
https://issues.apache.org/jira/browse/LUCENE-1948
Hi,
Karl prefers to answer on the ml, so here is some information he asked for on
how we use InstantiatedIndex.
- Forwarded message from David Causse -
Date: Tue, 6 Oct 2009 15:45:57 +0200
From: David Causse
To: Karl Wettin
Subject: Re: InstatiatedIndex questions
Hi,
sorry for the delay.
Hi Andrew,
I think you are looking for the shingle package in contrib/analyzers.
karl
6 okt 2009 kl. 13.42 skrev Andrew Zhang:
Hi guys,
The requirement is very simple here, e.g. for this sentence, 'The NBA
formally announced its new *social media* guidelines Wednesday', I
want to
t
I'm porting one of my Struts1 Lucene search apps to Struts2. The
basics are working but I need to remove the Lucene search service out of
the action classes. I'm ready to write an interceptor but can perhaps
also see using a plug-in like is done with Tiles. As I'm a Struts2
newbie, any ti
In my application I am currently indexing each Object with one Field [ID], which
is stored and holds the ID of the Object, and another Field [Content], which is
tokenized and holds the Object's attribute information separated by spaces.
When I search for information related to the Object I get
Hi,
Which of the following calls actually loads the document from disk?
(1) Document document = searcher.doc(docId);
OR
(2) String value = document.get("FirstNameField");
It's probably searcher.doc but I just want to be sure. Thank you.
Maybe I'm missing the problem entirely, but can you use phrase queries, or
one of the Span* queries with a slop of 0, when searching?
Best
Erick
On Tue, Oct 6, 2009 at 7:42 AM, Andrew Zhang wrote:
> Hi guys,
>
> The requirement is very simple here, e.g. for this sentence, 'The NBA
> formally anno
Hi,
Take the NLP route and use modules like POS tagger and NP chunker.
OpenNLP has a stack for English language. Try to use them.
Regards
Vasu
On Tue, Oct 6, 2009 at 5:12 PM, Andrew Zhang wrote:
> Hi guys,
>
> The requirement is very simple here, e.g. for this sentence, 'The NBA
> form
Why do you care? That is, what is the problem you want to solve with a
reverse stemmer? Note that if you STORE the field, the *original* text is
available; storing and indexing are orthogonal. So if all you want is to get
the original text back, you can freely index with a stemming analyzer, but jus
Hi guys,
The requirement is very simple here, e.g. for this sentence, 'The NBA
formally announced its new *social media* guidelines Wednesday', I want to
treat '*social media*' as a whole phrase term. The default English analyzers
that come with Lucene all deal with single words, so if you want to get t
Hello,
I've been using Lucene in a very basic way for some time now, and I'm
only now starting to take advantage of some of its linguistic capabilities.
I am making use of the snowball analyzer for stemming, and it works
very well.
Question: is there any such thing as a "reverse stemm