On 16 Jan 2008, at 00:33, solr_user wrote:
I did try the Lucene SpellChecker. Currently the Lucene SpellChecker does
not have the ability to suggest splitting of combined words. Is there a
plan to add this capability to the Lucene SpellChecker any time soon?
Very few plans in this project,
On Jan 15, 2008 7:15 PM, Alexei Dets <[EMAIL PROTECTED]> wrote:
> Hi!
> I'm curious, is there any particular reason why Lucene offers
> IndexReader.deleteDocument(int docNum) but not
> IndexWriter.deleteDocument(int docNum)?
Document ids are transient and can change.
To figure out which ids you wa
Hi!
I'm curious, is there any particular reason why Lucene offers
IndexReader.deleteDocument(int docNum) but not
IndexWriter.deleteDocument(int docNum)?
Rather typical (I think) potential use case:
for (int i = 0; i < indexReader.maxDoc(); ++i) {
    if (!indexReader.isDeleted(i)) {
        Document do
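Why a doc number is risky to hold on to can be shown with a toy model in plain Java. This is not Lucene's implementation; ToySegment and its methods are hypothetical names. Deleting only flags a slot, and a merge compacts the surviving documents and renumbers them, so a number captured before the merge may point at a different document afterwards. Deleting by a stable key term (Lucene's IndexWriter.deleteDocuments(Term)) avoids depending on the numbers at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model, not Lucene code: documents are addressed by position. */
final class ToySegment {
    private final List<String> keys = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    /** Appends a document and returns its current (transient) doc number. */
    int add(String key) {
        keys.add(key);
        deleted.add(false);
        return keys.size() - 1;
    }

    /** Flags a document as deleted; its slot remains until a merge. */
    void delete(int docNum) {
        deleted.set(docNum, true);
    }

    /** Compacts away deleted slots, which renumbers every surviving document. */
    ToySegment merge() {
        ToySegment merged = new ToySegment();
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i)) {
                merged.add(keys.get(i));
            }
        }
        return merged;
    }

    /** Looks up a live document's current number by its stable key, or -1. */
    int docNumFor(String key) {
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i) && keys.get(i).equals(key)) {
                return i;
            }
        }
        return -1;
    }
}
```

In this model, a document added third has number 2; after the first document is deleted and the segment merged, the same document has number 1.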
I did try the Lucene SpellChecker. Currently the lucene SpellChecker does
not have the ability to suggest splitting of combined words. Is there a
plan to add this capability to the Lucene SpellChecker any time soon?
I also did not quite understand your idea of producing N-word shingles and then
Erick Erickson wrote:
doc.add(
    new Field(
        "f",
        "This is Some Mixed, case Junk($*%& With Ugly SYmbols",
        Field.Store.YES,
        Field.Index.TOKENIZED));
pr
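To see what a letter-based analyzer might make of that mixed-case value, here is a plain-Java approximation of what Lucene's LowerCaseTokenizer is documented to do (split at non-letters, lowercase each token). It is a hypothetical stand-in for illustration, not Lucene's class, and StandardAnalyzer would treat some inputs differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical approximation of a split-at-non-letters, lowercase tokenizer. */
final class LetterLowerCaseSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (Character.isLetter(ch)) {
                // Letters extend the current token, lowercased.
                current.append(Character.toLowerCase(ch));
            } else if (current.length() > 0) {
                // Any non-letter (space, comma, symbol) ends the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this approximation, "Junk($*%&" becomes "junk" and "SYmbols" becomes "symbols"; the symbols themselves are dropped.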
When I run through and delete a few documents from my index, is it
wise to call .flush() afterwards? Or is it better to close the index?
Thanks!
Michael
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-m
Have you tried the Lucene spellchecker first? I think it could be adapted to
do what you want, especially with the help of LUCENE-400 to produce N-word shingles (which you
can then index with the Spellchecker). I'm quite sure this could be done, in
fact, and it would be a nice addition to Spellchecker in general.
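One reading of the shingle idea, sketched in plain Java: generate runs of consecutive words and keep a mapping from the run-together form ("applecomputer") back to the spaced form ("apple computer"), so a dictionary built from those entries could suggest the split. The class and method names are hypothetical; this is not the LUCENE-400 filter's API, and how the entries are fed to the Spellchecker is left open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical word-shingle helper, not the LUCENE-400 API. */
final class ShingleSketch {
    /** Returns every run of 2..maxSize consecutive words, joined with the separator. */
    static List<String> shingles(List<String> words, int maxSize, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            for (int size = 2; size <= maxSize && start + size <= words.size(); size++) {
                out.add(String.join(separator, words.subList(start, start + size)));
            }
        }
        return out;
    }

    /** Maps each run-together shingle to its space-separated form. */
    static Map<String, String> joinedToSpaced(List<String> words, int maxSize) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String spaced : shingles(words, maxSize, " ")) {
            map.put(spaced.replace(" ", ""), spaced);
        }
        return map;
    }
}
```

For example, the words "apple computer store" with maxSize 2 produce the entries "applecomputer" and "computerstore".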
I don't have a list of common "combined word" queries. Splitting of words
seems to be quite a standard thing; most search engines and spell checkers
have this ability. It would be nice if Lucene provided this out of the box.
karl wettin-3 wrote:
> On 14 Jan 2008, at 19:47, solr_user wrote:
Dominique, look at the LUCENE-400 issue in JIRA; that will help. It will be in
Lucene 2.4.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Dominique Béjean <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, January 15, 2008 12:13:4
Hi,
Does anybody know of a Lucene-based implementation for generating tag
clouds?
The idea is to index some documents in a temporary index in order to find
the most frequent 1-term, 2-term and 3-term sequences.
A stop word list will eliminate common words.
Ideally, terms like driver, d
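The counting step of that idea can be sketched without an index: count every 1-, 2- and 3-word sequence and skip any sequence that contains a stop word. This plain-Java sketch is an assumption about how "eliminate common words" might be applied (here a stop word breaks a sequence rather than being silently removed from it); the class name is hypothetical, and in Lucene the same counts would come from indexing shingles and reading term frequencies.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical n-gram counter for a tag cloud. */
final class TagCloudSketch {
    /** Counts 1..maxN word sequences, skipping any sequence that contains a stop word. */
    static Map<String, Integer> countNGrams(String text, int maxN, Set<String> stopWords) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        Map<String, Integer> counts = new HashMap<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder gram = new StringBuilder();
            for (int n = 1; n <= maxN && start + n <= words.length; n++) {
                String word = words[start + n - 1];
                if (word.isEmpty() || stopWords.contains(word)) {
                    break; // no sequence may run through a stop word
                }
                if (n > 1) {
                    gram.append(' ');
                }
                gram.append(word);
                counts.merge(gram.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Sorting the resulting map by count then gives the candidate tags; with the text "The driver installs the driver update. Driver update failed." and the stop word "the", "driver" is counted 3 times and "driver update" twice.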
Yes, it works much better if you help Lucene find the sort field type
with:
new Sort(new SortField("pubdate", SortField.STRING, true))
Thank you
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 15 January 2008 17:19
To: java-user@lucene.apache.org
Subj
Regarding indexing performance, you are not making use of the various IndexWriter
parameters. My suggestion: wait another week; Lucene 2.3 will be out then.
Check the IndexWriter javadocs for the various knobs for improving indexing
performance. Actually, check the Wiki; there is a page about exactly that.
Aha, I see, a Document represents a node and all other nodes connected to it.
So with a single query you can really only find whether 2 nodes are connected by an edge,
but not, say, the number of edges (degrees?) between any 2 nodes?
Otis
Tobias,
The question is a little too open, I think. Perhaps start by saying what
you've tried, what doesn't work, what you think won't work, the actual rate of
change, the size of your index and, very importantly, how quickly you need to
see index changes (adds, deletes, updates).
How about t
Aron,
I believe we now have class ExtendedFieldCacheImpl extends FieldCacheImpl
implements ExtendedFieldCache, and this should support sorting by longs.
Otis
- Original Message
From: Aron Sogor <[EMAIL PROTECTED]>
To: java-user
Well, let's say I have a list representation of a graph like:
src:1 dst:2
src:2 dst:3
src:1 dst:3
outgoingEdgesOf(1) returns 2 and 3.
incomingEdgesOf(3) returns 1 and 2.
In a Lucene index this works out nicely with term queries. I can search for
incoming, outgoing, or edgeExist with a boolean term qu
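The edge-list layout above can be modelled in memory to show which lookups are single term lookups and which need an intersection. In this hypothetical plain-Java sketch (not Lucene code), each edge is a "document" carrying the terms src:N and dst:N, a posting map plays the role of the term index, and edgeExists intersects two posting lists the way a boolean query with two required term clauses would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model: each edge is a "document" with src: and dst: terms. */
final class EdgeIndexSketch {
    private final List<int[]> edges = new ArrayList<>();
    private final Map<String, List<Integer>> postings = new HashMap<>();

    void addEdge(int src, int dst) {
        int edgeId = edges.size();
        edges.add(new int[] {src, dst});
        postings.computeIfAbsent("src:" + src, k -> new ArrayList<>()).add(edgeId);
        postings.computeIfAbsent("dst:" + dst, k -> new ArrayList<>()).add(edgeId);
    }

    private List<Integer> postingsFor(String term) {
        return postings.getOrDefault(term, List.of());
    }

    /** Like a term query on src:node; returns the dst of each matching edge. */
    List<Integer> outgoingEdgesOf(int node) {
        List<Integer> targets = new ArrayList<>();
        for (int edgeId : postingsFor("src:" + node)) {
            targets.add(edges.get(edgeId)[1]);
        }
        return targets;
    }

    /** Like a term query on dst:node; returns the src of each matching edge. */
    List<Integer> incomingEdgesOf(int node) {
        List<Integer> sources = new ArrayList<>();
        for (int edgeId : postingsFor("dst:" + node)) {
            sources.add(edges.get(edgeId)[0]);
        }
        return sources;
    }

    /** Like two required term clauses: intersect the src and dst posting lists. */
    boolean edgeExists(int src, int dst) {
        Set<Integer> toDst = new HashSet<>(postingsFor("dst:" + dst));
        for (int edgeId : postingsFor("src:" + src)) {
            if (toDst.contains(edgeId)) {
                return true;
            }
        }
        return false;
    }
}
```

Loaded with the three edges from the example, outgoingEdgesOf(1) yields 2 and 3 and incomingEdgesOf(3) yields 2 and 1.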
I have a more architectural question, which is maybe somewhat off topic, but since
I want to implement it using Java and Lucene, this seems the right forum:
I'm thinking of an approach to design a system that integrates dynamic
information into a search (and a ranking) functionality using Lucene.
Hi,
If some misspellings are very common, you could also turn them into synonyms. I
have not tried finding any information about this, but I *think* Google may be
doing that. I run a social service called Simpy at simpy.com and have Google
Alerts for "simpy", but those alerts often contain matc
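Treating a common misspelling as a synonym amounts to a lookup table consulted when terms are expanded. This plain-Java sketch shows only that table; the class name and the "recieve" example are hypothetical, and in Lucene the expansion would typically live in a custom analyzer filter (index side) or in query construction (search side), a choice the sketch leaves open.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical misspelling-to-canonical-term table used for synonym expansion. */
final class MisspellingSynonyms {
    private final Map<String, String> canonical = new HashMap<>();

    void addMisspelling(String misspelling, String correct) {
        canonical.put(misspelling.toLowerCase(Locale.ROOT), correct.toLowerCase(Locale.ROOT));
    }

    /** Returns the term itself plus its canonical spelling if it is a known misspelling. */
    Set<String> expand(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        Set<String> variants = new LinkedHashSet<>();
        variants.add(lower);
        String correct = canonical.get(lower);
        if (correct != null) {
            variants.add(correct);
        }
        return variants;
    }
}
```

A query for "Recieve" would then also match documents containing "receive", while correctly spelled terms pass through unchanged.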
Hi,
- Original Message
From: Cam Bazz <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, January 15, 2008 8:50:07 AM
Subject: Re: lucene as a graph store
Usually for implementing things like PageRank, or doing centrality metric
calculations or maybe Dijkstra's shortest
Dominique,
I don't have the javadoc/source in front of me, but shouldn't that be new
Sort(new SortField(.))? I'm not sure if the underlying sort
implementation is smart enough to avoid redoing the same work when you call
these constructors for *every* search. If it's not smart enoug
Hi Sergey,
On 01/15/2008 at 9:57 AM, Sergey Kabashnyuk wrote:
> Hi all.
> I am trying to build Maven artifacts from tags/lucene_2_2_0.
> By calling "ant generate-maven-artifacts"
>
> But BUILD FAILED
> /java/src/lucene/svn/java/tags/lucene_2_2_0/build.xml:366: The following
> error occurred while
Hello,
I have been running some experiments on Lucene. To speed up indexing, I
have disabled autocommit,
and I flush the IndexWriter every 512 objects. So far I have tried
256, 512, 1024, and 2048, and I have seen a really incredible difference in
indexing speed.
However, if I the time required to
Hi,
This option is new in the soon-to-be-released 2.3 version of Lucene
(not present in 2.2.0).
Mike
Cam Bazz wrote:
Hello;
Has IndexWriter.DISABLE_AUTO_FLUSH been deprecated?
I am using Lucene core 2.2.0, and although it is in the documentation I
cannot access IndexWriter.DISABL
Hi,
I need to sort my search results by descending publication date.
To do this, I added a field like this to all documents:
doc.add(new Field("pubdate", date, Field.Store.YES,
Field.Index.UN_TOKENIZED));
where date contains a string formatted as mmddhhmmss.
Searche
Hi all.
I am trying to build Maven artifacts from tags/lucene_2_2_0.
By calling "ant generate-maven-artifacts"
But BUILD FAILED
/java/src/lucene/svn/java/tags/lucene_2_2_0/build.xml:366: The following
error occurred while executing this line:
/java/src/lucene/svn/java/tags/lucene_2_2_0/common-bui
You really have to tell us more about what you're trying to do
to get a meaningful reply.
What do you mean you create the index on a table? Are you
using some sort of embedded SQL to query the table and then
create a Lucene index? How big is the index? What search
are you submitting? What does your sea
Hello;
Has IndexWriter.DISABLE_AUTO_FLUSH been deprecated?
I am using Lucene core 2.2.0, and although it is in the documentation I
cannot access IndexWriter.DISABLE_AUTO_FLUSH.
Best,
C.B.
Usually, for implementing things like PageRank, centrality metric
calculations or maybe Dijkstra's shortest path, this kind of (list of edges)
graph representation does not give the best performance.
I would like to use Lucene for simple operations like the neighbors of this node, or
the 2-degree neighbors of this node.
is
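Finding "neighbors" and "2-degree neighbors" is a breadth-first expansion with a hop limit; against an edge-list index, each hop would be one batch of src: term lookups for the current frontier. This plain-Java sketch runs the same expansion over an in-memory adjacency map; the names are hypothetical, and the mapping of hops to queries is an assumption about how one might drive it from Lucene.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical hop-limited breadth-first search over an adjacency map. */
final class NeighborhoodSketch {
    /** Returns nodes reachable from start in 1..maxHops hops, excluding start itself. */
    static Set<Integer> neighborsWithin(Map<Integer, List<Integer>> adjacency, int start, int maxHops) {
        Set<Integer> visited = new HashSet<>();
        visited.add(start);
        Set<Integer> result = new LinkedHashSet<>();
        List<Integer> frontier = List.of(start);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            List<Integer> next = new ArrayList<>();
            for (int node : frontier) {
                // One hop: every outgoing edge of every frontier node.
                for (int neighbor : adjacency.getOrDefault(node, List.of())) {
                    if (visited.add(neighbor)) {
                        next.add(neighbor);
                        result.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        return result;
    }
}
```

The visited set keeps cycles from re-adding nodes, so the 2-degree neighborhood never contains the start node or duplicates.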
On 14 Jan 2008, at 19:47, solr_user wrote:
Does the Lucene spell checker have the ability to suggest splitting of
combined words? For example, if I have the words "apple" and "computer" in
my index and I type "applecomputer", how can I make it suggest
"apple computer"?
It would probably
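Splitting a run-together word against a dictionary is a classic dynamic-programming problem: for each prefix length, remember the fewest dictionary words that cover it. The sketch below is not part of Lucene's SpellChecker; the dictionary is assumed to come from somewhere such as the index's terms, and preferring the fewest pieces is a design choice made here so that "apple"+"computer" wins over "app"+"le"+"computer".

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

/** Hypothetical compound-word splitter; not a Lucene API. */
final class CompoundSplitter {
    /** Splits word into dictionary words using the fewest pieces; empty list if impossible. */
    static List<String> split(String word, Set<String> dictionary) {
        int n = word.length();
        int[] pieces = new int[n + 1]; // fewest pieces covering word[0, i)
        int[] back = new int[n + 1];   // start index of the last piece in that split
        Arrays.fill(pieces, Integer.MAX_VALUE);
        pieces[0] = 0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (pieces[start] != Integer.MAX_VALUE
                        && pieces[start] + 1 < pieces[end]
                        && dictionary.contains(word.substring(start, end))) {
                    pieces[end] = pieces[start] + 1;
                    back[end] = start;
                }
            }
        }
        if (pieces[n] == Integer.MAX_VALUE) {
            return List.of();
        }
        // Walk the back pointers from the end to recover the pieces in order.
        LinkedList<String> result = new LinkedList<>();
        for (int end = n; end > 0; end = back[end]) {
            result.addFirst(word.substring(back[end], end));
        }
        return result;
    }
}
```

The double loop costs O(n^2) substring lookups for a word of length n, which is negligible for query-length input.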
Lucky guy who gets the same problem.
Found the issue:
http://issues.apache.org/jira/browse/LUCENE-463
Lucene sees numbers in the field and thinks it is an int... then overflows
the int.
Force the sort field to be a SortField.STRING.
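The arithmetic behind the overflow, and why a STRING sort still orders dates correctly, can be checked in plain Java. This sketch assumes 14-digit yyyyMMddHHmmss values (an assumption; the thread does not show the exact values that overflowed). Such a value is far above Integer.MAX_VALUE (2147483647), while zero-padded fixed-width strings compare lexicographically in chronological order, so reversing the string order gives newest first, as new SortField("pubdate", SortField.STRING, true) does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helpers illustrating int overflow vs. string ordering of date keys. */
final class PubdateSortSketch {
    /** True if a digit string is too large to be parsed as a Java int. */
    static boolean overflowsInt(String digits) {
        return Long.parseLong(digits) > Integer.MAX_VALUE;
    }

    /** Newest-first order for fixed-width, zero-padded date strings (reverse lexicographic). */
    static List<String> newestFirst(List<String> pubdates) {
        List<String> sorted = new ArrayList<>(pubdates);
        sorted.sort(Collections.reverseOrder());
        return sorted;
    }
}
```

The string approach only works while every value has the same width and zero padding; mixing formats breaks the lexicographic/chronological correspondence.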
Aron Sogor wrote:
Let me qualify my question:
Sort is not wor
I guess the question comes down to what kind of things you are going
to do with this graph. How often are you updating links, etc.? I can't
say Lucene was designed for this kind of thing, but I am constantly
amazed at what people use Lucene for, so I won't say it can't be
done. I don't know
> But it also seems that the parallel/not parallel decision is
> something you control on the back end, so I'm not sure the user
> is involved in the merge question at all. In other words, you could
> easily split the indexing task up amongst several machines and/or
> processes and combine all the
On 15 Jan 2008, at 13:17, Cam Bazz wrote:
Typically, when the number of objects in a B-tree-based structure (in an
OODBMS, for example) increases, the search and add times also increase.
Will Lucene have the same problem, and how can I overcome it if it
does?
There is a benchmark package in the contri
On 15 Jan 2008, at 07:02, rakeshxp wrote:
Hello Everyone,
Hi Rakesh,
Is there any way I can dynamically add
records to the spell checker? (Reindexing every time is
overkill.)
Start by getting the source code if you don't have it. It should not
be a big deal, but it might t
Hello;
I would like to use Lucene as a graph store. The graph representation is a list of
edges. Consider the code below:
final int commitCount = 16 * 1024;
final int numObj = 1024 * 1024;
Analyzer analyzer = new KeywordAnalyzer();
FSDirectory directory = FSDirectory.g
Quoting Chris Hostetter <[EMAIL PROTECTED]>:
>
> : Trying config file at path /var/www/.lsearch.conf
> : Trying config file at path /usr/local/search/ls2/lsearch.conf
> : 0[main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer
> : java.rmi.ConnectIOException:
Wildcard "terms" get expanded by the rewrite() method on WildcardQuery
to Term instances during processing. Thus, you would have to use the TermEnum
that the WildcardQuery uses in order to get the individual terms
first; then you could get the term positions.
-Grant
On Jan 15, 2008, at 3:39 AM, T
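The two-step approach (expand the wildcard into concrete terms first, then ask for positions of each term) can be sketched in plain Java over a sorted term list. This is a hypothetical stand-in for walking the index's term dictionary, not the TermEnum that WildcardQuery actually uses; in Lucene, each matched term would then be passed to IndexReader.termPositions(Term).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.regex.Pattern;

/** Hypothetical wildcard expansion over a sorted term dictionary. */
final class WildcardExpansionSketch {
    /** Converts a wildcard with * (any run) and ? (one char) into an anchored regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char ch : wildcard.toCharArray()) {
            if (ch == '*') {
                regex.append(".*");
            } else if (ch == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(ch)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the concrete terms matching the wildcard, in dictionary order. */
    static List<String> expand(String wildcard, SortedSet<String> terms) {
        Pattern pattern = toRegex(wildcard);
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

A real term dictionary is large, so a practical version would start scanning at the pattern's literal prefix instead of testing every term.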
Hi all,
While playing with an algorithm (summarize/highlight based on sliding windows),
I found that IndexReader.termPositions(Term term) does not support
wildcard terms. Would it be worthwhile to write a patch to support
wildcard terms?
On Mon, 2008-01-14 at 10:58 -0500, Alex Wang wrote:
> Toke, you mentioned "Using a Collator works but does take a fair amount
> of memory", can you please elaborate a little more on that. Thanks.
We have an index with 10 million records that takes up 37GB. Practically
all records have a title, whi
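For readers unfamiliar with the Collator approach, this plain-Java sketch contrasts locale-aware ordering with raw String.compareTo, and shows the CollationKey variant that the java.text.Collator documentation recommends for sorting large lists. Precomputing one key per value trades memory for faster comparisons; that is a general Java tradeoff offered for context, not a description of how Lucene's locale-based sorting is implemented. The class name is hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical helpers for locale-aware sorting of titles. */
final class CollatorSortSketch {
    /** Sorts by comparing strings with a locale-specific Collator on every comparison. */
    static List<String> sortWithCollator(List<String> titles, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(collator);
        return sorted;
    }

    /** Same order, but precomputes one CollationKey per title (more memory, cheaper compares). */
    static List<String> sortWithKeys(List<String> titles, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<CollationKey> keys = new ArrayList<>();
        for (String title : titles) {
            keys.add(collator.getCollationKey(title));
        }
        Collections.sort(keys);
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }
}
```

With raw compareTo, "Cherry" sorts before "apple" because uppercase letters have lower code points; both Collator variants give apple, banana, Cherry.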