>>>>> in the past but seems it is specific to geo points? The use case is to
>>>>> index image feature vectors to search for similar images in a corpus.
>>>>>
>>>>> Currently we are using Lucene for text search and we would like to not
>>>>> have to manage two different index structures, synchronize commits, and so
>>>>> on.
>>>>>
>>>>> Thank you,
>>>>> Luis Nassif
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
is there a better way to handle this? I’m particularly curious about
splicing this into something like Solr.
Thanks,
— Ken
e but it doesn't seem to offer what I need either.
Thanks for any hints!!!
- Mike
aka...@gmail.com
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g
on logic (perhaps kind
of like a database's query optimizer) at work here that makes the I/O and
RAM requirements more difficult to model from the query? (Remember that
we're not doing any sorting.)
I'm hoping that with some of this knowledge, I'll be able to better model
the RAM
e Katta has added an index to both systems, then
you can switch to it (and eventually remove the old index).
The fact that you'd need two Katta "masters" makes things a bit more
interesting, as you'd have to coordinate when they both decide to
switch to using the new index(es).
buted search support inside of Nutch.
And Solr has distributed search support, though it's still pretty new.
-- Ken
't use the typical approach of having a doc
field with every group in it, then adding a required subclause to
your query with every group as a boolean OR term.
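For reference, the "typical approach" described above can be sketched roughly as follows. This is only an illustration that builds a query string in classic Lucene query-parser syntax; the function name, the "group" field name, and the assumption that the parser's default operator is OR are mine, not something from the original message. Group names are assumed to contain no double quotes.

```python
def build_group_filtered_query(user_query, groups, field="group"):
    """Wrap a user query with a required subclause matching any of the user's groups.

    Hypothetical helper: 'field' is an assumed name for the doc field that
    holds every group allowed to see the document.
    """
    if not groups:
        raise ValueError("user must belong to at least one group")
    # Each group is an optional term; together they form one required (+) clause,
    # so a document must match at least one of them.
    group_clause = " ".join(f'{field}:"{g}"' for g in groups)
    return f"+({user_query}) +({group_clause})"


print(build_group_filtered_query("lucene", ["eng", "ops"]))
```

Note that the clause grows with the number of groups a user belongs to, which is the usual reason this approach stops being practical.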
-- Ken
---------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
h on subject ==
"alternative scoring algorithm for PhraseQuery".
I believe Paul Elschot gave him some useful input, but then Philipp
seemed to have dropped off the list...and he didn't respond to my
email asking him if he was able to co
essentially synonym
processing, where you turn a single term into multiple terms based on
the automatic splitting of the term using '_', '-', camelCasing,
letter/digit transitions, etc.
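A minimal sketch of that kind of splitting, assuming ASCII-only terms. The function name and the choice to keep the original term alongside its parts (so it behaves like synonym expansion) are my assumptions, not a description of any particular analyzer.

```python
import re

# Matches, in order: an upper-case run not followed by lower-case (acronyms),
# an optionally capitalized lower-case word, or a run of digits.
# Separators such as '_' and '-' match nothing and so act as split points.
_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_term(term):
    """Return the lower-cased term followed by its sub-parts."""
    whole = term.lower()
    parts = [p.lower() for p in _PART.findall(term)]
    # Emit the original term first, then each part that differs from it.
    return [whole] + [p for p in parts if p != whole]


print(split_term("getHTTPResponse2_code-path"))
```

At index time all of these would typically be emitted at the same token position, which is what makes it "essentially synonym processing".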
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
here helped you finish your
FuzzyPhraseQuery (or FuzzySpanQuery) addition to Lucene.
Thanks,
-- Ken
Nutch already supports distributed Lucene searchers, using Hadoop RPC.
-- Ken
like below would
work
Maybe this is a silly question, but why not create a title field and a
description field and boost them separately?
Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donna
s a little slow).
/search/lucene/query/DateIntervalQuery.java
-- Ken
I don't
think it's a bad index. After seeing a few postings about this same
general problem, I'm guessing there's a bug hiding someplace.
Sorry to not have a better answer...
-- Ken
g required
to pick the right cut-off value for searches.
Thanks,
-- Ken
ill need a big sum though. MD5?
Just as a reference, Nutch uses an MD5 digest to detect duplicate web
pages. It works fine, except of course when two docs differ by only
an insignificant text delta. There's some recent work in this area -
check out TextProfileSignature.
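To illustrate the exact-digest idea (not Nutch's actual code), here is a small sketch using Python's hashlib. The function names and the dict-of-documents input are assumptions made for the example. It also shows the limitation mentioned above: a one-character difference yields a different digest, so near-duplicates are not caught.

```python
import hashlib


def content_digest(text):
    """MD5 hex digest of the document text (UTF-8 encoded)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_duplicates(docs):
    """Given {doc_id: text}, return (duplicate_id, first_seen_id) pairs."""
    first_seen = {}
    duplicates = []
    for doc_id, text in docs.items():
        digest = content_digest(text)
        if digest in first_seen:
            duplicates.append((doc_id, first_seen[digest]))
        else:
            first_seen[digest] = doc_id
    return duplicates


docs = {"a": "hello world", "b": "hello world", "c": "hello world."}
print(find_duplicates(docs))  # "c" differs by one character, so it is not flagged
```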
-- Ken
On Thursday 18 May 2006 18:36, Ken Krugler wrote:
> >Could someone describe how the results from multiple indices are merged
> when using a MultiSearcher? My naive intuition is that the scores for
> documents found in each index could be wildly different, so what
> crit
selection of indices that get merged to
form the N final indices. This randomization helps avoid the IDF skew
problem.
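One possible reading of that randomization, sketched under my own assumptions since the start of the message is cut off: shuffle the source indices and deal them round-robin into N merge groups, so that no final index ends up holding a topically clustered slice of the corpus. The function name and the index names are hypothetical.

```python
import random


def assign_merge_groups(source_indices, n_final, seed=None):
    """Randomly partition source indices into n_final groups of near-equal size."""
    if n_final < 1:
        raise ValueError("n_final must be at least 1")
    rng = random.Random(seed)
    shuffled = list(source_indices)
    rng.shuffle(shuffled)
    # Deal the shuffled indices round-robin: group i gets items i, i+N, i+2N, ...
    return [shuffled[i::n_final] for i in range(n_final)]


groups = assign_merge_groups([f"part-{i}" for i in range(10)], 3, seed=42)
for g in groups:
    print(g)
```

Each group would then be merged into one final index. With documents spread at random, term frequencies in each final index should be closer to the corpus-wide values.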
There's a Jira issue on the Nutch side (see NUTCH-92) around this
same problem.
-- Ken
t scoring algorithm.
You can always add the log of the score versus doing a
multiplication, but that would still involve a lot of source code
changes.
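The arithmetic behind the log suggestion, as a small sketch: summing logs gives the log of the product, so the ranking order is the same as with multiplication. The function names are mine, and scores are assumed to be strictly positive (log is undefined otherwise).

```python
import math


def combine_by_product(scores):
    """Multiply the score factors together."""
    result = 1.0
    for s in scores:
        result *= s
    return result


def combine_by_log_sum(scores):
    """Add the logs of the score factors; equals log(product) for positive scores."""
    return sum(math.log(s) for s in scores)


doc_a = [0.5, 0.2]
doc_b = [0.9, 0.1]
print(combine_by_product(doc_a) > combine_by_product(doc_b))
print(combine_by_log_sum(doc_a) > combine_by_log_sum(doc_b))
```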
-- Ken
the Lucene code RAMDirectory.java
I see an int cast of the index file size, meaning there is a 2GB limit.
Did I miss something?
Has anyone loaded more than a single 2GB index into RAM?
> thanks,
project
files - and I don't put them into the Eclipse Workspace directory.
b. Then launch Eclipse and create a new Java project, importing the
files from the external (SVN-controlled) location.
-- Ken
";
open (my $virtual_filehandle, "+<:utf8", \$data);
print <$virtual_filehandle>;
--
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200
are tokenizers already built for lucene.
Search the archives for a discussion about this,
back in June I believe. I'd suggested using ICU
to generate sort keys, and indexing those.
-- Ken
M product(s) to get it) so what you've done is great
for the open source community - thanks!
Also I could post to the Unicode list re training data in multiple
languages, as that's a good place to find out about multilingual
corpora.
-- Ken
" or "21MAGAB". Is the best way to accomplish this by
creating synonyms for the 3 different ways when punctuation is in parts
to search for? I know I can stop punctuation in the index but what about
grouping the information together or with spaces?
Thanks all in advance,
Tom
in
a Java implementation, so this shouldn't be all that hard. See
<http://www-306.ibm.com/software/globalization/topics/thaiusabilities/text.jsp>
-- Ken
" "eni" "niz" "ize" "zed".
That would help you find *foo*, but not *ha*.
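A quick sketch of why that is, using character trigrams. The example word "tokenized" is my assumption (its last four trigrams happen to match the ones quoted above), and the helper names are hypothetical.

```python
def ngrams(term, n=3):
    """All contiguous character n-grams of a term, in order."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def wildcard_to_ngrams(pattern, n=3):
    """N-grams usable to look up a *text* wildcard; empty if the text is too short."""
    inner = pattern.strip("*")
    return ngrams(inner, n)


print(ngrams("tokenized"))
print(wildcard_to_ngrams("*foo*"))  # long enough to yield a trigram
print(wildcard_to_ngrams("*ha*"))   # shorter than n, so nothing to look up
```

A two-character pattern never produces a trigram, so it cannot be answered from a trigram index without indexing shorter n-grams as well.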
-- Ken