Re: N-dimensional Point Indexing

2018-10-17 Thread Ken Krugler
in the past but seems it is specific to geo points? The use case is to index image feature vectors to search for similar images in a corpus. Currently we are using Lucene for text search and we would like to not have to manage two different index structures, synchronize commits, and so on. Thank you, Luis Nassif -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com Custom big data solutions & training Flink, Solr, Hadoop, Cascading & Cassandra

Best way to plug in alternative range query support

2016-05-19 Thread Ken Krugler
is there a better way to handle this? I’m particularly curious about splicing this into something like Solr. Thanks, — Ken -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassandra & Solr

RE: in-memory terms dictionary/Lucene-3069

2015-01-26 Thread Ken Krugler
http://numere.stela.org.br ----- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Lucene Challenge - sum, count, avg, etc.

2010-03-31 Thread Ken Krugler
e but it doesn't seem to offer what I need either. Thanks for any hints!!! - Mike aka...@gmail.com Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g --

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-20 Thread Ken Krugler
s message in context: http://www.nabble.com/Any-Tokenizator-friendly-to-C%2B%2B%2C-C-%2C-.NET%2C-etc---tp25063175p25063964.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Analyzing performance and memory consumption for boolean queries

2009-06-23 Thread Ken Krugler
on logic (perhaps kind of like a database's query optimizer) at work here that makes the I/O and RAM requirements more difficult to model from the query? (Remember that we're not doing any sorting.) I'm hoping that with some of this knowledge, I'll be able to better model the RAM

Re: Synchronizing Lucene indexes across 2 application servers

2009-06-20 Thread Ken Krugler
e Katta has added an index to both systems, then you can switch to it (and eventually remove the old index). The fact that you'd need two Katta "masters" makes things a bit more interesting, as you'd have to coordinate when they both decide to switch to using the new index(es).

Re: Distributed Lucene Questions

2009-06-01 Thread Ken Krugler
buted search support inside of Nutch. And Solr has distributed search support, though it's still pretty new. -- Ken -- Ken Krugler +1 530-210-6378

Re: Restricting the result set with hierarchical ACL

2009-03-02 Thread Ken Krugler
't use the typical approach of having a doc field with every group in it, then adding a required subclause to your query with every group as a boolean OR term. -- Ken -- Ken Krugler +1 530-210-6378

Re: How to compute the simlarity of a web page?

2009-02-25 Thread Ken Krugler
-- Ken Krugler +1 530-210-6378

Re: Implement a relaxed PhraseQuery?

2008-03-23 Thread Ken Krugler
h on subject == "alternative scoring algorithm for PhraseQuery". I believe Paul Elschot gave him some useful input, but then Philipp seemed to have dropped off the list...and he didn't respond to my email asking him if he was able to co

Re: Indexing source code files

2008-02-28 Thread Ken Krugler
essentially synonym processing, where you turn a single term into multiple terms based on the automatic splitting of the term using '_', '-', camelCasing, letter/digit transitions, etc. -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378
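A rough sketch of the splitting described in the snippet above, written in Python for brevity rather than as a Lucene token filter. The function names `split_identifier` and `expand_term` are hypothetical, and lowercasing the sub-terms is an assumption, not something the thread states.

```python
import re

def split_identifier(term):
    """Split an identifier on '_', '-', camelCase boundaries,
    and letter/digit transitions."""
    parts = []
    for chunk in re.split(r"[_\-]+", term):
        # Alternatives, in order: an acronym run not followed by a
        # lowercase letter, a capitalized or lowercase word, a digit run.
        parts.extend(
            re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk)
        )
    return parts

def expand_term(term):
    """Return the original term plus its lowercased sub-terms,
    the way a synonym expansion would emit them."""
    subs = [p.lower() for p in split_identifier(term)]
    # Keep the original so exact matches still work; if nothing
    # was split off, there is nothing to add.
    return [term] + subs if len(subs) > 1 else [term]

if __name__ == "__main__":
    print(expand_term("getFileName"))
    print(expand_term("index_writer-v2"))
```

In an analyzer the sub-terms would normally be emitted at the same position as the original term, which is what makes this behave like synonym processing; this sketch only produces the list of terms.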

Re: alternative scoring algorithm for PhraseQuery

2007-10-17 Thread Ken Krugler
here helped you finish your FuzzyPhraseQuery (or FuzzySpanQuery) addition to Lucene. Thanks, -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 "If you can't find it, you can't fix it"

Re: Serving remote lucene client - RMI vs HTTP

2007-07-15 Thread Ken Krugler
] Nutch already supports distributed Lucene searchers, using Hadoop RPC. -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 "If you can't find it, you can't fix it"

Re: boosting different parts of the same field

2007-05-31 Thread Ken Krugler
like below would work maybe this is a silly question but why not create a title field and a description field and boost them separately? Donna L. Gresh Services Research, Mathematical Sciences Department IBM T.J. Watson Research Center (914) 945-2472 http://www.research.ibm.com/people/g/donna

Re: UTF8 accents & umlauts filter?

2006-09-12 Thread Ken Krugler
s a little slow). -- Ken Krugler Krugle, Inc. +1 530-210-6378 "Find Code, Find Answers"

Re: Where to find drill-down examples (source code)

2006-07-21 Thread Ken Krugler
/search/lucene/query/DateIntervalQuery.java -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 "Find Code, Find Answers"

Re: Multisearcher Lucene IOException

2006-06-04 Thread Ken Krugler
I don't think it's a bad index. After seeing a few postings about this same general problem, I'm guessing there's a bug hiding someplace. Sorry to not have a better answer... -- Ken -- Ken Krugler Krugle, Inc. +1 53

Re: BufferedIndexInput.readByte performance

2006-05-26 Thread Ken Krugler
g required to pick the right cut-off value for searches. Thanks, -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 "Find Code, Find Answers"

Re: Checking for duplicates inside index

2006-05-22 Thread Ken Krugler
ill need a big sum though. MD5? Just as a reference, Nutch uses an MD5 digest to detect duplicate web pages. It works fine, except of course when two docs differ by only an insignificant text delta. There's some recent work in this area - check out TextProfileSignature. -- Ken -- Ken K
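A minimal sketch of digest-based duplicate detection as described in the snippet above, using only the Python standard library. The helper names `content_digest` and `find_duplicates` are hypothetical. This is not Nutch's implementation, and it does not attempt what TextProfileSignature does.

```python
import hashlib

def content_digest(text):
    """MD5 digest of the document text, as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def find_duplicates(docs):
    """docs maps doc id -> text. Returns (duplicate_id, first_seen_id)
    pairs for documents whose text is byte-for-byte identical."""
    seen = {}        # digest -> id of the first document with that digest
    duplicates = []
    for doc_id, text in docs.items():
        digest = content_digest(text)
        if digest in seen:
            duplicates.append((doc_id, seen[digest]))
        else:
            seen[digest] = doc_id
    return duplicates

if __name__ == "__main__":
    docs = {
        "a": "hello world",
        "b": "hello world",
        "c": "hello world ",  # trailing space: treated as a different doc
    }
    print(find_duplicates(docs))
```

Document "c" shows the limitation mentioned in the snippet: an insignificant text delta produces an unrelated digest, so exact hashing misses near-duplicates.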

Re: How are results merged from a multisearcher?

2006-05-18 Thread Ken Krugler
On Thursday 18 May 2006 18:36, Ken Krugler wrote: Could someone describe how the results from multiple indices are merged when using a MultiSearcher? My naive intuition is that the scores for documents found in each index could be wildly different, so what crit

Re: How are results merged from a multisearcher?

2006-05-18 Thread Ken Krugler
selection of indices that get merged to form the N final indices. This randomization helps avoid the IDF skew problem. There's a Jira issue on the Nutch side (see NUTCH-92) around this same problem. -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 "Find Code, Fi

Re: Scoring without floating point calculations

2006-04-28 Thread Ken Krugler
t scoring algorithm. You can always add the log of the score versus doing a multiplication, but that would still involve a lot of source code changes. -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 "Find Code, Find Answers"
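One way to picture the "add the log" idea from the snippet above: if each score factor is stored as a precomputed, scaled integer logarithm, combining factors at query time needs only integer addition. The sketch below is an illustration under that assumption. `SCALE`, `fixed_log`, and `log_score` are hypothetical names, and nothing here reflects how Lucene actually computes scores.

```python
import math

SCALE = 1 << 16  # hypothetical fixed-point scale for the stored logs

def fixed_log(factor):
    """Scaled integer log of a positive factor. This is the only
    floating-point step and could be done once, ahead of time."""
    return round(math.log(factor) * SCALE)

def product_score(factors):
    """Conventional score: multiply the factors together."""
    score = 1.0
    for f in factors:
        score *= f
    return score

def log_score(factors):
    """Log-domain score: integer additions only, given stored logs."""
    return sum(fixed_log(f) for f in factors)

if __name__ == "__main__":
    docs = {"a": [0.5, 2.0, 1.5], "b": [0.9, 0.9, 0.9], "c": [3.0, 0.2, 1.0]}
    by_product = sorted(docs, key=lambda d: product_score(docs[d]), reverse=True)
    by_log = sorted(docs, key=lambda d: log_score(docs[d]), reverse=True)
    print(by_product, by_log)
```

Because the logarithm is monotonic, the two rankings agree except where rounding to the fixed-point scale flips a near-tie. Factors must be strictly positive for the log to exist.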

Re: Can Lucene load more then 2GB into RAM memory?

2006-03-16 Thread Ken Krugler
the Lucene code RAMDirectory.java I see an int cast of the index file size, meaning there is a 2GB limit. Did I miss something? Has anyone loaded more than a single 2GB index into RAM? Thanks, -- Ken Krugler Krugle, Inc. +1 530-210-6378 "

Re: Multiple terms with the same position in PhraseQuery

2005-11-06 Thread Ken Krugler
project files - and I don't put them into the Eclipse Workspace directory. b. Then launch Eclipse and create a new Java project, importing the files from the external (SVN-controlled) location. -- Ken -- Ken Krugler Krugle,

Re: Lucene does NOT use UTF-8.

2005-08-27 Thread Ken Krugler
"; open (my $virtual_filehandle, "+<:utf8", \$data); print <$virtual_filehandle>; -- Ken Krugler TransPac Software, Inc. <http://www.transpac.com> +1 530-470-9200

Re: i18n query normalization

2005-08-23 Thread Ken Krugler
are tokenizers already built for lucene. Search the archives for a discussion about this, back in June I believe. I'd suggested using ICU to generate sort keys, and indexing those. -- Ken -- Ken Krugler TransPac Software, Inc. <http://www.transpac.com> +1 5

Re: NGram Language Categorization Source

2005-08-20 Thread Ken Krugler
M product(s) to get it) so what you've done is great for the open source community - thanks! Also I could post to the Unicode list re training data in multiple languages, as that's a good place to find out about multilingual corpora. -- Ken -- Ken Krugler TransPac Software, Inc.

Re: Indexing puncutation

2005-06-29 Thread Ken Krugler
" or "21MAGAB". Is the best way to accomplish this by creating synonyms for the 3 different ways when punctuation is in parts to search for? I know I can stop punctuation in the index but what about grouping the information together or with spaces? Thanks all in advance, Tom

Re: Looking for someone to develop Thai Lucene Analyzer

2005-06-22 Thread Ken Krugler
in a Java implementation, so this shouldn't be all that hard. See <http://www-306.ibm.com/software/globalization/topics/thaiusabilities/text.jsp> -- Ken -- Ken Krugler TransPac Software, Inc. <http://www.transpac.com>

Re: Question for Wildcard Search:

2005-06-22 Thread Ken Krugler
"eni" "niz" "ize" "zed". That would help you find *foo*, but not *ha*. -- Ken -- Ken Krugler TransPac Software, Inc. <http://www.transpac.com> +1 530-470-9200
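A small sketch of the trigram approach the snippet above describes, using a plain in-memory dictionary rather than a Lucene index. The word "tokenized" is an assumption: it is one word whose trailing trigrams match the ones visible in the snippet, and the original example word is cut off. The function names are hypothetical.

```python
def ngrams(term, n=3):
    """All contiguous n-character substrings of term, in order."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]

def build_index(terms, n=3):
    """Map each n-gram to the set of terms that contain it."""
    index = {}
    for term in terms:
        for gram in ngrams(term, n):
            index.setdefault(gram, set()).add(term)
    return index

def infix_search(index, fragment, n=3):
    """Terms containing fragment (the *fragment* case). Returns None
    when the fragment is shorter than n, since no n-gram can be
    looked up for it."""
    grams = ngrams(fragment, n)
    if not grams:
        return None
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing every n-gram is necessary but not sufficient, so verify.
    return {t for t in candidates if fragment in t}

if __name__ == "__main__":
    index = build_index(["tokenized", "food", "hat"])
    print(ngrams("tokenized"))
    print(infix_search(index, "foo"))  # three characters: usable
    print(infix_search(index, "ha"))   # two characters: no trigram exists
```

The `None` result for "ha" is the case the snippet points out: with trigrams, a two-character infix has nothing to look up, so it would need a smaller n-gram size or a different strategy.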