Re: Twitter analyser

2013-11-08 Thread Lance Norskog
This is a parts-of-speech analyzer for tweets. It would make your index far more useful. http://www.ark.cs.cmu.edu/TweetNLP/ On 11/04/2013 11:40 PM, Stéphane Nicoll wrote: Hi, I am building an application that indexes tweets and offers some basic search facilities on them. I am trying to find

Re: JLemmaGen project

2013-11-04 Thread Lance Norskog
This is very cool! Lemmatization is an important tool for making search work better. Would you consider changing the licensing to the Apache 2.0 license? On 10/23/2013 08:17 AM, Michal Hlavac wrote: Hi, I rewrote the lemmatizer project LemmaGen (http://lemmatise.ijs.si/) in Java. Originally it's

Re: posting list strings

2013-07-14 Thread Lance Norskog
Is there a Trie-based term index? Seems like this would be smaller, and very fast on non-leading wildcards. On 07/09/2013 02:34 PM, Uwe Schindler wrote: Hi, You can replace the term by their hash directly in the analyzer chain. Just write a custom TermToBytesRef attribute that hashes the term

Re: In memory index (current status in Lucene)

2013-07-01 Thread Lance Norskog
an't store anything on disk in the clear. Lance On 07/01/2013 07:07 AM, Emmanuel Espina wrote: Hi Erick! Nice to hear from you again! From time to time my interest in these "Lucene things" returns and I do some experiments :p Just to add to this conversation, I found an intere

Re: Content based recommender using lucene/solr

2013-06-29 Thread Lance Norskog
Solr/Lucene has two features for this: 1) the MoreLikeThis code, and 2) the clustering project in solr/contrib. Lance On 06/28/2013 11:15 AM, Luis Carlos Guerrero Covo wrote: I only have about a million docs right now so scaling is not a big issue. I'm looking to provide a quick implement

Please add me as a wiki editor

2013-06-09 Thread Lance Norskog
I'm responsible for the OpenNLP wiki page: https://wiki.apache.org/solr/OpenNLP Please add me to the list of editors.

Re: Taking backup of a Lucene index

2013-06-05 Thread Lance Norskog
flushed to disk and then the segment* files change. At any point in this sequence, all of the files in the directory form one consistent index. This isn't like MySQL or other databases where you have to shut down the DB to get a safe copy of the files. Lance On 04/17/2013 03:57 AM, Ashish

Re: Zero-position query?

2013-06-03 Thread Lance Norskog
) On Mon, Jun 3, 2013 at 6:46 AM, Lance Norskog wrote: What is a Lucene query that will find two words at the same term position? Is there a class that will do this? Is the feature available from the Lucene query syntax or any other syntax parsers? For example, if I'm using synonyms at

Zero-position query?

2013-06-02 Thread Lance Norskog
t the same position. What is a query that will find a document with the synonym substituted, but will not find a document which has the base word and a synonym at two different positions? Thanks, Lance. - To unsubscribe, e

Re: StandardAnalyzer: Support for Japanese

2013-01-14 Thread Lance Norskog
3.x and 4.0 Solr releases have nice analyzers just for Japanese. In 4.0 they are the "Kuromoji" package. In 4.0, the JapaneseAnalyzer probably does what you need: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-kuromoji/4.0.0/org/apache/lucene/analysis/ja/Japan

Re: Pulling lucene 4.1

2013-01-02 Thread Lance Norskog
4.x does not promise backwards compatibility with 3.x. Have you made your own extensions? On 01/02/2013 04:38 AM, Shai Erera wrote: There's no specific branch for 4.1 yet. All development still happens on the 4x branch ( http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/). Note tha

Re: potential memory leak when using RAMDirectory, CloseableThreadLocal and a thread pool.

2013-01-02 Thread Lance Norskog
There were memory leak problems with earlier versions of Java. You should upgrade to Java 6 update 30. Lance On 01/02/2013 05:26 AM, Alon Muchnick wrote: Hello all, we are using Lucene 3.6.2 in our web application on tomcat 5.5 and recently we started testing our application on tomcat 7

Re: Which token filter can combine 2 terms into 1?

2012-12-28 Thread Lance Norskog
How do you choose t2 and t2a? If you have a full inventory of these pairs, you can make these multi-word synonyms and use the Synonym filter to combine them. On 12/20/2012 11:50 PM, Xi Shen wrote: Hi, I am looking for a token filter that can combine 2 terms into 1? E.g. the input has been to

Re: how to implement a TokenFilter?

2012-12-26 Thread Lance Norskog
Go to the top directory and do this:
    cp dev-tools/eclipse/dot.project .project
    cp dev-tools/eclipse/dot.classpath .classpath
    cp -r dev-tools/eclipse/dot.settings .settings
The 'ant eclipse' target does this setup. On 12/24/2012 10:45 PM, Xi Shen wrote: Hi Lance, I got the lucene 4

Re: how to implement a TokenFilter?

2012-12-23 Thread Lance Norskog
You need to use an IDE. Find the Attribute type and show all subclasses. This shows a lot of rare ones and a few which are used a lot. Now, look at source code for various TokenFilters and search for other uses of the Attributes you find. This generally is how I figured it out. Also, after the

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Lance Norskog
n put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position." This adds it to a token, not a span. 'same position' does not suggest it also records the end position. -Glen On Thu, Dec 13, 2012 at 4:45 PM, Lance Norskog wrote: Parts-of-spe

Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

2012-12-13 Thread Lance Norskog
Parts-of-speech is available now, in the indexer. LUCENE-2899 adds OpenNLP to the Lucene&Solr codebase. It does parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an Apache project for natural-language processing. Some parts are in Solr that could be in Lucene. https://issues

Re: Which stemmer?

2012-11-16 Thread Lance Norskog
coded by people who think in good grammar, and are perfect spellers. If you find 'too aggressive' and 'too mild' to be a problem, what you want is 'lemmatization' where you work from a dictionary of word forms. Solr supports using Wordnet for this purpose.

Re: A large number of files in an index (3.6)

2012-10-28 Thread Lance Norskog
An option: instead of merging continuously as you run, you can optimize with 'maxSegments=10'. This means 'optimize but only until there are 10 segments'. If there are fewer than 10 segments, nothing happens. This lets you schedule merging I/O. Is the number of files a problem due to file space

Re: Lucene 4.0 delete by ID

2012-10-28 Thread Lance Norskog
Scott, did you mean the Lucene integer id, or the unique id field? - Original Message - | From: "Martijn v Groningen" | To: java-user@lucene.apache.org | Sent: Sunday, October 28, 2012 2:24:29 PM | Subject: Re: Lucene 4.0 delete by ID | | A top level document ID can change over time. For

Re: Efficient string lookup using Lucene

2012-08-26 Thread Lance Norskog
> Dawid -- Lance Norskog goks...@gmail.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: easy way to figure out most common tokens?

2012-08-19 Thread Lance Norskog
t >> doesn't help me). > I'm wrong, it's there, but Eclipse isn't seeing it (haven't tried javac by itself), even though it sees HighFreqTerms just fine.

Re: RE: Re:RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread Lance Norskog
ontaining $$? >> > -- >> > Ian. >> > On Tue, Aug 14, 2012 at 9:13 AM, zhoucheng2008 wrote: >> > > Hi, >> > > I have a big index, and when I s

Re: RAM or SSD...

2012-07-18 Thread Lance Norskog
bucket. SSDs speed up almost everything, save RAM and spare a lot of work hours optimizing I/O speed. > Regards, > Toke Eskildsen

Re: Direct memory footprint of NIOFSDirectory

2012-07-12 Thread Lance Norskog
gh estimate of direct memory usage per GB of indexed data, or per directory/writer instance, if applicable. >> Thanks, >> -V

Re: RAMDirectory with FSDirectory merging Versus large mergeFactor and RAMBufferSizeMB

2012-06-05 Thread Lance Norskog
anyhow for large mergeFactor and large RAMBufferSizeMB. > Maxim -- Lance Norskog goks...@gmail.com

Re: lucene (search) performance tuning

2012-05-28 Thread Lance Norskog
And no, RAMDirectory does not help. On Mon, May 28, 2012 at 5:54 PM, Lance Norskog wrote: > Can you use filter queries? Filters short-circuit a lot of search processing. "City:San Francisco" is a classic filter - it is a small part of the documents and it is reused a lot.

Re: lucene (search) performance tuning

2012-05-28 Thread Lance Norskog
it seems that io is not the >> > I am reading http://www.cnlp.org/presentations/slides/AdvancedLuceneEU.pdf >> > it mentions >> > Size >> > – Stop

Re: Sort runs out of memory

2012-05-23 Thread Lance Norskog
the smallest one is byte. It is possible to use only ceil(log2(#unique_values)) bits/document, although that requires a bit of custom coding. > Regards, > Toke Eskildsen
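A worked example of the ceil(log2(#unique_values)) figure quoted above, as a minimal Java sketch. The class and method names are made up for illustration and the document counts are invented; it only does the arithmetic and is not the custom packed storage the quoted message refers to.

```java
class BitsPerDocumentDemo {
    // ceil(log2(uniqueValues)): bits needed to give each distinct value its own ordinal.
    static int bitsRequired(long uniqueValues) {
        if (uniqueValues < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        return 64 - Long.numberOfLeadingZeros(uniqueValues - 1);
    }

    // Total bytes for docCount documents at bitsPerDoc bits each, rounded up to a whole byte.
    static long packedBytes(long docCount, int bitsPerDoc) {
        return (docCount * bitsPerDoc + 7) / 8;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L; // illustrative numbers, not taken from the thread
        long unique = 10;
        int bits = bitsRequired(unique);
        System.out.println(bits + " bits/document, " + packedBytes(docs, bits)
                + " bytes packed vs " + docs + " bytes with one byte per document");
    }
}
```

With 10 distinct sort values each document needs 4 bits, so the packed form takes half the space of a byte per document. The saving shrinks as the number of distinct values approaches 256.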

Re: Clear/Remove attribute from Token

2012-05-14 Thread Lance Norskog
, 2012 at 1:09 AM, Lance Norskog wrote: > I would like to remove a payload attribute from a token before it is indexed. PayloadAttribute lets you set the payload to null. AttributeSource (parent of all Tokens) does not have a 'remove Attribute' method. You cannot capture

Clear/Remove attribute from Token

2012-05-14 Thread Lance Norskog
then monkey with it (at least Eclipse does not show me its methods). If I set the payload to null, when the Token is saved in the index, will a null payload be saved? Or does the payload get quietly dropped? -- Lance Norskog goks...@gmail.com

Re: Here a merge thread, there a merge thread ...

2012-02-25 Thread Lance Norskog
-- Lance Norskog goks...@gmail.com

Ignore this - just testing - restrict fuzzy search to longer words

2012-01-25 Thread Lance
- Lance

Re: restrict fuzzy search to longer words

2012-01-23 Thread Lance
Hi Ian, Thanks for your help. I was just checking to see if my solr/lucene developer understands this. He's in India. He says this takes place at the time of integration. Is that correct? - Lance On Jan 20, 2012, at 3:40 PM, Ian Lea wrote: No idea about

restrict fuzzy search to longer words

2012-01-20 Thread Lance
Hi, Could you please help me with a quick question - Is there a way to restrict lucene/solr fuzzy search to only analyze words that have more than 5 characters and to ignore words with fewer than that (i.e. words of fewer than 6 characters)? Thanks - Lance

restrict fuzzy search to longer words

2012-01-19 Thread Lance
Hi, Could you please help me with a quick question - Is there a way to restrict lucene/solr fuzzy search to only analyze words that have more than 5 characters and to ignore words with fewer than that (i.e. words of fewer than 6 characters)? Thanks - Lance
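One possible client-side approach to the question above, sketched in Java: append the query parser's fuzzy operator (~) only to words of at least a given length before handing the string to the parser. This is an assumption about how one might do it, not the answer given in the thread. The class and method names are hypothetical, and the sketch assumes the input is plain whitespace-separated words with no query syntax in them.

```java
class FuzzyLongWordsDemo {
    // Returns the query with '~' appended to every word of at least minLength characters.
    // Shorter words are passed through unchanged, so they are matched exactly.
    static String addFuzzy(String userQuery, int minLength) {
        StringBuilder out = new StringBuilder();
        for (String word : userQuery.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for an empty or all-whitespace query
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(word);
            if (word.length() >= minLength) {
                out.append('~');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "more than 5 characters" means a minimum length of 6
        System.out.println(addFuzzy("quick search lucene", 6));
    }
}
```

The rewritten string would then go to the normal query parser. Input that already contains operators, quotes or field names would need real parsing rather than a whitespace split.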

Re: Spatial Search

2012-01-01 Thread Lance Java
a bit worried about this solution since Yonik has pointed out that the tier based approach is broken. Yonik, any more info on why this is broken? Perhaps a bounding box that works is better than a circle that doesn't ;) Cheers, Lance. On 31 December 2011 18:07, Yonik Seeley wrote: > O

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-22 Thread Lance Norskog
s) >> Any other suggestions? I have tried some of the basic ideas on the Lucene wiki, such as leaving the IndexSearcher open for the life of the process (a servlet). Any help would be greatly appreciated! >> Rob

Re: semanticvectors

2011-09-07 Thread Lance Norskog
-- Lance Norskog goks...@gmail.com

Re: RAMDirectory doesn't win over FSDirectory all the time, why?

2011-06-16 Thread Lance Norskog
useless. Maybe the Instantiated index stuff is more what you want? Lance On Tue, Jun 7, 2011 at 2:52 AM, zhoucheng2008 wrote: > Makes sense. Thanks > > -Original Message- > From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] > Sent: Tuesday, June 07, 2011 4:28 PM

I need an available solr lucene consultant

2011-05-17 Thread Lance
blem solving and analytical abilities. You must have a solid grasp of English – written and verbal. Please note that I am a start-up and I am not going to be able to pay what a large established company can pay. Thank you, Lance ----- Lance

Re: Solr 1.4.1: Weird query results

2011-04-19 Thread Lance Norskog
>> I'm sure this has quite a simple explanation but I'm unable to find it right now ;-) Perhaps you can help with that. >> Thanks a lot! >> Best regards, >> Erik

Re: RE: ParallelMultisearcher

2011-03-22 Thread Lance Norskog
-- Lance Norskog goks...@gmail.com

Re: Help!

2011-03-01 Thread Lance Norskog
Check out the Mahout project: mahout.apache.org -> there is a Lucene-based text classifier project in there. Lance On Tue, Mar 1, 2011 at 9:25 PM, Sundus Hassan wrote: > I am doing MS-Thesis on content-based text categorization. > For this purpose I intend to use Lucene. I need so

Re: Using Lucene to search live, being-edited documents

2011-01-21 Thread Lance Norskog
a time. That being said, basic grep/regex is probably fast enough. >> >> > In cases where you are doing a 'find' in a document similar to what a word processor would do (especially if you want to iterate forwards/backwards through matches etc), you might want to consider something like http://icu-project.org/apiref/icu4j/com/ibm/icu/text/StringSearch.html >> Thanks & Regards >> Umesh Prasad -- Lance Norskog goks...@gmail.com

Re: 3.0.3 Contrib Query Parser : Custom Field Name Builder

2011-01-08 Thread Lance Norskog
p, >>> presumably because UCS doesn't have an equals or hash method. >>> Suggestions? I've worked around it by registering a class based builder, checking for the field name and either delegating to the original builder or doing my custom processing, but it'

Re: Using Lucene/Solr for Plagiarism detection

2010-12-30 Thread Lance Norskog
the Similarity class. >> > How can I change the scoring formula? (By customizing only the Similarity class? Or the Scorer?) >> > Do you have an example of this use case? >> > Thanks for your help.

Re: Using Lucene to search live, being-edited documents

2010-12-29 Thread Lance Norskog
I couldn't find it, so... >> Is it possible / advisable / practical to use Lucene as the basis of a live document search capability? By "live document" I mean a largish document such as a word processor might be able to handle which is

Re: PDF text extracted without spaces

2010-12-02 Thread Lance Norskog

Re: a proof that every word is indexing properly

2010-12-01 Thread Lance Norskog
> Any ideas or thoughts would be very much appreciated. > Thanks in advance > David

Re: asking about index verification tools

2010-11-17 Thread Lance Norskog
The Lucene CheckIndex program does this. It is the class org.apache.lucene.index.CheckIndex, which has a main() method. Samarendra Pratap wrote: It is not guaranteed that every term will be indexed. There is a limit on maximum number of terms (as of Lucene 3.0 and maybe earlier too) per field. Check out this http://

Re: What is the best Analyzer and Parser for this type of question?

2010-11-15 Thread Lance Norskog
document was scored. After all that- you might have a problem with the PrnP etc. stuff getting chopped up in weird ways. I don't know how people handle this in chemistry/bio search. Lance Ahmet Arslan wrote: Example of Question: - What is the role of PrnP in mad cow disease? First

Re: Can I use Lucene for this?

2010-11-13 Thread Lance Norskog
The Lucene MoreLikeThis tool in lucene/contrib/similar will do one variant of what you want. You can do this particular test in Solr- you'll find it much much easier to put together. For other text similarities, you'll have to code them directly. Lance On Sat, Nov 13, 2010 at 7:07

Re: How to handle more than Integer.MAX_VALUE documents?

2010-11-02 Thread Lance Norskog
You would have to control your MergePolicy so it doesn't collapse everything back to one segment. On Tue, Nov 2, 2010 at 12:03 PM, Simon Willnauer wrote: > On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog wrote: >> 2 billion is a hard limit. Usually people split indexes into multiple

Re: How to handle more than Integer.MAX_VALUE documents?

2010-11-01 Thread Lance Norskog
-- Lance Norskog goks...@gmail.com

Re: Email Indexing

2010-10-27 Thread Lance Norskog
Tika has some mailbox file parsing that includes metadata parsing. For POP/IMAP email servers I don't know any tools. Hasan Diwan wrote: On 27 October 2010 18:16, Troy Wical wrote: Depends on what you're trying to index, I suppose. Maildir or mbox? For some time now, off and on, I have been

Re: Text categorization / classification

2010-10-27 Thread Lance Norskog
already does this using Lucene/Solr. > Thanks! > Maria

Re: How to export lucene index to a simple text file?

2010-09-21 Thread Lance Norskog
The Lucene CheckIndex program opens an index and walks all of the data structures. It is a good start for you. Sahin Buyrukbilen wrote: Thank you Uwe, I will read the docs and try to do it, however do you have an example code? I need one because I am not very familiar with Java. Thank you. Sahin

Re: Checksum and transactional safety for lucene indexes

2010-09-20 Thread Lance Norskog
off. Usually the data structures are damaged and Lucene throws CorruptIndexExceptions, NPE or array out-of-bounds exceptions. There is no checksumming of the index files. Lance Pulkit Singhal wrote: Hello Everyone, What happens if: a) lucene index gets written half-way to the disk and then

Re: Connection question

2010-09-17 Thread Lance Norskog
This can probably be done. The hardest part is cross-correlating your Lucene analyzer use with the Solr analyzer stack definition. There are a few things Lucene does that Solr doesn't- span queries for one. Lance On Fri, Sep 17, 2010 at 12:39 PM, Christopher Gross wrote: > Yes, I'm

Re: Extra Analyzers

2010-09-10 Thread Lance Norskog
appears puny. (-: >>> Thanks, >>> Chris

Re: Lucene applicability

2010-08-25 Thread Lance Norskog
e of the data you're going to index? If you're relying on your SOLR index to be your backup, you simply must back it up somewhere "often enough" to get by if your building burns down. I'd also think about storing your original input... > This is no diffe

Re: Sorting a Lucene index

2010-08-25 Thread Lance Norskog
fying an existing order-array is cheaper than a full re-sort or not depends on your batch size. > Regards, > Toke Eskildsen

Re: Solr SynonymFilter in Lucene analyzer

2010-08-18 Thread Lance Norskog
>> When I print synonymMap using synonymMap.toString(), I get the output like >> <{New York=<{Chicago=<{Seattle=<{New Orleans=<[(CONCEPTcity,0,0,type=SYNONYM),ORIG],null>}>}>}>}> >> so it looks like all the synonyms are loaded. But if I search for "CONCEPTcity" then it says no matches found. I am not sure whether I have loaded the synonyms correctly in the synonymMap. >> Any help will be deeply appreciated. Thanks! -- Lance Norskog goks...@gmail.com

Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-03 Thread Lance Norskog
gle MultiThreadedHttpConnectionManager and HttpClient for all the SolrServers, and the other with a new MultiThreadedHttpConnectionManager and HttpClient for each SolrServer. > Both attempts yielded similar performance results. > Also tried to give setMaxTotalConnection

Re: Rank results only on some fields

2010-07-31 Thread Lance Norskog
tomScoreProvider. The QWF/CSQ trick is more convenient and used quite often inside Lucene, too. > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de >> -Original Message-

Re: Rank results only on some fields

2010-07-31 Thread Lance Norskog
>> > Philippe

Re: Best practices for searcher memory usage?

2010-07-14 Thread Lance Norskog
Glen, thank you for this very thorough and informative post. Lance Norskog

Re: phrase search in a particular case

2010-06-19 Thread Lance Norskog
SpanFirstQuery is the clean option. Another option is to add a "start token" to each title. Then, search for "startToken oil spill". This will be faster than SpanFirstQuery. But it also requires doing something weird to the field. Lance On Thu, Jun 17, 2010 at 3:19 PM, Micha
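The "start token" alternative described above can be sketched with plain string handling in Java. This is a rough illustration under stated assumptions: the sentinel value, class name and method names are all hypothetical, and the sentinel must be a string that the field's analyzer passes through unchanged and that never occurs in real titles.

```java
class StartTokenDemo {
    // Hypothetical sentinel; lowercase letters only so that common analyzers keep it as one token.
    static final String START = "xxstartxx";

    // Text to index for the title field: the sentinel followed by the real title.
    static String indexedTitle(String title) {
        return START + " " + title;
    }

    // Phrase query string that only matches when the words appear at the very start of the field.
    static String anchoredPhrase(String field, String words) {
        return field + ":\"" + START + " " + words + "\"";
    }

    public static void main(String[] args) {
        System.out.println(indexedTitle("oil spill in the gulf"));
        System.out.println(anchoredPhrase("title", "oil spill"));
    }
}
```

The cost is the one the message already names: every title in the index carries an extra artificial token, so anything that displays or highlights the indexed text has to strip it again.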

Re: segment_N file is missed

2010-06-18 Thread Lance Norskog
o be caused by the new behavior introduced here? >> https://issues.apache.org/jira/browse/LUCENE-2386 >> If you open a writer, add docs, and then crash before calling commit? > That could be; Maryam, is that what happened? > Mike

Re: segment_N file is missed

2010-06-13 Thread Lance Norskog
y copying these from another Lucene index directory generated with the same Lucene version, or can I merge this index with another index which has segments_N to retrieve the data? > Thanks -- Lance Norskog goks...@gmail.com

Re: A question bout google search index?

2010-06-13 Thread Lance Norskog
es per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles

Re: Solr tutorial

2010-06-01 Thread Lance Norskog

Re: Right memory for search application

2010-04-27 Thread Lance Norskog
-- Lance Norskog goks...@gmail.com

Utility program to extract a segment

2010-04-14 Thread Lance Norskog
_g9 into a new directory and generate a segments.gen for just those two segments. Is this all that's needed? -- Lance Norskog goks...@gmail.com

Re: IndexWriter and memory usage

2010-04-12 Thread Lance Norskog
ated.

[ADMIN] - Spam problems?

2007-10-10 Thread Norskog, Lance
Hi- Around Sept. 20 I started getting Japanese spam to this account. This is a special account I only use for the Solr and Lucene user mailing lists. Did anybody else get these, starting around 9/20? Lance Norskog

[JOB] Solr/Lucene developer wanted in startup: San Francisco Peninsula, CA, USA

2007-10-09 Thread Lance Norskog
solid funding and are a real business. We have a contract with a large company to lease our index and provide various services. Thank you for your time. Please contact me at [EMAIL PROTECTED] Lance Norskog 650-922-8831

UTF-8/unicode input in querying in Lucene

2007-09-14 Thread Lance Norskog
Hi- The page http://lucene.apache.org/java/docs/queryparsersyntax.html does not mention that \u Unicode syntax is supported. For example, \u0048\u0045\u004c\u004c\u004f is HELLO. Please add this to the page, it took experimentation to discover it. Thanks, Lance Norskog
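To make the \u example above concrete, here is a small Java sketch that decodes four-hex-digit \uXXXX escapes in a string. It illustrates the notation only and is not the query parser's own code; the class and method names are made up for this example.

```java
class UnicodeEscapeDemo {
    // Replaces each backslash-u followed by four hex digits with the character it names.
    // Anything else, including a malformed escape, is copied through unchanged.
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length() && isHex(s, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True if every character in s[from, to) is a hexadecimal digit.
    static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The escapes are doubled here only because this is a Java string literal.
        System.out.println(decode("\\u0048\\u0045\\u004c\\u004c\\u004f")); // prints HELLO
    }
}
```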