How to search special characters in Lucene

2009-04-20 Thread Uday Kumar Maddigatla
Hi, I'm new to Lucene. I downloaded Lucene 2.4.1. I have an XML file that contains a few special characters such as 'å', 'ø', '°', etc. (these are Danish language elements). How can I search for these? Uday Kumar Reddy Maddigatla Software Engineer(Progrator|gatetrade) MACH

Re: Faceting, Sort and DocIDSet

2009-04-20 Thread John Wang
Hi David: We built bobo-browse specifically for these types of use cases: http://code.google.com/p/bobo-browse Let me know if you need any help getting it going. -John On Mon, Apr 20, 2009 at 12:59 PM, Karsten F. wrote: > > Hi David, > > correct: you should avoid reading the content o

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-20 Thread Bradford Stephens
Thanks for the responses, everyone. Where shall we host? My company can offer space in our building in Factoria, but it's not exactly a 'cool' or 'fun' place. I can also reserve a room at a local library. I can bring some beer and light refreshments. On Mon, Apr 20, 2009 at 7:22 AM, Matthew Hall

Re: IndexWriter update method

2009-04-20 Thread Erick Erickson
I don't think you *can* create a Term that spans two fields. Perhaps you'd be better off just doing a search, getting the doc ID back then adding a new version of the document. You *could* think about reindexing your corpus and indexing an additional field that was the concatenation of the two fie
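
The concatenation idea above can be sketched in plain Java. This is only an illustration under stated assumptions: `CompositeKey` is a hypothetical helper (not a Lucene class), the two parts stand for the `partno` and `storeLoc` fields mentioned elsewhere in this thread, and the `|` separator is an arbitrary choice. A separator is used so that ("12", "3") and ("1", "23") do not collapse into the same key.

```java
/** Hypothetical helper, not part of Lucene: builds one key string from two field values. */
final class CompositeKey {
    // Assumed separator; any character that never occurs in the field values would do.
    private static final char SEPARATOR = '|';

    private CompositeKey() {
    }

    /** Joins the two values with the separator, rejecting values that already contain it. */
    static String of(String partNo, String storeLoc) {
        if (partNo.indexOf(SEPARATOR) >= 0 || storeLoc.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException("field value contains the separator character");
        }
        return partNo + SEPARATOR + storeLoc;
    }
}
```

The resulting string would presumably be stored in an additional untokenized field at index time, so that a single Term on that field identifies the document when calling IndexWriter.updateDocument(Term, Document).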

Re: ebook resources - including lucene in action

2009-04-20 Thread Erik Hatcher
It is not legal to share purchased e-books in this manner. Please purchase copies of the books you read, otherwise authors have very little incentive to dedicate months (14 months in the case of Lucene in Action, first edition) of their lives to writing this content. Erik On Apr 2

Re: readModifiedUTF8String stuck

2009-04-20 Thread MakMak
Mike, I made a standalone tool like you suggested which prints out the size of each doc in the index; none of the docs is more than 1 MB. The queries are the same. They repeat throughout the test. We give about 6 GB of heap to the application, and yes, we are on a 64-bit JVM. I hit upon anothe

Re: Query scoring

2009-04-20 Thread Chris Hostetter
Erick means we need to see *all* of your code (including how you get the score and the Explanation you are printing) to understand why they don't match. All you've shown is the output of your program and the generation of a Hits object. -Hoss

RE: Faceting, Sort and DocIDSet

2009-04-20 Thread Karsten F.
Hi David, correct: you should avoid reading the content of a document inside a HitCollector. Normally that means caching everything you need in main memory. A very simple and fast case is a facet with only 255 possible values and exactly one value per document. In this case you need only a byte[IndexReader.ma
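
A minimal sketch of the one-byte-per-document facet described above, written in plain Java so it runs without Lucene on the classpath. The class name `ByteFacetCounter` is hypothetical, and `java.util.BitSet` stands in for whatever structure holds the matching document IDs; the sketch also assumes the array has one entry per document ID.

```java
import java.util.BitSet;

/** Hypothetical helper, not part of Lucene: counts facet values over a set of matching docs. */
final class ByteFacetCounter {
    // One byte per document; the value is that document's facet ordinal.
    private final byte[] facetOfDoc;

    ByteFacetCounter(byte[] facetOfDoc) {
        this.facetOfDoc = facetOfDoc;
    }

    /** Returns counts[ordinal] = number of matching docs that carry that facet value. */
    int[] count(BitSet matchingDocs) {
        int[] counts = new int[256];
        for (int doc = matchingDocs.nextSetBit(0); doc >= 0; doc = matchingDocs.nextSetBit(doc + 1)) {
            // Mask with 0xFF because Java bytes are signed.
            counts[facetOfDoc[doc] & 0xFF]++;
        }
        return counts;
    }
}
```

Counting then touches only the in-memory array, never the stored document content, which is the point of the advice above.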

Re: ebook resources - including lucene in action

2009-04-20 Thread Matthew Hall
Strange.. as far as I can tell I never even got this email at all, was it not originally sent to the lucene lists? Matt Grant Ingersoll wrote: Lest you think silence equals acceptance... This is not appropriate use of these lists. -Grant On Apr 19, 2009, at 11:58 PM, wu fuheng wrote: welc

Re: ebook resources - including lucene in action

2009-04-20 Thread Grant Ingersoll
Lest you think silence equals acceptance... This is not appropriate use of these lists. -Grant On Apr 19, 2009, at 11:58 PM, wu fuheng wrote: welcome to download http://www.ultraie.com/admin/flist.php

RE: IndexWriter update method

2009-04-20 Thread Newman, Billy
What if your unique ID is a composite of two fields when you create the document? I.e. doc.add(new Field("partno", "123345", Field.Store.whatever, Field.Index.UN_TOKENIZED)); doc.add(new Field("storeLoc", "Springfield", Field.Store.whatever, Field.Index.UN_TOKENIZED)); How do you create a Term fo

RE: Faceting, Sort and DocIDSet

2009-04-20 Thread David Seltzer
Robert, 99% of the documents are inserted as soon as we discover them, so the INDEXORDER is largely correct. However, two factors keep me from using INDEXORDER. The first is that a small portion of our records (1%) enter the index late (so they appear out of order with respect to the other 99%

Solr webinar

2009-04-20 Thread Erik Hatcher
(excuse the cross-post) I'm presenting a webinar on Solr. Registration is limited, so sign up soon. Looking forward to "seeing" some of you there! Thanks, Erik "Got data? You can build your own Solr-powered Search Engine!" Erik Hatcher, Lucene/Solr Committer and author, will show

Re: Faceting, Sort and DocIDSet

2009-04-20 Thread Robert Muir
David, I have one suggestion for your large index: is it possible to index these documents ordered by date (and ingest new docs in date order)? That way index order = date order, and you can do this sort very quickly by using Sort.INDEXORDER. With huge indexes I try to see if there's a way I can hav

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-20 Thread Matthew Hall
Same here, sadly there isn't much call for Lucene user groups in Maine. It would be nice though ^^ Matt Amin Mohammed-Coleman wrote: I would love to come but I'm afraid I'm stuck in rainy old England :( Amin On 18 Apr 2009, at 01:08, Bradford Stephens wrote: OK, we've got 3 people... t

Re: LocalLucene/Lucene Spatial

2009-04-20 Thread patrick o'leary
Honestly I'm more focused on intelligent ways to do faster and more complex GIS features. As I said, the most time-consuming part is the DistanceFilter, which is required to sort by distance. I'm playing with several ideas on how to do those better, and get a win there. However if anyone wants to

RE: Faceting, Sort and DocIDSet

2009-04-20 Thread David Seltzer
Hi Karsten, My index contains about 100M documents, and I'm trying to count results on around 300 facets. At the moment I'm keeping a set of cached facet bitsets and then comparing the query result against those bitsets. Performance is pretty lousy. It takes more than 2s to calculate the cardinali
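
The cached-bitset approach described above amounts to intersecting the query result with each facet's bitset and taking the cardinality. A minimal sketch, assuming `java.util.BitSet` as a stand-in for the bitset class actually in use; `FacetBitsetCounter` is a hypothetical name:

```java
import java.util.BitSet;

/** Hypothetical helper: counts how many query hits fall into one cached facet bitset. */
final class FacetBitsetCounter {
    private FacetBitsetCounter() {
    }

    /** Returns the number of documents set in both bitsets, leaving both inputs unchanged. */
    static int intersectionCount(BitSet queryHits, BitSet facetDocs) {
        BitSet both = (BitSet) queryHits.clone(); // copy so the cached sets are not modified
        both.and(facetDocs);
        return both.cardinality();
    }
}
```

With around 300 facets this runs once per facet per query, and each pass scans a bitset sized to the whole index, which is consistent with the cost reported above.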

Re: A Challenge!: Combining 2 searches into a single resultset?

2009-04-20 Thread theDude_2
Thanks! I wound up indexing both versions in the same index, and boosting the words that appeared in the "good word" list! Thanks again for your advice! Matthew Hall-7 wrote: > > Erm, I likely should have mentioned that this technique requires the use > of a MultiFieldQueryParser. > > Matt

Reply to "Search for synonyms - implementation for review"

2009-04-20 Thread liat oren
Hi, I saw a very old thread that suggests an implementation for synonyms that assigns different weights to different synonyms and applies a penalty factor to synonyms, to avoid ranking documents with the synonyms ahead of documents with the original words. http://mail-archives.apache.org/mod

RE: LocalLucene/Lucene Spatial

2009-04-20 Thread Uwe Schindler
Have you thought about subclassing MultiTermQuery and providing a FilteredTermEnum? When you do this, the query can be either a BooleanQuery or a Filter. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: patric