Re: confused about an entry in the FAQ
On Sat, May 24, 2008 at 12:39 AM, Emmanuel Bernard <[EMAIL PROTECTED]> wrote:
> Hi Stephane
> Can you tell me a bit more about the deadlocks you experience with Hibernate
> Search. I have not seen such a situation so far and am interested to see how
> to fix the problem.

It is hard to extract a unit test since it relies on many factors. You need a significant amount of data (100,000 documents) and you need to browse all results in the Lucene index (15,000 results for a typical query in my case). I still haven't found an optimized way to do this, even though I only need one field from each search result and the index is 5MB. I could put that into memory, but that's not a viable solution mid-term.

I've stopped using Lucene. I am using SQL LIKE for now and we are investigating Oracle Text and the Postgres text extension.

If anyone has an idea, I'm interested. For instance, knowing that I get fewer than 500 IDs from the database, would it be reasonable to build a Lucene query like "my search query AND (id IN (the list of 500 ids))"? Will this hit the TooManyClauses exception? How can I build such a query efficiently?

Thanks,
Stéphane

> Emmanuel
>
> On May 12, 2008, at 06:13, Stephane Nicoll wrote:
>
>> Hibernate Search introduces deadlock with multiple threads and the
>> lucene integration in spring modules does not seem to do what I want.

--
"Large Systems Suck: This rule is 100% transitive. If you build one, you suck" -- S.Yegge

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
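On the id-list question in the message above: Lucene's BooleanQuery has a default limit of 1024 clauses, so a list of 500 ids on its own would stay under it. The sketch below is only an illustration in plain Python of how such a query string could be assembled in batches; the field name `id`, the function name `id_filter_queries`, and the idea of batching are assumptions, not something from the thread. It also assumes the user's own query does not itself expand into many clauses (wildcards can).

```python
# Lucene's default BooleanQuery clause limit (configurable in Lucene itself).
MAX_CLAUSES = 1024


def id_filter_queries(user_query, ids, max_clauses=MAX_CLAUSES):
    """Yield query strings that restrict user_query to one batch of ids each.

    Hypothetical helper: the field name "id" is an assumption.
    """
    ids = list(ids)
    for start in range(0, len(ids), max_clauses):
        batch = ids[start:start + max_clauses]
        # One optional clause per id; the group as a whole is required.
        id_clause = " ".join(f"id:{doc_id}" for doc_id in batch)
        yield f"+({user_query}) +({id_clause})"


if __name__ == "__main__":
    for query in id_filter_queries("title:boss", [3, 17, 42]):
        print(query)
```

With more ids than the limit allows, the generator produces several queries whose results would have to be merged by the caller, which may or may not be acceptable depending on how scoring is used.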
Re: Postcode/zipcode search
Have you had a look at WOEIDs?
https://developer.yahoo.com/geo/

http://where.yahooapis.com/v1/places.q('NW10%207NY')
gives you details about the postcode, as well as a lat/long bounding box and the 'real' name of it (Willesden in this case).

http://where.yahooapis.com/v1/place/26556102/neighbors
gives you the neighbors to it,

http://where.yahooapis.com/v1/place/26556102/siblings
gives you its children, and

http://where.yahooapis.com/v1/place/26556102/parent?select=long
gives you 1 level up (NW2 4, apparently).

So I'm guessing you could use 2 calls: the 1st to get the WOEID of what the user has entered, the 2nd to get the siblings. Using that you can construct a query to get all the entries in NW10 7NY.

(Note: I don't work for Yahoo, but work with people who used to.)

mark harwood wrote:
> Can you not convert all postcodes to coordinates and do actual distance-based matching?
> You will have to pay Royal Mail or 3rd party suppliers to get hold of the PAF data required for this geocoding (despite having funded this already as a UK tax payer - g)
>
> Cheers
> Mark
>
> ----- Original Message ----
> From: Chris Mannion <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, 6 May, 2008 5:28:25 PM
> Subject: Postcode/zipcode search
>
> Hi all
>
> I've got a bit of a niggling problem with how one of my searches is working as opposed to how my users would like it to work. We're indexing on UK postcodes, which are in the format of a 3 or 4 character area code followed by a 3 or 4 character street specific code, e.g. "NW10 7NY" or "M11 1LQ". We originally had the values being indexed as tokenized and used a very simple search string in the format "postcode:xxx xxx", with no grouping or boosting or fuzzy searching, just a straight search on whatever the user entered.
>
> This had the benefit of finding exact matches to searches and allowing us to search just on the area part of the code to return all records with that area code, e.g. a search on "NW2" returning anything starting NW2, like "NW2 6TB", "NW2 1ER" etc. However, the downside was that searches could also return records only tenuously related to what was searched for, e.g. a search for "NW10 7NY" would also return a record with a postcode "SE9 6NY" because of the slight match of the "NY". Obviously this was technically correct, but users complained because their searches were returning records from completely different areas.
>
> Our first step to put this right was to take off the tokenization of the field, which we also weren't happy with, so we have continued to fiddle. The current status is as follows: we index the values by stripping out spaces and tokenizing them, and use a KeywordAnalyzer. In searching we also strip spaces from the search term entered and search with a KeywordAnalyzer. Searches for full postcodes, e.g. "NW10 7NY", find all exact matches but also any full values that are partial matches (e.g. some records just have "NW10" as their postcode field and the "NW10 7NY" search pulls them back too), but searches for partial postcodes, e.g. "NW10", still only find exact matches, i.e. they only pull back those records that have just "NW10" as their postcode, rather than anything *starting* with NW10 as we'd like.
>
> Can anyone help me get this working in the way we need it to please?
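One way to picture the matching rule being asked for is to split each postcode into its area (outward) part and its street (inward) part and compare them separately. The sketch below is plain Python, not Lucene code, and rests on assumptions: it relies on the inward part of a full UK postcode always being one digit followed by two letters, and the names `split_postcode` and `matches` are made up for illustration. In an index, the same idea would roughly correspond to storing the two parts as two separate untokenized fields.

```python
import re

# Assumed shape of a full, space-stripped UK postcode:
# outward part (e.g. "NW10"), then inward part (digit + two letters, e.g. "7NY").
FULL_POSTCODE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?)([0-9][A-Z]{2})$")


def split_postcode(raw):
    """Return (outward, inward); inward is '' when only the area part is given."""
    compact = re.sub(r"\s+", "", raw).upper()
    match = FULL_POSTCODE.match(compact)
    if match:
        return match.group(1), match.group(2)
    return compact, ""


def matches(query, stored):
    """True if stored belongs to the area (or is the exact postcode) in query."""
    query_out, query_in = split_postcode(query)
    stored_out, stored_in = split_postcode(stored)
    if query_out != stored_out:
        return False
    # An area-only query matches every record in that area;
    # a full query needs the street part to match as well.
    return query_in == "" or query_in == stored_in


if __name__ == "__main__":
    for stored in ["NW10 7NY", "NW10", "NW1 4AB", "SE9 6NY"]:
        print(stored, matches("NW10", stored), matches("NW10 7NY", stored))
```

Comparing the area part as a whole, rather than as a raw string prefix, is a deliberate choice here: a search for "NW1" should not pull back "NW10 7NY".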
Is it possible to add multiple keywords to a single field from one doc?
Hi,

I haven't been able to find the answer to this question easily so any help would be appreciated.

Thanks,
Tom
Re: Improving search performance
There needs to be a solution to that problem. I noticed it several years ago, which is why I have designed systems around MultiSearcher concepts ever since. There should only be one instance of deleted docs per IndexReader now that there is reopen. Editing the live deleted docs does not seem like something most people do. One should be able to delete docs, flush, and then have to reopen to get the changes. Or this could be one option provided by extending SegmentReader. I also think it is important to keep the ability to delete docs using an open IndexReader rather than deprecate it, because realtime search systems cannot switch between IndexReaders and IndexWriters.

On Fri, May 23, 2008 at 11:24 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Hi Emmanuel,
>
> Because there are some synchronized methods, like the one that checks
> whether a doc is deleted, that get called during search. If you have a pile
> of threads (the original poster mentioned 100 threads) there could be
> contention around those methods.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Emmanuel Bernard <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Friday, May 23, 2008 6:41:36 PM
> > Subject: Re: Improving search performance
> >
> > Hi
> > Hibernate Search does not pool the Searcher but pools the underlying
> > IndexReader(s). From what I've seen, a Searcher is stateless and all
> > the state is kept in the Readers, so this is essentially equivalent to
> > reusing the searcher.
> >
> > Out of curiosity, why is a pool of Searchers more efficient?
> >
> > Emmanuel
> >
> > On May 22, 2008, at 13:22, Otis Gospodnetic wrote:
> >
> > > Some quick feedback. Those are all very expensive queries
> > > (wildcards and ranges). The first thing I'd do is try without
> > > Hibernate Search (to make sure HS is not the bottleneck). 100
> > > threads is a lot. I'm guessing you are reusing your searcher, which
> > > is good, but you will actually improve performance a bit if you work
> > > with a small pool of searchers instead of a single searcher.
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > > ----- Original Message ----
> > >> From: Rakesh Shete
> > >> To: [EMAIL PROTECTED]; java-user@lucene.apache.org
> > >> Sent: Thursday, May 22, 2008 1:16:13 PM
> > >> Subject: Improving search performance
> > >>
> > >> Hi all,
> > >>
> > >> I have an index of size 85MB. My query looks as follows:
> > >>
> > >> +(t:boss* d:boss* dd:boss* tg:boss*) +st:act +ntid:0 +cid:1 +dr:[20080410 TO 20081010] +rT:[002 TO 005]
> > >>
> > >> All the fields used in the query are stored in the index (Indexed
> > >> & Stored).
> > >>
> > >> The query response time for me is around 30 seconds when running
> > >> multiple simultaneous threads (~100). The number of matches is ~30k
> > >> but I retrieve only the top 100 results. I am using Hibernate Search,
> > >> which is a wrapper around Lucene. I retrieve the "id" field from the
> > >> index, which is also indexed and stored.
> > >>
> > >> What is the approach that I should take for improving the
> > >> performance?
> > >>
> > >> Will just indexing the values without storing them work (Index &
> > >> UnStored)?
> > >>
> > >> My machine configuration is:
> > >> P4 2.66GHz 1.99 GB RAM
> > >>
> > >> The code for searching runs in JBoss application server, which has a
> > >> maximum heap size of 1024MB. When these 100 threads are running in
> > >> the application server the CPU utilization is 100% and JBoss consumes
> > >> all of the heap.
> > >>
> > >> Any pointers on index optimization would be really appreciated.
> > >>
> > >> --Regards,
> > >> Rakesh Shete
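The "small pool of searchers" suggestion quoted in this thread can be pictured as a round-robin dispenser that spreads threads over a few instances, so that they are not all queuing on the synchronized methods of a single one. The sketch below is plain Python and purely illustrative: `SearcherPool` and the `make_searcher` factory are hypothetical names, and it assumes each pooled searcher would be opened over its own reader, since sharing one reader would leave the contention in place.

```python
import itertools
import threading


class SearcherPool:
    """Hands out searchers round-robin so threads spread across instances."""

    def __init__(self, make_searcher, size=4):
        # make_searcher is a hypothetical factory; each call is assumed to
        # open an independent searcher (with its own underlying reader).
        self._searchers = [make_searcher() for _ in range(size)]
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def acquire(self):
        # Only the counter increment is guarded; searching itself runs
        # outside the lock, on whichever instance was picked.
        with self._lock:
            index = next(self._counter) % len(self._searchers)
        return self._searchers[index]


if __name__ == "__main__":
    pool = SearcherPool(object, size=3)
    print([id(pool.acquire()) for _ in range(6)])
```

The pool size is a tuning knob rather than a known-good value; each extra searcher also costs memory, which matters on a heap that is already full.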
Re: Is it possible to add multiple keywords to a single field from one doc?
Tom Conlon wrote:
> Hi,
>
> I haven't been able to find the answer to this question easily so any help would be appreciated.
>
> Thanks,
> Tom

Could you elaborate a bit, Tom? Your question is not very clear. If by keywords you mean terms, and instead of "from one doc" you mean "to one doc", then the answer is yes. But as phrased, I am not sure what the question is.

- Mark