Re: confused about an entry in the FAQ

2008-05-24 Thread Stephane Nicoll
On Sat, May 24, 2008 at 12:39 AM, Emmanuel Bernard
<[EMAIL PROTECTED]> wrote:
> Hi Stephane
> Can you tell me a bit more about the deadlocks you experience with Hibernate
> Search. I have not seen such a situation so far and am interested to see how
> to fix the problem.

It is hard to externalize a unit test since it relies on many factors.
You need a significant amount of data (100,000 documents) and
you need to browse all the results in the Lucene index (15,000 results for
a typical query in my case). I still haven't found an efficient
way to do this, even though I only need one field from each search
result and the index is only 5MB. I could load that into memory, but that's
not a viable solution in the mid term.

I've stopped using Lucene. I am using SQL LIKE for now, and we are
investigating Oracle Text and the PostgreSQL text search extension.

If anyone has an idea, I'm interested. For instance, knowing that I
get fewer than 500 IDs from the database, would it be reasonable to build
a Lucene query like

"my search query AND (id IN (the list of 500 ids))" <- will this hit
the TooManyClauses exception? How can I build such a query efficiently?
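One way to express that query, sketched in plain Java that only builds a QueryParser-syntax string (IdRestrictedQuery and build are illustrative names, not Hibernate Search or Lucene API): make the user's query and a single OR group of id terms both required. In the Lucene 2.x releases of the time, the default 1024-clause limit is checked per BooleanQuery, so 500 id clauses nested in one group should stay under it, and BooleanQuery.setMaxClauseCount can raise the limit if needed. The sketch assumes the ids are indexed untokenized in a field named id.

```java
import java.util.List;
import java.util.StringJoiner;

final class IdRestrictedQuery {
    // Lucene's default BooleanQuery clause limit (BooleanQuery.getMaxClauseCount() in 2.x).
    static final int DEFAULT_MAX_CLAUSE_COUNT = 1024;

    // Builds QueryParser syntax "+(user query) +(id:1 id:2 ...)": the user's query
    // and the OR group of ids are both required; all ids sit in one nested group.
    static String build(String userQuery, String idField, List<Long> ids) {
        if (ids.isEmpty() || ids.size() > DEFAULT_MAX_CLAUSE_COUNT) {
            throw new IllegalArgumentException(
                "expected 1.." + DEFAULT_MAX_CLAUSE_COUNT + " ids, got " + ids.size());
        }
        StringJoiner idClauses = new StringJoiner(" ");
        for (long id : ids) {
            idClauses.add(idField + ":" + id);
        }
        return "+(" + userQuery + ") +(" + idClauses + ")";
    }
}
```

For example, build("my search query", "id", List.of(17L, 42L, 99L)) returns "+(my search query) +(id:17 id:42 id:99)". Whether this is fast enough at 500 ids depends on the index; it is a starting point, not a measured answer.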

Thanks,
Stéphane


>
> Emmanuel
>
> On May 12, 2008, at 06:13, Stephane Nicoll wrote:
>
>> Hibernate Search introduces deadlocks with multiple threads, and the
>> Lucene integration in Spring Modules does not seem to do what I want.
>
>



-- 
"Large Systems Suck: This rule is 100% transitive. If you build one,
you suck" -- S. Yegge

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Postcode/zipcode search

2008-05-24 Thread Ian Holsman (Lists)

Have you had a look at WOEIDs?
https://developer.yahoo.com/geo/


http://where.yahooapis.com/v1/places.q('NW10%207NY')
gives you details about the postcode, as well as a lat/long bounding box
and its 'real' name (Willesden, in this case).


http://where.yahooapis.com/v1/place/26556102/neighbors

gives you its neighbors,
http://where.yahooapis.com/v1/place/26556102/siblings
gives you its siblings,
and
http://where.yahooapis.com/v1/place/26556102/parent?select=long
gives you the place one level up (NW2 4, apparently).


So I'm guessing you could use two calls: one to get the WOEID of what the
user has entered, and a second to get its siblings. Using those, you can
construct a query to get all the entries in NW10 7NY.
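Ian's two-call approach can be sketched as a URL builder in plain Java. It only assembles the request URLs in the form shown above (GeoPlanetUrls is a hypothetical helper; the endpoints may no longer be available, and any application key the service required is not mentioned in the thread, so none is added). Fetching and parsing the responses is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class GeoPlanetUrls {
    private static final String BASE = "http://where.yahooapis.com/v1";

    // Call 1: look up the place (and its WOEID) for whatever the user typed.
    // URLEncoder produces form encoding, so "+" for a space is turned into "%20".
    static String placeLookup(String userText) {
        String q = URLEncoder.encode(userText, StandardCharsets.UTF_8).replace("+", "%20");
        return BASE + "/places.q('" + q + "')";
    }

    // Call 2: the places at the same level as the WOEID returned by call 1.
    static String siblings(long woeid) {
        return BASE + "/place/" + woeid + "/siblings";
    }
}
```

placeLookup("NW10 7NY") reproduces the first URL in Ian's message, and siblings(26556102L) the siblings URL.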



(Note: I don't work for Yahoo, but I work with people who used to.)

Mark Harwood wrote:

Can you not convert all postcodes to coordinates and do actual distance-based 
matching?
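Once postcodes have coordinates, distance-based matching reduces to a great-circle distance check. A minimal sketch using the haversine formula with a mean Earth radius of 6371 km; PostcodeDistance and Place are hypothetical names, and the coordinates are assumed to come from whatever geocoding data is licensed.

```java
import java.util.ArrayList;
import java.util.List;

final class PostcodeDistance {
    // Mean Earth radius in kilometres, a common choice for haversine estimates.
    static final double EARTH_RADIUS_KM = 6371.0;

    // A record whose postcode has already been geocoded (hypothetical type).
    static final class Place {
        final String postcode;
        final double lat;
        final double lon;

        Place(String postcode, double lat, double lon) {
            this.postcode = postcode;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance in km between two points given in degrees (haversine formula).
    static double km(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp guards against rounding pushing sqrt(a) just above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Linear scan: keep the places within radiusKm of the searched point.
    static List<Place> within(List<Place> places, double lat, double lon, double radiusKm) {
        List<Place> result = new ArrayList<>();
        for (Place p : places) {
            if (km(lat, lon, p.lat, p.lon) <= radiusKm) {
                result.add(p);
            }
        }
        return result;
    }
}
```

The linear scan is only for illustration; in an index, a lat/long bounding-box range query could narrow the candidates before this exact check.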

You will have to pay Royal Mail or third-party suppliers to get hold of the PAF
data required for this geocoding (despite having funded this already as a UK
taxpayer <g>)

Cheers
Mark

----- Original Message -----
From: Chris Mannion <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, 6 May, 2008 5:28:25 PM
Subject: Postcode/zipcode search

Hi all

I've got a bit of a niggling problem with how one of my searches is working
as opposed to how my users would like it to work.  We're indexing on UK
postcodes, which are in the format of a 2 to 4 character area code followed
by a 3 character street-specific code, e.g. "NW10 7NY" or "M11 1LQ".
We originally had the values being indexed as tokenized and used a very
simple search string in the format "postcode:xxx xxx", with no grouping,
boosting or fuzzy searching, just a straight search on whatever the user
entered.  This had the benefit of finding exact matches to searches and
allowing us to search on just the area part of the code to return all
records with that area code, e.g. a search on "NW2" returning anything
starting with NW2, like "NW2 6TB", "NW2 1ER" etc.

However, the downside to that was that searches could also return records
only tenuously related to what was searched for, e.g. a search for "NW10 7NY"
would also return a record with a postcode "SE9 6NY" because of the slight
match on "NY".  Obviously this was technically correct, but users
complained because their searches were returning records from completely
different areas.  Our first step to put this right was to turn off
tokenization of the field, which we also weren't happy with, so we have
continued to fiddle.

The current status is as follows: we index the values by stripping out
spaces and tokenizing them with a KeywordAnalyzer.  When searching we also
strip spaces from the search term entered and search with a
KeywordAnalyzer.  Searches for full postcodes, e.g. "NW10 7NY", find all
exact matches but also any full values that are partial matches (e.g. some
records just have "NW10" as their postcode field and the "NW10 7NY" search
pulls them back too), but searches for partial postcodes, e.g. "NW10", still
only find exact matches, i.e. they only pull back those records that have
just "NW10" as their postcode, rather than anything *starting* with NW10 as
we'd like them to.
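One alternative, not proposed in the thread: a full UK postcode always ends in a 3-character inward code (a digit plus two letters), so the area (outward) code can be split off at index time and stored in its own untokenized field. Searches then need only exact term matches, and "NW1" does not match "NW10 ..." the way a plain prefix query on the whole postcode would. The field names postcode and postcode_area are hypothetical; this sketch only computes the index keys and picks which field to query.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class PostcodeKeys {
    // Full UK postcode with spaces removed: outward code (2-4 chars) + inward code (digit, two letters).
    private static final Pattern FULL =
        Pattern.compile("[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}");

    // Strip all whitespace and upper-case, so "nw10 7ny" and "NW107NY" agree.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }

    // Area (outward) code: drop the 3-char inward code when the value is a full postcode,
    // otherwise treat the whole value as an area code. Index this into "postcode_area".
    static String area(String raw) {
        String pc = normalize(raw);
        return FULL.matcher(pc).matches() ? pc.substring(0, pc.length() - 3) : pc;
    }

    // {field, term} for a single exact-term query: full postcodes hit "postcode"
    // (indexed as normalize(pc)), anything else hits "postcode_area".
    static String[] queryTerm(String userInput) {
        String pc = normalize(userInput);
        return FULL.matcher(pc).matches()
            ? new String[] {"postcode", pc}
            : new String[] {"postcode_area", pc};
    }
}
```

With these keys, a search for "NW10" matches every record whose area code is exactly NW10, and a full "NW10 7NY" search no longer pulls back records that hold only "NW10".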

Can anyone help me get this working the way we need it to, please?







Is it possible to add multiple keywords to a single field from one doc?

2008-05-24 Thread Tom Conlon
Hi,

I haven't been able to find the answer to this question easily 
so any help would be appreciated.

Thanks,
Tom




Re: Improving search performance

2008-05-24 Thread Jason Rutherglen
There needs to be a solution to that problem.  I noticed it several years
ago, which is why I have since designed systems using MultiSearcher
concepts.  There should only be one instance of deleted docs per IndexReader
now that there is reopen.  Editing the live deleted docs does not seem like
something most people do.  One should be able to delete docs, flush, and then
have to reopen to see the changes, or at least this should be an option,
available by extending SegmentReader.  I also think it is important to keep the
ability to delete docs using an open IndexReader rather than deprecate it,
because realtime search systems cannot switch between IndexReaders and IndexWriters.
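A toy model of the "delete, then reopen to see the changes" behaviour Jason describes, in plain Java rather than Lucene (SnapshotDeletes and its methods are hypothetical, not SegmentReader's API). Searches read an immutable snapshot of the deleted-docs bits without taking a lock, which also avoids the synchronized deleted-doc check Otis mentions below, while deletions go into a pending set that only becomes visible on reopen().

```java
import java.util.BitSet;

final class SnapshotDeletes {
    // Snapshot read by searches; never mutated after it is published, so reads need no lock.
    private volatile BitSet visible = new BitSet();
    // Deletions made since the last reopen; guarded by this object's monitor.
    private final BitSet pending = new BitSet();

    // Search-path check: a plain read of the current snapshot.
    boolean isDeleted(int doc) {
        return visible.get(doc);
    }

    // Records a deletion; it stays invisible to searches until reopen().
    synchronized void delete(int doc) {
        pending.set(doc);
    }

    // "Reopen": publish a fresh copy that includes all deletions so far.
    synchronized void reopen() {
        BitSet next = (BitSet) visible.clone();
        next.or(pending);
        pending.clear();
        visible = next;
    }
}
```

The cost of this design is one bitset copy per reopen, traded for contention-free reads on every search.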

On Fri, May 23, 2008 at 11:24 PM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> Hi Emmanuel,
>
> Because there are some synchronized methods, like the one that checks
> whether a doc is deleted, that get called during search.  If you have a pile
> of threads (the original poster mentioned 100 threads), there could be
> contention around those methods.
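A minimal round-robin pool for the "small pool of searchers" suggestion, in plain Java and generic over the pooled type (RoundRobinPool is a hypothetical helper, not a Lucene or Hibernate Search class). The assumption that matters: each pooled searcher would need its own separately opened IndexReader, since searchers sharing one reader still contend on that reader's synchronized methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class RoundRobinPool<T> {
    private final List<T> items;
    private final AtomicInteger next = new AtomicInteger();

    // Creates `size` instances up front; with Lucene, each would wrap its own reader.
    RoundRobinPool(int size, Supplier<T> factory) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<T> created = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            created.add(factory.get());
        }
        this.items = List.copyOf(created);
    }

    // Hands out instances in turn; floorMod keeps the index valid after int overflow.
    T get() {
        return items.get(Math.floorMod(next.getAndIncrement(), items.size()));
    }
}
```

Searchers are shared rather than checked out, so no return step is needed; a pool size around the number of CPU cores would be a reasonable first guess to measure against.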
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message -----
> > From: Emmanuel Bernard <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Friday, May 23, 2008 6:41:36 PM
> > Subject: Re: Improving search performance
> >
> > Hi
> > Hibernate Search does not pool the Searcher but pools the underlying
> > IndexReader(s). From what I've seen, a Searcher is stateless and all
> > the state is kept in the readers, so this is essentially equivalent to
> > reusing the searcher.
> >
> > Out of curiosity, why is a pool of Searchers more efficient?
> >
> > Emmanuel
> >
> > On  May 22, 2008, at 13:22, Otis Gospodnetic wrote:
> >
> > > Some quick feedback.  Those are all very expensive queries
> > > (wildcards and ranges).  The first thing I'd do is try without
> > > Hibernate Search (to make sure HS is not the bottleneck).  100
> > > threads is a lot.  I'm guessing you are reusing your searcher, which
> > > is good, but you will actually improve performance a bit if you work
> > > with a small pool of searchers instead of a single searcher.
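To see why the boss* clauses are expensive: in the Lucene 2.x releases of the time, a prefix query was rewritten into one term clause per indexed term starting with the prefix, and the query below does this in four fields. This plain-Java sketch models each field's term dictionary as a sorted set (PrefixExpansion is hypothetical; it is not Lucene's term enumeration) and counts the resulting clauses.

```java
import java.util.List;
import java.util.NavigableSet;

final class PrefixExpansion {
    // Terms starting with `prefix` lie in [prefix, prefix + '\uffff') of a sorted dictionary.
    // (A term with '\uffff' right after the prefix would be missed; fine for a sketch.)
    static NavigableSet<String> expand(NavigableSet<String> terms, String prefix) {
        return terms.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    // One term clause per matching term per field, e.g. boss* over t, d, dd and tg.
    static int clauseCount(List<? extends NavigableSet<String>> fieldDictionaries, String prefix) {
        int total = 0;
        for (NavigableSet<String> dictionary : fieldDictionaries) {
            total += expand(dictionary, prefix).size();
        }
        return total;
    }
}
```

With a dictionary of {boss, bosses, bossy, boston, box}, boss* expands to three terms per field, so two such fields already give six clauses; real vocabularies make this grow quickly, which is also how prefix queries run into the clause limit.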
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > >
> > > ----- Original Message -----
> > >> From: Rakesh Shete
> > >> To: [EMAIL PROTECTED]; java-user@lucene.apache.org
> > >> Sent: Thursday, May 22, 2008 1:16:13 PM
> > >> Subject: Improving search performance
> > >>
> > >>
> > >> Hi all,
> > >>
> > >> I have an index of size 85MB. My query looks as follows:
> > >>
> > >> +(t:boss* d:boss* dd:boss* tg:boss*) +st:act +ntid:0 +cid:1 +dr:[20080410 TO 20081010] +rT:[002 TO 005]
> > >>
> > >> All the fields used in the query are stored in the index (Indexed
> > >> & Stored).
> > >>
> > >> The query response time is around 30 seconds when running
> > >> multiple simultaneous threads (~100). The number of matches is ~30k,
> > >> but I retrieve only the top 100 results. I am using Hibernate Search,
> > >> which is a wrapper around Lucene. I retrieve the "id" field from the
> > >> index, which is also indexed and stored.
> > >>
> > >> What is the approach that I should take for improving the
> > >> performance?
> > >>
> > >> Will just indexing the values without storing them work (Index &
> > >> UnStored)?
> > >>
> > >> My machine configuration is:
> > >> P4 2.66GHz 1.99 GB RAM
> > >>
> > >> The code for searching runs in JBoss application server which has a
> > >> maximum heap
> > >> size of 1024MB. When these 100 threads are running in the
> > >> application server the
> > >> CPU utilization is 100% and JBoss consumes all of the heap size.
> > >>
> > >> Any pointers on index optimization would be really appreciated.
> > >>
> > >> --Regards,
> > >> Rakesh Shete
> > >>
> > >
> > >
> > >
> >
> >
>
>
>
>


Re: Is it possible to add multiple keywords to a single field from one doc?

2008-05-24 Thread Mark Miller

Tom Conlon wrote:

Hi,

I haven't been able to find the answer to this question easily 
so any help would be appreciated.


Thanks,
Tom


  
Could you elaborate a bit, Tom? Your question is not very clear. If by
"keywords" you mean terms, and instead of "from one doc" you mean "to one
doc", then the answer is yes. But as phrased, I am not sure what the
question is.
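Under Mark's reading (several terms in one field of one document), Lucene handles this by adding several Field instances with the same name to one Document; Document.getValues(name) then returns all of them. The snippet below is only a plain-Java toy model of that behaviour with a hypothetical MultiValuedDoc class, not Lucene code: values accumulate per field name, and an untokenized keyword lookup matches if any value equals the term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiValuedDoc {
    private final Map<String, List<String>> fields = new HashMap<>();

    // Like adding another field with the same name: values accumulate, nothing is overwritten.
    MultiValuedDoc add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    // All values stored under one field name, in insertion order.
    List<String> getValues(String name) {
        return fields.getOrDefault(name, List.of());
    }

    // Untokenized keyword match: the document matches if any value equals the term exactly.
    boolean matchesKeyword(String name, String term) {
        return getValues(name).contains(term);
    }
}
```

A document built with add("keyword", "lucene").add("keyword", "search") then matches a keyword search for either value.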


- Mark
