Re: A codec moment or pickle

2015-02-12 Thread Benson Margulies
WHOOPS. First sentence was, until just before I clicked 'send', "Hardware has .5T of RAM. Index is relatively small (20g) ..." On Thu, Feb 12, 2015 at 4:51 PM, Benson Margulies wrote: > Robert, > > Let me lay out the scenario. > > Hardware has .5T of Index is relatively small. Application pro

Re: A codec moment or pickle

2015-02-12 Thread Benson Margulies
Robert, Let me lay out the scenario. Hardware has .5T of Index is relatively small. Application profiling shows a significant amount of time spent codec-ing. Options as I see them: 1. Use DPF complete with the irritation of having to have this spurious codec name in the on-disk format that has

Re: occurrence of two terms with the highest frequency

2015-02-12 Thread Ian Lea
I think you can do it with 4 simple queries: 1) +flying +shooting 2) +flying +fighting etc. or BooleanQuery equivalents with MUST clauses. Use org.apache.lucene.search.TotalHitCountCollector and it should be blazingly fast, even if you have more than 100 docs. -- Ian. On Thu, Feb 12, 2015 at 5:42 PM, Ma
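To make the counting idea concrete, here is a minimal plain-Java sketch. It is not Lucene code: the class `CooccurrenceCounts`, its methods, and the sample documents are hypothetical, and a tiny in-memory postings map stands in for the index. Intersecting two postings sets gives the same number a hit count for "+a +b" would report.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: an in-memory stand-in for a Lucene index.
class CooccurrenceCounts {

    /** Maps each lowercased token to the set of document ids that contain it. */
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            String[] tokens = docs.get(docId).toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
            for (String token : tokens) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
                }
            }
        }
        return postings;
    }

    /** Counts documents containing both terms, i.e. the hit count of "+a +b". */
    static int countBoth(Map<String, Set<Integer>> postings, String a, String b) {
        Set<Integer> both = new HashSet<>(
                postings.getOrDefault(a.toLowerCase(Locale.ROOT), Collections.emptySet()));
        both.retainAll(postings.getOrDefault(b.toLowerCase(Locale.ROOT), Collections.emptySet()));
        return both.size();
    }

    public static void main(String[] args) {
        // Made-up sample documents, not data from the thread.
        List<String> docs = Arrays.asList(
                "flying and shooting", "flying while fighting",
                "shooting then flying", "looking around");
        Map<String, Set<Integer>> postings = index(docs);
        for (String other : new String[] {"shooting", "fighting", "looking"}) {
            System.out.println("flying + " + other + ": " + countBoth(postings, "flying", other));
        }
    }
}
```

The pair with the largest count is the answer to the use case. With a real index, each `countBoth` call would correspond to one two-clause MUST query whose hits are only counted, not scored or collected.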

Re: Proximity query

2015-02-12 Thread Maisnam Ns
Hi, I googled it but could not find the jars for these classes. Can someone tell me where to get them? import org.apache.lucene.corpus.stats.IDFCalc; import org.apache.lucene.corpus.stats.TFIDFPriorityQueue; import org.apache.lucene.corpus.stats.TermIDF; Thanks On Thu, Feb 12, 2015 at 11:01 PM, M

occurrence of two terms with the highest frequency

2015-02-12 Thread Maisnam Ns
Hi, Can someone help me with this use case? Use case: Say there are 4 keywords, 'Flying', 'Shooting', 'fighting' and 'looking', to search for in 100 documents. Consider that 'Flying' and 'Shooting' co-occur (together) in 70 documents, whereas 'Flying' and 'fighting' co-occur in 14 documents 'Flyin

Re: Proximity query

2015-02-12 Thread Maisnam Ns
Hi Allison and Sujit, Thanks so much for your links. I am so happy: they cover almost exactly my use case. Allison, I will be sure to get back to you if I have more questions. Regards NS On Thu, Feb 12, 2015 at 10:49 PM, Sujit Pal wrote: > I did something like this

Re: Proximity query

2015-02-12 Thread Sujit Pal
I did something like this sometime back. The objective was to find patterns surrounding some keywords of interest so I could find keywords similar to the ones I was looking for, sort of like a poor man's word2vec. It uses SpanQuery as Jigar said, and you can find the code here (I believe it was wri

Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

2015-02-12 Thread Robert Muir
On Thu, Feb 12, 2015 at 11:58 AM, McKinley, James T wrote: > Hi Robert, > > Thanks for responding to my message. Are you saying that you or others have > encountered problems running Lucene 4.8+ on the 64-bit Java SE 1.7 JVM with > G1 and was it on Windows or on Linux? If so, where can I find

RE: Proximity query

2015-02-12 Thread Allison, Timothy B.
Might also look at concordance code on LUCENE-5317 and here: https://github.com/tballison/lucene-addons/tree/master/lucene-5317 Let me know if you have any questions. -Original Message- From: Maisnam Ns [mailto:maisnam...@gmail.com] Sent: Thursday, February 12, 2015 11:57 AM To: java-us

RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

2015-02-12 Thread McKinley, James T
Hi Robert, Thanks for responding to my message. Are you saying that you or others have encountered problems running Lucene 4.8+ on the 64-bit Java SE 1.7 JVM with G1 and was it on Windows or on Linux? If so, where can I find out more? I only looked into the one bug because that was the only

Re: Proximity query

2015-02-12 Thread Maisnam Ns
Hi Shah, Thanks for your reply. I will try to google SpanQuery; meanwhile, if you have some links, can you please share them? Thanks On Thu, Feb 12, 2015 at 10:17 PM, Jigar Shah wrote: > This concept is called Proximity Search in general. > > In Lucene they are achieved using SpanQuery. > > On Thu, Feb 1

Re: Proximity query

2015-02-12 Thread Jigar Shah
This concept is called Proximity Search in general. In Lucene it is achieved using SpanQuery. On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns wrote: > Hi, > > Can someone help me if this use case is possible or not with lucene > > Use case: I have a string say 'Japan' appearing in 10 documents

Proximity query

2015-02-12 Thread Maisnam Ns
Hi, Can someone tell me whether this use case is possible with Lucene? Use case: I have a string, say 'Japan', appearing in 10 documents, and I want to get back results which contain two words before 'Japan' and two words after 'Japan', maybe something like this ' Economy of Japan is gro
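As a rough illustration of the requested output, the following plain-Java sketch extracts a window of words around each occurrence of a keyword. It is not Lucene's SpanQuery and touches no index: the class `ContextWindow`, the method `windows`, and the sample sentence are all hypothetical, and the text is assumed to be available as a plain string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a keyword-in-context window over raw text.
class ContextWindow {

    /**
     * Returns, for every occurrence of keyword, a snippet made of up to
     * `width` whitespace-separated tokens before it, the match itself,
     * and up to `width` tokens after it.
     */
    static List<String> windows(String text, String keyword, int width) {
        String[] tokens = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            // Strip leading/trailing punctuation so "Japan," still matches "Japan".
            String bare = tokens[i].replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (bare.equalsIgnoreCase(keyword)) {
                int from = Math.max(0, i - width);
                int to = Math.min(tokens.length, i + width + 1);
                result.add(String.join(" ", Arrays.copyOfRange(tokens, from, to)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample sentence, not taken from the thread.
        for (String snippet : windows("The economy of Japan is growing fast", "Japan", 2)) {
            System.out.println(snippet);
        }
    }
}
```

Windows are clipped at the start and end of the text, so a match near either edge yields a shorter snippet. Running this over stored field text for each matching document would be one way to approximate the behaviour described; a positional query in the index is what the replies in this thread suggest instead.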

Re: A codec moment or pickle

2015-02-12 Thread Robert Muir
On Thu, Feb 12, 2015 at 8:51 AM, Benson Margulies wrote: > On Thu, Feb 12, 2015 at 8:43 AM, Robert Muir wrote: > >> Honestly i dont agree. I don't know what you are trying to do, but if >> you want file format backwards compat working, then you need a >> different FilterCodec to match each lucene

RE: A codec moment or pickle

2015-02-12 Thread Uwe Schindler
Hi, FYI, this is the same issue that Locales have/had in ICU! If you try to render an error message in Locale's constructors, it breaks with an NPE, because the default Locale is not yet there... I think they implemented some "fallback" that is guaranteed to be there. But this would not help you

Re: A codec moment or pickle

2015-02-12 Thread Benson Margulies
On Thu, Feb 12, 2015 at 8:43 AM, Robert Muir wrote: > Honestly i dont agree. I don't know what you are trying to do, but if > you want file format backwards compat working, then you need a > different FilterCodec to match each lucene codec. > > Otherwise your codec is broken from a back compat st

Re: A codec moment or pickle

2015-02-12 Thread Robert Muir
Honestly I don't agree. I don't know what you are trying to do, but if you want file format backwards compat working, then you need a different FilterCodec to match each Lucene codec. Otherwise your codec is broken from a back compat standpoint. Wrapping the latest is an antipattern here. On Thu,

Re: A codec moment or pickle

2015-02-12 Thread Benson Margulies
Based on reading the same comments you read, I'm pretty doubtful that Codec.getDefault() is going to work. It seems to me that this situation renders the FilterCodec a bit hard to use, at least given the 'every release deprecates a codec' sort of pattern. On Thu, Feb 12, 2015 at 3:20 AM, Uwe

RE: A codec moment or pickle

2015-02-12 Thread Uwe Schindler
Hi, How about Codec.getDefault()? It does indeed not necessarily return the newest one (if somebody changes the default using Codec.setDefault()), but for your use case "wrapping the current default one", it should be fine? I have not tried this yet, but there might be a chicken-egg problem: -