Re: how to add attributes to a field just like term's payload ?

2013-01-24 Thread wgggfiy
Hello, but there is no getCommitUserData in IndexReader. How can I get the user data? Thanks.

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks. Could you please also comment on the following? http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-td4035806.html Thanks, I really appreciate your help. Sai.

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Michael McCandless
You get/set the merge policy on IndexWriterConfig (which you pass to IndexWriter). And then you can set this CFS ratio via that merge policy. Mike McCandless http://blog.mikemccandless.com On Thu, Jan 24, 2013 at 5:35 PM, saisantoshi wrote: > Thanks a lot. One last question, how do we set it?

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-01-24 Thread saisantoshi
Can someone please help us here to validate the above? Thanks, Sai.

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks a lot. One last question, how do we set it? IndexWriter.??? Thanks, Ranjith.

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Michael McCandless
I would leave the default until/unless something goes wrong ... Mike McCandless http://blog.mikemccandless.com On Thu, Jan 24, 2013 at 5:28 PM, saisantoshi wrote: > Thanks. Are there any best practices to follow here? Or should we leave the > default > (which is the hybrid approach, as you mentioned)? >

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks. Are there any best practices to follow here? Or should we leave the default (which is the hybrid approach, as you mentioned)?

Re: Chinese analyzer

2013-01-24 Thread Jerome Lanneluc
Thanks Robert. Is there another analyzer I should use? Jerome From: Robert Muir To: java-user@lucene.apache.org, Date: 01/24/2013 06:20 PM Subject:Re: Chinese analyzer On Thu, Jan 24, 2013 at 10:53 AM, Jerome Lanneluc wrote: > It looks like my attachment was lost. It refer

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Michael McCandless
4.0 has a hybrid approach by default: "big" segments (> 10% of index size, by default) are non-compound-files and small segments are compound files. See TieredMergePolicy.setNoCFSRatio if you want to always use compound file format. Mike McCandless http://blog.mikemccandless.com On Thu, Jan 24
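The hybrid rule described in this message can be sketched as a small decision function. This is only an illustrative model built from the description above, not Lucene's source: the class name `CompoundFileRule`, the method `usesCompoundFile`, and the constant are hypothetical, and the 0.1 default simply mirrors the "> 10% of index size" figure quoted in the message.

```java
// Hypothetical model of the hybrid compound-file rule described in the message.
// Not Lucene code: all names here are assumptions made for illustration.
class CompoundFileRule {
    // Assumed default, taken from the "> 10% of index size" figure in the message.
    static final double DEFAULT_NO_CFS_RATIO = 0.1;

    // Returns true when a segment of the given size would be written as a
    // compound file: small segments are compound, "big" ones are not.
    static boolean usesCompoundFile(long segmentBytes, long totalIndexBytes, double noCfsRatio) {
        if (noCfsRatio >= 1.0) {
            return true; // a ratio of 1.0 means every segment is compound
        }
        return segmentBytes <= noCfsRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long total = 1000L;
        System.out.println(usesCompoundFile(50, total, DEFAULT_NO_CFS_RATIO));  // small segment
        System.out.println(usesCompoundFile(500, total, DEFAULT_NO_CFS_RATIO)); // big segment
        System.out.println(usesCompoundFile(500, total, 1.0));                  // always compound
    }
}
```

In this model a segment holding 5% of the index is compound, one holding 50% is not, and raising the ratio to 1.0 makes every segment compound, which is the effect the message attributes to TieredMergePolicy.setNoCFSRatio.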

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks Michael. The additional file in the list is just a typo. One more question: we were using 2.4 before, and it only generated a few files: _0.cfs _0.cfx // segment files I am assuming that the 2.4 version has the compound index structure enabled by default. Do we need to set it explicitly w

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Michael McCandless
That looks correct, except I don't know what index.v0008 is. Mike McCandless http://blog.mikemccandless.com On Thu, Jan 24, 2013 at 1:22 PM, saisantoshi wrote: > Thanks. I checked it out. > > Here is the list of files that have been generated: > > _0.fdt > _0.fdx > _0.f

Re: Faceted search in OR

2013-01-24 Thread Nicola Buso
Hi Shai, the use case is simple. Suppose you want to buy a hi-fi from an online shop. You go to the Electronics department of the website and type "hi-fi" in the search box; the interface returns lots of results and a facet on brands (10 brand values). You select brand A and the results are filtered

Re: Re: IndexReader.open and CorruptIndexException

2013-01-24 Thread Ian Lea
There will be one file handle for every currently open file. Use SearcherManager and this problem should go away. -- Ian. On Thu, Jan 24, 2013 at 6:40 PM, zhoucheng2008 wrote: > What file handlers did you guys refer to? > > > I opened the index directory only. Is this the file handler? Also, h

Re: IndexReader.open and CorruptIndexException

2013-01-24 Thread zhoucheng2008
What file handlers did you guys refer to? I opened the index directory only. Is this the file handler? Also, how do I safely and effectively close the index directory? I found the link's explanation somewhat self-contradictory. After reading it, I am confused about whether I should close the file handlers

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread saisantoshi
Thanks. I checked it out. Here is the list of files that have been generated: _0.fdt _0.fdx _0.fnm _0.si _0_Lucene40_0.frq _0_Lucene40_0.prx _0_Lucene40_0.tim _0_Lucene40_0.tip _0_nrm.cfe _0_nrm.cfs index.v000

Re: Faceted search in OR

2013-01-24 Thread Shai Erera
Hi Nicola, Regarding the OR drill-down, yes you can construct your own BooleanQuery, passing Occur.SHOULD instead of MUST. Currently DrillDown does not help you do that, so you can copy the code from DrillDown.query and change MUST to SHOULD. I opened LUCENE-4716 to add this support to DrillDown.

Faceted search in OR

2013-01-24 Thread Nicola Buso
Hi all, I'm introducing Lucene faceted search in our project and I need some hints to achieve some functionality: - I want facet filtering in OR; how can I do that? - obtain facets for the filtered results but also for the unfiltered ones, i.e. I have facet A with values A/V1, A/V2, A/V3 and these value

Re: Chinese analyzer

2013-01-24 Thread Robert Muir
On Thu, Jan 24, 2013 at 10:53 AM, Jerome Lanneluc wrote: > It looks like my attachment was lost. It referred to > org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer. > I think this analyzer will not properly tokenize text outside of the BMP: it pretty much only works for simplified text (e.

Re: List of files that Lucene 4.0 generates during indexing

2013-01-24 Thread Steve Rowe
Hi saisantoshi, Check out the documentation: - particularly the "File Formats" link under "Reference Documents". Steve On Jan 24, 2013, at 11:41 AM, saisantoshi wrote: > Is there any doc on how many files that lucene generates during indexing >

Re: FacetedSearch and MultiReader

2013-01-24 Thread Nicola Buso
Hi Shai, I'd like just to give you a confirmation that your solution is working after the tests I did. Thanks again for the useful hints. Nicola. On Tue, 2013-01-22 at 06:20 +0200, Shai Erera wrote: > Hi Nicola, > > What I had in mind is something similar to this, which is possible starting >

Re: Chinese analyzer

2013-01-24 Thread Jerome Lanneluc
It looks like my attachment was lost. It referred to org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer. I'm inlining it here: import java.io.IOException; import java.io.StringReader; import org.apache.lucene.analysis.TokenStream; import org.apache.lucene.analysis.cn.smart.SmartChineseAna

Re: Chinese analyzer

2013-01-24 Thread Robert Muir
On Thu, Jan 24, 2013 at 9:25 AM, Jerome Lanneluc wrote: > Note the 2 tokens in the second sample when I would expect to have only one > token with the (55401 57046) characters. > > I could not figure out if I'm doing something wrong, or if this is a bug in > the Chinese analyzer. > Which analyzer
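The two numbers quoted in this message, 55401 and 57046, are consistent with one supplementary character stored as a UTF-16 surrogate pair. The sketch below uses only the JDK (no Lucene) to show why code that walks a string one `char` at a time sees two units where there is a single code point. The class name `SurrogatePairDemo` and the helper `fromUnits` are hypothetical, and the example does not reproduce the analyzer's actual tokenization.

```java
// Illustrative only: shows how one character outside the BMP occupies two
// UTF-16 code units in a Java String. Names here are hypothetical.
class SurrogatePairDemo {
    // Builds a String from two UTF-16 code units given as integers.
    static String fromUnits(int high, int low) {
        return new String(new char[] { (char) high, (char) low });
    }

    public static void main(String[] args) {
        // The two values reported in the message.
        String s = fromUnits(55401, 57046);

        // Number of UTF-16 code units (what char-based iteration sees).
        System.out.println(s.length());
        // Number of Unicode code points (what the reader expected as one token).
        System.out.println(s.codePointCount(0, s.length()));
        // The single code point, in hex.
        System.out.println(Integer.toHexString(s.codePointAt(0)));
        // Whether the two units form a valid high/low surrogate pair.
        System.out.println(Character.isSurrogatePair(s.charAt(0), s.charAt(1)));
    }
}
```

The pair decodes to U+2A6D6, which lies in the CJK Unified Ideographs Extension B block, so a tokenizer that iterates over `char` values rather than code points would plausibly emit two tokens for it.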

Chinese analyzer

2013-01-24 Thread Jerome Lanneluc
Hi, I'm using the 3.6.1 Chinese analyzer and when tokenizing some Chinese words containing CJK Unified Ideographs Extension B characters, the resulting tokens do not contain the original words. Instead it seems that the CJK Unified Ideographs Extension B characters are split in two characters.

Re: StoredFieldsFormat / documentation

2013-01-24 Thread Adrien Grand
Hi Bernd, On Thu, Jan 24, 2013 at 11:55 AM, Bernd Müller wrote: > Hi Simon, > >> you mean where it is used? Look at the org.apache.lucene.codecs.Codec >> class, it has a method: >> >> public abstract StoredFieldsFormat storedFieldsFormat(); >> >> which returns a stored fields format used to enc

Re: StoredFieldsFormat / documentation

2013-01-24 Thread Bernd Müller
Hi Simon, > you mean where it is used? Look at the org.apache.lucene.codecs.Codec > class, it has a method: > > public abstract StoredFieldsFormat storedFieldsFormat(); > > which returns a stored fields format used to encode your stored fields > written by the index writer. Thanks for your quic

Re: IndexWriter: IndexWriter.MaxFieldLength.LIMITED && setMaxFieldLength(MAX_FIELD_SCAN_LENGTH)

2013-01-24 Thread Ian Lea
See org.apache.lucene.analysis.miscellaneous.LimitTokenCountAnalyzer and org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilter. Looks like you can use the former with StandardAnalyzer as the delegate and whatever value you want for maxTokenCount. The 3.6.1 javadocs have "IndexWriter.MaxFieldL

Re: IndexReader.open and CorruptIndexException

2013-01-24 Thread Ian Lea
Well, raising the limits is one option but there may be better ones. There's an FAQ entry on this: http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_an_IOException_that_says_.22Too_many_open_files.22.3F Take a look at org.apache.lucene.search.SearcherManager "Utility class to safely s

Re: StoredFieldsFormat / documentation

2013-01-24 Thread Simon Willnauer
Hi Bernd, On Thu, Jan 24, 2013 at 9:30 AM, Bernd Müller wrote: > Hello, > > In the Lucene 4.1 release, compression for > stored fields was introduced, as described here: > https://issues.apache.org/jira/browse/LUCENE-4226 Yeah, that is correct; it's the new default. If you use Lucene 4.1 t

Re: IndexReader.open and CorruptIndexException

2013-01-24 Thread Rafał Kuć
Hello! You need to allow the user that is running Lucene to open more files. There are plenty of tutorials available on the web. Modify your /etc/security/limits.conf and, if for example your user is lucene, add the following (or modify them if they already exist): lucene soft nofile 64000 lucene hard

Re: IndexReader.open and CorruptIndexException

2013-01-24 Thread Cheng
Here is the log: Jan 24, 2013 4:10:33 AM org.apache.tomcat.util.net.AprEndpoint$Acceptor run SEVERE: Socket accept failed org.apache.tomcat.jni.Error: 24: Too many open files at org.apache.tomcat.jni.Socket.accept(Native Method) at org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.ja

StoredFieldsFormat / documentation

2013-01-24 Thread Bernd Müller
Hello, In the Lucene 4.1 release, compression for stored fields was introduced, as described here: https://issues.apache.org/jira/browse/LUCENE-4226 In the Javadocs, I can't really find any documentation about the application of StoredFieldsFormat and CompressingStoredFieldsFormat. Where