Inspecting Chinese index using Luke

2011-12-19 Thread Peyman Faratin
Hi. We are indexing some Chinese text (using the following OutputStreamWriter with UTF-8 encoding). OutputStreamWriter outputFileWriter = new OutputStreamWriter(new FileOutputStream(outputFile), "utf8"); We are trying to inspect the index in Luke 3.4.0 (have chosen the UTF-8 option in Luke)
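As an aside, the `"utf8"` charset name works but is an unchecked string lookup; `StandardCharsets.UTF_8` (Java 7+) is the safer spelling. A minimal sketch of the writer setup from the post, writing to an in-memory buffer so the encoding can be verified; the class and method names are illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8WriteDemo {
    // Encode a string through an OutputStreamWriter, as in the post, but
    // using the StandardCharsets constant instead of the "utf8" literal.
    static byte[] encode(String text) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = encode("中文");   // two CJK characters
        if (bytes.length != 6) {         // 3 bytes each in UTF-8
            throw new AssertionError("unexpected encoding");
        }
        System.out.println("ok");
    }
}
```

If Chinese terms show up as mojibake in Luke despite the UTF-8 option, the likely cause is that some other write path used the platform default charset rather than this writer.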

Luke and Chinese text

2011-12-22 Thread Peyman Faratin
Hi. We are indexing some Chinese text (using the following OutputStreamWriter with UTF-8 encoding). OutputStreamWriter outputFileWriter = new OutputStreamWriter(new FileOutputStream(outputFile), "utf8"); using Lucene 3.2. The analyzer is new LimitTokenCountAnalyzer(new SmartChineseAnalyze
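For reference, a sketch of the analyzer setup described here (Lucene 3.2-era APIs; the token limit of 10000 is an illustrative assumption, not a value from the post):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LimitTokenCountAnalyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.util.Version;

// Wrap SmartChineseAnalyzer so at most the first N tokens of each field
// are indexed (the 3.x replacement for IndexWriter.MaxFieldLength).
Analyzer analyzer = new LimitTokenCountAnalyzer(
        new SmartChineseAnalyzer(Version.LUCENE_32), 10000);
```

Note that Luke has to be opened with a build that understands the index-format version written by this Lucene release, or terms will not display at all.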

SweetSpotSimilarity

2012-02-15 Thread Peyman Faratin
Hi. I have a newbie question. I am trying to use the SweetSpotSimilarity (SSS) class. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/SweetSpotSimilarity.html I understand the scoring behavior of Lucene http://lucene.apache.org/core/old_ve
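A minimal sketch of plugging SweetSpotSimilarity in (Lucene 3.5-era API; the tuning numbers below are illustrative assumptions, not recommendations):

```java
import org.apache.lucene.misc.SweetSpotSimilarity;

SweetSpotSimilarity sim = new SweetSpotSimilarity();
// Flat (plateau) length norm for docs between min and max tokens;
// shorter/longer docs fall off with the given steepness.
sim.setLengthNormFactors(1, 500, 0.5f, true);
// Baseline tf: base value and the tf threshold below which it applies.
sim.setBaselineTfFactors(1.0f, 0.5f);

// The similarity must be applied at BOTH index time and search time,
// since lengthNorm is baked into the index:
//   indexWriterConfig.setSimilarity(sim);
//   indexSearcher.setSimilarity(sim);
```

The plateau is the point of the class: documents inside the "sweet spot" length range all get the same length norm instead of DefaultSimilarity's 1/sqrt(numTerms) curve.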

Upgrading from 3.6.1 to 4.3.0 and Custom collector

2013-06-17 Thread Peyman Faratin
Hi. I am migrating from Lucene 3.6.1 to 4.3.0. I am however not sure how to migrate my custom collector below to 4.3.0 (this page http://lucene.apache.org/core/4_3_0/MIGRATE.html gives some hints but the instructions are incomplete and looking at the source examples of custom collectors ma
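For the record, the shape of a 4.3-style Collector (per MIGRATE.html, setNextReader now receives an AtomicReaderContext and the docBase comes from the context; the BitSet payload here is an illustrative stand-in for whatever the original collector accumulated):

```java
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class SimpleCollector extends Collector {
    private final BitSet hits = new BitSet();
    private int docBase;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        // Scores unused here; keep a reference if collect() needs them.
    }

    @Override
    public void collect(int doc) throws IOException {
        hits.set(docBase + doc); // doc is per-segment; re-base it
    }

    @Override
    public void setNextReader(AtomicReaderContext context) throws IOException {
        docBase = context.docBase; // 3.x passed (IndexReader, int docBase)
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true; // fine when collect() doesn't depend on doc order
    }
}
```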

Re: Upgrading from 3.6.1 to 4.3.0 and Custom collector

2013-06-18 Thread Peyman Faratin
Hi Adrien thank you very much. It worked. have a good day On Jun 18, 2013, at 5:35 AM, Adrien Grand wrote: > Hi, > > You didn't say specifically what your problem is so I assume it is > with the following method: > > On Tue, Jun 18, 2013 at 4:37 AM, P

Problem with BooleanQuery

2011-09-21 Thread Peyman Faratin
Hi. The problem I would like to solve is determining the Lucene score of a word in _a particular_ given document. The two candidates I have been trying are - QueryWrapperFilter - BooleanQuery Both are to restrict search within a search space. But according to Doug Cutting QueryWrapperFilter opti
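One way to get the score of a single term for one known document, without any filtering, is IndexSearcher.explain (a sketch; `searcher` and `docId` are assumed to exist, with docId being the Lucene-internal id of the document of interest):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// Score of "word" in the "content" field of document docId.
TermQuery q = new TermQuery(new Term("content", "word"));
Explanation exp = searcher.explain(q, docId);
float score = exp.getValue(); // 0 if the document doesn't match the query
```

explain() recomputes the score for exactly one document, so it sidesteps the question of which filter is faster, at the cost of one call per document.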

Re: Problem with BooleanQuery

2011-09-21 Thread Peyman Faratin
at you've got for the two fields. > > As for performance, first narrow down where it is taking the time. If > it is in lucene, read > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed > > > -- > Ian. > > On Wed, Sep 21, 2011 at 5:38 PM, Peyman Faratin

Re: Problem with BooleanQuery

2011-09-22 Thread Peyman Faratin
p words removed etc. >>> >>> Maybe you need your "word" as TermQuery, assuming it is lowercased >>> etc., and pass the title through query parser. In other words, >>> reverse what you've got for the two fields. >>> >>> As for perfo

Setting MaxFieldLength in IndexWriter

2011-09-28 Thread Peyman Faratin
Hi. Newbie question. I'm trying to set the max field length property of the IndexWriter to unlimited. The old API is now deprecated but I can't seem to be able to figure out how to set the field with the new (IndexWriterConfig) API. I've tried IndexWriterConfig.maxFieldLength(Integer.MAX_VALUE)
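In 3.1+ the per-field token limit moved out of IndexWriter into an analyzer wrapper; "unlimited" is just a very large limit (a sketch; StandardAnalyzer and `directory` are assumptions):

```java
import org.apache.lucene.analysis.LimitTokenCountAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// Old: new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED)
// New: wrap the analyzer; Integer.MAX_VALUE is effectively "unlimited".
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34,
        new LimitTokenCountAnalyzer(
                new StandardAnalyzer(Version.LUCENE_34), Integer.MAX_VALUE));
IndexWriter writer = new IndexWriter(directory, config);
```

In fact IndexWriterConfig imposes no length limit by default, so for truly unlimited fields the wrapper can be dropped entirely; it only matters when you want a cap.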

Re: Setting MaxFieldLength in IndexWriter

2011-09-28 Thread Peyman Faratin
line, you'll likely be > interested in the Filter variant of the above-linked Analyzer wrapper: > > <http://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/analysis/LimitTokenCountFilter.html> > > > Steve > >> -Original Message- >> From: P

StandardTokenizer

2011-09-29 Thread Peyman Faratin
Hi I have a sentence "i'll email you at x...@abc.com" and I am looking at the tokens a StandardAnalyzer (which uses the StandardTokenizer) produces 1: [i'll:0->4:] 2: [email:5->10:] 3: [you:11->14:] 5: [x:18->19:] 6: [abc.com:20->27:] I am using the following constructor new Standar

Re: StandardTokenizer

2011-09-30 Thread Peyman Faratin
or you could look at UAX29URLEmailTokenizer which should > pick up the email component, although probably not the apostrophe. > > > -- > Ian. > > > On Thu, Sep 29, 2011 at 7:51 PM, Peyman Faratin > wrote: >> Hi >> >> I have a sentence >> >
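The suggested replacement can be sketched as follows (3.4-era API; the sample address is illustrative, since the original is redacted in the archive):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// UAX29URLEmailTokenizer keeps e-mail addresses and URLs as single
// tokens, where StandardTokenizer (3.1+) splits them on punctuation.
UAX29URLEmailTokenizer tok = new UAX29URLEmailTokenizer(
        Version.LUCENE_34,
        new StringReader("i'll email you at someone@example.com"));
CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
tok.reset();
while (tok.incrementToken()) {
    System.out.println(term.toString()); // the address stays one token
}
tok.end();
tok.close();
```

As Ian notes, the apostrophe handling ("i'll") follows the UAX#29 word-break rules either way, so only the e-mail/URL behavior differs.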

ShingleAnalyzer Question

2011-10-09 Thread Peyman Faratin
Hi I am trying to understand why I am not able to retrieve docs I have indexed by a ShingleAnalyzer. The setup is as follows: During indexing I do the following: PerFieldAnalyzerWrapper wrapper = DocFieldAnalyzerWrapper.getDocFieldAnalyzerWrapper(Stopwords);

Shingles Filter problems

2011-10-11 Thread Peyman Faratin
Hi I have the following shinglefilter (Lucene 3.2) public TokenStream tokenStream(String fieldName, Reader reader) { StandardTokenizer first = new StandardTokenizer(Version.LUCENE_32, reader); StandardFilter second = new StandardFilter(Version.LUCEN
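A sketch of a complete chain of this shape (Lucene 3.2 APIs; the shingle sizes are illustrative, and note that putting a StopFilter before the ShingleFilter produces "_" filler tokens inside shingles, a common source of non-matching shingles):

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

Analyzer shingleAnalyzer = new Analyzer() {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new StandardTokenizer(Version.LUCENE_32, reader);
        stream = new StandardFilter(Version.LUCENE_32, stream);
        stream = new LowerCaseFilter(Version.LUCENE_32, stream);
        // Emit 2- and 3-word shingles in addition to the single terms.
        ShingleFilter shingles = new ShingleFilter(stream, 2, 3);
        shingles.setOutputUnigrams(true);
        return shingles;
    }
};
```

The same analyzer has to be used at query time as well; if queries go through a plain StandardAnalyzer, the shingled terms in the index will never match.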

Re: Shingles Filter problems

2011-10-11 Thread Peyman Faratin
have expected there to be some shingles in there. > Are we both missing something? > > > -- > Ian. > > > On Tue, Oct 11, 2011 at 3:25 PM, Peyman Faratin > wrote: >> Hi >> >> I have the following shinglefilter (Lucene 3.2) >> >>

FieldCache

2011-10-21 Thread Peyman Faratin
Hi I have a field that is indexed as follows for(String c: article.getCategories()){ doc.add(new Field("categories", c.toLowerCase(), Field.Store.YES, Field.Index.ANALYZED)); } I have a search space of 2 million docs and I need to access the category field of each hitdoc. I woul
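One caveat before reaching for FieldCache: it is per-document and single-valued, so it holds only one term per doc, which clashes with a multi-valued, analyzed "categories" field. For a single-valued, un-analyzed field the 3.x call would look like this (a sketch; `reader` and `hitDocId` are assumed):

```java
import org.apache.lucene.search.FieldCache;

// One String per document, loaded once per reader and cached; requires
// the field to be indexed as a single, un-analyzed token per doc.
String[] categories = FieldCache.DEFAULT.getStrings(reader, "categories");
String catOfHit = categories[hitDocId];
```

For the multi-valued case, the usual alternatives are retrieving the stored field for each hit or indexing with term vectors.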

ElasticSearch

2011-11-16 Thread Peyman Faratin
Hi A client is considering moving from Lucene to ElasticSearch. What is the community's opinion on ES? thank you Peyman

Re: ElasticSearch

2011-11-18 Thread Peyman Faratin
Thank you all for the feedback and your point of views. Peyman On Nov 18, 2011, at 2:47 AM, Peter Karich wrote: > Hi Lukáš, hi Mark > >> https://issues.apache.org/jira/browse/SOLR-839 > > > thanks for pointing me there > > >>> although some parameters are available as URL parameters as w

docFreq of a Boolean query (LUCENE 4.3)

2013-12-16 Thread Peyman Faratin
Hi I know how to get the docFreq of a term in a single field (say "content" field) int docFreqInIndex = indexReader.docFreq(new Term("content", q)); But is it possible to get the docFreq of a boolean query consisting of matches across two or more fields? For instance, BooleanQuery booleanQuer
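There is no direct docFreq for a compound query, but the count of matching documents can be obtained with TotalHitCountCollector, which skips scoring entirely (4.3 API; a sketch with `searcher` and `q` assumed, field names following the post):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;

// Documents containing q in "content" OR "title"; switch Occur.SHOULD
// to Occur.MUST to count docs that match in both fields.
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(new TermQuery(new Term("content", q)), Occur.SHOULD);
booleanQuery.add(new TermQuery(new Term("title", q)), Occur.SHOULD);

TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(booleanQuery, collector);
int docFreq = collector.getTotalHits();
```

Unlike summing per-field docFreq values, this does not double-count documents where the term appears in both fields.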