Inspecting Chinese index using Luke

2011-12-19 Thread Peyman Faratin
Hi. We are indexing some Chinese text (using the following OutputStreamWriter with UTF-8 encoding). OutputStreamWriter outputFileWriter = new OutputStreamWriter(new FileOutputStream(outputFile), "utf8"); We are trying to inspect the index in Luke 3.4.0 (have chosen the UTF-8 option in Luke)
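As an aside, the `"utf8"` charset name works but is an unchecked string lookup; `StandardCharsets.UTF_8` (Java 7+) is the safer spelling. A minimal sketch of the writer setup from the post, writing to an in-memory buffer so the encoding can be verified; the class and method names are illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8WriteDemo {
    // Encode a string through an OutputStreamWriter, as in the post, but
    // using the StandardCharsets constant instead of the "utf8" literal.
    static byte[] encode(String text) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = encode("中文");   // two CJK characters
        if (bytes.length != 6) {         // 3 bytes each in UTF-8
            throw new AssertionError("unexpected encoding");
        }
        System.out.println("ok");
    }
}
```

If Chinese terms show up as mojibake in Luke despite the UTF-8 option, the likely cause is that some other write path used the platform default charset rather than this writer.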

Luke and Chinese text

2011-12-22 Thread Peyman Faratin
Hi. We are indexing some Chinese text (using the following OutputStreamWriter with UTF-8 encoding). OutputStreamWriter outputFileWriter = new OutputStreamWriter(new FileOutputStream(outputFile), "utf8"); using Lucene 3.2. The analyzer is new LimitTokenCountAnalyzer(new SmartChineseAnalyze
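For reference, a sketch of the analyzer setup described here (Lucene 3.2-era APIs; the token limit of 10000 is an illustrative assumption, not a value from the post):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LimitTokenCountAnalyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.util.Version;

// Wrap SmartChineseAnalyzer so at most the first N tokens of each field
// are indexed (the 3.x replacement for IndexWriter.MaxFieldLength).
Analyzer analyzer = new LimitTokenCountAnalyzer(
        new SmartChineseAnalyzer(Version.LUCENE_32), 10000);
```

Note that Luke has to be opened with a build that understands the index-format version written by this Lucene release, or terms will not display at all.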

SweetSpotSimilarity

2012-02-15 Thread Peyman Faratin
Hi. I have a newbie question. I am trying to use the SweetSpotSimilarity (SSS) class. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/SweetSpotSimilarity.html I understand the scoring behavior of Lucene http://lucene.apache.org/core/old_ve
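A minimal sketch of plugging SweetSpotSimilarity in (Lucene 3.5-era API; the tuning numbers below are illustrative assumptions, not recommendations):

```java
import org.apache.lucene.misc.SweetSpotSimilarity;

SweetSpotSimilarity sim = new SweetSpotSimilarity();
// Flat (plateau) length norm for docs between min and max tokens;
// shorter/longer docs fall off with the given steepness.
sim.setLengthNormFactors(1, 500, 0.5f, true);
// Baseline tf: base value and the tf threshold below which it applies.
sim.setBaselineTfFactors(1.0f, 0.5f);

// The similarity must be applied at BOTH index time and search time,
// since lengthNorm is baked into the index:
//   indexWriterConfig.setSimilarity(sim);
//   indexSearcher.setSimilarity(sim);
```

The plateau is the point of the class: documents inside the "sweet spot" length range all get the same length norm instead of DefaultSimilarity's 1/sqrt(numTerms) curve.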

Upgrading from 3.6.1 to 4.3.0 and Custom collector

2013-06-17 Thread Peyman Faratin
Hi. I am migrating from Lucene 3.6.1 to 4.3.0. I am however not sure how to migrate my custom collector below to 4.3.0 (this page http://lucene.apache.org/core/4_3_0/MIGRATE.html gives some hints but the instructions are incomplete and looking at the source examples of custom collectors ma
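For the record, the shape of a 4.3-style Collector (per MIGRATE.html, setNextReader now receives an AtomicReaderContext and the docBase comes from the context; the BitSet payload here is an illustrative stand-in for whatever the original collector accumulated):

```java
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class SimpleCollector extends Collector {
    private final BitSet hits = new BitSet();
    private int docBase;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        // Scores unused here; keep a reference if collect() needs them.
    }

    @Override
    public void collect(int doc) throws IOException {
        hits.set(docBase + doc); // doc is per-segment; re-base it
    }

    @Override
    public void setNextReader(AtomicReaderContext context) throws IOException {
        docBase = context.docBase; // 3.x passed (IndexReader, int docBase)
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true; // fine when collect() doesn't depend on doc order
    }
}
```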

Re: Upgrading from 3.6.1 to 4.3.0 and Custom collector

2013-06-18 Thread Peyman Faratin
Hi Adrien thank you very much. It worked. have a good day On Jun 18, 2013, at 5:35 AM, Adrien Grand wrote: > Hi, > > You didn't say specifically what your problem is so I assume it is > with the following method: > > On Tue, Jun 18, 2013 at 4:37 AM, P

Problem with BooleanQuery

2011-09-21 Thread Peyman Faratin
Hi. The problem I would like to solve is determining the Lucene score of a word in _a particular_ given document. The two candidates I have been trying are - QueryWrapperFilter - BooleanQuery Both are to restrict search within a search space. But according to Doug Cutting QueryWrapperFilter opti
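One way to get the score of a single term for one known document, without any filtering, is IndexSearcher.explain (a sketch; `searcher` and `docId` are assumed to exist, with docId being the Lucene-internal id of the document of interest):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// Score of "word" in the "content" field of document docId.
TermQuery q = new TermQuery(new Term("content", "word"));
Explanation exp = searcher.explain(q, docId);
float score = exp.getValue(); // 0 if the document doesn't match the query
```

explain() recomputes the score for exactly one document, so it sidesteps the question of which filter is faster, at the cost of one call per document.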

Re: Problem with BooleanQuery

2011-09-21 Thread Peyman Faratin
at you've got for the two fields. > > As for performance, first narrow down where it is taking the time. If > it is in lucene, read > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed > > > -- > Ian. > > On Wed, Sep 21, 2011 at 5:38 PM, Peyman Faratin

Re: Problem with BooleanQuery

2011-09-22 Thread Peyman Faratin
p words removed etc. >>> >>> Maybe you need your "word" as TermQuery, assuming it is lowercased >>> etc., and pass the title through query parser. In other words, >>> reverse what you've got for the two fields. >>> >>> As for perfo

Setting MaxFieldLength in IndexWriter

2011-09-28 Thread Peyman Faratin
Hi. Newbie question. I'm trying to set the max field length property of the IndexWriter to unlimited. The old API is now deprecated but I can't seem to be able to figure out how to set the field with the new (IndexWriterConfig) API. I've tried IndexWriterConfig.maxFieldLength(Integer.MAX_VALUE)
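In 3.1+ the per-field token limit moved out of IndexWriter into an analyzer wrapper; "unlimited" is just a very large limit (a sketch; StandardAnalyzer and `directory` are assumptions):

```java
import org.apache.lucene.analysis.LimitTokenCountAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

// Old: new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED)
// New: wrap the analyzer; Integer.MAX_VALUE is effectively "unlimited".
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34,
        new LimitTokenCountAnalyzer(
                new StandardAnalyzer(Version.LUCENE_34), Integer.MAX_VALUE));
IndexWriter writer = new IndexWriter(directory, config);
```

In fact IndexWriterConfig imposes no length limit by default, so for truly unlimited fields the wrapper can be dropped entirely; it only matters when you want a cap.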

Re: Setting MaxFieldLength in IndexWriter

2011-09-28 Thread Peyman Faratin
line, you'll likely be > interested in the Filter variant of the above-linked Analyzer wrapper: > > <http://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/analysis/LimitTokenCountFilter.html> > > > Steve > >> -Original Message- >> From: P

StandardTokenizer

2011-09-29 Thread Peyman Faratin
Hi I have a sentence "i'll email you at x...@abc.com" and I am looking at the tokens a StandardAnalyzer (which uses the StandardTokenizer) produces 1: [i'll:0->4:] 2: [email:5->10:] 3: [you:11->14:] 5: [x:18->19:] 6: [abc.com:20->27:] I am using the following constructor new Standar

Re: StandardTokenizer

2011-09-30 Thread Peyman Faratin
or you could look at UAX29URLEmailTokenizer which should > pick up the email component, although probably not the apostrophe. > > > -- > Ian. > > > On Thu, Sep 29, 2011 at 7:51 PM, Peyman Faratin > wrote: >> Hi >> >> I have a sentence >> >
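The suggested replacement can be sketched as follows (3.4-era API; the sample address is illustrative, since the original is redacted in the archive):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// UAX29URLEmailTokenizer keeps e-mail addresses and URLs as single
// tokens, where StandardTokenizer (3.1+) splits them on punctuation.
UAX29URLEmailTokenizer tok = new UAX29URLEmailTokenizer(
        Version.LUCENE_34,
        new StringReader("i'll email you at someone@example.com"));
CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
tok.reset();
while (tok.incrementToken()) {
    System.out.println(term.toString()); // the address stays one token
}
tok.end();
tok.close();
```

As Ian notes, the apostrophe handling ("i'll") follows the UAX#29 word-break rules either way, so only the e-mail/URL behavior differs.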

ShingleAnalyzer Question

2011-10-09 Thread Peyman Faratin
Hi I am trying to understand why I am not able to retrieve docs I have indexed by a ShingleAnalyzer. The setup is as follows: During indexing I do the following: PerFieldAnalyzerWrapper wrapper = DocFieldAnalyzerWrapper.getDocFieldAnalyzerWrapper(Stopwords);

Shingles Filter problems

2011-10-11 Thread Peyman Faratin
Hi I have the following shinglefilter (Lucene 3.2) public TokenStream tokenStream(String fieldName, Reader reader) { StandardTokenizer first = new StandardTokenizer(Version.LUCENE_32, reader); StandardFilter second = new StandardFilter(Version.LUCEN
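A sketch of a complete chain of this shape (Lucene 3.2 APIs; the shingle sizes are illustrative, and note that putting a StopFilter before the ShingleFilter produces "_" filler tokens inside shingles, a common source of non-matching shingles):

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

Analyzer shingleAnalyzer = new Analyzer() {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new StandardTokenizer(Version.LUCENE_32, reader);
        stream = new StandardFilter(Version.LUCENE_32, stream);
        stream = new LowerCaseFilter(Version.LUCENE_32, stream);
        // Emit 2- and 3-word shingles in addition to the single terms.
        ShingleFilter shingles = new ShingleFilter(stream, 2, 3);
        shingles.setOutputUnigrams(true);
        return shingles;
    }
};
```

The same analyzer has to be used at query time as well; if queries go through a plain StandardAnalyzer, the shingled terms in the index will never match.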

Re: Shingles Filter problems

2011-10-11 Thread Peyman Faratin
have expected there to be some shingles in there. > Are we both missing something? > > > -- > Ian. > > > On Tue, Oct 11, 2011 at 3:25 PM, Peyman Faratin > wrote: >> Hi >> >> I have the following shinglefilter (Lucene 3.2) >> >>

FieldCache

2011-10-21 Thread Peyman Faratin
Hi I have a field that is indexed as follows for(String c: article.getCategories()){ doc.add(new Field("categories", c.toLowerCase(), Field.Store.YES, Field.Index.ANALYZED)); } I have a search space of 2 million docs and I need to access the category field of each hitdoc. I woul
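One caveat before reaching for FieldCache: it is per-document and single-valued, so it holds only one term per doc, which clashes with a multi-valued, analyzed "categories" field. For a single-valued, un-analyzed field the 3.x call would look like this (a sketch; `reader` and `hitDocId` are assumed):

```java
import org.apache.lucene.search.FieldCache;

// One String per document, loaded once per reader and cached; requires
// the field to be indexed as a single, un-analyzed token per doc.
String[] categories = FieldCache.DEFAULT.getStrings(reader, "categories");
String catOfHit = categories[hitDocId];
```

For the multi-valued case, the usual alternatives are retrieving the stored field for each hit or indexing with term vectors.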

ElasticSearch

2011-11-16 Thread Peyman Faratin
Hi A client is considering moving from Lucene to ElasticSearch. What is the community's opinion on ES? thank you Peyman

Re: ElasticSearch

2011-11-18 Thread Peyman Faratin
Thank you all for the feedback and your point of views. Peyman On Nov 18, 2011, at 2:47 AM, Peter Karich wrote: > Hi Lukáš, hi Mark > >> https://issues.apache.org/jira/browse/SOLR-839 > > > thanks for pointing me there > > >>> although some parameters are available as URL parameters as w

docFreq of a Boolean query (LUCENE 4.3)

2013-12-16 Thread Peyman Faratin
Hi I know how to get the docFreq of a term in a single field (say "content" field) int docFreqInIndex = indexReader.docFreq(new Term("content", q)); But is it possible to get the docFreq of a boolean query consisting of matches across two or more fields? For instance, BooleanQuery booleanQuer
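There is no direct docFreq for a compound query, but the count of matching documents can be obtained with TotalHitCountCollector, which skips scoring entirely (4.3 API; a sketch with `searcher` and `q` assumed, field names following the post):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;

// Documents containing q in "content" OR "title"; switch Occur.SHOULD
// to Occur.MUST to count docs that match in both fields.
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(new TermQuery(new Term("content", q)), Occur.SHOULD);
booleanQuery.add(new TermQuery(new Term("title", q)), Occur.SHOULD);

TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(booleanQuery, collector);
int docFreq = collector.getTotalHits();
```

Unlike summing per-field docFreq values, this does not double-count documents where the term appears in both fields.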