Re: Tool for Lucene storage recovery

2013-01-18 Simon Willnauer
Hey, do you wanna open a JIRA issue for this and attach your code? This might help others too, and if the shit hits the fan it's good to have something in the Lucene jar that can bring some data back.
Simon
On Fri, Jan 18, 2013 at 6:37 PM, Michał Brzezicki wrote:
> in lucene (*.fdt). Code is avail

Re: Is LogByteSizeMergePolicy deterministic?

2013-01-18 Apostolis Xekoukoulotakis
Thanks, Michael. I am mostly duplicating the SolrCloud implementation to reduce complexity and to be able to use custom collectors and LogByteSizeMergePolicy (in order to perform an ordered join with external data from a database). Most of the things I want are under development in Solr

Re: SpanNearQuery with two boundaries

2013-01-18 Igor Shalyminov
Alan and Jack, that's it, thank you!
--
Best Regards,
Igor
18.01.2013, 22:14, "Jack Krupansky" :
> +1
> I think that accurately states the semantics of the operation you want.
> -- Jack Krupansky
> -----Original Message-----
> From: Alan Woodward
> Sent: Friday, January 18, 2013 1:08 PM

Re: Is LogByteSizeMergePolicy deterministic?

2013-01-18 Michael McCandless
You must also use only a single indexing thread. And you must use SerialMergeScheduler. If you do that, I think it will be deterministic. But don't rely on this ... this is runtime behavior and can suddenly change between releases ...
Mike McCandless
http://blog.mikemccandless.com
On Fri, Jan
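
A minimal sketch of the setup Mike describes, assuming the Lucene 4.0 API (`IndexWriterConfig`, `Version.LUCENE_40`, analyzers-common on the classpath). The class and method names are hypothetical, and as he warns, determinism here is runtime behavior that may change between releases, so treat this as an illustration rather than a guarantee.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper: opens a writer configured for repeatable merging.
public class DeterministicMergeConfig {

    public static IndexWriter openWriter(File indexDir) throws IOException {
        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
        // Run merges synchronously in the thread that triggers them,
        // instead of in background threads (ConcurrentMergeScheduler is the default).
        cfg.setMergeScheduler(new SerialMergeScheduler());
        // The size-based log merge policy asked about in this thread.
        cfg.setMergePolicy(new LogByteSizeMergePolicy());
        // Caller's responsibility: add documents from exactly ONE thread, otherwise
        // which documents land in which segment depends on thread scheduling.
        return new IndexWriter(FSDirectory.open(indexDir), cfg);
    }
}
```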

Re: SpanNearQuery with two boundaries

2013-01-18 Jack Krupansky
+1
I think that accurately states the semantics of the operation you want.
-- Jack Krupansky
-----Original Message-----
From: Alan Woodward
Sent: Friday, January 18, 2013 1:08 PM
To: java-user@lucene.apache.org
Subject: Re: SpanNearQuery with two boundaries
Hi Igor,
You could try wrapping t

Re: SpanNearQuery with two boundaries

2013-01-18 Alan Woodward
Hi Igor,
You could try wrapping the two cases in a SpanNotQuery:
SpanNot(SpanNear(runs, cat, 10), SpanNear(runs, cat, 3))
That should return documents that have runs within 10 positions of cat, as long as they don't overlap with runs within 3 positions of cat.
Alan Woodward
www.flax.co.
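
To make the intended semantics concrete, here is a plain-Java sketch (no Lucene) of "runs more than 3 but at most 10 positions from cat". It measures distance as |posA - posB|, which is an assumption: Lucene's slop counts intervening unmatched positions, so for two single terms a slop of n corresponds to a distance of at most n + 1. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class DistanceWindow {

    /**
     * Returns true if some occurrence of termA is more than minGap and at most
     * maxGap positions away from some occurrence of termB, in either order.
     * Distance is |posA - posB| (an assumption, not Lucene's slop).
     */
    public static boolean withinWindow(List<String> tokens, String termA, String termB,
                                       int minGap, int maxGap) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(termA)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(termB)) {
                    continue;
                }
                int gap = Math.abs(i - j);
                if (gap > minGap && gap <= maxGap) {
                    return true; // this pair falls inside the window
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "runs" at position 1, "cat" at position 4: distance 3, excluded
        List<String> near = Arrays.asList("dog", "runs", "to", "the", "cat");
        // "runs" at position 1, "cat" at position 6: distance 5, inside the window
        List<String> far = Arrays.asList("dog", "runs", "a", "b", "c", "d", "cat");
        System.out.println(withinWindow(near, "runs", "cat", 3, 10)); // false
        System.out.println(withinWindow(far, "runs", "cat", 3, 10));  // true
    }
}
```

Because this checks individual term pairs, it can disagree with SpanNotQuery, which drops any include span that overlaps an exclude span; with several occurrences of runs or cat in one document the two approaches may return different results.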

Tool for Lucene storage recovery

2013-01-18 Michał Brzezicki
Hi, I have created a simple tool for recovering data from corrupted storage files in Lucene (*.fdt). The code is available here: http://pastebin.com/nmF0j4np. You just have to implement the method "handleDocument". Hope you will never have to use it :)
-- Michał

SpanNearQuery with two boundaries

2013-01-18 Igor Shalyminov
Hello! I want to perform search queries like this one:
word:"dog" \1 word:"runs" (\3 \10) word:"cat"
It is thus something like SpanNearQuery, but with two boundaries: a minimum and a maximum distance between the terms (which in the \1 case would be equal). Syntax (as above, fictional :) itself doesn

Re: tries and spatial search

2013-01-18 Apostolis Xekoukoulotakis
The new Spatial contrib module has already implemented what I was talking about.
2012/12/22 Apostolis Xekoukoulotakis
> I just found out about the blocktree implementation and how it is used to increase the speed of prefix search.
> Have you tried to use it for spatial search?
> I will explain

Re: Document term vectors in Lucene 4

2013-01-18 Jon Stewart
Thanks! I still can't see what was wrong with my original code (it must have been a dumb typo somewhere), but starting over from that example now works on indices generated from my real indexing code. I will try to blog about it next week so there is some sample code up on the web for anyone else searc

Re: Inner join in lucene

2013-01-18 Apostolis Xekoukoulotakis
You can index those fields as DocValues fields; they are optimized for use during search (or join, in this case). Then create a collector that collects the documents that have the same value in those fields. Have others with more experience comment, though, before you start implementing it.
2013/
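
A plain-Java model of the collector idea above, written as an in-memory hash join; it does not use any Lucene API. It assumes each side can report (docId, joinValue) pairs, as a collector reading a DocValues field inside collect(int doc) might. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory hash join on a per-document join value. */
public class ValueJoin {

    // Left-side doc IDs grouped by join value (build phase).
    private final Map<Long, List<Integer>> leftByValue = new HashMap<>();

    /** Build phase: record one left-side document and its join value. */
    public void collectLeft(int docId, long value) {
        leftByValue.computeIfAbsent(value, v -> new ArrayList<>()).add(docId);
    }

    /** Probe phase: return {leftDoc, rightDoc} pairs sharing the right doc's value. */
    public List<int[]> probeRight(int docId, long value) {
        List<int[]> pairs = new ArrayList<>();
        for (int leftDoc : leftByValue.getOrDefault(value, Collections.emptyList())) {
            pairs.add(new int[] {leftDoc, docId});
        }
        return pairs;
    }

    public static void main(String[] args) {
        ValueJoin join = new ValueJoin();
        join.collectLeft(0, 42L);
        join.collectLeft(3, 7L);
        join.collectLeft(5, 42L);
        for (int[] pair : join.probeRight(10, 42L)) {
            System.out.println(pair[0] + " -> " + pair[1]); // prints 0 -> 10, then 5 -> 10
        }
    }
}
```

The build side is held entirely in memory, so this only makes sense when one side of the join is reasonably small.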

Re: Japanese analyzer

2013-01-18 Jerome Lanneluc
Thanks, Dawid, that was it. I'm now using an empty stoptags set and I'm seeing all the expected tokens.
Jerome
From: Dawid Weiss
To: java-user@lucene.apache.org
Date: 01/18/2013 02:52 PM
Subject: Re: Japanese analyzer
Jerome, Some of the tokens are removed because their pa
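
A minimal sketch of an analyzer configured with an empty stoptags set, assuming lucene-kuromoji 3.6.x and JapaneseAnalyzer's five-argument constructor (user dictionary, mode, stop words, stoptags). Jerome's actual code is not shown in the thread; the factory class and method names here are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical factory: defaults kept, except that no part-of-speech tags are filtered.
public class NoPosStopAnalyzerFactory {

    public static JapaneseAnalyzer create() {
        Set<String> noStopTags = Collections.emptySet();
        return new JapaneseAnalyzer(
                Version.LUCENE_36,
                null,                                  // no user dictionary
                JapaneseTokenizer.DEFAULT_MODE,        // default segmentation mode
                JapaneseAnalyzer.getDefaultStopSet(),  // keep the bundled stop words
                noStopTags);                           // empty POS stoptags: keep every POS
    }
}
```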

Re: Japanese analyzer

2013-01-18 Dawid Weiss
Jerome,
Some of the tokens are removed because their part-of-speech tags are in the stoptags file? That's my guess, at least. You can always try to copy/paste the Japanese analyzer and change the token stream components:
protected TokenStreamComponents createComponents(String fieldName, Reader rea

Re: Japanese analyzer

2013-01-18 Jerome Lanneluc
Thanks for your answer. No, those words are not in the stop word file (I'm using the one that comes with the Japanese analyzer in lucene-kuromoji-3.6.1.jar). My Japanese contact told me that the first sentence means "I am Japanese" and the second one is a unit of length.
Jerome
From: S

Re: Japanese analyzer

2013-01-18 Swapnil Patil
Hi, I just translated these words using Google Translate; they look like Japanese I [
Can you check whether these words are in your stopword file? If these words exist in your stop word file, then you will not get them in the token stream.
-Swapnil
On Fri, Jan 18, 2013 at 6:58 PM, Jerome Lanneluc wrote:
>

Japanese analyzer

2013-01-18 Jerome Lanneluc
I have searched this mailing list but I could not find the answer to the following problem. I'm using the 3.6.1 Japanese analyzer and it seems that when tokenizing some Japanese words, some characters are ignored and they are not returned in the tokens. In the attached example, the output is:

Re: Document term vectors in Lucene 4

2013-01-18 Ian Lea
To get stats from the whole index I think you need to come at this from a different direction. See the 4.0 migration guide for some details. With a variation on your code and 2 docs
doc1: foobar qux quote
doc2: foobar qux qux quorum
this code snippet
Fields fields = MultiFields.getFiel
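
For checking whole-index numbers, it can help to compute by hand the two statistics a Lucene 4 TermsEnum exposes, docFreq() and totalTermFreq(), for Ian's two example documents. This plain-Java sketch assumes simple whitespace tokenization (no analyzer) and is not the code from his message; the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TermStats {

    /** Returns term -> {docFreq, totalTermFreq}, splitting documents on whitespace. */
    public static Map<String, int[]> stats(String... docs) {
        Map<String, int[]> result = new HashMap<>();
        for (String doc : docs) {
            Set<String> seenInDoc = new HashSet<>();
            for (String term : doc.split("\\s+")) {
                int[] s = result.computeIfAbsent(term, t -> new int[2]);
                if (seenInDoc.add(term)) {
                    s[0]++; // first occurrence in this doc counts toward docFreq
                }
                s[1]++;     // every occurrence counts toward totalTermFreq
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> s = stats("foobar qux quote", "foobar qux qux quorum");
        for (String term : new String[] {"foobar", "qux", "quote", "quorum"}) {
            int[] v = s.get(term);
            System.out.println(term + " docFreq=" + v[0] + " totalTermFreq=" + v[1]);
        }
        // foobar docFreq=2 totalTermFreq=2
        // qux docFreq=2 totalTermFreq=3
        // quote docFreq=1 totalTermFreq=1
        // quorum docFreq=1 totalTermFreq=1
    }
}
```

The per-document view (term vectors) would instead report qux with frequency 1 in doc1 and 2 in doc2; the whole-index totals above are what the MultiFields-based approach aggregates.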

Re: Combine two BooleanQueries by a SpanNearQuery.

2013-01-18 Michel Conrad
I created a feature request: https://issues.apache.org/jira/browse/LUCENE-4696
Thanks for your help,
Michel
On Thu, Jan 17, 2013 at 6:33 PM, Jack Krupansky wrote:
> Currently there isn't. SpanNearQuery can take only other SpanQuery objects, which includes other spans, span terms, and span wrap