Re: Sorting field containing NULL values consumes field cache memory

2009-07-20 Thread Ganesh
Thanks. Could this feature be expected in 2.9 or 3.0? This is a good feature. Most users store sparse data: not all records have data for all fields. This will reduce memory consumption to a large extent. In my case, almost 30% of records just store information of reference file pointe

trie* space-time tradeoff

2009-07-20 Thread Yonik Seeley
Anyone have any numbers? I couldn't find complete info in the Trie* JIRA issues, esp relating to size increase in the index. There was this: > The indexes each contain 13 numeric, tree encoded fields (doubles and Dates). > Index size (including the "normal" fields) was: > >* 8bit: 4.8 GiB >
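To make the space side of the tradeoff concrete: with trie encoding, each numeric value is indexed once per precision level, so a 64-bit value with precisionStep p contributes ceil(64/p) terms. The sketch below only does that arithmetic (the class and method names are illustrative, not Lucene API) and says nothing about bytes on disk, which also depend on term-dictionary and postings compression. Smaller steps add terms to the index but let range queries enumerate fewer terms, which is the time side of the tradeoff.

```java
// Sketch: how many terms a trie-encoded numeric value contributes to the index.
// Illustrative arithmetic only; not a Lucene class.
final class TrieTerms {
    // One term per shift 0, step, 2*step, ... below `bits`, i.e. ceil(bits / precisionStep).
    static int termsPerValue(int bits, int precisionStep) {
        if (bits < 1 || precisionStep < 1) {
            throw new IllegalArgumentException("bits and precisionStep must be positive");
        }
        return (bits + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        int fields = 13; // numeric fields per document, as in the quoted benchmark
        for (int step : new int[] {4, 8, 16, 64}) {
            int perValue = termsPerValue(64, step);
            System.out.println("precisionStep " + step + ": " + perValue
                    + " terms per 64-bit value, " + (fields * perValue) + " per document");
        }
    }
}
```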

Re: IndexWriter.deleteDocuments(Term) and Field.Store.YES

2009-07-20 Thread Erick Erickson
P.S. Storing should be irrelevant. On Mon, Jul 20, 2009 at 8:13 PM, Erick Erickson wrote: > Describe a bit more, please, what "does not seem to work" means. > > For instance, if you're searching for the doc and haven't reopened your > index, you won't see changes. > > Better yet, a small, self-c

Re: IndexWriter.deleteDocuments(Term) and Field.Store.YES

2009-07-20 Thread Erick Erickson
Describe a bit more, please, what "does not seem to work" means. For instance, if you're searching for the doc and haven't reopened your index, you won't see changes. Better yet, a small, self-contained test case would be even better. I've often found my problem trying to write a test to illustra

IndexWriter.deleteDocuments(Term) and Field.Store.YES

2009-07-20 Thread Paul J. Lucas
If I have a field: Field f = new Field( "F", "foo", Field.Store.YES, Field.Index.NOT_ANALYZED ); can I later do: Term t = new Term( "F", "foo" ); myIndexWriter.deleteDocuments( t ); and have it work even though the field is Field.Store.YES ? Does the YES/NO make any diff
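As Erick notes, storing should be irrelevant: delete-by-term is resolved against the indexed terms, while Field.Store only decides whether the original value can be retrieved later. The toy model below (hypothetical class, not Lucene internals) keeps the two structures apart to show why. Two assumptions carry over to the real case: the value is indexed as one exact term, as with Field.Index.NOT_ANALYZED, so "Foo" would not match "foo"; and, per Erick's other reply, a reader opened before the delete will not see it until it is reopened.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model, NOT Lucene internals: the inverted index (terms -> docs) and stored
// values are separate structures, and delete-by-term consults only the former.
final class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();       // "field:term" -> doc ids
    private final Map<Integer, Map<String, String>> stored = new HashMap<>(); // kept only when store == true
    private final Set<Integer> deleted = new HashSet<>();
    private int nextDoc = 0;

    // Adds a one-field document. The value is indexed as a single untokenized term
    // (like Field.Index.NOT_ANALYZED); `store` only controls whether it can be retrieved.
    int add(String field, String value, boolean store) {
        int doc = nextDoc++;
        postings.computeIfAbsent(field + ":" + value, k -> new HashSet<>()).add(doc);
        if (store) {
            stored.computeIfAbsent(doc, k -> new HashMap<>()).put(field, value);
        }
        return doc;
    }

    // Marks every document containing the exact term as deleted; returns how many were newly deleted.
    int deleteByTerm(String field, String term) {
        int count = 0;
        for (int doc : postings.getOrDefault(field + ":" + term, Collections.emptySet())) {
            if (deleted.add(doc)) {
                count++;
            }
        }
        return count;
    }

    boolean isDeleted(int doc) {
        return deleted.contains(doc);
    }

    // Returns the stored value, or null if the field was not stored for this document.
    String storedValue(int doc, String field) {
        return stored.getOrDefault(doc, Collections.emptyMap()).get(field);
    }
}
```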

RE: question on custom filter

2009-07-20 Thread OBender
Never mind, I think I got it. -Original Message- From: OBender [mailto:osya_ben...@hotmail.com] Sent: Monday, July 20, 2009 4:42 PM To: java-user@lucene.apache.org Subject: RE: question on custom filter No, it reversed in the e-mail. Funny though, when I insert it in to the Excel it tur

RE: question on custom filter

2009-07-20 Thread OBender
No, it is reversed in the e-mail. Funny, though: when I insert it into Excel, it turns into the right order of words. Thanks for all the help. Maybe you have an idea of what could be the problem. Here is how my data gets read and indexed. I have a UTF-8 CSV file that is produced from Excel. I read

RE: question on custom filter

2009-07-20 Thread OBender
Ok, it makes a lot of sense (the input being incorrect). Let's just verify that :) At the end of the line: "but the text you sent as an example was" what I see is the word TOV [טוֹב] on the left and EREV [עֶרֶב] on the right. So it reads (for me) EREV TOV which is correct. At the end of the line: "

Re: question on custom filter

2009-07-20 Thread Robert Muir
Obender, does the following text appear like the image in the link, or not? שומר אחי http://farm1.static.flickr.com/3/10445435_75b4546703.jpg?v=0 On Mon, Jul 20, 2009 at 3:34 PM, OBender wrote: > I've checked, and it appears to be enabled. > > -Original Message- > From: Robert Muir [mai

RE: question on custom filter

2009-07-20 Thread OBender
I've checked, and it appears to be enabled. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 20, 2009 3:18 PM To: java-user@lucene.apache.org Subject: Re: question on custom filter Obender, based on your previous comments (that you see text displayed in t

Re: question on custom filter

2009-07-20 Thread Robert Muir
Obender, I think your input is incorrect. The Hebrew text you pasted in your example appears incorrect. It's gonna be hard for me to communicate this since I think your computer is not displaying Hebrew correctly :) but the text you sent as an example was [טוֹב עֶרֶב] Shouldn't the adjective follo

RE: question on custom filter

2009-07-20 Thread OBender
Interesting, the question now is why I am seeing (even in println) what I'm seeing :) I'm reading a string from the file which is in UTF-8 encoding. Could this somehow be related...? -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 20, 2009 3:03 PM To: j
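Reading the file with the wrong charset would normally garble characters rather than swap whole words, but it is cheap to rule out. A sketch, assuming the CSV is read line by line: decode explicitly as UTF-8 instead of the platform default charset (which FileReader uses, and which on Windows is often not UTF-8), and drop a leading byte-order mark if the exporting tool wrote one. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: decode a CSV stream explicitly as UTF-8 and drop a leading byte-order mark.
final class Utf8Lines {
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A UTF-8 BOM decodes to U+FEFF; Java's decoder leaves it in the text.
                if (lines.isEmpty() && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```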

Re: question on custom filter

2009-07-20 Thread Robert Muir
Obender, based on your previous comments (that you see text displayed in the wrong order), I again recommend that you enable support for RTL languages in your operating system, as I mentioned earlier. If you are using a Windows-based OS, this is not enabled by default! I think you are seeing things

Re: question on custom filter

2009-07-20 Thread Robert Muir
Obender, I ran your code and it did what I expected (but not what you pasted): First token is: (טוֹב,0,4) Second token is: (עֶרֶב,5,10) I also loaded up your SimpleWhitespaceAnalyzer in Luke, with the same results. On Mon, Jul 20, 2009 at 2:53 PM, OBender wrote: > Here is the simple code. If you

RE: question on custom filter

2009-07-20 Thread OBender
Here is the simple code. If you run it with English and with Hebrew, you will see that for English the tokens are returned from left to right, and for Hebrew from right to left. Again, I'm talking about tokens here, not individual letters. public class XFilter exte

Re: question on custom filter

2009-07-20 Thread Robert Muir
Obender, I think something in your environment / display environment might be causing some confusion. Are you using microsoft windows? If so, please verify that support for right-to-left languages is enabled [control panel/regional and language options]. It is possible you are "seeing something di

RE: question on custom filter

2009-07-20 Thread OBender
This is how it should be written: http://unicode.org/cldr/utility/transform.jsp?a=name&b=%D7%A2%D6%B6%D7%A8%D6%B6%D7%91+%D7%98%D7%95%D6%B9%D7%91 -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 20, 2009 2:07 PM To: java-user@lucene.apache.org Subject: Re:

RE: question on custom filter

2009-07-20 Thread OBender
Hold on a second, the phrase that you included a link to is not in the correct word order! -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 20, 2009 2:07 PM To: java-user@lucene.apache.org Subject: Re: question on custom filter Obender, This is not tru

RE: question on custom filter

2009-07-20 Thread OBender
Well, the only thing I can say is that the order of tokens I've presented is what I see in the debugger. It is what input.next(reusableToken) gives me, in that exact order and with those exact indexes. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 20,

Re: question on custom filter

2009-07-20 Thread Robert Muir
Obender, this is not true. The text you pasted is the following in Unicode: \N{HEBREW LETTER TET} \N{HEBREW LETTER VAV} \N{HEBREW POINT HOLAM} \N{HEBREW LETTER BET} \N{SPACE} \N{HEBREW LETTER AYIN} \N{HEBREW POINT SEGOL} \N{HEBREW LETTER RESH} \N{HEBREW POINT SEGOL} \N{HEBREW LETTER BET} you can
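The same kind of listing can be produced from Java, which is a display-independent way to check what a string actually contains: Character.getName (Java 7+) returns each code point's Unicode name, in the logical (storage) order a tokenizer sees. A minimal sketch; the class name is illustrative, and the phrase is written with \u escapes so its order in the source is unambiguous.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: list the Unicode names of a string's code points in logical (storage) order.
final class DumpCodePoints {
    static List<String> names(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(Character.getName(cp)));
        return out;
    }

    public static void main(String[] args) {
        // The phrase pasted in the thread: TET VAV HOLAM BET, SPACE, AYIN SEGOL RESH SEGOL BET
        String phrase = "\u05D8\u05D5\u05B9\u05D1 \u05E2\u05B6\u05E8\u05B6\u05D1";
        for (String name : names(phrase)) {
            System.out.println(name);
        }
    }
}
```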

RE: question on custom filter

2009-07-20 Thread OBender
Robert, I'm not sure you are correct on this one. If I have a Hebrew phrase: [טוֹב עֶרֶב] then the first token that the filter receives is: [עֶרֶב] (0,5) and the second is: [טוֹב] (6,10) Which means that it counts from right to left (words and indexes). Am I missing something? -Original Message

Re: question on custom filter

2009-07-20 Thread Robert Muir
Obender, I don't think it's as difficult as you think. Your filter does not need to be aware of this issue at all. In Unicode, right-to-left languages are encoded in the data in logical order. The rendering system is what converts it to display in right-to-left for RTL languages. For example in Ar
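A small sketch of why the filter needs no RTL awareness: a whitespace split that walks the string from index 0 upward visits words in logical order, so Hebrew and English go through the same code path. This is plain Java modelling (term,start,end) offsets, not Lucene's Tokenizer API; on the phrase from the thread it yields (טוֹב,0,4) then (עֶרֶב,5,10), the same result Robert reports above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a whitespace split with (term,start,end) offsets. It walks the string
// in logical order, so right-to-left scripts need no special handling.
final class LogicalOrderTokens {
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "(" + term + "," + start + "," + end + ")";
        }
    }

    static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
                i++; // skip separators
            }
            int start = i;
            while (i < s.length() && !Character.isWhitespace(s.charAt(i))) {
                i++; // consume one word
            }
            if (i > start) {
                out.add(new Tok(s.substring(start, i), start, i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String phrase = "\u05D8\u05D5\u05B9\u05D1 \u05E2\u05B6\u05E8\u05B6\u05D1";
        for (Tok t : tokenize(phrase)) {
            System.out.println(t);
        }
    }
}
```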

Re: Sorting field containing NULL values consumes field cache memory

2009-07-20 Thread Mark Miller
Right now, you can't really do anything about it. In the future, with the new FieldCache API that may go in, you could plug in a custom implementation that makes tradeoffs for a sparse array of some kind. The docid is currently the index into the array, but with a custom impl you may be able to use
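A minimal sketch of the kind of sparse structure Mark describes: keep values only for documents that have one, with doc ids sorted, and binary-search on lookup. This is a hypothetical class, not the FieldCache API that may go in; it assumes long values (for example a date in minutes) and int doc ids, and its byte counts ignore JVM object overhead.

```java
import java.util.Arrays;

// Hypothetical sparse per-document value store; not Lucene's FieldCache API.
final class SparseLongValues {
    private final int[] docIds;   // strictly increasing ids of documents that have a value
    private final long[] values;  // values[i] belongs to docIds[i]
    private final long missing;   // returned for documents without a value

    SparseLongValues(int[] docIds, long[] values, long missing) {
        if (docIds.length != values.length) {
            throw new IllegalArgumentException("docIds and values differ in length");
        }
        for (int i = 1; i < docIds.length; i++) {
            if (docIds[i] <= docIds[i - 1]) {
                throw new IllegalArgumentException("docIds must be strictly increasing");
            }
        }
        this.docIds = docIds.clone();
        this.values = values.clone();
        this.missing = missing;
    }

    // O(log n) lookup instead of the O(1) array index a dense cache uses.
    long get(int docId) {
        int i = Arrays.binarySearch(docIds, docId);
        return i >= 0 ? values[i] : missing;
    }

    // Approximate payload: 4 bytes per doc id plus 8 bytes per long value.
    long payloadBytes() {
        return 12L * docIds.length;
    }

    // Payload of a dense long[] with one slot per document.
    static long densePayloadBytes(int maxDoc) {
        return 8L * maxDoc;
    }
}
```

Under these assumptions the dense array costs 8 bytes per document and the sparse one 12 bytes per document that has a value, so the sparse form only saves memory when fewer than about two thirds of documents have a value. It also turns every lookup into a binary search, which adds up during a sort.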

question on custom filter

2009-07-20 Thread OBender
Hi All! Let's say I have a filter that produces new tokens based on the original ones. How bad will it be if my filter sets the start of each token to 0 and the end to the length of the token? An example (based on the phrase "How are you?"): Original token: [you?] (8,12) New tokens: [you]
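Offsets are what consumers such as the highlighter use to find a token in the original text, so (0, length) offsets for [you] would point at "How" instead of "you". A sketch of deriving a new token's offsets from its parent's, in plain Java rather than Lucene's TokenFilter API; the punctuation-stripping rule and all names are hypothetical, and it assumes the parent's text still matches the input character for character (no earlier filter changed its length).

```java
// Sketch: derive a new token's offsets from its parent's instead of resetting them.
// Assumes the parent's term text still matches the original input character for character.
final class SubTokenOffsets {
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Sub-token covering term[from, to) of the parent, with offsets into the original text.
    static Tok subToken(Tok parent, int from, int to) {
        return new Tok(parent.term.substring(from, to), parent.start + from, parent.start + to);
    }

    // Hypothetical example rule: strip trailing characters that are not letters or digits.
    static Tok stripTrailingPunctuation(Tok t) {
        int end = t.term.length();
        while (end > 0 && !Character.isLetterOrDigit(t.term.charAt(end - 1))) {
            end--;
        }
        return subToken(t, 0, end);
    }

    public static void main(String[] args) {
        String text = "How are you?";
        Tok stripped = stripTrailingPunctuation(new Tok("you?", 8, 12));
        // The derived offsets (8,11) still select the right span of the original text.
        System.out.println(text.substring(stripped.start, stripped.end)); // prints "you"
    }
}
```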

RangeFilter and ConstantScoreRangeQuery

2009-07-20 Thread Ganesh
Hello all, What is the difference between using RangeFilter and ConstantScoreRangeQuery? Is there any difference in performance? I am using a datetime field (MMDDhhmm). If I store the field with date precision (MMDD), will the range filter be faster? Regards Ganesh
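One way to reason about the precision question: a range filter that enumerates terms visits every distinct indexed term between the bounds, so fewer distinct terms in range means less work. The sketch below just counts those terms for a few timestamps at minute and day resolution. The yyyyMMddHHmm and yyyyMMdd patterns are assumptions, since the exact format in the message is unclear; fixed-width patterns like these sort lexicographically in chronological order, which a string range relies on.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.TreeSet;

// Sketch: count the distinct terms a term-walking range filter would visit
// for the same documents indexed at two different date precisions.
final class RangeTermCount {
    static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Distinct indexed terms in [lower, upper], both bounds inclusive.
    static int termsInRange(List<LocalDateTime> docs, DateTimeFormatter fmt, String lower, String upper) {
        TreeSet<String> terms = new TreeSet<>();
        for (LocalDateTime d : docs) {
            terms.add(d.format(fmt));
        }
        return terms.subSet(lower, true, upper, true).size();
    }
}
```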

analyzer in surround query parser

2009-07-20 Thread AHMET ARSLAN
I am using Paul's awesome Surround Query Parser that utilizes the SpanQuery family. I integrated it into Solr. Nested proximity searches work perfectly. However, it seems that it does not use an Analyzer to analyze query words. The main reason I need an analyzer at query time is stemming. (also deas

Re: Sorting field containing NULL values consumes field cache memory

2009-07-20 Thread Ganesh
Any ideas on this?? Regards Ganesh - Original Message - From: "Ganesh" To: Sent: Friday, July 17, 2009 2:42 PM Subject: Sorting field containing NULL values consumes field cache memory I am doing sorting on DateTime with minute resolution. I have 90 million records and sortin