Thanks. Could this feature be expected in 2.9 or 3.0?
This is a good feature. Users mostly store sparse data; not all records have
data for every field. This would reduce memory consumption to a large
extent. In my case almost 30% of records store only a reference file pointer
Does anyone have any numbers? I couldn't find complete info in the Trie*
JIRA issues, especially relating to the size increase in the index.
There was this:
> The indexes each contain 13 numeric, trie-encoded fields (doubles and Dates).
> Index size (including the "normal" fields) was:
>
> * 8bit: 4.8 GiB
>
P.S. Storing should be irrelevant.
On Mon, Jul 20, 2009 at 8:13 PM, Erick Erickson wrote:
Describe a bit more, please, what "does not seem to work" means.
For instance, if you're searching for the doc and haven't reopened your
index, you won't see changes.
Better yet, a small, self-contained test case would help even more. I've
often found my problem while trying to write a test to illustrate it.
If I have a field:
Field f = new Field( "F", "foo", Field.Store.YES,
Field.Index.NOT_ANALYZED );
can I later do:
Term t = new Term( "F", "foo" );
myIndexWriter.deleteDocuments( t );
and have it work even though the field is Field.Store.YES? Does the
YES/NO setting make any difference?
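[For reference, and matching the "Storing should be irrelevant" answer above: a minimal sketch (Lucene 2.4-era API; field name and value are taken from the question, everything else is illustrative) showing that deleteDocuments(Term) matches against the indexed term, so Store.YES vs Store.NO does not affect deletion.]

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class DeleteByTermSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);

        Document doc = new Document();
        // Stored AND indexed (NOT_ANALYZED), exactly as in the question.
        doc.add(new Field("F", "foo", Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
        writer.commit();

        // The deletion matches the indexed term "foo" in field "F"; storing is irrelevant.
        writer.deleteDocuments(new Term("F", "foo"));
        writer.commit();

        IndexReader reader = IndexReader.open(dir);
        System.out.println("docs left: " + reader.numDocs()); // expect 0
        reader.close();
        writer.close();
    }
}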
Never mind, I think I got it.
-Original Message-
From: OBender [mailto:osya_ben...@hotmail.com]
Sent: Monday, July 20, 2009 4:42 PM
To: java-user@lucene.apache.org
Subject: RE: question on custom filter
No, it is reversed in the e-mail. Funny though, when I insert it into Excel
it turns into the right order of words.
Thanks for all the help.
Maybe you have an idea of what the problem could be.
Here is how my data gets read and indexed.
I have a UTF-8 CSV file that is produced from Excel.
I read
OK, it makes a lot of sense (the input being incorrect).
Let's just verify that :)
At the end of the line
"but the text you sent as an example was", what I see is the word TOV [טוֹב] on the
left and EREV [עֶרֶב] on the right.
So it reads (for me) EREV TOV, which is correct.
At the end of the line:
"
Obender, does the following text appear like the image in the link, or not?
שומר אחי
http://farm1.static.flickr.com/3/10445435_75b4546703.jpg?v=0
On Mon, Jul 20, 2009 at 3:34 PM, OBender wrote:
I've checked, and it appears to be enabled.
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, July 20, 2009 3:18 PM
To: java-user@lucene.apache.org
Subject: Re: question on custom filter
Obender, I think your input is incorrect. The Hebrew text you pasted
in your example appears incorrect. It's going to be hard for me to
communicate this since I think your computer is not displaying Hebrew
correctly :)
but the text you sent as an example was [טוֹב עֶרֶב]
Shouldn't the adjective follow the noun?
Interesting, the question now is why am I seeing (even in println) what I'm
seeing :)
I'm reading a string from the file which is in UTF-8 encoding. Could this
somehow be related...?
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, July 20, 2009 3:03 PM
To: java-user@lucene.apache.org
Obender, based on your previous comments (that you see text displayed
in the wrong order), I again recommend that you enable support for RTL
languages in your operating system, as I mentioned earlier. If you are
using a Windows-based OS, this is not enabled by default!
I think you are seeing things
Obender, I ran your code and it did what I expected (but not what you pasted):
First token is: (טוֹב,0,4)
Second token is: (עֶרֶב,5,10)
I also loaded up your SimpleWhitespaceAnalyzer in Luke, with the same results.
On Mon, Jul 20, 2009 at 2:53 PM, OBender wrote:
Here is the simple code. If you run it with English and with Hebrew you will
see that in the case of English, tokens are returned from the left of the phrase
to the right, while with Hebrew they come from the right to the left.
Again, I'm talking about tokens here, not individual letters.
public class XFilter exte
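[The XFilter / SimpleWhitespaceAnalyzer source is cut off above. As a stand-in, here is a minimal sketch (pre-2.9 Token API, matching the input.next(reusableToken) calls mentioned elsewhere in the thread) of a pass-through filter that prints each token's text and offsets, which is enough to reproduce the observation under discussion. Class and method names are illustrative.]

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class XFilterSketch extends TokenFilter {
    public XFilterSketch(TokenStream input) {
        super(input);
    }

    // Pass tokens through unchanged, printing term text and offsets.
    public Token next(Token reusableToken) throws IOException {
        Token t = input.next(reusableToken);
        if (t != null) {
            System.out.println("(" + t.term() + "," + t.startOffset() + "," + t.endOffset() + ")");
        }
        return t;
    }

    public static void main(String[] args) throws IOException {
        // The same input as pasted in the thread, written as escapes so no editor
        // can visually reorder it. Tokens come out in logical (stored) order:
        // first token with offsets (0,4), second with offsets (5,10).
        TokenStream ts = new XFilterSketch(new WhitespaceTokenizer(
                new StringReader("\u05D8\u05D5\u05B9\u05D1 \u05E2\u05B6\u05E8\u05B6\u05D1")));
        while (ts.next(new Token()) != null) {
            // tokens are printed by the filter
        }
    }
}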
Obender, I think something in your environment / display environment
might be causing some confusion.
Are you using Microsoft Windows? If so, please verify that support for
right-to-left languages is enabled [control panel/regional and
language options]. It is possible you are "seeing something di
This is how it should be written:
http://unicode.org/cldr/utility/transform.jsp?a=name&b=%D7%A2%D6%B6%D7%A8%D6%B6%D7%91+%D7%98%D7%95%D6%B9%D7%91
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, July 20, 2009 2:07 PM
To: java-user@lucene.apache.org
Subject: Re: question on custom filter
Hold on a second, the phrase that you included a link to is not in the correct
word order!
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, July 20, 2009 2:07 PM
To: java-user@lucene.apache.org
Subject: Re: question on custom filter
Well, the only thing I can say is that the order of tokens I've presented is
what I see in the debugger.
It is what input.next(reusableToken) gives me, in that exact order and with
those exact indexes.
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, July 20,
Obender, this is not true.
The text you pasted is the following in Unicode:
\N{HEBREW LETTER TET}
\N{HEBREW LETTER VAV}
\N{HEBREW POINT HOLAM}
\N{HEBREW LETTER BET}
\N{SPACE}
\N{HEBREW LETTER AYIN}
\N{HEBREW POINT SEGOL}
\N{HEBREW LETTER RESH}
\N{HEBREW POINT SEGOL}
\N{HEBREW LETTER BET}
you can
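[A tiny sketch of how to produce such a listing yourself; it just dumps code points in hex rather than character names. The string literal is spelled with escapes, in exactly the order listed above, so no editor can reorder it.]

public class DumpCodePoints {
    public static void main(String[] args) {
        // tet, vav, holam, bet, space, ayin, segol, resh, segol, bet
        String s = "\u05D8\u05D5\u05B9\u05D1 \u05E2\u05B6\u05E8\u05B6\u05D1";
        for (int i = 0; i < s.length(); i++) {
            // Prints one code point per line, in logical (stored) order.
            System.out.printf("U+%04X%n", (int) s.charAt(i));
        }
    }
}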
Robert,
I'm not sure you are correct on this one.
If I have a Hebrew phrase:
[טוֹב עֶרֶב]
Then the first token that the filter receives is:
[עֶרֶב] (0,5)
and the second is:
[טוֹב] (6,10)
Which means that it counts from right to left (words and indexes).
Am I missing something?
-Original Message-
Obender, I don't think it's as difficult as you think. Your filter does
not need to be aware of this issue at all.
In Unicode, right-to-left languages are encoded in the data in logical order.
The rendering system is what converts the data to right-to-left display
for RTL languages.
For example in Ar
Right now, you can't really do anything about it. In the future, with the
new FieldCache API that may go in, you could plug in a custom implementation
that makes tradeoffs for a sparse array of some kind. The docid is currently
the index into the array, but with a custom impl you may be able to use
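[There was no such pluggable FieldCache API in Lucene at the time, so this is only an illustration of the idea being described (all names are made up): a "sparse" per-document value source that trades the long[maxDoc] array for a map keyed by docid.]

import java.util.HashMap;
import java.util.Map;

public class SparseLongValues {
    private final Map<Integer, Long> values = new HashMap<Integer, Long>();
    private final long missingValue;

    public SparseLongValues(long missingValue) {
        this.missingValue = missingValue;
    }

    public void put(int docId, long value) {
        values.put(docId, value);
    }

    // Docs without a value cost nothing here, unlike a slot in a long[maxDoc] array.
    public long get(int docId) {
        Long v = values.get(docId);
        return v == null ? missingValue : v.longValue();
    }
}

[Note that a HashMap entry costs far more per stored value than an array slot, so this only pays off when the field is very sparse; a real implementation would likely use something more compact.]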
Hi All!
Let's say I have a filter that produces new tokens based on the original ones.
How bad will it be if my filter sets the start of each token to 0 and end to
the length of a token?
An example (based on the phrase "How are you?"):
Original token:
[you?] (8,12)
New tokens:
[you]
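[The example above is cut off, but a sketch of what the question describes (pre-2.9 Token API; the punctuation-stripping is just one illustrative way to derive the new token) would be:]

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class ZeroOffsetFilter extends TokenFilter {
    public ZeroOffsetFilter(TokenStream input) {
        super(input);
    }

    public Token next(Token reusableToken) throws IOException {
        Token t = input.next(reusableToken);
        if (t == null) {
            return null;
        }
        // e.g. strip trailing punctuation to produce the "new" token text: "you?" -> "you"
        String text = t.term().replaceAll("\\W+$", "");
        t.setTermBuffer(text);
        t.setStartOffset(0);            // start reset to 0
        t.setEndOffset(text.length());  // end reset to the token's own length
        return t;
    }
}

[How bad the zeroed offsets are depends mainly on whether anything downstream, highlighting in particular, uses offsets to point back into the original text.]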
Hello all,
What is the difference between using RangeFilter and ConstantScoreRangeQuery? Is
there any difference in performance?
I am using a datetime field (MMDDhhmm). If I store the field with date
precision (MMDD), will the range filter be faster?
Regards
Ganesh
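[For comparison, a minimal sketch of the two forms. The field name and date strings are illustrative; both assume zero-padded terms so lexicographic order matches chronological order.]

import org.apache.lucene.search.ConstantScoreRangeQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeFilter;

public class DateRangeSketch {
    public static void main(String[] args) {
        // As a filter (no scoring; commonly cached and reused):
        Filter f = new RangeFilter("datetime", "07010000", "07202359", true, true);

        // As a query with constant score for every match:
        Query q = new ConstantScoreRangeQuery("datetime", "07010000", "07202359", true, true);

        System.out.println(f + " / " + q);
    }
}

[Coarser precision (for example a separate day-level field) means fewer distinct terms inside the range, which generally helps both approaches; ConstantScoreRangeQuery also avoids RangeQuery's TooManyClauses limit because it never expands into a BooleanQuery.]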
I am using Paul's awesome Surround Query Parser, which utilizes the SpanQuery family.
I integrated it into Solr. Nested proximity searches work perfectly.
However, it seems that it does not use an Analyzer to analyze query words.
The main reason I need an analyzer at query time is stemming.
(also deas
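[One possible workaround sketch, not part of the surround parser itself: stem each query word with the same stemmer used at index time before assembling the surround query string. This assumes a Porter-stemmed index; class and method names are illustrative.]

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class StemQueryWords {
    // Returns the Porter stem of a single query word, e.g. "searching" -> "search".
    static String stem(String word) throws IOException {
        TokenStream ts = new PorterStemFilter(
                new WhitespaceTokenizer(new StringReader(word)));
        Token t = ts.next(new Token());
        return t == null ? word : t.term();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(stem("searching"));
    }
}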
Any ideas on this??
Regards
Ganesh
- Original Message -
From: "Ganesh"
To:
Sent: Friday, July 17, 2009 2:42 PM
Subject: Sorting field containing NULL values consumes field cache memory
I am sorting on DateTime with minute resolution. I have 90 million
records and sortin
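[As a rough sketch of where the memory goes when sorting on a string field. The field name is illustrative and the numbers are back-of-the-envelope, not measured.]

import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class SortSketch {
    public static void main(String[] args) {
        // Sorting on a string field builds a FieldCache.StringIndex: one int per
        // document plus one String per distinct term. The per-document array is
        // allocated for every doc, including those with a NULL/missing value.
        // With ~90 million docs, the int[maxDoc] order array alone is roughly
        // 90M * 4 bytes, about 343 MiB, before counting the term strings.
        Sort byDate = new Sort(new SortField("datetime", SortField.STRING, true));
        System.out.println(byDate);

        // Coarser resolution (e.g. a separate day-level field) shrinks the distinct
        // term array, but not the per-document array.
    }
}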