On Fri, Jan 25, 2013 at 6:23 PM, Shai Erera wrote:
> Hi
>
> Are the values of 'a' and 'b' known in advance? Is it a limited set of
> values? Are you always interested in a table which covers all values?
>
> If so, one way to do that is to each value of 'a' against all values of
> 'b'. Of course,
Hi Mike,
Thanks for your reply.
My scenario is that I am creating a Lucene index with two fields:
1. Filename
2. File Contents
For example, I initially added the field FileName (say, LuceneInAction.pdf),
which is not analyzed, and the field FileContents (the content of the book),
which is analyzed using a custom analyzer.
Now what is t
Hi,
Random data was indexed. I wanted to see the worst case, where little data
is shared across documents, which is the situation in most of my cases.
So I guess compression becomes an overhead in these scenarios.
Arun
On Thu, Jan 31, 2013 at 8:00 PM, Robert Muir wrote:
> The top method here is your ra
On Thu, Jan 31, 2013 at 7:31 AM, Rolf Veen wrote:
> Thank you, Mike.
>
> I didn't state why I need this. I want to be able to send
> a query to some QueryParser that understands "field:1"
> regardless if 'field' was added as StringField or LongField,
> for example. I do not want to rely on schema
On Thu, Jan 31, 2013 at 7:56 AM, Trejkaz wrote:
> On Thu, Jan 31, 2013 at 11:05 PM, Michael McCandless
> wrote:
>> It's confusing, but you should never try to re-index a document you
>> retrieved from a searcher, because certain index-time details (eg,
>> whether a field was tokenized) are not pr
Unfortunately, it's not possible/easy to just add one new field to all
existing docs ... there are several issues open to do this, eg see
https://issues.apache.org/jira/browse/LUCENE-4258 and LUCENE-3837 and
LUCENE-4272.
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 31, 2013 at 8:00
Is this by design? The older API (2.4) does not have this problem. Let's say
I have 100 updates or so: it will then create 100 versions of those files
in the index. This would increase the number of files in the index directory
and might cause some file issues?
It would be good to just have th
On Thu, Jan 31, 2013 at 2:52 PM, George Kelvin
wrote:
> Thank you! That is the problem! I changed the maxExpansions to 100 and the
> results are found.
Phew!
> About my second question, the ranking of wildcard fuzzy search, can you
> also give some suggestions? Thanks!
This is tricky, eg see h
Then those files are expected.
Your 2nd open was with APPEND, which means newly indexed documents are
written into a new set of files.
Lucene is segment based, so your first batch of documents are in
segment _0, while your second batch is in _1 and _2.
Mike McCandless
http://blog.mikemccandless
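The segment behaviour described above can be sketched with a toy model. This is not Lucene code: the class ToySegmentIndex, its OpenMode enum and the writeBatches method are hypothetical names made up for illustration, and the "one new segment per flush" rule and the base-36 naming are simplifying assumptions. It only shows why an APPEND open adds new files next to the existing ones instead of rewriting them.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a segment-based index directory. Hypothetical, not Lucene code. */
class ToySegmentIndex {
    enum OpenMode { CREATE, APPEND }

    private final List<String> segments = new ArrayList<>();
    private long counter = 0; // assumption: names come from a running counter

    /** Simulates opening a writer in the given mode and flushing some batches. */
    void writeBatches(OpenMode mode, int flushes) {
        if (mode == OpenMode.CREATE) {
            segments.clear(); // CREATE discards whatever segments were there
        }
        for (int i = 0; i < flushes; i++) {
            // Assumption: every flush writes one new, immutable segment.
            segments.add("_" + Long.toString(counter++, Character.MAX_RADIX));
        }
    }

    List<String> segmentNames() {
        return new ArrayList<>(segments);
    }

    public static void main(String[] args) {
        ToySegmentIndex index = new ToySegmentIndex();
        index.writeBatches(OpenMode.CREATE, 1); // first batch
        index.writeBatches(OpenMode.APPEND, 2); // second batch, flushed twice
        System.out.println(index.segmentNames()); // prints [_0, _1, _2]
    }
}
```

In this sketch the second open leaves _0 untouched and adds _1 and _2, which mirrors the file listing discussed in the thread. Merging, which would later combine segments, is left out.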
It's _0.si (typo).
For the second update, create = "false".
Thanks,
Sai.
--
View this message in context:
http://lucene.472066.n3.nabble.com/IndexWriterConfig-OpenMode-CREATE-vs-OpenMode-APPEND-index-files-tp4037766p4037785.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com
Hi Jack, sorry for confusing you. I understand that it would be great if a
minimal data set could be provided to reproduce the problem. But I was unable
to do that.
Hi Michael,
Thank you! That is the problem! I changed the maxExpansions to 100 and the
results are found.
About my second question, the
I don't know what _0.csi is ... was that supposed to be _0.si?
Did you pass create=true or false for the 2nd update?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 31, 2013 at 1:39 PM, saisantoshi wrote:
> I am using the following below for creating the IndexWriter (for my
> indexi
I am using the following for creating the IndexWriter (for my
indexing):
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(
    Version.LUCENE_40,
    new LimitTokenCountAnalyzer(analyzer, MAX_FIELD_SCAN_LENGTH));
if (create) { // create will be true for indexing
Hello!
I want to perform a SpanQuery and get the precise overall number of all hits
throughout the entire index (i.e. if the query words combination appears
multiple times in a document, I need that number counted).
I've found a method called SpanQuery.getSpans, but the way of using it in the
s
The top method here is your random string generation.
Are you indexing random data?
On Thu, Jan 31, 2013 at 12:46 AM, arun k wrote:
> Hi,
>
> Please find the snapshots here.
> http://picpaste.com/Lucene3.0.2-G00Z5FfX.png
> http://picpaste.com/Lucene4.1-LsxpcQk0.png
>
> Arun
>
>
> On Wed, Jan 30,
Oh, so you wanted "similar" words! You should have said so... your inquiry
said you were looking for "related" words. So, which is it? More
specifically, what exactly are you looking for, in terms of the semantics?
In any case, "find similar" (MoreLikeThis) is about the best you can do out
of t
wgggfiy wrote:
en, it seems nice, but I'm puzzled by you and Andrew Gilmartina above,
what's the difference between you guys ?
The difference is that similar documents do not give you similar terms. Similar
documents can show a correlation of terms -- ie, wherever Lucene is mentioned
so is So
Hi all.
We have an application which has been around for so long that it's
still using doc IDs to key to an external database.
Obviously this won't work forever (even in Lucene 3.x we had to use a
custom merge policy to keep it working) so we want to introduce
application IDs eventually. We have
On Thu, Jan 31, 2013 at 11:05 PM, Michael McCandless
wrote:
> It's confusing, but you should never try to re-index a document you
> retrieved from a searcher, because certain index-time details (eg,
> whether a field was tokenized) are not preserved in the stored
> document.
>
> Instead, you shoul
Thank you, Mike.
I didn't state why I need this. I want to be able to send
a query to some QueryParser that understands "field:1"
regardless if 'field' was added as StringField or LongField,
for example. I do not want to rely on schema information
if I can avoid it, and rather use a smart QueryPar
I haven't used it myself, but I did find this for atomic updates:
http://www.mumuio.com/solrj-4-0-0-alpha-atomic-updates/
I don't know if there is really a need for specific support in SolrJ for RTG;
isn't that all on the Solr side and automagic?
Best
Erick
On Wed, Jan 30, 2013 at 5:47 PM, Dye
On Thu, Jan 31, 2013 at 7:07 AM, Gili Nachum wrote:
> So, when loading the results I want to return (say 10 documents), if not
> all docs fit in RAM, I would incur up to 10 individual disk seek
> operations. Which will kill my performance. Is that correct?
Yes, 10 seeks, and that may or may not
Getting the FieldInfos from each AtomicReader is the right approach!
But, FieldInfos won't tell you which XXXField class was used for the
indexing: that information is not fully preserved ...
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 31, 2013 at 6:33 AM, Rolf Veen wrote:
> Hel
Hi Mike,
So, when loading the results I want to return (say 10 documents), if not
all docs fit in RAM, I would incur up to 10 individual disk seek
operations. Which will kill my performance. Is that correct?
I am considering what my alternatives are:
1. Create another separate lean index that would f
It's confusing, but you should never try to re-index a document you
retrieved from a searcher, because certain index-time details (eg,
whether a field was tokenized) are not preserved in the stored
document.
Instead, you should re-build the document yourself, setting the right
details per-Field, a
Hi All,
I have a basic question.
I am trying to update a field of a Lucene document with a new value.
My code is below. It does not give any errors, but it also does not
update the field in the document.
Document d = searcher.doc(docId);
writer1 = new IndexWriter(csDirectory, new
IndexWriterC
26 matches