Thanks Otis. The download link sent via email points to a file called cemail. It has
no extension. I tried opening it as html and as pdf, but it does not open properly.
Regards
Ganesh
----- Original Message -----
From: "Otis Gospodnetic"
To:
Sent: Wednesday, January 20, 2010 11:54 AM
Subject: Re: Lucene as a primary datastore
Have you seen the "Hot Backups with Lucene" paper available via
http://www.manning.com/hatcher3/ ?
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
----- Original Message -----
> From: Ganesh
> To: java-user@lucene.apache.org
> Sent: Wed, January 20, 2010 1:13:21 AM
> Subject: Lucene as a primary datastore
We have data in compound files and we use Lucene as our primary database. It's
working great and much faster with millions of records. The only issue I face
is with sorting: Lucene sorting consumes a good amount of memory. I don't know
much about MySQL/PostgreSQL and how they behave with
You are not alone, Guido. It's a good question. In my experience, Lucene is
as stable as MySQL/PostgreSQL in terms of its ability to hold your data and not
corrupt it. Of course, even with the most expensive databases, you'd want to
make backups. The same goes with Lucene. Nowadays, one way
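(The reply above is cut off in this digest. Purely for context, here is a minimal sketch of one common hot-backup approach in Lucene 2.9 using SnapshotDeletionPolicy; the index path and the copy step are placeholders, not anything from the original message.)

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class HotBackupSketch {
  public static void main(String[] args) throws IOException {
    FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
    SnapshotDeletionPolicy snapshotter =
        new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
    IndexWriter writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_29), snapshotter,
        IndexWriter.MaxFieldLength.UNLIMITED);
    try {
      IndexCommit commit = snapshotter.snapshot(); // pin the current commit point
      try {
        // copy each file listed by the commit out of the index directory;
        // the snapshot keeps the writer from deleting them in the meantime
        for (Object fileName : commit.getFileNames()) {
          System.out.println("would back up: " + fileName);
        }
      } finally {
        snapshotter.release(); // let normal file deletion resume
      }
    } finally {
      writer.close();
    }
  }
}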
Hi Chris,
It's not actively being worked on. Are you interested in working on it?
Jason
On Tue, Jan 19, 2010 at 4:42 PM, Chris Harris wrote:
> I'm interested in the Tag Index patch (LUCENE-1292), in particular
> because of how it enables you to modify certain fields without
> reindexing a whole document.
I know that the primary use case for Lucene is as an index of data
that can be reconstructed (e.g., from a relational database or from
spidering your corporate intranet).
But, I'm curious if anyone uses Lucene as their primary datastore for
their gold data. Is it good enough?
Would anyone consider
> I see -- so your file format allows you to append to the same file
> without affecting prior readers? We never do that in Lucene today
> (all files are "write once").
Yes. For the most part it only appends. The exception is when the
log's entry count is updated (when the appends actually "commit").
> Here are some questions about unary
> operators and operator precedence or default order of
> operation.
>
> We all know the importance of order of operation of binary
> operators (ones that operate on two operands) such as AND
> and OR. We know how to impose express order of operation by
> grouping and nesting.
I'm interested in the Tag Index patch (LUCENE-1292), in particular
because of how it enables you to modify certain fields without
reindexing a whole document. However, that issue is marked Lucene
2.3.1 and hasn't been updated since July 2008. Can anyone provide any
status updates on this patch? Que
: I'm about to embark on implementing the full-text search feature of XQuery:
Good luck with that.
Here are some quick suggestions on how I'd try to tackle the things you
asked about, without putting much thought into...
: title ftcontains "usability" occurs at least 2 times
assuming this is
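(The suggestions above are cut off. Purely as an illustration, and not necessarily what was being suggested, an "occurs at least N times" constraint could be checked against the postings with TermDocs; the field name, term, and index path below are made up.)

import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.store.FSDirectory;

public class MinOccurrenceCheck {
  // Print documents whose "title" field contains "usability" at least minFreq times.
  public static void printMatches(String indexPath, int minFreq) throws IOException {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File(indexPath)), true);
    try {
      TermDocs td = reader.termDocs(new Term("title", "usability"));
      try {
        while (td.next()) {
          if (td.freq() >= minFreq) {
            System.out.println("doc " + td.doc() + " freq=" + td.freq());
          }
        }
      } finally {
        td.close();
      }
    } finally {
      reader.close();
    }
  }
}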
We have been able to expose this exception under system load but NOT with
individual requests.
Lucene version is 2.9.1. These indexed files are being read over NFS.
Java version is:
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode)
Have a look at Mahout (Lucene sister project), which can create SparseVectors
from Lucene term vectors where the entries are the term id and the "weight" of
the term. Trivial to replicate what is done in Mahout for LibSVM or ARFF or
whatever.
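(For context on where Mahout gets its input: Lucene's core API exposes the stored term vectors directly. A small sketch follows, with the index path and field name as assumptions; it requires the field to have been indexed with term vectors enabled.)

import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.FSDirectory;

public class TermVectorDump {
  // Print term/frequency pairs for one document's "content" field.
  public static void dump(String indexPath, int docId) throws IOException {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File(indexPath)), true);
    try {
      TermFreqVector tv = reader.getTermFreqVector(docId, "content");
      if (tv == null) {
        System.out.println("no term vector stored for doc " + docId);
        return;
      }
      String[] terms = tv.getTerms();
      int[] freqs = tv.getTermFrequencies();
      for (int i = 0; i < terms.length; i++) {
        System.out.println(terms[i] + " -> " + freqs[i]);
      }
    } finally {
      reader.close();
    }
  }
}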
On Jan 18, 2010, at 9:07 AM, Solt, Illés wrote:
>
> For proximity expressions, the query
> parser documentation says, "use the tilde, "~", symbol at
> the end of a Phrase." It gives the example "jakarta
> apache"~10
>
> Does this mean that proximity can only be applied to
> single words enclosed in quotation marks?
Yes, if you are using QueryParser.
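(A quick sketch of what the query parser produces for a sloppy phrase; the field name and analyzer below are arbitrary choices, not from the thread.)

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class ProximityParseDemo {
  public static void main(String[] args) throws ParseException {
    QueryParser parser = new QueryParser(Version.LUCENE_29, "body",
        new WhitespaceAnalyzer());
    // A tilde plus a number after a quoted phrase sets the phrase slop.
    Query q = parser.parse("\"jakarta apache\"~10");
    System.out.println(q.getClass().getSimpleName() + ": " + q);
  }
}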
> 3.) Does grouping or nesting affect results with unary operators? Does
> using unary operators with binary operators affect results. For example,
> in the query:
>
> (+a +b) OR c
>
> has the "required" effect of the + (plus) operator been eliminated by
> the OR operator, so that nevermin
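(The question above is cut off. One easy way to see how the stock QueryParser treats such expressions is simply to parse them and print the result; a small sketch with an arbitrary field name and analyzer.)

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

public class PrecedenceDemo {
  public static void main(String[] args) throws ParseException {
    QueryParser parser = new QueryParser(Version.LUCENE_29, "f",
        new WhitespaceAnalyzer());
    // Printing the parsed query shows which clauses ended up required (+),
    // prohibited (-), or optional after the parser applied its rules.
    System.out.println(parser.parse("(+a +b) OR c"));
    System.out.println(parser.parse("a AND b OR c"));
  }
}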
Here are some questions about unary operators and operator precedence or
default order of operation.
We all know the importance of order of operation of binary operators
(ones that operate on two operands) such as AND and OR. We know how to
impose express order of operation by grouping and nesting.
What's a reasonable upper limit on the number of files? Because I think it
would be simpler, at least to start, to allow your field to be larger (say, 1B
tokens: 1,000 files of 1M tokens each), but restrict the input of each file
to 1M tokens per file. The most elegant way would probably be to subclass
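(The suggestion above is truncated in this digest. Purely as an illustrative sketch of one way to cap the tokens contributed per file, here is a small TokenFilter that stops after a fixed count; the class name and usage are made up.)

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Emits at most maxTokens tokens from the wrapped stream, so each file's
// contribution to a field can be capped.
public final class CappedTokenFilter extends TokenFilter {
  private final int maxTokens;
  private int emitted = 0;

  public CappedTokenFilter(TokenStream input, int maxTokens) {
    super(input);
    this.maxTokens = maxTokens;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (emitted >= maxTokens) {
      return false;   // cap reached: stop producing tokens
    }
    if (!input.incrementToken()) {
      return false;   // underlying stream exhausted
    }
    emitted++;
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    emitted = 0;
  }
}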
You can simply index both "files" and "cards" into the same index (no need
for two indexes); Lucene easily supports documents of different structure.
You may add some boosting per field or document, and tune similarity
to get the most important stuff to the top.
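(A rough sketch of that idea; the field names, the "type" discriminator, and the boost values below are assumptions for illustration, not from the thread.)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class CardAndFileDocs {
  // A "card" document: the title field is boosted so title matches rank higher.
  public static Document cardDoc(String id, String title, String body) {
    Document doc = new Document();
    doc.add(new Field("type", "card", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
    Field titleField = new Field("title", title, Field.Store.YES, Field.Index.ANALYZED);
    titleField.setBoost(2.0f);
    doc.add(titleField);
    doc.add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED));
    return doc;
  }

  // A "file" document in the same index: different fields, lower document boost
  // so linked files rank a bit below the cards themselves.
  public static Document fileDoc(String cardId, String fileName, String content) {
    Document doc = new Document();
    doc.add(new Field("type", "file", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("cardId", cardId, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("fileName", fileName, Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("content", content, Field.Store.NO, Field.Index.ANALYZED));
    doc.setBoost(0.8f);
    return doc;
  }
}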
On Tue, Jan 19, 2010 at 16:35, Anna Hunecke wrote:
For proximity expressions, the query parser documentation says, "use the
tilde, "~", symbol at the end of a Phrase." It gives the example "jakarta
apache"~10
Does this mean that proximity can only be applied to single words enclosed
in quotation marks? To clarify the question by comparison,
The field size is restricted to 1 million tokens, because of the very reasons
you mentioned.
So, even if I have one separate field for the content of a file, I might reach
the limit if the file is really big. But I can't help that. What I want to
avoid is that the whole content of some files can
What field size limit are you talking about here? Because 10,000
tokens is the default, but you can increase it to Integer.MAX_VALUE.
So are you really talking billions of tokens here? Your index
quickly becomes unmanageable if you're allowing it to grow
by such increments.
One can argue, IMO, th
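(For reference, the limit in question is IndexWriter's maximum field length; a minimal sketch of raising it, with the index path and analyzer as placeholders.)

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MaxFieldLengthDemo {
  public static void main(String[] args) throws IOException {
    FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
    // MaxFieldLength.LIMITED truncates each field at 10,000 tokens;
    // UNLIMITED allows up to Integer.MAX_VALUE.
    IndexWriter writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_29),
        IndexWriter.MaxFieldLength.UNLIMITED);
    // Or pick an explicit cap after construction:
    writer.setMaxFieldLength(1000000);
    writer.close();
  }
}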
Index is pretty large (50GB, divided into 8 shards). I'm afraid I would
start running into memory issues by adding the stop words (though it is
definitely something I would like to test at some point).
My question was more to try to understand if this was known behavior in
lucene, since I can't re
How big is your index? Because the simplest thing would be
to just not remove stopwords at index or query time. Perhaps
in a duplicate field depending upon your needs.
Erick
On Tue, Jan 19, 2010 at 6:50 AM, Avi Rosenschein wrote:
> Hi,
>
> I am using PhraseQuery with explicitly set term position
Hi!
I have been working with Lucene for a while now. So far, I found helpful tips
on this list, so I hope somebody can help me with my problem:
In our app information is grouped in so-called cards. Now, it should be made
possible to also search on files linked to the cards. You can link arbitrar
Hi,
I am using PhraseQuery with explicitly set term positions and slop=0, in
order to skip stop words. The field in my index is indexed with TermVector
positions.
When I do a query with stop words skipped, for example "internet for
research" (translated into PhraseQuery: "internet ? research"), I
On Tue, Jan 19, 2010 at 1:32 AM, Babak Farhang wrote:
>> This is about multiple sessions with the writer. Ie, open writer,
>> update a few docs, close. Do the same again, but, that 2nd session
>> cannot overwrite the same files from the first one, since readers may
>> have those files open. The