This is unlikely to work well/fast. It will depend on the size of the index
(not in terms of the number of docs, but its physical size), the number of
queries/second and desired query latency. If you can wait 10 seconds to get a
query and if only a few queries are hitting the server at any one
: Frustrated is the word :) I have looked at Solr...what I am worried
: about there is this: Solr says it requires an OS that supports hard
: links. Currently Windows does not to my knowledge. Someone seemed to
: make a comment that Windows could be supported...from what I know I
: don't think so.
The single server is important because I think it will take a lot of
work to scale it to multiple servers. The index must allow for close to
real-time updates and additions. It must also remain searchable at all
times (other than during the brief period of single updates and
additions). If
Why is a single server so important? I can scale horizontally much cheaper
than I scale vertically.
On 8/11/06, Mark Miller <[EMAIL PROTECTED]> wrote:
I've made a nice little archive application with lucene. I made it to
handle our largest need: 2.5 million docs or so on a single server. Now
Frustrated is the word :) I have looked at Solr...what I am worried
about there is this: Solr says it requires an OS that supports hard
links. Currently Windows does not to my knowledge. Someone seemed to
make a comment that Windows could be supported...from what I know I
don't think so. Not a
As Jason says, you can structure each Lucene document with one Field per
content type, and index all data that way. The database is not required.
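
As a rough illustration of the one-Field-per-content-type idea, here is a toy in-memory sketch in Python. It is not Lucene's API; the class FieldedIndex, its methods, and the field names "document" and "attachment" are all made up for the example. It only shows that when terms are indexed per field, a search can be limited to whichever fields the user picks.

```python
from collections import defaultdict


class FieldedIndex:
    """Toy inverted index keyed by (field, term). Hypothetical, not Lucene."""

    def __init__(self):
        # (field name, term) -> set of document ids containing that term
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        # fields maps a field name (one per content type) to its text
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, term, fields):
        # Look only in the fields the caller asks for
        hits = set()
        for field in fields:
            hits |= self.postings.get((field, term.lower()), set())
        return sorted(hits)


index = FieldedIndex()
index.add(1, {"document": "quarterly budget report",
              "attachment": "spreadsheet of costs"})
index.add(2, {"document": "meeting notes",
              "attachment": "budget slides"})

doc_only = index.search("budget", ["document"])
everywhere = index.search("budget", ["document", "attachment"])
```

Here doc_only contains only document 1, while everywhere contains documents 1 and 2, because the second match lives in the attachment field.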
To address your search complexity concern, you can create queries that
search only those Field(s) the user wants -- there is no need to have a
Field fo
i've indexed 80m records and now up to 200m.. it can be done, and could've
been done better. like the others said, architecture is important. have you
considered looking into solr? i haven't kept up with it (and many of the
mailing lists...), but looks very interesting.
ray,
On 8/12/06, Jason
This strategy can also be nicely abstracted from your main app. Whilst I
haven't yet implemented it, my plan is to create a template style structure
which tells me which fields are in lucene, and which are externalized. This
way I don't bother storing data in lucene that is stored elsewhere, but
Sounds like you're a bit frustrated. Cheer up, the simple fact is that
engineering and business rarely see eye-to-eye. Just focus on the fact that
what you have learnt from the process will help you, and they paid for it ;)
On the issue at hand...Lucene should scale to this level, but you need
IMO you should avoid storing any data in the index that you don't need for
display. Lucene is an index (and a damn good one), not a database. If you
find yourself storing large amounts of data in the index, this could be an
indication that you may need to re-think your architecture.
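
A minimal sketch of that split, assuming a plain Python dict stands in for the external database and another for the index. Nothing here is Lucene's API; the names store, postings, display, search and load_full are hypothetical. The index keeps only term-to-id postings plus the short title needed for a result list, and the full record is fetched from the store only when a hit is opened.

```python
# Hypothetical external store standing in for a database or file system.
store = {
    101: {"title": "Budget 2006", "body": "full text kept outside the index"},
    102: {"title": "Meeting notes", "body": "another large body of text"},
}

# The "index" holds term -> ids, plus only the small title used for display.
postings = {}
display = {}
for doc_id, record in store.items():
    display[doc_id] = record["title"]
    text = record["title"] + " " + record["body"]
    for term in text.lower().split():
        postings.setdefault(term, set()).add(doc_id)


def search(term):
    """Return (id, title) pairs using the index alone."""
    ids = sorted(postings.get(term.lower(), ()))
    return [(doc_id, display[doc_id]) for doc_id in ids]


def load_full(doc_id):
    """Fetch the full record from the external store on demand."""
    return store[doc_id]


results = search("budget")
```

The body text is searchable but never copied into the display data, so the index stays small while the store remains the source of truth.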
In its simp
Maybe I'm not understanding your requirement, but this should be fairly
simple in Lucene.
Each document in your document management system would be represented by a
single Lucene document in the index. Each lucene document will then have
several fields, each field representing the values of the
Hi all;
We have got a Document management system and we want to build a search on it.
We have three kinds of content in our system: Refers, Documents and Attachments.
A document can have multiple attachments and can be referred to many users.
Our users want to be able to search on documents attachme