Re: 30 milllion+ docs on a single server

2006-08-12 Thread Otis Gospodnetic
This is unlikely to work well/fast. It will depend on the size of the index (not in terms of the number of docs, but its physical size), the number of queries/second and desired query latency. If you can wait 10 seconds to get a query and if only a few queries are hitting the server at any one

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Chris Hostetter
: Frustrated is the word :) I have looked at Solr...what I am worried about there is this: Solr says it requires an OS that supports hard links. Currently Windows does not to my knowledge. Someone seemed to make a comment that Windows could be supported...from what I know I don't think so.

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Mark Miller
The single server is important because I think it will take a lot of work to scale it to multiple servers. The index must allow for close to real-time updates and additions. It must also remain searchable at all times (other than during the brief period of single updates and additions). If

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Jeff Rodenburg
Why is a single server so important? I can scale horizontally much cheaper than I scale vertically. On 8/11/06, Mark Miller <[EMAIL PROTECTED]> wrote: I've made a nice little archive application with lucene. I made it to handle our largest need: 2.5 million docs or so on a single server. Now

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Mark Miller
Frustrated is the word :) I have looked at Solr...what I am worried about there is this: Solr says it requires an OS that supports hard links. Currently Windows does not to my knowledge. Someone seemed to make a comment that Windows could be supported...from what I know I don't think so. Not a

Re: Indexing Documents which has Attachments and are Refered many times!!

2006-08-12 Thread Steven Rowe
As Jason says, you can structure each Lucene document with one Field per content type, and index all data that way. The database is not required. To address your search complexity concern, you can create queries that search only those Field(s) the user wants -- there is no need to have a Field fo

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Ray Tsang
i've indexed 80m records and now up to 200m.. it can be done, and could've been done better. like the others said, architecture is important. have you considered looking into solr? i haven't kept up with it (and many of the mailing lists...), but looks very interesting. ray, On 8/12/06, Jason

Re: updating document

2006-08-12 Thread Jason Polites
This strategy can also be nicely abstracted from your main app. Whilst I haven't yet implemented it, my plan is to create a template style structure which tells me which fields are in lucene, and which are externalized. This way I don't bother storing data in lucene that is stored elsewhere, but

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Jason Polites
Sounds like you're a bit frustrated. Cheer up, the simple fact is that engineering and business rarely see eye-to-eye. Just focus on the fact that what you have learnt from the process will help you, and they paid for it ;) On the issue at hand...Lucene should scale to this level, but you need

Re: WIll storing docs affect lucene's search performance ?

2006-08-12 Thread Jason Polites
IMO you should avoid storing any data in the index that you don't need for display. Lucene is an index (and a damn good one), not a database. If you find yourself storing large amounts of data in the index, this could be an indication that you may need to re-think your architecture. In its simp

Re: Indexing Documents which has Attachments and are Refered many times!!

2006-08-12 Thread Jason Polites
Maybe I'm not understanding your requirement, but this should be fairly simple in Lucene. Each document in your document management system would be represented by a single Lucene document in the index. Each lucene document will then have several fields, each field representing the values of the

Indexing Documents which has Attachments and are Refered many times!!

2006-08-12 Thread Shaghayegh Sahebie
Hi all; We have got a Document management system and we want to build a search on it. We have three kinds of content in our system: Refers, Documents and Attachments. A document can have multiple attachments and can be Referred to many users. Our users want to be able to search on documents attachme