Thanks for all of the useful info on this topic. You have been very
enlightening. My RAM requirements where obviously off the mark. Here is
my current understanding of this issue:
A standard 32-bit processor has access to 4GB of RAM. If your CPU
supports Physical Address Extension (PAE) the OS can access up to 64GB
of RAM, although a single application is still limited to 4GB. In
Windows, Address Windowing Extensions (AWE) solves this limitation but
you must use OS specific calls to manage your memory. Unix/Linux has
it's own memory mapping scheme that accomplishes something similar to AWE.
I have no problem requiring that the server i use be maxed to the
gills. I just want to support Windows as well as Unix/Linux/Sun. I was
shortsighted in my RAM comments.
As far as Solr: it looks like this will not necessarily solve my problem
of handling an index so big on a single server--the entire index is
replicated across each slave searcher--but if load is a factor
(according to Hoss, load will probably be the big issue, not the size of
the index) than this distribution is exactly what I need. I think that
the average load I can expect will be pretty low though. Updates to the
large index will also be relatively rare. 30-200 items added every
morning with the occasional update scattered throughout the day. There
will only be a few hundred searchers in total and they will not be
hammering the system but consulting it here and there throughout the day.
When I said that Solr required hardlinks I was referring to the
replication feature. My fault--replication was the benefit I was seeing
from solar and I was unclear. I bet solr's caching helps a lot too and I
will be looking into that. What I did not know was that Windows supports
hard links when using ntfs. Fsutil.exe creates them. This gives me hope
that cygwin might support them. I do not know if I can use the same cp
-l -r trick, but it at least it looks hopeful. I did not realize that
solr's replication feature was so external and easily changeable.
as far as rmi: splitting the index across several boxes seems relatively
easy (the index can be naturally partitioned without much trouble) but I
would still have to deal with sorting on a single box. I am not so
worried about that anymore though. Warming a new searcher and tons of
RAM should handle my sorts in an acceptable way.
It seems to me that one way or another I can make this happen.
I am wondering whether I should try and use a RAM directory. While I
am sure it might be faster, what happens when the power goes out? System
reboots? I would have to occasionally flush to disk anyway. Would it be
better to use a normal directory and turn the maxbuffered docs way up?
Thanks,
- Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]