: Actually I am storing twitter streaming data into the core, so the indexing
: rate is about 12 tweets (docs)/second. The same solr contains 3 other cores
...
: . At any given time I don't need data older than the past 15 days, unless
: someone queries for it explicitly. How can this be achieved?
so you are adding 12 docs a second, and you need to keep all docs forever,
in case someone asks for a specific doc, but otherwise you typically only
need to search docs from the past 15 days.
if your index is going to grow w/o bounds at this rate forever then it
doesn't matter what tricks you try, or how you tune things -- you are
always going to run out of resources unless you adopt some sort of
distributed approach.
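
to put rough numbers on that growth, here is a back-of-the-envelope sketch. the only input taken from your mail is the 12 docs/second rate; the function name is made up for illustration, and it assumes the rate stays constant.

```python
# Rough index growth at a constant ingest rate.
# DOCS_PER_SECOND comes from the question; everything else is illustrative.
DOCS_PER_SECOND = 12
SECONDS_PER_DAY = 24 * 60 * 60


def docs_after(days, rate=DOCS_PER_SECOND):
    """Number of docs indexed after `days` days at `rate` docs/second."""
    return rate * SECONDS_PER_DAY * days


if __name__ == "__main__":
    for days in (1, 15, 365):
        print(days, "days:", docs_after(days), "docs")
```

that works out to roughly a million docs a day, so the 15 day window is about 15.5 million docs while the full index keeps growing by that much every two weeks.
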
off the cuff, i would suggest indexing all of the docs for a single "day"
in one shard, and making most of your searches be a distributed request
against the most recent 15 shards.
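
a minimal sketch of what building that request could look like, assuming one core per day named like "tweets_YYYYMMDD" on a single host (the naming scheme, host, and helper names are my assumptions, not anything solr requires). the only solr feature it relies on is the "shards" request param, which takes a comma separated list of host:port/path entries.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def recent_shards(today, days=15, host="localhost:8983", prefix="tweets_"):
    """List the shard addresses for the last `days` days, newest first.

    The "tweets_YYYYMMDD" core naming is a hypothetical convention.
    """
    shards = []
    for n in range(days):
        day = today - timedelta(days=n)
        shards.append(f"{host}/solr/{prefix}{day:%Y%m%d}")
    return shards


def distributed_query_url(q, shards):
    """Build a select URL that fans the query out to the given shards.

    The request is sent to the first shard, which acts as the aggregator.
    """
    params = urlencode({"q": q, "shards": ",".join(shards)})
    return f"http://{shards[0]}/select?{params}"


if __name__ == "__main__":
    shards = recent_shards(date.today())
    print(distributed_query_url("text:solr", shards))
```

any core can act as the aggregator for the request; sending it to the newest shard is just a convenient choice here.
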
you didn't say how people "query for it explicitly" when looking for older
docs -- if it's by date then when a user asks for a specific date range
you can just query those shards explicitly, if it's by some unique id then
you'll want to cache in your application the min/max id of the docs in
each shard (easy enough to determine by looping over them all and doing a
stats query)
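
here is a rough sketch of both lookups on the application side. it assumes the same hypothetical "tweets_YYYYMMDD" shard naming, and that ids increase over time so the per-shard id ranges don't overlap (true for tweet ids, but an assumption about your data). the cached ranges would come from a StatsComponent request per shard, i.e. the params in STATS_PARAMS, reading the min and max for the field out of the stats section of the response; the sample values used here are invented.

```python
import bisect
from datetime import date, timedelta

# Params for a per-shard stats request; "id" is an assumed field name.
STATS_PARAMS = {"q": "*:*", "rows": 0, "stats": "true", "stats.field": "id"}


def shards_for_dates(start, end, prefix="tweets_"):
    """Shard names covering the inclusive date range start..end."""
    count = (end - start).days + 1
    return [f"{prefix}{start + timedelta(days=n):%Y%m%d}" for n in range(count)]


def build_id_index(ranges):
    """Turn {shard: (min_id, max_id)} into a list sorted by min_id."""
    return sorted((lo, hi, shard) for shard, (lo, hi) in ranges.items())


def shard_for_id(index, doc_id):
    """Find the shard whose cached [min_id, max_id] contains doc_id."""
    mins = [lo for lo, _, _ in index]
    pos = bisect.bisect_right(mins, doc_id) - 1
    if pos >= 0:
        lo, hi, shard = index[pos]
        if lo <= doc_id <= hi:
            return shard
    return None  # id is outside every cached range


if __name__ == "__main__":
    # Invented sample ranges, standing in for real stats results.
    index = build_id_index({
        "tweets_20110318": (100, 199),
        "tweets_20110319": (200, 350),
    })
    print(shard_for_id(index, 250))
    print(shards_for_dates(date(2011, 3, 18), date(2011, 3, 19)))
```

the ranges for closed-out days never change, so they only need to be fetched once; only the current day's shard needs its max refreshed.
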
-Hoss