Vivek - exactly, all of those are good reasons for index partitioning. In
addition, think about how much of a huge index will fit in the RAM/FS
buffers of a single server. At some point you'll have to spread your index
over N servers.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: vivek sar <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, January 24, 2008 11:21:08 PM
Subject: Re: Archiving Index using partitions

Thanks Otis for your response. I have a few more questions:

1) Is it recommended to do index partitioning for large indexes?
     - We index around 35 fields (storing only two of them - simple ids)
     - Each document is around 200 bytes
     - Our index grows to around 50G a week

2) The reasons I could think of for partitioning would be:
    - optimization would be faster on smaller indexes
    - search would be faster if I have to search only a specific partition
    - I would be able to archive old partitions
    - even if a partition gets corrupted I wouldn't lose all the data

   Is this correct? Are there any other reasons?

Thanks,
-vivek
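
The weekly rollover and archiving idea discussed in this thread can be
sketched roughly as follows. This is only an illustration under stated
assumptions: the class WeeklyPartitions, the "index-" directory naming
scheme, and the Monday week boundary are all hypothetical and not part of
Lucene. The sketch only decides which directory a document belongs to,
which directories a date-range search would need, and which ones are old
enough to archive; opening an IndexWriter or searcher on those directories
is left out.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Lucene class): maps dates to weekly index directories.
class WeeklyPartitions {
    // Assumed naming scheme: "index-" followed by the ISO date of the week's Monday.
    static final String PREFIX = "index-";

    // First day (Monday) of the week that contains the given day.
    static LocalDate weekStart(LocalDate day) {
        return day.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    // Directory name a document indexed on the given day would be written to.
    static String partitionFor(LocalDate day) {
        return PREFIX + weekStart(day);
    }

    // All weekly partitions that overlap the inclusive date range [from, to].
    static List<String> partitionsToSearch(LocalDate from, LocalDate to) {
        List<String> names = new ArrayList<>();
        for (LocalDate w = weekStart(from); !w.isAfter(to); w = w.plusWeeks(1)) {
            names.add(PREFIX + w);
        }
        return names;
    }

    // True once every day covered by the partition is at least keepWeeks weeks old.
    static boolean isArchivable(String partition, LocalDate today, int keepWeeks) {
        LocalDate start = LocalDate.parse(partition.substring(PREFIX.length()));
        LocalDate endExclusive = start.plusWeeks(1);
        return !endExclusive.isAfter(today.minusWeeks(keepWeeks));
    }
}
```

With this scheme, a search restricted to a date range touches only the
directories returned by partitionsToSearch, and a periodic job could move
every directory for which isArchivable returns true to archive storage.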



On Jan 21, 2008 2:32 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
> Why not just design your system to roll over to a new index on a weekly
> basis (new IndexWriter on a new index dir, roughly speaking)? You can't
> partition a single Document, if that is what you are asking. But you can
> create multiple smaller indices (e.g. weekly ones) instead of one large
> one, and then every 2 weeks archive the one > 2 weeks old.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: vivek sar <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, January 21, 2008 3:06:50 PM
> Subject: Archiving Index using partitions
>
> Hi,
>
>  As a requirement I need to be able to archive any indexes older than
> 2 weeks (due to space and performance reasons). That means I would
> need to maintain weekly indexes. Here are my questions,
>
> 1) What's the best way to partition indexes using Lucene?
> 2) Is there a way I can partition documents, but not indexes? I don't
> want each partitioned index to be a full index, as that would be waste
> of space. We collect over 10K new documents per min (with each
> document around 250 bytes).
> 3) Is ParallelMultiSearcher the way to go for partitioned indexes? Do
> I ever have to merge these partitioned indexes?
> 4) I'm hoping I can reload the archived indexes in future if needed.
>
> Not sure if there is a standard way to archive the indexes using
>  Lucene.
>
> Thanks,
> -vivek
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
