32M is tiny. Is this a self-imposed memory constraint, or do you really have hardware that's that limited? I ask because "just give the VM more memory" is the very first option I'd suggest...
Best
Erick

On Sun, Oct 31, 2010 at 5:17 PM, Paulo Levi <i30...@gmail.com> wrote:

> I meant -Xmx32m
>
> On Sun, Oct 31, 2010 at 9:17 PM, Paulo Levi <i30...@gmail.com> wrote:
>
> > Yes, that's what I ended up doing. I will probably "fork" a new Java VM
> > instead of doing it in the same one. That way I can control the memory
> > requirements, though it hasn't given me any problems (actually it even
> > worked with -Xmx, though it probably won't if I do something else in
> > the program at the same time - I'm not indexing the book subjects yet
> > either; I need to do some sort of string caching for that and for
> > authors.)
> >
> > On Sun, Oct 31, 2010 at 8:47 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Hmmmm. Are you too memory limited to do a first pass through the file
> >> and save the key/download-link part in a map, then make another pass
> >> through the file, indexing the data and grabbing the link from your
> >> map? I'm assuming that there's a lot less than 200 MB in just the
> >> key/link part.
> >>
> >> Alternatively (and this would probably be kinda slow, but...) still do
> >> a two-pass process, but instead of making a map, put the data in a
> >> Lucene index on disk. Then the second pass searches that index for the
> >> data to add to the docs in your "real" index.
> >>
> >> HTH
> >> Erick
> >>
> >> On Sun, Oct 31, 2010 at 12:17 PM, Paulo Levi <i30...@gmail.com> wrote:
> >>
> >> > I'm stepping through an RDF file (the Project Gutenberg catalog) and
> >> > sending data to a Lucene index to allow searches of titles, authors,
> >> > and such. However, the Gutenberg RDF is a little bit "special". It
> >> > has two sections: one for title, authors, collaborators, and such,
> >> > and (after all the books) another section that has the download
> >> > links. The connection is a kind of foreign key that exists on both
> >> > tags (a unique number id). While I don't need to search the download
> >> > link, I do need to save it.
> >> >
> >> > I'm memory limited and can't hold in memory the 200 MB file that the
> >> > catalog is. I'm wondering if there is some way for me to use the
> >> > number id to connect both kinds of information without having to
> >> > keep things in memory. A first pass for the things I want, and a
> >> > second using the number id? It seems very clumsy. I'm not actually
> >> > using a database, and I don't want to use very large libraries.
> >> > Compass is 60 MB (!). I tried LuceneSail for a while, but it has
> >> > stopped working and the code is a mess (it is not adapted to the
> >> > filtering of the Gutenberg RDF that I'm doing).
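For reference, the map-based first pass Erick describes is short once the XML is streamed rather than loaded whole. A minimal sketch, assuming StAX (javax.xml.stream) for the streaming; the element and attribute names ("file", "about", "isFormatOf", "resource") are illustrative stand-ins, not the exact Gutenberg catalog schema:

import java.io.FileInputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class LinkMapPass {

    // Pass 1: stream the catalog once, keeping only id -> download link.
    // StAX reads forward through the file, so memory use stays at the
    // size of the map, not the size of the 200 MB file.
    public static Map<String, String> collectLinks(String path) throws Exception {
        Map<String, String> links = new HashMap<String, String>();
        XMLStreamReader in = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream(path));
        String currentLink = null;
        while (in.hasNext()) {
            if (in.next() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            // "file"/"about" and "isFormatOf"/"resource" are illustrative
            // names; substitute whatever your catalog filter matches on.
            if ("file".equals(in.getLocalName())) {
                currentLink = in.getAttributeValue(null, "about");
            } else if ("isFormatOf".equals(in.getLocalName())) {
                String id = in.getAttributeValue(null, "resource");
                if (currentLink != null && id != null) {
                    links.put(id, currentLink);
                }
            }
        }
        in.close();
        return links;
    }
}

The map holds only the id/link pairs, so the 200 MB catalog should boil down to a few MB of strings - probably inside -Xmx32m, unless the key/link section is far bigger than it sounds.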
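If even that map is too big, Erick's second suggestion - a throwaway on-disk Lucene index keyed by the number id - needs only two helpers. This is sketched against the Lucene 3.x-era Field.Store/Field.Index API, and the field names "id" and "link" are mine; open the IndexWriter and IndexSearcher with whichever constructors your Lucene version provides:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class LinkIndex {

    // Pass 1: write each id/link pair as a tiny two-field document.
    // NOT_ANALYZED makes the id matchable as an exact term; Index.NO
    // keeps the link stored-only, since it never needs to be searched.
    static void add(IndexWriter writer, String id, String link) throws Exception {
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("link", link, Field.Store.YES, Field.Index.NO));
        writer.addDocument(doc);
    }

    // Pass 2: while building the "real" index, look each link back up by id.
    static String lookup(IndexSearcher searcher, String id) throws Exception {
        TopDocs hits = searcher.search(new TermQuery(new Term("id", id)), 1);
        if (hits.totalHits == 0) {
            return null;
        }
        return searcher.doc(hits.scoreDocs[0].doc).get("link");
    }
}

Pass 1 calls add() once per download link; pass 2 calls lookup() per book while writing the real index. Slower than a HashMap, as Erick says, but the heap cost is near zero.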
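And on the "fork a new Java VM" idea from the top of the thread: launching the indexer as a child process with its own heap cap is a few lines with ProcessBuilder (Java 5+). The class and argument names below (ForkIndexer, catalog.Indexer, catalog.rdf) are hypothetical placeholders:

import java.io.File;
import java.io.InputStream;

public class ForkIndexer {
    public static void main(String[] args) throws Exception {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        // "catalog.Indexer" and "catalog.rdf" stand in for the real
        // indexing main class and input file.
        ProcessBuilder pb = new ProcessBuilder(
                javaBin, "-Xmx32m",
                "-cp", System.getProperty("java.class.path"),
                "catalog.Indexer", "catalog.rdf");
        pb.redirectErrorStream(true); // fold the child's stderr into stdout
        Process child = pb.start();
        // Drain the child's output so it can't block on a full pipe.
        InputStream out = child.getInputStream();
        byte[] buf = new byte[4096];
        while (out.read(buf) != -1) {
            // discard, or copy to a log
        }
        System.out.println("child exited with status " + child.waitFor());
    }
}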