I meant -Xmx32m

On Sun, Oct 31, 2010 at 9:17 PM, Paulo Levi <i30...@gmail.com> wrote:
> Yes, that's what I ended up doing. I will probably "fork" a new Java VM
> instead of doing it in the same one. That way I can control the memory
> requirements, though it hasn't given me any problems (it actually even
> worked with -Xmx, though it probably won't if I do something else in the
> program at the same time - I'm not indexing the book subjects yet, and I
> need to do some sort of string caching for that and for the authors.)
>
> On Sun, Oct 31, 2010 at 8:47 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Hmmmm. Are you too memory limited to do a first pass through the file and
>> save the key/download-link part in a map, then make another pass through
>> the file, indexing the data and grabbing the link from your map? I'm
>> assuming that there's a lot less than 200 MB in just the key/link part.
>>
>> Alternatively (and this would probably be kind of slow, but...) still do a
>> two-pass process, but instead of making a map, put the data in a Lucene
>> index on disk. Then the second pass searches that index for the data to
>> add to the docs in your "real" index.
>>
>> HTH
>> Erick
>>
>> On Sun, Oct 31, 2010 at 12:17 PM, Paulo Levi <i30...@gmail.com> wrote:
>>
>> > I'm stepping through an RDF file (the Project Gutenberg catalog) and
>> > sending data to a Lucene index to allow searches of titles, authors,
>> > and such. However, the Gutenberg RDF is a little bit "special". It has
>> > two sections: one for title, authors, collaborators, and such, and
>> > (after all the books) another section that has the download links. The
>> > connection is a kind of foreign key that exists on both tags (a unique
>> > number id). While I don't need to search the download link, I do need
>> > to save it.
>> >
>> > I'm memory limited and can't put the 200 MB catalog file in memory.
>> > I'm wondering if there is some way for me to use the number id to
>> > connect both kinds of information without having to keep things in
>> > memory. A first search for the things I want, and a second using the
>> > number id? It seems very clumsy. I'm not actually using a database,
>> > and I don't want to use very large libraries. Compass is 60 MB (!). I
>> > tried LuceneSail for a while, but it has stopped working and the code
>> > is a mess (it is not adapted to the filtering of the Gutenberg RDF
>> > that I'm doing).
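For anyone finding this thread later, the map-based two-pass idea Erick describes can be sketched roughly as below. This is a minimal illustration only: it assumes a made-up line-oriented record shape (`<file .../>` and `<book .../>` pseudo-tags), not the real Gutenberg RDF vocabulary, and in a real program the second pass would build Lucene `Document`s instead of a `Map`. Only the small id-to-link map is ever held in memory; the 200 MB file itself is streamed twice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassJoin {

    // Pass 1: stream the file and keep only the tiny id -> download-link map.
    // The tag shape here is invented for illustration; adapt the pattern to
    // the actual catalog markup.
    static Map<String, String> collectLinks(Iterable<String> lines) {
        Map<String, String> links = new HashMap<>();
        Pattern filePat = Pattern.compile("<file id=\"(\\d+)\" url=\"([^\"]+)\"/>");
        for (String line : lines) {
            Matcher m = filePat.matcher(line);
            if (m.find()) {
                links.put(m.group(1), m.group(2));
            }
        }
        return links;
    }

    // Pass 2: stream the file again; for each book record, look up its link
    // by the shared numeric id. In the real indexer this is where you would
    // call doc.add(new Field(...)) and writer.addDocument(doc).
    static Map<String, String> joinTitles(Iterable<String> lines,
                                          Map<String, String> links) {
        Map<String, String> titleToLink = new HashMap<>();
        Pattern bookPat = Pattern.compile("<book id=\"(\\d+)\" title=\"([^\"]+)\"/>");
        for (String line : lines) {
            Matcher m = bookPat.matcher(line);
            if (m.find()) {
                titleToLink.put(m.group(2), links.get(m.group(1)));
            }
        }
        return titleToLink;
    }
}
```

Since both passes only ever hold one line plus the key/link map, the heap stays small regardless of catalog size, which is why it fits under a tight -Xmx.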