I meant -Xmx32m

On Sun, Oct 31, 2010 at 9:17 PM, Paulo Levi <i30...@gmail.com> wrote:
> Yes, that's what I ended up doing. I will probably "fork" a new Java VM
> instead of doing it in the same one. That way I can control the memory
> requirements, though it hasn't given me any problems (it actually even
> worked with -Xmx, though it probably won't if I do something else in the
> program at the same time - I'm not indexing the book subjects yet, and I
> need to do some sort of string caching for that and for the authors.)
>
> On Sun, Oct 31, 2010 at 8:47 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Hmmmm. Are you too memory limited to do a first pass through the file and
>> save the key/download-link part in a map, then make another pass through
>> the file, indexing the data and grabbing the link from your map? I'm
>> assuming that there's a lot less than 200 MB in just the key/link part.
>>
>> Alternatively (and this would probably be kind of slow, but...) still do a
>> two-pass process, but instead of making a map, put the data in a Lucene
>> index on disk. Then the second pass searches that index for the data to
>> add to the docs in your "real" index.
>>
>> HTH
>> Erick
>>
>> On Sun, Oct 31, 2010 at 12:17 PM, Paulo Levi <i30...@gmail.com> wrote:
>>
>> > I'm stepping through an RDF file (the Project Gutenberg catalog) and
>> > sending data to a Lucene index to allow searches of titles, authors,
>> > and such. However, the Gutenberg RDF is a little bit "special". It has
>> > two sections: one for title, authors, collaborators, and such, and
>> > (after all the books) another section that has the download links. The
>> > connection is a kind of foreign key that exists on both tags (a unique
>> > number id). While I don't need to search the download link, I do need
>> > to save it.
>> >
>> > I'm memory limited and can't put the 200 MB catalog file in memory.
>> > I'm wondering if there is some way for me to use the number id to
>> > connect both kinds of information without having to keep things in
>> > memory. A first search for the things I want, and a second using the
>> > number id? It seems very clumsy. I'm not actually using a database,
>> > and I don't want to use very large libraries. Compass is 60 MB (!). I
>> > tried LuceneSail for a while, but it has stopped working and the code
>> > is a mess (it is not adapted to the filtering of the Gutenberg RDF
>> > that I'm doing).
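For anyone finding this thread later, the map-based two-pass idea Erick describes can be sketched roughly as below. This is a minimal illustration only: it assumes a made-up line-oriented record shape (`<file .../>` and `<book .../>` pseudo-tags), not the real Gutenberg RDF vocabulary, and in a real program the second pass would build Lucene `Document`s instead of a `Map`. Only the small id-to-link map is ever held in memory; the 200 MB file itself is streamed twice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassJoin {

    // Pass 1: stream the file and keep only the tiny id -> download-link map.
    // The tag shape here is invented for illustration; adapt the pattern to
    // the actual catalog markup.
    static Map<String, String> collectLinks(Iterable<String> lines) {
        Map<String, String> links = new HashMap<>();
        Pattern filePat = Pattern.compile("<file id=\"(\\d+)\" url=\"([^\"]+)\"/>");
        for (String line : lines) {
            Matcher m = filePat.matcher(line);
            if (m.find()) {
                links.put(m.group(1), m.group(2));
            }
        }
        return links;
    }

    // Pass 2: stream the file again; for each book record, look up its link
    // by the shared numeric id. In the real indexer this is where you would
    // call doc.add(new Field(...)) and writer.addDocument(doc).
    static Map<String, String> joinTitles(Iterable<String> lines,
                                          Map<String, String> links) {
        Map<String, String> titleToLink = new HashMap<>();
        Pattern bookPat = Pattern.compile("<book id=\"(\\d+)\" title=\"([^\"]+)\"/>");
        for (String line : lines) {
            Matcher m = bookPat.matcher(line);
            if (m.find()) {
                titleToLink.put(m.group(2), links.get(m.group(1)));
            }
        }
        return titleToLink;
    }
}
```

Since both passes only ever hold one line plus the key/link map, the heap stays small regardless of catalog size, which is why it fits under a tight -Xmx.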