32M is tiny. Is this a self-imposed memory constraint, or do you really have hardware that's that limited? I ask because "just give the VM more memory" is the very first option I'd suggest...
Best
Erick

On Sun, Oct 31, 2010 at 5:17 PM, Paulo Levi <i30...@gmail.com> wrote:

> I meant -Xmx32m
>
> On Sun, Oct 31, 2010 at 9:17 PM, Paulo Levi <i30...@gmail.com> wrote:
>
> > Yes, that's what I ended up doing. I will probably "fork" a new Java VM
> > instead of doing it in the same one. That way I can control the memory
> > requirements, though it hasn't given me any problems (actually it even
> > worked with -Xmx, though it probably won't if I do something else in
> > the program at the same time - I'm not indexing the book subjects yet
> > either; I need to do some sort of string caching for that and for
> > authors.)
> >
> > On Sun, Oct 31, 2010 at 8:47 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Hmmmm. Are you too memory limited to do a first pass through the file
> >> and save the key/download-link part in a map, then make another pass
> >> through the file, indexing the data and grabbing the link from your
> >> map? I'm assuming that there's a lot less than 200 MB in just the
> >> key/link part.
> >>
> >> Alternatively (and this would probably be kinda slow, but...) still do
> >> a two-pass process, but instead of making a map, put the data in a
> >> Lucene index on disk. Then the second pass searches that index for the
> >> data to add to the docs in your "real" index.
> >>
> >> HTH
> >> Erick
> >>
> >> On Sun, Oct 31, 2010 at 12:17 PM, Paulo Levi <i30...@gmail.com> wrote:
> >>
> >> > I'm stepping through an RDF file (the Project Gutenberg catalog) and
> >> > sending data to a Lucene index to allow searches of titles, authors,
> >> > and such. However, the Gutenberg RDF is a little bit "special". It
> >> > has two sections: one for title, authors, collaborators, and such,
> >> > and (after all the books) another section that has the download
> >> > links. The connection is a kind of foreign key that exists on both
> >> > tags (a unique number id). While I don't need to search the download
> >> > link, I do need to save it.
> >> >
> >> > I'm memory limited and can't hold in memory the 200 MB file that the
> >> > catalog is. I'm wondering if there is some way for me to use the
> >> > number id to connect both kinds of information without having to
> >> > keep things in memory. A first pass for the things I want, and a
> >> > second using the number id? It seems very clumsy. I'm not actually
> >> > using a database, and I don't want to use very large libraries.
> >> > Compass is 60 MB (!). I tried LuceneSail for a while, but it has
> >> > stopped working and the code is a mess (it is not adapted to the
> >> > filtering of the Gutenberg RDF that I'm doing).
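For reference, the map-based first pass Erick describes is short once the XML is streamed rather than loaded whole. A minimal sketch, assuming StAX (javax.xml.stream) for the streaming; the element and attribute names ("file", "about", "isFormatOf", "resource") are illustrative stand-ins, not the exact Gutenberg catalog schema:

import java.io.FileInputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class LinkMapPass {

    // Pass 1: stream the catalog once, keeping only id -> download link.
    // StAX reads forward through the file, so memory use stays at the
    // size of the map, not the size of the 200 MB file.
    public static Map<String, String> collectLinks(String path) throws Exception {
        Map<String, String> links = new HashMap<String, String>();
        XMLStreamReader in = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream(path));
        String currentLink = null;
        while (in.hasNext()) {
            if (in.next() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            // "file"/"about" and "isFormatOf"/"resource" are illustrative
            // names; substitute whatever your catalog filter matches on.
            if ("file".equals(in.getLocalName())) {
                currentLink = in.getAttributeValue(null, "about");
            } else if ("isFormatOf".equals(in.getLocalName())) {
                String id = in.getAttributeValue(null, "resource");
                if (currentLink != null && id != null) {
                    links.put(id, currentLink);
                }
            }
        }
        in.close();
        return links;
    }
}

The map holds only the id/link pairs, so the 200 MB catalog should boil down to a few MB of strings - probably inside -Xmx32m, unless the key/link section is far bigger than it sounds.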
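If even that map is too big, Erick's second suggestion - a throwaway on-disk Lucene index keyed by the number id - needs only two helpers. This is sketched against the Lucene 3.x-era Field.Store/Field.Index API, and the field names "id" and "link" are mine; open the IndexWriter and IndexSearcher with whichever constructors your Lucene version provides:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class LinkIndex {

    // Pass 1: write each id/link pair as a tiny two-field document.
    // NOT_ANALYZED makes the id matchable as an exact term; Index.NO
    // keeps the link stored-only, since it never needs to be searched.
    static void add(IndexWriter writer, String id, String link) throws Exception {
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("link", link, Field.Store.YES, Field.Index.NO));
        writer.addDocument(doc);
    }

    // Pass 2: while building the "real" index, look each link back up by id.
    static String lookup(IndexSearcher searcher, String id) throws Exception {
        TopDocs hits = searcher.search(new TermQuery(new Term("id", id)), 1);
        if (hits.totalHits == 0) {
            return null;
        }
        return searcher.doc(hits.scoreDocs[0].doc).get("link");
    }
}

Pass 1 calls add() once per download link; pass 2 calls lookup() per book while writing the real index. Slower than a HashMap, as Erick says, but the heap cost is near zero.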
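And on the "fork a new Java VM" idea from the top of the thread: launching the indexer as a child process with its own heap cap is a few lines with ProcessBuilder (Java 5+). The class and argument names below (ForkIndexer, catalog.Indexer, catalog.rdf) are hypothetical placeholders:

import java.io.File;
import java.io.InputStream;

public class ForkIndexer {
    public static void main(String[] args) throws Exception {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        // "catalog.Indexer" and "catalog.rdf" stand in for the real
        // indexing main class and input file.
        ProcessBuilder pb = new ProcessBuilder(
                javaBin, "-Xmx32m",
                "-cp", System.getProperty("java.class.path"),
                "catalog.Indexer", "catalog.rdf");
        pb.redirectErrorStream(true); // fold the child's stderr into stdout
        Process child = pb.start();
        // Drain the child's output so it can't block on a full pipe.
        InputStream out = child.getInputStream();
        byte[] buf = new byte[4096];
        while (out.read(buf) != -1) {
            // discard, or copy to a log
        }
        System.out.println("child exited with status " + child.waitFor());
    }
}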