On Wed, Apr 7, 2010 at 1:51 PM, Eric Evans <eev...@rackspace.com> wrote:
> On Tue, 2010-04-06 at 10:55 -0700, Tatu Saloranta wrote:
>> On Tue, Apr 6, 2010 at 12:15 AM, JKnight JKnight <beukni...@gmail.com> wrote:
>> > When import, all data in json file will load in memory. So that, you can not
>> > import large data.
>> > You need to export large sstable file to many small json files, and run
>> > import.
>>
>> Why would you ever read the whole file in memory? JSON is very easily
>> streamable. Or does the whole data set need to be validated or
>> something (I assume not, if file splitting could be used). Perhaps it
>> is just an implementation flaw in importer tool.
>
> It's been awhile, but if I'm not mistaken, this is because we're writing
> SSTables and the records must be written in decorated-key sorted order.

OK. Then it might make sense to solve this, for example with external sorting?

(This reminds me that I need to clean up and release some basic on-disk merge sort code. Oddly enough, nothing like it seems to be included in the existing commons libs. We used it for exactly this purpose: pre-sorting data for systems that required sorted input or benefited heavily from it.)

-+ Tatu +-
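
For illustration only, the external sorting idea mentioned above could be sketched roughly as follows. This is not the importer's code and not the merge sort code referred to in the message. The names `external_sort`, `_write_run` and `_read_run`, the one-JSON-row-per-line run format, and the `sort_key` callable (standing in for whatever ordering the decorated keys define) are all assumptions made for the example.

```python
import heapq
import json
import os
import tempfile
from itertools import islice


def _write_run(sorted_rows, tmp_dir):
    """Write one already-sorted chunk to a temp file, one JSON row per line."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "w") as f:
        for row in sorted_rows:
            f.write(json.dumps(row) + "\n")
    return path


def _read_run(path):
    """Lazily yield rows back from a run file."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)


def external_sort(rows, sort_key, chunk_size=100_000, tmp_dir=None):
    """Yield rows in sort_key order.

    At most chunk_size input rows are held in memory while building runs;
    during the merge only one row per run is held.
    """
    rows = iter(rows)
    run_paths = []
    try:
        # Phase 1: cut the input into chunks, sort each in memory, spill to disk.
        while True:
            chunk = list(islice(rows, chunk_size))
            if not chunk:
                break
            chunk.sort(key=sort_key)
            run_paths.append(_write_run(chunk, tmp_dir))
        # Phase 2: k-way merge of the sorted runs, streamed from disk.
        runs = [_read_run(p) for p in run_paths]
        yield from heapq.merge(*runs, key=sort_key)
    finally:
        # Remove the run files once the merge is finished or abandoned.
        for p in run_paths:
            os.remove(p)


if __name__ == "__main__":
    # Hypothetical rows of the form [key, value]; real rows would be
    # streamed from the JSON export instead of built in memory.
    sample = [["k3", "c"], ["k1", "a"], ["k2", "b"]]
    for row in external_sort(sample, sort_key=lambda r: r[0], chunk_size=2):
        print(row)
```

The sketch keeps every run file open during the merge, so a very large input with a small `chunk_size` could hit the open-file limit; a multi-pass merge would be needed in that case.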