On Fri, Sep 2, 2011 at 10:29 AM, Benoit Perroud <ben...@noisette.ch> wrote:
> Hi All,
>
> I started using SSTableSimpleUnsortedWriter to load data, and my data
> has a few rows but a lot of column name in each rows.
>
> I call SSTableSimpleUnsortedWriter.newRow every 10'000 columns inserted.
>
> But the time taken to insert columns is increasing as the column
> family is increasing. The problem appears because everytime we call
> newRow, all the columns of the previous CF is added to the new CF.

If I understand correctly, each row has way more than 10 000 columns, but
you call newRow every 10 000 columns, right? Note that you can call newRow
less frequently. But anyway, I agree that the code shouldn't suck like that.

> Attached is a small patch that check which is the smallest CF, and add
> the smallest CF to the biggest one.
>
> Should I open I bug for that ?

Please do. I'm actually thinking of a slightly different fix: we should not
have to add all the previous columns to the new column family, we should
just directly reuse the previous column family when adding the new column.
But the JIRA ticket will be a better place to discuss this.

--
Sylvain
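
For illustration only, the two fixes discussed above could be sketched roughly as below. This is a simplified model and not Cassandra's actual code: the class RowBufferSketch, its methods, and the use of a TreeMap as a stand-in for a column family are all assumptions made for the example. The real writer resolves conflicts by timestamp, which this sketch replaces with "the newer value wins".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a writer that buffers rows in memory.
// A TreeMap<String, String> stands in for a column family (sorted columns).
public class RowBufferSketch {
    private final Map<String, TreeMap<String, String>> buffer = new HashMap<>();
    private TreeMap<String, String> current;

    // "Reuse" idea: if the row is already buffered, keep writing into its
    // existing column map instead of creating a new one and copying into it.
    public void newRow(String rowKey) {
        current = buffer.computeIfAbsent(rowKey, k -> new TreeMap<>());
    }

    public void addColumn(String name, String value) {
        if (current == null) {
            throw new IllegalStateException("call newRow first");
        }
        current.put(name, value);
    }

    public int columnCount(String rowKey) {
        TreeMap<String, String> columns = buffer.get(rowKey);
        return columns == null ? 0 : columns.size();
    }

    // "Patch" idea: copy the smaller map into the larger one, so the cost is
    // proportional to the smaller side. Newer values win on a name clash.
    public static TreeMap<String, String> mergeSmallerIntoLarger(
            TreeMap<String, String> older, TreeMap<String, String> newer) {
        if (older.size() <= newer.size()) {
            for (Map.Entry<String, String> e : older.entrySet()) {
                newer.putIfAbsent(e.getKey(), e.getValue());
            }
            return newer;
        }
        older.putAll(newer);
        return older;
    }
}
```

With the reuse variant, calling newRow again for a row that is already buffered costs one map lookup, however many columns the row has accumulated. The merge variant still copies columns on each call, but only the smaller side.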