2011/3/8 Peter Schuller <peter.schul...@infidyne.com>

> >                 $client->batch_mutate($mutations, cassandra_ConsistencyLevel::QUORUM);
>
> Btw, what are the mutations? Are you doing something like inserting
> both very small values and very large ones?
>
I have a big XML file (5 GB, a MySQL dump in XML format) and I read the data
from it with a SAX XML parser. All records in that file look like this:

        <row>
                <field name="uid">5</field>
                <field name="aid">3619780:1</field>
                <field name="cleanness">0</field>
                <field name="counter">7</field>
                <field name="gcount">0</field>
                <field name="lastchange">1291053619</field>
                <field name="disaster">0</field>
                <field name="tdisaster">0</field>
        </row>


In that case each batch of mutations consists of 10 similar records (the
following code fragment describes the situation):

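    // $l_row holds the <field> values of the current <row>; $l_columns,
    // $l_key and the row counter $l_i are defined outside this fragment.
    $l_supercolumn = new cassandra_SuperColumn(array("name" => $l_row["aid"], "columns" => $l_columns));
    $l_c_or_sc = new cassandra_ColumnOrSuperColumn(array("super_column" => $l_supercolumn));
    $l_mutation = new cassandra_Mutation(array("column_or_supercolumn" => $l_c_or_sc));

    // Group mutations as row key => column family => list of mutations,
    // the shape batch_mutate() takes.
    if(array_key_exists($l_key, $mutations))
    {
        $mutations[$l_key]['aquarium_friend'][] = $l_mutation;
    }
    else
    {
        $mutations[$l_key] = array('aquarium_friend' => array($l_mutation));
    }

    // Send the accumulated batch every 10 rows, then start a new one.
    if(!($l_i % 10))
    {
        make_mutation($client, $mutations, $g_loger, $g_rloger);
        $mutations = array();

        // Log progress every 1000 rows.
        if(!($l_i % 1000))
        {
            $g_loger->info(sprintf("inserted: %s", $l_i));
        }
    }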
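A minimal standalone sketch of the same batching pattern, in case it helps reproduce the load without the XML parser. It assumes the row key is the uid field and uses the row array itself as a stand-in for a cassandra_Mutation; add_mutation(), load_rows() and the $flush callback are hypothetical names, not part of the Cassandra PHP client.

```php
<?php
// Sketch only: buffer mutations as row key => column family => list,
// and hand a batch to $flush (a stand-in for make_mutation() /
// batch_mutate()) every $batchSize rows. All names are hypothetical.

function add_mutation(array &$mutations, string $key, string $cf, $mutation): void
{
    // PHP creates the intermediate arrays on the first write,
    // so no array_key_exists() check is needed here.
    $mutations[$key][$cf][] = $mutation;
}

function load_rows(iterable $rows, string $cf, int $batchSize, callable $flush): int
{
    $mutations = array();
    $i = 0;
    foreach ($rows as $row) {
        $i++;
        // Assumption: uid is the row key; the row array stands in
        // for a real cassandra_Mutation object.
        add_mutation($mutations, (string)$row['uid'], $cf, $row);
        if ($i % $batchSize === 0) {
            $flush($mutations);
            $mutations = array();
        }
    }
    // Send the last, partial batch so trailing rows are not lost.
    if ($mutations) {
        $flush($mutations);
    }
    return $i;
}
```

With 25 rows and a batch size of 10 this calls $flush three times, with 10, 10 and 5 mutations. The final flush after the loop is a design choice of the sketch; the fragment above does not show whether the original code sends the leftover rows.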



> That's why I asked about the frequency. If you're doing a long-term
> stress test and seeing a 30 second pause once per week, that's a lot
> more likely to be "normal" for your workload than if you're seeing it
> happen once every three minutes. The issue is that if you want to fix
> your problem, one must first figure out what the problem *is*. Based
> on past mailing list traffic, it seems most people's problems that are
> seemingly "due to GC" end up being because of a too high live set size
> or the CMS phase triggering too late. These are fixable issues if you are
> running into them.
>
>
In my case this happens from time to time. For example, inserting the whole
5 GB XML file took about 30-40 minutes, and the nodes froze about 5-10 times
during that period (each freeze lasted about 15 seconds on average).



> If all you have is a single column family with a 64 mb flush threshold
> and doing a bunch of insertions, and have a heap size of 5 (was it?)
> gig, you should not be having these issues. But stating that helps no
> one, which is why I'm asking for more information. (Wildly
> extrapolating and suggesting that all Cassandra nodes will always
> freeze for 30 seconds every now and then is also helping no one, other
> than not being true.)
>
Initially the heap was 6 GB. When I increased the heap size to 7 GB, the nodes
froze only once, but that freeze lasted much longer (40 seconds).
