[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786212#comment-13786212 ]

Jonathan Ellis commented on CASSANDRA-6146:
-------------------------------------------

What I'd like to see is a drastic reduction in the number of flags we support, 
in favor of allowing the user to pre-create a table for stress-ng (stress-cql?) 
to take its cues from.

So here's what our new Config might look like:

{code}
        availableOptions.addOption("h", "help", false, "Show this help message and exit");
        // NB only SELECT makes sense for compound PK unless we add some kind of scan-for-PK support
        availableOptions.addOption("cql", "cql", true, "CQL to execute for each operation. Use ? for partition key bind placeholder");
        availableOptions.addOption("d", "distribution", true, "Partition key distribution: uniform or gaussian. Default: uniform");
        availableOptions.addOption("ks", "keyspace", true, "Keyspace. Default: stress");
        availableOptions.addOption("n", "nodes", true, "Nodes to connect to (CDL). Default: 127.0.0.1");
        availableOptions.addOption("p", "partitions", true, "Number of distinct partitions to use. Default: 1,000,000");
        availableOptions.addOption("pop", "populate", false, "Populate mode. Enable to generate random inserts for the given table");
        availableOptions.addOption("r", "requests", true, "Number of requests to execute. Default: 1,000,000");
        availableOptions.addOption("std", "stdev", true, "Standard deviation from mean, for gaussian distribution only. Default: 0.1");
        availableOptions.addOption("t", "table", true, "Table. Default: data");
{code}
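
To make the defaults above concrete, here is a rough, self-contained sketch of the object those options might be parsed into. Everything in it is an assumption for illustration: {{StressConfig}}, its field names, and the hand-rolled {{parse}} method are hypothetical, it only understands the long option names (with one or two leading dashes), and it skips {{help}}. Real code would presumably keep using the option parser shown above.

```java
import java.util.Locale;

// Hypothetical holder for the options above; not part of the proposal itself.
final class StressConfig
{
    enum Distribution { UNIFORM, GAUSSIAN }

    String cql;                                        // no default
    Distribution distribution = Distribution.UNIFORM;
    String keyspace = "stress";
    String nodes = "127.0.0.1";                        // comma-delimited list
    int partitions = 1_000_000;
    boolean populate = false;
    int requests = 1_000_000;
    double stdev = 0.1;
    String table = "data";

    // Accepts long option names only, with one or two leading dashes.
    static StressConfig parse(String... args)
    {
        StressConfig c = new StressConfig();
        for (int i = 0; i < args.length; i++)
        {
            String name = args[i].replaceFirst("^-+", "");
            if (name.equals("populate"))
            {
                c.populate = true;                     // the only flag handled here that takes no value
                continue;
            }
            if (i + 1 >= args.length)
                throw new IllegalArgumentException("Missing value for " + args[i]);
            String value = args[++i];
            switch (name)
            {
                case "cql":          c.cql = value; break;
                case "distribution": c.distribution = Distribution.valueOf(value.toUpperCase(Locale.ROOT)); break;
                case "keyspace":     c.keyspace = value; break;
                case "nodes":        c.nodes = value; break;
                case "partitions":   c.partitions = Integer.parseInt(value); break;
                case "requests":     c.requests = Integer.parseInt(value); break;
                case "stdev":        c.stdev = Double.parseDouble(value); break;
                case "table":        c.table = value; break;
                default: throw new IllegalArgumentException("Unknown option " + args[i]);
            }
        }
        return c;
    }
}
```

With this sketch, {{StressConfig.parse("--populate", "--table", "timeseries")}} would yield populate mode against the {{timeseries}} table with every other field left at its default.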

So, you'd have command lines like this:

# {{stress -cql "SELECT * FROM data WHERE key = ?"}}
# {{stress -cql "SELECT username, password FROM users WHERE user_id = ?"}}
# {{stress -cql "SELECT collected_at, value FROM timeseries WHERE sensor_id = ? LIMIT 100"}}
# {{stress -cql "SELECT * FROM timeseries WHERE sensor_id = ? AND collected_at = ?"}}
# {{stress --populate}}
# {{stress --populate --table timeseries}}

There's some asymmetry between inserts and reads; I'm not sure it makes sense 
to customize INSERT all that much, and I want people to be able to get a quick 
smoke test up with a minimum of ceremony, i.e., creating a default {{data}} 
table for them rather than requiring explicit {{CREATE TABLE}} first.  But, if 
you want to create a custom table, we should be able to introspect it and 
populate it for you.

The populate code might look something like this:

{code}
    private static void populate(Config config, Session session)
    {
        KeyspaceMetadata ks = session.getCluster().getMetadata().getKeyspace(config.keyspace);
        TableMetadata table = ks.getTable(config.table);
        if (table == null)
        {
            System.out.println("NOTICE: Creating table with 6 int columns.  Create manually if you prefer otherwise.");
            session.execute("CREATE TABLE " + config.table + " (key int PRIMARY KEY, i1 int, i2 int, i3 int, i4 int, i5 int)");
            // re-read the metadata so the introspection below sees the new table
            table = session.getCluster().getMetadata().getKeyspace(config.keyspace).getTable(config.table);
        }
        List<ColumnMetadata> columns = table.getColumns();

        // build "INSERT INTO <table> (c1, c2, ...) VALUES (?, ?, ...)"
        String names = "";
        String markers = "";
        for (int i = 0; i < columns.size(); i++)
        {
            if (i > 0)
            {
                names += ", ";
                markers += ", ";
            }
            names += columns.get(i).getName();
            markers += "?";
        }
        String cql = "INSERT INTO " + config.table + " (" + names + ") VALUES (" + markers + ")";
        PreparedStatement statement = session.prepare(cql);

        for (int n = 0; n < config.requests; n++)
        {
            BoundStatement bs = new BoundStatement(statement);

            // partition key gets treated by distribution
            if (config.distribution == Config.Distribution.UNIFORM)
            {
                if (config.partitions == config.requests)
                    bs.setInt(0, n);
                else
                    bs.setInt(0, random.nextInt(config.partitions));
            }
            else
            {
                int k;
                while (true)
                {
                    // loop until we get a result within the necessary bounds
                    k = (int) (config.mean + (random.nextGaussian() * config.sigma));
                    if (k >= 0 && k < config.partitions)
                        break;
                }
                bs.setInt(0, k);
            }

            // non-partition key columns get random data
            for (int i = 1; i < columns.size(); i++)
            {
                ColumnMetadata c = columns.get(i);
                if (c.getType() == DataType.cint())
                    bs.setInt(i, random.nextInt());
                else
                    throw new UnsupportedOperationException("Flesh this out with support for more types");
            }

            executeLimitedAsync(session, bs);
        }
    }

    private static void executeLimitedAsync(Session session, BoundStatement statement)
    {
        while (executing.size() == MAX_EXECUTING)
        {
            for (Iterator<ResultSetFuture> iter = executing.iterator(); iter.hasNext(); )
            {
                ResultSetFuture future = iter.next();
                if (future.isDone())
                    iter.remove();
            }
            Uninterruptibles.sleepUninterruptibly(1, TimeUnit.MILLISECONDS);
        }

        executing.add(session.executeAsync(statement));
    }
{code} 
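
The partition key selection in the loop above can also be looked at on its own, without the driver. The following is only a sketch under stated assumptions: {{KeySampler}} is a hypothetical name, the mean is taken to be the middle of the key range, and {{stdev}} is read as a fraction of the partition count (so the default 0.1 would mean sigma = 0.1 * partitions). The comment above does not say how {{config.mean}} and {{config.sigma}} are derived, so treat both choices as guesses.

```java
import java.util.Random;

// Hypothetical helper that isolates partition key selection from the populate loop.
final class KeySampler
{
    private final Random random;
    private final int partitions;
    private final double mean;
    private final double sigma;

    // Assumptions: mean = partitions / 2, sigma = stdev * partitions.
    KeySampler(Random random, int partitions, double stdev)
    {
        if (partitions <= 0 || stdev <= 0)
            throw new IllegalArgumentException("partitions and stdev must be positive");
        this.random = random;
        this.partitions = partitions;
        this.mean = partitions / 2.0;
        this.sigma = stdev * partitions;
    }

    // Every key in [0, partitions) is equally likely.
    int nextUniform()
    {
        return random.nextInt(partitions);
    }

    // Rejection sampling: redraw until the scaled gaussian lands in [0, partitions).
    int nextGaussian()
    {
        while (true)
        {
            // floor before the range check so values in (-1, 0) are rejected rather than truncated to 0
            double k = Math.floor(mean + random.nextGaussian() * sigma);
            if (k >= 0 && k < partitions)
                return (int) k;
        }
    }
}
```

Note that the standard normal draw is multiplied by sigma rather than added to it; adding would only shift the mean and leave the spread at one key on either side.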

> CQL-native stress
> -----------------
>
>                 Key: CASSANDRA-6146
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>
> The existing CQL "support" in stress is not worth discussing.  We need to 
> start over, and we might as well kill two birds with one stone and move to 
> the native protocol while we're at it.


