[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902200#comment-13902200 ]
Benedict commented on CASSANDRA-6704:
-------------------------------------

I think the two issues that aren't being addressed effectively here are:

1) The support burden of introducing a whole new (*Turing-complete*) language into the database; and, if we decide this is acceptable,
2) What language would be suitable?

Both are very difficult questions, and making assumptions about either is dangerous, as there is no stepping back from the decision once it's released. Some users will rely on it, and it will have to be maintained. Guaranteeing those hours of support is difficult, and not something easily (or convincingly) committed to, given there is no mechanism by which anybody can require somebody to contribute that support.

As to (1), ignoring (2): any Turing-complete language is going to have interesting and unexpected interactions with Cassandra once let loose upon the world. To assume that the support burden will be low is very optimistic. We will naturally take on some of the support burden for the language *itself*, as users will not know whether their mistake lies in their interaction with Cassandra or in the language. There will also be (probably many) unintended edge cases that we cannot predict, because we do not fully understand both sides of the equation, and even if we did, the combination of the two is frankly impossible to model in our heads. These edge cases will keep changing and presenting themselves with each new version and feature in Cassandra, and in the language itself.

As to (2): the choice of Groovy itself is also likely to be a strong point of contention. It may be quick to put in place, but I disagree with your assertion that it is intuitive. I find it powerful in some situations, but it has some very strange scoping behaviours, and I found myself quite unproductive with it for at least the first day, which is a pretty poor track record given my Java background and how straightforward it should ostensibly be. I don't want to be on the other end of that user confusion, frankly. Beyond that, it does not have a strong backing in general; not weak, but not incredibly widely used. Further, it seems - to me - a slightly lazily put-together language: useful features, expressive, but not coherently designed with a clearly defined goal, purpose, or specification. This is only my impression of it, but the point is that it is a point of contention, and not easily brushed under the carpet.

So, as far as I can see, even *if* we decide that (1) is acceptable and we want to include a Turing-complete language - or any language other than CQL - and we are confident we can safely support it, we still need to collectively address (2) carefully, given that we cannot roll back the decision.

> Create wide row scanners
> ------------------------
>
>                 Key: CASSANDRA-6704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>
> The BigTable white paper demonstrates the use of scanners to iterate over rows and columns.
> http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not have a primary sorting on row keys, scanning over ranges of row keys is less useful.
> However, we can use the scanner concept to operate on wide rows.
> For example, many times a user wishes to do some custom processing inside a row and does not wish to carry the data across the network to do this processing.
> I have already implemented thrift methods to compile dynamic groovy code into Filters, as well as some code that uses a Filter to page through and process data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet (the SFilter contract it compiles against is sketched after this description).
> {code}
> @Test
> public void test_scanner() throws Exception
> {
>     ColumnParent cp = new ColumnParent();
>     cp.setColumn_family("Standard1");
>     ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>
>     // insert columns 'a' through 'f' into a single wide row
>     for (char a = 'a'; a < 'g'; a++)
>     {
>         Column c1 = new Column();
>         c1.setName((a + "").getBytes());
>         c1.setValue(new byte[0]);
>         c1.setTimestamp(System.nanoTime());
>         server.insert(key, cp, c1, ConsistencyLevel.ONE);
>     }
>
>     // register a Groovy filter that keeps at most three columns
>     FilterDesc d = new FilterDesc();
>     d.setSpec("GROOVY_CLASS_LOADER");
>     d.setName("limit3");
>     d.setCode("import org.apache.cassandra.dht.* \n" +
>               "import org.apache.cassandra.thrift.* \n" +
>               "public class Limit3 implements SFilter { \n" +
>               "  public FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered) {\n" +
>               "    filtered.add(col);\n" +
>               "    return filtered.size() < 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n" +
>               "  }\n" +
>               "}\n");
>     server.create_filter(d);
>
>     // scan the row starting at column "a" with the registered filter
>     ScannerResult res = server.create_scanner("Standard1", "limit3", key, ByteBuffer.wrap("a".getBytes()));
>     Assert.assertEquals(3, res.results.size());
> }
> {code}
> I am going to be working on this code over the next few weeks, but I wanted to get the concept out early so the design can see some criticism.
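
The description never shows the SFilter and FilterReturn types the Groovy source above is compiled against. The sketch below is only an inference from how Limit3 uses them: the filter(...) signature and the FILTER_MORE / FILTER_DONE constants appear in the snippet, but the exact package, any additional constants, and the StopAtPrefix example are assumptions, not code from the linked branch.

{code}
// Sketch only - inferred from the Limit3 example above, not taken from the linked branch.
// Only the filter(...) signature and FILTER_MORE / FILTER_DONE appear in the snippet;
// everything else here (including StopAtPrefix) is illustrative.
import java.util.List;

import org.apache.cassandra.thrift.ColumnOrSuperColumn;

/** What a filter step tells the scanner: keep paging, or stop. */
enum FilterReturn
{
    FILTER_MORE,
    FILTER_DONE
}

/** The server-side contract the compiled Groovy class implements. */
interface SFilter
{
    /**
     * Called for each column while paging through the wide row. The filter decides
     * whether to keep the column (by adding it to filtered) and whether to continue.
     */
    FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered);
}

/** Hypothetical second filter: collect columns until one starts with the given byte. */
class StopAtPrefix implements SFilter
{
    private final byte stopByte;

    StopAtPrefix(byte stopByte)
    {
        this.stopByte = stopByte;
    }

    public FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered)
    {
        filtered.add(col);
        byte[] name = col.getColumn().getName();
        return (name.length > 0 && name[0] == stopByte)
               ? FilterReturn.FILTER_DONE
               : FilterReturn.FILTER_MORE;
    }
}
{code}

If the contract really does look like this, a filter such as StopAtPrefix would be registered the same way as Limit3 in the test above: pass its source as a string via setCode on a FilterDesc, call create_filter, then create_scanner with the filter's name.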