[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902200#comment-13902200 ]
Benedict commented on CASSANDRA-6704:
-------------------------------------

I think the two issues that aren't being addressed effectively here are:

1) The support burden of introducing a whole new (*Turing-complete*) language into the database; and, if we decide this is acceptable,
2) What language would be suitable?

Both are very difficult questions, and making assumptions about either is dangerous, as there is no stepping back from the decision once it's released. Some users will rely on it, and it will have to be maintained. Guaranteeing those hours of support is difficult, and not something easily (or convincingly) committed to, given there is no mechanism by which anybody can require somebody to contribute that support.

As to (1), ignoring (2): any Turing-complete language is going to have interesting and unexpected interactions with Cassandra once let loose upon the world. To assume that the support burden will be low is very optimistic. We will naturally take on some of the support burden for the language *itself*, as users will not know whether their mistake lies in their interaction with Cassandra or in the language. There will also be (probably many) unintended edge cases that we cannot predict, because we do not fully understand both sides of the equation, and even if we did, the combination of the two is frankly impossible to model in our heads. These edge cases will keep changing and presenting themselves with each new version and feature in Cassandra, and in the language itself.

As to (2): the choice of Groovy itself is also likely to be a strong point of contention. It may be quick to put in place, but I disagree with your assertion that it is intuitive. I find it powerful in some situations, but it has some very strange scoping behaviours, and I found myself quite unproductive with it for at least the first day, which is a pretty poor track record given my Java background and how straightforward it should ostensibly be. I don't want to be on the other end of that user confusion, frankly. Beyond that, it does not have a strong backing in general; not weak, but not incredibly widely used. Further, it seems - to me - a slightly lazily put-together language: useful features, expressive, but not coherently designed with a clearly defined goal, purpose, or specification. This is only my impression of it, but the point is that it is a point of contention, and not easily brushed under the carpet.

So, as far as I can see, even *if* we decide that (1) is acceptable and we want to include a Turing-complete language - or any language other than CQL - and we are confident we can safely support it, we still need to collectively address (2) carefully, given that we cannot roll back the decision.

> Create wide row scanners
> ------------------------
>
>                 Key: CASSANDRA-6704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>
> The BigTable white paper demonstrates the use of scanners to iterate over rows and columns.
> http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not have a primary sorting on row keys, scanning over ranges of row keys is less useful.
> However, we can use the scanner concept to operate on wide rows.
> For example, many times a user wishes to do some custom processing inside a row and does not wish to carry the data across the network to do this processing.
> I have already implemented thrift methods to compile dynamic groovy code into Filters, as well as some code that uses a Filter to page through and process data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet (the SFilter contract it compiles against is sketched after this description).
> {code}
> @Test
> public void test_scanner() throws Exception
> {
>     ColumnParent cp = new ColumnParent();
>     cp.setColumn_family("Standard1");
>     ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>
>     // insert columns 'a' through 'f' into a single wide row
>     for (char a = 'a'; a < 'g'; a++)
>     {
>         Column c1 = new Column();
>         c1.setName((a + "").getBytes());
>         c1.setValue(new byte[0]);
>         c1.setTimestamp(System.nanoTime());
>         server.insert(key, cp, c1, ConsistencyLevel.ONE);
>     }
>
>     // register a Groovy filter that keeps at most three columns
>     FilterDesc d = new FilterDesc();
>     d.setSpec("GROOVY_CLASS_LOADER");
>     d.setName("limit3");
>     d.setCode("import org.apache.cassandra.dht.* \n" +
>               "import org.apache.cassandra.thrift.* \n" +
>               "public class Limit3 implements SFilter { \n" +
>               "  public FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered) {\n" +
>               "    filtered.add(col);\n" +
>               "    return filtered.size() < 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n" +
>               "  }\n" +
>               "}\n");
>     server.create_filter(d);
>
>     // scan the row starting at column "a" with the registered filter
>     ScannerResult res = server.create_scanner("Standard1", "limit3", key, ByteBuffer.wrap("a".getBytes()));
>     Assert.assertEquals(3, res.results.size());
> }
> {code}
> I am going to be working on this code over the next few weeks, but I wanted to get the concept out early so the design can see some criticism.
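
The description never shows the SFilter and FilterReturn types the Groovy source above is compiled against. The sketch below is only an inference from how Limit3 uses them: the filter(...) signature and the FILTER_MORE / FILTER_DONE constants appear in the snippet, but the exact package, any additional constants, and the StopAtPrefix example are assumptions, not code from the linked branch.

{code}
// Sketch only - inferred from the Limit3 example above, not taken from the linked branch.
// Only the filter(...) signature and FILTER_MORE / FILTER_DONE appear in the snippet;
// everything else here (including StopAtPrefix) is illustrative.
import java.util.List;

import org.apache.cassandra.thrift.ColumnOrSuperColumn;

/** What a filter step tells the scanner: keep paging, or stop. */
enum FilterReturn
{
    FILTER_MORE,
    FILTER_DONE
}

/** The server-side contract the compiled Groovy class implements. */
interface SFilter
{
    /**
     * Called for each column while paging through the wide row. The filter decides
     * whether to keep the column (by adding it to filtered) and whether to continue.
     */
    FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered);
}

/** Hypothetical second filter: collect columns until one starts with the given byte. */
class StopAtPrefix implements SFilter
{
    private final byte stopByte;

    StopAtPrefix(byte stopByte)
    {
        this.stopByte = stopByte;
    }

    public FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered)
    {
        filtered.add(col);
        byte[] name = col.getColumn().getName();
        return (name.length > 0 && name[0] == stopByte)
               ? FilterReturn.FILTER_DONE
               : FilterReturn.FILTER_MORE;
    }
}
{code}

If the contract really does look like this, a filter such as StopAtPrefix would be registered the same way as Limit3 in the test above: pass its source as a string via setCode on a FilterDesc, call create_filter, then create_scanner with the filter's name.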