[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901463#comment-13901463 ]
Sylvain Lebresne commented on CASSANDRA-6704:
---------------------------------------------

bq. Everything CQL is right, and everything else is wrong?

I don't think that's really what people mean here. I believe the concern (maybe I should say "my" concern, I'm really speaking in my own name here) is that it would be a bad idea for C* to have 2 APIs (thrift and CQL) that continue to evolve with sets of features that fundamentally do the same thing but have different implementations. In practice, the project doesn't want to maintain 2 APIs: we don't have infinite development resources, and this is confusing for users in the long run.

Thrift is the legacy API. We've promised to maintain it in its current state indefinitely (which *is* a non-negligible drain on the project resources btw), and we are even fine exposing some new features through it when that requires very little maintenance effort (CAS for instance), but the C* API moving forward, the one we are developing and not just maintaining, is CQL.

This ticket seems non-trivial and thrift-only by design and so, for the reason I just expressed, I do not think that it's a good idea for the C* project, and I agree that we should focus on tickets like CASSANDRA-4914 instead (granted, no-one has had the time to focus on that yet, but that's really just proving my point that development resources are never infinite). As a side note, and for what it's worth, I do intend to make that ticket one of my priorities for 3.0 (if no-one else beats me to it, of course).

> Create wide row scanners
> ------------------------
>
>                 Key: CASSANDRA-6704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>
> The BigTable white paper demonstrates the use of scanners to iterate over
> rows and columns.
> http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not have a primary sorting on row keys, scanning over
> ranges of row keys is less useful.
> However, we can use the scanner concept to operate on wide rows. For example,
> a user often wishes to do some custom processing inside a row and does
> not wish to carry the data across the network to do this processing.
> I have already implemented thrift methods to compile dynamic groovy code into
> Filters as well as some code that uses a Filter to page through and process
> data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet.
> {code}
>     @Test
>     public void test_scanner() throws Exception
>     {
>       ColumnParent cp = new ColumnParent();
>       cp.setColumn_family("Standard1");
>       ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>       // Insert columns "a" through "f" into a single row.
>       for (char a = 'a'; a < 'g'; a++) {
>         Column c1 = new Column();
>         c1.setName((a + "").getBytes());
>         c1.setValue(new byte[0]);
>         c1.setTimestamp(System.nanoTime());
>         server.insert(key, cp, c1, ConsistencyLevel.ONE);
>       }
>
>       // Register a Groovy filter that keeps columns until it has collected three.
>       FilterDesc d = new FilterDesc();
>       d.setSpec("GROOVY_CLASS_LOADER");
>       d.setName("limit3");
>       d.setCode("import org.apache.cassandra.dht.* \n" +
>               "import org.apache.cassandra.thrift.* \n" +
>               "public class Limit3 implements SFilter { \n " +
>               "public FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered) {\n" +
>               " filtered.add(col);\n" +
>               " return filtered.size() < 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n" +
>               "} \n" +
>               "}\n");
>       server.create_filter(d);
>
>       // Create a scanner over the row using the "limit3" filter.
>       ScannerResult res = server.create_scanner("Standard1", "limit3", key,
>               ByteBuffer.wrap("a".getBytes()));
>       Assert.assertEquals(3, res.results.size());
>     }
> {code}
> I am going to be working on this code over the next few weeks but I wanted to
> get the concept out early so the design can see some criticism.
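
For readers who want to see the paging-plus-filter idea from the issue description in isolation, here is a minimal, self-contained sketch. It is not the code from the linked branch and does not touch Cassandra or Thrift: the row is modelled as an in-memory sorted set of column names, and `WideRowScannerSketch`, `scan`, and `pageSize` are hypothetical names. `SFilter` and `FilterReturn` are redefined locally and only borrow their names and contract from the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class WideRowScannerSketch {
    // Mirrors the two return values used in the quoted snippet (local stand-in).
    public enum FilterReturn { FILTER_MORE, FILTER_DONE }

    // Local stand-in for the filter contract: inspect one column, optionally
    // add it to the result list, and say whether scanning should continue.
    public interface SFilter {
        FilterReturn filter(String col, List<String> filtered);
    }

    // Pages through the sorted column names of one row, starting at `start`,
    // and hands each column to the filter until it returns FILTER_DONE or the
    // row is exhausted. `pageSize` is an assumption of this sketch.
    public static List<String> scan(NavigableSet<String> row, String start,
                                    int pageSize, SFilter filter) {
        List<String> filtered = new ArrayList<>();
        String cursor = start;
        boolean inclusive = true; // only the first page includes the start column
        while (true) {
            // Read at most one page of columns at or after the cursor.
            List<String> page = new ArrayList<>();
            for (String name : row.tailSet(cursor, inclusive)) {
                if (page.size() == pageSize) {
                    break;
                }
                page.add(name);
            }
            if (page.isEmpty()) {
                return filtered; // end of row
            }
            for (String name : page) {
                if (filter.filter(name, filtered) == FilterReturn.FILTER_DONE) {
                    return filtered; // filter asked to stop early
                }
            }
            // The next page starts strictly after the last column seen.
            cursor = page.get(page.size() - 1);
            inclusive = false;
        }
    }

    public static void main(String[] args) {
        NavigableSet<String> row =
                new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e", "f"));
        // Same logic as the Limit3 filter in the snippet: keep three columns.
        SFilter limit3 = (col, filtered) -> {
            filtered.add(col);
            return filtered.size() < 3 ? FilterReturn.FILTER_MORE
                                       : FilterReturn.FILTER_DONE;
        };
        System.out.println(scan(row, "a", 2, limit3)); // prints [a, b, c]
    }
}
```

The point of the sketch is that the filter, not the caller, decides when to stop, so only the filtered result would need to cross the network if the loop ran on the server.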