Thanks Jeremy,

It makes sense to abstract out CFOF and CFRW (right now they are tightly bound to Avro), so that one can plug in a custom serializer (Avro, Thrift, and going forward I guess maybe CQL). I will create a JIRA and submit a patch with the needed changes. I will certainly ping you if I need help.

With regards,
Mayank


On 28-02-2011 23:07, Jeremy Hanna wrote:
One thing that could be done is to abstract CFRW further so that it's easier to 
extend, with only the serialization mechanism required in the extension. That 
is, all of the core functionality relating to Cassandra would live in an 
abstract class or something like that. Then the Avro-based one could extend 
that with things specific to Avro. That way people could write their own CFRW 
extension with whatever serialization they chose. Anyway, that seems 
reasonable, but it would take some work - if you'd like to look at that, I 
could help as I have time.
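The split described above could look roughly like the following. This is a hedged sketch only - the class names (`AbstractColumnFamilyRecordWriter`, `Utf8RecordWriter`), the in-memory `sink`, and the method signatures are all illustrative assumptions, not the actual Cassandra source; a real version would keep the Thrift client plumbing and mutation batching in the base class.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: all Cassandra-facing plumbing lives here,
// so subclasses only have to supply a serialization mechanism.
abstract class AbstractColumnFamilyRecordWriter<V> {
    // Stand-in for the client/batching machinery the base class would own.
    protected final List<byte[]> sink = new ArrayList<>();

    // The single extension point: how a value becomes bytes.
    protected abstract byte[] serialize(V value) throws IOException;

    // Shared core write path, identical for every serialization flavor.
    public void write(V value) throws IOException {
        sink.add(serialize(value)); // real code would queue a mutation instead
    }
}

// An Avro-based writer would override serialize() with Avro encoding;
// a plain UTF-8 encoder stands in here to keep the sketch self-contained.
class Utf8RecordWriter extends AbstractColumnFamilyRecordWriter<String> {
    @Override
    protected byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }
}

public class Sketch {
    public static void main(String[] args) throws IOException {
        Utf8RecordWriter w = new Utf8RecordWriter();
        w.write("row1");
        System.out.println(w.sink.size()); // prints 1
    }
}
```

Under this shape, a Thrift-based writer is just another small subclass overriding `serialize()`, without touching the shared Cassandra logic.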

On Feb 28, 2011, at 10:19 AM, Jeremy Hanna wrote:

There certainly could be a Thrift-based record writer. However, (if I remember 
correctly) to enable Hadoop output streaming, it was easier to go with Avro for 
the records since the schema is included. There could also have been a Thrift 
version of the record writer, but it's simpler to just have one record writer. 
That was the decision process, at least.

If there is a compelling reason or a lot of demand for a Thrift-based one, 
maybe it could be revisited - though I'm not the one making that decision.

On Feb 28, 2011, at 4:10 AM, Mayank Mishra wrote:

Hi all,

As I was integrating Hadoop with Cassandra, I wanted to serialize mutations, 
so I used Thrift mutations in my M/R jobs.

During the course of this, I came to know that CFRW accepts only Avro 
mutations. Can someone please explain why only the Avro transport is supported 
by CFRW? Why aren't both Thrift and Avro mutations accepted?

Please let me know if I missed some important point.

With regards,
Mayank
