Thanks Jeremy,
It makes sense to abstract out CFOF and CFRW (right now they're tightly
coupled to Avro), so that one can plug in a custom serializer (Avro, Thrift,
and going forward I guess maybe CQL). I will create a JIRA and submit a
patch with the necessary changes. I will certainly ping you if I
need help.
With regards,
Mayank
On 28-02-2011 23:07, Jeremy Hanna wrote:
One thing that could be done is to abstract the CFRW further so that it's
easier to extend, with only the serialization mechanism needing to be implemented.
That is, all of the core functionality relating to Cassandra would live in an
abstract class or something like that. Then the Avro-based one could extend
that with the Avro-specific pieces. That way people could write their own CFRW
extension with whatever serialization they chose. Anyway, that seems
reasonable, but it would take some work - if you'd like to look at that, I could
help as I have time.
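A minimal sketch of the abstraction being proposed, in plain Java - class and method names here are illustrative, not the actual Cassandra CFRW API, and the batching is simplified to keep the example self-contained:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: everything Cassandra-related (here reduced to
// collecting a batch of serialized mutations) lives in the abstract class,
// independent of the wire format.
abstract class AbstractColumnFamilyRecordWriter<M> {
    protected final List<byte[]> batch = new ArrayList<>();

    // Core write path shared by all subclasses.
    public void write(M mutation) {
        batch.add(serialize(mutation));
    }

    public int pendingMutations() {
        return batch.size();
    }

    // The only hook a serialization-specific subclass (Avro, Thrift, ...)
    // would need to implement.
    protected abstract byte[] serialize(M mutation);
}

// Stand-in for an Avro- or Thrift-specific subclass: it just encodes the
// mutation as UTF-8 so the sketch runs without external dependencies.
class StringColumnFamilyRecordWriter
        extends AbstractColumnFamilyRecordWriter<String> {
    @Override
    protected byte[] serialize(String mutation) {
        return mutation.getBytes(StandardCharsets.UTF_8);
    }
}
```

With this shape, a Thrift-based writer would only override serialize(), leaving the Cassandra-facing logic untouched.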
On Feb 28, 2011, at 10:19 AM, Jeremy Hanna wrote:
There certainly could be a Thrift-based record writer. However, if I remember
correctly, it was easier to go with Avro for the records to enable Hadoop output
streaming, since the schema is included. There could also have been a
Thrift version of the record writer, but it's simpler to have just one record
writer. That was the decision process, at least.
If there is a compelling reason or a lot of demand for a Thrift-based one,
maybe it could be revisited - though I'm not the one making that decision.
On Feb 28, 2011, at 4:10 AM, Mayank Mishra wrote:
Hi all,
While integrating Hadoop with Cassandra, I needed to serialize mutations,
so I used Thrift mutations in my M/R jobs.
In the process, I found that CFRW accepts only Avro mutations. Can
someone please explain why only Avro is supported by CFRW? Why
aren't both Thrift and Avro mutations supported?
Please let me know if I've missed some important point.
With regards,
Mayank