Hi,

For each line that we read as textLine from HDFS, we have a schema. It would
be helpful if there were an API that took the schema as a List[Symbol] and
mapped each token in the line to the corresponding Symbol.
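To make the request concrete, here is a minimal plain-Scala sketch of such a mapping. It assumes each line is a delimited string (comma by default) whose fields appear in the same order as the schema. `SchemaView` and `toRecord` are hypothetical names for illustration, not an existing Spark API.

```scala
// Hypothetical helper (not part of Spark): pairs each delimited token in a
// text line with the schema field at the same position.
object SchemaView {
  def toRecord(schema: List[Symbol], line: String, sep: String = ","): Map[Symbol, String] = {
    // Pattern.quote treats the separator literally; limit -1 keeps trailing
    // empty fields so token positions stay aligned with the schema.
    val tokens = line.split(java.util.regex.Pattern.quote(sep), -1).toList
    require(tokens.length == schema.length,
      s"expected ${schema.length} fields but found ${tokens.length}")
    schema.zip(tokens).toMap
  }
}
```

With an RDD this could be applied per line, e.g. `sc.textFile(path).map(line => SchemaView.toRecord(schema, line))`, where `schema` and `path` are placeholders. Returning a `Map` keeps the sketch generic; a case class per schema would give compile-time field access instead.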

One solution is to keep the data on HDFS as Avro/Protobuf serialized objects,
but I am not sure whether that works for HBase input. We are testing with HDFS
right now, but eventually we will read from a persistent store like HBase. So
the immutableBytes would also need to be converted to a schema view, in case
we don't want to write the whole row as a protobuf.

Do RDDs provide a schema view of a dataset on HDFS/HBase?

Thanks.
Deb
