Hi,

Each line that we read from HDFS as a text line has a schema. An API that takes the schema as a List[Symbol] and maps each token to its Symbol would be helpful.
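
To make the request concrete, here is a rough sketch of the kind of helper I have in mind. It is plain Scala, not an existing Spark API: `SchemaView`, `toRecord`, the field names, and the comma delimiter are all made up for illustration.

```scala
// Hypothetical helper (not part of Spark): pair each token of a
// delimited text line with the corresponding schema Symbol.
object SchemaView {
  // Assumed example schema; the real one would come from the application.
  val schema: List[Symbol] = List(Symbol("id"), Symbol("name"), Symbol("score"))

  // Split the line on `sep` (keeping trailing empty fields) and zip the
  // tokens with the schema. Returns None when the number of tokens does
  // not match the schema length, so malformed lines can be dropped.
  def toRecord(schema: List[Symbol],
               line: String,
               sep: Char = ','): Option[Map[Symbol, String]] = {
    // Quote the separator so characters like '|' are not treated as regex.
    val tokens = line.split(java.util.regex.Pattern.quote(sep.toString), -1).toList
    if (tokens.length == schema.length) Some(schema.zip(tokens).toMap)
    else None
  }
}
```

Assuming `lines` is an RDD[String] (for example from `sc.textFile`), something like `lines.flatMap(l => SchemaView.toRecord(SchemaView.schema, l))` should then give an RDD of Map[Symbol, String], with lines that do not match the schema skipped. All values stay as strings here; typed fields would need an extra conversion step.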

One solution is to keep the data on HDFS as Avro/protobuf serialized objects, but I am not sure whether that works with HBase input. We are testing with HDFS right now, but eventually we will read from a persistent store like HBase. So the immutableBytes would also need to be converted to a schema view, in case we don't want to write the whole row as a protobuf.

Do RDDs provide a schema view of a dataset on HDFS / HBase?

Thanks.
Deb
