Hi, I have a system with files in a variety of non-standard input formats, though they're generally flat text files. I'd like to dynamically create DataFrames of string columns.
What's the best way to go from an RDD<String> to a DataFrame of StringType columns? My current plan is:

- Call map() on the RDD<String> with a function that splits each String into columns and calls RowFactory.create() on the resulting array, producing an RDD<Row>
- Construct a StructType schema from the column names, with every field as StringType
- Call SQLContext.createDataFrame(RDD, schema) to produce the result

(A rough sketch of these steps is below.) Does that make sense? I looked through the spark-csv package a little and noticed that it uses baseRelationToDataFrame(), but BaseRelation looks like it might be a restricted developer API. Does anyone know whether it's recommended for use?

Thanks!
- Everett
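Here's a minimal sketch of the three steps against the Spark 1.x SQLContext API, just to make the plan concrete. The tab delimiter, the input path, and the column names are placeholders I made up for illustration; the real files are in various non-standard formats.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class StringColumnsExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("StringColumnsExample");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);

    // Placeholder input: tab-delimited lines are assumed here only for illustration.
    JavaRDD<String> lines = sc.textFile("hdfs:///path/to/input");

    // Step 1: split each line into fields and wrap them in a Row.
    JavaRDD<Row> rows = lines.map(line -> {
      String[] fields = line.split("\t", -1);
      return RowFactory.create((Object[]) fields);
    });

    // Step 2: build a StructType where every column is a nullable StringType.
    // Column names are made up; in practice they'd come from per-format metadata.
    List<String> columnNames = Arrays.asList("col1", "col2", "col3");
    List<StructField> structFields = new ArrayList<>();
    for (String name : columnNames) {
      structFields.add(DataTypes.createStructField(name, DataTypes.StringType, true));
    }
    StructType schema = DataTypes.createStructType(structFields);

    // Step 3: combine the RDD<Row> and the schema into a DataFrame.
    DataFrame df = sqlContext.createDataFrame(rows, schema);
    df.printSchema();

    sc.stop();
  }
}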