Re: class after join

2014-07-17 Thread Michael Armbrust
If you intern the string it will be more efficient, but still significantly more expensive than the class-based approach. ** VERY EXPERIMENTAL ** We are working with EPFL on a lightweight syntax for naming the results of Spark transformations in Scala (and are going to make it interoperate with SQ
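
For illustration, a minimal sketch of what interning the field names could look like, assuming rows are kept as Map[String, String]. The object name InternKeys and the helper row are hypothetical and not from the thread; only String.intern() is a standard JVM call. Interning makes equal field names share one String instance, so the names are not stored once per row. The map overhead itself stays, which fits the remark that this is still more expensive than a class.

```scala
object InternKeys {
  // Two field names built at runtime: equal content, but distinct objects.
  val a = new String("V1")
  val b = new String("V1")

  // Reference comparison (eq): the raw strings are different instances ...
  val distinctBefore: Boolean = !(a eq b)
  // ... while their interned forms are the same instance.
  val sharedAfter: Boolean = a.intern() eq b.intern()

  // Hypothetical helper: build one "row" whose keys are interned,
  // so repeated field names across rows point at a single String.
  def row(fields: Seq[(String, String)]): Map[String, String] =
    fields.map { case (k, v) => k.intern() -> v }.toMap
}
```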

Re: class after join

2014-07-17 Thread Luis Guerra
Thank you for your fast reply. We are considering this Map[String, String] solution, but there are some details that we do not control yet. What would happen if we have different data types for different fields? Also, with this solution, we have to repeat the field names for every "row" that we ha

Re: class after join

2014-07-17 Thread Sean Owen
If what you have is a large number of named strings, why not use a Map[String,String] to represent them? If you're approaching a class with >22 String fields anyway, it probably makes more sense. You lose a bit of compile-time checking, but gain flexibility. Also, merging two Maps to make a new on
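
A small sketch of the Map-based representation suggested here, in plain Scala without Spark. The field names and values are made up for illustration, and MergeRows is a hypothetical name. It shows that merging two rows is a single `++`, and that field access is checked at run time rather than at compile time.

```scala
object MergeRows {
  // Hypothetical rows: field name -> value, as they might look on each side of a join.
  val left: Map[String, String]  = Map("id" -> "42", "V1" -> "a", "V2" -> "b")
  val right: Map[String, String] = Map("id" -> "42", "W1" -> "x")

  // `++` builds a new Map; when a key appears on both sides, the right-hand value wins.
  val merged: Map[String, String] = left ++ right

  // A misspelled field name is not a compile error; it only shows up as None at run time.
  val missing: Option[String] = merged.get("V9")
}
```

Using `get` or `getOrElse` instead of `merged("V9")` avoids a NoSuchElementException when a field is absent.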

class after join

2014-07-17 Thread Luis Guerra
Hi all, I am a newbie Spark user with many doubts, so sorry if this is a "silly" question. I am dealing with tabular data formatted as text files, so when I first load the data, my code is like this: case class data_class( V1: String, V2: String, V3: String, V4: String, V5: String