When to expect UTF8String?

2015-06-11 Thread zsampson
I'm hoping for some clarity about when to expect String vs UTF8String when using the Java DataFrames API. In upgrading to Spark 1.4, I'm dealing with a lot of errors where what was once a String is now a UTF8String. The comments in the file and the related commit message indicate that maybe it sho

DataFrame.withColumn very slow when used iteratively?

2015-06-02 Thread zsampson
Hey, I'm seeing extreme slowness in withColumn when it's used in a loop. I'm running this code: for (int i = 0; i < NUM_ITERATIONS; ++i) { df = df.withColumn("col"+i, new Column(new Literal(i, DataTypes.IntegerType))); } where df is initially a trivial dataframe. Here are the results of runni