Hi, I can easily do this in the shell, but I wanted to see what I can do in Spark.
I am trying to create a simple table (10 rows, 2 columns) for now, then register it as a tempTable and store it in Hive, if that is feasible. The first column, col1, is a monotonically incrementing integer, and the second column is a string of 10 random characters.

I use a UDF to create a random string of a given length (charlength):

import scala.util.Random

// return a string of charlength characters drawn at random from chars
def random_string(chars: String, charlength: Int): String = {
  val newKey = (1 to charlength).map( x => {
    val index = Random.nextInt(chars.length)
    chars(index)
  }).mkString("")
  newKey
}
spark.udf.register("random_string", random_string(_: String, _: Int))

// create class
case class columns(col1: Int, col2: String)

val chars = ('a' to 'z') ++ ('A' to 'Z')
var text = "Array("
val comma = ","
val terminator = "))"
for (i <- 1 to 10) {
  var random_char = random_string(chars.mkString(""), 10)
  if (i < 10) { text = text + """(""" + i.toString + ""","""" + random_char + """")""" + comma }
  else { text = text + """(""" + i.toString + ""","""" + random_char + """")))""" }
}
println(text)
val df = sc.parallelize(text)
val df = sc.parallelize(text).map(p => columns(p._1.toString.toInt, p._2.toString)).toDF

When I run it I get this:

Loading dynamic_ARRAY_generator.scala...
import scala.util.Random
random_string: (chars: String, charlength: Int)String
res0: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, IntegerType)))
defined class columns
chars: scala.collection.immutable.IndexedSeq[Char] = Vector(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z)
text: String = Array(
comma: String = ,
terminator: String = ))
Array((1,"yyzbPpXEoX"),(2,"bEnzvFCdXm"),(3,"dKXZbgaGTr"),(4,"hIHGkiWjcy"),(5,"HBnJmYlefk"),(6,"MKqfwWCmah"),(7,"CrKYmsbXKI"),(8,"iySnzSKtuH"),(9,"BbCRKqtkml"),(10,"nYdxrDneUm")))
df: org.apache.spark.rdd.RDD[Char] = ParallelCollectionRDD[0] at parallelize at <console>:27
<console>:29: error: value _1 is not a member of Char
       val df = sc.parallelize(text).map(p => columns(p._1.toString.toInt, p._2.toString)).toDF
                                                        ^
<console>:29: error: value _2 is not a member of Char
       val df = sc.parallelize(text).map(p => columns(p._1.toString.toInt, p._2.toString)).toDF
                                                                            ^
<console>:29: error: value toDF is not a member of org.apache.spark.rdd.RDD[U]
       val df = sc.parallelize(text).map(p => columns(p._1.toString.toInt, p._2.toString)).toDF
                                                                                            ^

This works:

val df = sc.parallelize(text)

But this fails:

val df = sc.parallelize(text).map(p => columns(p._1.toString.toInt, p._2.toString)).toDF

I gather it sees it as RDD[Char]!
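For what it's worth, here is a minimal sketch of a route that sidesteps the problem. A String is a collection of characters, so sc.parallelize(text) yields RDD[Char]; the idea is to build a Seq of (Int, String) tuples on the driver and parallelize that instead of the concatenated string. This assumes a Spark 2.x spark-shell session (spark, sc and the toDF implicits already in scope) and reuses random_string and the columns case class defined above; the names charPool, tmp_random and test.randomdata are made up for illustration:

// build the rows as a Seq of (Int, String) tuples rather than one long String;
// parallelizing a String gives RDD[Char], which is what broke toDF above
val charPool = (('a' to 'z') ++ ('A' to 'Z')).mkString("")
val rows = (1 to 10).map(i => (i, random_string(charPool, 10)))

// now p really is a tuple, so p._1 / p._2 resolve and toDF works
val df = sc.parallelize(rows).map(p => columns(p._1, p._2)).toDF

// register as a temp view (createOrReplaceTempView is the Spark 2.x
// replacement for the deprecated registerTempTable)
df.createOrReplaceTempView("tmp_random")

// and, if the session has Hive support, persist it as a Hive table;
// test.randomdata is a placeholder database.table name
df.write.mode("overwrite").saveAsTable("test.randomdata")

Since rows is already a Seq of tuples, rows.toDF("col1", "col2") would also work directly, without the case class or the intermediate RDD.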