Would the RDD resulting from the query below be partitioned on GEO_REGION, GEO_COUNTRY? I ran some tests (using mapPartitions on the resulting RDD), and it seems that 50 partitions are always generated, while there should be around 1000.
"SELECT * FROM spark_poc.<table_name> DISTRIBUTE BY GEO_REGION, GEO_COUNTRY SORT BY IP_ADDRESS, COOKIE_ID"

If not, how can I partition the data based on an attribute or a combination of attributes in the data?
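
To make the second question concrete, here is a minimal sketch of the layout I am after, written in plain Java with no Spark dependency. The class CompositeKeyPartitioner and its methods are hypothetical names made up for this illustration and are not part of any Spark or Hive API. The idea is only that rows sharing the same (GEO_REGION, GEO_COUNTRY) pair should always end up in the same partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only: plain Java, no Spark or Hive classes involved.
final class CompositeKeyPartitioner {

    // Map a (region, country) pair to a partition index in [0, numPartitions).
    static int partitionFor(String region, String country, int numPartitions) {
        int hash = Objects.hash(region, country);
        // floorMod keeps the index non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    // Group rows into numPartitions buckets; row[0] is assumed to be the
    // region and row[1] the country (an assumption of this sketch).
    static List<List<String[]>> partition(List<String[]> rows, int numPartitions) {
        List<List<String[]>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String[] row : rows) {
            buckets.get(partitionFor(row[0], row[1], numPartitions)).add(row);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up sample rows: region, country, IP address.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"EMEA", "FR", "10.0.0.1"});
        rows.add(new String[] {"EMEA", "FR", "10.0.0.2"});
        rows.add(new String[] {"APAC", "JP", "10.0.0.3"});

        List<List<String[]>> buckets = partition(rows, 4);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("partition " + i + ": " + buckets.get(i).size() + " rows");
        }
    }
}
```

In this sketch the number of buckets is whatever the caller passes in, not the number of distinct key pairs, so several (region, country) pairs can share one bucket when there are more distinct pairs than buckets.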