Re: Spark -- Writing to Partitioned Persistent Table

2015-10-30 Thread Bryan Jeffrey
create the table in two separate metastores and simply use the same storage location for the data. That seems very hacky though, and likely to result in maintenance issues. Regards, Bryan Jeffrey
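For reference, the "same storage location" idea could be sketched as a Hive external table whose data lives at a fixed path, so a second metastore can register the identical definition over the same files. This is a rough illustration only; the schema, table name, and path below are assumptions, not taken from the thread:

    // Hypothetical sketch: register an external table over a shared path.
    // Running the same DDL against each metastore makes both see one dataset.
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    sqlContext.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS windows_event (
        |  machinename STRING,
        |  message STRING
        |)
        |PARTITIONED BY (windows_event_time_bin STRING)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/windows_event'""".stripMargin)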

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-29 Thread Deenar Toraskar
For this issue in particular (ERROR XSDB6: Another instance of Derby may have already booted the database /spark/spark-1.4.1/metastore_db) -- I

RE: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan
storage location for the data. That seems very hacky though, and likely to result in maintenance issues. Regards, Bryan Jeffrey

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Yana Kadiyska
For this issue in particular (ERROR XSDB6: Another instance of Derby may have already booted the database /spark/spark-1.4.1/metastore_db) -- I think it depends on where you start your application and the HiveThriftServer from. I've run into a similar issue running a driver app first, which would crea
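The usual way around the embedded-Derby lock (which allows only one process at a time) is to point both the driver application and the Thrift server at a shared metastore database via hive-site.xml. A rough sketch, with host, port, and database name as placeholder assumptions:

    <!-- Hypothetical hive-site.xml fragment (values are illustrative).
         Both processes read this file, so neither boots embedded Derby. -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://metastore-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>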

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Jerry Lam
or the note. It sounds like you were able to get further than I have been - any insight? Is it just a Spark 1.4.1 vs Spark 1.5 difference? Regards, Bryan Jeffrey

RE: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan
Zhang" ; "user" Subject: Re: Spark -- Writing to Partitioned Persistent Table Hi Bryan, Did you read the email I sent few days ago. There are more issues with partitionBy down the road: https://www.mail-archive.com/user@spark.apache.org/msg39512.html Best Regards, Je

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Jerry Lam
Hi Bryan, Did you read the email I sent a few days ago? There are more issues with partitionBy down the road: https://www.mail-archive.com/user@spark.apache.org/msg39512.html Best Regards, Jerry

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
The second issue I'm seeing is an OOM when writing partitioned data. I am running Spark 1.4.1, Scala 2.11, Hadoop 2.6.1, and using the Hive libraries packaged with Spark. Spark was compiled using the following: mvn -Dhadoop.version=2.6.1 -Dscala-2.11 -DskipTests -Pyarn -Phive -Phive-thriftserver
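Not from the thread, but one workaround sometimes used when partitionBy runs out of memory on many distinct partition values is to write one partition value at a time, so each write keeps only a single partition writer open. A sketch under that assumption (column and table names are illustrative):

    // Hypothetical workaround sketch: split the write by partition value to
    // bound memory. Assumes SaveMode is imported and eventsDataFrame exists.
    val bins = eventsDataFrame.select("windows_event_time_bin")
      .distinct().collect().map(_.getString(0))
    bins.foreach { bin =>
      eventsDataFrame
        .filter(eventsDataFrame("windows_event_time_bin") === bin)
        .write.mode(SaveMode.Append)
        .partitionBy("windows_event_time_bin")
        .saveAsTable("windows_event")
    }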

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
All, One issue I'm seeing is that I start the thrift server (for jdbc access) via the following: /spark/spark-1.4.1/sbin/start-thriftserver.sh --master spark://master:7077 --hiveconf "spark.cores.max=2" After about 40 seconds the Thrift server is started and available on the default port 10000. I th
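Once the server is up, a quick way to verify it is to connect over JDBC. A minimal sketch, assuming the Hive JDBC driver is on the classpath and the default port of 10000:

    // Hypothetical smoke test for the Thrift server over JDBC.
    import java.sql.DriverManager
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://master:10000/default", "", "")
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("SHOW TABLES")
    while (rs.next()) println(rs.getString(1))
    conn.close()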

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
Susan, I did give that a shot -- I'm seeing a number of oddities: (1) 'Partition By' appears to accept only alphanumeric lower-case fields. It will work for 'machinename', but not 'machineName' or 'machine_name'. (2) When partitioning with maps included in the data I get odd string conversion issues
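If the column-name restriction is what's biting, one hedged workaround is to rename the partition column to all lower-case alphanumerics before writing; the names below are illustrative, not from the thread:

    // Hypothetical rename before partitioning (Spark 1.4 DataFrame API).
    val renamed = eventsDataFrame.withColumnRenamed("machineName", "machinename")
    renamed.write.mode(SaveMode.Append)
      .partitionBy("machinename")
      .saveAsTable("windows_event")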

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Susan Zhang
Have you tried partitionBy? Something like:

    hiveWindowsEvents.foreachRDD { rdd =>
      val eventsDataFrame = rdd.toDF()
      eventsDataFrame.write
        .mode(SaveMode.Append)
        .partitionBy("windows_event_time_bin")
        .saveAsTable("windows_event")
    }
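For context, a self-contained version of that pattern might look like the following; the case class, input source, and field names are assumptions for illustration, not from the thread (Spark 1.4-era APIs):

    // Minimal runnable sketch of streaming writes to a partitioned Hive table.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Case class defined at top level so toDF() can derive the schema.
    case class WindowsEvent(machinename: String, message: String,
                            windows_event_time_bin: String)

    object PartitionedWriteExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PartitionedWriteExample"))
        val ssc = new StreamingContext(sc, Seconds(10))
        val sqlContext = new HiveContext(sc)
        import sqlContext.implicits._

        // Illustrative input: comma-separated events from a socket.
        val hiveWindowsEvents = ssc.socketTextStream("localhost", 9999)
          .map(_.split(","))
          .map(a => WindowsEvent(a(0), a(1), a(2)))

        hiveWindowsEvents.foreachRDD { rdd =>
          val eventsDataFrame = rdd.toDF()
          eventsDataFrame.write
            .mode(SaveMode.Append)
            .partitionBy("windows_event_time_bin")
            .saveAsTable("windows_event")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }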