Re: Need advice on hooking into Sql query plan

2015-11-05 Thread Yana Kadiyska
15 at 5:50 PM, Jörn Franke wrote: > Would it be possible to use views to address some of your requirements? > > Alternatively it might be better to parse it yourself. There are open > source libraries for it, if you need really a complete sql parser. Do you > want to do it on su

Need advice on hooking into Sql query plan

2015-11-05 Thread Yana Kadiyska
Hi folks, not sure if this belongs on the dev or user list; sending to dev as it seems a bit convoluted. I have a UI in which we allow users to write ad-hoc queries against a (very large, partitioned) table. I would like to analyze the queries prior to execution for two purposes: 1. Reject under-cons

Re: Apache gives exception when running groupby on df temp table

2015-07-17 Thread Yana Kadiyska
I think that might be a connector issue. You say you are using Spark 1.4; are you also using the 1.4 version of the Spark-cassandra-connector? They do have some bugs around this, e.g. https://datastax-oss.atlassian.net/browse/SPARKC-195. Also, I see that you import org.apache.spark.sql.cassandra.Cassand

Re: Problem with version compatibility

2015-06-25 Thread Yana Kadiyska
Jim, I do something similar to you. I mark all dependencies as provided and then make sure to drop the same version of spark-assembly in my war as I have on the executors. I don't remember if dropping in server/lib works, I think I ran into an issue with that. Would love to know "best practices" wh

[SparkSQL] HiveContext multithreading bug?

2015-05-18 Thread Yana Kadiyska
Hi folks, wanted to get a sanity check before opening a JIRA. I am trying to do the following: create a HiveContext, then from different threads: 1. Create a DataFrame 2. Name said df via registerTempTable 3. do a simple query via sql and dropTempTable My understanding is that since HiveContext

Re: K-Means And Class Tags

2015-01-08 Thread Yana Kadiyska
How about data.map(s => s.split(",")).filter(_.length > 1).map(good_entry => Vectors.dense(good_entry(0).toDouble, good_entry(1).toDouble)) (full disclosure, I didn't actually run this). But after the first map you should have an RDD[Array[String]], then you'd discard everything
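For readers without a Spark shell handy, the split, filter, and convert steps in this reply can be sketched in plain Python. This is only an analogue of the suggestion, not the poster's code: the function name `parse_points` and the sample rows are made up for illustration, and a list of tuples stands in for the RDD of vectors.

```python
def parse_points(lines):
    """Split each line on commas, drop rows with fewer than two fields,
    and convert the first two fields of each remaining row to floats."""
    # Step 1: one list of string fields per input line.
    rows = (line.split(",") for line in lines)
    # Step 2: discard entries that cannot supply two coordinates.
    good = (fields for fields in rows if len(fields) > 1)
    # Step 3: build a numeric pair from the first two fields.
    return [(float(fields[0]), float(fields[1])) for fields in good]


# Hypothetical sample input: one good row, one malformed row, one row with an extra field.
sample = ["1.0,2.0", "malformed", "3.5,4.5,extra"]
print(parse_points(sample))
```

Rows that split into fewer than two fields are dropped before any numeric conversion, so a malformed line never reaches `float`. A row whose first two fields are not numeric would still raise `ValueError`.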

Re: Nabble mailing list mirror errors: "This post has NOT been accepted by the mailing list yet"

2014-12-13 Thread Yana Kadiyska
Since you mentioned this, I had a related quandary recently -- it also says that the forum archives "*u...@spark.incubator.apache.org "/* *d...@spark.incubator.apache.org *respectively, yet the "Community page" clearly says to email the @spark.apache.org list (but the Nabble archive is linked right

Re: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe

2014-12-03 Thread Yana Kadiyska
ive.serde.ParquetHiveSerDe will be included as > before. > > Michael > > On Tue, Dec 2, 2014 at 9:31 AM, Yana Kadiyska > wrote: > >> Apologies if people get this more than once -- I sent mail to dev@spark >> last night and don't see it in the archives. Trying th

Fwd: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe

2014-12-02 Thread Yana Kadiyska
Apologies if people get this more than once -- I sent mail to dev@spark last night and don't see it in the archives. Trying the incubator list now...wanted to make sure it doesn't get lost in case it's a bug... -- Forwarded message -- From: Yana Kadiyska Date: Mon,

Trouble running tests

2014-10-09 Thread Yana
Hi, apologies if I missed a FAQ somewhere. I am trying to submit a bug fix for the very first time. Reading instructions, I forked the git repo (at c9ae79fba25cd49ca70ca398bc75434202d26a97) and am trying to run tests. I run this: ./dev/run-tests _SQL_TESTS_ONLY=true and after a while get the fo