Re: Possible to push sub-queries down into the DataSource impl?

2016-08-01 Thread Timothy Potter
Yes, that's exactly what I was looking for, thanks for the pointer ;-) On Thu, Jul 28, 2016 at 1:07 AM, Takeshi Yamamuro wrote: > Hi, > > Have you seen this ticket? > https://issues.apache.org/jira/browse/SPARK-12449 > > // maropu > > On Thu, Jul 28, 2016 at 2:1...

Re: Possible to push sub-queries down into the DataSource impl?

2016-07-27 Thread Timothy Potter
...cache it, if multiple queries on the same inner queries are requested. > On Wednesday, July 27, 2016, Timothy Potter wrote: >> Take this simple join: >> SELECT m.title as title, solr.aggCount as aggCount FROM movies m INNER >> JOI...

Possible to push sub-queries down into the DataSource impl?

2016-07-27 Thread Timothy Potter
Take this simple join: SELECT m.title as title, solr.aggCount as aggCount FROM movies m INNER JOIN (SELECT movie_id, COUNT(*) as aggCount FROM ratings WHERE rating >= 4 GROUP BY movie_id ORDER BY aggCount desc LIMIT 10) as solr ON solr.movie_id = m.movie_id ORDER BY aggCount DESC. I would like the...
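
For context, a minimal Scala sketch of how the join above might be issued against a Solr-backed table (Spark 1.x API; the "solr" format and the zkhost/collection options follow the spark-solr usage shown elsewhere in this list, and the movies path and collection name are illustrative):

    // Without sub-query pushdown (the subject of SPARK-12449, linked above),
    // the inner aggregation is computed by Spark after scanning the source.
    import org.apache.spark.sql.SQLContext

    def topRatedJoin(sqlContext: SQLContext): Unit = {
      val ratings = sqlContext.read.format("solr")
        .option("zkhost", "localhost:2181")
        .option("collection", "ratings")
        .load()
      ratings.registerTempTable("ratings")

      sqlContext.read.parquet("/path/to/movies").registerTempTable("movies")

      sqlContext.sql(
        """SELECT m.title AS title, solr.aggCount AS aggCount
          |FROM movies m
          |INNER JOIN (
          |  SELECT movie_id, COUNT(*) AS aggCount
          |  FROM ratings
          |  WHERE rating >= 4
          |  GROUP BY movie_id
          |  ORDER BY aggCount DESC
          |  LIMIT 10
          |) AS solr ON solr.movie_id = m.movie_id
          |ORDER BY aggCount DESC""".stripMargin).show()
    }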

How to do some pre-processing of the SQL in the Thrift server?

2016-06-21 Thread Timothy Potter
I'm using the Spark Thrift server to execute SQL queries over JDBC. I'm wondering if it's possible to plug in a class to do some pre-processing on the SQL statement before it gets passed to the SQLContext for actual execution? I scanned over the code and it doesn't look like this is supported, but I...
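
There is no documented Thrift-server hook for this (as the post notes, the code doesn't appear to support it), but a rough Scala sketch of the kind of pre-processing meant, written as a thin wrapper around SQLContext.sql rather than a real plugin, could look like this; the rewrite rule and hint syntax below are made up for illustration:

    import org.apache.spark.sql.{DataFrame, SQLContext}

    // Rewrites each statement before handing it to the SQLContext.
    class PreprocessingSql(sqlContext: SQLContext, rewrite: String => String) {
      def sql(statement: String): DataFrame = sqlContext.sql(rewrite(statement))
    }

    // Hypothetical usage: strip a custom comment directive before execution.
    // val runner = new PreprocessingSql(sqlContext,
    //   _.replaceAll("""(?s)/\*myhint:.*?\*/""", ""))
    // runner.sql("/*myhint: routeTo=solr*/ SELECT count(*) FROM logs")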

Re: Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-29 Thread Timothy Potter
FWIW - I synchronized access to the transformer and the problem went away, so this looks like some type of concurrent-access issue when dealing with UDFs. On Tue, Mar 29, 2016 at 9:19 AM, Timothy Potter wrote: > It's a local Spark master, no cluster. I'm not sure what you mean > a...
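
A sketch of that workaround in Scala, guarding the shared model with a lock so only one thread calls transform() at a time (the class and method names here are placeholders, not the original code):

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.sql.DataFrame

    class SynchronizedScorer(model: PipelineModel) {
      private val lock = new Object

      // Serializing access to transform() made the intermittent
      // HashingTF/UDF failures described in this thread go away.
      def score(input: DataFrame): DataFrame = lock.synchronized {
        model.transform(input)
      }
    }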

Re: Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-29 Thread Timothy Potter
...s://twitter.com/jaceklaskowski > On Mon, Mar 28, 2016 at 7:11 PM, Timothy Potter wrote: >> I'm seeing the following error when trying to generate a prediction >> from a very simple ML pipeline based model. I've verified that the raw >> data sent to the token...

Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-28 Thread Timothy Potter
I'm seeing the following error when trying to generate a prediction from a very simple ML-pipeline-based model. I've verified that the raw data sent to the tokenizer is valid (not null). It seems like this is some sort of weird classpath or class-loading issue. Any help you can provide in tryi...
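
For reference, a minimal sketch of the kind of pipeline being described (Spark 1.6 ML API, assuming a sqlContext as in spark-shell; the data and column names are illustrative, not the original code):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

    val training = sqlContext.createDataFrame(Seq(
      (0L, "spark rocks", 1.0),
      (1L, "flink flows", 0.0)
    )).toDF("id", "text", "label")

    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)

    val model = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)

    // Scoring new text is the step where the HashingTF error surfaced.
    val test = sqlContext.createDataFrame(Seq(
      (2L, "spark ml pipeline")
    )).toDF("id", "text")
    model.transform(test).select("id", "prediction").show()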

Selected field not getting pushed down into my DataSource?

2015-09-17 Thread Timothy Potter
I'm using Spark 1.4.1 and am doing the following with spark-shell: solr = sqlContext.read.format("solr").option("zkhost", "localhost:2181").option("collection","spark").load(); solr.select("id").count() The Solr DataSource implements PrunedFilteredScan, so I expected the buildScan method to get ca...
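
For reference, the data-source contract in question (org.apache.spark.sql.sources in Spark 1.4): a relation mixing in PrunedFilteredScan gets the pruned column list passed to buildScan, so solr.select("id").count() was expected to arrive with requiredColumns containing just "id". A stubbed Scala sketch, with the Solr-specific parts omitted and the class name a placeholder:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    class ExampleSolrRelation(override val sqlContext: SQLContext)
        extends BaseRelation with PrunedFilteredScan {

      override def schema: StructType = StructType(Seq(
        StructField("id", StringType),
        StructField("text", StringType)))

      override def buildScan(requiredColumns: Array[String],
                             filters: Array[Filter]): RDD[Row] = {
        // A real implementation would push requiredColumns (the field list)
        // and any translatable filters into the Solr query; stubbed here.
        sqlContext.sparkContext.emptyRDD[Row]
      }
    }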

Re: Task deserialization problem using 1.1.0 for Hadoop 2.4

2014-10-01 Thread Timothy Potter
Forgot to mention that I've tested that SerIntWritable and PipelineDocumentWritable are serializable by serializing/deserializing to/from a byte array in memory. On Wed, Oct 1, 2014 at 1:43 PM, Timothy Potter wrote: > I'm running into the following deserialization issue when tryi...
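
A sketch of that in-memory round-trip check, using plain Java serialization (written generically here, since the writable classes themselves aren't shown in the thread):

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

    def roundTrip[T <: java.io.Serializable](value: T): T = {
      val buffer = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(buffer)
      out.writeObject(value)
      out.close()
      val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
      in.readObject().asInstanceOf[T]
    }

    // e.g. roundTrip(someWritable) -- the value passed in would be whatever
    // instance of SerIntWritable / PipelineDocumentWritable is under test.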

Task deserialization problem using 1.1.0 for Hadoop 2.4

2014-10-01 Thread Timothy Potter
I'm running into the following deserialization issue when trying to run a very simple Java-based application using a local Master (see stack trace below). My code basically queries Solr using a custom Hadoop InputFormat. I've hacked my code to make sure the objects involved (PipelineDocumentWritab...
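
A rough Scala sketch (the thread's application is Java, but Scala is used here for brevity) of the setup being described: a local master and an RDD built from a custom Hadoop InputFormat that queries Solr. The InputFormat and writable class names below echo the thread, but their definitions aren't shown there, so they're left commented out as placeholders:

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("solr-input-format").setMaster("local[2]"))

    val hadoopConf = new Configuration()
    hadoopConf.set("solr.query", "*:*")  // hypothetical configuration key

    // Task deserialization errors like the one in this thread usually mean the
    // key/value classes, or something captured by a closure over this RDD,
    // can't be deserialized on the task side.
    // val docs = sc.newAPIHadoopRDD(hadoopConf,
    //   classOf[PipelineDocumentInputFormat],  // placeholder InputFormat
    //   classOf[SerIntWritable],
    //   classOf[PipelineDocumentWritable])
    // println(docs.count())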