Yeah, CatalystScan should give you everything we can possibly push down in
raw form.  Note that this is not compatible across different Spark versions.
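
For anyone following along, the practical difference is in what the scan method receives: a PrunedFilteredScan only sees predicates that can be translated into the simple `sources.Filter` ADT, so a UDF predicate has no representation there and is silently kept for post-scan evaluation, while a CatalystScan receives the raw Catalyst expression trees. A minimal self-contained sketch of that distinction (these are simplified stand-in types, not Spark's actual classes):

```scala
// Simplified stand-ins for Spark's types, for illustration only.
sealed trait Filter                          // like org.apache.spark.sql.sources.Filter
case class EqualTo(attr: String, value: Any) extends Filter

sealed trait Expression                      // like a Catalyst Expression tree
case class AttributeRef(name: String) extends Expression
case class Literal(value: Any) extends Expression
case class EqualsExpr(left: Expression, right: Expression) extends Expression
case class ScalaUDF(name: String, children: Seq[Expression]) extends Expression

// PrunedFilteredScan-style translation: a UDF call has no Filter
// representation, so it is dropped from the pushdown set entirely.
def toSourceFilter(e: Expression): Option[Filter] = e match {
  case EqualsExpr(AttributeRef(a), Literal(v)) => Some(EqualTo(a, v))
  case _                                       => None  // e.g. any ScalaUDF
}

val plainPredicate = EqualsExpr(AttributeRef("id"), Literal(42))
val udfPredicate   = ScalaUDF("myUdf", Seq(AttributeRef("body")))

val pushed  = toSourceFilter(plainPredicate) // Some(EqualTo(id,42)) – pushed down
val dropped = toSourceFilter(udfPredicate)   // None – invisible to buildScan
// A CatalystScan would instead receive both Expression trees verbatim,
// which is what lets you build an Elasticsearch query from the UDF condition.
```

The trade-off, as noted above, is that the expression trees are internal API, so a CatalystScan-based source is tied to the Spark version it was built against.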

On Thu, Nov 19, 2015 at 8:55 AM, james.gre...@baesystems.com <
james.gre...@baesystems.com> wrote:

> Thanks Hao
>
>
>
> I have written a new Data Source based on ParquetRelation, and I have just
> retested what I said about not getting anything extra when I change it
> over to a CatalystScan instead of a PrunedFilteredScan, and oops, it seems
> to work fine.
>
> *From:* Cheng, Hao [mailto:hao.ch...@intel.com]
> *Sent:* 19 November 2015 15:30
> *To:* Green, James (UK Guildford); dev@spark.apache.org
> *Subject:* RE: new datasource
>
>
>
> I think you probably need to write some code, as you need to support ES;
> there are two options, per my understanding:
>
>
>
> Create a new Data Source from scratch, but you probably need to overwrite
> the interface at:
>
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L751
>
>
>
> Or you can reuse most of code in ParquetRelation in the new DataSource,
> but also need to modify your own logic, see
>
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala#L285
>
>
>
> Hope it is helpful.
>
>
>
> Hao
>
> *From:* james.gre...@baesystems.com [mailto:james.gre...@baesystems.com
> <james.gre...@baesystems.com>]
> *Sent:* Thursday, November 19, 2015 11:14 PM
> *To:* dev@spark.apache.org
> *Subject:* new datasource
>
> We have written a new Spark DataSource that uses both Parquet and
> ElasticSearch; it is based on the existing Parquet DataSource.  When I look
> at the filters being pushed down to buildScan I don't get anything
> representing filters based on UDFs, or on fields generated by an explode.
> I had thought that if I made it a CatalystScan I would get everything I
> needed.
>
>
>
> This is fine from the Parquet point of view – but we are using ElasticSearch 
> to index/filter the data we are searching and I need to be able to capture 
> the UDF conditions – or have access to the Plan AST in order that I can 
> construct a query for ElasticSearch.
>
>
>
> I am thinking I might just need to patch Spark to do this, but I'd prefer
> not to if there is a way of getting round this without hacking the core
> code.  Any ideas?
>
>
>
> Thanks
>
>
>
> James
>
>
>
> Please consider the environment before printing this email. This message
> should be regarded as confidential. If you have received this email in
> error please notify the sender and destroy it immediately. Statements of
> intent shall only become binding when confirmed in hard copy by an
> authorised signatory. The contents of this email may relate to dealings
> with other companies under the control of BAE Systems Applied Intelligence
> Limited, details of which can be found at
> http://www.baesystems.com/Businesses/index.htm.
>
