Re: [DISCUSS] Spark Columnar Processing

2019-04-13 Thread Bobby Evans
…generates code for element-wise selection (excluding sort and join). The SIMDization or GPUization capability depends on a compiler that translates the code generated by whole-stage codegen into native code. 3. The current Projection assumes row-oriented data storage, …
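To make the SIMDization point concrete, here is a minimal, hand-written sketch (not code from the thread) of the kind of tight element-wise loop over Spark's public ColumnVector API that a JIT compiler could translate into SIMD instructions; the column ordinal, the x * 2 + 1 projection, and the method name are illustrative assumptions.

```scala
import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

// Hypothetical element-wise projection over one IntegerType column of a batch.
// A branch-light loop over contiguous primitive values like this is what a
// compiler can auto-vectorize; a row-at-a-time Projection over row-oriented
// data generally is not.
def projectTimesTwoPlusOne(batch: ColumnarBatch): Array[Int] = {
  val col: ColumnVector = batch.column(0) // assumes column 0 holds non-null ints
  val n = batch.numRows()
  val out = new Array[Int](n)
  var i = 0
  while (i < n) {
    out(i) = col.getInt(i) * 2 + 1        // simple element-wise body
    i += 1
  }
  out
}
```

Whether the JVM actually emits SIMD for such a loop depends on the JIT and the hardware, which is the dependency the quoted message is pointing at.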

Re: [DISCUSS] Spark Columnar Processing

2019-04-11 Thread Reynold Xin
…We split it this way because we thought it would be simplest to implement, and because it would provide a benefit to more than just GPU-accelerated queries. …

Re: [DISCUSS] Spark Columnar Processing

2019-04-11 Thread Bobby Evans
…the current structure and remaining issues. This is orthogonal to the cost-benefit trade-off discussion. The code generation basically consists of three parts: 1. Loading, 2. Selection, …

Re: [DISCUSS] Spark Columnar Processing

2019-04-05 Thread Bobby Evans
…ColumnVector (https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java) class. By combining it with ColumnarBatchScan, the whole-stage code generation generates code …
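For context on the classes named above: a ColumnarBatch bundles one ColumnVector per output column plus a row count, and it can be consumed either column-by-column (the access pattern columnar-aware generated code uses) or through its row view, which keeps existing row-based operators working. A small sketch against the public API; the column type and the sum aggregation are chosen only for illustration.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.vectorized.ColumnarBatch

// Column-wise consumption: read values straight out of the vector by ordinal.
def sumColumnWise(batch: ColumnarBatch): Long = {
  val col = batch.column(0)               // assumed IntegerType column
  var sum = 0L
  var i = 0
  while (i < batch.numRows()) {
    if (!col.isNullAt(i)) sum += col.getInt(i)
    i += 1
  }
  sum
}

// Row-wise consumption: the same batch viewed as InternalRows, so row-based
// operators can still process it without a separate conversion step.
def sumRowWise(batch: ColumnarBatch): Long = {
  var sum = 0L
  val it = batch.rowIterator()            // java.util.Iterator[InternalRow]
  while (it.hasNext) {
    val row: InternalRow = it.next()
    if (!row.isNullAt(0)) sum += row.getInt(0)
  }
  sum
}
```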

Re: [DISCUSS] Spark Columnar Processing

2019-04-03 Thread Bobby Evans
…storage if there is no row-based operation. Note: The current master does not support Arrow as a data source. However, I think it is not technically hard to support Arrow. 2. The current whole-stage codegen generates …
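On the "not technically hard to support Arrow" point: Spark already ships an ArrowColumnVector wrapper, so Arrow-backed memory can be exposed through the same ColumnVector interface the rest of the columnar code reads. A rough sketch under the assumption that arrow-vector is on the classpath; the column name and values are made up, and resource cleanup (closing the vector and allocator) is omitted.

```scala
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.IntVector
import org.apache.spark.sql.vectorized.{ArrowColumnVector, ColumnVector, ColumnarBatch}

// Build a small Arrow IntVector, then wrap it so Spark code sees a ColumnVector.
val allocator = new RootAllocator(Long.MaxValue)
val arrowVec = new IntVector("v", allocator)   // "v" is an illustrative name
arrowVec.allocateNew(3)
arrowVec.set(0, 10); arrowVec.set(1, 20); arrowVec.set(2, 30)
arrowVec.setValueCount(3)

val sparkVec: ColumnVector = new ArrowColumnVector(arrowVec)
val batch = new ColumnarBatch(Array(sparkVec))
batch.setNumRows(3)
println(batch.column(0).getInt(2))             // 30
```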

Re: [DISCUSS] Spark Columnar Processing

2019-04-02 Thread Renjie Liu
…2. The current whole-stage codegen generates code for element-wise selection (excluding sort and join). The SIMDization or GPUization capability depends on a compiler that translates the code generated by whole-stage codegen into native code. …

Re: [DISCUSS] Spark Columnar Processing

2019-04-02 Thread Bobby Evans
…store row-oriented data, I think that is a part that Wenchen pointed out. My slides: https://www.slideshare.net/ishizaki/making-hardware-accelerator-easier-to-use/41 …

Re: [DISCUSS] Spark Columnar Processing

2019-04-01 Thread Reynold Xin
…give a presentation about in-memory data storage for Spark at SAIS 2019: https://databricks.com/sparkaisummit/north-america/sessions-single-2019?id=40 :)

Re: [DISCUSS] Spark Columnar Processing

2019-03-27 Thread Bobby Evans
…:) Kazuaki Ishizaki. From: Wenchen Fan; To: Bobby Evans; Cc: Spark dev list; Date: 2019/03/26 13:53; Subject: Re: [DISCUSS] Spark Columnar Processing. Do you have some initial perf numbers? …

Re: [DISCUSS] Spark Columnar Processing

2019-03-26 Thread Kazuaki Ishizaki
…Spark dev list; Date: 2019/03/26 13:53; Subject: Re: [DISCUSS] Spark Columnar Processing. Do you have some initial perf numbers? It seems fine to me to remain row-based inside Spark with whole-stage codegen, and convert rows to columnar batches when communicating with external systems. On Mon, Mar …

Re: [DISCUSS] Spark Columnar Processing

2019-03-26 Thread Bobby Evans
Reynold, From our experiments, it is not a massive refactoring of the code. Most expressions can be supported by a relatively small change while leaving the existing code path untouched. We didn't try to do columnar with code generation, but I suspect it would be similar, although the code generation …

Re: [DISCUSS] Spark Columnar Processing

2019-03-26 Thread Reynold Xin
A 26% improvement is underwhelming if it requires massive refactoring of the codebase. Also, you can't just add the benefits up this way, because: (1) both vectorization and codegen reduce the overhead of virtual function calls; (2) vectorized code is more friendly to compilers / CPUs, but requires …

Re: [DISCUSS] Spark Columnar Processing

2019-03-26 Thread Bobby Evans
Cloudera reports a 26% improvement in Hive query runtimes from enabling vectorization. I would expect to see similar improvements, but at the cost of keeping more data in memory. But remember this also enables a number of different hardware acceleration techniques. If the data format is Arrow-compatible …

Re: [DISCUSS] Spark Columnar Processing

2019-03-25 Thread Wenchen Fan
Do you have some initial perf numbers? It seems fine to me to remain row-based inside Spark with whole-stage codegen, and convert rows to columnar batches when communicating with external systems. On Mon, Mar 25, 2019 at 1:05 PM Bobby Evans wrote: This thread is to discuss adding in support for …
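A rough sketch of the row-to-columnar boundary Wenchen describes: copy incoming values into writable column vectors and hand the resulting batch to a columnar-aware external consumer. OnHeapColumnVector is an internal Spark class, and the single-int-column schema, batch size, and function name are assumptions made for illustration, not anything proposed in the thread.

```scala
import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

// Drain up to `capacity` ints from a row-like iterator into one ColumnarBatch.
def toBatch(ids: Iterator[Int], capacity: Int = 4096): ColumnarBatch = {
  val schema  = StructType(Seq(StructField("id", IntegerType)))
  val vectors = OnHeapColumnVector.allocateColumns(capacity, schema)
  var n = 0
  while (ids.hasNext && n < capacity) {
    vectors(0).putInt(n, ids.next())   // write column 0, row n
    n += 1
  }
  val batch = new ColumnarBatch(vectors.map(v => v: ColumnVector))
  batch.setNumRows(n)
  batch                                // caller is responsible for batch.close()
}
```

In practice the conversion would recycle the vectors batch by batch and close each batch once the consumer is done; that bookkeeping is omitted here.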