Re: Bitmap Indexing to increase OLAP query performance

2016-06-29 Thread Nishadi Kirielle
Thank you for the response. Could you explain why bitmap indexes are not appropriate for big data? Rather than using traditional bitmap indexing techniques, we are planning to implement a combination of novel bitmap indexing techniques such as bit-sliced indexes and projection indexes.
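For readers unfamiliar with the bit-sliced indexes mentioned above, here is a minimal sketch of the general idea, not the design the project team has in mind. It assumes non-negative integer column values and uses plain Python ints as bitmaps; the names `build_bit_slices` and `sliced_sum` and the sample column are hypothetical.

```python
def build_bit_slices(values, width):
    """Return one bitmap per bit position.

    Bit r of slices[i] is set when bit i of values[r] is 1.
    Bitmaps are plain Python ints (an assumption of this sketch).
    """
    slices = [0] * width
    for row, value in enumerate(values):
        for i in range(width):
            if (value >> i) & 1:
                slices[i] |= 1 << row
    return slices


def sliced_sum(slices, row_filter):
    """Sum the column over the rows selected by row_filter (a bitmap).

    Each slice contributes 2**i times the number of selected rows
    that have bit i set, so no individual value is ever read.
    """
    return sum((1 << i) * bin(s & row_filter).count("1")
               for i, s in enumerate(slices))


values = [5, 3, 12, 7]                 # hypothetical column
slices = build_bit_slices(values, width=4)
all_rows = (1 << len(values)) - 1      # bitmap selecting every row
even_rows = 0b0101                     # bitmap selecting rows 0 and 2

print(sliced_sum(slices, all_rows))
print(sliced_sum(slices, even_rows))
```

A projection index, by contrast, simply stores the column's values in row order, so the two are often combined: bitmaps answer the filter, and the projection or the slices supply the values.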

Re: Spark 2.0 Performance drop

2016-06-29 Thread Michael Allman
The patch we use in production is for 1.5. We're porting the patch to master (and downstream to 2.0, which is presently very similar) with the intention of submitting a PR "soon". We'll push it here when it's ready: https://github.com/VideoAmp/spark-public. Regarding benchmarking, we have a sui

Re: Spark 2.0 Performance drop

2016-06-29 Thread Maciej Bryński
2016-06-29 23:22 GMT+02:00 Michael Allman:
> I'm sorry I don't have any concrete advice for you, but I hope this helps
> shed some light on the current support in Spark for projection pushdown.
>
> Michael

Michael, Thanks for the answer. This resolves one of my questions. Which Spark version you

Re: Spark 2.0 Performance drop

2016-06-29 Thread Michael Allman
Hi Maciej, In Spark, projection pushdown is currently limited to top-level columns (StructFields). VideoAmp has very large Parquet-based tables (many billions of records accumulated per day) with deeply nested schemas (four or five levels), and we've spent a considerable amount of time optimizin
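To illustrate why top-level-only projection pushdown hurts with deeply nested schemas, here is a small conceptual model in plain Python. It is not Spark code and does not reflect the VideoAmp patch; the schema, the field names, and both pruning functions are hypothetical. It only counts which leaf columns a reader would have to fetch under each strategy.

```python
# Hypothetical nested schema: struct fields are dicts, leaf columns are type names.
schema = {
    "id": "long",
    "event": {
        "ts": "long",
        "device": {"os": "string", "model": "string", "ua": "string"},
    },
}


def leaves(node, prefix=()):
    """List every leaf column path under node."""
    if not isinstance(node, dict):
        return [prefix]
    paths = []
    for name, child in node.items():
        paths.extend(leaves(child, prefix + (name,)))
    return paths


def top_level_pruning(schema, path):
    """Keep the whole top-level column that contains the requested path."""
    root = path[0]
    return leaves(schema[root], (root,))


def nested_pruning(schema, path):
    """Keep only the leaves under the requested nested path."""
    node = schema
    for name in path:
        node = node[name]
    return leaves(node, path)


wanted = ("event", "device", "os")
print(len(top_level_pruning(schema, wanted)))  # leaves read with top-level pruning
print(len(nested_pruning(schema, wanted)))     # leaves read with nested pruning
```

In this toy schema, asking for one nested field under top-level pruning pulls in every leaf of `event`, while nested pruning reads only the requested leaf. The gap grows with the width and depth of the struct.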

Spark 2.0 Performance drop

2016-06-29 Thread Maciej Bryński
Hi, Has anyone measured the performance of Spark 2.0 vs. Spark 1.6? I ran some tests on a Parquet file with many nested columns (about 30 GB in 400 partitions), and Spark 2.0 is sometimes 2x slower. I tested the following queries: 1) select count(*) where id > some_id. In this query we have predicate pushdown (PPD) and performance i

Re: Bitmap Indexing to increase OLAP query performance

2016-06-29 Thread Jörn Franke
Is it the traditional bitmap indexing? I would not recommend it for big data. You could use in-memory Bloom filters and min/max indexes, which look more appropriate. If you do want to use bitmap indexes, then you would have to do it as you say. However, bitmap indexes may consume a lo
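As a rough sketch of the alternative suggested above, the following keeps a min/max pair and a small Bloom filter per data block and uses them to skip blocks that cannot match a predicate. Everything here is an assumption made for illustration: the block layout, the filter size, the SHA-256-based hashing, and all function names. It is not how any particular engine implements these structures.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int (sizes are arbitrary)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_block_index(blocks):
    """Record (min, max, Bloom filter) for each block of values."""
    index = []
    for block in blocks:
        bloom = BloomFilter()
        for value in block:
            bloom.add(value)
        index.append((min(block), max(block), bloom))
    return index


def blocks_for_range(index, low, high):
    """Block numbers whose [min, max] interval overlaps [low, high]."""
    return [i for i, (lo, hi, _) in enumerate(index) if lo <= high and hi >= low]


def blocks_for_equality(index, value):
    """Block numbers that may contain value, after min/max and Bloom checks."""
    return [i for i, (lo, hi, bloom) in enumerate(index)
            if lo <= value <= hi and bloom.might_contain(value)]


blocks = [[1, 4, 9], [10, 15, 18], [20, 22, 30]]   # hypothetical data blocks
index = build_block_index(blocks)
print(blocks_for_range(index, 12, 21))
print(blocks_for_equality(index, 15))
```

Both structures are small and fixed-size per block, which is the usual argument for them over per-value bitmaps on high-cardinality columns. The trade-off is that they only rule blocks out; the surviving blocks still have to be scanned.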

Bitmap Indexing to increase OLAP query performance

2016-06-29 Thread Nishadi Kirielle
Hi All, I am a CSE undergraduate, and for our final-year project we plan to build a cluster-based, bit-oriented analytic platform (storage engine) that provides fast query performance for OLAP by using novel bitmap indexing techniques when and where appropriate. For

test

2016-06-29 Thread Gav
ignore -- Gav...