Re: Ordering pushdown for Spark Datasources

2021-04-06 Thread Mich Talebzadeh
Lucene. I came across it years ago. Does Lucene support JDBC connection at all? How about Solr? HTH

Re: Ordering pushdown for Spark Datasources

2021-04-05 Thread Kohki Nishio
The log data is stored in Lucene, and I have a custom data source to access it. For example, when the condition is log-level = INFO, it brings in a couple of million records per partition, and hundreds of partitions are involved in a query. Spark has to go through all the entries to show the fir
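
To illustrate why pushing the ordering (together with a limit) down to the source would help here, the following is a minimal plain-Python sketch, not Spark's data source API. It assumes each record is a dict with a timestamp field `ts` and a `level` field, and that "first" means smallest `ts`; those field names and the helpers `scan_partition` and `first_n` are hypothetical and do not come from the thread. Each partition filters and keeps only its own top rows, so the merge step sees at most `limit` rows per partition instead of millions.

```python
import heapq
from itertools import islice

def scan_partition(records, level, limit):
    """Hypothetical per-partition reader: apply the filter, then keep
    only the `limit` rows with the smallest `ts`, sorted ascending."""
    matching = (r for r in records if r["level"] == level)
    return heapq.nsmallest(limit, matching, key=lambda r: r["ts"])

def first_n(partitions, level, limit):
    """Merge the already-sorted per-partition results and stop after
    `limit` rows, so no partition contributes more than `limit` rows."""
    per_partition = [scan_partition(p, level, limit) for p in partitions]
    merged = heapq.merge(*per_partition, key=lambda r: r["ts"])
    return list(islice(merged, limit))
```

With hundreds of partitions and a limit of 100, the merge in this sketch handles at most 100 rows per partition, which is the saving an ordering pushdown is meant to give.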

Re: Ordering pushdown for Spark Datasources

2021-04-05 Thread Mich Talebzadeh
Hi, A couple of clarifications: 1. How is the log data stored, say on HDFS? 2. You said you want to show the first 100 entries for a given condition. Is that condition itself a predicate? There are articles on predicate pushdown in Spark. For example, check Using Spark predicate push down in Spa