Hello Atul,

sorry for the long turnaround. I finally had the time to try out the code
from Python. I ran some simple tests with a table of New York taxi trip data
in Apache Drill. Using the bundled JDBC driver and JayDeBeApi, the default for
accessing JDBC from Python, it took 11 minutes to retrieve 739373 rows from the
DB into Pandas. Using the Arrow JDBC adapter instead, the same retrieval ran in
3.8s on my laptop. That is only 4 times slower than loading the backing Parquet
file directly, which is a massive improvement.
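
To put the timings above into perspective, here is a quick back-of-the-envelope
sketch in Python. It only does arithmetic on the numbers quoted above; the
variable names are made up for illustration, "11 minutes" is taken as exactly
660 seconds, and the Parquet figure is merely what "4 times slower" would imply,
not a separate measurement.

```python
# Rough throughput figures derived from the timings quoted in this mail.
# Assumption: "11 minutes" is treated as exactly 660 seconds.
ROWS = 739373
JAYDEBEAPI_SECONDS = 11 * 60
ARROW_JDBC_SECONDS = 3.8


def rows_per_second(rows, seconds):
    """Average number of rows transferred per second."""
    return rows / seconds


jaydebeapi_rate = rows_per_second(ROWS, JAYDEBEAPI_SECONDS)
arrow_rate = rows_per_second(ROWS, ARROW_JDBC_SECONDS)

# How many times faster the Arrow JDBC adapter run was.
speedup = JAYDEBEAPI_SECONDS / ARROW_JDBC_SECONDS

# "4 times slower than Parquet" implies roughly this direct load time.
parquet_seconds_estimate = ARROW_JDBC_SECONDS / 4

print(f"JayDeBeApi:         {jaydebeapi_rate:,.0f} rows/s")
print(f"Arrow JDBC adapter: {arrow_rate:,.0f} rows/s")
print(f"Speedup:            {speedup:.1f}x")
print(f"Implied Parquet load time: {parquet_seconds_estimate:.2f}s")
```

Under these assumptions the adapter comes out roughly 170 times faster than the
JayDeBeApi path.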

I will try to look a bit more into making it simpler to use from Python, but
this is a really great example of how Arrow connects ecosystems at speed.

Regards,
Uwe

On Fri, Jun 22, 2018, at 12:41 PM, Atul Dambalkar wrote:
> Hi Wes, Uwe, Sid, Laurent,
> 
> I have now marked the JDBC Adapter related JIRA 
> (https://issues.apache.org/jira/browse/ARROW-1780) as resolved. Uwe/Wes 
> had already marked the feature for 0.10.0 release. I will continue to 
> monitor and support the feature for any issues. I remember, Uwe wanted 
> to use it in his development.
> 
> Appreciate your inputs during the development of this feature.
> 
> Regards,
> -Atul
