Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-12 Thread Jakub Wozniak
Hello, Any more thoughts on this one? Will that be let into 2.4.1 or rather not? Thanks in advance, Jakub On 8 Mar 2019, at 11:26, Jakub Wozniak <jakub.wozn...@cern.ch> wrote: Hi, To me it is backwards compatible with older HBase versions. The code actually only falls back to the

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-08 Thread Jakub Wozniak
akub On 8 Mar 2019, at 11:15, Jakub Wozniak <jakub.wozn...@cern.ch> wrote: I guess it is that one: https://github.com/apache/spark/commit/dfed439e33b7bf224dd412b0960402068d961c7b#diff-9ebb59b7b008c694a8f583b94bd24e1d Cheers, Jakub On 7 Mar 2019, at 17:25, Sean Owen mailto:sro...

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-08 Thread Jakub Wozniak
2019 at 8:57 AM Jakub Wozniak <jakub.wozn...@cern.ch> wrote: Hello, I have a question regarding the 2.4.1 release. It looks like Spark 2.4 (and 2.4.1-rc) is not fully compatible with HBase 2.x+ in YARN mode. The problem is in the org.apache.spark.deploy.security.HBaseDelegat

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-07 Thread Jakub Wozniak
Hello, I have a question regarding the 2.4.1 release. It looks like Spark 2.4 (and 2.4.1-rc) is not fully compatible with HBase 2.x+ in YARN mode. The problem is in the org.apache.spark.deploy.security.HBaseDelegationTokenProvider class, which expects a specific version of the TokenUtil class

Re: Very slow complex type column reads from parquet

2018-06-15 Thread Jakub Wozniak
you have any recommendation / experience with that? Thanks a lot for your help, Jakub On 14 Jun 2018, at 12:07, Jakub Wozniak <jakub.wozn...@cern.ch> wrote: Dear Ryan, Thanks a lot for your answer. After having sent the e-mail we have investigated a bit more the data itse

Re: Very slow complex type column reads from parquet

2018-06-14 Thread Jakub Wozniak
if it is that much slower. I'd be happy to see vectorization for nested Parquet data move forward, but I think you might want to get an idea of how much it will help before you move forward with it. Can you use Impala to test whether vectorization would help here? rb On Mon, Jun 11, 2018 at

Very slow complex type column reads from parquet

2018-06-11 Thread Jakub Wozniak
.2.1. Best regards, Jakub Wozniak

Re: Custom datasource as a wrapper for existing ones?

2018-05-03 Thread Jakub Wozniak
ormat`, which is the API for Spark's built-in file-based data sources like Parquet. It's an internal API but has not been changed for a long time. In the future, data source v2 would be the best solution. Thanks, Wenchen On Thu, May 3, 2018 at 4:17 AM, Jakub Wozniak mailto:jakub.wozn...@c

Re: Custom datasource as a wrapper for existing ones?

2018-05-02 Thread Jakub Wozniak
oach looked like a more elegant solution. Only the performance is still far from the desired one. Any help or direction in that matter would be greatly appreciated, as we have only just started to build our Spark expertise. Best regards, Jakub Wozniak Software Engineer CERN