Re: Skip Corrupted Parquet blocks / footer.

2017-01-01 Thread Abhishek
You will have to change the metadata file under _spark_metadata folder to remove the listing of corrupt files. Thanks, Shobhit G Sent from my iPhone > On Dec 31, 2016, at 8:11 PM, khyati [via Apache Spark Developers List] > wrote: > > Hi, > > I am trying to read the multiple parquet files
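The `_spark_metadata` log files that the advice above refers to are plain text: a version header (e.g. `v1`) followed by one JSON entry per committed file. A minimal sketch of removing corrupt-file entries, assuming that line-delimited v1 layout (back up the file before editing; the function name here is my own):

```python
import json

def drop_corrupt_entries(log_path, corrupt_paths):
    """Rewrite a _spark_metadata log file, dropping entries whose
    "path" field matches one of the known-corrupt files."""
    with open(log_path) as f:
        lines = f.read().splitlines()
    kept = [lines[0]]  # keep the version header line, e.g. "v1"
    for line in lines[1:]:
        entry = json.loads(line)
        if entry.get("path") not in corrupt_paths:
            kept.append(line)
    with open(log_path, "w") as f:
        f.write("\n".join(kept) + "\n")
```

Real entries carry more fields (size, modification time, action), but filtering on `path` is the essence of what the reply suggests.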

Custom datasource: when acquire and release a lock?

2019-05-24 Thread Abhishek Somani
called again as the RDD would be reused and the partitions would have gotten cached in the RDD. Can someone advise me on the right places to acquire and release a lock with my data endpoint in this scenario. Thanks a lot, Abhishek Somani

Re: Custom datasource: when acquire and release a lock?

2019-05-26 Thread Abhishek Somani
Hi experts, I'll be very grateful if someone could help. Thanks, Abhishek On Fri, May 24, 2019 at 7:06 PM Abhishek Somani wrote: > Hi experts, > > I am trying to create a custom Spark Datasource(v1) to read from a > transactional data endpoint, and I need to acquire a lock

Re: Custom datasource: when acquire and release a lock?

2019-05-27 Thread Abhishek Somani
endpoint provides me an api to acquireLock() and one to releaseLock() (which it stores in mysql behind the scenes). Thanks again! Abhishek On Mon, May 27, 2019 at 10:38 AM Jörn Franke wrote: > What does your data source structure look like? > Can’t you release it at the end of the build scan
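The pattern the thread converges on is acquiring the endpoint's lock before the scan and guaranteeing its release when the scan finishes, even on failure. A pure-Python sketch (not Spark code; `EndpointLock` and the acquire/release callables are hypothetical stand-ins for the endpoint's `acquireLock()`/`releaseLock()` API):

```python
import threading

class EndpointLock:
    """Sketch: hold a lock on a transactional data endpoint for the
    duration of a scan, releasing it exactly once even on error."""

    def __init__(self, acquire_lock, release_lock):
        self._acquire, self._release = acquire_lock, release_lock
        self._held = False
        self._mutex = threading.Lock()

    def __enter__(self):
        with self._mutex:
            if not self._held:
                self._acquire()   # the endpoint's acquireLock() call
                self._held = True
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._mutex:
            if self._held:
                self._release()   # releaseLock(), runs even if the scan failed
                self._held = False
        return False
```

In the datasource scenario, the equivalent of `with EndpointLock(...)` would wrap the scan (e.g. the body of `buildScan`), so reuse of the cached RDD partitions never leaves a lock dangling.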

New Spark Datasource for Hive ACID tables

2019-07-26 Thread Abhishek Somani
e tables via Spark as well. The datasource is also available as a spark package, and instructions on how to use it are available on the Github page <https://github.com/qubole/spark-acid>. We welcome your feedback and suggestions. Thanks, Abhishek Somani

Re: New Spark Datasource for Hive ACID tables

2019-07-26 Thread Abhishek Somani
Hey Naresh, Thanks for your question. Yes it will work! Thanks, Abhishek Somani On Fri, Jul 26, 2019 at 7:08 PM naresh Goud wrote: > Thanks Abhishek. > > Will it work on hive acid table which is not compacted ? i.e table having > base and delta files? > > Let’s say hive a

Unsubscribe

2020-12-23 Thread Abhishek sm
Unsubscribe On Thu, Dec 24, 2020, 6:59 AM Shril Kumar wrote: > Unsubscribe >

Unsubscribe

2018-06-27 Thread Tripathi, Abhishek
Unsubscribe

Re: [Spark Structured Streaming on K8S]: Debug - File handles/descriptor (unix pipe) leaking

2018-07-23 Thread Abhishek Tripathi
/6f838adf6651491bd4f263956f403c74 Thanks. Best Regards, *Abhishek Tripathi* On Thu, Jul 19, 2018 at 10:02 AM Abhishek Tripathi wrote: > Hello All! > I am using spark 2.3.1 on kubernetes to run a structured streaming spark > job which read stream from Kafka , perform some window aggregation and > output s

Re: StructuredStreaming status

2016-10-19 Thread Abhishek R. Singh
the other essentials (which are thankfully getting addressed). Any guidance on (timelines for) expected exit from alpha state would also be greatly appreciated. -Abhishek- > On Oct 19, 2016, at 5:36 PM, Matei Zaharia wrote: > > I'm also curious whether there are concerns othe

Re: Grouping runs of elements in a RDD

2015-06-30 Thread Abhishek R. Singh
could you use a custom partitioner to preserve boundaries such that all related tuples end up on the same partition? On Jun 30, 2015, at 12:00 PM, RJ Nowling wrote: > Thanks, Reynold. I still need to handle incomplete groups that fall between > partition boundaries. So, I need a two-pass appr
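The two-pass approach discussed here (group runs locally, then stitch runs that straddle partition boundaries) can be illustrated without Spark. A plain-Python sketch, with partitions modeled as lists (`group_runs` is my own name for it):

```python
from itertools import groupby

def group_runs(partitions, key=lambda x: x):
    """Sketch of the two-pass idea: group consecutive equal-key elements
    into runs within each partition, then merge runs that span a
    partition boundary."""
    # Pass 1: local runs per partition.
    local = [[list(g) for _, g in groupby(part, key)] for part in partitions]
    # Pass 2: if a partition's first run continues the previous
    # partition's last run (same key), merge them.
    runs = []
    for part_runs in local:
        for run in part_runs:
            if runs and key(runs[-1][0]) == key(run[0]):
                runs[-1].extend(run)
            else:
                runs.append(run)
    return runs
```

A custom partitioner, as suggested above, would instead place all tuples of one run on the same partition up front, making the second pass unnecessary.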

Re: Writing to multiple outputs in Spark

2015-08-14 Thread Abhishek R. Singh
A workaround would be to have multiple passes on the RDD and each pass write its own output? Or in a foreachPartition do it in a single pass (open up multiple files per partition to write out)? -Abhishek- On Aug 14, 2015, at 7:56 AM, Silas Davis wrote: > Would it be right to assume that
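The single-pass `foreachPartition` idea (open one output file per key inside a partition and route records as you go) can be sketched in plain Python; the function name and file layout here are illustrative, not Spark API:

```python
import os

def write_partition_by_key(records, out_dir):
    """Sketch of the single-pass idea: within one partition, keep one
    open file handle per key and append each record to its key's file."""
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    try:
        for key, value in records:
            if key not in handles:
                handles[key] = open(os.path.join(out_dir, f"{key}.txt"), "w")
            handles[key].write(value + "\n")
    finally:
        # Close every handle even if a write fails mid-partition.
        for h in handles.values():
            h.close()
```

The multiple-pass alternative mentioned above trades this handle bookkeeping for one full scan of the RDD per output.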

RE: Spark 3 is Slower than Spark 2 for TPCDS Q04 query.

2021-12-19 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Regards, Abhishek From: Senthil Kumar Sent: Sunday, December 19, 2021 11:58 PM To: dev Subject: Spark 3 is Slower than Spark 2 for TPCDS Q04 query. Hi All, We are comparing Spark 2.4.5 and Spark 3(without enabling spark 3 additional features) with TPCDS queries and found that Spark 3's perfor

RE: IPv6 support

2020-10-19 Thread Rao, Abhishek (Nokia - IN/Bangalore)
and Regards, Abhishek From: Steve Loughran Sent: Wednesday, July 17, 2019 4:52 PM To: dev@spark.apache.org Subject: Re: IPv6 support Fairly neglected hadoop patch, FWIW; https://issues.apache.org/jira/browse/HADOOP-11890 FB have been running HDFS &c on IPv6 for a while, but their codebase