Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
@Alfie Davidson: Awesome, it worked with "org.elasticsearch.spark.sql". But as soon as I switched to elasticsearch-spark-20_2.12, "es" also worked. On Fri, Sep 8, 2023 at 12:45 PM Dipayan Dev wrote: > > Let me try that and get back. Just wondering, if there is a ch
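For readers landing on this thread, a minimal write sketch using the fully qualified source name discussed above. The DataFrame contents and the es.nodes/es.port/index values are placeholders, not taken from the thread.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("es-write-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real DataFrame.
    val df = Seq(("doc1", 1), ("doc2", 2)).toDF("id", "value")

    // The fully qualified source name resolves even when the "es" short
    // alias is not registered by the connector version on the classpath.
    df.write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "localhost")   // placeholder host
      .option("es.port", "9200")         // placeholder port
      .mode(SaveMode.Append)
      .save("index_name")                // placeholder index

The short alias "es" only works when the connector jar on the classpath registers it, which appears to be what the thread is observing.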

Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
> Sent from my iPhone > > On 8 Sep 2023, at 03:10, Dipayan Dev wrote: > > ++ Dev > > On Thu, 7 Sep 2023 at 10:22 PM, Dipayan Dev wrote: > >> Hi, >> >> Can you please elaborate your last response? I don’t have any external >> depende

Re: Elasticsearch support for Spark 3.x

2023-09-07 Thread Dipayan Dev
s also not somehow already provided by your spark cluster (this is what it means), then yeah this is not anywhere on the class path at runtime. Remove the provided scope. > > On Thu, Sep 7, 2023, 4:09 PM Dipayan Dev wrote: > >> Hi, >> >> Can you please elabora

Re: Elasticsearch support for Spark 3.x

2023-09-07 Thread Dipayan Dev
n, Aug 27, 2023 at 2:58 PM Dipayan Dev wrote: > >> Using the following dependency for Spark 3 in the POM file (my Scala version >> is 2.12.14): >> org.elasticsearch >> elasticsearch-spark-30_2.12 >> 7.12.0
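For reference, a build.sbt sketch of the same coordinates quoted in the POM above; leaving off the provided scope keeps the connector on the runtime classpath, as suggested elsewhere in the thread.

    // build.sbt sketch: the same coordinates as the quoted POM, for Scala 2.12.
    // Do not mark it "provided" unless the cluster already ships the connector.
    libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-30_2.12" % "7.12.0"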

Re: Elasticsearch support for Spark 3.x

2023-09-07 Thread Dipayan Dev
++ Dev On Thu, 7 Sep 2023 at 10:22 PM, Dipayan Dev wrote: > Hi, > > Can you please elaborate your last response? I don’t have any external > dependencies added, and just updated the Spark version as mentioned below. > > Can someone help me with this? > > On Fri, 1 Se

Re: Elasticsearch support for Spark 3.x

2023-08-27 Thread Dipayan Dev
ndex_name") The same code is working with Spark 2.4.0 and the following dependency *org.elasticsearch elasticsearch-spark-20_2.12 7.12.0* On Mon, 28 Aug 2023 at 12:17 AM, Holden Karau wrote: > What’s the version of the ES connector you are using? > > On Sat, Aug 26, 2023 at

Elasticsearch support for Spark 3.x

2023-08-26 Thread Dipayan Dev
Hi All, We're using Spark 2.4.x to write a DataFrame into the Elasticsearch index. As we're upgrading to Spark 3.3.0, it is throwing this error: Caused by: java.lang.ClassNotFoundException: es.DefaultSource at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476) at java.base/java.lang.Cla

Unsubscribe

2023-08-25 Thread Dipayan Dev

Unsubscribe

2023-08-23 Thread Dipayan Dev
Unsubscribe

Unsubscribe

2023-08-21 Thread Dipayan Dev
-- With Best Regards, Dipayan Dev Author of Deep Learning with Hadoop <https://www.amazon.com/Deep-Learning-Hadoop-Dipayan-Dev/dp/1787124762> M.Tech (AI), IISc, Bangalore

Spark doesn’t create SUCCESS file when external path is passed

2023-08-21 Thread Dipayan Dev
on the SUCCESS file. Please let me know if this is a bug or whether I need any additional configuration to fix this in Spark 3.3.0. Happy to contribute if you can suggest an approach. -- With Best Regards, Dipayan Dev Author of Deep Learning with Hadoop <https://www.amazon.com/Deep-Learning-Hadoop-Dipayan-
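A minimal sketch of the scenario being described, assuming hypothetical database, table, and bucket names. The _SUCCESS marker is governed by the Hadoop setting mapreduce.fileoutputcommitter.marksuccessfuljobs (true by default), so the snippet only sets it explicitly and then reproduces the external-path write from the report.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder()
      .appName("success-file-check")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // The _SUCCESS marker is controlled by this Hadoop setting (default true).
    spark.sparkContext.hadoopConfiguration
      .set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "true")

    val df = Seq(("a", 1), ("b", 2)).toDF("name", "num")

    // Write to a table backed by an explicit external path; the report is that
    // no _SUCCESS file appears under this location after the job finishes.
    df.write
      .option("path", "gs://some_bucket/some_path/")   // placeholder bucket
      .mode(SaveMode.Overwrite)
      .format("orc")
      .saveAsTable("tmp_db.external_tb")               // placeholder table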

Re: Probable Spark Bug while inserting into flat GCS bucket?

2023-08-20 Thread Dipayan Dev
Hi Mich, It's not specific to ORC, and looks like a bug in the Hadoop Common project. I have raised a ticket and am happy to contribute a fix for Hadoop 3.3.0. Do you know if anyone could help me set the Assignee? https://issues.apache.org/jira/browse/HADOOP-18856 With Best Regards, Dipaya

Probable Spark Bug while inserting into flat GCS bucket?

2023-08-19 Thread Dipayan Dev
List("num").map(x => x) DF.write.option("path", "gs://test_dd1/abc/").mode(SaveMode.Overwrite).partitionBy(partKey: _*).format("orc").saveAsTable("us_wm_supply_chain_otif_stg.test_tb2") val DF1 = Seq(("test2", 125)).toDF("name", "num") DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("us_wm_supply_chain_otif_stg.test_tb2") With Best Regards, Dipayan Dev

[no subject]

2023-08-18 Thread Dipayan Dev
Unsubscribe -- With Best Regards, Dipayan Dev Author of Deep Learning with Hadoop <https://www.amazon.com/Deep-Learning-Hadoop-Dipayan-Dev/dp/1787124762> M.Tech (AI), IISc, Bangalore

Re: Spark File Output Committer algorithm for GCS

2023-07-21 Thread Dipayan Dev
rious what this actually does? With Best Regards, Dipayan Dev On Wed, Jul 19, 2023 at 2:25 PM Dipayan Dev wrote: > Thank you. Will try out these options. > > With Best Regards, > > On Wed, Jul 19, 2023 at 1:40 PM Mich Talebzadeh wrote: > >> Soun

Re: Spark File Output Committer algorithm for GCS

2023-07-19 Thread Dipayan Dev

Re: Spark File Output Committer algorithm for GCS

2023-07-18 Thread Dipayan Dev
it deletes and copies the partitions. My issue is something related to this - https://groups.google.com/g/cloud-dataproc-discuss/c/neMyhytlfyg?pli=1 With Best Regards, Dipayan Dev On Wed, Jul 19, 2023 at 12:06 AM Mich Talebzadeh wrote: > Spark has no role in creating that hive stag

Re: Spark File Output Committer algorithm for GCS

2023-07-18 Thread Dipayan Dev
at 9:47 PM, Dipayan Dev wrote: > Thanks Jay, is there any suggestion how much I can increase those > parameters? > > On Mon, 17 Jul 2023 at 8:25 PM, Jay wrote: > >> Fileoutputcommitter v2 is supported in GCS but the rename is a metadata >> copy and delete operation in

Re: Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev

Re: Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev
amic updates in Spark. >> >> >> On Mon, 17 Jul 2023 at 7:05 PM, Jay wrote: >> >>> You can try increasing fs.gs.batch.threads and >>> fs.gs.max.requests.per.batch. >>> >>> The definitions for these flags are available here - >>> https:/

Re: Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev
ing fs.gs.batch.threads and > fs.gs.max.requests.per.batch. > > The definitions for these flags are available here - > https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/CONFIGURATION.md > > On Mon, 17 Jul 2023 at 14:59, Dipayan Dev wrote: > >> No, I am using Spark 2.
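A sketch of how those two connector flags could be passed through Spark, assuming the GCS connector version in use accepts them (see the CONFIGURATION.md link above); the values shown are illustrative, not recommendations from the thread.

    import org.apache.spark.sql.SparkSession

    // GCS connector settings are Hadoop properties, so they go through the
    // spark.hadoop.* prefix; the values here are illustrative only.
    val spark = SparkSession.builder()
      .appName("gcs-batch-tuning-sketch")
      .config("spark.hadoop.fs.gs.batch.threads", "32")
      .config("spark.hadoop.fs.gs.max.requests.per.batch", "100")
      .getOrCreate()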

Re: Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev
dates and I need to update around 3 years of data. It usually takes 3 hours to finish the process. Any way to speed this up? With Best Regards, Dipayan Dev On Mon, Jul 17, 2023 at 1:53 PM Mich Talebzadeh wrote: > So you are using GCP and your Hive is installed on Dataproc which happens >
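The thread also mentions dynamic updates; if that refers to Spark's dynamic partition overwrite, the relevant setting looks like the sketch below. This is an assumption on my part, not advice given in the thread.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("dynamic-overwrite-sketch").getOrCreate()

    // Overwrite only the partitions present in the incoming DataFrame rather
    // than the whole table location.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")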

Spark File Output Committer algorithm for GCS

2023-07-17 Thread Dipayan Dev
on the pros and cons of using this version? Or any ongoing Spark feature development to address this issue? With Best Regards, Dipayan Dev
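For context on the question, a sketch of how the v2 output committer algorithm is normally selected, assuming that is the version being asked about; whether it is actually safe and faster on GCS is the trade-off the rest of the thread discusses.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("committer-v2-sketch")
      // v2 commits task output straight to the destination during task commit,
      // skipping the serial job-commit renames at the cost of weaker atomicity.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()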

Contributing to Spark MLLib

2023-07-16 Thread Dipayan Dev
there any new features in line and the best way to explore this? Looking forward to a little guidance to start with. Thanks Dipayan -- With Best Regards, Dipayan Dev Author of Deep Learning with Hadoop <https://www.amazon.com/Deep-Learning-Hadoop-Dipayan-Dev/dp/1787124762> M.Tech (AI)