Re: High count of Active Jobs

2025-04-16 Thread nayan sharma
Hi Ángel, I haven't tried disabling speculation yet, but I will try running in DEBUG mode. Thanks & Regards, Nayan Sharma
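For reference, speculative execution is controlled by a single Spark property; a minimal sketch of disabling it when the session is built (the app name is illustrative):

    import org.apache.spark.sql.SparkSession

    // Stop Spark from re-launching slow task attempts; stale speculative
    // attempts are one suspected cause of jobs stuck in the "active" state.
    val spark = SparkSession.builder()
      .appName("active-jobs-debug")          // illustrative
      .config("spark.speculation", "false")  // standard Spark property
      .getOrCreate()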

Re: High count of Active Jobs

2025-04-13 Thread nayan sharma
Hi Ángel, Yes, speculation is enabled. I will lower the log4j logging level and share the output. It will take at least 24 hours before we can capture anything. Thanks & Regards, Nayan Sharma
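Lowering the level at runtime avoids redeploying log4j configuration; a minimal sketch using the SparkContext API, which overrides the root log level for the running application:

    // Raise verbosity for the running job without touching log4j config files.
    spark.sparkContext.setLogLevel("DEBUG")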

Re: High count of Active Jobs

2025-04-12 Thread nayan sharma
I found something: https://issues.apache.org/jira/browse/SPARK-45101. I tried it in a lower environment but found no issues, although there is a version mismatch: the lower environment runs Spark 3.2.3 while production runs 3.2.0. Thanks & Regards, Nayan Sharma

High count of Active Jobs

2025-04-05 Thread nayan sharma
… some say GC is not working, or the Spark task scheduler is not in sync. Has anybody faced such an issue in the past, or can anyone guide me on where to look? Thanks & Regards, Nayan Sharma
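One way to cross-check what the UI reports is the status tracker, which exposes the scheduler's own view of job and stage state; a minimal sketch:

    // SparkStatusTracker reflects what the DAG scheduler believes is running,
    // which helps tell a UI/listener artifact apart from genuinely stuck jobs.
    val tracker = spark.sparkContext.statusTracker
    println(s"Active jobs:   ${tracker.getActiveJobIds().mkString(", ")}")
    println(s"Active stages: ${tracker.getActiveStageIds().mkString(", ")}")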

Re: Null pointer exception while replaying WAL

2024-02-12 Thread nayan sharma
…e: " + sr.toString)
    sr.toString
  }
}
Thread.sleep(12)
try {
  // ===> getting NPE here after restarting using WAL
  val messagesJson = spark.read.json(messages)
  messagesJson.write.mode("append").parquet(data)
} catch {
…
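A common cause of an NPE at exactly that line is a SparkSession captured in the closure before the restart: WAL recovery rebuilds the DStream graph, but not references captured from the old driver. A sketch of the usual workaround, assuming a DStream[String] of JSON payloads named messages (names are illustrative, not the poster's confirmed fix):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.dstream.DStream

    def persistBatches(messages: DStream[String], outputPath: String): Unit = {
      messages.foreachRDD { rdd =>
        if (!rdd.isEmpty()) {
          // Re-acquire the session inside the closure: after a WAL-driven
          // restart, a session captured outside foreachRDD can be null.
          val spark = SparkSession.builder().getOrCreate()
          import spark.implicits._
          val messagesJson = spark.read.json(rdd.toDS())
          messagesJson.write.mode("append").parquet(outputPath)
        }
      }
    }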

Null pointer exception while replaying WAL

2024-02-09 Thread nayan sharma
Hi Users, I am trying to build a fault-tolerant Spark Solace consumer. Issue: we have to restart the job due to multiple issues; high load average is one of them. At that time, whatever Spark is processing, along with the batches in the queue, is lost. We can't replay it because we have already sent the ack while c…
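For a receiver-based source like Solace, the standard recipe for not losing in-flight batches across a restart is checkpointing plus the receiver write-ahead log; a minimal sketch (paths, app name, and batch interval are illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///checkpoints/solace-consumer"  // illustrative

    def createContext(): StreamingContext = {
      val conf = new SparkConf()
        .setAppName("solace-consumer")
        .set("spark.streaming.receiver.writeAheadLog.enable", "true")
      val ssc = new StreamingContext(conf, Seconds(30))
      ssc.checkpoint(checkpointDir)
      // ... attach the Solace receiver and processing graph here ...
      ssc
    }

    // Recover from the checkpoint/WAL if one exists, otherwise build fresh.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)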

Kafka Spark Structured Streaming Error

2022-05-05 Thread nayan sharma
…at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Thanks & Regards, Nayan Sharma

Re: Spark Druid Ingestion

2018-03-22 Thread nayan sharma
…yarn user, and my job is also running as the same user. Thanks, Nayan. On Mar 22, 2018, at 12:54 PM, Jorge Machado wrote: Seems to me like a permissions problem! Can you check your user / folder permissions? Jorge Machado

Spark Druid Ingestion

2018-03-22 Thread nayan sharma
Hi All, Druid uses Hadoop MapReduce to ingest batch data, but I am trying Spark for ingesting data into Druid, taking reference from https://github.com/metamx/druid-spark-batch. We are stuck at the following error. Application log: 2018-03-20T07:54:28,782 INFO [task-runner-0-priority-0] org.apach…

Re: splitting columns into new columns

2017-07-17 Thread nayan sharma
val data = firtRow(idx).asInstanceOf[String].split("\\^")
var j = 0
for (d <- data) {
  schema = schema + colNames + j + ","
  j = j + 1
}
}
schema = schema.substring(0, schema.length - 1)
val sqlSchema = StructTyp…

Re: splitting columns into new columns

2017-07-17 Thread nayan sharma
…at 3:29 AM, ayan guha wrote: You are looking for the explode function. On Mon, 17 Jul 2017 at 4:25 am, nayan sharma <nayansharm...@gmail.com> wrote: I’ve a Dataframe where in some columns there are multiple values, always separated by ^ …
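For the record, explode turns the array produced by split into one row per value; a minimal sketch (column names follow the question, and the caret must be escaped because split takes a regex):

    import org.apache.spark.sql.functions.{col, explode, split}

    // One output row per '^'-separated value in the phone column.
    val exploded = df.withColumn("phone_part", explode(split(col("phone"), "\\^")))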

splitting columns into new columns

2017-07-16 Thread nayan sharma
I’ve a Dataframe where some columns contain multiple values, always separated by ^.

Input:
phone|contact|
ERN~58XX7~^EPN~5X551~|C~MXXX~MSO~^CAxxE~~3XXX5|

Desired output:
phone1|phone2|contact1|contact2|
ERN~5XXX7|EPN~5891551~|C~MXXXH~MSO~|CAxxE~~3XXX5|

How can this be achieved using a loop a…
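When the number of pieces per column is fixed, split plus getItem avoids building a schema string entirely; a sketch assuming two values per column, with names following the phone1/phone2 pattern from the question:

    import org.apache.spark.sql.functions.{col, split}

    // Split each multi-valued column on '^' (escaped: split takes a regex)
    // and promote the pieces to their own columns.
    val result = Seq("phone", "contact").foldLeft(df) { (acc, c) =>
      val pieces = split(col(c), "\\^")
      acc.withColumn(s"${c}1", pieces.getItem(0))
         .withColumn(s"${c}2", pieces.getItem(1))
         .drop(c)
    }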

ElasticSearch Spark error

2017-05-15 Thread nayan sharma
Hi All, ERROR: Caused by: org.apache.spark.util.TaskCompletionListenerException: Connection error (check network and/or proxy settings) - all nodes failed; tried [[10.0.1.8*:9200, 10.0.1.**:9200, 10.0.1.***:9200]]. I am getting this error while trying to show the DataFrame. df.count = 5190767 a…
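The "all nodes failed" message comes from the elasticsearch-hadoop connector, and the usual first step is to pin down its connection settings; a sketch with illustrative host, port, and index values:

    // es.nodes / es.port / es.nodes.wan.only are standard
    // elasticsearch-hadoop options; values here are placeholders.
    val esDf = sqlContext.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "10.0.1.8")       // reachable ES host(s)
      .option("es.port", "9200")
      .option("es.nodes.wan.only", "true")  // skip discovery of internal node IPs
      .load("my-index/my-type")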

Test

2017-05-15 Thread nayan sharma
Test

Fwd: isin query

2017-04-17 Thread nayan sharma
…April 2017 at 8:13:24 PM IST. To: nayan sharma, user@spark.apache.org. How about using the OR operator in filter? On Tue, 18 Apr 2017 at 12:35 am, nayan sharma <nayansharm...@gmail.com> wrote: Dataframe (df) having column msrid (String) having values m…
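The suggested alternative spelled out: an explicit OR chain of equality checks instead of isin; a minimal sketch using the column and values from the question:

    import org.apache.spark.sql.functions.col

    val filtered = df.filter(
      col("msrid") === "m_123" || col("msrid") === "m_111" || col("msrid") === "m_145"
    )
    filtered.count()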

filter operation using isin

2017-04-17 Thread nayan sharma
Dataframe (df) has a column msrid (String) with values m_123, m_111, m_145, m_098, m_666. I want to filter the rows having values m_123, m_111, m_145. df.filter($"msrid".isin("m_123","m_111","m_145")).count gives count = 0, while df.filter($"msrid".isin("m_123")).count gives count = 121212. I have tried…
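A zero count with a multi-value isin is often a data problem rather than an API problem, stray whitespace being the classic one; a diagnostic sketch (not a confirmed cause for this thread):

    import org.apache.spark.sql.functions.{col, length, trim}

    // Surface hidden padding: values that look like "m_123" but have extra length.
    df.select(col("msrid"), length(col("msrid"))).distinct().show(false)

    // If padding is the culprit, normalize before comparing.
    df.filter(trim(col("msrid")).isin("m_123", "m_111", "m_145")).count()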

isin query

2017-04-17 Thread nayan sharma
Dataframe (df) has a column msrid (String) with values m_123, m_111, m_145, m_098, m_666. I want to filter the rows having values m_123, m_111, m_145. df.filter($"msrid".isin("m_123","m_111","m_145")).count gives count = 0, while df.filter($"msrid".isin("m_123")).count gives count = 121212. I have tried…

Re: Error while reading the CSV

2017-04-07 Thread nayan sharma
…and let us know. Command: spark-submit --packages com.databricks:spark-csv_2.11:1.4.0. On Fri, 7 Apr 2017 at 00:39, nayan sharma <nayansharm...@gmail.com> wrote: spark version 1.6.2, scala version 2.10.5. On 06-Apr-2017, at 8:05 PM, …

Re: Error while reading the CSV

2017-04-06 Thread nayan sharma
spark version 1.6.2, scala version 2.10.5. On 06-Apr-2017, at 8:05 PM, Jörn Franke wrote: And which version does your Spark cluster use? On 6. Apr 2017, at 16:11, nayan sharma <nayansharm...@gmail.com> wrote: scalaVersion := "2.10.5"…

Re: Error while reading the CSV

2017-04-06 Thread nayan sharma
scalaVersion := "2.10.5". On 06-Apr-2017, at 7:35 PM, Jörn Franke wrote: Maybe your Spark is based on Scala 2.11, but you compile it for 2.10, or the other way around? On 6. Apr 2017, at 15:54, nayan sharma <nayansharm...@gmail.com> wrote…

Re: Error while reading the CSV

2017-04-06 Thread nayan sharma
In addition, I am using Spark version 1.6.2. Is there any chance the error is coming from a Scala version or dependency mismatch? I just guessed. Thanks, Nayan. On 06-Apr-2017, at 7:16 PM, nayan sharma wrote: Hi Jörn, thanks for replying. jar -tf cataly…

Re: Error while reading the CSV

2017-04-06 Thread nayan sharma
…Is the library in your assembly jar? On 6. Apr 2017, at 15:06, nayan sharma <nayansharm...@gmail.com> wrote: Hi All, I am getting an error while loading a CSV file. val datacsv = sqlContext.read.format("com.databric…

Error while reading the CSV

2017-04-06 Thread nayan sharma
Hi All, I am getting an error while loading a CSV file. val datacsv = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("timeline.csv") throws java.lang.NoSuchMethodError: org.apache.commons.csv.CSVFormat.withQuote(Ljava/lang/Character;)Lorg/apache/commons/csv/CSVFormat; I hav…
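A NoSuchMethodError on CSVFormat.withQuote usually means an older commons-csv (1.0, which still had withQuoteChar) is shadowing the version spark-csv expects. A build.sbt sketch pinning a compatible pair, assuming sbt is in use; the versions are era-appropriate and worth verifying:

    // spark-csv 1.x needs commons-csv 1.1+, where withQuote(Character)
    // replaced 1.0's withQuoteChar.
    libraryDependencies ++= Seq(
      "com.databricks"     %% "spark-csv"   % "1.5.0",
      "org.apache.commons" %  "commons-csv" % "1.1"
    )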

skipping header in multiple files

2017-03-23 Thread nayan sharma
Hi, I want to skip the headers of all the CSVs present in a directory. After searching on Google, I learned it can be done using sc.wholeTextFiles. Can anyone suggest how to do that in Scala? Thanks & Regards, Nayan Sh…
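One way to do it: sc.wholeTextFiles yields one (path, content) pair per file, so the first line of each file can be dropped. A sketch assuming each file fits comfortably in memory (a real caveat of wholeTextFiles; the path is illustrative):

    // Drop the first line of every CSV in the directory.
    val noHeaders = sc.wholeTextFiles("hdfs:///data/csv-dir/*.csv")
      .flatMap { case (_, content) => content.split("\n").drop(1) }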

Re: Persist RDD doubt

2017-03-23 Thread nayan sharma
…wipes temp files. On Thu, Mar 23, 2017 at 10:55 AM, Jörn Franke <jornfra...@gmail.com> wrote: What do you mean by clear? What is the use case? On 23 Mar 2017, at 10:16, nayan sharma <nayansharm...@gmail.com> wrote: …

Persist RDD doubt

2017-03-23 Thread nayan sharma
Does Spark clear a persisted RDD if a task fails? Regards, Nayan
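For context, persisted blocks are tracked per RDD, not per task: a failed task is retried and any missing cached partitions are simply recomputed, while releasing the cache is the application's job. A minimal sketch of that lifecycle (rdd is illustrative):

    import org.apache.spark.storage.StorageLevel

    val cached = rdd.persist(StorageLevel.MEMORY_AND_DISK)

    cached.count()   // first action materializes the cached blocks
    cached.count()   // served from cache; lost partitions are recomputed

    // Task failure does not unpersist the RDD; free it explicitly
    // (or rely on eviction / the ContextCleaner) when finished.
    cached.unpersist()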