[Spark SQL] spark.sql insert overwrite on existing partition not updating hive metastore partition transient_lastddltime and column_stats

2025-05-01 Thread Pradeep
…COLUMN_STATS_ACCURATE -> {"BASIC_STATS":"true","COLUMN_STATS":{"name":"true"}}, numRows -> 1) lastAccessTime:0 createTime:1746112922000 Map(event_partition -> 2024-01-06) Parameters: Map(rawDataSize -> 1, numFiles -> 1, transient_lastDdlTime -> 1746113178, totalSize -> 316, COLUMN_STATS_ACCURATE -> {"BASIC_STATS":"true","COLUMN_STATS":{"name":"true"}}, numRows -> 1) lastAccessTime:0 createTime:0 partitionValues: Unit = () Below is what I implemented in Spark 2.4, which used to work; after upgrading to Spark 3.x this functionality is broken: Get all the new partitions that are written to Hive metastore by Spark <https://stackoverflow.com/questions/57202917/get-all-the-new-partitions-that-are-written-to-hive-metastore-by-spark> Regards, Pradeep
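A hedged sketch of one way to inspect this from the public SQL interface, assuming a Hive-backed table named db.events partitioned by event_partition (both placeholders); SHOW PARTITIONS lists the partition specs, and DESCRIBE FORMATTED ... PARTITION surfaces the metastore parameters (transient_lastDdlTime, COLUMN_STATS_ACCURATE, numRows) shown above:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

    // One row per partition spec, e.g. "event_partition=2024-01-06".
    spark.sql("SHOW PARTITIONS db.events").collect().foreach(println)

    // Detailed output includes the partition's parameters map, where the
    // metastore keeps transient_lastDdlTime and the column-stats flags.
    spark.sql("DESCRIBE FORMATTED db.events PARTITION (event_partition = '2024-01-06')")
      .show(100, truncate = false)

Comparing transient_lastDdlTime across runs is one way to spot partitions a job has just rewritten.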

Dynamic executor scaling spark/Kubernetes

2019-04-16 Thread purna pradeep
Hello, Is Kubernetes dynamic executor scaling for Spark available in the latest release of Spark? I mean scaling the executors based on the workload vs. preallocating a number of executors for a Spark job. Thanks, Purna
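For reference, a minimal sketch of the configuration this asks about, assuming Spark 3.x, where dynamic allocation on Kubernetes works via shuffle tracking instead of an external shuffle service (in the Spark 2.x K8s backend it was not yet supported):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("dynamic-allocation-on-k8s")
      // Scale executors with the workload instead of preallocating them.
      .config("spark.dynamicAllocation.enabled", "true")
      // Spark 3.0+: track shuffle data on executors so no external
      // shuffle service is required on Kubernetes.
      .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "1")
      .config("spark.dynamicAllocation.maxExecutors", "10")
      .getOrCreate()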

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-09 Thread purna pradeep
Thanks, this is great news. Can you please let me know if dynamic resource allocation is available in Spark 2.4? I'm using Spark 2.3.2 on Kubernetes; do I still need to provide executor memory options as part of the spark-submit command, or will Spark manage the required executor memory based on the Spark job's

Spark 2.3.1: k8s driver pods stuck in Initializing state

2018-09-26 Thread purna pradeep
Hello, we're running Spark 2.3.1 on Kubernetes v1.11.0 and our driver pods from k8s are getting stuck in Initializing state like so: NAME READY STATUS RESTARTS AGE my-pod-fd79926b819d3b34b05250e23347d0e7-driver 0/1 Init:0/1 0 18h And from

Spark 2.3.1: k8s driver pods stuck in Initializing state

2018-09-26 Thread Purna Pradeep Mamillapalli
We're running Spark 2.3.1 on Kubernetes v1.11.0 and our driver pods from k8s are getting stuck in Initializing state like so: NAME READY STATUS RESTARTS AGE my-pod-fd79926b819d3b34b05250e23347d0e7-driver 0/1 Init:0/1 0 18h And from kubectl

Re: spark driver pod stuck in Waiting: PodInitializing state in Kubernetes

2018-08-17 Thread purna pradeep
Resurfacing the question to get more attention. Hello, > > I'm running a Spark 2.3 job on a Kubernetes cluster >> >> kubectl version >> >> Client Version: version.Info{Major:"1", Minor:"9", >> GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", >> GitTreeState:"clean", BuildDa

Re: spark driver pod stuck in Waiting: PodInitializing state in Kubernetes

2018-08-16 Thread purna pradeep
Hello, I'm running a Spark 2.3 job on a Kubernetes cluster > > kubectl version > > Client Version: version.Info{Major:"1", Minor:"9", > GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", > GitTreeState:"clean", BuildDate:"2018-02-09T21:51:06Z", > GoVersion:"go1.9.4", Compile

spark driver pod stuck in Waiting: PodInitializing state in Kubernetes

2018-08-15 Thread purna pradeep
I'm running a Spark 2.3 job on a Kubernetes cluster kubectl version Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-09T21:51:06Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"da

Re: Executor lost for unknown reasons error Spark 2.3 on kubernetes

2018-07-31 Thread purna pradeep
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) at java.lang.Thread.run(Thread.java: On Tue, Jul 31, 2018 at 8:32 AM purna pradeep wrote: > > Hello, >> >> >> >> I’m getting below error in spark d

Executor lost for unknown reasons error Spark 2.3 on kubernetes

2018-07-31 Thread purna pradeep
> Hello, > > I'm getting the below error in the Spark driver pod logs, and executor pods are > getting killed midway through while the job is running; even the driver pod > terminated with the below intermittent error. This happens if I run multiple > jobs in parallel. > > Not able to see executor logs

Executor lost for unknown reasons error Spark 2.3 on kubernetes

2018-07-30 Thread purna pradeep
Hello, I'm getting the below error in the Spark driver pod logs, and executor pods are getting killed midway through while the job is running; even the driver pod terminated with the below intermittent error. This happens if I run multiple jobs in parallel. Not able to see executor logs as executor pods a

Executor lost for unknown reasons error Spark 2.3 on kubernetes

2018-07-30 Thread Mamillapalli, Purna Pradeep
Hello, I'm getting the below error in the Spark driver pod logs, and executor pods are getting killed midway through while the job is running; even the driver pod terminated with the below intermittent error. This happens if I run multiple jobs in parallel. Not able to see executor logs as executor pods are

Spark 2.3 Kubernetes error

2018-07-06 Thread purna pradeep
> Hello, > > When I try to set the below options on the spark-submit command on the k8s master, > I get the below error in the spark-driver pod logs: > > --conf spark.executor.extraJavaOptions=" -Dhttps.proxyHost=myhost > -Dhttps.proxyPort=8099 -Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2" \ > > --conf

Spark 2.3 Kubernetes error

2018-07-05 Thread purna pradeep
Hello, when I try to set the below options on the spark-submit command on the k8s master, I get the below error in the spark-driver pod logs: --conf spark.executor.extraJavaOptions=" -Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2" \ --conf spark.driver.extraJ

Spark 2.3 Kubernetes error

2018-07-05 Thread Mamillapalli, Purna Pradeep
Hello, when I try to set the below options on the spark-submit command on the k8s master, I get the below error in the spark-driver pod logs: --conf spark.executor.extraJavaOptions=" -Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2" \ --conf spark.driver.extraJ

Spark 2.3 driver pod stuck in Running state — Kubernetes

2018-06-08 Thread purna pradeep
Hello, when I run spark-submit on a k8s cluster I'm seeing the driver pod stuck in Running state, and when I pulled the driver pod logs I'm able to see the below log. I do understand that this warning might be because of lack of CPU/memory, but I expect the driver pod to be in “Pending” state rather than “Running”

spark partitionBy with partitioned column in json output

2018-06-04 Thread purna pradeep
I'm reading the below JSON in Spark: {"bucket": "B01", "actionType": "A1", "preaction": "NULL", "postaction": "NULL"} {"bucket": "B02", "actionType": "A2", "preaction": "NULL", "postaction": "NULL"} {"bucket": "B03", "actionType": "A3", "preaction": "NULL", "postaction": "NULL"} val df=

Re: Spark 2.3 error on Kubernetes

2018-05-29 Thread purna pradeep
y exit/crashloop due to > lack of resource. > > On Tue, May 29, 2018 at 3:18 PM, purna pradeep > wrote: > >> Hello, >> >> I’m getting below error when I spark-submit a Spark 2.3 app on >> Kubernetes *v1.8.3* , some of the executor pods were killed with below

Spark 2.3 error on Kubernetes

2018-05-29 Thread purna pradeep
Hello, I'm getting the below error when I spark-submit a Spark 2.3 app on Kubernetes v1.8.3; some of the executor pods were killed with the below error as soon as they came up: Exception in thread "main" java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.Use

Spark 2.3 error on kubernetes

2018-05-29 Thread Mamillapalli, Purna Pradeep
Hello, I'm getting the below intermittent error when I spark-submit a Spark 2.3 app on Kubernetes v1.8.3; some of the executor pods were killed with the below error as soon as they came up: Exception in thread "main" java.lang.reflect.UndeclaredThrowableException at org.apache.hadoo

Spark driver pod garbage collection

2018-05-23 Thread purna pradeep
Hello, currently I observe that dead pods are not getting garbage collected (i.e. Spark driver pods which have completed execution), so pods could sit in the namespace for weeks potentially. This makes listing, parsing, and reading pods slower, as well as leaving junk sitting on the cluster. I believe minim

Spark driver pod eviction Kubernetes

2018-05-22 Thread purna pradeep
Hi, what would be the recommended approach to wait for the Spark driver pod to complete the currently running job before it gets evicted to new nodes while maintenance on the current node is going on (kernel upgrade, hardware maintenance, etc.) using the drain command? I don't think I can use PoDisruptionBu

Oozie with spark 2.3 in Kubernetes

2018-05-11 Thread purna pradeep
Hello, I would like to know if anyone has tried Oozie with Spark 2.3 actions on Kubernetes for scheduling Spark jobs. Thanks, Purna

Re: Scala program to spark-submit on k8 cluster

2018-04-04 Thread purna pradeep
Yes, “a REST application that submits a Spark job to a k8s cluster by running spark-submit programmatically”, and I would also like to expose it as a Kubernetes service so that clients can access it like any other REST API. On Wed, Apr 4, 2018 at 12:25 PM Yinan Li wrote: > Hi Kittu, > > What do you mean by "a
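A minimal sketch of the programmatic submission such a REST service could wrap, using Spark's SparkLauncher; the jar, main class, and master URL are placeholders:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    // Launches spark-submit from JVM code; a REST endpoint could call this
    // and report the handle's state back to its clients.
    val handle: SparkAppHandle = new SparkLauncher()
      .setAppResource("s3a://my-bucket/my-app.jar")   // placeholder jar
      .setMainClass("com.example.MyApp")              // placeholder class
      .setMaster("k8s://https://k8s-apiserver:6443")  // placeholder master
      .setDeployMode("cluster")
      .setConf("spark.executor.instances", "2")
      .startApplication()

    println(s"submitted, state = ${handle.getState}")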

unsubscribe

2018-04-02 Thread purna pradeep
unsubscribe

unsubscribe

2018-03-28 Thread purna pradeep
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Unsubscribe

2018-03-28 Thread purna pradeep
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Rest API for Spark2.3 submit on kubernetes(version 1.8.*) cluster

2018-03-21 Thread purna pradeep
automatically submits the applications to run on a Kubernetes cluster. > > Yinan > > On Tue, Mar 20, 2018 at 7:47 PM, purna pradeep > wrote: > >> I'm using a Kubernetes cluster on AWS to run Spark jobs; I'm using Spark 2.3, >> and now I want to run spark-submit from AWS lam

Rest API for Spark2.3 submit on kubernetes(version 1.8.*) cluster

2018-03-20 Thread purna pradeep
I'm using a Kubernetes cluster on AWS to run Spark jobs. I'm using Spark 2.3, and now I want to run spark-submit from an AWS Lambda function against the k8s master; I would like to know if there is any REST interface to run spark-submit on the k8s master.

Re: Spark 2.3 submit on Kubernetes error

2018-03-12 Thread purna pradeep
ssue > https://github.com/apache-spark-on-k8s/spark/issues/558 might help. > > > On Sun, Mar 11, 2018 at 5:01 PM, purna pradeep > wrote: > >> Getting below errors when I’m trying to run spark-submit on k8 cluster >> >> >> *Error 1*:This looks like a warning it doesn’

Spark 2.3 submit on Kubernetes error

2018-03-11 Thread purna pradeep
Getting the below errors when I'm trying to run spark-submit on a k8s cluster. Error 1: This looks like a warning; it doesn't interrupt the app running inside the executor pod, but it keeps getting this warning: 2018-03-09 11:15:21 WARN WatchConnectionManager:192 - Exec Failure java.io.EOFExcepti

handling Remote dependencies for spark-submit in spark 2.3 with kubernetes

2018-03-08 Thread purna pradeep
I'm trying to run spark-submit against a Kubernetes cluster with the Spark 2.3 Docker container image. The challenge I'm facing is that the application has a mainapplication.jar and other dependency files & jars which are located in a remote location like AWS S3, but as per the Spark 2.3 documentation there is something call

Unsubscribe

2018-02-27 Thread purna pradeep
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Unsubscribe

2018-02-26 Thread purna pradeep
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Unsubscribe

2018-02-26 Thread purna pradeep
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Unsubscribe

2018-02-11 Thread purna pradeep
Unsubscribe

Executor not getting added SparkUI & Spark Eventlog in deploymode:cluster

2017-11-14 Thread Mamillapalli, Purna Pradeep
Hi all, I'm performing spark-submit using the Spark REST API POST operation on port 6066 with the below config: > Launch Command: > "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-1.b16.el7_3.x86_64/jre/bin/java" > "-cp" "/usr/local/spark/conf/:/usr/local/spark/jars/*" "-Xmx4096M" > "-Dspark.eventLog.enabled=t

Spark http: Not showing completed apps

2017-11-08 Thread purna pradeep
Hi, I'm using Spark standalone in AWS EC2, and I'm using the Spark REST API http::8080/Json to get completed apps, but the JSON shows completed apps as an empty array even though the job ran successfully.

Bulk load to HBase

2017-10-22 Thread Pradeep
We are on Hortonworks 2.5 and are very soon upgrading to 2.6. Spark version 1.6.2. We have a large volume of data that we bulk load into HBase using ImportTsv. The MapReduce job is very slow, and we are looking for options; can we use Spark to improve performance? Please let me know if this can be optimized with s
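A heavily hedged sketch of the Spark alternative to ImportTsv: write HFiles and bulk-load them, skipping the MapReduce put path. Paths, table layout, and column family are placeholders, and a production job would also call HFileOutputFormat2.configureIncrementalLoad so the HFiles match region boundaries before running LoadIncrementalHFiles (completebulkload):

    import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val sc = spark.sparkContext

    // Parse TSV lines into (rowkey, KeyValue) pairs, sorted by row key
    // as HFileOutputFormat2 requires.
    val hfiles = sc.textFile("hdfs:///data/input.tsv")       // placeholder
      .map { line =>
        val Array(rowKey, value) = line.split("\t", 2)
        (new ImmutableBytesWritable(Bytes.toBytes(rowKey)),
         new KeyValue(Bytes.toBytes(rowKey), Bytes.toBytes("cf"),
           Bytes.toBytes("col"), Bytes.toBytes(value)))
      }
      .sortByKey()

    // Write HFiles; then hand them to LoadIncrementalHFiles to move
    // them into the table's regions.
    hfiles.saveAsNewAPIHadoopFile("hdfs:///staging/hfiles",  // placeholder
      classOf[ImmutableBytesWritable], classOf[KeyValue],
      classOf[HFileOutputFormat2], HBaseConfiguration.create())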

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-30 Thread purna pradeep
with @ayan sql > > spark.sql("select *, row_number(), last_value(income) over (partition by > id order by income_age_ts desc) r from t") > > > On Tue, Aug 29, 2017 at 11:30 PM, purna pradeep > wrote: > >> @ayan, >> >> Thanks for your response >>
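For reference, a sketch of the DataFrame-API equivalent of the SQL in this thread, assuming df is the input DataFrame and the goal is to keep, per id, the row with the latest income_age_ts:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, row_number}

    // Rank rows within each id by income_age_ts descending and keep the
    // most recent one; mirrors the row_number() over (partition by ...)
    // approach suggested above.
    val w = Window.partitionBy("id").orderBy(col("income_age_ts").desc)
    val latest = df.withColumn("r", row_number().over(w))
      .filter(col("r") === 1)
      .drop("r")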

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-29 Thread purna pradeep
[truncated df.show() table output] > > This should be better because it uses all in-built optimizations in Spark. > > Best > Ayan > > On Wed, Aug 30, 2017 at 11:06 AM, purna pradeep >

Re: Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-29 Thread purna pradeep
Please click on the unnamed text/html link for a better view. On Tue, Aug 29, 2017 at 8:11 PM purna pradeep wrote: > > -- Forwarded message - > From: Mamillapalli, Purna Pradeep < > purnapradeep.mamillapa...@capitalone.com> > Date: Tue, Aug 29, 2017 at 8:0

Re: use WithColumn with external function in a java jar

2017-08-29 Thread purna pradeep
se(pexpense.toDouble, > cexpense.toDouble)) > > > > > > On Tue, Aug 29, 2017 at 6:53 AM, purna pradeep > wrote: > >> I have data in a DataFrame with below columns >> >> 1)Fileformat is csv >> 2)All below column datatypes are String >> >> employeei

Select entire row based on a logic applied on 2 columns across multiple rows

2017-08-29 Thread purna pradeep
-- Forwarded message - From: Mamillapalli, Purna Pradeep Date: Tue, Aug 29, 2017 at 8:08 PM Subject: Spark question To: purna pradeep Below is the input DataFrame (in reality this is a very large DataFrame): EmployeeID INCOME INCOME AGE TS JoinDate Dept 101 19000 4/20/17 4

use WithColumn with external function in a java jar

2017-08-28 Thread purna pradeep
I have data in a DataFrame with the below columns: 1) File format is csv 2) All below column datatypes are String: employeeid, pexpense, cexpense Now I need to create a new DataFrame which has a new column called `expense`, which is calculated based on the columns `pexpense` and `cexpense`. The tricky part is
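The usual approach is to wrap the external method in a UDF and apply it with withColumn; a sketch, where ExpenseCalculator.calculate stands in for the real method in the external java jar and df is the input DataFrame:

    import org.apache.spark.sql.functions.{col, udf}

    // Hypothetical stand-in for the class inside the external jar:
    // object ExpenseCalculator { def calculate(p: Double, c: Double): Double = ... }
    val expenseUdf = udf((pexpense: String, cexpense: String) =>
      ExpenseCalculator.calculate(pexpense.toDouble, cexpense.toDouble))

    // New DataFrame with the derived `expense` column.
    val withExpense = df.withColumn("expense",
      expenseUdf(col("pexpense"), col("cexpense")))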

Re: Restart streaming query spark 2.1 structured streaming

2017-08-16 Thread purna pradeep
And also, is query.stop() a graceful stop operation? What happens to already-received data, will it be processed? On Tue, Aug 15, 2017 at 7:21 PM purna pradeep wrote: > Ok thanks > > A few more: > > 1. When I looked into the documentation it says onQueryProgress is not > thre

Re: Restart streaming query spark 2.1 structured streaming

2017-08-15 Thread purna pradeep
hronous > unpersist+persist will probably take longer as it has to reload the data. > > > On Tue, Aug 15, 2017 at 2:29 PM, purna pradeep > wrote: > >> Thanks tathagata das actually I'm planning to something like this >> >> activeQuery.stop() >> >>

Re: Restart streaming query spark 2.1 structured streaming

2017-08-15 Thread purna pradeep
t; activeQuery.stop() > activeQuery = startQuery() >} > >activeQuery.awaitTermination(100) // wait for 100 ms. >// if there is any error it will throw exception and quit the loop >// otherwise it will keep checking the condition every 100ms} > >

Re: Restart streaming query spark 2.1 structured streaming

2017-08-15 Thread purna pradeep
.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing > > Though I think that this currently doesn't work with the console sink. > > On Tue, Aug 15, 2017 at 9:40 AM, purna pradeep > wrote: > >> Hi, >>

Restart streaming query spark 2.1 structured streaming

2017-08-15 Thread purna pradeep
Hi, > > I'm trying to restart a streaming query to refresh a cached data frame. > > Where and how should I restart the streaming query? > val sparkSes = SparkSession .builder .config("spark.master", "local") .appName("StreamingCahcePoc") .getOrCreate() import sparkSes.
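Based on the stop/start pattern quoted in the replies to this thread, a sketch of a restart loop; startQuery, shouldRefresh, and refreshCachedDataFrame are hypothetical helpers:

    import org.apache.spark.sql.streaming.StreamingQuery

    def startQuery(): StreamingQuery = ???    // hypothetical: build + start the query
    def shouldRefresh(): Boolean = ???        // hypothetical refresh condition
    def refreshCachedDataFrame(): Unit = ???  // hypothetical: unpersist + reload + persist

    var activeQuery = startQuery()
    while (true) {
      if (shouldRefresh()) {
        activeQuery.stop()          // stop synchronously
        refreshCachedDataFrame()
        activeQuery = startQuery()  // start a fresh query
      }
      // Wait up to 100 ms; throws if the query terminated with an error.
      activeQuery.awaitTermination(100)
    }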

StreamingQueryListner spark structered Streaming

2017-08-09 Thread purna pradeep
I'm working on a structured streaming application wherein I'm reading from Kafka as a stream, and for each batch of streams I need to perform an S3 lookup-file read (the file is nearly 200 GB) to fetch some attributes. So I'm using df.persist() (basically caching the lookup), but I need to refresh the dataframe as the
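A sketch of one way to refresh such a cached lookup between batches, with the S3 path and format as placeholders:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Drop the stale cache, reload from S3, and cache the fresh copy.
    def refreshLookup(spark: SparkSession, old: DataFrame): DataFrame = {
      old.unpersist()
      val fresh = spark.read.parquet("s3a://bucket/lookup/")  // placeholder
      fresh.persist()
      fresh.count()  // materialize the new cache eagerly
      fresh
    }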

Spark streaming - TIBCO EMS

2017-05-15 Thread Pradeep
What is the best way to connect to TIBCO EMS using Spark Streaming? Do we need to write custom receivers, or do any libraries already exist? Thanks, Pradeep - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
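There is no built-in EMS source for DStreams, so a custom receiver is the usual route; a skeletal sketch in which the EMS/JMS calls are placeholders for the TIBCO client API:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class EmsReceiver(url: String, queue: String)
        extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

      def onStart(): Unit = {
        // Consume on a background thread so onStart returns quickly.
        new Thread("ems-receiver") {
          override def run(): Unit = {
            // Placeholder: open an EMS/JMS connection to `url` and
            // subscribe to `queue` here.
            while (!isStopped()) {
              val body = "..."  // placeholder: next EMS message body
              store(body)       // hand the record to Spark
            }
          }
        }.start()
      }

      def onStop(): Unit = {
        // Placeholder: close the EMS session/connection.
      }
    }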

Re: Long Shuffle Read Blocked Time

2017-04-20 Thread Pradeep Gollakota
Hi All, it appears that the bottleneck in my job was the EBS volumes. Very high I/O wait times across the cluster. I was only using 1 volume; increasing to 4 made it faster. Thanks, Pradeep On Thu, Apr 20, 2017 at 3:12 PM, Pradeep Gollakota wrote: > Hi All, > > I have a simple ETL

Long Shuffle Read Blocked Time

2017-04-20 Thread Pradeep Gollakota
rom the Spark Stats, I see large values for the Shuffle Read Blocked Time. As an example, one of my tasks completed in 18 minutes, but spent 15 minutes waiting for remote reads. I'm not sure why the shuffle is so slow. Are there things I can do to increase the performance of the shuffle? Thanks, Pradeep

Re: Equally split a RDD partition into two partition at the same node

2017-01-16 Thread Pradeep Gollakota
Usually this kind of thing can be done at a lower level in the InputFormat, by specifying the max split size. Have you looked into that possibility with your InputFormat? On Sun, Jan 15, 2017 at 9:42 PM, Fei Hu wrote: > Hi Jasbir, > > Yes, you are right. Do you have any idea about my ques
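For example, a sketch of capping the split size through the Hadoop configuration; both the new- and old-API keys are set here, since which one applies depends on the InputFormat:

    // Ask the InputFormat for splits of at most 64 MB, yielding more,
    // smaller partitions without a shuffle.
    val maxSplit = (64L * 1024 * 1024).toString
    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", maxSplit)
    sc.hadoopConfiguration.set("mapred.max.split.size", maxSplit)  // old mapred API

    val rdd = sc.textFile("hdfs:///data/input")  // placeholder path
    println(rdd.getNumPartitions)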

Spark subscribe

2016-12-22 Thread pradeep s
Hi, can you please add me to the Spark subscription list? Regards, Pradeep S

wholeTextFiles()

2016-12-12 Thread Pradeep
Hi, why is there a restriction on the max file size that can be read by the wholeTextFiles() method? I can read a 1.5 GB file but get Out of Memory for a 2 GB file. Also, how can I raise this as a defect in the Spark JIRA? Can someone please guide. Thanks, Pradeep

MLlib to Compute boundaries of a rectangle given random points on its Surface

2016-12-06 Thread Pradeep Gaddam
Hello, can someone please let me know if it is possible to construct a surface (for example, a rectangle) given random points on its surface using Spark MLlib? Thanks, Pradeep Gaddam This message and any attachments may contain confidential information of View, Inc. If you are not the
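If the rectangle is axis-aligned, this reduces to a min/max aggregation rather than anything in MLlib; a sketch assuming points is a DataFrame of (x, y) coordinates:

    import org.apache.spark.sql.functions.{max, min}

    // Corners of the tightest axis-aligned rectangle around the points.
    val Array(bounds) = points
      .agg(min("x").as("xMin"), max("x").as("xMax"),
           min("y").as("yMin"), max("y").as("yMax"))
      .collect()

    println(bounds)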

Re: Design patterns for Spark implementation

2016-12-04 Thread Pradeep Gaddam
I was hoping for someone to answer this question, as it resonates with many developers who are new to Spark and trying to adopt it at their work. Regards, Pradeep On Dec 3, 2016, at 9:00 AM, Vasu Gourabathina wrote: Hi, I know this is a broad question. If t

Executors under utilized

2016-10-06 Thread Pradeep Gollakota
t 25k tasks still pending, running on only 32 cores (4x8) is quite a huge slowdown (full cluster is 288 cores). My input format is a simple CombineFileInputFormat. I'm not sure what would cause this behavior to occur. Any thoughts on how I can figure out what is happening? Thanks, Pradeep

Kafka message metadata with Dstreams

2016-08-25 Thread Pradeep
Hi All, I am using DStreams to read Kafka topics. While I can read the messages fine, I also want to get metadata on the message, such as the offset, the time it was put to the topic, etc. Is there any Java API to get this info? Thanks, Pradeep
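With the kafka010 direct stream the metadata travels with each record, and per-partition offset ranges come from HasOffsetRanges; a Scala sketch (the Java API mirrors it), assuming stream is a DStream[ConsumerRecord[String, String]] from KafkaUtils.createDirectStream:

    import org.apache.spark.streaming.kafka010.HasOffsetRanges

    stream.foreachRDD { rdd =>
      // Offset range of every Kafka partition in this batch.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach(r =>
        println(s"${r.topic} ${r.partition}: ${r.fromOffset}..${r.untilOffset}"))

      // Per-record metadata: topic, partition, offset, timestamp.
      rdd.foreach { rec =>
        println(s"${rec.topic}/${rec.partition}@${rec.offset} ts=${rec.timestamp}")
      }
    }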

Log rollover in spark streaming jobs

2016-08-23 Thread Pradeep
Hi All, I am running Java Spark Streaming jobs in yarn-client mode. Is there a way I can manage log rollover on the edge node? I have a 10-second batch and the log file volume is huge. Thanks, Pradeep - To unsubscribe e-mail: user

Re: Stop Spark Streaming Jobs

2016-08-02 Thread Pradeep
Thanks Park. I am doing the same. Was trying to understand if there are other ways. Thanks, Pradeep > On Aug 2, 2016, at 10:25 PM, Park Kyeong Hee wrote: > > So sorry. Your name was Pradeep !! > > -Original Message- > From: Park Kyeong Hee [mailto:kh1979.p...@sa

Stop Spark Streaming Jobs

2016-08-02 Thread Pradeep
Hi All, my streaming job reads data from Kafka. The job is triggered and pushed to the background with nohup. What are the recommended ways to stop the job in either yarn-client or cluster mode? Thanks, Pradeep - To unsubscribe e
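A sketch of the common graceful-shutdown options for a DStreams job; the app name and batch interval are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Finish in-flight batches cleanly when the JVM gets a shutdown
    // signal (e.g. from `yarn application -kill`).
    val conf = new SparkConf()
      .setAppName("kafka-stream")
      .set("spark.streaming.stopGracefullyOnShutdown", "true")
    val ssc = new StreamingContext(conf, Seconds(10))

    // ... define the Kafka input DStream and processing here ...

    ssc.start()
    // Alternatively, stop explicitly once some external marker appears,
    // letting queued batches finish first:
    // ssc.stop(stopSparkContext = true, stopGracefully = true)
    ssc.awaitTermination()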

Re: spark-submit hangs forever after all tasks finish(spark 2.0.0 stable version on yarn)

2016-07-31 Thread Pradeep
Hi, Are you running on yarn-client or cluster mode? Pradeep > On Jul 30, 2016, at 7:34 PM, taozhuo wrote: > > below is the error messages that seem run infinitely: > > > 16/07/30 23:25:38 DEBUG ProtobufRpcEngine: Call: getApplicationReport took > 1ms > 16/07/30 23

Re: Spark Website

2016-07-13 Thread Pradeep Gollakota
Worked for me if I go to https://spark.apache.org/site/ but not https://spark.apache.org On Wed, Jul 13, 2016 at 11:48 AM, Maurin Lenglart wrote: > Same here > > > > *From: *Benjamin Kim > *Date: *Wednesday, July 13, 2016 at 11:47 AM > *To: *manish ranjan > *Cc: *user > *Subject: *Re: Spark W

Re: Spark-submit hangs indefinitely after job completion.

2016-05-24 Thread Pradeep Nayak
t; On Tue, May 24, 2016 at 3:08 PM Pradeep Nayak > wrote: > >> >> >> I have posted the same question of Stack Overflow: >> http://stackoverflow.com/questions/37421852/spark-submit-continues-to-hang-after-job-completion >> >> I am trying to test spark 1.6 w

Spark-submit hangs indefinitely after job completion.

2016-05-24 Thread Pradeep Nayak
I have posted the same question on Stack Overflow: http://stackoverflow.com/questions/37421852/spark-submit-continues-to-hang-after-job-completion I am trying to test Spark 1.6 with HDFS in AWS. I am using the wordcount Python example available in the examples folder. I submit the job with spark-s

Is this possible to do in spark ?

2016-05-11 Thread Pradeep Nayak
Hi - I have a very unique problem which I am trying to solve, and I am not sure if Spark would help here. I have a directory: /X/Y/a.txt, and in the same structure /X/Y/Z/b.txt. a.txt contains a unique serial number, say 12345, and b.txt contains key-value pairs: a,1 b,1, c,0 etc. Every day you r

how does sc.textFile translate regex in the input.

2016-04-13 Thread Pradeep Nayak
les. Is there any documentation on how this happens? Pradeep
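For reference, sc.textFile defers pattern handling to Hadoop's glob syntax (*, ?, [abc], {a,b}), not full regular expressions; a small sketch:

    // Hadoop glob patterns, not regexes, are expanded here:
    val all   = sc.textFile("/X/Y/*.txt")               // every .txt under /X/Y
    val pair  = sc.textFile("/X/Y/{a,b}.txt")           // just a.txt and b.txt
    val multi = sc.textFile("/X/Y/a.txt,/X/Y/Z/b.txt")  // comma-separated list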

Re: Why there's no api for SparkContext#textFiles to support multiple inputs ?

2015-11-11 Thread Pradeep Gollakota
Looks like what I was suggesting doesn't work. :/ On Wed, Nov 11, 2015 at 4:49 PM, Jeff Zhang wrote: > Yes, that's what I suggest. TextInputFormat support multiple inputs. So in > spark side, we just need to provide API to for that. > > On Thu, Nov 12, 2015 at 8:45

Re: Why there's no api for SparkContext#textFiles to support multiple inputs ?

2015-11-11 Thread Pradeep Gollakota
IIRC, TextInputFormat supports an input path that is a comma-separated list. I haven't tried this, but I think you should just be able to do sc.textFile("file1,file2,...") On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang wrote: > I know these workarounds, but wouldn't it be more convenient and > strai

Re: [SparkR] Missing Spark APIs in R

2015-06-30 Thread Pradeep Bashyal
Thanks Shivaram. I watched your talk and the plan to use ML APIs with R flavor looks exciting. Is there a different venue where I would be able to follow the SparkR API progress? Thanks Pradeep On Mon, Jun 29, 2015 at 1:12 PM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: >

[SparkR] Missing Spark APIs in R

2015-06-29 Thread Pradeep Bashyal
d any explanations of why they were not included with the release. Can anyone shed some light on it? Thanks Pradeep

Re: ClassCastException when calling updateStateKey

2015-04-10 Thread Pradeep Rai
-list.1001551.n3.nabble.com/guava-version-conflicts-td8480.html It looks like my issue is the same cause, but different symptoms. Thanks, Pradeep. On Fri, Apr 10, 2015 at 12:51 PM, Marcelo Vanzin wrote: > On Fri, Apr 10, 2015 at 10:11 AM, Pradeep Rai wrote: > > I tried the userClasspat

How to set hadoop native library path in spark-1.1

2014-10-21 Thread Pradeep Ch
y for your platform... using builtin-java classes where applicable Thanks, Pradeep

Re: Multi master Spark

2014-04-09 Thread Pradeep Ch
s, only master slaves are being > spun by mesos slaves directly. > > > > > > On Wed, Apr 9, 2014 at 3:08 PM, Pradeep Ch wrote: > >> Hi, >> >> I want to enable Spark Master HA in spark. Documentation specifies that >> we can do this with the help of Zoo

Multi master Spark

2014-04-09 Thread Pradeep Ch
information? Thanks for the help. Thanks, Pradeep

Re: Spark packaging

2014-04-09 Thread Pradeep baji
Thanks Prabeesh. On Wed, Apr 9, 2014 at 12:37 AM, prabeesh k wrote: > Please refer > > http://prabstechblog.blogspot.in/2014/04/creating-single-jar-for-spark-project.html > > Regards, > prabeesh > > > On Wed, Apr 9, 2014 at 1:04 PM, Pradeep baji > wrote: > &

Spark packaging

2014-04-09 Thread Pradeep baji
Hi all, I am new to Spark and trying to learn it. Is there any document which describes how Spark is packaged (like the dependencies needed to build Spark, which jar contains what after the build, etc.)? Thanks for the help. Regards, Pradeep