FW: Pyspark: set Orc Stripe.size on dataframe writer issue

2018-10-17 Thread Somasundara, Ashwin
Hello Group, I am having issues setting the stripe size, index stride and index on an ORC file using PySpark. I am getting approximately 2000 stripes for the 1.2 GB file when I am expecting only 5 stripes with the 256 MB setting. Tried the below options: 1. Set the .options on the data frame writer. The comp
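
For context, a minimal PySpark sketch of the two approaches this thread is trying; the option keys follow the ORC writer settings the poster names, but whether Spark forwards them to the ORC writer depends on the Spark and ORC versions, and the path and values are illustrative:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-stripe-demo").getOrCreate()
df = spark.range(10000000)  # placeholder data

# Approach 1: pass ORC settings as writer options
# (256 MB stripes, 10k-row index stride, row-group index on)
(df.write
   .option("orc.stripe.size", "268435456")
   .option("orc.row.index.stride", "10000")
   .option("orc.create.index", "true")
   .orc("/tmp/orc_stripe_demo"))

# Approach 2: set the same keys on the Hadoop configuration before writing
# (_jsc is an internal handle, shown here only as a common workaround)
spark.sparkContext._jsc.hadoopConfiguration().set("orc.stripe.size", "268435456")

Note that small stripes can also come from writer memory pressure at write time rather than the configured stripe size, which may explain a stripe count far above what the setting suggests.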

Spark dataset to byte array over grpc

2018-04-23 Thread Ashwin Sai Shankar
Also, is there a better way to send this output to the client? Thanks, Ashwin

Re: Why python cluster mode is not supported in standalone cluster?

2018-02-14 Thread Ashwin Sai Shankar
+dev mailing list (since I didn't get a response from the user DL) On Tue, Feb 13, 2018 at 12:20 PM, Ashwin Sai Shankar wrote: > Hi Spark users! > I noticed that Spark doesn't allow Python apps to run in cluster mode in > a Spark standalone cluster. Does anyone know the reason?

Why python cluster mode is not supported in standalone cluster?

2018-02-13 Thread Ashwin Sai Shankar
Hi Spark users! I noticed that Spark doesn't allow Python apps to run in cluster mode in a Spark standalone cluster. Does anyone know the reason? I checked JIRA but couldn't find anything relevant. Thanks, Ashwin

Recompute Spark outputs intelligently

2017-12-15 Thread Ashwin Raju
out which columns need to be recomputed and which can be left as is. Is there a best practice in the Spark ecosystem for this problem? Perhaps some metadata system/data lineage system we can use? I'm curious if this is a common problem that has already been addressed. Thanks, Ashwin

Re: Spark 2.2 streaming with append mode: empty output

2017-08-15 Thread Ashwin Raju
': {u'description': u'org.apache.spark.sql.execution.streaming.ConsoleSink@7e4050cd'}} On Mon, Aug 14, 2017 at 4:55 PM, Tathagata Das wrote: > In append mode, the aggregation outputs a row only when the watermark has > been crossed and the corresponding aggregate is

Spark 2.2 streaming with append mode: empty output

2017-08-14 Thread Ashwin Raju
the same query with outputMode("append"), however, the output has only the column names, no rows. I was originally trying to output to parquet, which only supports append mode. I was seeing no data in my parquet files, so I switched to console output to debug, then noticed this issue. Am I misunderstanding something about how append mode works? Thanks, Ashwin
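
As the reply above notes, append mode emits a windowed aggregate only after the watermark passes the end of the window, so the output can legitimately stay empty for a while. A minimal sketch of the pattern under discussion (the socket source, column names, and thresholds are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.functions import window

spark = SparkSession.builder.appName("append-mode-demo").getOrCreate()

# Any streaming source with an event-time column works; socket is just easy to test
events = (spark.readStream
          .format("socket").option("host", "localhost").option("port", 9999)
          .load()
          .selectExpr("value AS word", "current_timestamp() AS event_time"))

counts = (events
          .withWatermark("event_time", "10 minutes")
          .groupBy(window("event_time", "5 minutes"), "word")
          .count())

# A window's rows appear only once the watermark (max event time seen
# minus 10 minutes) moves past that window's end
query = (counts.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()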

Reusing dataframes for streaming (spark 1.6)

2017-08-08 Thread Ashwin Raju
taframe what I would like to do instead: def process(time, rdd): # create dataframe from RDD - input_df # output_df = dataframe_pipeline_fn(input_df) -ashwin
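
A runnable version of the pattern sketched in that snippet, against the Spark 1.6 APIs named in the subject; the socket source, the integer parsing, and the body of dataframe_pipeline_fn are placeholders:

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="reuse-df-pipeline")
sqlContext = SQLContext(sc)
ssc = StreamingContext(sc, 10)  # 10-second batches

def dataframe_pipeline_fn(input_df):
    # Shared transformation logic, reusable between batch and streaming jobs
    return input_df.filter(input_df["value"] > 0)

def process(time, rdd):
    if rdd.isEmpty():
        return
    input_df = sqlContext.createDataFrame(rdd.map(lambda v: (v,)), ["value"])
    output_df = dataframe_pipeline_fn(input_df)
    output_df.show()

lines = ssc.socketTextStream("localhost", 9999).map(lambda s: int(s))
lines.foreachRDD(process)
ssc.start()
ssc.awaitTermination()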

Re: Spark shuffle files

2017-03-27 Thread Ashwin Sai Shankar
rg/apache/spark/ContextCleaner.scala > > On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar < > ashan...@netflix.com.invalid> wrote: > >> Hi! >> >> In spark on yarn, when are shuffle files on local disk removed? (Is it >> when the app completes or >> o

Spark shuffle files

2017-03-27 Thread Ashwin Sai Shankar
Hi! In Spark on YARN, when are shuffle files on local disk removed? (Is it when the app completes, once all the shuffle files are fetched, or at the end of the stage?) Thanks, Ashwin

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Thanks. I'll try that. Hopefully that should work. On Mon, Jul 4, 2016 at 9:12 PM, Mathieu Longtin wrote: > I started with a download of 1.6.0. These days, we use a self compiled > 1.6.2. > > On Mon, Jul 4, 2016 at 11:39 AM Ashwin Raaghav > wrote: > >> I am thinki

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Longtin wrote: > 1.6.1. > > I have no idea. SPARK_WORKER_CORES should do the same. > > On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav > wrote: > >> Which version of Spark are you using? 1.6.1? >> >> Any ideas as to why it is not working in ours? >>

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Which version of Spark are you using? 1.6.1? Any ideas as to why it is not working in ours? On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin wrote: > 16. > > On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav > wrote: > >> Hi, >> >> I tried what you suggeste

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
e per server. However, it seems it will > start as many pyspark as there are cores, but maybe not use them. > > On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav > wrote: > >> Hi Mathieu, >> >> Isn't that the same as setting "spark.executor.cores" to 1? An

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
aemons process is still not coming down. It looks like initially >> there is one Pyspark.daemons process and this in turn spawns as many >> pyspark.daemons processes as the number of cores in the machine. >> >> Any help is appreciated :) >> >> Thanks, >> Ashwin Raagha
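
For reference, a sketch of the knobs suggested later in this thread: pyspark.daemon forks one worker per concurrently running task, so capping the cores an executor (or standalone worker) uses caps the daemons. Values are illustrative:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("limit-pyspark-daemons")
        # Each executor runs at most this many tasks at once,
        # which bounds the pyspark.daemon workers it forks
        .set("spark.executor.cores", "1")
        # Optional cap on the total cores the app takes on a standalone cluster
        .set("spark.cores.max", "8"))
sc = SparkContext(conf=conf)

# Alternatively, as suggested in the thread, SPARK_WORKER_CORES=1 in
# conf/spark-env.sh limits the cores (and hence daemons) per standalone worker.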

Re: Adding h5 files in a zip to use with PySpark

2016-06-15 Thread Ashwin Raaghav
-- Regards, Ashwin Raaghav

Re: Question about MEOMORY_AND_DISK persistence

2016-02-28 Thread Ashwin Giridharan
Hi Vishnu, A partition will either be in memory or on disk. -Ashwin On Feb 28, 2016 15:09, "Vishnu Viswanath" wrote: > Hi All, > > I have a question regarding Persistence (MEMORY_AND_DISK) > > Suppose I am trying to persist an RDD which has 2 partitions and only 1
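
A minimal sketch of the storage level under discussion; each partition is cached in memory if it fits, and a partition that does not fit is written to disk as a whole:

from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="persist-demo")
rdd = sc.parallelize(range(1000000), 2)  # 2 partitions
rdd.persist(StorageLevel.MEMORY_AND_DISK)
rdd.count()  # materializes the cache; a partition that can't fit in memory goes to disk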

Spark streaming: Consistency of multiple streams in Spark

2015-12-17 Thread Ashwin
could synchronize these multiple streams. What am I missing? Thanks, Ashwin [1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf

Re: Hive error after update from 1.4.1 to 1.5.2

2015-12-16 Thread Ashwin Sai Shankar
Hi Bryan, I see the same issue with 1.5.2, can you please let me know what the resolution was? Thanks, Ashwin On Fri, Nov 20, 2015 at 12:07 PM, Bryan Jeffrey wrote: > Nevermind. I had a library dependency that still had the old Spark version. > > On Fri, Nov 20, 2015 at 2:14 PM, Brya

Re: Spark on YARN multitenancy

2015-12-15 Thread Ashwin Sai Shankar
We run large multi-tenant clusters with Spark/Hadoop workloads, and we use YARN's preemption and Spark's dynamic allocation to achieve multitenancy. See the following link on how to enable/configure preemption using the fair scheduler: http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/Fai
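
For reference, a sketch of the Spark side of that setup; the fair-scheduler preemption settings live in the cluster's YARN scheduler configuration, and the executor counts here are illustrative:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("yarn-client")
        .setAppName("multitenant-app")
        # Grow and shrink the executor count with load
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.maxExecutors", "50")
        # Dynamic allocation needs the external shuffle service on each NodeManager
        .set("spark.shuffle.service.enabled", "true"))
sc = SparkContext(conf=conf)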

Re: How to display column names in spark-sql output

2015-12-11 Thread Ashwin Sai Shankar
Never mind, it's *set hive.cli.print.header=true* Thanks! On Fri, Dec 11, 2015 at 5:16 PM, Ashwin Shankar wrote: > Hi, > When we run spark-sql, is there a way to get column names/headers with the > result? > > -- > Thanks, > Ashwin > > >

How to display column names in spark-sql output

2015-12-11 Thread Ashwin Shankar
Hi, When we run spark-sql, is there a way to get column names/headers with the result? -- Thanks, Ashwin

Re: Has anybody ever tried running Spark Streaming on 500 text streams?

2015-07-31 Thread Ashwin Giridharan
creating 500 Dstreams based off 500 textfile > directories, do we need at least 500 executors / nodes to be receivers for > each one of the streams? > > On Tue, Jul 28, 2015 at 6:09 PM, Tathagata Das > wrote: > >> @Ashwin: You could append the topic in the data. >>

Re: What happens when you create more DStreams then nodes in the cluster?

2015-07-31 Thread Ashwin Giridharan
> Thanks, Ashwin On Fri, Jul 31, 2015 at 4:52 PM, Brandon White wrote: > Since one input dstream creates one receiver and one receiver uses one > executor / node. > > What happens if you create more Dstreams than nodes in the cluster? > > Say I have 30 Dstreams on a 15 node clust

Re: How to control Spark Executors from getting Lost when using YARN client mode?

2015-07-30 Thread Ashwin Giridharan
an optimal configuration would be: --num-executors 8 --executor-cores 2 --executor-memory 2G Thanks, Ashwin On Thu, Jul 30, 2015 at 12:08 PM, unk1102 wrote: > Hi I have one Spark job which runs fine locally with less data but when I > schedule it on YARN to execute I keep on getti

Re: Has anybody ever tried running Spark Streaming on 500 text streams?

2015-07-28 Thread Ashwin Giridharan
D { rdd => >> //do something >> } >> } >> >> ssc.start() >> >> Would something like this scale? What would be the limiting factor to >> performance? What is the best way to parallelize this? Any other ideas on >> design? >> > > -- Thanks & Regards, Ashwin Giridharan
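
The usual way to keep 500 file streams from needing 500 separate actions is to union them and attach one action; a sketch with illustrative paths (note that textFileStream polls directories and uses no receivers, so it also avoids the one-core-per-receiver cost a receiver-based source would have):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="many-file-streams")
ssc = StreamingContext(sc, 60)  # 60-second batches

dirs = ["/data/stream_%d" % i for i in range(500)]  # illustrative paths
streams = [ssc.textFileStream(d) for d in dirs]

# One unioned DStream means one set of jobs per batch instead of 500
unioned = ssc.union(*streams)
unioned.foreachRDD(lambda rdd: rdd.foreach(lambda line: None))  # placeholder action

ssc.start()
ssc.awaitTermination()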

Re: Long running streaming application - worker death

2015-07-26 Thread Ashwin Giridharan
owse/SPARK-1340" corresponding to this bug is yet to be resolved. Also have a look at http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-and-the-spark-shell-td3347.html Thanks, Ashwin On Sun, Jul 26, 2015 at 9:29 AM, aviemzur wrote: > Hi all, > > I have a question

Re: Problem with pyspark on Docker talking to YARN cluster

2015-06-10 Thread Ashwin Shankar
3. use yarn-cluster mode: the PySpark interactive shell (IPython) doesn't have a cluster mode. SPARK-5162 <https://issues.apache.org/jira/browse/SPARK-5162> is for spark-submit Python in cluster mode. Thanks, Ashwin On Wed, Jun 10, 2015 at 3:55 PM, Eron Wright wrote: > Options i

Problem with pyspark on Docker talking to YARN cluster

2015-06-10 Thread Ashwin Shankar
rt to hostmachine's ip/port. So the AM can then talk to the host machine's ip/port, which would be mapped to the container. Thoughts? -- Thanks, Ashwin

How to pass system properties in spark ?

2015-06-03 Thread Ashwin Shankar
appening? *When I enable log4j debug I see the following:* log4j: Setting property [file] to []. log4j: setFile called: , true log4j:ERROR setFile(null,true) call failed. java.io.FileNotFoundException: (No such file or directory) at java.io.FileOutputStream.open(Native Method) -- Thanks, Ashwin
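
For reference, a hedged sketch of passing -D system properties to Spark JVMs; the log4j path is illustrative. In client mode the driver JVM is already running when SparkConf is read, so driver-side options must instead go through --driver-java-options on spark-submit or spark-defaults.conf:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("sysprops-demo")
        # System properties for executor JVMs
        .set("spark.executor.extraJavaOptions",
             "-Dlog4j.configuration=file:/etc/spark/log4j.properties")
        # Driver-side equivalent; only effective when the driver JVM is
        # launched after this conf is read (e.g. cluster mode)
        .set("spark.driver.extraJavaOptions",
             "-Dlog4j.configuration=file:/etc/spark/log4j.properties"))
sc = SparkContext(conf=conf)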

Spark on Yarn : Map outputs lifetime ?

2015-05-12 Thread Ashwin Shankar
Hi, In Spark on YARN, when running spark_shuffle as an auxiliary service on the node manager, do map spills of a stage get cleaned up once the next stage completes, OR are they preserved till the app completes (i.e. until all the stages complete)? -- Thanks, Ashwin

Re: Building spark targz

2014-11-12 Thread Ashwin Shankar
e but are you looking for the tar in the assembly/target dir? > > On Wed, Nov 12, 2014 at 3:14 PM, Ashwin Shankar > wrote: > >> Hi, >> I just cloned spark from the github and I'm trying to build to generate a >> tar ball. >> I'm doing: mvn -Pyarn -Pha

Building spark targz

2014-11-12 Thread Ashwin Shankar
d ? -- Thanks, Ashwin

Re: Multitenancy in Spark - within/across spark context

2014-10-22 Thread Ashwin Shankar
's executors got preempted, say while doing reduceByKey, will the application progress with the remaining resources/fair share? I'm new to Spark, sorry if I'm asking something very obvious :). Thanks, Ashwin On Wed, Oct 22, 2014 at 12:07 PM, Marcelo Vanzin wrote: > Hi Ashwin, > > L

Multitenancy in Spark - within/across spark context

2014-10-22 Thread Ashwin Shankar
e about user/job isolation? I know I'm asking a lot of questions. Thanks in advance :)! -- Thanks, Ashwin Netflix