Hello Group
I am having issues setting the stripe size, index stride and index on an ORC
file using PySpark. I am getting approximately 2000 stripes for a 1.2GB file
when I am expecting only about 5 stripes with the 256MB stripe-size setting.
I tried the options below:
1. Set the .options on the DataFrame writer. The comp
Also, is there a better way to send this output to the client?
Thanks,
Ashwin
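For concreteness, a minimal sketch of what option 1 might look like, assuming
Spark's ORC writer passes these standard ORC configuration keys through to the
underlying writer (exact key support depends on the Spark/ORC version), with
hypothetical input and output paths:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orc-stripe-size").getOrCreate()

df = spark.read.parquet("/path/to/input")            # hypothetical input

(df.write
   .format("orc")
   .option("orc.stripe.size", 256 * 1024 * 1024)     # target 256MB stripes
   .option("orc.row.index.stride", 10000)            # rows per index entry
   .option("orc.create.index", "true")               # write row-group indexes
   .mode("overwrite")
   .save("/path/to/output"))                         # hypothetical output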
+dev mailing list (since I didn't get a response on the user DL)
On Tue, Feb 13, 2018 at 12:20 PM, Ashwin Sai Shankar
wrote:
> Hi Spark users!
> I noticed that Spark doesn't allow Python apps to run in cluster mode on a
> Spark standalone cluster. Does anyone know the reason?
Hi Spark users!
I noticed that Spark doesn't allow Python apps to run in cluster mode on a
Spark standalone cluster. Does anyone know the reason? I checked JIRA but
couldn't find anything relevant.
Thanks,
Ashwin
out which columns
need to be recomputed and which can be left as is.
Is there a best practice in the Spark ecosystem for this problem? Perhaps
some metadata system/data lineage system we can use? I'm curious if this is
a common problem that has already been addressed.
Thanks,
Ashwin
':
{u'description':
u'org.apache.spark.sql.execution.streaming.ConsoleSink@7e4050cd'}}
On Mon, Aug 14, 2017 at 4:55 PM, Tathagata Das
wrote:
> In append mode, the aggregation outputs a row only when the watermark has
> been crossed and the corresponding aggregate is
the same
query with outputMode("append"); however, the output only has the column
names and no rows. I was originally trying to output to Parquet, which only
supports append mode. I was seeing no data in my Parquet files, so I
switched to console output to debug, then noticed this issue. Am I
misunderstanding something about how append mode works?
Thanks,
Ashwin
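For context, a minimal sketch of this kind of query, assuming the built-in
rate source and illustrative column/window names; in append mode a window's
aggregate is emitted only after the watermark passes the end of that window,
which is why early output can look empty:

from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("append-mode-demo").getOrCreate()

events = (spark.readStream
          .format("rate")               # test source with timestamp, value columns
          .option("rowsPerSecond", 10)
          .load())

counts = (events
          .withWatermark("timestamp", "1 minute")           # bound on late data
          .groupBy(window(col("timestamp"), "30 seconds"))  # tumbling windows
          .count())

# Rows appear only once the watermark crosses a window's end time.
query = (counts.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()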
taframe
What I would like to do instead:
def process(time, rdd):
# create dataframe from RDD - input_df
# output_df = dataframe_pipeline_fn(input_df)
-ashwin
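As a rough illustration only, a minimal sketch of that pattern, assuming a
DStream of JSON strings, a hypothetical dataframe_pipeline_fn, and a
hypothetical output path:

from pyspark.sql import SparkSession

def dataframe_pipeline_fn(df):
    # hypothetical stand-in for the existing DataFrame pipeline
    return df.filter("value IS NOT NULL")

def process(time, rdd):
    if rdd.isEmpty():
        return
    spark = SparkSession.builder.getOrCreate()
    input_df = spark.read.json(rdd)                # create DataFrame from the RDD
    output_df = dataframe_pipeline_fn(input_df)
    output_df.write.mode("append").parquet("/tmp/output")  # hypothetical sink

# dstream.foreachRDD(process)   # wired up on the original DStream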
rg/apache/spark/ContextCleaner.scala
>
> On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
> ashan...@netflix.com.invalid> wrote:
>
>> Hi!
>>
>> In spark on yarn, when are shuffle files on local disk removed? (Is it
>> when the app completes or
>> o
Hi!
In Spark on YARN, when are shuffle files on local disk removed? (Is it when
the app completes,
once all the shuffle files are fetched, or at the end of the stage?)
Thanks,
Ashwin
Thanks. I'll try that. Hopefully that will work.
On Mon, Jul 4, 2016 at 9:12 PM, Mathieu Longtin
wrote:
> I started with a download of 1.6.0. These days, we use a self compiled
> 1.6.2.
>
> On Mon, Jul 4, 2016 at 11:39 AM Ashwin Raaghav
> wrote:
>
>> I am thinki
Longtin
wrote:
> 1.6.1.
>
> I have no idea. SPARK_WORKER_CORES should do the same.
>
> On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav
> wrote:
>
>> Which version of Spark are you using? 1.6.1?
>>
>> Any ideas as to why it is not working in ours?
>>
>>
Which version of Spark are you using? 1.6.1?
Any ideas as to why it is not working in ours?
On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin
wrote:
> 16.
>
> On Mon, Jul 4, 2016 at 11:16 AM Ashwin Raaghav
> wrote:
>
>> Hi,
>>
>> I tried what you suggeste
e per server. However, it seems it will
> start as many pyspark as there are cores, but maybe not use them.
>
> On Mon, Jul 4, 2016 at 10:44 AM Ashwin Raaghav
> wrote:
>
>> Hi Mathieu,
>>
>> Isn't that the same as setting "spark.executor.cores" to 1? An
aemons process is still not coming down. It looks like initially
>> there is one pyspark.daemons process, and this in turn spawns as many
>> pyspark.daemons processes as the number of cores on the machine.
>>
>> Any help is appreciated :)
>>
>> Thanks,
>> Ashwin Raaghav
--
Regards,
Ashwin Raaghav
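In case it helps anyone reading the archive, a minimal sketch of the
application-side setting discussed above, assuming a standalone cluster;
SPARK_WORKER_CORES is the per-machine counterpart and is set in
conf/spark-env.sh on each worker (not shown here):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("limit-pyspark-daemons")
        .set("spark.executor.cores", "1")   # one task slot per executor, so
                                            # fewer concurrent pyspark workers
        .set("spark.cores.max", "4"))       # illustrative total for the app

sc = SparkContext(conf=conf)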
Hi Vishnu,
A partition will either be entirely in memory or entirely on disk.
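For illustration, a minimal sketch with a toy RDD; with MEMORY_AND_DISK a
partition that does not fit in memory is spilled to disk as a whole, never
split across the two:

from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="persist-demo")

rdd = sc.parallelize(range(1000), 2)         # 2 partitions
rdd.persist(StorageLevel.MEMORY_AND_DISK)
rdd.count()                                  # materializes and caches the partitions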
-Ashwin
On Feb 28, 2016 15:09, "Vishnu Viswanath"
wrote:
> Hi All,
>
> I have a question regarding Persistence (MEMORY_AND_DISK)
>
> Suppose I am trying to persist an RDD which has 2 partitions and only 1
could synchronize these multiple streams.
What am I missing?
Thanks,
Ashwin
[1] http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf
Hi Bryan,
I see the same issue with 1.5.2. Can you please let me know what the
resolution was?
Thanks,
Ashwin
On Fri, Nov 20, 2015 at 12:07 PM, Bryan Jeffrey
wrote:
> Nevermind. I had a library dependency that still had the old Spark version.
>
> On Fri, Nov 20, 2015 at 2:14 PM, Brya
We run large multi-tenant clusters with Spark/Hadoop workloads, and we use
YARN's preemption and Spark's dynamic allocation to achieve multi-tenancy.
See the following link on how to enable/configure preemption with the fair
scheduler:
http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/Fai
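A minimal sketch of the Spark side only, assuming YARN with the external
shuffle service already enabled on the NodeManagers; the fair-scheduler
preemption settings themselves live in the Hadoop configs (see the link
above), and the executor bounds here are illustrative:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("dynamic-allocation-demo")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.maxExecutors", "50"))

sc = SparkContext(conf=conf)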
Never mind, it's *set hive.cli.print.header=true*
Thanks !
On Fri, Dec 11, 2015 at 5:16 PM, Ashwin Shankar
wrote:
> Hi,
> When we run spark-sql, is there a way to get column names/headers with the
> result?
>
> --
> Thanks,
> Ashwin
>
>
>
Hi,
When we run spark-sql, is there a way to get column names/headers with the
result?
--
Thanks,
Ashwin
creating 500 Dstreams based off 500 textfile
> directories, do we need at least 500 executors / nodes to be receivers for
> each one of the streams?
>
> On Tue, Jul 28, 2015 at 6:09 PM, Tathagata Das
> wrote:
>
>> @Ashwin: You could append the topic in the data.
>>
Thanks,
Ashwin
On Fri, Jul 31, 2015 at 4:52 PM, Brandon White
wrote:
> Since one input dstream creates one receiver and one receiver uses one
> executor / node.
>
> What happens if you create more Dstreams than nodes in the cluster?
>
> Say I have 30 Dstreams on a 15 node clust
an optimal configuration would be:
--num-executors 8 --executor-cores 2 --executor-memory 2G
Thanks,
Ashwin
On Thu, Jul 30, 2015 at 12:08 PM, unk1102 wrote:
> Hi I have one Spark job which runs fine locally with less data but when I
> schedule it on YARN to execute I keep on getti
D { rdd =>
>> //do something
>> }
>> }
>>
>> ssc.start()
>>
>> Would something like this scale? What would be the limiting factor to
>> performance? What is the best way to parallelize this? Any other ideas on
>> design?
>>
>
>
--
Thanks & Regards,
Ashwin Giridharan
owse/SPARK-1340";
corresponding to this bug is yet to be resolved.
Also have a look at
http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-and-the-spark-shell-td3347.html
Thanks,
Ashwin
On Sun, Jul 26, 2015 at 9:29 AM, aviemzur wrote:
> Hi all,
>
> I have a question
3. use yarn-cluster mode
The PySpark interactive shell (IPython) doesn't have a cluster mode. SPARK-5162
<https://issues.apache.org/jira/browse/SPARK-5162> is for running spark-submit
with Python in cluster mode.
Thanks,
Ashwin
On Wed, Jun 10, 2015 at 3:55 PM, Eron Wright wrote:
> Options i
rt to hostmachine's ip/port. So the AM can then talk to the
host machine's ip/port, which would be mapped
to the container.
Thoughts?
--
Thanks,
Ashwin
appening?
*When I enable log4j debug I see the following:*
log4j: Setting property [file] to [].
log4j: setFile called: , true
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
--
Thanks,
Ashwin
Hi,
In Spark on YARN, when running spark_shuffle as an auxiliary service on the
node manager, do the map spills of a stage get cleaned up once the next
stage completes, or
are they preserved till the app completes (i.e., waiting for all the stages to
complete)?
--
Thanks,
Ashwin
e but are you looking for the tar in assembly/target dir ?
>
> On Wed, Nov 12, 2014 at 3:14 PM, Ashwin Shankar wrote:
>
>> Hi,
>> I just cloned spark from the github and I'm trying to build to generate a
>> tar ball.
>> I'm doing : mvn -Pyarn -Pha
d ?
--
Thanks,
Ashwin
's executors got
preempted, say while doing reduceByKey, will the application progress with
the remaining resources/fair share?
I'm new to Spark, sorry if I'm asking something very obvious :).
Thanks,
Ashwin
On Wed, Oct 22, 2014 at 12:07 PM, Marcelo Vanzin
wrote:
> Hi Ashwin,
>
> L
e about user/job isolation?
I know I'm asking a lot of questions. Thanks in advance! :)
--
Thanks,
Ashwin
Netflix