This is the command I am running:
spark-submit --deploy-mode cluster --master yarn --class com.myorg.myApp
s3://my-bucket/myapp-0.1.jar
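For reference, one commonly used shape for shipping a custom log4j.properties alongside a command like the one above on YARN is sketched below. This is a sketch only; the file name and S3 location are assumptions, not taken from this thread:

  # Sketch only: ship the properties file with the job via --files and point
  # the driver and executors at it by file name (names/paths are hypothetical).
  spark-submit --deploy-mode cluster --master yarn \
    --files s3://my-bucket/log4j.properties \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --class com.myorg.myApp s3://my-bucket/myapp-0.1.jar

Files listed in --files are copied into each YARN container's working directory, which is why a bare file name, rather than an hdfs:// path, is usually enough for -Dlog4j.configuration to resolve.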
On Wed, Mar 1, 2017 at 12:22 AM, Jonathan Kelly wrote:
> Prithish,
>
> It would be helpful for you to share the spark-submit command you are
> running.
Thanks for your response, Jonathan. Yes, this works. I also added another
way of achieving this to the Stack Overflow post. Thanks for the help.
On Tue, Feb 28, 2017 at 11:58 PM, Jonathan Kelly wrote:
> Prithish,
>
> I saw you posted this on SO, so I responded there just now. [...]
> [...]ebugging.properties (maybe also try without the "/")
>
>
> On 26 Feb 2017, at 16:31, Prithish wrote:
>
> Hoping someone can answer this.
>
> I am unable to override and use a custom log4j.properties on Amazon EMR. I
> am running Spark on EMR (YARN) and have t[...]
Hoping someone can answer this.
I am unable to override and use a custom log4j.properties on Amazon EMR. I
am running Spark on EMR (YARN) and have tried all of the below combinations
with spark-submit to try to use the custom log4j.
In client mode:
--driver-java-options "-Dlog4j.configuration=hdfs[...]
> [...] which are local, standalone, YARN
> and Mesos. Also, "blocks" is relative to HDFS and "partitions"
> is relative to Spark.
>
> liangyihuai
>
> ---Original---
> From: "Jacek Laskowski"
> Date: 2017/2/25 02:45:20
> To: "prithish"
> [...]
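To illustrate the distinction in concrete terms, a small sketch (Scala, spark-shell); the path is hypothetical. For input read from HDFS, Spark creates roughly one partition per input split, and a split defaults to one HDFS block:

  // Sketch: count the Spark partitions for an HDFS input (path is hypothetical).
  // For HDFS input, each input split (by default one HDFS block) becomes a partition.
  val rdd = sc.textFile("hdfs:///data/events.log")
  println(s"Spark partitions: ${rdd.getNumPartitions}")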
Hello,
I had a question. When I look at the Executors tab in the Spark UI, I notice that
some RDD blocks are assigned to the driver as well. Can someone please tell me
why?
Thanks for the help.
[...]reted by Spark?
> The compression logic of Spark caching depends on the column types.
>
> // maropu
>
>
> On Wed, Nov 16, 2016 at 5:26 PM, Prithish wrote:
>
>> Thanks for your response.
>>
>> I did some more tests and I am seeing that when I have a flatter [...]
>
> [...] size would depend on the type of data you have and how well it was
> compressible.
>
> The purpose of these formats is to store data to persistent storage in a
> way that's faster to read from, not to reduce cache-memory usage.
>
> Maybe others here have more i[...]
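If the goal is to shrink the cached footprint, Spark SQL's in-memory columnar store does expose a couple of settings. A minimal sketch, assuming Spark 2.x (the defaults shown are the 2.x defaults) and a hypothetical table name:

  // Knobs for Spark SQL's in-memory columnar cache; table name is hypothetical.
  spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true")  // default: true
  spark.conf.set("spark.sql.inMemoryColumnarStorage.batchSize", "10000")  // default: 10000
  spark.catalog.cacheTable("events")
  spark.table("events").count()  // materialize the cache, then check the Storage tab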
Anyone?
On Tue, Nov 15, 2016 at 10:45 AM, Prithish wrote:
> I am using 2.0.1 and the Databricks Avro library 3.0.1. I am running this on
> the latest AWS EMR release.
>
> On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke wrote:
>
>> Spark version? Are you using Tungsten?
>>
I am using 2.0.1 and the Databricks Avro library 3.0.1. I am running this on
the latest AWS EMR release.
On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke wrote:
> Spark version? Are you using Tungsten?
>
> > On 14 Nov 2016, at 10:05, Prithish wrote:
> >
> > Can someone please
Can someone please explain why this happens?
When I read a 600 KB AVRO file and cache it in memory (using cacheTable),
it shows up as 11 MB (Storage tab in the Spark UI). I have tried this with
different file sizes, and the in-memory size is always proportionate. I
thought Spark compresses when using [...]
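For reference, a minimal sequence to reproduce this kind of measurement, sketched in Scala; the path and table name are made up, and it assumes the com.databricks:spark-avro package is on the classpath:

  // Sketch: cache an Avro-backed table and compare on-disk vs. in-memory size.
  val df = spark.read.format("com.databricks.spark.avro")
    .load("s3://my-bucket/sample.avro")   // hypothetical path
  df.createOrReplaceTempView("sample")
  spark.catalog.cacheTable("sample")
  spark.table("sample").count()   // force materialization; size shows up in the Storage tab

Some growth is expected: Avro on disk is compressed, row-oriented data, while the cache holds decoded values in Spark's in-memory columnar format, whose compression (as noted elsewhere in this thread) depends on the column types.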
> How big are your avro files? We collapse many small files into a single
> partition to eliminate scheduler overhead. If you need explicit
> parallelism you can also repartition.
>
> On Thu, Oct 27, 2016 at 5:19 AM, Prithish <prith...@gmail.com> wrote:
I am trying to read a bunch of AVRO files from an S3 folder using Spark 2.0.
No matter how many executors I use or what configuration changes I make,
the cluster doesn't seem to use all of the executors. I am using the
com.databricks.spark.avro library from Databricks to read the AVRO files.
However, if I t[...]
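A sketch of the repartition suggestion from the reply above; the path and partition count are assumptions, not values from this thread:

  // Sketch: force more parallelism after loading many small Avro files.
  val df = spark.read.format("com.databricks.spark.avro")
    .load("s3://my-bucket/avro-folder/")   // hypothetical path
  val evened = df.repartition(200)   // e.g. a small multiple of total executor cores
  evened.count()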
Hello,
I am trying to understand how the in-memory size changes in these
situations. Specifically, why is the in-memory size much higher for Avro and
Parquet? Are there any optimizations necessary to reduce this?
I used cacheTable on each of these:
AVRO File (600 KB) - in-memory size was 12 MB
Parquet F[...]