Hi All,
Just floating this email again. Grateful for any suggestions.
Akshay Bhardwaj
+91-97111-33849
On Mon, May 20, 2019 at 12:25 AM Akshay Bhardwaj <akshay.bhardwaj1...@gmail.com> wrote:
> Hi All,
>
> I am running Spark 2.3 on YARN using HDP 2.6
>
> I am running a Spark job using dynamic resource allocation
After blowing away my m2 repo cache, I was able to build just fine... I
don't know why, but now it works :-)
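For anyone who hits the same thing, a minimal sketch of the cache wipe (assuming the default local repository location, ~/.m2/repository) — removing only the cached Spark artifacts rather than the whole repo, so Maven re-downloads just those on the next build:

```shell
# Purge cached Spark artifacts from the local Maven repository;
# Maven will re-fetch them on the next ./build/mvn run.
rm -rf "$HOME/.m2/repository/org/apache/spark"
echo "spark artifacts purged from local m2 cache"
```

Deleting the entire ~/.m2/repository also works, but then every dependency gets fetched again, which is much slower.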
On Sun, May 19, 2019 at 10:22 PM Bulldog20630405
wrote:
> i am trying to build spark 2.4.3 with the following env:
>
> - fedora 29
> - 1.8.0_202
> - spark 2.4.3
> - scala 2.11.12
While Spark can read from S3 directly in EMR, I believe it still needs
HDFS to perform shuffles and to write intermediate data to disk during
jobs (i.e., when in-memory data needs to spill over to disk).
For these operations, Spark does need a distributed file system - you could
use someth
I'm afraid not, because YARN needs a DFS.
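To make the replies above concrete: both YARN and Spark resolve unqualified paths through fs.defaultFS, so even with the HDFS daemons stopped you still need *some* shared filesystem configured there. A sketch of pointing the default filesystem at S3 via s3a in core-site.xml — the bucket name and credentials are placeholders, and this assumes the hadoop-aws and AWS SDK jars are on the classpath of your HDP build:

```xml
<!-- core-site.xml: make s3a the default filesystem (hypothetical bucket) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://my-spark-bucket</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>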
Huizhe Wang wrote on Mon, May 20, 2019 at 9:50 AM:
> Hi,
>
> I want to use Spark on YARN without HDFS. I store my resources in AWS and
> use s3a to get them. However, when I used stop-dfs.sh it stopped the NameNode
> and DataNode, and I got an error when using yarn cluster mode. Could I use
> YARN without starting DFS, and how could I use this mode?
i am trying to build spark 2.4.3 with the following env:
- fedora 29
- java 1.8.0_202
- spark 2.4.3
- scala 2.11.12
- maven 3.5.4
- hadoop 2.6.5
according to the documentation this can be done with the following commands:

export TERM=xterm-color
./build/mvn -Pyarn -DskipTests clean package
Hi,
I want to use Spark on YARN without HDFS. I store my resources in AWS and
use s3a to get them. However, when I used stop-dfs.sh it stopped the NameNode
and DataNode, and I got an error when using yarn cluster mode. Could I use
YARN without starting DFS, and how could I use this mode?
Yours,
Jane
I'm trying to re-read, however I'm getting cached data (which is a bit
confusing). For the re-read I'm issuing:

spark.read.format("delta").load("/data").groupBy(col("event_hour")).count

The cache seems to be global, influencing new DataFrames too.
So the question is how should I re-read without losing
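Not having seen the rest of the thread, a sketch of the usual approach: cached data lives in the session's catalog cache, so it has to be dropped before re-reading. This assumes a spark-shell session where `spark` is in scope, and `cachedDf` is a hypothetical handle to whatever DataFrame was cached earlier; `clearCache` and `unpersist` are standard Spark APIs, and the path is the one from the snippet above:

```scala
import org.apache.spark.sql.functions.col

// Clear everything Spark has cached in this session so the next read
// hits storage instead of the cache.
spark.catalog.clearCache()

val fresh = spark.read.format("delta").load("/data")
fresh.groupBy(col("event_hour")).count().show()

// Or drop just the one DataFrame that was cached earlier:
// cachedDf.unpersist(blocking = true)
```

clearCache() is the blunt instrument (it drops every cached table/DataFrame in the session); unpersist on the specific DataFrame is more surgical if you still want other caches warm.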
Hi All,
I am running Spark 2.3 on YARN using HDP 2.6
I am running a Spark job using dynamic resource allocation on YARN with a
minimum of 2 executors and a maximum of 6. My job reads data from Parquet
files present on S3 buckets and stores some enriched data in Cassandra.
My question is, how does YARN
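For reference, the dynamic-allocation setup described above is typically expressed in spark-defaults.conf (or as --conf flags on spark-submit); these are standard Spark property names, and the external shuffle service must be enabled on the YARN NodeManagers so executors can be released without losing shuffle data:

```
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.minExecutors   2
spark.dynamicAllocation.maxExecutors   6
spark.shuffle.service.enabled          true
```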