Hi,
I am searching for a useful API to read data from a URL in a Spark
application.
For example, suppose the application contains this URL:
new URL("https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv")
How can I read data from this URL using the Spark API?
I looked in org.apache.spark.api
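As far as I know, Spark has no built-in HTTP(S) data source, so one common
workaround (a sketch only, in the spirit of the Zeppelin bank tutorial; it
assumes a SparkContext `sc` and commons-io on the classpath) is to download
the file on the driver and parallelize its lines:

import java.net.URL
import java.nio.charset.StandardCharsets
import org.apache.commons.io.IOUtils

// Fetch the file on the driver, then turn its lines into an RDD.
val bankText = sc.parallelize(
  IOUtils.toString(
    new URL("https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv"),
    StandardCharsets.UTF_8
  ).split("\n").toSeq
)

For an S3 object specifically, reading it through an s3a:// path with
spark.read.csv is another option if hadoop-aws is configured.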
Hi Supun,
Did you look at https://spark.apache.org/docs/latest/tuning.html?
In addition to the info there, if you're partitioning by some key where
you've got a lot of data skew, one task's memory requirement may be
larger than the RAM of a given executor, while the rest of the tasks may
fit comfortably.
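To make the skew point concrete, here is a rough sketch of key salting (the
DataFrames `events` and `users` and the column names are made up):

import org.apache.spark.sql.functions._

// `events` is the large, skewed side; `users` is the smaller side.
val saltCount = 16
val saltedEvents = events.withColumn("salt", (rand() * saltCount).cast("int"))
val saltedUsers = users.withColumn(
  "salt", explode(array((0 until saltCount).map(lit): _*)))
// Joining on (userId, salt) spreads a hot key across saltCount tasks.
val joined = saltedEvents.join(saltedUsers, Seq("userId", "salt"))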
Sorry for interrupting, I have a quick question regarding the retry mechanism
on failed tasks. I'd like to know whether there is a way to specify the
interval between task retry attempts. I have set spark.task.maxFailures to a
relatively large number, but due to unstable network conditions the retries
fire back to back and tend to fail as well.
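For context, this is roughly how the property is being set (the value is
arbitrary; it can equally be passed with --conf on spark-submit):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Allow each task up to 16 attempts before the job is failed.
val conf = new SparkConf().set("spark.task.maxFailures", "16")
val spark = SparkSession.builder.config(conf).getOrCreate()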
As it says in SPARK-10320 and in the docs at
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#consumerstrategies,
you can use SubscribePattern.
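A minimal sketch of that strategy (broker address, group id, and topic regex
are placeholders, and a StreamingContext `streamingContext` is assumed to
exist):

import java.util.regex.Pattern

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group"
)

// Subscribe to every topic matching the regex; topics created later
// that match the pattern are picked up as well.
val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.SubscribePattern[String, String](
    Pattern.compile("events-.*"), kafkaParams)
)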
On Sun, Oct 29, 2017 at 3:56 PM, Ramanan, Buvana (Nokia - US/Murray
Hill) wrote:
> Hello Cody,
>
> As the stakeholders of J
Hello Asmath,
We had a similar challenge recently.
When you write back to Hive, you are creating files on HDFS, and how many you
create depends on your batch window.
If you increase your batch window, say from 1 min to 5 mins, you will end up
creating 5x fewer files.
The other factor is your partitioning.
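For example, coalescing each batch before the write keeps the file count
down; a sketch only (the partition count and table name are made up):

import org.apache.spark.sql.DataFrame

// Called from inside foreachRDD with the DataFrame built from that batch.
def writeBatch(batchDf: DataFrame): Unit = {
  batchDf.coalesce(4) // fewer output partitions => fewer, larger HDFS files
    .write
    .mode("append")
    .insertInto("mydb.events")
}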
Hi,
I am using Spark Streaming to write data back into Hive with the below code
snippet:
eventHubsWindowedStream.map(x => EventContent(new String(x)))
  .foreachRDD(rdd => {
    val sparkSession = SparkSession.builder.enableHiveSupport.getOrCreate
    import sparkSession.implicits._
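A minimal end-to-end sketch of this pattern, assuming EventContent wraps a
single string field and a hypothetical target table mydb.events:

import org.apache.spark.sql.SparkSession

case class EventContent(payload: String)

eventHubsWindowedStream.map(x => EventContent(new String(x)))
  .foreachRDD { rdd =>
    val sparkSession = SparkSession.builder.enableHiveSupport.getOrCreate()
    import sparkSession.implicits._
    // Convert the batch to a DataFrame and append it to the Hive table.
    rdd.toDF().write.mode("append").insertInto("mydb.events")
  }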