Can anyone please help me with this issue?
On Fri, 3 Aug 2018 at 11:27, Shuporno Choudhury <
shuporno.choudh...@gmail.com> wrote:
> Can anyone please help me with this issue?
>
> On Wed, 1 Aug 2018 at 12:50, Shuporno Choudhury [via Apache Spark User
> List] wrote:
>
>> Hi everyone,
>> I am runn
Hi there,
You may want to set the memory overhead higher. Spark will then start
containers with a higher memory limit (spark.executor.memory +
spark.executor.memoryOverhead, to be exact) while the heap is still capped at
spark.executor.memory. There's some memory used by the JVM outside the heap
(VM overheads, interned strings, native buffers), and the overhead setting is
what accounts for it.
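For reference, a minimal sketch of where such a setting would go (the 2g value is
just an assumed example, not a recommendation; the app name is hypothetical):

from pyspark.sql import SparkSession

# Sketch only: raise the off-heap overhead so the container limit becomes
# spark.executor.memory + spark.executor.memoryOverhead (7g + 2g here),
# while the JVM heap stays at spark.executor.memory.
spark = (
    SparkSession.builder
    .appName("memory-overhead-example")             # hypothetical app name
    .config("spark.executor.memory", "7g")
    .config("spark.executor.memoryOverhead", "2g")  # assumed example value
    .getOrCreate()
)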
I am trying to insert overwrite multiple partitions into an existing
partitioned Hive/Parquet table. The table was created using sparkSession.
I have a table 'mytable' with partitions P1 and P2.
I have the following set on the sparkSession object:
.config("hive.exec.dynamic.partition", true)
.config("
Thanks for the clarification.
The processing time on both systems seems to be fine: (a) based on the
pattern of the batch processing time chart, i.e. the batch processing time is
not becoming longer and longer (see charts attached below); (b) the input
data on each Spark stage of every batch remains t
Yes, I am loading a text file from my local machine into a Kafka topic using
the script below, and I'd like to calculate the number of samples per second
consumed by the Kafka consumer.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    print("hello spark")
    sc = SparkContext(appName="STALTA")
    ssc = StreamingContext(sc, 1)  # batch interval in seconds (assumed; the original value is cut off)
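As a rough illustration (not the original script, which is cut off above), one way
to estimate the consumed records per second inside the streaming job, assuming a
DStream called 'lines' and a known batch interval (both assumptions):

# Hypothetical sketch: count records per micro-batch and divide by the batch interval.
BATCH_SECONDS = 1  # assumed batch interval

def log_rate(rdd):
    count = rdd.count()
    print("records this batch: %d (~%.1f records/sec)" % (count, count / float(BATCH_SECONDS)))

# Assuming 'lines' is the DStream read from the Kafka topic:
# lines.foreachRDD(log_rate)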
What is different between the 2 systems? If one system processes records
faster than the other, simply because it does less processing, then you can
expect the first system to have a higher throughput than the second. It's
hard to say why one system has double the throughput of another without
kno
Hello,
I just had a question. Could you refer me to a link or tell me how you
calculated these figures, such as: *300K msg/sec to a Kafka broker, 220 bytes per
message*. I'm loading a text file with 36000 records into a Kafka topic and
I'd like to calculate the data rate (#samples per sec) in Kafka.
Tha
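A back-of-the-envelope way to get such numbers (this is my own sketch, not how the
quoted figures were actually measured): time the producer loop and divide the record
count by the elapsed seconds; multiplying by the average message size gives bytes/sec.

import time

record_count = 36000      # number of records in the text file (from the question)
avg_msg_bytes = 220       # average message size; 220 bytes is the quoted example figure

start = time.time()
# ... send the 36000 records to the Kafka topic here ...
elapsed = time.time() - start

msgs_per_sec = record_count / elapsed
print("%.0f msg/sec, %.0f bytes/sec" % (msgs_per_sec, msgs_per_sec * avg_msg_bytes))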
We are running Spark 2.3 on a Kubernetes cluster. We have set the following
spark configuration options
"spark.executor.memory": "7g",
"spark.driver.memory": "2g",
"spark.memory.fraction": "0.75"
What we see is:
a) In the Spark UI, 5G has been allocated to each executor, which makes
sense
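That ~5G figure is consistent with Spark's unified memory formula,
(heap - 300MB reserved) * spark.memory.fraction. A quick check, assuming the
standard 300MB reserved memory:

# Back-of-the-envelope check of the ~5G shown in the Spark UI.
executor_heap_mb = 7 * 1024   # spark.executor.memory = 7g
reserved_mb = 300             # Spark's fixed reserved memory (assumed standard value)
memory_fraction = 0.75        # spark.memory.fraction

usable_mb = (executor_heap_mb - reserved_mb) * memory_fraction
print("%.0f MB (~%.2f GB)" % (usable_mb, usable_mb / 1024.0))   # ~5151 MB, ~5.03 GB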
I tried the following to explicitly specify partition columns in the SQL statement
and also tried different cases (upper and lower) for the partition columns.
insert overwrite table $tableName PARTITION(P1, P2) select A, B, C, P1, P2
from updateTable.
Still getting:
Caused by:
org.apache.hadoop.hive.ql.meta
Hello there,
I'm new to Spark Streaming and have trouble understanding Spark batch
"composition" (a Google search keeps giving me an older Spark Streaming
concept). I would appreciate any help and clarifications.
I'm using Spark 2.2.1 for a streaming workload (see quoted code in (a)
below). The general
Thanks Koert. I'll check that out when we can update to 2.3
Meanwhile, I am trying a Hive SQL (INSERT OVERWRITE) statement to overwrite
multiple partitions (without losing the existing ones).
It's giving me issues around partition columns.
dataFrame.createOrReplaceTempView("updateTable") /
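For what it's worth, a sketch of the flow being described, using the table and
column names from this thread; whether this resolves the partition-column error
is an assumption on my part:

# 'dataFrame' is the DataFrame holding the updated rows (from the thread above);
# 'spark' is a SparkSession configured with the dynamic-partition settings shown earlier
# (hive.exec.dynamic.partition=true, hive.exec.dynamic.partition.mode=nonstrict).
dataFrame.createOrReplaceTempView("updateTable")

spark.sql("""
    INSERT OVERWRITE TABLE mytable PARTITION (P1, P2)
    SELECT A, B, C, P1, P2
    FROM updateTable
""")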
Hi Lehak,
You can make a Scala project with an Oozie class and one runner class that
will ship your Python file to the cluster.
Define an Oozie coordinator with a Spark action or a shell action.
We are deploying PySpark-based machine learning code.
Sent from my iPhone
> On Aug 2, 2018, at 8:46 AM, Lehak D
We are trying to deploy a Python script on a Spark cluster. However, as per the
documentation, it is not possible to deploy Python applications on a
cluster. Is there any alternative?
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/