test email received ;p
On 4 Jul 2017 7:40 am, "Sudha KS" wrote:
Using a long period between checkpoints may create a long lineage of graph
computations, since Spark relies on checkpointing to cut that lineage; the long
lineage can also delay the streaming job.
Did you ever find a solution to this? If so, can you share it? I am running
into a similar issue in YARN cluster mode connecting to an Impala table.
Hi!
I have a question about logs and have not found the answer online.
I have a spark-submit process and I configure a custom log configuration for
it using the following params:
--conf
"spark.executor.extraJavaOptions=-Dlog4j.configuration=customlog4j.properties"
--driver-java-options '-Dlog4j
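For reference, a full spark-submit invocation under this setup might look like the sketch below. The main class, jar name, and the use of `--files` to ship the properties file to executors are assumptions for illustration, not taken from the original message:

```shell
# customlog4j.properties is shipped to executor working dirs via --files,
# so executors can reference it by bare name; the driver reads it from a
# local path with the file: prefix. com.example.MyApp and myapp.jar are
# hypothetical placeholders.
spark-submit \
  --files customlog4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=customlog4j.properties" \
  --driver-java-options "-Dlog4j.configuration=file:customlog4j.properties" \
  --class com.example.MyApp \
  myapp.jar
```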
Hello guys,
I'm using Spark SQL with Hive through Thrift.
I need this because I need to create a table from tables matching a mask.
Here is an example:
1. Take tables by mask, like SHOW TABLES IN db 'table__*'
2. Create query like:
CREATE TABLE total_data AS
SELECT * FROM table__1
UNION ALL
SELECT * FROM table__2
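The two steps above can be sketched in Scala by generating the UNION ALL statement from the list of matched table names. The hardcoded table list below is a stand-in for the result of the SHOW TABLES query:

```scala
// Step 1 (assumed done): table names matched by the mask, e.g. from
// spark.sql("SHOW TABLES IN db LIKE 'table__*'").
val tables = Seq("table__1", "table__2", "table__3")

// Step 2: build one SELECT per table and join them with UNION ALL.
val unionBody = tables.map(t => s"SELECT * FROM $t").mkString("\nUNION ALL\n")
val query = s"CREATE TABLE total_data AS\n$unionBody"
```

The resulting `query` string would then be passed to `spark.sql(...)` (or the Thrift session) to create the table.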
In my Spark Streaming application, I need to build a graph from a file,
and initializing that graph takes between 5 and 10 seconds.
So I tried initializing it once per executor so it would be initialized only once.
After running the application, I've noticed that it's initialized much more than once.
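One common way to get at-most-once initialization per executor JVM is a singleton object holding the graph in a lazy val: each executor JVM loads the object once, so the build runs at most once there no matter how many tasks touch it. `GraphHolder`, `initCount`, and the toy graph below are illustrative names, not from the original message:

```scala
// Singleton holding the expensive-to-build graph.
// On Spark executors, each JVM evaluates the lazy val at most once,
// regardless of how many tasks reference it.
object GraphHolder {
  var initCount = 0 // for illustration: counts how often the build runs

  lazy val graph: Map[Int, Seq[Int]] = {
    initCount += 1
    // The real application would read the file and build the graph here.
    Map(1 -> Seq(2, 3), 2 -> Seq(3))
  }
}
```

Tasks would then read `GraphHolder.graph` inside `mapPartitions` or similar; repeated accesses reuse the already-built value instead of rebuilding it.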
import java.io.File

import org.apache.avro.Schema
import org.apache.spark.sql.SparkSession

val schema = new Schema.Parser().parse(new File("user.avsc"))
val spark = SparkSession.builder().master("local").getOrCreate()

spark
  .read
  .format("com.databricks.spark.avro")
  .option("avroSchema", schema.toString)
  .load("
With the left join, you are joining two tables.
In your case, df is the left table and dfAgg is the right table.
The second parameter should be the joining condition, right?
For instance:
dfRes = df.join(dfAgg, $"userName" === $"name", "left_outer")
having a field in df called userName, and another in dfAgg called name.
Hello, I don't understand my error message.
Basically, all I am doing is:
- dfAgg = df.groupBy("S_ID")
- dfRes = df.join(dfAgg, Seq("S_ID"), "left_outer")
However I get this AnalysisException: "
Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved
attribute(s) S_ID#1903L m