With the Spark 2.0 build from 0615, when running 4-user concurrent SQL
tests against Spark SQL on 1TB TPCDS, we are consistently seeing the
following exceptions:
10:35:33 AM: 16/06/27 23:40:37 INFO scheduler.TaskSetManager: Finished task
412.0 in stage 819.0 (TID 270396) in 8468 ms on 9.30.148.10m.com
From: Sean Owen
To: Jesse F Chen/San Francisco/IBM@IBMUS
Cc: spark users
I had been running fine until builds around 05/07/2016.
If I use "--master yarn" in builds after 05/07, I get the following
error... it sounds like some jars are missing.
I am using YARN 2.7.2 and Hive 1.2.1.
Do I need something new to deploy related to YARN?
bin/spark-sql --driver-me
Somewhat related, though this JIRA is on 1.6.
https://issues.apache.org/jira/browse/SPARK-13288#
So you have 90GB total memory, and 24 total cores.
Let's say you want to use 80% of all that memory (leaving memory for other
components) so you have 72GB to use.
You want to take advantage of all the cores and memory.
So this would be close:
executor size = 6g
number of executors = 12
cores pe
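For illustration only (the master, class name, and jar are placeholders, not
from this thread), that sizing maps onto spark-submit flags roughly like this,
with 2 cores per executor so that 12 executors cover the 24 cores:

  spark-submit \
    --master yarn \
    --num-executors 12 \
    --executor-memory 6g \
    --executor-cores 2 \
    --class com.example.MyApp myapp.jar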
I am finding a strange issue with Spark SQL where "select count(*)"
returns wrong row counts for certain tables.
I am using TPCDS tables, so here are the actual counts:
Row count
I ran the same streaming application (compiled individually for 1.5.1 and
1.6.0) that processes 5-second tweet batches.
I noticed two things:
1. 10% regression in 1.6.0 vs 1.5.1
Spark v1.6.0: 1,564 tweets/s
Spark v1.5.1: 1,747 tweets/s
2. 1.6.0 streaming seems to have a memory leak.
v1.5.1.
Trying to enable CsvSink for metrics collection, but I get the following
error as soon as I kick off a 'spark-submit' app:
15/12/08 11:24:02 INFO storage.BlockManagerMaster: Registered
BlockManager
15/12/08 11:24:02 ERROR metrics.MetricsSystem: Sink class
org.apache.spark.m
negligible.
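(For reference, the usual way to wire up the CsvSink is a metrics properties
file; the period and directory below are only illustrative values, and the
directory needs to exist on the nodes writing metrics:

  # conf/metrics.properties, or point to it with
  # --conf spark.metrics.conf=/path/to/metrics.properties
  *.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
  *.sink.csv.period=10
  *.sink.csv.unit=seconds
  *.sink.csv.directory=/tmp/spark-metrics
)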
From: Davies Liu
To: Jesse F Chen/San Fr
-master/target/scala-2.10/tpcdssparksql_2.10-0.9.jar
hdfs://rhel2.cisco.com:8020/user/bigsql/hadoopds100g
/TestAutomation/databricks/spark-sql-perf-master/src/main/queries/jesse/query39b.sql
From: "Cheng, Hao"
To: Todd
Cc: Jesse F Chen/San Francisco/IBM@IBMUS, Michae
Could this be a build issue (i.e., sbt package)?
If I run the same jar built for 1.4.1 on 1.5, I am seeing a large
regression too in queries (all other things being identical)...
I am curious: to build against 1.5 (when it isn't released yet), what do
I need to do with the build.sbt file?
any special
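(One sketch of the build.sbt route, assuming a snapshot build of 1.5 rather
than building Spark locally; the version string and resolver here are
assumptions, not from this thread:

  resolvers += "Apache Snapshots" at "https://repository.apache.org/snapshots/"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.5.0-SNAPSHOT" % "provided",
    "org.apache.spark" %% "spark-sql"  % "1.5.0-SNAPSHOT" % "provided"
  )

The alternative is typically to build the 1.5 branch yourself and publish it
to the local repository so the same coordinates resolve from there.)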
If you have already loaded the CSV data into a DataFrame, why not register
it as a table and use Spark SQL to find the max/min or any other
aggregates? SELECT MAX(column_name) FROM dftable_name ... seems natural.
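A minimal sketch of that approach on the 1.x API (the DataFrame name df and
column_name are placeholders):

  // assuming `df` already holds the CSV data and `sqlContext` is in scope
  df.registerTempTable("dftable_name")
  sqlContext.sql(
    "SELECT MAX(column_name) AS max_val, MIN(column_name) AS min_val FROM dftable_name"
  ).show()

  // or the same thing through the DataFrame API
  import org.apache.spark.sql.functions.{max, min}
  df.agg(max("column_name"), min("column_name")).show()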
This is a question on general usage/best practice: what is the best
transformation method to use for sentiment analysis on tweets...
Input:
Tweets (e.g., "@xyz, sorry but this movie is poorly scripted
http://t.co/uyser876") - large data set, i.e., 1 billion tweets
Sentiment dictionary (e.g., "
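One common pattern for this (a sketch only; the dictionary contents and the
whitespace tokenization are assumptions based on the description above) is to
broadcast the sentiment dictionary and score each tweet in a map:

  // tweets: RDD[String]; dict: Map[String, Int], e.g. Map("sorry" -> -1, "poorly" -> -2)
  val dictBc = sc.broadcast(dict)
  val scored = tweets.map { tweet =>
    val score = tweet.toLowerCase.split("\\s+")
      .map(word => dictBc.value.getOrElse(word, 0))
      .sum
    (tweet, score)
  }

Broadcasting keeps the dictionary cached on each executor instead of shipping
it with every task, which matters at the 1-billion-tweet scale mentioned above.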