We have dockerized the Spark master and worker(s) separately and are using them in our dev environment. We don't use Mesos, though; we run in standalone mode, but adding Mesos should not be that difficult, I think.
Regards
Venkat
---
Hi Cheng,
Thank you very much for taking the time to provide a detailed explanation.
I tried a few of the things you suggested, and some others.
The ContactDetail table (8 GB) is the fact table and DAgents is the dim table (<500 KB), the reverse of what you assumed, but your ideas still apply.
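For anyone landing on this thread from the archive: with a <500 KB dim table, a map-side join that broadcasts DAgents is one way to keep the 8 GB fact table from being shuffled, and to keep one hot join key from pinning the whole join onto a single task. A minimal sketch against the RDD API; the paths, delimiter, and column positions are made up:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.1

val sc = new SparkContext(new SparkConf().setAppName("broadcast-join-sketch"))

// 8 GB fact table, keyed by the join column (hypothetical layout)
val contactDetail = sc.textFile("hdfs:///data/contact_detail")
  .map(_.split('\t'))
  .map(cols => (cols(0), cols))

// Small dim table: collect it to the driver and broadcast it to executors
val dAgents = sc.textFile("hdfs:///data/dagents")
  .map(_.split('\t'))
  .map(cols => (cols(0), cols(1))) // (agentId, agentName)
  .collectAsMap()
val dAgentsBc = sc.broadcast(dAgents)

// Map-side join: the fact table is never shuffled
val joined = contactDetail.flatMap { case (agentId, row) =>
  dAgentsBc.value.get(agentId).map(name => (agentId, name, row))
}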
Bump up.
Michael Armbrust, anybody from the Spark SQL team?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-table-Join-one-task-is-taking-long-tp20124p20218.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---
This is more of a Scala question than a Spark question. Which dependency injection framework do you use for Scala when working with Spark? Is http://scaldi.org/ recommended?
Regards
Venkat
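Not a recommendation, but for what it's worth, scaldi wiring is compact and runs entirely on the driver. A minimal sketch from memory of scaldi's documented API; the service names are invented:

import scaldi.{Injectable, Injector, Module}

// Hypothetical service, just to show the wiring
trait GreetingService { def greet(name: String): String }
class EnglishGreetingService extends GreetingService {
  def greet(name: String) = s"Hello, $name"
}

class AppModule extends Module {
  bind [GreetingService] to new EnglishGreetingService
}

// Classes pull their dependencies from an implicit Injector
class Greeter(implicit inj: Injector) extends Injectable {
  val service = inject [GreetingService]
}

val greeter = new Greeter()(new AppModule)
println(greeter.service.greet("Spark"))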
---
Environment: Spark 1.1, 4-node Spark and Hadoop dev cluster, 6 cores and 32 GB RAM each. Default serialization, standalone mode, no security.
Data was sqooped from a relational DB to HDFS and is partitioned uniformly across HDFS. I am reading a fact table about 8 GB in size and one small dim table from
---
The method

def updateStateByKey[S: ClassTag](updateFunc: (Seq[V], Option[S]) => Option[S]): DStream[(K, S)]

takes a DStream[(K, V)] and produces a DStream[(K, S)] in Spark Streaming.
We have an input DStream[(K, V)] that has 40,000 elements. On average we update 1,000 of them every 3 seconds.
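For context, here is a minimal runnable sketch of updateStateByKey keeping a running count per key; the socket source, checkpoint path, and 3-second batch interval are illustrative:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits on Spark 1.1

val conf = new SparkConf().setAppName("update-state-sketch")
val ssc = new StreamingContext(conf, Seconds(3))
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint") // required for stateful ops

val events = ssc.socketTextStream("localhost", 9999).map(word => (word, 1))

// Called once per key per batch: Seq holds this batch's new values,
// Option holds the previous state (None the first time a key is seen)
def updateFunc(newValues: Seq[Int], state: Option[Int]): Option[Int] =
  Some(state.getOrElse(0) + newValues.sum)

val counts = events.updateStateByKey(updateFunc)
counts.print()

ssc.start()
ssc.awaitTermination()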
---
TD,
We are seeing the same issue. We struggled through this until we found this post and the workaround.
A quick fix in the Spark Streaming software would help a lot of others who are encountering this and pulling their hair out over why RDDs on some partitions are not computed (we ended up spending
---
For the time being, we decided to take a different route. We created a REST API layer in our app and allow SQL queries to be passed in via REST. Internally we hand the query to the Spark SQL layer over the RDDs and return the results. With this, Spark SQL is supported for our RDDs via the REST API now.
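In case it helps others on this route: the core of such a layer is small. A sketch against the Spark 1.1 API (a case-class RDD registered as a temp table on a SQLContext); the record type, path, and REST plumbing are invented, only the Spark SQL calls are real:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type for the RDD we expose
case class Contact(id: Int, agentId: Int, duration: Double)

val sc = new SparkContext(new SparkConf().setAppName("rest-sql-sketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD // implicit RDD[Contact] -> SchemaRDD

val contacts = sc.textFile("hdfs:///data/contacts.csv")
  .map(_.split(','))
  .map(c => Contact(c(0).toInt, c(1).toInt, c(2).toDouble))
contacts.registerTempTable("contacts")

// The REST handler body: take the query string, return rows as CSV lines
def runQuery(query: String): Array[String] =
  sqlContext.sql(query).collect().map(_.mkString(","))

runQuery("SELECT agentId, COUNT(*) FROM contacts GROUP BY agentId")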
---
I have a data file that I need to process using Spark. The file has multiple events for different users, and I need to process the events for each user in the order they appear in the file:
User 1 : Event 1
User 2: Event 1
User 1 : Event 2
User 3: Event 1
User 2: Event 2
User 3: Event 2
etc.
I want to
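Assuming the goal is to process each user's events in file order, one way is to tag every line with its position via zipWithIndex, group by user, and sort each group by position. A sketch; the "User : Event" line format is taken from the example above:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.1

val sc = new SparkContext(new SparkConf().setAppName("per-user-order-sketch"))

val perUser = sc.textFile("hdfs:///data/events.txt")
  .zipWithIndex() // (line, position); for a text file the indices follow file order
  .map { case (line, pos) =>
    val Array(user, event) = line.split(":").map(_.trim)
    (user, (pos, event))
  }
  .groupByKey()                              // gather each user's events
  .mapValues(_.toSeq.sortBy(_._1).map(_._2)) // restore file order per user

perUser.foreach { case (user, events) =>
  events.foreach(event => println(s"$user -> $event")) // runs on the executors
}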
---
1) If I have a standalone Spark application that has already built an RDD, how can SharkServer2, or for that matter Shark, access 'that' RDD and run queries on it? In all the examples I have seen for Shark, the RDDs (tables) are created within Shark's own SparkContext and processed there.
This is not possible out of the box
---
Thanks Michael.
OK, will try SharkServer2.
But I have some basic questions on a related area:
1) If I have a standalone Spark application that has already built an RDD, how can SharkServer2, or for that matter Shark, access 'that' RDD and run queries on it? In all the examples I have seen for Shark, the RDDs (tables) are created within Shark's own SparkContext.
---
We are planning to use the latest Spark SQL on RDDs. If a third-party application wants to connect to Spark via JDBC, does Spark SQL have support for that? (We want to avoid going through the Shark/Hive JDBC layer, as we need good performance.)
BTW, we also want to do the same for Spark Streaming - with Spark SQL
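On the JDBC question: Spark 1.1 ships a Thrift JDBC/ODBC server for Spark SQL (started with sbin/start-thriftserver.sh) that speaks the HiveServer2 wire protocol, so a third-party app connects with the ordinary Hive JDBC driver rather than going through Shark. A client-side sketch; the host, port, and table name are assumptions:

import java.sql.DriverManager

// The Hive JDBC driver (org.apache.hive:hive-jdbc) must be on the classpath
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()
val rs   = stmt.executeQuery("SELECT COUNT(*) FROM contacts")
while (rs.next()) println(rs.getLong(1))
rs.close(); stmt.close(); conn.close()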