Hi All,
I have a Spark SQL application that fetches data from Hive; on top of it I have
an Akka layer to run multiple queries in parallel.
*Please suggest a mechanism to figure out the number of Spark jobs running in
the cluster at a given instant of time.*
I need to do the above as I see the ave
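One way to track this from inside the application is a SparkListener that counts job starts and ends. A minimal sketch, assuming the SparkContext is available as sc (the class and counter names are illustrative; note it only sees jobs of this application, not the whole cluster):

import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// Counts the jobs currently in flight within this SparkContext
class RunningJobCounter extends SparkListener {
  val running = new AtomicInteger(0)
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = running.incrementAndGet()
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = running.decrementAndGet()
}

val counter = new RunningJobCounter
sc.addSparkListener(counter)
// counter.running.get() gives the number of jobs running at this instant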
Hi Team,
Is there a way to consume from Kafka using the Spark Streaming direct API
with multiple consumers (belonging to the same consumer group)?
Regards,
Sam
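For what it's worth, the direct API does not use Kafka consumer groups at all: it maps each Kafka partition to one RDD partition and tracks offsets itself, so parallelism comes from the number of Kafka partitions rather than from multiple consumers in a group. A minimal sketch of creating such a stream, assuming the Kafka 0.8 direct API (broker and topic names are placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("my-topic")
// One RDD partition per Kafka partition; no consumer-group coordination involved
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)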
Hi Team,
I have a Spark application up & running on a 10-node Standalone cluster.
When I launch the application in cluster mode I am able to create a separate
log file for the driver & the executors (common to all executors).
But my requirement is to create a separate log file for each executor. Is it
feasible?
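One avenue worth exploring: on a Standalone cluster each executor runs in its own working directory (work/<app-id>/<executor-id>), so a file appender configured with a relative path should already produce one file per executor. A sketch of such a log4j.properties (the appender name and pattern are illustrative):

log4j.rootLogger=INFO, FILE
# Relative path: resolves inside each executor's own work directory
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=executor.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n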
Hi Team,
I am using *CDH 5.7.1* with Spark *1.6.0*.
I have a Spark streaming application that reads from Kafka & does some
processing.
The issue is, while starting the application in CLUSTER mode, I want to pass a
custom log4j.properties file to both the driver & the executors.
*I have the below command:*
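For reference, such a command commonly takes the following shape (a sketch assuming YARN cluster mode; paths, class, and jar names are placeholders):

spark-submit \
  --master yarn --deploy-mode cluster \
  --files /local/path/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.StreamingApp \
  myapp.jar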
Thanks for the reply, RK.
With the first option, my application doesn't recognize
spark.driver.extraJavaOptions.
With the second option, the issue remains the same:
2016-07-21 12:59:41 ERROR SparkContext:95 - Error initializing SparkContext.
org.apache.spark.SparkException: Found both spark.exe
Hi All,
Is there any pub-sub connector for JMS provided by Spark out of the box, like
the one for Kafka?
Thanks.
Regards,
Sam
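Spark ships no JMS source out of the box (Kafka, Flume, etc. are the built-ins), so the usual route is a custom receiver. A minimal sketch of one, with the JMS plumbing left as comments (the class name and broker URL are illustrative):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch of a receiver that would pull messages from a JMS queue/topic
class JmsReceiver(brokerUrl: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // Spawn a thread that opens the JMS connection/session/consumer
    new Thread("JMS Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  override def onStop(): Unit = {
    // Close the JMS connection here
  }

  private def receive(): Unit = {
    // JMS-specific consume loop goes here; push each payload with store(...)
  }
}

// Usage: val lines = ssc.receiverStream(new JmsReceiver("tcp://broker:61616"))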
Hi Team,
I am evaluating different ways to submit & monitor Spark jobs using REST
interfaces.
When to use Livy vs Spark Job Server?
Regards,
Sam
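As a concrete point of comparison, Livy exposes a plain REST endpoint for batch submission; a sketch (host, jar path, and class name are placeholders):

# Submit a batch job
curl -X POST -H "Content-Type: application/json" \
  -d '{"file": "hdfs:///jars/myapp.jar", "className": "com.example.MyApp"}' \
  http://livy-host:8998/batches

# Poll the status of batch 0
curl http://livy-host:8998/batches/0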
Hi Experts,
I have a scenario wherein I want to write to an Avro file from a streaming
job that reads data from Kafka.
But the issue is, as there are multiple executors, when all of them try to
write to a given file I get a concurrency exception.
One way to mitigate the issue is to repartition & have a
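An alternative to funnelling everything through one partition is to let the normal save path do the work: each executor writes its own part-* file, and giving every batch its own directory keeps writers from colliding. A sketch assuming the spark-avro package, Spark 1.5+, and that the stream's elements are case classes (paths are placeholders):

import com.databricks.spark.avro._
import org.apache.spark.sql.SQLContext

stream.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
    import sqlContext.implicits._
    // Each executor writes its own part-* file inside a per-batch directory,
    // so no two tasks ever write to the same file
    rdd.toDF().write.avro(s"hdfs:///data/events/batch-${time.milliseconds}")
  }
}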
Hi All,
Suppose I have a Parquet file of 100 MB in HDFS & my HDFS block size is 64 MB,
so I have 2 blocks of data.
When I do *sqlContext.parquetFile("path")* followed by an action, two
tasks are started on two partitions.
My intent is to read these 2 blocks into more partitions to fully utilize my
cluster.
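Two ways to get there, sketched below (the path and numbers are placeholders): repartitioning after the read always works but costs a shuffle; shrinking the input split size avoids the shuffle, though whether it is honoured may depend on the Spark version and input format.

// Option 1: shuffle into more partitions after the read
val data = sqlContext.parquetFile("path").repartition(8)

// Option 2: ask for smaller input splits before reading (version-dependent)
sc.hadoopConfiguration.setLong(
  "mapreduce.input.fileinputformat.split.maxsize", 32 * 1024 * 1024)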
Hi Experts,
I have a Parquet dataset of 550 MB (9 blocks) in HDFS. I want to run SQL
queries against it repetitively.
A few questions:
1. When I do the below (persist to memory after reading from disk), it takes a
lot of time to persist to memory; any suggestions on how to tune this?
val inputP
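One knob worth trying is sqlContext.cacheTable, which stores the table in Spark SQL's compressed in-memory columnar format rather than caching raw rows; a sketch (the table name and path are placeholders):

val input = sqlContext.parquetFile("hdfs:///data/input.parquet")
input.registerTempTable("input_tbl")
sqlContext.cacheTable("input_tbl")      // compressed, columnar in-memory cache
sqlContext.sql("SELECT COUNT(*) FROM input_tbl").collect() // materializes the cache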
How is Spark faster than MR when the data is on disk in both cases?
Reduce *spark.sql.shuffle.partitions* from the default of 200 to the total
number of cores.
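For example, on a cluster with 16 cores in total:

sqlContext.setConf("spark.sql.shuffle.partitions", "16")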
Hi Team,
I have a Hive partitioned table whose partition column contains spaces.
When I try to run any query, say a simple "Select * from table_name", it
fails.
*Please note the same was working in Spark 1.2.0; now I have upgraded to
1.3.1. Also, there is no change in my application code base.*
If I
It does depend on the network IO within your cluster & the CPU usage. That
said, the difference in run time should not be huge (assuming you are not
running any other job in the cluster in parallel).
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
/** Created by samyamaiti on 12/25/14. */
object Driver {
  def main(args: Array[String]): Unit = {
    // CheckPoint dir in HDFS
    val checkpointDirectory = "hdfs://localhost:8020/user/samyamaiti/SparkCheckpoint1"
    // functionToCreateContext (batch interval below is assumed, not from the original)
    def functionToCreateContext(): StreamingContext = {
      val ssc = new StreamingContext(new SparkConf(), Seconds(10))
      ssc.checkpoint(checkpointDirectory)
      ssc
    }
    // Recover from checkpoint if present, otherwise create a new context
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
Sorry for the typo.
The Apache Hadoop version is 2.6.0.
Regards,
Sam
Hi All,
Please clarify:
Can we say 1 RDD is generated every batch interval?
If the above is true, then is the foreachRDD() operator executed once & only
once for each batch?
Regards,
Sam
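For what it's worth: each DStream yields exactly one RDD per batch interval, and the foreachRDD() body runs once per batch, on the driver. A small sketch illustrating this (the stream name is a placeholder):

var batches = 0L
stream.foreachRDD { rdd =>
  batches += 1 // executes once per batch interval, on the driver
  println(s"batch #$batches contains ${rdd.count()} records")
}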
Resolved.
I changed to the Apache Hadoop 2.4.0 & Apache Spark 1.2.0 combination, and all
works fine.
It must be because the 1.2.0 version of Spark was compiled against Hadoop 2.4.0.
Hi Experts,
A few general queries:
1. Can a single block/partition in an RDD hold more than 1 Kafka message, or
will there be one & only one Kafka message per block? More broadly, is the
message count related to the block in any way, or is it just that any message
received within a particular b
Hi Experts,
Like saveAsParquetFile on SchemaRDD, is there an equivalent to store to an ORC
file?
I am using Spark 1.2.0.
As per the link below, it looks like this is not part of 1.2.0, so any recent
update would be great.
https://issues.apache.org/jira/browse/SPARK-2883
Till the next release, is there a w
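For reference, native ORC support landed in a later release (Spark 1.4, through a HiveContext); there the write path looks like this sketch (paths are placeholders):

// Requires Spark 1.4+ and a HiveContext
df.write.format("orc").save("hdfs:///data/out_orc")
// Reading it back
val loaded = hiveContext.read.format("orc").load("hdfs:///data/out_orc")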