We have dockerized the Spark master and worker(s) separately and are using them in our dev environment. We don't use Mesos, though; we run in standalone mode, but adding Mesos should not be that difficult, I think.
Regards
Venkat
---
Hi Cheng,
Thank you very much for taking the time to provide a detailed explanation.
I tried a few of the things you suggested, and some others.
The ContactDetail table (8 GB) is the fact table and DAgents is the dim table (<500 KB), the reverse of what you assumed, but your ideas still apply.
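For anyone landing on this thread from the archive: with a <500 KB dim table, a map-side join that broadcasts DAgents is one way to keep the 8 GB fact table from being shuffled, and to keep one hot join key from pinning the whole join onto a single task. A minimal sketch against the RDD API; the paths, delimiter, and column positions are made up:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.1

val sc = new SparkContext(new SparkConf().setAppName("broadcast-join-sketch"))

// 8 GB fact table, keyed by the join column (hypothetical layout)
val contactDetail = sc.textFile("hdfs:///data/contact_detail")
  .map(_.split('\t'))
  .map(cols => (cols(0), cols))

// Small dim table: collect it to the driver and broadcast it to executors
val dAgents = sc.textFile("hdfs:///data/dagents")
  .map(_.split('\t'))
  .map(cols => (cols(0), cols(1))) // (agentId, agentName)
  .collectAsMap()
val dAgentsBc = sc.broadcast(dAgents)

// Map-side join: the fact table is never shuffled
val joined = contactDetail.flatMap { case (agentId, row) =>
  dAgentsBc.value.get(agentId).map(name => (agentId, name, row))
}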
Bump up.
Michael Armbrust, anybody from the Spark SQL team?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-table-Join-one-task-is-taking-long-tp20124p20218.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---
This is more of a Scala question than a Spark question. Which dependency injection framework do you use for Scala when working with Spark? Is http://scaldi.org/ recommended?
Regards
Venkat
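Not a recommendation, but for what it's worth, scaldi wiring is compact and runs entirely on the driver. A minimal sketch from memory of scaldi's documented API; the service names are invented:

import scaldi.{Injectable, Injector, Module}

// Hypothetical service, just to show the wiring
trait GreetingService { def greet(name: String): String }
class EnglishGreetingService extends GreetingService {
  def greet(name: String) = s"Hello, $name"
}

class AppModule extends Module {
  bind [GreetingService] to new EnglishGreetingService
}

// Classes pull their dependencies from an implicit Injector
class Greeter(implicit inj: Injector) extends Injectable {
  val service = inject [GreetingService]
}

val greeter = new Greeter()(new AppModule)
println(greeter.service.greet("Spark"))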
---
Environment: Spark 1.1, 4-node Spark and Hadoop dev cluster, 6 cores and 32 GB RAM each. Default serialization, standalone mode, no security.
Data was sqooped from a relational DB to HDFS and is partitioned uniformly across HDFS. I am reading a fact table about 8 GB in size and one small dim table from
---
The method

def updateStateByKey[S: ClassTag](updateFunc: (Seq[V], Option[S]) => Option[S]): DStream[(K, S)]

takes a DStream[(K, V)] and produces a DStream[(K, S)] in Spark Streaming.
We have an input DStream[(K, V)] that has 40,000 elements. On average we update 1,000 of them every 3 seconds.
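For context, here is a minimal runnable sketch of updateStateByKey keeping a running count per key; the socket source, checkpoint path, and 3-second batch interval are illustrative:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits on Spark 1.1

val conf = new SparkConf().setAppName("update-state-sketch")
val ssc = new StreamingContext(conf, Seconds(3))
ssc.checkpoint("hdfs:///tmp/streaming-checkpoint") // required for stateful ops

val events = ssc.socketTextStream("localhost", 9999).map(word => (word, 1))

// Called once per key per batch: Seq holds this batch's new values,
// Option holds the previous state (None the first time a key is seen)
def updateFunc(newValues: Seq[Int], state: Option[Int]): Option[Int] =
  Some(state.getOrElse(0) + newValues.sum)

val counts = events.updateStateByKey(updateFunc)
counts.print()

ssc.start()
ssc.awaitTermination()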
---
TD,
We are seeing the same issue. We struggled through this until we found this post and the workaround.
A quick fix in the Spark Streaming software would help a lot of others who are encountering this and pulling their hair out over why RDDs on some partitions are not computed (we ended up spending
---
For the time being, we decided to take a different route. We created a REST API layer in our app and allow SQL queries to be passed in via REST. Internally we hand the query to the Spark SQL layer over the RDDs and return the results. With this, Spark SQL is supported for our RDDs via the REST API now.
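In case it helps others on this route: the core of such a layer is small. A sketch against the Spark 1.1 API (a case-class RDD registered as a temp table on a SQLContext); the record type, path, and REST plumbing are invented, only the Spark SQL calls are real:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type for the RDD we expose
case class Contact(id: Int, agentId: Int, duration: Double)

val sc = new SparkContext(new SparkConf().setAppName("rest-sql-sketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD // implicit RDD[Contact] -> SchemaRDD

val contacts = sc.textFile("hdfs:///data/contacts.csv")
  .map(_.split(','))
  .map(c => Contact(c(0).toInt, c(1).toInt, c(2).toDouble))
contacts.registerTempTable("contacts")

// The REST handler body: take the query string, return rows as CSV lines
def runQuery(query: String): Array[String] =
  sqlContext.sql(query).collect().map(_.mkString(","))

runQuery("SELECT agentId, COUNT(*) FROM contacts GROUP BY agentId")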
---
I have a data file that I need to process using Spark. The file has multiple events for different users, and I need to process the events for each user in the order they appear in the file:
User 1 : Event 1
User 2: Event 1
User 1 : Event 2
User 3: Event 1
User 2: Event 2
User 3: Event 2
etc.
I want to
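Assuming the goal is to process each user's events in file order, one way is to tag every line with its position via zipWithIndex, group by user, and sort each group by position. A sketch; the "User : Event" line format is taken from the example above:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.1

val sc = new SparkContext(new SparkConf().setAppName("per-user-order-sketch"))

val perUser = sc.textFile("hdfs:///data/events.txt")
  .zipWithIndex() // (line, position); for a text file the indices follow file order
  .map { case (line, pos) =>
    val Array(user, event) = line.split(":").map(_.trim)
    (user, (pos, event))
  }
  .groupByKey()                              // gather each user's events
  .mapValues(_.toSeq.sortBy(_._1).map(_._2)) // restore file order per user

perUser.foreach { case (user, events) =>
  events.foreach(event => println(s"$user -> $event")) // runs on the executors
}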
---
1) If I have a standalone Spark application that has already built an RDD, how can SharkServer2, or for that matter Shark, access 'that' RDD and run queries on it? In all the examples I have seen for Shark, the RDDs (tables) are created within Shark's own SparkContext and processed there.
This is not possible out of the box
---
Thanks Michael.
OK, will try SharkServer2.
But I have some basic questions on a related area:
1) If I have a standalone Spark application that has already built an RDD, how can SharkServer2, or for that matter Shark, access 'that' RDD and run queries on it? In all the examples I have seen for Shark, the RDDs (tables) are created within Shark's own SparkContext.
---
We are planning to use the latest Spark SQL on RDDs. If a third-party application wants to connect to Spark via JDBC, does Spark SQL have support for that? (We want to avoid going through the Shark/Hive JDBC layer, as we need good performance.)
BTW, we also want to do the same for Spark Streaming - with Spark SQL
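On the JDBC question: Spark 1.1 ships a Thrift JDBC/ODBC server for Spark SQL (started with sbin/start-thriftserver.sh) that speaks the HiveServer2 wire protocol, so a third-party app connects with the ordinary Hive JDBC driver rather than going through Shark. A client-side sketch; the host, port, and table name are assumptions:

import java.sql.DriverManager

// The Hive JDBC driver (org.apache.hive:hive-jdbc) must be on the classpath
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()
val rs   = stmt.executeQuery("SELECT COUNT(*) FROM contacts")
while (rs.next()) println(rs.getLong(1))
rs.close(); stmt.close(); conn.close()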