Dependency Injection and Microservice development with Spark

2016-12-23 Thread Chetan Khatri
Hello Community, the current approach I am using for Spark job development is Scala + SBT with an uber jar, plus a yml properties file to pass configuration parameters. But if I would like to use dependency injection and microservice development (Spring Boot-style features) in Scala, then what would be the standard…
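As a rough illustration of the dependency-injection part of the question, here is a minimal constructor-injection sketch in Scala with no framework at all; the class names, config fields, and paths below are illustrative, not taken from the thread:

    import org.apache.spark.sql.SparkSession

    // Configuration is parsed once (e.g. from the yml properties file) and
    // handed in, instead of being read from a global inside the job.
    final case class JobConfig(inputPath: String, outputPath: String)

    // The job declares its dependencies in its constructor...
    class IngestJob(spark: SparkSession, config: JobConfig) {
      def run(): Unit = {
        val df = spark.read.parquet(config.inputPath)
        df.write.mode("overwrite").parquet(config.outputPath)
      }
    }

    // ...and they are wired together once, at the edge of the application.
    object Main {
      def main(args: Array[String]): Unit = {
        val spark  = SparkSession.builder().appName("ingest-job").getOrCreate()
        val config = JobConfig(inputPath = args(0), outputPath = args(1))
        new IngestJob(spark, config).run()
        spark.stop()
      }
    }

The same wiring can later be delegated to a DI container (Guice, MacWire, etc.) without changing the job class itself.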

Negative number of active tasks

2016-12-23 Thread Andy Dang
Hi all, Today I hit a weird bug in Spark 2.0.2 (vanilla Spark) - the executor tab shows a negative number of active tasks. I have about 25 jobs, each with 20k tasks, so the numbers are not that crazy. What could possibly be the cause of this bug? This is the first time I've seen it, and the only special…

MapOutputTracker.getMapSizesByExecutorId and mutation on the driver?

2016-12-23 Thread Jacek Laskowski
Hi, I've been reviewing how MapOutputTracker works and can't understand the comment [1]: "// Synchronize on the returned array because, on the driver, it gets mutated in place". How is this possible since "the returned array" is a local value? I'm stuck and would appreciate help. Thanks! (It also…
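For what it's worth, the puzzle about "a local value" usually comes down to the fact that returning an array on the JVM returns a reference, not a copy; a standalone Scala sketch (deliberately not Spark source code) of why a caller may still need to synchronize on it:

    // The "returned array" is only a reference; the owner can keep mutating
    // the very same array after handing it out.
    class StatusTracker {
      private val sizes = Array.fill(4)(0L)

      def snapshot(): Array[Long] = sizes            // a reference, not a copy

      def update(i: Int, v: Long): Unit = sizes.synchronized {
        sizes(i) = v                                 // mutation is visible through the "local" value
      }
    }

    object Demo extends App {
      val tracker = new StatusTracker
      val local   = tracker.snapshot()               // a local val, but shared state
      tracker.update(0, 42L)
      local.synchronized {                           // hence synchronizing on the returned array
        println(local.mkString(","))                 // prints 42,0,0,0
      }
    }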

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
I used to use an uber jar in Spark 1.x because of classpath issues (we couldn't re-model our dependencies based on our code, and thus the cluster's runtime dependencies could be very different from running Spark directly in the IDE). We had to use the userClassPathFirst "hack" to work around this. With Spark 2, i…
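For reference, the "hack" mentioned here corresponds to Spark's userClassPathFirst configuration keys; a sketch naming them (in practice they are usually passed to spark-submit via --conf rather than set programmatically):

    import org.apache.spark.SparkConf

    object ClasspathFirstConf {
      val conf = new SparkConf()
        .set("spark.driver.userClassPathFirst", "true")    // prefer user-provided classes on the driver
        .set("spark.executor.userClassPathFirst", "true")  // prefer user-provided classes on executors
    }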

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Chetan Khatri
Andy, thanks for the reply. If we download all the dependencies to a separate location and link them with the Spark job jar on the Spark cluster, is that the best way to execute a Spark job? Thanks. On Fri, Dec 23, 2016 at 8:34 PM, Andy Dang wrote: > I used to use an uber jar in Spark 1.x because of classpath issues (we…

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
We re-model the Spark dependencies and ours together and chuck them under the /jars path. There are other ways to do it, but we want the classpath to be strictly as close to development as possible. --- Regards, Andy On Fri, Dec 23, 2016 at 6:00 PM, Chetan Khatri wrote: > Andy, thanks for the reply.
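One common way to keep the job jar thin when the cluster already supplies Spark (and anything else placed under the /jars path) is to mark those dependencies as "provided" in build.sbt; a sketch, with versions assumed:

    // "provided" keeps these dependencies out of the packaged job jar,
    // because the cluster's /jars directory supplies them at runtime.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
      "org.apache.spark" %% "spark-sql"  % "2.0.2" % "provided"
    )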

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Chetan Khatri
Correct, so there are two options: the approach you suggested and the uber jar approach. I think the uber jar approach is the best practice, because if you wish to do an environment migration it would be easy, and performance-wise the uber jar approach would also be more optimised than the uber-less approach. Thanks. On Fri…
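If the uber jar route is taken, the usual sticking point is duplicate files coming from overlapping dependencies; a build.sbt sketch of the standard fix, assuming the sbt-assembly plugin is already enabled in project/plugins.sbt:

    // Drop duplicate META-INF entries from overlapping jars; for anything
    // else that clashes, keep the first copy found on the classpath.
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }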

Re: Approach: Incremental data load from HBASE

2016-12-23 Thread Chetan Khatri
Ted, correct. In my case I want an incremental import from HBase and an incremental load into Hive. Both approaches discussed earlier with indexing seem accurate to me. But just as Sqoop supports incremental import and load for RDBMSs, is there any tool which supports incremental import from HBase? On Wed, De…
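No Sqoop-style incremental tool is named in the thread; as a hedged sketch of the timestamp-based idea, an HBase Scan can be bounded by cell timestamps (lastLoadMillis is illustrative, and persisting the watermark between runs is not shown):

    import org.apache.hadoop.hbase.client.Scan

    object IncrementalScan {
      // Build a scan that only returns cells written since the previous load.
      def since(lastLoadMillis: Long): Scan = {
        val scan = new Scan()
        scan.setTimeRange(lastLoadMillis, System.currentTimeMillis())
        scan
      }
    }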

Re: Negative number of active tasks

2016-12-23 Thread Chetan Khatri
Could you share pseudocode for the same? Cheers! C Khatri. On Fri, Dec 23, 2016 at 4:33 PM, Andy Dang wrote: > Hi all, > > Today I hit a weird bug in Spark 2.0.2 (vanilla Spark) - the executor tab > shows a negative number of active tasks. > > I have about 25 jobs, each with 20k tasks so the nu…

Re: MapOutputTracker.getMapSizesByExecutorId and mutation on the driver?

2016-12-23 Thread Liang-Chi Hsieh
Hi, I think the comment [1] is only correct for "getStatistics", as it is called on the driver side. It was added to "getMapSizesByExecutorId" by mistake. Jacek Laskowski wrote: > Hi, > > I've been reviewing how MapOutputTracker works and can't understand > the comment [1]: > > // Synchroniz…