Spark - Scala-Java interoperability

2020-08-16 Thread Ramesh Mathikumar
Hi Team, a quick question from my side. Can I use spark-submit with a job that mixes Java and Scala in a single workflow? By single workflow I mean the main program is in Java (wrapped in Spark) and it calls a module, written in Scala (wrapped in Spark), to calculate something on the payload. Are there
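(For concreteness, a single assembly jar containing both Java and Scala classes is submitted like any other job; the class name, jar path, and master below are illustrative, not from the thread:)

```shell
# Sketch: one fat jar holds both the Java main class and the Scala module;
# spark-submit only needs the jar and the entry point.
spark-submit \
  --class com.example.JavaMainApp \
  --master local[*] \
  target/workflow-assembly-0.1.0.jar
```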

Re: Spark - Scala-Java interoperability

2020-08-16 Thread Sean Owen
That should be fine. The JVM doesn't care how the bytecode it is executing was produced. As long as you were able to compile it all together - which sometimes means using plugins like scala-maven-plugin for mixed compilation - the result will run fine. On Sun, Aug 16, 2020 at 4:28 PM Ramesh Mathikuma
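A minimal sketch of the interop Sean describes: a Scala object compiled in the same module as the Java code is just a JVM class, and scalac emits static forwarders, so Java can call it directly. The names `PayloadScorer` and `score` are illustrative, not from the thread.

```scala
// Scala side: an ordinary object. scalac also emits static forwarder
// methods, so from Java this is callable as PayloadScorer.score(...).
object PayloadScorer {
  def score(payload: Seq[Double]): Double =
    if (payload.isEmpty) 0.0 else payload.sum / payload.size
}

// Java side, compiled in the same build (e.g. via scala-maven-plugin):
//   double s = PayloadScorer.score(
//       scala.jdk.javaapi.CollectionConverters.asScala(javaList).toSeq());

println(PayloadScorer.score(Seq(1.0, 2.0, 3.0)))  // prints 2.0
```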

Is there any possibility to avoid double computation in case of RDD checkpointing

2020-08-16 Thread Ivan Petrov
Hi! I use RDD checkpoint before writing to Mongo to avoid duplicate records in the DB. It seems like the driver writes the same data twice in case of task failure: data calculated - mongo _id created - spark mongo connector writes data to Mongo - task crashes - (BOOM!) Spark recomputes the partition and gets ne
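One common alternative (or complement) to checkpointing for this failure mode is to make the write idempotent: derive the Mongo `_id` deterministically from the record's natural key instead of generating a fresh one, so a recomputed partition produces the same ids and a retried write upserts rather than inserting duplicates. This is a sketch under assumed record and field names, not the poster's actual schema.

```scala
import java.security.MessageDigest

// Sketch: a deterministic _id from the record's natural key. A retried
// or recomputed task yields the same id for the same logical record,
// so an upsert into Mongo cannot create duplicates.
final case class Event(userId: String, day: String, amount: Double)

def stableId(e: Event): String =
  MessageDigest.getInstance("SHA-256")
    .digest(s"${e.userId}|${e.day}".getBytes("UTF-8"))
    .map(b => f"$b%02x")
    .mkString

val first  = stableId(Event("u1", "2020-08-16", 10.0))
val replay = stableId(Event("u1", "2020-08-16", 10.0))  // task retried
assert(first == replay)  // same key => same _id => idempotent write
```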