To complete the design pattern, see: http://stackoverflow.com/questions/30450763/spark-streaming-and-connection-pool-implementation
Petr

On Mon, Sep 21, 2015 at 10:02 PM, Romi Kuntsman <r...@totango.com> wrote:

Cody, that's a great reference!
As shown there, the best way to connect to an external database from the workers is to create a connection pool on (each) worker. The driver must pass, via broadcast, the connection string, but not the connection object itself and not the spark context.

On Mon, Sep 21, 2015 at 5:31 PM Cody Koeninger <c...@koeninger.org> wrote:

That isn't accurate; I think you're confused about foreach.

Look at
http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd

On Mon, Sep 21, 2015 at 7:36 AM, Romi Kuntsman <r...@totango.com> wrote:

foreach is something that runs on the driver, not the workers.

If you want to perform some function on each record from cassandra, you need to do cassandraRdd.map(func), which will run distributed on the spark workers.

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Mon, Sep 21, 2015 at 3:29 PM, Priya Ch <learnings.chitt...@gmail.com> wrote:

Yes, but I need to read from the cassandra db within a spark transformation, something like:

dstream.foreachRDD { rdd =>
  rdd.foreach { message =>
    sc.cassandraTable()
    .
    .
    .
  }
}

Since rdd.foreach gets executed on workers, how can I make sparkContext available on workers?

Regards,
Padma Ch

On Mon, Sep 21, 2015 at 5:10 PM, Ted Yu <yuzhih...@gmail.com> wrote:

You can use a broadcast variable for passing connection information.

Cheers

On Sep 21, 2015, at 4:27 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:

Can I use this sparkContext on executors?
In my application, I have a scenario of reading certain records from the db into an rdd. Hence I need sparkContext to read from the DB (cassandra in our case).

If sparkContext couldn't be sent to executors, what is the workaround for this?

On Mon, Sep 21, 2015 at 3:06 PM, Petr Novak <oss.mli...@gmail.com> wrote:

add @transient?

On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:

Hello All,

How can I pass sparkContext as a parameter to a method in an object? Passing sparkContext is giving me a TaskNotSerializable exception.

How can I achieve this?

Thanks,
Padma Ch
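The thread converges on the design pattern from the linked streaming guide: never ship the SparkContext or a connection object to the workers; instead, broadcast the connection string and create the connection on the worker, once per partition (or take it from a per-worker pool), rather than once per record. Spark itself can't run outside a cluster, so here is a minimal, Spark-free Scala sketch of that idea; the `Connection` class and all names are hypothetical stand-ins for a real Cassandra/JDBC session, and the `Seq[Seq[String]]` stands in for an RDD's partitions:

```scala
object PerPartitionDemo {
  // Hypothetical stand-in for a real database session.
  final class Connection {
    var written = 0
    def write(record: String): Unit = written += 1
    def close(): Unit = ()
  }

  // Returns (connectionsOpened, recordsWritten) for the given "partitions".
  // Mirrors rdd.foreachPartition: one connection per partition, not per record.
  def run(partitions: Seq[Seq[String]]): (Int, Int) = {
    var opened = 0
    var written = 0
    partitions.foreach { partition =>
      val conn = new Connection   // created on the "worker", inside the partition
      opened += 1
      partition.foreach(conn.write) // reused for every record in the partition
      written += conn.written
      conn.close()
    }
    (opened, written)
  }

  def main(args: Array[String]): Unit =
    println(run(Seq(Seq("a", "b"), Seq("c", "d", "e"), Seq("f")))) // → (3,6)
}
```

In real Spark Streaming code the loop body would sit inside `dstream.foreachRDD { rdd => rdd.foreachPartition { iter => ... } }`, with the connection built from a broadcast connection string or borrowed from a worker-local pool, exactly as Romi and the guide describe.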