Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-19 Thread David Holiday
hi all - thx for the alacritous replies! so regarding how to get things from notebook to spark and back, am I correct that spark-submit is the way to go? DAVID HOLIDAY Software Engineer 760 607 3300 | Office 312 758 8385 | Mobile dav...@annaisystems.com<mailto:broo...@annaisystems.

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-19 Thread David Holiday
kk - I'll put something together and get back to you with more :-)

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-24 Thread David Holiday
hat I haven't specified any parameters as to which table to connect with, what the auths are, etc. so my question is: what do I need to do from here to get those first ten rows of table data into my RDD?

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-25 Thread David Holiday
hi Irfan, thanks for getting back to me - i'll try the accumulo list to be sure. what is the normal use case for spark though? I'm surprised that hooking it into something as common and popular as accumulo isn't more of an every-day task.

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-26 Thread David Holiday
see the entire thread of code, responses from notebook, etc. I'm going to try invoking the same techniques both from within a stand-alone scala program and from the shell itself to see if I can get some traction. I'll report back when I have more data. cheers (and thx!)

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-26 Thread David Holiday
se - there are 10,000 rows of data in the table I pointed to. however, when I try to grab the first element of data thusly: rddX.first I get the following error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: org
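
With Spark's default Java serializer, a "not serializable result" failure like the one quoted above generally means the object a task tried to send back to the driver does not implement java.io.Serializable. The class name is cut off in this excerpt, so the sketch below does not try to name it. It is plain Java with no Spark or Accumulo dependency, and RawRecord and PlainRecord are hypothetical stand-ins invented for illustration. It only shows the general idea behind one common workaround: copy each record into a simple serializable form before calling an action such as first. This is not necessarily the fix that was applied later in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializableResultSketch {

    // Hypothetical stand-in for a record class that does NOT implement
    // java.io.Serializable (an assumption about the truncated class name).
    static final class RawRecord {
        final String row;
        final String value;

        RawRecord(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    // Hypothetical serializable copy holding only plain String fields.
    static final class PlainRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String row;
        final String value;

        PlainRecord(RawRecord raw) {
            this.row = raw.row;
            this.value = raw.value;
        }
    }

    // Attempts standard Java serialization, the same mechanism Spark's
    // default serializer relies on; returns false if the object's class
    // (or one of its fields) is not Serializable.
    static boolean canJavaSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RawRecord raw = new RawRecord("row_0001", "some value");
        System.out.println("raw record serializable: " + canJavaSerialize(raw));
        System.out.println("plain copy serializable: " + canJavaSerialize(new PlainRecord(raw)));
    }
}
```

In a Spark job the same conversion would typically happen in a map step that runs before the action, so only the plain copies travel back to the driver. Switching the spark.serializer setting to Kryo is another commonly used option; whether either applies here depends on the class named in the full error message.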

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-26 Thread David Holiday
w0t! that did it! t/y so much! I'm going to put together a pastebin or something that has all the code put together so if anyone else runs into this issue they will have some working code to help them figure out what's going on.

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-26 Thread David Holiday
will do! I've got to clear with my boss what I can post and in what manner, but I'll definitely do what I can to put some working code out into the world so the next person who runs into this brick wall can benefit from all this :-D

Re: Anatomy of RDD : Deep dive into RDD data structure

2015-04-01 Thread David Holiday
w00t - t/y for this! I'm currently doing a deep dive into the RDD memory footprint under various conditions so this is timely and helpful.

sparkR equivalent to SparkContext.newAPIHadoopRDD?

2015-05-02 Thread David Holiday
how to make the magic happen with sparkR. Anyone got any ideas? thanks!