Re: pyspark mappartions ()

2016-05-14 Thread Mathieu Longtin
From memory:

    def processor(iterator):
        for item in iterator:
            newitem = do_whatever(item)
            yield newitem

    newdata = data.mapPartitions(processor)

Basically, your function takes an iterator as an argument, and must either be a generator (as above) or return an iterator.

On Sat, May 14, 2016 at 12:39 AM Abi w
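A slightly fuller sketch of the same pattern, for anyone who wants to try it. The names `do_whatever` and `run_with_spark` are hypothetical stand-ins, not something from the original thread, and the doubling transformation is only a placeholder. The generator can be exercised on a plain Python iterator before it is handed to Spark:

```python
def do_whatever(item):
    # Hypothetical per-record transformation; replace with real logic.
    return item * 2


def processor(iterator):
    # Called once per partition with an iterator over that partition's
    # records. Yielding makes this a generator, so results are produced
    # lazily instead of being collected into a list first.
    for item in iterator:
        yield do_whatever(item)


def run_with_spark(values, num_partitions=2):
    # Assumes pyspark is installed; imported lazily so the functions
    # above stay usable without a Spark installation.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    data = sc.parallelize(values, num_partitions)
    return data.mapPartitions(processor).collect()


if __name__ == "__main__":
    # Local check of the generator on an ordinary iterator, no Spark needed.
    print(list(processor(iter([1, 2, 3]))))
```

Anything expensive to set up (a database connection, a parser, a model) can be created once at the top of `processor`, before the loop, which is usually the reason to prefer `mapPartitions` over `map`.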

Re: pyspark mappartions ()

2016-05-14 Thread Sujit Pal
I built this recently using the accepted answer on this SO page:
http://stackoverflow.com/questions/26741714/how-does-the-pyspark-mappartitions-function-work/26745371

-sujit

On Sat, May 14, 2016 at 7:00 AM, Mathieu Longtin wrote:
> From memory:
> def processor(iterator):
> for item in iterat

Re: support for golang

2016-05-14 Thread Mathieu Longtin
Considering that PySpark is a very tightly integrated library rather than an RPC integration, I doubt a Go integration will come any time soon.

On Fri, May 13, 2016 at 10:22 PM Sourav Chakraborty wrote:
> Folks,
> Was curious to find out if anybody ever considered/attempted to support
> golan

spark sql write orc table on viewFS throws exception

2016-05-14 Thread linxi zeng
Hi all,

Recently we encountered a problem while using Spark SQL to write an ORC table, which is related to https://issues.apache.org/jira/browse/HIVE-10790. To fix it, we decided to apply the patch from that PR to the Hive branch that Spark 1.5 relies on. We pulled the Hive branch (https://gith

Re: spark sql write orc table on viewFS throws exception

2016-05-14 Thread Mich Talebzadeh
I am not sure this is going to resolve the INSERT OVERWRITE into ORC table issue. Can you go to Hive, run show create table custom.rank_less_orc_none, and send the output? Is that table defined as transactional? Another alternative is to use Spark to insert into a normal text table and do insert fr