From memory:

    def processor(iterator):
        for item in iterator:
            newitem = do_whatever(item)
            yield newitem

    newdata = data.mapPartitions(processor)

Basically, your function takes an iterator as an argument and must itself
return an iterator (a generator function like the one above works).
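For completeness, here is a self-contained sketch of that pattern (the
do_whatever body is a hypothetical stand-in; the rest is stock PySpark API):

    from pyspark import SparkContext

    sc = SparkContext(appName="mapPartitionsDemo")

    def do_whatever(item):
        # hypothetical stand-in for the real per-item transformation
        return item * 2

    def processor(iterator):
        # receives one whole partition's worth of items as an iterator
        for item in iterator:
            yield do_whatever(item)

    data = sc.parallelize(range(10), numSlices=2)
    newdata = data.mapPartitions(processor)
    print(newdata.collect())   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]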
On Sat, May 14, 2016 at 12:39 AM Abi w
I built this recently using the accepted answer on this SO page:
http://stackoverflow.com/questions/26741714/how-does-the-pyspark-mappartitions-function-work/26745371
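For reference, the usual payoff of mapPartitions over plain map is paying a
per-partition setup cost once instead of once per record. A minimal sketch,
where get_connection and lookup are hypothetical stand-ins for your own setup
and per-item work:

    def process_partition(iterator):
        conn = get_connection()        # hypothetical: expensive setup, once per partition
        for item in iterator:
            yield lookup(conn, item)   # hypothetical: per-item work reusing the setup
        conn.close()

    results = rdd.mapPartitions(process_partition)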
-sujit
On Sat, May 14, 2016 at 7:00 AM, Mathieu Longtin
wrote:
> From memory:
> def processor(iterator):
> for item in iterat
Considering that PySpark is a very tightly integrated library rather than
an RPC integration, I doubt a Go integration would come any time soon.
On Fri, May 13, 2016 at 10:22 PM Sourav Chakraborty
wrote:
> Folks,
> Was curious to find out if anybody ever considered/attempted to support
> golang
hi, all:
Recently we encountered a problem while using Spark SQL to write to an ORC
table; it is related to https://issues.apache.org/jira/browse/HIVE-10790.
To fix it, we decided to apply that patch to the Hive branch that Spark 1.5
relies on.
We pulled the Hive branch (
https://gith
I am not sure this is going to resolve the INSERT OVERWRITE into ORC table
issue. Can you go to Hive and run

    show create table custom.rank_less_orc_none

and send the output?
Is that table defined as transactional?
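If it is easier to check from the Spark side, the same DDL can usually be
pulled through a HiveContext. A sketch, assuming the statement is passed
through to Hive (the table name is the one from the message above):

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="checkOrcTable")
    sqlContext = HiveContext(sc)

    # Print the DDL; a transactional table will show
    # TBLPROPERTIES ('transactional'='true') in the output.
    for row in sqlContext.sql("show create table custom.rank_less_orc_none").collect():
        print(row[0])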
The other alternative is to use Spark to insert into a normal text table and
do insert fr
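Reading past the truncation, the workaround appears to be a two-step copy:
land the data in a plain text staging table from Spark, then INSERT ... SELECT
into the ORC table from Hive. A sketch of what that might look like (df and
the staging table custom.rank_less_text are hypothetical):

    # Step 1 (Spark): land the DataFrame in a plain text staging table
    df.registerTempTable("staging_view")
    sqlContext.sql("INSERT OVERWRITE TABLE custom.rank_less_text "
                   "SELECT * FROM staging_view")

    # Step 2 (Hive, not Spark): copy from the staging table into the ORC table
    #   INSERT OVERWRITE TABLE custom.rank_less_orc_none
    #   SELECT * FROM custom.rank_less_text;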