Subject: Re: [discuss] new Java friendly InputSource API
In the ctor of InputSource (I'm also considering adding an explicit
initialize call), the implementation of InputSource can execute arbitrary
code. The state in it will also be serialized and passed onto the executors.
Yes - technically you can hijack getSplits in Hadoop InputFormat to do the
same
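
For concreteness, a rough sketch of the shape of API being discussed. None of the signatures below come from the actual proposal; the names InputSource, getPartitions and RecordReader appear in this thread, the rest is guessed. The sketch only illustrates the two points above: constructor-time code runs on the driver, and the resulting state is serialized out to the executors.

  import java.io.Serializable;
  import java.util.Arrays;
  import java.util.List;

  // Placeholder types, names assumed for illustration only.
  interface InputPartition extends Serializable {}

  interface RecordReader<T> extends AutoCloseable {
    boolean next();
    T get();
  }

  // Sketch of an InputSource: the constructor runs on the driver and may
  // execute arbitrary code; whatever it stores in fields is serialized and
  // passed on to the executors -- much like driver-side work hidden inside
  // getSplits() of a Hadoop InputFormat.
  abstract class InputSource<T> implements Serializable {
    protected final List<String> locations;

    protected InputSource() {
      // Driver side. In practice this could call out to an external catalog;
      // a fixed list stands in for that here. (The explicit initialize() call
      // mentioned above would be another place for this kind of work.)
      this.locations = Arrays.asList("host-a", "host-b");
    }

    // Driver-side planning of the input's partitions.
    public abstract List<InputPartition> getPartitions();

    // Executor-side: create a reader for one partition.
    public abstract RecordReader<T> createReader(InputPartition partition);
  }
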
Hi Reynold,
You mentioned that the new API allows arbitrary code to be run on the
driver side, but it's not very clear to me how this is different from what
the Hadoop API provides. In your example of using broadcast, did you mean
broadcasting something in InputSource.getPartitions() and having InputP…
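
One way to read the broadcast part of that question, reusing the sketch above. Again, all names and the way the broadcast handle is threaded through are assumptions, not the proposal: getPartitions() broadcasts a lookup table on the driver, and the executor-side readers pull it back out via the handle instead of having the table serialized into every partition.

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.List;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.spark.broadcast.Broadcast;

  class DictionarySource extends InputSource<String> {
    private final transient JavaSparkContext sc;  // driver-only handle
    private Broadcast<List<String>> dictionary;   // small, serializable handle

    DictionarySource(JavaSparkContext sc) {
      this.sc = sc;
    }

    static class SimplePartition implements InputPartition {
      final int id;
      SimplePartition(int id) { this.id = id; }
    }

    @Override
    public List<InputPartition> getPartitions() {
      // Driver side: broadcast once; only the handle is kept in a field and
      // serialized along with this object.
      dictionary = sc.broadcast(Arrays.asList("foo", "bar", "baz"));
      List<InputPartition> parts = new ArrayList<>();
      for (int i = 0; i < 4; i++) {
        parts.add(new SimplePartition(i));
      }
      return parts;
    }

    @Override
    public RecordReader<String> createReader(InputPartition partition) {
      // Executor side: fetch the broadcast value locally.
      final List<String> dict = dictionary.value();
      return new RecordReader<String>() {
        private int i = -1;
        @Override public boolean next() { return ++i < dict.size(); }
        @Override public String get()   { return dict.get(i); }
        @Override public void close()   {}
      };
    }
  }
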
I'm also super interested in this. Flambo (our Clojure DSL) wraps the Java
API, and it would be great to have this.
On Tue, Apr 21, 2015 at 4:10 PM, Reynold Xin wrote:
It can reuse. That's a good point and we should document it in the API
contract.
On Tue, Apr 21, 2015 at 4:06 PM, Punyashloka Biswal wrote:
Reynold, thanks for this! At Palantir we're heavy users of the Java APIs
and appreciate being able to stop hacking around with fake ClassTags :)
Regarding this specific proposal, is the contract of RecordReader#get
intended to be that it returns a fresh object each time? Or is it allowed
to mutate…
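
The two possible contracts in miniature, using the assumed RecordReader shape from the earlier sketch; given the answer above ("it can reuse"), it is the second variant that callers should be prepared for.

  // (a) Fresh object per call: callers may safely hold on to returned records.
  class FreshRecordReader implements RecordReader<int[]> {
    private int i = -1;
    @Override public boolean next() { return ++i < 3; }
    @Override public int[] get()    { return new int[] { i }; }  // new object each call
    @Override public void close()   {}
  }

  // (b) Reuse allowed: the reader mutates one buffer in place and returns it
  // every time, so callers that keep records around must copy them first --
  // the same caveat as Writable reuse with Hadoop record readers.
  class ReusingRecordReader implements RecordReader<int[]> {
    private final int[] record = new int[1];
    private int i = -1;
    @Override public boolean next() {
      if (++i >= 3) return false;
      record[0] = i;              // overwrite the single shared buffer
      return true;
    }
    @Override public int[] get()    { return record; }  // same object each call
    @Override public void close()   {}
  }

Under the reusing contract, a caller that collects records needs something like results.add(reader.get().clone()) rather than storing the returned reference directly.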