Had a short sync with Tom. I am going to postpone this for now since this
case is very unlikely - I have seen it only twice in the last 5 years.
We'll put it to a vote if we happen to see this more often, and make a
decision based on the feedback in the vote thread.
On Mon, May 11, 2020 at 11:08 PM, Hyukjin Kwon wrote:
The guide is our official guide; see "Code Style Guide" at
http://spark.apache.org/contributing.html.
As I said, this is general guidance, not a hard, strict policy. I don't
intend to change existing APIs either.
I would not like to start the vote while I see a clear objection to
address, T…
So as I've already stated, and it looks like two others have issues with
number 4 as written as well, I'm against posting this as is. I do not think
we should recommend 4 for the public, user-facing Scala API.
Also note the page you linked is a Databricks page; while I know we reference
it as a st…
I will wait a couple more days, and if I hear no objection, I will
document this at
https://github.com/databricks/scala-style-guide#java-interoperability.
On Thu, May 7, 2020 at 9:18 PM, Hyukjin Kwon wrote:
Hi all, I would like to proceed with this. Are there more thoughts on this?
If not, I would like to go ahead with the proposal here.
On Thu, Apr 30, 2020 at 10:54 PM, Hyukjin Kwon wrote:
Nothing is urgent. I just don't want to leave it undecided and keep
adding Java APIs inconsistently, as is currently happening.
We should have a set of coherent APIs. It's very difficult to change APIs
once they are out in releases. I guess I have seen people here agree with
having a general…
I feel a little pushed... :-) I still don't get why it's urgent to make
the decision now. AFAIK, it's common practice for Java programmers to
handle Scala type conversions themselves when they prepare to invoke Scala
libraries. I'm not sure which one is the Java programmers' root complaint,
Sca…
There was a typo in the previous email; I am re-sending:
Hm, I thought you meant you prefer 3 over 4 but don't mind particularly.
I don't mean to wait for more feedback; that looks likely to become a
deadlock, which would be the worst case.
I was suggesting we pick one way first and stick to it. If we find out
something later, we can discuss more about changing…
Sorry, I'm not sure what your last email means. Does it mean you are putting
it up for a vote, or just waiting to get more feedback? I disagree with making
option 4 the rule, but agree having a general rule makes sense. I think we
need a lot more input to make the rule, as it affects the APIs.
I am not seeing an explicit objection here; rather, people tend to agree
with the proposal in general.
I would like to step forward rather than leave it deadlocked - the worst
choice here is to postpone and abandon this discussion while this
inconsistency remains.
I don't currently target t…
Spark has aimed for a unified API set rather than separate Java classes,
to reduce the maintenance cost,
e.g. JavaRDD <> RDD vs. DataFrame. These JavaXXX classes are mostly legacy.
I think it's best to stick to approach 4 in general cases.
Other options might have to be considere…
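For context, a brief sketch of the two patterns contrasted above. The
signatures are paraphrased from Spark's public API; reading "option 4" as the
one-class pattern is an interpretation of this thread, not a definition
stated in it.

    // Legacy parallel-class pattern: every Scala class needs a Java mirror,
    // doubling the API surface for Spark and for libraries built on top.
    //   org.apache.spark.rdd.RDD[T]#collect(): Array[T]
    //   org.apache.spark.api.java.JavaRDD[T]#collect(): java.util.List[T]
    //
    // Unified pattern: one class serves both languages, adding a
    // Java-friendly variant only where the Scala return type is awkward
    // to use from Java.
    //   org.apache.spark.sql.Dataset[T]#collect(): Array[T]
    //   org.apache.spark.sql.Dataset[T]#collectAsList(): java.util.List[T]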
The con is much more than just the extra effort of maintaining a parallel
API. It puts the burden of maintaining a parallel API on all libraries and
library developers as well. That's one of the primary reasons we moved away
from the RDD vs JavaRDD approach in the old RDD API.
On Tue, Apr 28, 2020 at 12:3…
Frankly, I also love having pure Java types in the Java API and Scala types
in the Scala API. :-)
If we don't treat Java as a "FRIEND" of Scala, but just like Python, maybe
we can adopt option 1, the specific Java classes. (But I don't like the
`Java` prefix, which is redundant when I'm coding Java…
The problem is that using Scala instances from the Java side is generally
discouraged, to the best of my knowledge.
A Java user won't likely know asJava in Scala, but a Scala user will likely
know both asScala and asJava.
On Tue, Apr 28, 2020 at 11:35 AM, ZHANG Wei wrote:
How about making a small change to option 4:
Keep the Scala API returning Scala type instances, while providing an
`asJava` method to return a Java type instance.
Scala 2.13 provides CollectionConverters [1][2][3], which can be supported
naturally in the upcoming Spark dependency upgrade. For
cu…
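A minimal sketch of the idiom proposed above, using scala.jdk.CollectionConverters
(the Scala 2.13 API referenced here; scala.collection.JavaConverters is the
2.12 equivalent). The names `options` and `optionsAsJava` are hypothetical,
for illustration only:

    import scala.jdk.CollectionConverters._

    object AsJavaSketch {
      // The Scala-facing method keeps returning a Scala type...
      def options: Map[String, String] = Map("spark.app.name" -> "demo")

      // ...while a hypothetical asJava-style companion exposes the Java
      // view, so Java callers never touch Scala collections.
      def optionsAsJava: java.util.Map[String, String] = options.asJava

      def main(args: Array[String]): Unit = {
        val javaMap = optionsAsJava
        // Scala callers can round-trip a Java collection back with asScala.
        println(javaMap.asScala("spark.app.name"))
      }
    }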
I would like to make sure I am open to other options that can be
considered situationally and based on context.
That's okay, and I don't aim to restrict this here. For example, DSv2: I
understand it's written in Java because Java
interfaces arguably bring better performance. That's why vecto…
> One thing we could do here is use Java collections internally and make
the Scala API a thin wrapper around Java -- like how Python works.
> Then adding a method to the Scala API would require adding it to the Java
API, and we would keep the two more in sync.
I think it can be an appropriate idea…
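A sketch of the quoted idea under assumed names (`StateStore` and its
accessors are invented for illustration, not Spark APIs): the Java
collection is the single internal source of truth, and the Scala accessor
is a thin view over it, so the two surfaces cannot drift apart.

    import scala.jdk.CollectionConverters._

    class StateStore {
      // Java collection as the single internal source of truth.
      private val entries = new java.util.HashMap[String, Long]()

      def put(key: String, value: Long): Unit = entries.put(key, value)

      // Java-facing accessor returns the internal Java type directly.
      def entriesAsJava: java.util.Map[String, Long] = entries

      // Scala-facing accessor is a thin wrapper over the same map, so any
      // method added here is automatically backed by the Java-visible state.
      def entriesAsScala: scala.collection.mutable.Map[String, Long] =
        entries.asScala
    }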
I think the right choice here depends on how the object is used. For
developer and internal APIs, I think standardizing on Java collections
makes the most sense.
For user-facing APIs, it is awkward to return Java collections to Scala
code -- I think that's the motivation for Tom's comment. For use…
Let's stick with whichever option takes less maintenance effort then, rather
than leaving it undecided and delaying while this inconsistency remains.
I don't think we can get very meaningful data about this soon, given
that we haven't heard many complaints about this in general so far.
The point of this thread is to ma…
IIUC, we are moving away from having two classes for Java and Scala, like
JavaRDD and RDD. It's much simpler to maintain and use a single class.
I don't have a strong preference between options 3 and 4. We may need to
collect more data points from actual users.
On Mon, Apr 27, 2020 at 9:50 PM, Hyukji…
Scala users are arguably more prevalent than Java users, yes.
Using Java instances on the Scala side is legitimate, and they are already
being used in multiple places. I don't believe Scala
users find this un-Scala-friendly, as it's legitimate and already in use.
I personally find it's…
I agree general guidance is good so we keep the APIs consistent. I don't
necessarily agree that 4 is the best solution, though. I agree it's nice to
have one API, but it is less friendly on the Scala side. Searching for the
equivalent Java API shouldn't be hard, as it should be very close…
The guidance sounds fine if the general message is 'keep it simple'.
The right approach might be pretty situational. For example, RDD has a
lot of methods that need a Java variant. Putting all the overloads in
one class might be harder to figure out than making a separate return
type with those met…
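To make that trade-off concrete, a hypothetical sketch (class and method
names invented for illustration) of the two layouts weighed above:
Java-variant overloads folded into one class versus a separate
Java-facing return type.

    // One-class layout: Java-friendly variants sit next to the Scala ones.
    // A single entry point, but the method list grows with every variant.
    abstract class Events {
      def collect(): Array[String]                 // Scala-friendly
      def collectAsList(): java.util.List[String]  // Java-friendly variant
    }

    // Separate-return-type layout: a dedicated Java view keeps each surface
    // small, at the cost of a parallel class to maintain (the JavaRDD model).
    abstract class JavaEvents {
      def collect(): java.util.List[String]
    }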