Re: new jenkins update + tentative release date

2014-10-12 Thread Josh Rosen
Reminder: this Jenkins migration is happening tomorrow morning (Monday). On Fri, Oct 10, 2014 at 1:01 PM, shane knapp wrote: > reminder: this IS happening, first thing monday morning PDT. :) > > On Wed, Oct 8, 2014 at 3:01 PM, shane knapp wrote: > > > greetings! > > > > i've got some updates

Re: reading/writing parquet decimal type

2014-10-12 Thread Matei Zaharia
The fixed-length binary type can hold fewer bytes than an int64, though many encodings of int64 can probably do the right thing. We can look into supporting multiple ways to do this -- the spec does say that you should at least be able to read int32s and int64s. Matei On Oct 12, 2014, at 8:20
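To make the byte-count point concrete: Parquet's DECIMAL annotation stores the unscaled value as a signed two's-complement integer, so the narrowest fixed-length binary that can hold a given precision can be computed directly. The following is a minimal Python sketch under that assumption. The function name `min_bytes_for_precision` is hypothetical and is not taken from Spark or parquet-mr.

```python
def min_bytes_for_precision(precision: int) -> int:
    """Smallest n such that an n-byte signed (two's-complement) integer can
    hold every unscaled value with `precision` decimal digits."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    largest_unscaled = 10 ** precision - 1
    n = 1
    # The largest positive n-byte two's-complement value is 2**(8n - 1) - 1.
    while 2 ** (8 * n - 1) - 1 < largest_unscaled:
        n += 1
    return n


for p in (5, 9, 18, 38):
    print(p, min_bytes_for_precision(p))
```

Under this calculation, precision 5 fits in 3 bytes, while precision 18 already needs the full 8 bytes of an int64.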

Re: Scalastyle improvements / large code reformatting

2014-10-12 Thread Matei Zaharia
I'm also against these huge reformattings. They slow down development and backporting for trivial reasons. Let's not do that at this point, the style of the current code is quite consistent and we have plenty of other things to worry about. Instead, what you can do is as you edit a file when you

Re: Scalastyle improvements / large code reformatting

2014-10-12 Thread Patrick Wendell
Another big problem with these patches is that they make it almost impossible to backport changes to older branches cleanly (there becomes like 100% chance of a merge conflict). One proposal is to do this: 1. We only consider new style rules at the end of a release cycle, when there is the smalle

Re: Scalastyle improvements / large code reformatting

2014-10-12 Thread Reynold Xin
I actually think we should just take the bite and follow through with the reformatting. Many rules are simply not possible to enforce only on deltas (e.g. import ordering). That said, maybe there are better windows to do this, e.g. during the QA period. On Sun, Oct 12, 2014 at 9:37 PM, Josh Rosen

Scalastyle improvements / large code reformatting

2014-10-12 Thread Josh Rosen
There are a number of open pull requests that aim to extend Spark’s automated style checks (see https://issues.apache.org/jira/browse/SPARK-3849 for an umbrella JIRA).  These fixes are mostly good, but I have some concerns about merging these patches.  Several of these patches make large reforma

Re: reading/writing parquet decimal type

2014-10-12 Thread Michael Allman
Hi Matei, Thanks, I can see you've been hard at work on this! I examined your patch and do have a question. It appears you're limiting the precision of decimals written to parquet to those that will fit in a long, yet you're writing the values as a parquet binary type. Why not write them using
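Michael's suggestion amounts to choosing the physical type from the precision. The current Parquet format documentation allows DECIMAL on INT32 for precision up to 9 and on INT64 for precision up to 18, since 10^9 - 1 < 2^31 - 1 and 10^18 - 1 < 2^63 - 1. Anything larger goes into a byte array. Below is a small Python sketch of that selection and of the unscaled long that would be written. The names are hypothetical and are not from Spark's code.

```python
from decimal import Decimal

# Largest precisions whose unscaled values always fit the signed type:
# 10**9 - 1 < 2**31 - 1 and 10**18 - 1 < 2**63 - 1.
INT32_MAX_PRECISION = 9
INT64_MAX_PRECISION = 18


def physical_type_for(precision: int) -> str:
    """Pick the narrowest Parquet physical type for a DECIMAL(precision, scale)."""
    if precision <= INT32_MAX_PRECISION:
        return "INT32"
    if precision <= INT64_MAX_PRECISION:
        return "INT64"
    return "FIXED_LEN_BYTE_ARRAY"


def unscaled_value(value: Decimal, scale: int) -> int:
    """Return value * 10**scale as an int, e.g. Decimal('12.34') with scale 2 -> 1234."""
    unscaled = value.scaleb(scale)
    if unscaled != unscaled.to_integral_value():
        raise ValueError("value has more fractional digits than the scale allows")
    return int(unscaled)


print(physical_type_for(12))                # INT64
print(unscaled_value(Decimal("12.34"), 2))  # 1234
```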

Re: Decision forests don't work with non-trivial categorical features

2014-10-12 Thread Evan Sparks
I was under the impression that we were using the usual sort by average response value heuristic when storing histogram bins (and searching for optimal splits) in the tree code. Maybe Manish or Joseph can clarify? > On Oct 12, 2014, at 2:50 PM, Sean Owen wrote: > > I'm having trouble getting
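The heuristic Evan mentions is the classic trick for unordered categorical features. You order the categories by the mean label observed for each one. You then consider only the k - 1 splits that cut that ordering into a prefix and a suffix, instead of all 2^(k-1) - 1 subsets. For regression and binary classification, this ordering is known to contain the best split. The sketch below is a generic Python illustration under those assumptions, not MLlib's implementation. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Set


def ordered_categories(categories: Sequence[Hashable],
                       labels: Sequence[float]) -> List[Hashable]:
    """Sort the distinct category values by their mean label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, label in zip(categories, labels):
        sums[category] += label
        counts[category] += 1
    return sorted(counts, key=lambda c: sums[c] / counts[c])


def candidate_splits(order: List[Hashable]) -> List[Set[Hashable]]:
    """The k - 1 left-hand category sets: every proper prefix of the ordering."""
    return [set(order[:i]) for i in range(1, len(order))]


cats = ["a", "b", "c", "a", "b", "c"]
ys = [1.0, 0.0, 0.5, 1.0, 0.0, 0.5]
order = ordered_categories(cats, ys)
print(order)  # ['b', 'c', 'a']
print(candidate_splits(order))
```

With 40 categories this yields 39 candidate splits to evaluate per feature rather than an exponential number.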

Re: reading/writing parquet decimal type

2014-10-12 Thread Matei Zaharia
Hi Michael, I've been working on this in my repo: https://github.com/mateiz/spark/tree/decimal. I'll make some pull requests with these features soon, but meanwhile you can try this branch. See https://github.com/mateiz/spark/compare/decimal for the individual commits that went into it. It has

reading/writing parquet decimal type

2014-10-12 Thread Michael Allman
Hello, I'm interested in reading/writing parquet SchemaRDDs that support the Parquet Decimal converted type. The first thing I did was update the Spark parquet dependency to version 1.5.0, as this version introduced support for decimals in parquet. However, conversion between the catalyst decim
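Converting between a decimal value and Parquet's binary representation comes down to the unscaled integer. The Parquet format documentation specifies that DECIMAL values stored as binary or fixed-length byte arrays hold the unscaled value in big-endian two's complement. Below is a minimal round-trip sketch using Python's `decimal` module. It does not reproduce Catalyst's decimal type or parquet-mr's API, and the function names are hypothetical.

```python
from decimal import Decimal


def decimal_to_fixed_bytes(value: Decimal, scale: int, num_bytes: int) -> bytes:
    """Encode the unscaled value (value * 10**scale) as big-endian two's complement."""
    unscaled = value.scaleb(scale)
    if unscaled != unscaled.to_integral_value():
        raise ValueError("value has more fractional digits than the scale allows")
    # signed=True yields two's complement, sign-extended to num_bytes;
    # int.to_bytes raises OverflowError if the value does not fit.
    return int(unscaled).to_bytes(num_bytes, byteorder="big", signed=True)


def fixed_bytes_to_decimal(data: bytes, scale: int) -> Decimal:
    """Decode big-endian two's-complement bytes into a Decimal with `scale` digits."""
    unscaled = int.from_bytes(data, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


encoded = decimal_to_fixed_bytes(Decimal("-1.00"), scale=2, num_bytes=2)
print(encoded.hex())                       # ff9c
print(fixed_bytes_to_decimal(encoded, 2))  # -1.00
```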

Decision forests don't work with non-trivial categorical features

2014-10-12 Thread Sean Owen
I'm having trouble getting decision forests to work with categorical features. I have a dataset with a categorical feature with 40 values. It seems to be treated as a continuous/numeric value by the implementation. Digging deeper, I see there is some logic in the code that indicates that categoric
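A quick count shows why a 40-value categorical feature is awkward to handle as truly unordered. Every non-empty proper subset of the categories, paired with its complement, defines one binary split. That gives 2^(k-1) - 1 candidates, compared with k - 1 when the values are treated as ordered. Below is a small Python check of that arithmetic. The helper names are illustrative only.

```python
def unordered_split_count(k: int) -> int:
    # 2**k subsets, minus the empty and full sets, halved because
    # a subset and its complement describe the same split.
    return (2 ** k - 2) // 2


def ordered_split_count(k: int) -> int:
    # Thresholds fall between adjacent values of the ordering.
    return k - 1


print(unordered_split_count(40))  # 549755813887
print(ordered_split_count(40))    # 39
```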