Re: HyperLogLogUDT

2015-07-01 Thread Reynold Xin
Yes, it's very interesting. However, ideally we should have a version of HyperLogLog that can work directly against raw bytes in memory (rather than Java objects), so that it fits the Tungsten execution model, where everything operates directly against a memory address.
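To make "HyperLogLog over raw bytes" concrete, here is a minimal Python sketch. It assumes one byte per register stored in a single flat bytearray, which is an illustrative layout and not Spark's, stream-lib's, or Tungsten's actual format (a real Tungsten version would address off-heap memory; the bytearray only stands in for that). The class name `ByteHLL` is hypothetical. Adding, merging, and estimating all read and write only that buffer, so a sketch can be shipped or merged as plain bytes.

```python
import hashlib
import math


class ByteHLL:
    """Hypothetical HyperLogLog whose whole state is one flat bytearray.

    Each of the m = 2**p registers takes one byte, so the sketch can be
    serialized, copied or merged as raw bytes without per-object overhead.
    """

    def __init__(self, p=12, registers=None):
        if not 7 <= p <= 16:
            # the alpha approximation used in estimate() assumes m >= 128
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m) if registers is None else bytearray(registers)
        if len(self.registers) != self.m:
            raise ValueError("register buffer must be exactly 2**p bytes")

    @staticmethod
    def _hash64(value):
        # 64-bit hash taken from SHA-1: slow, but stable across processes
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        x = self._hash64(value)
        idx = x >> (64 - self.p)                # top p bits select a register
        rest = x & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # the union of two sketches is the byte-wise max of their registers
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different p")
        for i, r in enumerate(other.registers):
            if r > self.registers[i]:
                self.registers[i] = r

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw

    def to_bytes(self):
        return bytes(self.registers)
```

For example, adding 0..9999 to one sketch and 5000..14999 to another, then merging the second into the first, should give an estimate near 15,000; with p = 12 the expected relative error is roughly 1.04/√4096 ≈ 1.6%. Because `merge` is a byte-wise max, partial sketches built on different partitions can be combined without rebuilding any objects.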

Re: HyperLogLogUDT

2015-07-01 Thread Nick Pentreath
Sure, I can copy the code, but my aim was more to understand: (a) whether this is broadly interesting enough for folks to think about updating or extending the existing UDAF within Spark, and (b) how to register one's own custom UDAF, in which case it could be a Spark package, for example. All examples

Re: [pyspark] What is the best way to run a minimum unit testing related to our developing module?

2015-07-01 Thread Yu ISHIKAWA
Thanks! --Yu

Re: [pyspark] What is the best way to run a minimum unit testing related to our developing module?

2015-07-01 Thread Yu Ishikawa
Thanks! --Yu

Re: [pyspark] What is the best way to run a minimum unit testing related to our developing module?

2015-07-01 Thread Reynold Xin
Run ./python/run-tests --help and you will see. :)

[pyspark] What is the best way to run a minimum unit testing related to our developing module?

2015-07-01 Thread Yu Ishikawa
Hi all, When I develop PySpark modules, such as adding a spark.ml API in Python, I'd like to run a minimal set of unit tests related to the module under development, again and again. In the previous version, that was easy to do by commenting out unrelated modules in the ./python/run-tests script. So what is the b
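The thread's answer points to ./python/run-tests --help for the options that script supports. Independently of that script, Python's standard unittest loader can run one TestCase class in isolation, which is a generic way to iterate on a single module. The sketch below uses a hypothetical stand-in test class; real PySpark tests also need a SparkContext and Spark's environment, which this example deliberately leaves out.

```python
import io
import unittest


class HypotheticalFeatureTests(unittest.TestCase):
    """Stand-in for the tests of the module under development (hypothetical)."""

    def test_scaling(self):
        self.assertEqual([x * 2 for x in [1, 2, 3]], [2, 4, 6])

    def test_empty_input(self):
        self.assertEqual(sum([]), 0)


def run_only(test_case_cls):
    """Load and run just the given TestCase class; return the TestResult."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case_cls)
    stream = io.StringIO()  # capture the runner's report instead of printing it
    return unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)


if __name__ == "__main__":
    result = run_only(HypotheticalFeatureTests)
    print(result.testsRun, result.wasSuccessful())
```

From a shell, python -m unittest some_module.SomeTestClass does the same for a class in an importable module; some_module and SomeTestClass are placeholders here.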

Re: enum-like types in Spark

2015-07-01 Thread Stephen Boesch
I am reviving an old thread here. The link to the example code for the Java-enum-based solution is now dead: would someone please post an updated link showing the proper interop? Specifically, it is my understanding that Java enums may not be created within Scala. So is the proposed solution re

Re: HyperLogLogUDT

2015-07-01 Thread Daniel Darabos
It's already possible to just copy the code from countApproxDistinct and access the HLL directly, or do anything you like.
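Copying the countApproxDistinct code means choosing the HLL precision yourself. As a worked example, the textbook HyperLogLog bound gives a relative standard error of about 1.04/√m for m = 2^p registers, so the smallest p meeting a target error follows from p ≥ 2·log2(1.04/sd). This is a sketch under that textbook bound; Spark's own constant, rounding, and lower limit may differ, and `precision_for` is a hypothetical helper name.

```python
import math


def precision_for(relative_sd):
    """Smallest p with 1.04 / sqrt(2**p) <= relative_sd (hypothetical helper)."""
    if relative_sd <= 0:
        raise ValueError("relative_sd must be positive")
    # 1.04 / 2**(p/2) <= sd  <=>  p >= 2 * log2(1.04 / sd)
    p = math.ceil(2 * math.log2(1.04 / relative_sd))
    return max(p, 4)  # assumed floor on p, kept small for illustration


for sd in (0.05, 0.01):
    p = precision_for(sd)
    print(sd, p, 1 << p, 1.04 / math.sqrt(1 << p))
```

Tightening the target error from 5% to 1% raises p from 9 to 14, i.e. from 512 to 16,384 registers: the memory cost grows with the inverse square of the error.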

Re: HyperLogLogUDT

2015-07-01 Thread Nick Pentreath
Any thoughts? On Tue, Jun 23, 2015 at 11:19 AM, Nick Pentreath wrote: > Hey Spark devs, I've been looking at DF UDFs and UDAFs. The approx distinct is using hyperloglog, but there is only an option to return the count as a Long. It can be useful to be able to return

Re: [VOTE] Release Apache Spark 1.4.1

2015-07-01 Thread Tathagata Das
+1 On Tue, Jun 30, 2015 at 8:12 PM, Bobby Chowdary wrote: > +1 Tested on CentOS 7. On Jun 30, 2015 19:38, "Joseph Bradley" wrote: >> +1 On Tue, Jun 30, 2015 at 5:27 PM, Reynold Xin wrote: >>> +1 On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell wrote: >>> Please v