What is the difference between ml.classification.LogisticRegression and mllib.classification.LogisticRegressionWithLBFGS

2015-10-06 Thread YiZhi Liu
Hi everyone, I'm curious about the difference between ml.classification.LogisticRegression and mllib.classification.LogisticRegressionWithLBFGS. Both of them are optimized using LBFGS; the only difference I see is that LogisticRegression takes a DataFrame while LogisticRegressionWithLBFGS takes an RDD. So
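
A minimal sketch of the two entry points (1.5-era APIs), assuming a training DataFrame with the usual label/features columns and an RDD[LabeledPoint]:

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.DataFrame

    // spark.ml: DataFrame-based pipeline API
    def fitMl(training: DataFrame) =
      new LogisticRegression().setMaxIter(100).fit(training)

    // spark.mllib: RDD[LabeledPoint]-based API
    def fitMllib(training: RDD[LabeledPoint]) =
      new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)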

Re: failure notice

2015-10-06 Thread Tathagata Das
Unfortunately, there is no obvious way to do this. I am guessing that you want to partition your stream such that the same keys always go to the same executor, right? You could do it by writing a custom RDD. See ShuffledRDD
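
For the common case of a key-value stream, a hedged sketch of the same idea without a custom RDD, assuming a pair DStream and a fixed partition count (partitionBy produces a ShuffledRDD underneath; note that Spark does not pin partitions to executors, which is why there is no obvious way to guarantee it):

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.dstream.DStream

    // Repartition every batch with the same partitioner so a given key
    // always hashes to the same partition number.
    def keyPartitioned(stream: DStream[(String, Int)],
                       numPartitions: Int): DStream[(String, Int)] = {
      val partitioner = new HashPartitioner(numPartitions)
      stream.transform(rdd => rdd.partitionBy(partitioner))
    }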

Re: multiple count distinct in SQL/DataFrame?

2015-10-06 Thread Reynold Xin
To provide more context, if we do remove this feature, the following SQL query would throw an AnalysisException: select count(distinct colA), count(distinct colB) from foo; The following should still work: select count(distinct colA) from foo; The following should also work: select count(disti
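
For reference, a sketch of the same distinction through the DataFrame API, assuming a DataFrame foo with columns colA and colB:

    import org.apache.spark.sql.functions.countDistinct

    // Two distinct counts in one aggregation -- the shape of query that
    // would be affected if the feature were removed.
    foo.agg(countDistinct("colA"), countDistinct("colB"))

    // A single distinct count, which would keep working.
    foo.agg(countDistinct("colA"))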

multiple count distinct in SQL/DataFrame?

2015-10-06 Thread Reynold Xin
The current implementation of multiple count distinct in a single query is quite poor in terms of both performance and robustness, and it is also hard to guarantee its correctness through some of the refactorings for Tungsten. Supporting a better version of it is possible in the future,

Re: Adding Spark Testing functionality

2015-10-06 Thread Holden Karau
I'll put together a Google doc and send that out (in the meantime, a quick guide to how the current package can be used is in the blog post I did at http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/ ) If people think it's better to keep it as a pack
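
For context, a minimal sketch of what using the package looks like today, assuming the spark-testing-base artifact and ScalaTest on the test classpath:

    import com.holdenkarau.spark.testing.SharedSparkContext
    import org.scalatest.FunSuite

    // SharedSparkContext supplies a SparkContext (sc) that is created once
    // and shared across the tests in the suite.
    class WordCountSuite extends FunSuite with SharedSparkContext {
      test("word count") {
        val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
        assert(counts("a") === 2)
      }
    }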

Re: Adding Spark Testing functionality

2015-10-06 Thread Patrick Wendell
Hey Holden, It would be helpful if you could outline the set of features you'd imagine being part of Spark in a short doc. I didn't see a README on the existing repo, so it's hard to know exactly what is being proposed. As a general point of process, we've typically avoided merging modules into S

Adding Spark Testing functionality

2015-10-06 Thread Holden Karau
Hi Spark Devs, So this has been brought up a few times before, and generally on the user list people get directed to use spark-testing-base. I'd like to start moving some of spark-testing-base's functionality into Spark so that people don't need a library to do what is (hopefully :p) a very common

Re: CQs on WindowedStream created on running StreamingContext

2015-10-06 Thread Yogesh Mahajan
Anyone know about this? TD? -yogesh > On 30-Sep-2015, at 1:25 pm, Yogs wrote: > > Hi, > > We intend to run ad hoc windowed continuous queries on Spark Streaming data. > The queries could be registered/deregistered dynamically or can be submitted > through the command line. Currently Spark str
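
A hedged sketch of the kind of windowed query being asked about, assuming a DStream[String] of events and a SQLContext shared across batches (the table name recent_events is just an example):

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.dstream.DStream

    // Expose each 30-second window (sliding every 10 seconds) as a temp table
    // so ad hoc SQL can be run against the most recent window.
    def registerWindow(events: DStream[String], sqlContext: SQLContext): Unit = {
      import sqlContext.implicits._
      events.window(Seconds(30), Seconds(10)).foreachRDD { rdd =>
        rdd.map(Tuple1(_)).toDF("line").registerTempTable("recent_events")
      }
    }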

Re: SparkR dataframe UDF

2015-10-06 Thread Hossein
User-defined functions written in R are not supported yet. You can implement your UDF in Scala, register it in sqlContext, and use it in SparkR, provided that you share your context between R and Scala. --Hossein On Friday, October 2, 2015, Renyi Xiong wrote: > Hi Shiva, > > Is Dataframe UDF impl
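
A minimal sketch of the Scala side, assuming a shared SQLContext; the UDF name strLen is just an example:

    // Register a Scala function under a name that SparkR can reference in SQL.
    sqlContext.udf.register("strLen", (s: String) => s.length)

    // From SparkR, against the same shared context, something like:
    //   head(sql(sqlContext, "SELECT strLen(name) FROM people"))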

Re: failure notice

2015-10-06 Thread Renyi Xiong
Yes, it can recover on a different node. It uses a write-ahead log, checkpoints offsets of both ingress and egress (e.g. using ZooKeeper and/or Kafka), and relies on the streaming engine's deterministic operations. By replaying back a certain range of data based on the checkpointed ingress offset (at least
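
A hedged sketch of the pieces that enable this in Spark Streaming, assuming a receiver-based source and a checkpoint directory on a fault-tolerant filesystem (the paths are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("recoverable-stream")
      // Persist received blocks to the write-ahead log before acknowledging.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    // getOrCreate rebuilds the context from the checkpoint after a failure,
    // possibly on a different node, and replays unprocessed data.
    val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/app", () => {
      val newSsc = new StreamingContext(conf, Seconds(10))
      newSsc.checkpoint("hdfs:///checkpoints/app")
      // ... define sources, transformations, and output operations here ...
      newSsc
    })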

Re: StructType has more rows, than corresponding Row has objects.

2015-10-06 Thread Eugene Morozov
Davies, that seemed to be my issue; my colleague helped me resolve it. The problem was that we build the RDD and corresponding StructType ourselves (no JSON, Parquet, Cassandra, etc. - we take a list of business objects and convert them to Rows, then infer the struct type) and I missed one thing. --
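
For reference, a minimal sketch of that pattern with hypothetical business objects; the thing to get right is that every Row must match the StructType field-for-field, in both order and type:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    case class Item(name: String, count: Int) // stand-in for a business object

    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("count", IntegerType, nullable = false)))

    // Each Row's values must line up with the schema positionally and by type.
    val rows = sc.parallelize(Seq(Item("a", 1), Item("b", 2)))
      .map(i => Row(i.name, i.count))

    val df = sqlContext.createDataFrame(rows, schema)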

Re: Pyspark dataframe read

2015-10-06 Thread Koert Kuipers
i personally find the comma-separated paths feature much more important than commas in paths (which one could argue you should avoid). but assuming people want to keep commas as legitimate characters in paths: https://issues.apache.org/jira/browse/SPARK-10185 https://github.com/apache/spark/pull/8

Re: Pyspark dataframe read

2015-10-06 Thread Reynold Xin
I think the problem is that comma is actually a legitimate character in a file name, and as a result ... On Tuesday, October 6, 2015, Josh Rosen wrote: > Could someone please file a JIRA to track this? > https://issues.apache.org/jira/browse/SPARK > > On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers

Re: Pyspark dataframe read

2015-10-06 Thread Josh Rosen
Could someone please file a JIRA to track this? https://issues.apache.org/jira/browse/SPARK On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers wrote: > i ran into the same thing in scala api. we depend heavily on comma > separated paths, and it no longer works. > > > On Tue, Oct 6, 2015 at 3:02 AM, B

Re: Pyspark dataframe read

2015-10-06 Thread Koert Kuipers
i ran into the same thing in the scala api. we depend heavily on comma-separated paths, and it no longer works. On Tue, Oct 6, 2015 at 3:02 AM, Blaž Šnuderl wrote: > Hello everyone. > > It seems pyspark dataframe read is broken for reading multiple files. > > sql.read.json( "file1,file2") fails wit

Pyspark dataframe read

2015-10-06 Thread Blaž Šnuderl
Hello everyone. It seems pyspark dataframe read is broken for reading multiple files. sql.read.json("file1,file2") fails with java.io.IOException: No input paths specified in job. This used to work in Spark 1.4 and also still works with sc.textFile Blaž
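
Until that is fixed, a hedged workaround sketch: read each path separately and union the results (unionAll on 1.5-era DataFrames):

    // Load each file on its own and union, instead of relying on comma splitting.
    val paths = Seq("file1", "file2")
    val df = paths.map(p => sqlContext.read.json(p)).reduce(_ unionAll _)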