Re: Why does Filter return a DataFrame object in DataFrame.scala?

2015-09-22 Thread qiuhai
Thank you very much.

Re: Why does Filter return a DataFrame object in DataFrame.scala?

2015-09-22 Thread Reynold Xin
There is an implicit conversion in scope https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L153 /** * An implicit conversion function internal to this class for us to avoid doing * "new DataFrame(...)" everywhere. */ @inline pri
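
A minimal, self-contained sketch of the mechanism (illustrative names, not the real Spark internals): the implicit conversion in scope wraps the LogicalPlan that each operator builds back into a DataFrame, so a method like filter can be declared to return DataFrame while its body constructs a plan node.

// Illustrative sketch only; simplified stand-ins for the Spark SQL internals.
case class LogicalPlan(description: String)

class DataFrame(val logicalPlan: LogicalPlan) {

  // An implicit conversion internal to the class, mirroring the one in
  // DataFrame.scala: it wraps a LogicalPlan in a new DataFrame so the
  // operator methods avoid writing "new DataFrame(...)" everywhere.
  @inline private implicit def toDataFrame(plan: LogicalPlan): DataFrame =
    new DataFrame(plan)

  // The body builds a plan node, but the method still type-checks as
  // returning DataFrame thanks to the implicit conversion above.
  def filter(condition: String): DataFrame =
    LogicalPlan(s"Filter($condition) <- ${logicalPlan.description}")
}

object Demo extends App {
  val df = new DataFrame(LogicalPlan("TableScan"))
  println(df.filter("age > 21").logicalPlan.description)
  // prints: Filter(age > 21) <- TableScan
}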

Why does Filter return a DataFrame object in DataFrame.scala?

2015-09-22 Thread qiuhai
Hi, Recently I have been reading the Spark SQL source code (version 1.5). In DataFrame.scala there is a function named filter at line 737: *def filter(condition: Column): DataFrame = Filter(condition.expr, logicalPlan)* The function returns a Filter object, but it requires a DataFrame

Fwd: Parallel collection in driver programs

2015-09-22 Thread Andy Huang
Hi Devs, Hopefully one of you knows more about this? Thanks Andy -- Forwarded message -- From: Andy Huang Date: Wed, Sep 23, 2015 at 12:39 PM Subject: Parallel collection in driver programs To: u...@spark.apache.org Hi All, Would like to know if anyone has experience with parallel
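
The thread is cut off here; in case it helps later readers, Spark's "parallelized collections" in a driver program are created with sc.parallelize. A minimal sketch, assuming an existing SparkContext named sc (as in spark-shell):

val data = Seq(1, 2, 3, 4, 5)     // a local, driver-side collection
val rdd = sc.parallelize(data, 4) // distribute it across 4 partitions
println(rdd.reduce(_ + _))        // 15, computed in parallel on the cluster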

Re: SparkR package path

2015-09-22 Thread Shivaram Venkataraman
As Rui says, it would be good to understand the use case we want to support (supporting CRAN installs could be one, for example). I don't think it should be very hard to do, as the RBackend itself doesn't use the R source files. The RRDD does use it, and the value comes from https://github.com/apache/s

Re: column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas
Thanks for that additional tip, Michael. Backticks fix the problem query in which an identifier was transformed into a string literal. So this works now... // now correctly resolves the unnormalized column id sqlContext.sql("""select `b` from test_data""").show Any suggestion about how to es

Re: Derby version in Spark

2015-09-22 Thread Richard Hillegas
Thanks, Ted. I'll follow up with the Hive folks. Cheers, -Rick Ted Yu wrote on 09/22/2015 03:41:12 PM: > I cloned Hive 1.2 code base and saw: > >     10

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
I cloned the Hive 1.2 code base and saw: 10.10.2.0 So the version used by Spark is quite close to what Hive uses. On Tue, Sep 22, 2015 at 3:29 PM, Ted Yu wrote: > I see. > I use Maven to build, so I observe different contents under the lib_managed > directory. > > Here is a snippet of the dependency tree

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
I see. I use Maven to build, so I observe different contents under the lib_managed directory. Here is a snippet of the dependency tree: [INFO] | +- org.spark-project.hive:hive-metastore:jar:1.2.1.spark:compile [INFO] | | +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile [INFO] | | +- org.apache.derby:derb

Re: Derby version in Spark

2015-09-22 Thread Richard Hillegas
Thanks, Ted. I'm working on my master branch. The lib_managed/jars directory has a lot of jar files, including Hadoop and Hive. Maybe these were faulted in when I built with the following command? sbt/sbt -Phive assembly/assembly The Derby jars seem to be used in order to manage the metastore_d

Re: RDD: Execution and Scheduling

2015-09-22 Thread gsvic
I already have but I needed some clarifications. Thanks for all your help!

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
Which Spark release are you building? For master branch, I get the following: lib_managed/jars/datanucleus-api-jdo-3.2.6.jar lib_managed/jars/datanucleus-core-3.2.10.jar lib_managed/jars/datanucleus-rdbms-3.2.9.jar FYI On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas wrote: > I see that l

Derby version in Spark

2015-09-22 Thread Richard Hillegas
I see that lib_managed/jars holds these old Derby versions: lib_managed/jars/derby-10.10.1.1.jar lib_managed/jars/derby-10.10.2.0.jar The Derby 10.10 release family supports some ancient JVMs: Java SE 5 and Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone running Spark on the

Re: column identifiers in Spark SQL

2015-09-22 Thread Michael Armbrust
HiveQL uses `backticks` for quoted identifiers. On Tue, Sep 22, 2015 at 1:06 PM, Richard Hillegas wrote: > Thanks for that tip, Michael. I think that my sqlContext was a raw > SQLContext originally. I have rebuilt Spark like so... > > sbt/sbt -Phive assembly/assembly > > Now I see that my sqlC
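
A short sketch of backtick quoting against a HiveContext-backed sqlContext (hypothetical table and data, assuming a spark-shell style sc):

import sqlContext.implicits._

// Hypothetical data registered as a temporary table.
val df = sc.parallelize(Seq((1, "x"), (2, "y"))).toDF("b", "c")
df.registerTempTable("test_data")

// Backticks delimit identifiers in HiveQL, so `b` is a column
// reference rather than a string literal.
sqlContext.sql("select `b` from test_data").show()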

Re: column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas
Thanks for that tip, Michael. I think that my sqlContext was a raw SQLContext originally. I have rebuilt Spark like so... sbt/sbt -Phive assembly/assembly Now I see that my sqlContext is a HiveContext. That fixes one of the queries. Now unnormalized column names work: // ...unnormalized col

Re: column identifiers in Spark SQL

2015-09-22 Thread Michael Armbrust
Are you using a SQLContext or a HiveContext? The programming guide suggests the latter, as the former is really only there because some applications may have conflicts with Hive dependencies. SQLContext is case sensitive by default, whereas the HiveContext is not. The parser in HiveContext is al
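
A sketch of the difference, using the Spark 1.5-era APIs and assuming a SparkContext named sc; case sensitivity can also be toggled explicitly through a conf key:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val plain = new SQLContext(sc)  // case-sensitive identifier resolution by default
val hive  = new HiveContext(sc) // case-insensitive, and uses the HiveQL parser

// Either context can be switched explicitly:
plain.setConf("spark.sql.caseSensitive", "false")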

column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas
I am puzzled by the behavior of column identifiers in Spark SQL. I don't find any guidance in the "Spark SQL and DataFrame Guide" at http://spark.apache.org/docs/latest/sql-programming-guide.html. I am seeing odd behavior related to case-sensitivity and to delimited (quoted) identifiers. Conside
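
The message is truncated, but the rest of the thread suggests a reproduction along these lines (hypothetical data, plain SQLContext as in the original report): the simple SQL parser treats a double-quoted token as a string literal rather than a delimited identifier.

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

sc.parallelize(Seq((1, "x"), (2, "y"))).toDF("b", "c").registerTempTable("test_data")

sqlContext.sql("""select b from test_data""").show()   // resolves the column b
sqlContext.sql("""select "b" from test_data""").show() // "b" becomes a string literal, not an identifier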

Re: Open Issues for Contributors

2015-09-22 Thread Pedro Rodriguez
Thanks for the links (the first one is broken or private). I think the main mistake I was making was looking at the fix version instead of the target version (the JIRA homepage's listing of versions links to fix versions). For anyone else interested in MLlib, I am looking at this to see what goals are

Re: Open Issues for Contributors

2015-09-22 Thread Luciano Resende
You can use JIRA filters to narrow down the scope of issues you might want to address. For instance, I use this filter to look into open issues that are unassigned: https://issues.apache.org/jira/issues/?filter=12333428 For a specific release, you can also filter by release, and I Reynold h

Re: JENKINS: downtime next week, wed and thurs mornings (9-23 and 9-24)

2015-09-22 Thread shane knapp
ok, here's the updated downtime schedule for this week: wednesday, sept 23rd: firewall maintenance cancelled, as jon took care of the update saturday morning while we were bringing jenkins back up after the colo fire. thursday, sept 24th: jenkins maintenance is still scheduled, but abbreviated a

Open Issues for Contributors

2015-09-22 Thread Pedro Rodriguez
Where is the best place to look at open issues that haven't been assigned/started for the next release? I am interested in working on something, but I don't know what issues are higher priority for the next release. On a similar note, is there somewhere which outlines the overall goals for the nex

RowMatrix tallSkinnyQR - ERROR: Second call to constructor of static parser

2015-09-22 Thread Saif.A.Ellafi
Hi all, wondering if anyone could make the new 1.5.0 tallSkinnyQR work. My output follows, which is a big loop of the same errors until the shell dies. I am curious, since I'm failing to load any implementations from BLAS, LAPACK, etc. scala> mat.tallSkinnyQR(false) 15/09/22 10:18:11 WARN LAPACK:
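
For reference, a minimal call sketch for the API in question (tiny hypothetical matrix, assuming a spark-shell sc); this does not address the native BLAS/LAPACK loading warnings the report is about:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(3.0, 4.0),
  Vectors.dense(5.0, 6.0)))

val mat = new RowMatrix(rows)

// computeQ = false computes only the R factor; qr.Q is null in that case.
val qr = mat.tallSkinnyQR(computeQ = false)
println(qr.R)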

Re: Why are there no snapshots for the 1.5 branch?

2015-09-22 Thread Bin Wang
Thanks. I've solved it. I modified pom.xml, added my own repo to it, and then used "mvn deploy". Fengdong Yu wrote on Tue, Sep 22, 2015 at 2:08 PM: > basically, you can build a snapshot by yourself. > > just clone the source code, and then 'mvn package/deploy/install…..' > > > Azuryy Yu > > > > On Sep 22, 2015, at