RE: Unsupported Catalyst types in Parquet

2014-12-29 Thread Wang, Daoyuan
By adding a flag in SQLContext, I have modified #3822 to include nanoseconds now. Since passing too many flags is ugly, now I need the whole SQLContext, so that we can put more flags there. Thanks, Daoyuan From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Tuesday, December 30, 2014 1

Problems concerning implementing machine learning algorithm from scratch based on Spark

2014-12-29 Thread danqing0703
Hi all, I am trying to use some machine learning algorithms that are not included in MLlib, like Mixture Model and LDA (Latent Dirichlet Allocation), and I am using pyspark and Spark SQL. My problem is: I have some scripts that implement these algorithms, but I am not sure which part I shall c

Re: Which committers care about Kafka?

2014-12-29 Thread Cody Koeninger
Assuming you're talking about spark.streaming.receiver.maxRate, I just updated my PR to configure rate limiting based on that setting. So hopefully that's issue 1 sorted. Regarding issue 3, as far as I can tell regarding the odd semantics of stateful or windowed operations in the face of failure,

Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
The named tuple degenerates to a plain tuple. *A400.map(lambda i: map(None,i.INTEREST))* === [(u'x', 1), (u'y', 2)] [(u'x', 2), (u'y', 3)] -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tupl
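pyspark.sql's Row is namedtuple-like, so the degeneration described above can be reproduced in plain Python. The `Interest` type and field names below mirror the thread's example but are illustrative, not PySpark code:

```python
from collections import namedtuple

# Stand-in for pyspark.sql's namedtuple-like Row.
Interest = namedtuple("Interest", ["INFO", "INTEREST_NO"])
interests = [Interest(INFO="x", INTEREST_NO=1),
             Interest(INFO="y", INTEREST_NO=2)]

# Rebuilding the elements -- as Python 2's map(None, ...) or tuple() does --
# produces plain tuples: the field names are gone.
pairs = [tuple(i) for i in interests]
print(pairs)                           # [('x', 1), ('y', 2)]
print(isinstance(pairs[0], Interest))  # False: degenerated to a plain tuple
```

Any transformation that reconstructs the sequence element by element loses the Row/namedtuple subclass, which is why the flatMap results come back as bare tuples.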

Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
Hi pyspark guys, I have a json file, and its structure is like below: {"NAME":"George", "AGE":35, "ADD_ID":1212, "POSTAL_AREA":1, "TIME_ZONE_ID":1, "INTEREST":[{"INTEREST_NO":1, "INFO":"x"}, {"INTEREST_NO":2, "INFO":"y"}]} {"NAME":"John", "AGE":45, "ADD_ID":1213, "POSTAL_AREA":1, "TIME_ZONE_ID":1, "IN

A question about using insert into in rdd foreach in spark 1.2

2014-12-29 Thread evil
Hi All, I have a problem when I try to use insert into in a loop, and this is my code: def main(args: Array[String]) { //This is an empty table, schema is (Int,String) sqlContext.parquetFile("Data\\Test\\Parquet\\Temp").registerTempTable("temp") //not empty table, schema is (Int,String

Re: Adding third party jars to classpath used by pyspark

2014-12-29 Thread Jeremy Freeman
Hi Stephen, it should be enough to include > --jars /path/to/file.jar in the command line call to either pyspark or spark-submit, as in > spark-submit --master local --jars /path/to/file.jar myfile.py and you can check the bottom of the Web UI’s “Environment” tab to make sure the jar gets on

Re: Unsupported Catalyst types in Parquet

2014-12-29 Thread Michael Armbrust
Yeah, I saw those. The problem is that #3822 truncates timestamps that include nanoseconds. On Mon, Dec 29, 2014 at 5:14 PM, Alessandro Baretta wrote: > Michael, > > Actually, Adrian Wang already created pull requests for these issues. > > https://github.com/apache/spark/pull/3820 > https://git

RE: Which committers care about Kafka?

2014-12-29 Thread Shao, Saisai
Hi Cody, From my understanding, rate control is an optional configuration in Spark Streaming and is disabled by default, so users can reach maximum throughput without any configuration. The reason why rate control is so important in stream processing is that Spark Streaming and other streamin
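The receiver-side rate control under discussion caps how many records per second a receiver accepts. A minimal token-bucket sketch of the idea behind a setting like spark.streaming.receiver.maxRate — purely illustrative, not Spark's implementation (a real receiver would typically block until the next window rather than reject):

```python
import time

class RateLimiter:
    """Token-bucket limiter allowing up to max_rate events per second.

    Illustrative sketch only; Spark's internals differ."""

    def __init__(self, max_rate):
        self.max_rate = max_rate
        self.tokens = float(max_rate)   # start with a full bucket
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Replenish tokens in proportion to elapsed time, capped at max_rate.
        self.tokens = min(self.max_rate,
                          self.tokens + (now - self.last) * self.max_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

limiter = RateLimiter(max_rate=1000)
accepted = sum(limiter.allow() for _ in range(5000))
print(accepted)  # close to 1000: the burst beyond the configured rate is rejected
```

With rate control disabled (the default being discussed), the equivalent of `max_rate` is effectively unbounded, so the receiver ingests as fast as the source can deliver.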

Re: Unsupported Catalyst types in Parquet

2014-12-29 Thread Alessandro Baretta
Michael, Actually, Adrian Wang already created pull requests for these issues. https://github.com/apache/spark/pull/3820 https://github.com/apache/spark/pull/3822 What do you think? Alex On Mon, Dec 29, 2014 at 3:07 PM, Michael Armbrust wrote: > I'd love to get both of these in. There is so

Adding third party jars to classpath used by pyspark

2014-12-29 Thread Stephen Boesch
What is the recommended way to do this? We have some native database client libraries for which we are adding pyspark bindings. The pyspark invokes spark-submit. Do we add our libraries to the SPARK_SUBMIT_LIBRARY_PATH ? This issue relates back to an error we have been seeing "Py4jError: Tryin

RE: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-29 Thread Andrew Lee
Hi Patrick, I manually hardcoded the hive version to 0.13.1a and it works. It turns out that for some reason, 0.13.1 is being picked up instead of the 0.13.1a version from maven. So my solution was: hardcode the hive.version to 0.13.1a in my case, since I am building it against hive 0.13 only, so

Re: Spark 1.2.0 build error

2014-12-29 Thread Naveen Madhire
I am getting "The command is too long" error. Is there anything which needs to be done? However, for the time being I followed the "sbt" way of building Spark in IntelliJ. On Mon, Dec 29, 2014 at 3:52 AM, Sean Owen wrote: > It means a test failed but you have not shown the test failure. This wou

Re: Unsupported Catalyst types in Parquet

2014-12-29 Thread Michael Armbrust
I'd love to get both of these in. There is some trickiness that I talk about on the JIRA for timestamps, since the SQL timestamp class can support nanoseconds and I don't think parquet has a type for this. Other systems (impala) seem to use INT96. It would be great to maybe ask on the parquet ma
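The precision concern is plain integer arithmetic: a nanosecond-resolution timestamp squeezed into a coarser (e.g. millisecond) field silently loses its low-order digits, which is why a wider encoding like Parquet's INT96 comes up. The value below is illustrative:

```python
# An illustrative nanosecond-precision epoch timestamp.
nanos = 1_419_897_600_123_456_789

# Round-tripping through a millisecond field drops the last six digits.
millis = nanos // 1_000_000
recovered = millis * 1_000_000
print(nanos - recovered)  # 456789 nanoseconds silently truncated
```

A 64-bit integer of milliseconds (or even microseconds) cannot round-trip the full nanosecond component, hence the appeal of a 96-bit representation.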

Re: Which committers care about Kafka?

2014-12-29 Thread Cody Koeninger
Can you give a little more clarification on exactly what is meant by 1. Data rate control If someone wants to clamp the maximum number of messages per RDD partition in my solution, it would be very straightforward to do so. Regarding the holy grail, I'm pretty certain you can't have end-to-end t
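Clamping the maximum number of messages per RDD partition, as mentioned, amounts to capping each partition's ending Kafka offset. A sketch with hypothetical names — not the actual PR's API:

```python
def clamp_until_offset(from_offset, until_offset, max_messages):
    """Cap an offset range so the partition reads at most max_messages.

    Hypothetical helper for illustration; the real code and names differ."""
    return min(until_offset, from_offset + max_messages)

# A partition with 10,000 messages available, clamped to 1,000:
print(clamp_until_offset(500, 10_500, 1_000))  # 1500
# A partition already under the cap is left alone:
print(clamp_until_offset(0, 300, 1_000))       # 300
```

Because the offset ranges are computed on the driver before the batch runs, applying such a cap per partition is a one-line change, which is presumably why it is described as straightforward.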

Re: Which committers care about Kafka?

2014-12-29 Thread Tathagata Das
Hey all, Some wrap up thoughts on this thread. Let me first reiterate what Patrick said, that Kafka is super super important as it forms the largest fraction of Spark Streaming user base. So we really want to improve the Kafka + Spark Streaming integration. To this end, some of the things that ne

Re: How to become spark developer in jira?

2014-12-29 Thread Jakub Dubovsky
Hi Matei,   that makes sense. Thanks a lot!   Jakub -- Original message -- From: Matei Zaharia To: Jakub Dubovsky Date: 29. 12. 2014 19:31:57 Subject: Re: How to become spark developer in jira? "Please ask someone else to assign them for now, and just comment on them that you

Re: How to become spark developer in jira?

2014-12-29 Thread Matei Zaharia
Please ask someone else to assign them for now, and just comment on them that you're working on them. Over time if you contribute a bunch we'll add you to that list. The problem is that in the past, people would assign issues to themselves and never actually work on them, making it confusing for

RE: Unsupported Catalyst types in Parquet

2014-12-29 Thread Alessandro Baretta
Daoyuan, Thanks for creating the jiras. I need these features by... last week, so I'd be happy to take care of this myself, if only you or someone more experienced than me in the SparkSQL codebase could provide some guidance. Alex On Dec 29, 2014 12:06 AM, "Wang, Daoyuan" wrote: > Hi Alex, > >

How to become spark developer in jira?

2014-12-29 Thread Jakub Dubovsky
Hi devs,   I'd like to ask what the procedures/conditions are for being assigned a developer role on spark jira? My motivation is to be able to assign issues to myself. The only related resource I have found is the jira permission scheme [1].   regards   Jakub  [1] https://cwiki.apache.org/confl

Re: Spark 1.2.0 build error

2014-12-29 Thread Sean Owen
It means a test failed, but you have not shown the test failure. This would have been logged earlier. You would need to say how you ran the tests too. The tests for 1.2.0 pass for me on several common permutations. On Dec 29, 2014 3:22 AM, "Naveen Madhire" wrote: > Hi, > > I am following the below link f

RE: Unsupported Catalyst types in Parquet

2014-12-29 Thread Wang, Daoyuan
Hi Alex, I'll create JIRA SPARK-4985 for date type support in parquet, and SPARK-4987 for timestamp type support. For decimal type, I think we only support decimals that fit in a long. Thanks, Daoyuan -Original Message- From: Alessandro Baretta [mailto:alexbare...@gmail.com] Sent: Sa
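"Fits in a long" here means the decimal's unscaled value is representable as a 64-bit signed integer, i.e. at most about 18–19 significant digits. A quick illustration (the helper below is for exposition, not Spark code):

```python
from decimal import Decimal

def fits_in_long(d: Decimal) -> bool:
    """True if d's unscaled value fits in a 64-bit signed integer."""
    sign, digits, exponent = d.as_tuple()
    unscaled = int("".join(map(str, digits)))
    return -2**63 <= unscaled < 2**63

print(fits_in_long(Decimal("12345.6789")))                # True (9 digits)
print(fits_in_long(Decimal("1234567890123456789012.3")))  # False (23 digits)
```

Decimals with more significant digits than a long can hold would need a wider representation (such as a byte-array-backed unscaled value), which is presumably why support is limited here.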