How do I convert json_encoded_blob_column into a data frame? (This may be a feature request)

2016-11-16 Thread kant kodali
https://spark.apache.org/docs/2.0.2/sql-programming-guide.html#json-datasets "Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame. This conversion can be done using SQLContext.read.json() on either an RDD of String, or a JSON file." val df = spark.sql("SELECT

Re: How do I convert json_encoded_blob_column into a data frame? (This may be a feature request)

2016-11-16 Thread Hyukjin Kwon
Maybe it sounds like you are looking for from_json/to_json functions after en/decoding properly. On 16 Nov 2016 6:45 p.m., "kant kodali" wrote: > https://spark.apache.org/docs/2.0.2/sql-programming-guide.html#json-datasets > "Spark SQL can automatically infer the schema of a JSON datase

Insert data into hive internal tables using hiveContext

2016-11-16 Thread NNigam
Hi, I am getting the error below while inserting data into Hive internal tables using HiveContext. Is there any way to resolve this issue? (The error was attached as an image, image002.jpg.) Thanks, Nitisha Nigam

Re: separate spark and hive

2016-11-16 Thread Ricardo Almeida
Great to know about the "spark.sql.catalogImplementation" configuration property. I can't find this anywhere but in Jacek Laskowski's "Mastering Apache Spark 2.0" Gitbook. I guess we should document it on the Spark Configuration page. On 15 November 2016 at 11:49, Herman van Hövell tot Westerflier < hvan

Re: Handling questions in the mailing lists

2016-11-16 Thread Sean Owen
I updated the wiki to point to the /community.html page. (We're going to migrate the wiki real soon now anyway) I updated the /community.html page per this thread too. PR: https://github.com/apache/spark-website/pull/16 On Tue, Nov 15, 2016 at 2:49 PM assaf.mendelson wrote: Should probably also

Develop custom Estimator / Transformer for pipeline

2016-11-16 Thread Georg Heiler
Hi, I want to develop a library with custom Estimators / Transformers for Spark. So far I could not find much documentation, but http://stackoverflow.com/questions/37270446/how-to-roll-a-custom-estimator-in-pyspark-mllib suggests that: Generally speaking, there is no documentation because as

Re: Handling questions in the mailing lists

2016-11-16 Thread Denny Lee
Awesome stuff! Thanks Sean! :-) On Wed, Nov 16, 2016 at 05:57 Sean Owen wrote: > I updated the wiki to point to the /community.html page. (We're going to migrate the wiki real soon now anyway) > I updated the /community.html page per this thread too. PR: https://github.com/apache/spark-web

Re: NodeManager heap size with ExternalShuffleService

2016-11-16 Thread Artur Sukhenko
Sure Reynold, Here is pull request - [YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service On Wed, Nov 16, 2016, 04:07 Reynold Xin wrote: Can you submit a pull request to add that to the documentation? On November 15, 2016 at

Re: How do I convert json_encoded_blob_column into a data frame? (This may be a feature request)

2016-11-16 Thread Michael Armbrust
On Wed, Nov 16, 2016 at 2:49 AM, Hyukjin Kwon wrote: > Maybe it sounds like you are looking for from_json/to_json functions after en/decoding properly. Which are new built-in functions that will be released with Spark 2.1.

Re: Running lint-java during PR builds?

2016-11-16 Thread Dongjoon Hyun
Hi, Marcelo and Ryan. That was the main purpose of my proposal about Travis CI. IMO, that is the only way to achieve that without any harmful side-effect on Jenkins infra. Spark is already ready for that. Like AppVeyor, if one of you files an INFRA jira issue to enable that, they will turn on t

Kafka segmentation

2016-11-16 Thread Hoang Bao Thien
Hi all, I would like to ask a question related to the size of a Kafka stream. I want to put data (e.g., a *.csv file) into Kafka, then use Spark Streaming to read it from Kafka and save it to Hive using Spark SQL. The CSV file is about 100MB with ~250K messages/rows (Each row has about 10 fiel

SparkILoop doesn't run

2016-11-16 Thread Mohit Jaggi
I am trying to use SparkILoop to write some tests(shown below) but the test hangs with the following stack trace. Any idea what is going on? import org.apache.log4j.{Level, LogManager} import org.apache.spark.repl.SparkILoop import org.scalatest.{BeforeAndAfterAll, FunSuite} class SparkReplSpec

Re: How do I convert json_encoded_blob_column into a data frame? (This may be a feature request)

2016-11-16 Thread Nathan Lande
I'm looking forward to 2.1 but, in the meantime, you can pull out the specific column into an RDD of JSON objects, pass this RDD into the read.json() and then join the results back onto your initial DF. Here is an example of what we do to unpack headers from Avro log data: def jsonLoad(path):
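
The three steps described above (pull the JSON column out, let the reader infer a schema, join the result back) can be sketched without Spark. What follows is a plain-Python stand-in using only the standard library, not the poster's jsonLoad function and not the Spark API. The sample rows, the function names, and the assumption that the first element of each row is a unique join key are all hypothetical.

```python
import json

# Hypothetical input: (id, json_blob) pairs standing in for a two-column DataFrame.
rows = [
    (1, '{"user": "a", "clicks": 3}'),
    (2, '{"user": "b"}'),
    (3, 'not json'),
]

def parse_blob(blob):
    """Decode one blob; return None unless it is a valid JSON object."""
    try:
        value = json.loads(blob)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    return value if isinstance(value, dict) else None

def unpack_json_column(rows):
    """Parse the JSON column, infer the field set, and join back by key."""
    # Step 1: pull the JSON column out, keeping the key for the later join.
    parsed = {key: parse_blob(blob) for key, blob in rows}
    # Step 2: "infer a schema" as the sorted union of all field names seen.
    fields = sorted({name for obj in parsed.values() if obj for name in obj})
    # Step 3: join back onto the original keys; missing fields become None.
    table = []
    for key, _blob in rows:
        obj = parsed[key] or {}
        table.append((key,) + tuple(obj.get(name) for name in fields))
    return fields, table

fields, table = unpack_json_column(rows)
```

Rows whose blob fails to parse keep their key and get None in every inferred field, so no input row is dropped by the join.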

Multiple streaming aggregations in structured streaming

2016-11-16 Thread wszxyh
Hi, multiple streaming aggregations are not yet supported. When will they be supported? Is it planned? Thanks

Re: How do I convert json_encoded_blob_column into a data frame? (This may be a feature request)

2016-11-16 Thread Nathan Lande
If you are dealing with a bunch of different schemas in one field, the strategy for handling that will depend on your data and does not really have anything to do with Spark, since mapping your JSON payloads to tractable data structures will depend on business logic. The strategy of pullin

SQL Syntax for pivots

2016-11-16 Thread Niranda Perera
Hi all, I see that the pivot functionality is being added to spark DFs from 1.6 onward. I am interested to see if there is a Spark SQL syntax available for pivoting? example: Slide 11 of [1] *pandas (Python) - pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc=np.sum) * *resha
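
For readers unfamiliar with the operation being asked about, here is a minimal plain-Python sketch of what a sum pivot computes. It mirrors the shape of the quoted pandas call (values='D', index=['A', 'B'], columns=['C'], sum aggregation) but is neither pandas nor Spark code. The function name pivot_sum and the sample records are hypothetical.

```python
from collections import defaultdict

def pivot_sum(records, index, column, value):
    """Group by the index fields, spread `column` values into keys, sum `value`."""
    cells = defaultdict(lambda: defaultdict(int))
    for rec in records:
        row_key = tuple(rec[name] for name in index)
        cells[row_key][rec[column]] += rec[value]
    # Convert nested defaultdicts into plain dicts for a stable result.
    return {row: dict(cols) for row, cols in cells.items()}

# Hypothetical sample data using the column names from the quoted example.
records = [
    {"A": "x", "B": "p", "C": "c1", "D": 1},
    {"A": "x", "B": "p", "C": "c2", "D": 2},
    {"A": "x", "B": "p", "C": "c1", "D": 3},
    {"A": "y", "B": "q", "C": "c2", "D": 4},
]

pivoted = pivot_sum(records, index=["A", "B"], column="C", value="D")
```

Each distinct value of C becomes a key under its (A, B) row, and combinations that never occur are simply absent rather than filled with a default.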

Re: SQL Syntax for pivots

2016-11-16 Thread Reynold Xin
Not right now. On Wed, Nov 16, 2016 at 10:44 PM, Niranda Perera wrote: > Hi all, > I see that the pivot functionality is being added to spark DFs from 1.6 onward. > I am interested to see if there is a Spark SQL syntax available for pivoting? example: Slide 11 of [1] > *pandas (Python

issues with github pull request notification emails missing

2016-11-16 Thread Reynold Xin
I've noticed that a lot of github pull request notifications no longer come to my inbox. In the past I'd get an email for every reply to a pull request that I subscribed to (i.e. commented on). Lately I noticed for a lot of them I didn't get any emails, but if I opened the pull requests directly on

Re: issues with github pull request notification emails missing

2016-11-16 Thread Holden Karau
+1 it seems like I'm missing a number of my GitHub email notifications lately (although since I run my own mail server and forward I've been assuming it's my own fault). I've also had issues with having greatly delayed notifications on some of my own pull requests but that might be unrelated. On