Re: Reading JSON in Pyspark throws scala.MatchError

2015-10-20 Thread Balaji Vijayan
You are correct, that was the issue.

On Tue, Oct 20, 2015 at 10:18 PM, Jeff Zhang wrote:
> BTW, I think Json Parser should verify the json format at least when
> inferring the schema of json.
>
> On Wed, Oct 21, 2015 at 12:59 PM, Jeff Zhang wrote:
>
>> I think this is due to the json file forma

Re: Reading JSON in Pyspark throws scala.MatchError

2015-10-20 Thread Jeff Zhang
BTW, I think Json Parser should verify the json format at least when inferring the schema of json.

On Wed, Oct 21, 2015 at 12:59 PM, Jeff Zhang wrote:
> I think this is due to the json file format. DataFrame can only accept
> json file with one valid record per line. Multiple line per record i
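
Until the parser performs such a check itself, a file can be pre-checked before it is handed to Spark. The following is a minimal sketch in plain Python, not Spark's own validation: it assumes that a "valid record" means a line that parses as JSON on its own, and the helper name `find_invalid_lines` is hypothetical.

```python
import json


def find_invalid_lines(lines):
    """Return the 1-based numbers of lines that do not parse as JSON on their own.

    Blank lines are skipped. This is an illustrative pre-check only; it does not
    reproduce whatever checks Spark's JSON reader applies internally.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue
        try:
            json.loads(stripped)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            bad.append(number)
    return bad


# A record spread over lines 2-4 makes each of those lines invalid on its own.
sample = ['{"a": 1}', '{', '"b": 2', '}', '{"c": 3}']
invalid = find_invalid_lines(sample)
```

A non-empty result suggests the file has records spanning several lines and should be rewritten to one record per line before loading.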

Re: Reading JSON in Pyspark throws scala.MatchError

2015-10-20 Thread Jeff Zhang
I think this is due to the json file format. DataFrame can only accept json file with one valid record per line. Multiple line per record is invalid for DataFrame.

On Tue, Oct 6, 2015 at 2:48 AM, Davies Liu wrote:
> Could you create a JIRA to track this bug?
>
> On Fri, Oct 2, 2015 at 1:42 PM
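
One way to meet the one-record-per-line requirement is to rewrite a pretty-printed JSON document before loading it. The sketch below uses only the Python standard library and assumes the input is either a single JSON object or a top-level array of objects; the helper name `to_json_lines` and the sample data are hypothetical, not taken from the original poster's files.

```python
import json


def to_json_lines(text):
    """Convert one JSON document into one-record-per-line text.

    Assumes the document is a single object or a top-level array of objects.
    """
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    # json.dumps without an indent argument emits each record on a single line.
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


# Hypothetical input with each record spread over several lines.
multi_line = """[
  {"name": "a",
   "value": 1},
  {"name": "b",
   "value": 2}
]"""

json_lines = to_json_lines(multi_line)
```

The resulting text can be written to a file and then read with `sqlContext.read.json(path)`. For files too large to parse in memory at once, a streaming conversion would be needed instead; this sketch does not cover that case.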

Re: Reading JSON in Pyspark throws scala.MatchError

2015-10-05 Thread Davies Liu
Could you create a JIRA to track this bug?

On Fri, Oct 2, 2015 at 1:42 PM, balajikvijayan wrote:
> Running Windows 8.1, Python 2.7.x, Scala 2.10.5, Spark 1.4.1.
>
> I'm trying to read in a large quantity of json data in a couple of files and
> I receive a scala.MatchError when I do so. Json, Pyth

Re: Reading JSON in Pyspark throws scala.MatchError

2015-10-02 Thread Ted Yu
I got the following when parsing your input with master branch (Python version 2.6.6):

http://pastebin.com/1w8WM3tz

FYI

On Fri, Oct 2, 2015 at 1:42 PM, balajikvijayan wrote:
> Running Windows 8.1, Python 2.7.x, Scala 2.10.5, Spark 1.4.1.
>
> I'm trying to read in a large quantity of json data