Running FPGrowth over a JavaPairRDD?

2015-10-29 Thread Fernando Paladini
vaRDD) in the type FPGrowth is not applicable for the arguments (JavaPairRDD>) What can I do in order to solve my problem (run FPGrowth over JavaPairRDD)? I'm available to give you more information; just tell me exactly what you need. Thank you! Fernando Paladini

Re: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark

2015-10-05 Thread Fernando Paladini
among other useless debug information): Is that correct for the given JSON input <https://gist.github.com/paladini/27bb5636d91dec79bd56> (gist link above)? How can I test whether Spark can understand this DataFrame and perform complex manipulations on it? Thank you! Hope you can help me so

Re: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark

2015-10-05 Thread Fernando Paladini
ets >> Note that the file that is offered as *a json file* is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail. >> Thanks
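
The one-object-per-line requirement quoted above can be illustrated with a minimal sketch that uses only Python's standard `json` module. The helper name `to_json_lines` and the sample data are hypothetical and do not come from the thread; the sketch only shows how a pretty-printed, multi-line JSON document might be flattened so that each line is a self-contained JSON object.

```python
import json


def to_json_lines(document):
    """Serialize a parsed JSON value as line-delimited JSON.

    A list yields one line per element; any other value yields a single line.
    """
    records = document if isinstance(document, list) else [document]
    # json.dumps without `indent` emits no raw newlines, so each record
    # stays on exactly one line.
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical multi-line input, as a pretty-printed API response might look.
pretty = """
[
  {"id": 1, "tags": ["a", "b"]},
  {"id": 2, "tags": []}
]
"""

lines = to_json_lines(json.loads(pretty))
print(lines)
```

Each line of the result can be parsed on its own with `json.loads`, which is the property the quoted note asks for.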

Re: "Method json([class java.util.HashMap]) does not exist" when reading JSON

2015-09-29 Thread Fernando Paladini
r the help! 2015-09-29 17:14 GMT-03:00 Fernando Paladini: > Of course, I didn't see that Gmail was only sending it to you. Sorry :/ > 2015-09-29 17:13 GMT-03:00 Ted Yu: >> For further analysis, can you post your most recent question on the mailing list?

"Method json([class java.util.HashMap]) does not exist" when reading JSON

2015-09-29 Thread Fernando Paladini
Hello guys, I'm very new to Spark and I'm having some trouble reading JSON into a DataFrame on PySpark. I'm getting a JSON object from an API response and I would like to store it in Spark as a DataFrame (I've read that DataFrame is better than RDD; is that accurate?). From what I've read

Fwd: "Method json([class java.util.HashMap]) does not exist" when reading JSON on PySpark

2015-09-28 Thread Fernando Paladini
actCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) What am I doing wrong? Check out this gist <https://gist.github.com/paladini/2e2ea913d545a407b842> to see the JSON I'm trying to load. Thanks! Fernando Paladini