Hyukjin,
This is what I have done so far. I haven't used Dataset yet, and maybe I don't need to.
var df: DataFrame = null
for (message <- messages) {
  val bodyRdd = sc.parallelize(message.getBody() :: Nil)
  val fileDf = sqlContext.read.json(bodyRdd)
    .select(
      $"Records.s3.bucket.nam
Hi!
Personally, I don't think you necessarily need a Dataset for your goal.
Just select the data under "s3" from the DataFrame loaded by
sqlContext.read.json().
You can call printSchema() to check the nested schema and then select the
data.
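A minimal sketch of that suggestion, assuming Spark 1.6 with an existing SQLContext and a message body that has a top-level "Records" array whose elements carry an "s3" struct. The object and method names (InspectS3Event, s3Part) are made up for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object InspectS3Event {
  // Hypothetical helper: loads one JSON message body and returns only its "s3" part.
  def s3Part(sqlContext: SQLContext, body: String): DataFrame = {
    import sqlContext.implicits._
    // read.json accepts an RDD[String] where each element is one JSON document.
    val df = sqlContext.read.json(sqlContext.sparkContext.parallelize(Seq(body)))
    // Print the inferred nested schema to see which fields are available.
    df.printSchema()
    // Assumption: the body has a "Records" array of structs with an "s3" field.
    df.select($"Records.s3")
  }
}
```

The printSchema() output shows the exact nesting, which is the safest guide for writing the select() paths.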
Also, I guess (from your code) you are trying to send
Holden,
If I were to use Datasets, then I would essentially do this:
val receiveMessageRequest = new ReceiveMessageRequest(myQueueUrl)
val messages = sqs.receiveMessage(receiveMessageRequest).getMessages()
for (message <- messages.asScala) {
  val files = sqlContext.read.json(message.getBody())
You could certainly use RDDs for that. You might also find it easier to use a
Dataset: select the fields you need to construct the URL to fetch, and then
use the map function.
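A rough sketch of that approach, assuming Spark 1.6 and a message body shaped like an S3 event notification (a "Records" array whose elements have s3.bucket.name and s3.object.key). The case class S3Ref, the object S3Urls, and the "s3n://" URL layout are made up for illustration and would need adjusting to how the files are actually fetched:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.explode

// Hypothetical record type holding just the fields needed to build a URL.
case class S3Ref(bucket: String, key: String)

object S3Urls {
  // Assumed URL layout; not taken from the thread.
  def toUrl(ref: S3Ref): String = s"s3n://${ref.bucket}/${ref.key}"

  def urls(sqlContext: SQLContext, body: String): Array[String] = {
    import sqlContext.implicits._
    val bodyRdd = sqlContext.sparkContext.parallelize(body :: Nil)
    val refs = sqlContext.read.json(bodyRdd)
      // One output row per element of the "Records" array.
      .select(explode($"Records").as("r"))
      // Keep only the two fields needed for the URL.
      .select($"r.s3.bucket.name".as("bucket"), $"r.s3.object.key".as("key"))
      // Turn the DataFrame into a typed Dataset[S3Ref].
      .as[S3Ref]
    // Build one URL per record with map, then bring the results to the driver.
    refs.map((ref: S3Ref) => toUrl(ref)).collect()
  }
}
```

Keeping toUrl as a plain function means the URL format can be checked without a cluster, and the same function works unchanged if you stay with RDDs instead.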
On Thu, Apr 14, 2016 at 12:01 PM, Benjamin Kim wrote:
> I was wondering what would be the best way to use JSON in Spark/
I was wondering what would be the best way to use JSON in Spark/Scala. I need to
look up values of fields in a collection of records to form a URL and download
that file at that location. I was thinking an RDD would be perfect for this. I
just want to hear from others who might have more experience