Hi,
I've been stuck on this issue for a while and have been unable to get any help.
I was wondering if anyone here can help.
I'm trying to load email messages into a messages relation in Pig and have been
unable to. Does anyone have a sample email dataset that would let me play around
with this script?
The following is the code from the Agile Data Science book:
/* Load the emails in avro format (edit the path to match where you saved them)
using the AvroStorage UDF from Piggybank */
messages = LOAD '/me/Data/test_mbox' USING AvroStorage();
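For context, a LOAD through Piggybank's AvroStorage normally relies on the jars being registered and the UDF being defined earlier in the script. A minimal sketch of that setup follows. The three jar paths are hypothetical placeholders and depend on the local install; only the class name and the LOAD line are taken as given.

```pig
/* Hypothetical jar locations: replace with the real paths on your machine. */
REGISTER /path/to/avro.jar;
REGISTER /path/to/json-simple.jar;
REGISTER /path/to/piggybank.jar;

/* Bind the short name AvroStorage to the Piggybank class so that
   USING AvroStorage() in the LOAD statement can be resolved. */
DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

/* Same LOAD statement as above. */
messages = LOAD '/me/Data/test_mbox' USING AvroStorage();
```

Whether a missing REGISTER or DEFINE has anything to do with the error below is not confirmed; the sketch only shows the usual shape of such a script.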
I manually downloaded my Gmail mailbox, which comes to 350 MB, and tried loading
this file into messages. I got this error message:
*************************************
2014-03-03 01:52:26,294 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
1000: Error during parsing. Encountered " "(" "( "" at line 1, column 84.
Was expecting one of:
"as" ...
"parallel" ...
";" ...
"." ...
"$" ...
*************************************
Details at logfile: /home/cloudera/pig_1393839871002.log
I then downloaded a sample email dataset and tried to load that into the
messages relation above, and I got the same error.
Then I tried saving the following content from the book in a file and loading it
into the relation, and I got the same error message.
Here is the content:
*************************************
*************************************
Will keep the weeds from taking over.
Russell Jurney datasyndrome.com
----
I have also tried emailing Russell, but got no response.
I am wondering if anyone has a sample email dataset that would load with
AvroStorage so I can try out my next steps.
Any help would be really appreciated.
Please let me know.
Thanks
Sai