Hi,

Please see the logs given below:
hive> SELECT t.retweeted_screen_name,
    >        Sum(retweets) AS total_retweets,
    >        Count(*) AS tweet_count
    > FROM (SELECT retweeted_status.user.screen_name AS retweeted_screen_name,
    >              retweeted_status.text,
    >              Max(retweet_count) AS retweets
    >       FROM tweets
    >       GROUP BY retweeted_status.user.screen_name,
    >                retweeted_status.text) t
    > GROUP BY t.retweeted_screen_name
    > ORDER BY total_retweets DESC
    > LIMIT 1;
Query ID = joe_20151022143018_f680c6fd-5d6d-4d5e-8d20-df25396a84d5
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1445537142761_0002, Tracking URL = http://localhost:8088/proxy/application_1445537142761_0002/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1445537142761_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 2
2015-10-22 14:30:51,478 Stage-1 map = 0%, reduce = 0%
2015-10-22 14:39:14,950 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 62.09 sec
2015-10-22 14:39:17,556 Stage-1 map = 70%, reduce = 17%, Cumulative CPU 63.9 sec
2015-10-22 14:39:20,209 Stage-1 map = 71%, reduce = 17%, Cumulative CPU 65.86 sec
2015-10-22 14:39:25,098 Stage-1 map = 72%, reduce = 17%, Cumulative CPU 67.68 sec
2015-10-22 14:39:26,126 Stage-1 map = 74%, reduce = 17%, Cumulative CPU 69.33 sec
2015-10-22 14:39:29,943 Stage-1 map = 75%, reduce = 17%, Cumulative CPU 71.09 sec
2015-10-22 14:39:34,993 Stage-1 map = 77%, reduce = 17%, Cumulative CPU 74.86 sec
2015-10-22 14:39:43,505 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 25.47 sec
MapReduce Total cumulative CPU time: 25 seconds 470 msec
Ended Job = job_1445537142761_0002 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1445537142761_0002_m_000001 (and more) from job job_1445537142761_0002

Task with the most failures(1):
-----
Task ID:
  task_1445537142761_0002_m_000000

URL:
  http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1445537142761_0002&tipid=task_1445537142761_0002_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"filter_level":"low","retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":true,"truncated":false,"lang":"it","in_reply_to_status_id_str":null,"id":654395624406675456,"in_reply_to_user_id_str":null,"timestamp_ms":"1444855049598","in_reply_to_status_id":null,"created_at":"Wed Oct 14 20:37:29 +0000 2015","favorite_count":0,"place":null,"coordinates":null,"text":"Samaritani: \"E-fattura strumento chiave per la business intelligence nella PA\" https://t.co/CIckEdB9EG","contributors":null,"geo":null,"entities":{"symbols":[],"urls":[{"expanded_url":"https://lnkd.in/eDRb_sv","indices":[79,102],"display_url":"lnkd.in/eDRb_sv","url":"https://t.co/CIckEdB9EG"}],"hashtags":[],"user_mentions":[]},"is_quote_status":false,"source":"<a href=\"http://www.linkedin.com/\" rel=\"nofollow\">LinkedIn<\/a>","favorited":false,"in_reply_to_user_id":null,"retweet_count":0,"id_str":"654395624406675456","user":{"location":"pisa","default_profile":true,"profile_background_tile":false,"statuses_count":2924,"lang":"it","profile_link_color":"0084B4","profile_banner_url":"https://pbs.twimg.com/profile_banners/145360070/1422279238","id":145360070,"following":null,"protected":false,"favourites_count":660,"profile_text_color":"333333","verified":false,"description":null,"contributors_enabled":false,"profile_sidebar_border_color":"C0DEED","name":"marco andreozzi","profile_background_color":"C0DEED","created_at":"Tue May 18 19:49:58 +0000 2010","default_profile_image":false,"followers_count":178,"profile_image_url_https":"https://pbs.twimg.com/profile_images/643826897231724544/odFpg1zd_normal.jpg","geo_enabled":true,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","follow_request_sent":null,"url":null,"utc_off
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"filter_level":"low","retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":true,"truncated":false,"lang":"it","in_reply_to_status_id_str":null,"id":654395624406675456,"in_reply_to_user_id_str":null,"timestamp_ms":"1444855049598","in_reply_to_status_id":null,"created_at":"Wed Oct 14 20:37:29 +0000 2015","favorite_count":0,"place":null,"coordinates":null,"text":"Samaritani: \"E-fattura strumento chiave per la business intelligence nella PA\" https://t.co/CIckEdB9EG","contributors":null,"geo":null,"entities":{"symbols":[],"urls":[{"expanded_url":"https://lnkd.in/eDRb_sv","indices":[79,102],"display_url":"lnkd.in/eDRb_sv","url":"https://t.co/CIckEdB9EG"}],"hashtags":[],"user_mentions":[]},"is_quote_status":false,"source":"<a href=\"http://www.linkedin.com/\" rel=\"nofollow\">LinkedIn<\/a>","favorited":false,"in_reply_to_user_id":null,"retweet_count":0,"id_str":"654395624406675456","user":{"location":"pisa","default_profile":true,"profile_background_tile":false,"statuses_count":2924,"lang":"it","profile_link_color":"0084B4","profile_banner_url":"https://pbs.twimg.com/profile_banners/145360070/1422279238","id":145360070,"following":null,"protected":false,"favourites_count":660,"profile_text_color":"333333","verified":false,"description":null,"contributors_enabled":false,"profile_sidebar_border_color":"C0DEED","name":"marco andreozzi","profile_background_color":"C0DEED","created_at":"Tue May 18 19:49:58 +0000 2010","default_profile_image":false,"followers_count":178,"profile_image_url_https":"https://pbs.twimg.com/profile_images/643826897231724544/odFpg1zd_normal.jpg","geo_enabled":true,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","follow_request_sent":null,"url":null,"utc_off
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:516)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
    ... 8 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing '"' for name
 at [Source: java.io.StringReader@14305ac; line: 1, column: 3683]
    at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:128)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:141)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:105)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
    ... 9 more
Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing '"' for name
 at [Source: java.io.StringReader@14305ac; line: 1, column: 3683]
    at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:454)
    at org.codehaus.jackson.impl.ReaderBasedParser._parseFieldName2(ReaderBasedParser.java:1025)
    at org.codehaus.jackson.impl.ReaderBasedParser._parseFieldName(ReaderBasedParser.java:1008)
    at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:418)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:219)
    at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
    at org.codehaus.jackson.map.deser.std.MapDeserializer._readAndBind(MapDeserializer.java:319)
    at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:249)
    at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:33)
    at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
    at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863)
    at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:126)
    ... 12 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

java.net.ConnectException: Call From joe-virtual-machine/127.0.0.1 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2  Reduce: 2   Cumulative CPU: 25.47 sec   HDFS Read: 50988492 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 25 seconds 470 msec
hive>

Thanks,
Joel

On Thu, Oct 22, 2015 at 6:02 PM, Sam Joe <games2013....@gmail.com> wrote:
> Hi,
>
> After streaming Twitter data to HDFS using Flume, I'm trying to analyze it
> using some Hive queries. The data is in JSON format and is not clean: it
> has double quotes (") in the wrong places, which causes the Hive queries
> to fail with the following error:
>
> Failed with exception
> java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException:
> org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was
> expecting closing '"' for name
>
> The script used for creating the external table:
>
> ADD JAR /usr/local/hive/apache-hive-1.2.1-bin/lib/hive-serdes-1.0-SNAPSHOT.jar;
> set hive.support.sql11.reserved.keywords = false;
> CREATE EXTERNAL TABLE tweets (
>   id BIGINT,
>   created_at STRING,
>   source STRING,
>   favorited BOOLEAN,
>   retweet_count INT,
>   retweeted_status STRUCT<
>     text:STRING,
>     user:STRUCT<screen_name:STRING,name:STRING>>,
>   entities STRUCT<
>     urls:ARRAY<STRUCT<expanded_url:STRING>>,
>     user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
>     hashtags:ARRAY<STRUCT<text:STRING>>>,
>   text STRING,
>   user STRUCT<
>     screen_name:STRING,
>     name:STRING,
>     friends_count:INT,
>     followers_count:INT,
>     statuses_count:INT,
>     verified:BOOLEAN,
>     utc_offset:INT,
>     time_zone:STRING>,
>   in_reply_to_screen_name STRING)
> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
> LOCATION '/usr/local/hadoop/bin/tweets';
>
> Since I don't know which rows contain the extra double quotes, I can't
> simply add an escape character. How can I escape the junk characters and
> process the data successfully?
>
> Appreciate any help.
>
> Thanks,
> Joel
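P.S. Looking at the writable in the log again, it simply ends at "utc_off, so the failing record appears to be truncated rather than mis-quoted; no escaping would repair it. One thing I'm considering is dropping any line that doesn't parse before it ever reaches the SerDe. Below is a minimal, untested sketch, assuming one JSON record per line in a local copy of the Flume output; CleanTweets is just a name I made up, not part of Hive or Flume. It uses the same org.codehaus.jackson library that appears in the stack trace:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.PrintWriter;

    import org.codehaus.jackson.map.ObjectMapper;

    public class CleanTweets {
        public static void main(String[] args) throws IOException {
            ObjectMapper mapper = new ObjectMapper();
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
                 PrintWriter out = new PrintWriter(args[1])) {
                String line;
                while ((line = in.readLine()) != null) {
                    try {
                        mapper.readTree(line); // throws JsonParseException on truncated/invalid JSON
                        out.println(line);     // record parses cleanly; keep it
                    } catch (IOException badJson) {
                        // drop the malformed record, e.g. one cut off mid-write
                    }
                }
            }
        }
    }

Alternatively, if switching SerDes is an option, I believe the openx/rcongiu Hive JSON SerDe (org.openx.data.jsonserde.JsonSerDe) can be created WITH SERDEPROPERTIES ("ignore.malformed.json" = "true"), which returns NULL for unparseable rows instead of throwing; as far as I can tell, the com.cloudera.hive.serde.JSONSerDe used above has no equivalent option.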