[ https://issues.apache.org/jira/browse/HIVE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Doubrovkine updated HIVE-16351: -------------------------------------- Description: Hive is returning broken data that contains CR/LF. {code} CREATE DATABASE positron; CREATE EXTERNAL TABLE positron.articles ( `_id` struct<oid:string>, `channel_id` struct<oid:string>, `exclude_google_news` boolean, `fair_ids` array<map<string,string>>, `hero_section` map<string,string>, `partner_ids` array<map<string,string>>, `description` string, `partner_channel_id` struct<oid:string>, `published` boolean, `published_at` map<string,string>, `slugs` array<string>, `sections` array<map<string,string>>, `thumbnail_image` string, `title` string ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH serdeproperties ('mapping.oid' = '$oid') LOCATION '/user/data/positron/articles'; SELECT a.slugs[SIZE(a.slugs) - 1], a.title, a.thumbnail_image FROM positron.articles as a WHERE a.published = true AND a.hero_section["type"] = "video" AND (a.channel_id IS NOT NULL OR a.partner_channel_id IS NOT NULL) AND a.thumbnail_image IS NOT NULL; {code} Note the NULLs below. {code} astrid-caroline-cole-sneak-peek-realities-by-marc-gumpinger Sneak peek "Realities" by Marc Gumpinger https://artsy-media-uploads.s3.amazonaws.com/bUb1l_4g6cMhcDxEaPYDxw%2Facc_signature.png artsy-editorial-how-art-fairs-expanded-the-contemporary-art-market The Art Market, Explained: The Rise of the Art Fair https://artsy-media-uploads.s3.amazonaws.com/j8GIeamyufubMBgJFNHbFA%2Fartfairsex.jpg nolongercreek Alexandra Kehayoglou x Artsy: NULL No Longer Creek https://d32dm0rphc51dk.cloudfront.net/5oRwy7ysKHohtahIYUTE9Q/larger.jpg NULL kukje-gallery-teaser-trailer-kim-yong-ik Teaser Trailer | Kim Yong-Ik https://artsy-media-uploads.s3.amazonaws.com/mmoZcz0imuUzCavkObKgVQ%2Fkyi+thumbnail.PNG {code} {code} $ hive --version Hive 1.1.0-cdh5.6.0 Subversion file:///data/jenkins/workspace/generic-package-ubuntu64-14-04/CDH5.6.0-Packaging-Hive-2016-01-28_21-19-00/hive-1.1.0+cdh5.6.0+377-1.cdh5.6.0.p0.110~trusty -r Unknown Compiled by jenkins on Thu Jan 28 21:35:50 PST 2016 >From source with checksum b4a8fadbcf1ca36d11d91805d3ec2743 {code} What's very interesting is that I am not able to reproduce this locally with the same data with any version of hive. Any help appreciated. was: Hive is returning broken data that contains CR/LF. {code} CREATE DATABASE positron; CREATE EXTERNAL TABLE positron.articles ( `_id` struct<oid:string>, `channel_id` struct<oid:string>, `exclude_google_news` boolean, `fair_ids` array<map<string,string>>, `hero_section` map<string,string>, `partner_ids` array<map<string,string>>, `description` string, `partner_channel_id` struct<oid:string>, `published` boolean, `published_at` map<string,string>, `slugs` array<string>, `sections` array<map<string,string>>, `thumbnail_image` string, `title` string ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH serdeproperties ('mapping.oid' = '$oid') LOCATION '/user/data/positron/articles'; SELECT a.slugs[SIZE(a.slugs) - 1], a.title, a.thumbnail_image FROM positron.articles as a WHERE a.published = true AND a.hero_section["type"] = "video" AND (a.channel_id IS NOT NULL OR a.partner_channel_id IS NOT NULL) AND a.thumbnail_image IS NOT NULL; {code} {code} astrid-caroline-cole-sneak-peek-realities-by-marc-gumpinger Sneak peek "Realities" by Marc Gumpinger https://artsy-media-uploads.s3.amazonaws.com/bUb1l_4g6cMhcDxEaPYDxw%2Facc_signature.png artsy-editorial-how-art-fairs-expanded-the-contemporary-art-market The Art Market, Explained: The Rise of the Art Fair https://artsy-media-uploads.s3.amazonaws.com/j8GIeamyufubMBgJFNHbFA%2Fartfairsex.jpg nolongercreek Alexandra Kehayoglou x Artsy: NULL No Longer Creek https://d32dm0rphc51dk.cloudfront.net/5oRwy7ysKHohtahIYUTE9Q/larger.jpg NULL kukje-gallery-teaser-trailer-kim-yong-ik Teaser Trailer | Kim Yong-Ik https://artsy-media-uploads.s3.amazonaws.com/mmoZcz0imuUzCavkObKgVQ%2Fkyi+thumbnail.PNG {code} {code} $ hive --version Hive 1.1.0-cdh5.6.0 Subversion file:///data/jenkins/workspace/generic-package-ubuntu64-14-04/CDH5.6.0-Packaging-Hive-2016-01-28_21-19-00/hive-1.1.0+cdh5.6.0+377-1.cdh5.6.0.p0.110~trusty -r Unknown Compiled by jenkins on Thu Jan 28 21:35:50 PST 2016 >From source with checksum b4a8fadbcf1ca36d11d91805d3ec2743 {code} What's very interesting is that I am not able to reproduce this locally with the same data with any version of hive. Any help appreciated. > Hive confused by CR/LFs > ----------------------- > > Key: HIVE-16351 > URL: https://issues.apache.org/jira/browse/HIVE-16351 > Project: Hive > Issue Type: Bug > Components: Hive, Serializers/Deserializers > Affects Versions: 1.2.1 > Environment: Hadoop 2.7.3 > Reporter: Daniel Doubrovkine > > Hive is returning broken data that contains CR/LF. > {code} > CREATE DATABASE positron; > CREATE EXTERNAL TABLE positron.articles ( > `_id` struct<oid:string>, > `channel_id` struct<oid:string>, > `exclude_google_news` boolean, > `fair_ids` array<map<string,string>>, > `hero_section` map<string,string>, > `partner_ids` array<map<string,string>>, > `description` string, > `partner_channel_id` struct<oid:string>, > `published` boolean, > `published_at` map<string,string>, > `slugs` array<string>, > `sections` array<map<string,string>>, > `thumbnail_image` string, > `title` string > ) > ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' > WITH serdeproperties ('mapping.oid' = '$oid') > LOCATION '/user/data/positron/articles'; > SELECT > a.slugs[SIZE(a.slugs) - 1], > a.title, > a.thumbnail_image > FROM > positron.articles as a > WHERE a.published = true > AND a.hero_section["type"] = "video" > AND (a.channel_id IS NOT NULL OR a.partner_channel_id IS NOT NULL) > AND a.thumbnail_image IS NOT NULL; > {code} > Note the NULLs below. > {code} > astrid-caroline-cole-sneak-peek-realities-by-marc-gumpinger Sneak peek > "Realities" by Marc Gumpinger > https://artsy-media-uploads.s3.amazonaws.com/bUb1l_4g6cMhcDxEaPYDxw%2Facc_signature.png > artsy-editorial-how-art-fairs-expanded-the-contemporary-art-market The Art > Market, Explained: The Rise of the Art Fair > https://artsy-media-uploads.s3.amazonaws.com/j8GIeamyufubMBgJFNHbFA%2Fartfairsex.jpg > nolongercreek Alexandra Kehayoglou x Artsy: NULL > No Longer Creek > https://d32dm0rphc51dk.cloudfront.net/5oRwy7ysKHohtahIYUTE9Q/larger.jpg NULL > kukje-gallery-teaser-trailer-kim-yong-ik Teaser Trailer | Kim Yong-Ik > https://artsy-media-uploads.s3.amazonaws.com/mmoZcz0imuUzCavkObKgVQ%2Fkyi+thumbnail.PNG > {code} > {code} > $ hive --version > Hive 1.1.0-cdh5.6.0 > Subversion > file:///data/jenkins/workspace/generic-package-ubuntu64-14-04/CDH5.6.0-Packaging-Hive-2016-01-28_21-19-00/hive-1.1.0+cdh5.6.0+377-1.cdh5.6.0.p0.110~trusty > -r Unknown > Compiled by jenkins on Thu Jan 28 21:35:50 PST 2016 > From source with checksum b4a8fadbcf1ca36d11d91805d3ec2743 > {code} > What's very interesting is that I am not able to reproduce this locally with > the same data with any version of hive. Any help appreciated. -- This message was sent by Atlassian JIRA (v6.3.15#6346)