[ https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017812#comment-16017812 ]
Naveen Gangam commented on HIVE-16667:
--------------------------------------

I have tried a couple of ways to force PostgreSQL to store these values inline instead of as TOASTed values referenced via OIDs. According to the PostgreSQL documentation, setting the column storage to {{PLAIN}} or {{MAIN}} should force the DB to store them in-line:
{code}
The TOAST code recognizes four different strategies for storing TOAST-able columns:

PLAIN prevents either compression or out-of-line storage; furthermore it disables use of single-byte headers for varlena types. This is the only possible strategy for columns of non-TOAST-able data types.

EXTENDED allows both compression and out-of-line storage. This is the default for most TOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big.

EXTERNAL allows out-of-line storage but not compression. Use of EXTERNAL will make substring operations on wide text and bytea columns faster (at the penalty of increased storage space) because these operations are optimized to fetch only the required parts of the out-of-line value when it is not compressed.

MAIN allows compression but not out-of-line storage. (Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough.)
{code}
Even with either of those settings, I cannot get it to store these values in-line. I am still researching; based on a hint in a user-group thread, I found this:
{code}
select * from "COLUMNS_V2";
 CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX
-------+---------+-------------+-----------+-------------
     1 | default | key         | 27118     |           0
     1 | default | value       | 27119     |           1
(2 rows)

select "CD_ID", "COMMENT", "COLUMN_NAME",
       convert_from(loread(lo_open("TYPE_NAME"::int, x'40000'::int), x'40000'::int), 'UTF8') as "TYPE_NAME"
  from "COLUMNS_V2"
 where "CD_ID" in (1) and "INTEGER_IDX" >= 0
 order by "CD_ID" asc, "INTEGER_IDX" asc;
 CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME
-------+---------+-------------+-----------
     1 | default | key         | string
     1 | default | value       | string
{code}
While the above conversion works fine in the native client, the same query does not work via Hive/JDO. However, the overall {{getPartitions}} request still succeeds because the metastore falls back to DataNucleus when DirectSQL fails.
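The "requires 3 parameters yet none have been supplied" warning in the log below is most likely DataNucleus reading each PostgreSQL {{::int}} cast as a {{:name}}-style named parameter; the query contains exactly three such casts. As a sketch only (assuming that is indeed the cause, and not verified end-to-end through the DirectSQL path), the same dereference can be written with ANSI {{CAST}} syntax so that no colons remain in the SQL text:
{code}
-- Sketch: same LOB dereference as above, rewritten with CAST(...) so that no
-- "::" casts are left for DataNucleus to mistake for named parameters.
select "CD_ID", "COMMENT", "COLUMN_NAME",
       convert_from(loread(lo_open(cast("TYPE_NAME" as int), cast(x'40000' as int)),
                           cast(x'40000' as int)), 'UTF8') as "TYPE_NAME"
  from "COLUMNS_V2"
 where "CD_ID" in (1) and "INTEGER_IDX" >= 0
 order by "CD_ID" asc, "INTEGER_IDX" asc;
{code}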
{code}
2017-05-19T11:24:46,604 WARN [pool-7-thread-2] metastore.MetaStoreDirectSql: Getting partitions:query=select "CD_ID", "COMMENT", "COLUMN_NAME", convert_from(loread(lo_open("TYPE_NAME"::int, x'40000'::int), x'40000'::int), 'UTF8') as "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (1) and "INTEGER_IDX" >= 0 order by "CD_ID" asc, "INTEGER_IDX" asc
2017-05-19T11:24:46,605 WARN [pool-7-thread-2] metastore.ObjectStore: Falling back to ORM path due to direct SQL failure (this is not an error): SQL query "select "CD_ID", "COMMENT", "COLUMN_NAME", convert_from(loread(lo_open("TYPE_NAME"::int, x'40000'::int), x'40000'::int), 'UTF8') as "TYPE_NAME" from "COLUMNS_V2" where "CD_ID" in (1) and "INTEGER_IDX" >= 0 order by "CD_ID" asc, "INTEGER_IDX" asc" requires 3 parameters yet none have been supplied
	at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:636)
	at
{code}


> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and
> other fields is incorrect
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16667
>                 URL: https://issues.apache.org/jira/browse/HIVE-16667
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Remus Rusanu
>            Assignee: Naveen Gangam
>         Attachments: HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as
> an INT, into the table. SELECTs return the INT value, which should have been
> read via the {{lo_get}} PG built-in and then cast to a string.
> Furthermore, the behavior is different between fields upgraded from earlier
> metastore versions (they retain their string storage) and values inserted
> after the upgrade (inserted as LOB roots).
> The code in
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects
> the underlying JDO/DataNucleus layer to map the column to a {{Clob}}, but that
> does not happen; the value is a Java String containing the int which is the
> LOB root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException:
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> The '24030:24031' should be 'string:string'.
> Repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but
> that is just the luck of the path taken by the code. Inspection of my PG
> metastore shows all the CLOB fields suffering from this issue.
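For reference, the {{lo_get}}-based read suggested in the description would look roughly like the sketch below when run from a native client. This is an illustration only: it assumes PostgreSQL 9.4 or later (where {{lo_get}} is available) and rows where {{TYPE_NAME}} actually holds a large-object OID; rows carried over from an older metastore that still store the literal type name would fail the cast.
{code}
-- Illustration only: dereference the large-object OID stored in TYPE_NAME and
-- decode the LOB contents as UTF-8 text (lo_get requires PostgreSQL 9.4+).
select "CD_ID", "COMMENT", "COLUMN_NAME",
       convert_from(lo_get("TYPE_NAME"::int), 'UTF8') as "TYPE_NAME"
  from "COLUMNS_V2"
 order by "CD_ID" asc, "INTEGER_IDX" asc;
{code}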