Hi all,

Thanks for the suggestion, Gopal. It turns out the error occurs on both "SELECT *" and "SELECT col" queries. The only queries that seem safe are those with aggregations or anything else that causes them to run as MR tasks (e.g. "SELECT SUM(price_f) FROM my_external_table"). Logging the column names as you suggested doesn't turn up anything unexpected either.

I've tracked the unexpected 'NULL' values down to an early exit from my SerDe's deserialize() method. The first thing deserialize() does is check that the received Writable can be cast to the type it expects (LWDocumentWritable). In my case, this instanceof check is failing, so the method returns null, which HiveCLI displays as NULL. (Code pointer: https://github.com/lucidworks/hive-solr/blob/master/solr-hive-core/src/main/java/com/lucidworks/hadoop/hive/LWSerDe.java#L55)

Curious about what other Writable I could have been receiving, I logged the class details. The class name matches the class I'm expecting (and checking for with instanceof). Some more logging showed that the class definitions were identical, but that the classes came from different UDFClassLoaders and were thus being treated as different classes!

I thought (partly from the UDFClassLoader itself) that each Hive session had access to one (and only one) UDFClassLoader. But whatever passes the Writable to my SerDe's deserialize() passes an object whose class was loaded by a distinct UDFClassLoader, which my SerDe then can't recognize. (I drew this conclusion from some logging shared here: https://pastebin.com/TwV0HPBA)

Is it a bug that my SerDe receives input from a different class loader? Or am I misunderstanding the lifecycle/purpose of UDFClassLoader instances? Is there a more robust way to cast Writable instances in a custom SerDe implementation?

Thanks in advance for any clarification you can give.

Best,
Jason

On Mon, Sep 10, 2018 at 10:37 PM Gopal Vijayaraghavan <gop...@apache.org> wrote:
>
> > query the external table using HiveCLI (e.g. SELECT * FROM
> > my_external_table), HiveCLI prints out a table with the correct
>
> If the error is always on a "select *", then the issue might be the SerDe's
> handling of included columns.
>
> Check what you get for
>
> colNames =
> Arrays.asList(tblProperties.getProperty(serdeConstants.LIST_COLUMNS).split(","));
>
> Or to confirm it, try doing "Select col from table" instead of "*".
>
> Cheers,
> Gopal
>
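
P.S. In case it helps anyone reproduce the class loader behaviour outside Hive, here is a minimal standalone sketch. It is not Hive code, and everything in it is an assumption for illustration: Doc is a hypothetical stand-in for LWDocumentWritable, two plain URLClassLoaders stand in for the two UDFClassLoader instances, and the class is compiled at run time so the file is self-contained (this needs a JDK, not just a JRE).

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class LoaderIdentityDemo {

    /** Compiles a trivial class named Doc (hypothetical stand-in) into a fresh temp directory. */
    public static Path compileDoc() throws IOException {
        Path dir = Files.createTempDirectory("loader-demo");
        Path src = dir.resolve("Doc.java");
        Files.write(src, "public class Doc {}".getBytes(StandardCharsets.UTF_8));
        // getSystemJavaCompiler() returns null when no compiler is available (e.g. on a plain JRE).
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null
                || javac.run(null, null, null, "-d", dir.toString(), src.toString()) != 0) {
            throw new IllegalStateException("could not compile Doc.java");
        }
        return dir;
    }

    /**
     * Loads Doc through two independent loaders over the same directory.
     * Returns {namesMatch, sameClassObject, instanceCheckAcrossLoaders}.
     */
    public static boolean[] compareAcrossLoaders() throws Exception {
        URL[] classpath = { compileDoc().toUri().toURL() };
        // A null parent means neither loader delegates to a loader that already knows Doc.
        try (URLClassLoader a = new URLClassLoader(classpath, null);
             URLClassLoader b = new URLClassLoader(classpath, null)) {
            Class<?> docA = a.loadClass("Doc");
            Class<?> docB = b.loadClass("Doc");
            Object fromB = docB.getDeclaredConstructor().newInstance();
            return new boolean[] {
                docA.getName().equals(docB.getName()), // same fully qualified name
                docA == docB,                          // but two distinct Class objects
                docA.isInstance(fromB)                 // reflective equivalent of instanceof
            };
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = compareAcrossLoaders();
        System.out.println("names match:        " + r[0]);
        System.out.println("same Class object:  " + r[1]);
        System.out.println("instance check:     " + r[2]);
    }
}
```

The names match, but the Class objects differ and the instance check fails, because the JVM identifies a class by its name together with its defining class loader. That matches the symptom in my SerDe, although this sketch says nothing about why Hive ends up with two loaders in the first place.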