[
https://issues.apache.org/jira/browse/NIFI-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Burgess updated NIFI-3031:
-------------------------------
Description:
Trying to use the PutHiveQL processor to execute a HiveQL script that contains
multiple statements.
E.g.:
USE my_database;
FROM my_database_src.base_table
INSERT OVERWRITE refined_table
SELECT *;
-- or --
use my_database;
create temporary table WORKING as
select a,b,c from RAW;
FROM RAW
INSERT OVERWRITE refined_table
SELECT *;
The current implementation fails even when a single statement ends with a
semicolon. Either use a default delimiter such as the semicolon to mark
statement boundaries within the file, or allow users to define their own.
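A minimal sketch of such a splitter, assuming semicolons inside quoted strings and "--" comments must not act as boundaries (the class and method names are illustrative, not NiFi APIs):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a HiveQL script into statements on semicolons,
// ignoring semicolons inside single/double-quoted strings and "--" comments.
public class ScriptSplitter {
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inSingle = false, inDouble = false, inComment = false;
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (inComment) {
                if (c == '\n') inComment = false;  // comment runs to end of line
                current.append(c);
                continue;
            }
            if (!inSingle && !inDouble && c == '-'
                    && i + 1 < script.length() && script.charAt(i + 1) == '-') {
                inComment = true;                  // start of a "--" comment
                current.append(c);
                continue;
            }
            if (c == '\'' && !inDouble) inSingle = !inSingle;
            else if (c == '"' && !inSingle) inDouble = !inDouble;
            if (c == ';' && !inSingle && !inDouble) {
                String stmt = current.toString().trim();
                if (!stmt.isEmpty()) statements.add(stmt);  // boundary reached
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String last = current.toString().trim();   // trailing statement without ';'
        if (!last.isEmpty()) statements.add(last);
        return statements;
    }
}
```

A user-defined delimiter would only change the single character tested at the boundary check.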
This enables building testable pipelines by sourcing HiveQL from files rather
than embedding it in a product, and those scripts can be complex. Each
statement should run sequentially and within the same JDBC session to ensure
things like "temporary" tables will work.
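The same-session requirement can be sketched as follows: all statements are executed in order on a single Connection, so session-scoped objects such as temporary tables remain visible to later statements (the class name and surrounding wiring are hypothetical, not the PutHiveQL implementation):

```java
import java.sql.Connection;
import java.sql.Statement;
import java.util.List;

// Hypothetical sketch: run every statement linearly on one Connection so that
// session state (e.g. a CREATE TEMPORARY TABLE) survives across statements.
public class SameSessionRunner {
    public static void runAll(Connection conn, List<String> statements) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            for (String hql : statements) {
                // each statement sees the session state left by the previous ones
                stmt.execute(hql);
            }
        }
    }
}
```

Opening a fresh connection per statement would instead drop temporary tables between statements, which is exactly the failure mode described above.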
Also, since SelectHiveQL offers CSV as an output format, an improvement would
be to include properties (defaulting to the existing behavior) such as "Include
Header in Output", "Alternate CSV Header", "CSV Delimiter", "Quote CSV", and
"Escape CSV".
> HiveQL processor improvements (Multi-Statement Scripts in PutHiveQL, CSV
> options in SelectHiveQL)
> -------------------------------------------------------------------------------------------------
>
> Key: NIFI-3031
> URL: https://issues.apache.org/jira/browse/NIFI-3031
> Project: Apache NiFi
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Fix For: 1.2.0
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)