[ 
https://issues.apache.org/jira/browse/NIFI-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819970#comment-15819970
 ] 

ASF GitHub Bot commented on NIFI-2881:
--------------------------------------

Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1407#discussion_r95716780
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java ---
    @@ -115,20 +128,36 @@ public GenerateTableFetch() {
     
         @OnScheduled
         public void setup(final ProcessContext context) {
    +        // The processor is invalid if there is an incoming connection and max-value columns are defined
    +        if (context.getProperty(MAX_VALUE_COLUMN_NAMES).isSet() && context.hasIncomingConnection()) {
    +            throw new ProcessException("If an incoming connection is supplied, no max-value column names may be specified");
    --- End diff --
    
    I understand the concerns.
    
    For backward compatibility, I think we should provide a migration path so
    that existing flows can keep fetching rows based on the stored state even
    after an upgrade. I did something similar before with the [TailFile
    processor](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/TailFile.java#L348):
    check the state key name format to determine whether it is the current or
    the older format, then migrate the values.
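    
    A minimal sketch of that kind of key-format check, for illustration only.
    It is not the TailFile code. The class and method names are hypothetical,
    and it assumes that legacy keys are bare column names and that new keys
    use a `TABLE.COLUMN` form, so a column name containing a dot would be
    misread as already migrated:
    
    ```java
    import java.util.HashMap;
    import java.util.Map;
    
    // Hypothetical helper, not part of NiFi: upgrades legacy state keys
    // (column name only) to compound "TABLE.COLUMN" keys.
    class StateKeyMigration {
        static final String SEPARATOR = ".";
    
        // Assumption: a key without the separator is in the older format.
        static boolean isLegacyKey(String key) {
            return !key.contains(SEPARATOR);
        }
    
        // Returns a new map; keys already in the compound format are copied as-is.
        static Map<String, String> migrate(Map<String, String> state, String tableName) {
            Map<String, String> migrated = new HashMap<>();
            for (Map.Entry<String, String> entry : state.entrySet()) {
                String key = entry.getKey();
                String newKey = isLegacyKey(key) ? tableName + SEPARATOR + key : key;
                migrated.put(newKey, entry.getValue());
            }
            return migrated;
        }
    }
    ```
    
    The table name passed to `migrate` would be the one hard-coded in the
    processor's properties before the upgrade.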
    
    I am concerned about this statement in your previous comment, if we
    support max-value columns with incoming flow files:
    > and just document that all the specified tables must contain the max-value columns.
    
    The max-value column has not been required so far. I guess that is for
    users who want to fetch all rows periodically and do not need to track the
    max value, for example for things like master configuration tables. So I
    think we need to keep supporting an empty max-value column.
    
    An example flow that I thought might be useful is to use GenerateFlowFile
    or FetchFile to pass in a configuration text such as:
    
    ```
    # Table name : MAX value column(s)
    USERS:LAST_UPDATED
    ITEMS
    PURCHASE_HISTORIES:LAST_UPDATED
    ```
    
    Then pass it to SplitText and ExtractText to generate flow files with the
    attributes `tableName` and `maxColumns`, and pass those to the
    GenerateTableFetch processor to generate the fetch SQL dynamically. This
    way, the user can easily modify which tables to fetch.
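    
    To make the intended result of the SplitText and ExtractText steps
    concrete, here is a rough plain-Java sketch of the same parsing. It does
    not use the NiFi API. The class names are hypothetical, and it assumes
    that lines starting with `#` are comments and that everything after the
    first `:` is the max-value column list:
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    
    // Hypothetical parser, not part of NiFi: turns the configuration text into
    // (tableName, maxColumns) pairs, one per non-comment line.
    class TableConfigParser {
        static final class TableConfig {
            final String tableName;
            final String maxColumns; // empty string when no max-value column is given
    
            TableConfig(String tableName, String maxColumns) {
                this.tableName = tableName;
                this.maxColumns = maxColumns;
            }
        }
    
        static List<TableConfig> parse(String text) {
            List<TableConfig> result = new ArrayList<>();
            for (String raw : text.split("\\R")) {
                String line = raw.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                int idx = line.indexOf(':');
                String table = idx < 0 ? line : line.substring(0, idx).trim();
                String columns = idx < 0 ? "" : line.substring(idx + 1).trim();
                result.add(new TableConfig(table, columns));
            }
            return result;
        }
    }
    ```
    
    For the text above, this yields three entries, with `ITEMS` carrying an
    empty `maxColumns` value.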
    
    After processing these incoming flow files, GenerateTableFetch might have
    state like this (table `ITEMS` has no max-value column, so it has no entry):
    
    |KEY|VALUE|
    |----|-------|
    |USERS.LAST_UPDATED|2017.01.12 11:42:00|
    |PURCHASE_HISTORIES.LAST_UPDATED|2017.01.12 11:59:32|
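    
    As a rough illustration of how such state could be consulted per table,
    here is a hypothetical sketch. It is not the processor's actual SQL
    generation. The class and method names are made up, the `TABLE.COLUMN`
    key format is the one from the table above, and the stored value is
    treated as a plain string literal:
    
    ```java
    import java.util.Map;
    
    // Hypothetical sketch, not NiFi code: builds a fetch query for one table,
    // adding a max-value condition only when a column and a stored value exist.
    class FetchQuerySketch {
        static String buildQuery(String tableName, String maxColumn, Map<String, String> state) {
            String base = "SELECT * FROM " + tableName;
            if (maxColumn == null || maxColumn.isEmpty()) {
                return base; // no max-value column: fetch all rows every time
            }
            String lastMax = state.get(tableName + "." + maxColumn);
            if (lastMax == null) {
                return base; // first run for this table: nothing stored yet
            }
            // Escape single quotes so the stored value stays inside the literal.
            return base + " WHERE " + maxColumn + " > '" + lastMax.replace("'", "''") + "'";
        }
    }
    ```
    
    With the state above, `USERS` would get a `WHERE LAST_UPDATED > ...`
    condition while `ITEMS` would be fetched in full.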
    
    What do you think? Thanks!


> Allow Database Fetch processor(s) to accept incoming flow files and use 
> Expression Language
> -------------------------------------------------------------------------------------------
>
>                 Key: NIFI-2881
>                 URL: https://issues.apache.org/jira/browse/NIFI-2881
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>
> The QueryDatabaseTable and GenerateTableFetch processors do not allow 
> Expression Language to be used in the properties, mainly because they also do 
> not allow incoming connections. This means if the user desires to fetch from 
> multiple tables, they currently need one instance of the processor for each 
> table, and those table names must be hard-coded.
> To support the same capabilities for multiple tables and more flexible 
> configuration via Expression Language, these processors should have 
> properties that accept Expression Language, and GenerateTableFetch should 
> accept (optional) incoming connections.
> Conversation about the behavior of the processors is welcomed and encouraged. 
> For example, if an incoming flow file is available, do we also still run the 
> incremental fetch logic for tables that aren't specified by this flow file, 
> or do we just do incremental fetching when the processor is scheduled but 
> there is no incoming flow file? The latter implies a denial-of-service could 
> take place, by flooding the processor with flow files and not letting it do 
> its original job of querying the table, keeping track of maximum values, etc.
> This is likely a breaking change to the processors because of how state 
> management is implemented. Currently, since the table name is hard coded, only 
> the column name comprises the key in the state. This would have to be 
> extended to have a compound key that represents table name, max-value column 
> name, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
