[
https://issues.apache.org/jira/browse/NIFI-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947923#comment-17947923
]
Daniel Stieglitz edited comment on NIFI-14162 at 4/28/25 7:51 PM:
------------------------------------------------------------------
[~exceptionfactory] Taking a closer look at this issue, I noticed that this is
not a Calcite issue; I believe it is our issue. When using the "Infer
Schema" schema access strategy, I noticed that the inferred schema differs
between the case with one record and the case with two.
For the single record the schema (when calling toString on the RecordSchema
object) is
{code:java}
["ArticleCode" : "STRING", "ProductCode" : "STRING", "ArticleName" : "STRING",
"ProductName" : "STRING", "Country" : "STRING"]{code}
For the two records the schema is
{code:java}
["ArticleCode" : "STRING", "ArticleName" : "STRING", "ProductCode" : "STRING",
"ProductName" : "STRING", "Country" : "STRING"]{code}
Note how the field ordering differs between the two schemas, i.e. the second
column is ProductCode in one and ArticleName in the other. That difference in
ordering accounts for the difference in the values being retrieved. In the class
RecordDataSource, on lines 79 and 89, the row returned from the query is the
record retrieved from the JsonTreeReader, with its values in the exact order in
which the fields are defined in the schema. These column values are not mapped
to the query columns. Hence, in the single-record case, incorrect values are
retrieved because the query columns are not aligned with the schema columns,
while in the two-record case all the values are correct because the query
columns are aligned with the schema.
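To make the mechanism concrete, here is a minimal sketch of the behaviour described above. It is not the RecordDataSource code; the class and method names (PositionalRowSketch, rowInSchemaOrder, labelWithQueryColumns) are hypothetical and only model a row that is emitted in schema order and then labelled with the query's column names by position.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the bug, not NiFi code.
public class PositionalRowSketch {

    // Returns the record's values in schema order, ignoring the query's column list.
    public static List<Object> rowInSchemaOrder(List<String> schemaFields, Map<String, Object> record) {
        List<Object> row = new ArrayList<>();
        for (String field : schemaFields) {
            row.add(record.get(field));
        }
        return row;
    }

    // Pairs the i-th query column name with the i-th value of the row.
    public static Map<String, Object> labelWithQueryColumns(List<String> queryColumns, List<Object> row) {
        Map<String, Object> labelled = new LinkedHashMap<>();
        for (int i = 0; i < queryColumns.size(); i++) {
            labelled.put(queryColumns.get(i), row.get(i));
        }
        return labelled;
    }

    public static void main(String[] args) {
        // Schema inferred from the single record in the issue description.
        List<String> schema = List.of("ArticleCode", "ProductCode", "ArticleName", "ProductName", "Country");
        // Column order used in the reporter's query.
        List<String> query = List.of("ArticleCode", "ArticleName", "ProductCode", "ProductName", "Country");

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("ArticleCode", "12345");
        record.put("ProductCode", "10101");
        record.put("ArticleName", "Credit Card");
        record.put("ProductName", "Porduct Credit");
        record.put("Country", "RO");

        // ArticleName receives "10101" and ProductCode receives "Credit Card",
        // which matches the incorrect output in the issue description.
        System.out.println(labelWithQueryColumns(query, rowInSchemaOrder(schema, record)));
    }
}
```

With the two-record schema the field order happens to equal the query order, so the same positional pairing produces correct values, which is consistent with what the reporter observed.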
Another way of highlighting the issue, which I find more troublesome, is a
query whose columns are in the reverse order of the schema, for example
{code:java}
SELECT Country, ProductName, ProductCode, ArticleName, ArticleCode FROM
FLOWFILE {code}
The query results from the same two ingested records above are then
{code:java}
[ {
"Country" : "12345",
"ProductName" : "Credit Card",
"ProductCode" : "10101",
"ArticleName" : "Porduct Credit",
"ArticleCode" : "RO"
}, {
"Country" : "12346",
"ProductName" : "Business Card",
"ProductCode" : "10102",
"ArticleName" : "Society Credit",
"ArticleCode" : "RO"
} ]{code}
Hence, from what I see here, any query that selects all the columns in an
order that is not exactly the same as the defined schema will return incorrect
values.
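One possible direction, sketched here only as an illustration and not as a proposed patch, is to reorder the schema-ordered row by looking up each query column's position in the schema. The class and method names (QueryColumnMappingSketch, indexByName, alignToQuery) are hypothetical, and the sketch assumes that query column names match schema field names exactly, including case.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of name-based column alignment, not NiFi code.
public class QueryColumnMappingSketch {

    // Maps each schema field name to its position in the schema-ordered row.
    public static Map<String, Integer> indexByName(List<String> schemaFields) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < schemaFields.size(); i++) {
            index.put(schemaFields.get(i), i);
        }
        return index;
    }

    // Reorders a schema-ordered row so that it follows the query's column list.
    public static Object[] alignToQuery(List<String> schemaFields, List<String> queryColumns, Object[] schemaRow) {
        Map<String, Integer> index = indexByName(schemaFields);
        Object[] aligned = new Object[queryColumns.size()];
        for (int i = 0; i < queryColumns.size(); i++) {
            Integer position = index.get(queryColumns.get(i));
            if (position == null) {
                throw new IllegalArgumentException("Unknown column: " + queryColumns.get(i));
            }
            aligned[i] = schemaRow[position];
        }
        return aligned;
    }

    public static void main(String[] args) {
        // Schema inferred from the two records, and the reversed query from above.
        List<String> schema = List.of("ArticleCode", "ArticleName", "ProductCode", "ProductName", "Country");
        List<String> query = List.of("Country", "ProductName", "ProductCode", "ArticleName", "ArticleCode");
        Object[] row = {"12345", "Credit Card", "10101", "Porduct Credit", "RO"};

        // Prints [RO, Porduct Credit, 10101, Credit Card, 12345]
        System.out.println(Arrays.toString(alignToQuery(schema, query, row)));
    }
}
```

With this kind of mapping, Country would receive "RO" and ArticleCode would receive "12345" for the first record, rather than the swapped values shown in the results above.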
> QueryRecord not following column order
> --------------------------------------
>
> Key: NIFI-14162
> URL: https://issues.apache.org/jira/browse/NIFI-14162
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: alexduta
> Assignee: Daniel Stieglitz
> Priority: Minor
>
> It seems that in NiFi's QueryRecord processor, if you list the columns to
> select from the FlowFile, the output follows the column order of the query,
> but the values are taken in their order from the original FlowFile.
> As an example:
> Original Json flowfile:
> {code:java}
> [{
> "ArticleCode" : "12345",
> "ProductCode" : "10101",
> "ArticleName" : "Credit Card",
> "ProductName" : "Porduct Credit",
> "Country" : "RO"
> }] {code}
>
>
> Query:
>
> {code:java}
> select ArticleCode,ArticleName,ProductCode,ProductName,Country from
> FLOWFILE
> {code}
>
>
> Returning file:
>
> {code:java}
> [{
> "ArticleCode" : "12345",
> "ArticleName" : "10101",
> "ProductCode" : "Credit Card",
> "ProductName" : "Porduct Credit",
> "Country" : "RO"
> }] {code}
>
> So it is somehow just applying the column names without reordering
> the values.
>
> However, if one of the records in the FlowFile has its fields in the queried
> order, the other records are returned correctly as well:
> Original file:
> {code:java}
> [{
> "ArticleCode" : "12345",
> "ArticleName" : "Credit Card",
> "ProductCode" : "10101",
> "ProductName" : "Porduct Credit",
> "Country" : "RO"
> },
> {
> "ArticleCode" : "12346",
> "ProductCode" : "10102",
> "ArticleName" : "Business Card",
> "ProductName" : "Society Credit",
> "Country" : "RO"
> }] {code}
> or
> {code:java}
> [
> {
> "ArticleCode" : "12345",
> "ProductCode" : "10101",
> "ArticleName" : "Credit Card",
> "ProductName" : "Porduct Credit",
> "Country" : "RO"
> },{
> "ArticleCode" : "12346",
> "ArticleName" : "Business Card",
> "ProductCode" : "10102",
> "ProductName" : "Society Credit",
> "Country" : "RO"
> }] {code}
> Returning file:
> {code:java}
> [{
> "ArticleCode" : "12345",
> "ArticleName" : "Credit Card",
> "ProductCode" : "10101",
> "ProductName" : "Porduct Credit",
> "Country" : "RO"
> },
> {
> "ArticleCode" : "12346",
> "ArticleName" : "Business Card",
> "ProductCode" : "10102",
> "ProductName" : "Society Credit",
> "Country" : "RO"
> }] {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)