Matt Burgess created NIFI-4109:
----------------------------------
Summary: Implement an InferRecordSchema processor
Key: NIFI-4109
URL: https://issues.apache.org/jira/browse/NIFI-4109
Project: Apache NiFi
Issue Type: New Feature
Components: Extensions
Reporter: Matt Burgess
Currently a record schema (for use in record-aware processors) must be provided
by an attribute, by a Schema Registry, or embedded in the flow file, and thus
must be determined ahead of time. For formats that do not carry a schema (e.g.,
CSV, JSON) and for flows whose files' schemas vary or are otherwise not known a
priori, it would be helpful to have a processor that can infer the schema from
the content. It could have any or all of the following features:
- Record-awareness: The existing InferAvroSchema processor can be used for CSV
and JSON with non-record-aware processors/flows, although it does not currently
support Avro logical types such as timestamp (see NIFI-3000). The benefit of
record-awareness is that better inference can be made by inspecting each record
in a flow file.
- Type inference: Should include the primitive types (numeric, string) as well
as the logical types supported by Avro schemas (date, time, timestamp, etc.).
- Generate schema in an attribute: "avro.schema" is recommended as the output
attribute, since this is the default for most RecordWriters.
- Publish Schema to Registry: This is an advanced feature that could be split
out into its own Jira due to scope concerns.
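As a rough illustration of the record-aware inference and "avro.schema" output described above, the Java sketch below scans every record in a batch and guesses the narrowest type for each value. It then widens types across records (int to long to double, with any other conflict falling back to string) and renders an Avro schema JSON string. The class and method names (InferSchemaSketch, inferFields, toAvroSchema, row) are hypothetical and are not part of NiFi's API. The sketch also assumes that records arrive as maps of field name to text value (as from a CSV row) and that only ISO-8601 dates and timestamps need to be recognized.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch of record-aware schema inference; not NiFi's actual API.
public class InferSchemaSketch {

    // Candidate types; INT, LONG, DOUBLE are declared in widening order.
    enum Kind { BOOLEAN, INT, LONG, DOUBLE, DATE, TIMESTAMP, STRING }

    // Per-field state accumulated while scanning records.
    static final class FieldInfo {
        Kind kind;        // null until a non-empty value is seen
        boolean nullable; // true if any record lacks the field or has it empty
    }

    // Guess the narrowest type for one text value (e.g. a CSV cell).
    // Note: Double.parseDouble also accepts forms like "NaN" or "1d"; a real
    // implementation would likely be stricter. Dates/timestamps: ISO-8601 only.
    static Kind inferValue(String v) {
        if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) return Kind.BOOLEAN;
        try { Integer.parseInt(v); return Kind.INT; } catch (NumberFormatException ignored) { }
        try { Long.parseLong(v); return Kind.LONG; } catch (NumberFormatException ignored) { }
        try { Double.parseDouble(v); return Kind.DOUBLE; } catch (NumberFormatException ignored) { }
        try { LocalDate.parse(v); return Kind.DATE; } catch (DateTimeParseException ignored) { }
        try { LocalDateTime.parse(v); return Kind.TIMESTAMP; } catch (DateTimeParseException ignored) { }
        return Kind.STRING;
    }

    static boolean isNumeric(Kind k) {
        return k == Kind.INT || k == Kind.LONG || k == Kind.DOUBLE;
    }

    // Merge two observations of one field: numbers widen, other conflicts become STRING.
    static Kind widen(Kind a, Kind b) {
        if (a == null) return b;
        if (a == b) return a;
        if (isNumeric(a) && isNumeric(b)) return a.ordinal() > b.ordinal() ? a : b;
        return Kind.STRING;
    }

    // Inspect every record (not just the first) so later records can widen earlier guesses.
    static Map<String, FieldInfo> inferFields(List<Map<String, String>> records) {
        Set<String> names = new LinkedHashSet<>();
        for (Map<String, String> r : records) names.addAll(r.keySet());
        Map<String, FieldInfo> fields = new LinkedHashMap<>();
        for (String name : names) {
            FieldInfo info = new FieldInfo();
            for (Map<String, String> r : records) {
                String v = r.get(name);
                if (v == null || v.isEmpty()) info.nullable = true;
                else info.kind = widen(info.kind, inferValue(v));
            }
            fields.put(name, info);
        }
        return fields;
    }

    // Map a Kind to an Avro type; date and timestamp use Avro logical types.
    static String avroType(Kind k) {
        switch (k) {
            case BOOLEAN: return "\"boolean\"";
            case INT: return "\"int\"";
            case LONG: return "\"long\"";
            case DOUBLE: return "\"double\"";
            case DATE: return "{\"type\":\"int\",\"logicalType\":\"date\"}";
            case TIMESTAMP: return "{\"type\":\"long\",\"logicalType\":\"timestamp-millis\"}";
            default: return "\"string\"";
        }
    }

    // Render an Avro record schema as JSON text, e.g. for an "avro.schema" attribute.
    // Assumes field names are already valid Avro names (no escaping or sanitizing).
    static String toAvroSchema(String recordName, Map<String, FieldInfo> fields) {
        StringJoiner fieldJson = new StringJoiner(",");
        for (Map.Entry<String, FieldInfo> e : fields.entrySet()) {
            FieldInfo info = e.getValue();
            String type = avroType(info.kind == null ? Kind.STRING : info.kind);
            fieldJson.add(info.nullable
                    ? "{\"name\":\"" + e.getKey() + "\",\"type\":[\"null\"," + type + "],\"default\":null}"
                    : "{\"name\":\"" + e.getKey() + "\",\"type\":" + type + "}");
        }
        return "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"fields\":[" + fieldJson + "]}";
    }

    // Build an insertion-ordered record from alternating name/value pairs.
    static Map<String, String> row(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = new ArrayList<>();
        records.add(row("id", "1", "price", "10", "day", "2017-06-21", "note", "first"));
        records.add(row("id", "2", "price", "12.5", "day", "2017-06-22", "note", ""));
        System.out.println(toAvroSchema("inferred", inferFields(records)));
    }
}
```

In this example, "price" is widened from int to double only because the second record is inspected, and the empty "note" makes that field a nullable union. Falling back to string on any non-numeric conflict trades type precision for robustness. A real processor would likely also cap how many records it samples per flow file.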