[jira] [Commented] (NIFI-4060) Create a MergeRecord Processor

ASF GitHub Bot (JIRA) Thu, 29 Jun 2017 13:16:17 -0700

    [ https://issues.apache.org/jira/browse/NIFI-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068918#comment-16068918 ]


ASF GitHub Bot commented on NIFI-4060:
--------------------------------------

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1958#discussion_r124901819
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/MergeRecord.java ---
    @@ -0,0 +1,350 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.processors.standard;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.util.ArrayList;
    +import java.util.HashSet;
    +import java.util.List;
    +import java.util.Optional;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicReference;
    +
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.behavior.ReadsAttributes;
    +import org.apache.nifi.annotation.behavior.SideEffectFree;
    +import org.apache.nifi.annotation.behavior.TriggerWhenEmpty;
    +import org.apache.nifi.annotation.behavior.WritesAttribute;
    +import org.apache.nifi.annotation.behavior.WritesAttributes;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnStopped;
    +import org.apache.nifi.avro.AvroTypeUtil;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.flowfile.attributes.FragmentAttributes;
    +import org.apache.nifi.processor.AbstractSessionFactoryProcessor;
    +import org.apache.nifi.processor.DataUnit;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.ProcessSessionFactory;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.FlowFileFilters;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.processors.standard.merge.AttributeStrategyUtil;
    +import org.apache.nifi.processors.standard.merge.RecordBinManager;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.RecordSetWriterFactory;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +
    +@SideEffectFree
    +@TriggerWhenEmpty
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"merge", "record", "content", "correlation", "stream", "event"})
    +@CapabilityDescription("This Processor merges together multiple record-oriented FlowFiles into a single FlowFile that contains all of the Records of the input FlowFiles. "
    +    + "This Processor works by creating 'bins' and then adding FlowFiles to these bins until they are full. Once a bin is full, all of the FlowFiles will be combined into "
    +    + "a single output FlowFile, and that FlowFile will be routed to the 'merged' Relationship. A bin will consist of potentially many 'like FlowFiles'. In order for two "
    +    + "FlowFiles to be considered 'like FlowFiles', they must have the same Schema (as identified by the Record Reader) and, if the <Correlation Attribute Name> property "
    +    + "is set, the same value for the specified attribute. See Processor Usage and Additional Details for more information.")
    +@ReadsAttributes({
    +    @ReadsAttribute(attribute = "fragment.identifier", description = "Applicable only if the <Merge Strategy> property is set to Defragment. "
    +        + "All FlowFiles with the same value for this attribute will be bundled together."),
    +    @ReadsAttribute(attribute = "fragment.count", description = "Applicable only if the <Merge Strategy> property is set to Defragment. This "
    +        + "attribute must be present on all FlowFiles with the same value for the fragment.identifier attribute. All FlowFiles in the same "
    +        + "bundle must have the same value for this attribute. The value of this attribute indicates how many FlowFiles should be expected "
    +        + "in the given bundle."),
    +})
    +@WritesAttributes({
    +    @WritesAttribute(attribute = "record.count", description = "The merged FlowFile will have a 'record.count' attribute indicating the number of records "
    +        + "that were written to the FlowFile."),
    +    @WritesAttribute(attribute = "mime.type", description = "The MIME Type indicated by the Record Writer"),
    --- End diff --
    
    I can't find where this processor writes the mime.type attribute. Does that
need to be added?
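
    For illustration only, here is a minimal sketch of what populating both
documented attributes on the merged output might look like. It uses a
hypothetical stand-in interface (MimeTypedWriter) and a plain Map rather than
the NiFi API, and it is not taken from the pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergedAttributesSketch {

    // Hypothetical stand-in for a record writer that knows its output MIME type.
    // This is NOT the NiFi API; it only mirrors the idea for illustration.
    public interface MimeTypedWriter {
        String getMimeType();
    }

    // Builds the two attributes listed in @WritesAttributes for a merged FlowFile.
    public static Map<String, String> mergedAttributes(MimeTypedWriter writer, int recordCount) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("record.count", String.valueOf(recordCount));
        // The attribute the review comment is asking about.
        attributes.put("mime.type", writer.getMimeType());
        return attributes;
    }

    public static void main(String[] args) {
        MimeTypedWriter jsonWriter = () -> "application/json";
        System.out.println(mergedAttributes(jsonWriter, 3));
    }
}
```

    In the real processor the attributes would presumably be applied to the
merged FlowFile through the ProcessSession; where exactly that should happen
in this PR is the open question.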


> Create a MergeRecord Processor
> ------------------------------
>
>                 Key: NIFI-4060
>                 URL: https://issues.apache.org/jira/browse/NIFI-4060
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 1.4.0
>
>
> When record-oriented data is received one record at a time or needs to be 
> split into small chunks for one reason or another, it will be helpful to be 
> able to combine those records into a single FlowFile that is made up of many 
> records for efficiency purposes, or to deliver to downstream systems as 
> larger batches. This processor should function similarly to MergeContent but 
> make use of Record Readers and Record Writers so that users don't have to deal 
> with headers, footers, demarcators, etc.
> The Processor will also need to ensure that records only get merged into the 
> same FlowFile if they have compatible schemas.
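
The schema requirement in the description above can be sketched roughly as
follows. This is a simplified illustration, not the processor's
implementation: the Item class, the bin method, and the string bin key are
hypothetical, and treating "compatible" as exact equality of a schema
identifier is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BinningSketch {

    // Hypothetical stand-in for a FlowFile: a schema identifier, an optional
    // correlation value, and a name. None of this is NiFi API.
    public static final class Item {
        final String schemaId;
        final String correlation;
        final String name;

        public Item(String schemaId, String correlation, String name) {
            this.schemaId = schemaId;
            this.correlation = correlation;
            this.name = name;
        }
    }

    // Groups items into bins keyed by schema and correlation value, so that
    // only "like" items share a bin. The "|" separator is a simplification
    // and assumes it does not occur in the schema identifier.
    public static Map<String, List<String>> bin(List<Item> items) {
        Map<String, List<String>> bins = new LinkedHashMap<>();
        for (Item item : items) {
            String key = item.schemaId + "|" + (item.correlation == null ? "" : item.correlation);
            bins.computeIfAbsent(key, k -> new ArrayList<>()).add(item.name);
        }
        return bins;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("schema-a", "x", "f1"),
                new Item("schema-b", "x", "f2"),
                new Item("schema-a", "x", "f3"));
        // f1 and f3 share a bin; f2 has a different schema and gets its own.
        System.out.println(bin(items));
    }
}
```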



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
