Hi Adam,

More or less, it is a 'merge', PutTCP, ListenTCP and 'unpack'. I hope I am not
wrong, but the NiFi ListenTCP processor uses a delimiter (\n by default?). If
you are transferring binary data, the processor splits the flow into 'pieces'.
And the attributes are not transferred to the destination.
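The following small Python sketch (not NiFi code) shows why a newline delimiter is a problem for binary content. It assumes, as in the question above, that \n is the delimiter and that the delimiter bytes are not part of any message. The 8-byte PNG file signature alone contains two newline bytes.

```python
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def split_on_delimiter(stream: bytes, delimiter: bytes = b"\n") -> List[bytes]:
    """Mimic a delimiter-based listener (assumption: every delimiter ends one
    message and the delimiter itself is dropped from the message)."""
    return stream.split(delimiter)


pieces = split_on_delimiter(PNG_SIGNATURE)
print(pieces)  # [b'\x89PNG\r', b'\x1a', b'']
```

One binary "file" turns into three messages, and the newline bytes themselves are gone. This is why a length-prefixed framing suits binary payloads better than a delimiter.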

But your idea describes essentially what the processor does.

1. It converts the attributes to a JSON string.
2. It transfers the JSON string and the payload (a header tells the
destination the length of the JSON block and the length of the payload).
3. The listener receives the flow and decodes the header (to get the sizes of
the JSON block and of the payload).
4. It writes the payload to a flowfile.
5. It parses the JSON string and sets the attributes on the flowfile.
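A minimal Python sketch of the five steps above, assuming a hypothetical header of two unsigned 32-bit big-endian lengths (JSON block, payload). The real header layout in the diode repository may well differ; encode_flowfile and decode_flowfile are illustrative names, not the processors' API.

```python
import json
import struct
from typing import Dict, Tuple

# Hypothetical header layout (assumption for this sketch, not taken from the
# diode repository): two unsigned 32-bit big-endian integers holding the length
# of the JSON attribute block and the length of the payload.
HEADER = struct.Struct(">II")


def encode_flowfile(attributes: Dict[str, str], payload: bytes) -> bytes:
    """Sender side (steps 1-2): attributes -> JSON, then header + JSON + payload."""
    attr_bytes = json.dumps(attributes).encode("utf-8")
    return HEADER.pack(len(attr_bytes), len(payload)) + attr_bytes + payload


def decode_flowfile(frame: bytes) -> Tuple[Dict[str, str], bytes]:
    """Listener side (steps 3-5): read both lengths, then slice JSON and payload."""
    attr_len, payload_len = HEADER.unpack_from(frame, 0)
    attr_start = HEADER.size
    payload_start = attr_start + attr_len
    payload_end = payload_start + payload_len
    if len(frame) < payload_end:
        raise ValueError("frame is shorter than its header announces")
    attributes = json.loads(frame[attr_start:payload_start].decode("utf-8"))
    return attributes, frame[payload_start:payload_end]


# Usage: a newline inside the payload is harmless because lengths are explicit.
frame = encode_flowfile({"filename": "report.bin"}, b"\x00\x01\n\x02")
attributes, payload = decode_flowfile(frame)
print(attributes, payload)
```

On a real TCP connection the listener would read exactly HEADER.size bytes first and then the announced number of bytes for each part, rather than working on one complete frame in memory.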

If you do not want to transfer attributes, you can configure a different
decoder. In this case you can simply netcat a binary file to NiFi.
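A Python sketch of this attribute-less mode, assuming the raw decoder treats every byte up to the end of the TCP connection as payload. send_raw and receive_raw are hypothetical names; send_raw does roughly what `nc host port < file` does, and a local socket pair stands in for the network.

```python
import os
import socket
import tempfile

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


def send_raw(path: str, sock: socket.socket) -> None:
    """Stream a file's bytes, then close the sending side so the receiver
    sees end-of-stream (roughly what netcat does with a redirected file)."""
    with open(path, "rb") as src:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            sock.sendall(chunk)
    sock.shutdown(socket.SHUT_WR)


def receive_raw(sock: socket.socket) -> bytes:
    """Hypothetical attribute-less decoder: everything until EOF is payload."""
    parts = []
    while True:
        chunk = sock.recv(CHUNK_SIZE)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


# Usage: send 1 KiB of binary data through a connected socket pair.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4)
sender, listener = socket.socketpair()
send_raw(tmp.name, sender)
received = receive_raw(listener)
sender.close()
listener.close()
os.remove(tmp.name)
print(len(received))
```

Because no header is involved, the end of the connection is the only "frame boundary", so one connection carries exactly one flowfile.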

The UDP version is far more complex. There must be a counter that tells the
destination which part of the flowfile was received (even in a diode
environment, packets are not always received in the right order!). And you
must be fast, very fast. It is a multithreaded architecture because one thread
cannot receive, decode, and write a gigabit per second. I used the Disruptor
library: one thread receives a packet, another thread decodes it, and a third
thread takes the decoded packet and writes its content to a flowfile in the
right order.
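A rough Python sketch of the three-stage idea, under several assumptions. The 8-byte sequence-number header is hypothetical. Python's queue.Queue is swapped in for the Disruptor's ring buffer, so this is not the Disruptor API. A list of datagrams stands in for the UDP socket. Python threads will not reach gigabit rates; the sketch only shows how a sequence counter lets the last stage restore the original order.

```python
import queue
import struct
import threading
from typing import Iterable

# Hypothetical packet layout (assumption for this sketch): an unsigned 64-bit
# big-endian sequence number followed by a chunk of the flowfile content.
SEQ = struct.Struct(">Q")
DONE = None  # end-of-input marker passed down the pipeline


def make_packet(seq: int, chunk: bytes) -> bytes:
    return SEQ.pack(seq) + chunk


def receiver(datagrams: Iterable[bytes], out_q: queue.Queue) -> None:
    """Stage 1: only hand datagrams on. A real receiver would loop on
    sock.recvfrom(); an iterable stands in for the UDP socket here."""
    for datagram in datagrams:
        out_q.put(datagram)
    out_q.put(DONE)


def decoder(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Stage 2: split each datagram into (sequence number, chunk)."""
    while (datagram := in_q.get()) is not DONE:
        if len(datagram) < SEQ.size:  # malformed datagram, ignore it
            continue
        (seq,) = SEQ.unpack_from(datagram, 0)
        out_q.put((seq, datagram[SEQ.size:]))
    out_q.put(DONE)


def writer(in_q: queue.Queue, sink: bytearray, state: dict) -> None:
    """Stage 3: buffer out-of-order chunks and append them strictly in order."""
    pending = {}
    next_seq = 0
    while (item := in_q.get()) is not DONE:
        seq, chunk = item
        if seq >= next_seq:  # drop duplicates of chunks already written
            pending[seq] = chunk
        while next_seq in pending:  # flush every chunk that is now in order
            sink.extend(pending.pop(next_seq))
            next_seq += 1
    state["written"] = next_seq


def run_pipeline(datagrams: Iterable[bytes], expected_packets: int) -> bytes:
    decode_q: queue.Queue = queue.Queue(maxsize=1024)
    write_q: queue.Queue = queue.Queue(maxsize=1024)
    sink, state = bytearray(), {}
    threads = [
        threading.Thread(target=receiver, args=(datagrams, decode_q)),
        threading.Thread(target=decoder, args=(decode_q, write_q)),
        threading.Thread(target=writer, args=(write_q, sink, state)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # UDP through a diode cannot ask for retransmission, so a gap is fatal.
    if state.get("written") != expected_packets:
        raise RuntimeError(
            f"{state.get('written')} of {expected_packets} packets in sequence")
    return bytes(sink)


chunks = [b"alpha-", b"beta-", b"gamma"]
packets = [make_packet(i, c) for i, c in enumerate(chunks)]
shuffled = [packets[2], packets[0], packets[1]]  # out-of-order arrival
print(run_pipeline(shuffled, expected_packets=3))  # b'alpha-beta-gamma'
```

The bounded queues give back-pressure between the stages; in Java the Disruptor fills a similar role using a preallocated ring buffer.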

I am still learning (and I am not a professional software developer). If I did
something wrong or overlooked something, please tell me.

Marc 

> On 02.08.2021 at 22:01, Adam Taft <[email protected]> wrote:
> 
> Marc,
> 
> How would this differ from a more generic use of the existing processors,
> PutTCP/ListenTCP and PutUDP/ListenUDP?  I'm not sure what value is being
> added above these existing processors, but I'm sure I'm missing something.
> 
> There's already an ability to serialize flowfiles via MergeContent. And
> there's the deserialize side in UnpackContent. So a dataflow that looks
> like the following would seem a reasonable approach to the problem:
> 
> MergeContent -> PutTCP -> {diode} -> ListenTCP -> UnpackContent
> 
> I'm actually very interested in this topic, having a project that has a use
> case for a "diode". So I'm legitimately asking here, not trying to derail
> your work.
> 
> Thanks in advance,
> 
> Adam
> 
> On Sun, Aug 1, 2021 at 12:26 PM Marc <[email protected]> wrote:
> 
>> Greetings,
>> 
>> there are companies and organizations that strictly separate their
>> networks for security reasons. Such companies often use diodes to achieve
>> this. But of course they still have to exchange data between the networks
>> (e.g. to transfer data from 'low' to 'high'). There are at least two kinds of
>> diodes. Some hardware-based ones use only a single optical fiber to send data
>> (UDP-based). Others use TCP but prevent sending in the reverse direction.
>> 
>> NiFi is an amazing tool that allows data to be transferred between two
>> separate networks in a very flexible but also secure way. I have
>> implemented two processors. The first one 'merges' the attributes and the
>> content of a flowfile and sends it to the destination. The second one
>> listens on a TCP port, splits attributes and content, and creates a new
>> flowfile containing all attributes of the original flowfile. You can send
>> the flowfile without attributes as well. In this case you can easily netcat
>> a binary file to NiFi.
>> 
>> These two processors are useful if you do NOT have bidirectional
>> communication between two NiFi instances and therefore the Site-to-Site
>> mechanism or HTTP(S) cannot be used.
>> 
>> We have been using these processors for quite some time (specifically,
>> the version for NiFi 1.13.2) and would like to share them with others. So
>> the question to you all is: Is anyone interested in these processors, or
>> is this use case too specialized?
>> 
>> The current source code can be found on GitHub:
>> https://github.com/nerdfunk-net/diode/
>> 
>> I have also implemented a UDP-based version of the processors. Due to the
>> nature of UDP, it is more complex, and these processors are currently being
>> tested.
>> 
>> Best regards
>> Marc
