Hi Marc,

Thanks for the additional info. Just so you know you're not the only one: I've also had to re-implement a ListenTCP alternative to get around the byte-delimiter issue for binary and multiline text data.
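A minimal Python sketch of the byte-delimiter problem. It is illustrative only and is not NiFi or ListenTCP code. Splitting a stream on \n breaks up a binary payload that happens to contain 0x0A bytes, while a length prefix lets the receiver recover it intact. The 4-byte big-endian length prefix and all function names here are assumptions made for this example.

```python
from __future__ import annotations

import struct


def split_on_delimiter(stream: bytes, delimiter: bytes = b"\n") -> list[bytes]:
    """Naive delimiter framing, similar in spirit to a line-based listener."""
    return [part for part in stream.split(delimiter) if part]


def frame_with_length(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian unsigned int."""
    return struct.pack(">I", len(payload)) + payload


def read_length_framed(stream: bytes) -> list[bytes]:
    """Decode a stream of length-prefixed messages back into payloads."""
    messages = []
    offset = 0
    while offset < len(stream):
        if offset + 4 > len(stream):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        if offset + length > len(stream):
            raise ValueError("truncated payload")
        messages.append(stream[offset:offset + length])
        offset += length
    return messages


# A binary payload that happens to contain newline bytes (0x0A).
binary = bytes([0x01, 0x0A, 0x02, 0x0A, 0x03])

print(split_on_delimiter(binary + b"\n"))  # one message became three pieces
print(read_length_framed(frame_with_length(binary)) == [binary])  # True
```

The length prefix costs a few bytes per message but makes the framing independent of the payload's content, which is why it suits binary and multiline data.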

Phil

On Tue, Aug 3, 2021 at 6:59 AM Marc <[email protected]> wrote:
>
> Hi Adam,
>
> More or less, it is a 'merge', PutTCP, ListenTCP and unpack. I hope I am
> not wrong, but the NiFi ListenTCP processor uses a delimiter (\n by
> default?). If you are transferring binary data, the processor splits the
> flow into 'pieces', and the attributes are not transferred to the
> destination.
>
> But your idea describes what the processor is doing:
>
> 1. It converts the attributes to a JSON string.
> 2. It transfers the JSON string and the payload (a header tells the
>    destination how long the JSON header and the payload are).
> 3. The listener receives the flow and decodes the header (to get the sizes
>    of the JSON header and the payload).
> 4. It writes the payload to a flowfile.
> 5. It parses the JSON string and sets the attributes on the flowfile.
>
> If you do not want to transfer attributes, you can configure a different
> decoder. In that case you can simply netcat a binary file to NiFi.
>
> The UDP version is far more complex. There must be a counter that tells the
> destination which part of the flowfile was received (even in a diode
> environment, packets are not guaranteed to arrive in the right order!). And
> you must be fast, very fast. It is a multithreaded architecture because one
> thread cannot receive, decode, and write a gigabit per second. I used the
> Disruptor library: one thread receives a packet, another decodes it, and a
> third writes the content to a flowfile in the right order.
>
> I am still learning (and I am not a professional software developer). If I
> did something wrong or overlooked something, please tell me.
>
> Marc
>
> > On 02.08.2021 at 22:01, Adam Taft <[email protected]> wrote:
> >
> > Marc,
> >
> > How would this differ from a more generic use of the existing processors,
> > PutTCP/ListenTCP and PutUDP/ListenUDP?
> > I'm not sure what value is being added above these existing processors,
> > but I'm sure I'm missing something.
> >
> > There's already an ability to serialize flowfiles via MergeContent, and
> > there's the deserialize side in UnpackContent. So a dataflow that looks
> > like the following would seem a reasonable approach to the problem:
> >
> > MergeContent -> PutTCP -> {diode} -> ListenTCP -> UnpackContent
> >
> > I'm actually very interested in this topic, as I have a project with a use
> > case for a "diode". So I'm legitimately asking here, not trying to derail
> > your work.
> >
> > Thanks in advance,
> >
> > Adam
> >
> > On Sun, Aug 1, 2021 at 12:26 PM Marc <[email protected]> wrote:
> >
> >> Greetings,
> >>
> >> There are companies and organizations that strictly separate their
> >> networks for security reasons. Such organizations often use data diodes
> >> to achieve this, but of course they still have to exchange data between
> >> the networks (e.g., transfer data from 'low' to 'high'). There are at
> >> least two kinds of diodes. Some hardware-based ones use only a single
> >> optical fiber to send data (UDP-based). Others use TCP but prevent
> >> sending in the reverse direction.
> >>
> >> NiFi is an amazing tool that allows data to be transferred between two
> >> separate networks in a very flexible but also secure way. I have
> >> implemented two processors. The first one 'merges' the attributes and the
> >> content of a flowfile and sends them to the destination. The second one
> >> listens on a TCP port, splits attributes from content, and creates a new
> >> flowfile containing all attributes of the original flowfile. You can send
> >> the flowfile without attributes as well; in that case you can easily
> >> netcat a binary file to NiFi.
> >>
> >> These two processors are useful if you do NOT have bidirectional
> >> communication between two NiFi instances, and therefore the Site-to-Site
> >> mechanism or HTTP(S) cannot be used.
> >>
> >> We have been using these processors for quite some time (specifically,
> >> the version for NiFi 1.13.2) and would like to share them with others. So
> >> my question to you all is: is anyone interested in these processors, or is
> >> this use case too special?
> >>
> >> The current source code can be found on GitHub:
> >> https://github.com/nerdfunk-net/diode/
> >>
> >> I have also implemented a UDP-based version of the processors. Due to the
> >> nature of UDP, this is more complex, and those processors are currently
> >> being tested.
> >>
> >> Best regards
> >> Marc
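The five steps Marc lists in the thread describe a simple framing: a header carrying two lengths, followed by the JSON-encoded attributes and the raw payload. The Python sketch below shows one way such a frame could look. The header layout (a 4-byte JSON length and an 8-byte payload length, both big-endian) and the names `HEADER`, `encode_flowfile` and `decode_flowfile` are hypothetical and are not taken from the nerdfunk-net/diode code.

```python
from __future__ import annotations

import json
import struct

# Hypothetical header: JSON length (4-byte uint) + payload length (8-byte uint),
# both big-endian. This is NOT the actual wire format of the diode processors.
HEADER = struct.Struct(">IQ")


def encode_flowfile(attributes: dict[str, str], payload: bytes) -> bytes:
    """Steps 1-2: serialize attributes to JSON and prepend a length header."""
    attr_json = json.dumps(attributes).encode("utf-8")
    return HEADER.pack(len(attr_json), len(payload)) + attr_json + payload


def decode_flowfile(frame: bytes) -> tuple[dict[str, str], bytes]:
    """Steps 3-5: read the header, then split the frame into attributes and payload."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than header")
    json_len, payload_len = HEADER.unpack_from(frame, 0)
    start = HEADER.size
    end_json = start + json_len
    end_payload = end_json + payload_len
    if len(frame) < end_payload:
        raise ValueError("frame truncated")
    attributes = json.loads(frame[start:end_json].decode("utf-8"))
    payload = frame[end_json:end_payload]
    return attributes, payload


frame = encode_flowfile({"filename": "report.bin", "source": "low"}, b"\x00\n\xff")
attrs, data = decode_flowfile(frame)
print(attrs["filename"], data == b"\x00\n\xff")  # report.bin True
```

On a real TCP socket the listener would first read exactly `HEADER.size` bytes and then read `json_len + payload_len` more, since TCP delivers a byte stream rather than discrete messages. Skipping the JSON part entirely corresponds to the "no attributes" decoder Marc mentions.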

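For the UDP variant, the counter Marc mentions is what lets the receiver put datagrams back in order. A minimal single-threaded Python sketch of that reordering step follows, assuming a hypothetical 8-byte per-datagram header (4-byte sequence number, 4-byte total chunk count). It deliberately ignores packet loss, timeouts, multiple concurrent flowfiles, and the multithreaded Disruptor pipeline Marc describes; `CHUNK_HEADER`, `split_into_datagrams` and `Reassembler` are illustrative names only.

```python
from __future__ import annotations

import struct

# Hypothetical per-datagram header: sequence number and total number of chunks.
CHUNK_HEADER = struct.Struct(">II")


def split_into_datagrams(payload: bytes, chunk_size: int) -> list[bytes]:
    """Sender side: cut the payload into numbered chunks."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    total = len(chunks)
    return [CHUNK_HEADER.pack(seq, total) + chunk for seq, chunk in enumerate(chunks)]


class Reassembler:
    """Receiver side: buffer out-of-order chunks and release them in sequence."""

    def __init__(self) -> None:
        self.pending: dict[int, bytes] = {}
        self.next_seq = 0
        self.total: int | None = None
        self.output = bytearray()

    def receive(self, datagram: bytes) -> bool:
        """Accept one datagram; return True once the whole payload is assembled."""
        seq, total = CHUNK_HEADER.unpack_from(datagram, 0)
        self.total = total
        if seq >= self.next_seq:  # ignore duplicates of chunks already written
            self.pending[seq] = datagram[CHUNK_HEADER.size:]
        # Write every chunk that is now contiguous with what has been written.
        while self.next_seq in self.pending:
            self.output += self.pending.pop(self.next_seq)
            self.next_seq += 1
        return self.next_seq == self.total


datagrams = split_into_datagrams(b"0123456789", chunk_size=4)
reassembler = Reassembler()
for dgram in [datagrams[2], datagrams[0], datagrams[1]]:  # arrive out of order
    done = reassembler.receive(dgram)
print(done, bytes(reassembler.output))  # True b'0123456789'
```

Because a diode has no return channel, the receiver cannot request retransmission; a production design would need forward error correction or redundant sending on top of this ordering logic.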