Many, many thanks 🙏 Joe, that makes my flow a lot simpler.

Thanks
Jens
> On 13 Oct 2021, at 16:50, Joe Witt <[email protected]> wrote:
>
> Jens
>
> If you use MergeContent [1] you can create streams of flowfile bundles
> (attributes/content serialized together) in groups of 1 or more. Then
> on the other end you can use UnpackContent [2].
>
> Thanks
> Joe
>
> [1] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.MergeContent/index.html
> [2] http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.UnpackContent/index.html
>
>> On Tue, Oct 12, 2021 at 11:07 PM Jens M. Kofoed <[email protected]> wrote:
>>
>> Dear Joe
>>
>> Regarding your point 5: this is almost what I'm doing, too. But last
>> night, on my phone, I "just wrote" that we created a hash file. What
>> I'm actually doing is converting the flowfile to JSON.
>> Is there a way NiFi can export the complete flowfile (attributes and
>> content) into one file, which we can import again on the other side?
>> Right now I do it in 2 steps.
>> Below is a short description of my flow for transferring data between
>> systems where we can't use S2S.
>>
>> At the low side:
>> get data ->
>> CryptographicHashContent ->
>> UpdateAttribute: original.filename = $(unknown), rootHash=${content_SHA-256} ->
>> UpdateAttribute: filename=${UUID()} ->
>> PutSFTP ->
>> AttributesToJSON: Destination=flowfile-content ->
>> UpdateAttribute: filename=${filename:append('.flowfile')} ->
>> PutSFTP
>>
>> At the high side:
>> ListSFTP: File Filter Regex = .*\.flowfile ->
>> FetchSFTP ->
>> ExecuteScript: (converting JSON data into attributes) ->
>> UpdateAttribute: filename = ${filename:substringBefore('.flowfile')} ->
>> FetchSFTP ->
>> CryptographicHashContent ->
>> RouteOnAttribute: Hash_OK = ${rootHash:equals(${content_SHA-256})} ->
>> Hash_OK -> following production flow
>> Unmatched -> error flow
>>
>> Kind regards
>> Jens
>>
>>> On Tue, 12 Oct 2021 at 21:36, Joe Witt <[email protected]> wrote:
>>>
>>> Jens
>>>
>>> For such a setup the very specific details matter, and here there are
>>> a lot of details. It isn't easy for me to sort through all of them,
>>> so I'll keep it high level based on my experience in very similar
>>> situations/setups:
>>>
>>> 1. I'd generally trust SFTP to be awesome and damn near failure-proof
>>> in itself. I'd focus on other things.
>>> 2. I'd generally trust that network transfer of data packets is
>>> bulletproof and not think corruption there is the problem, especially
>>> since SFTP and the various protocols employed here (including NiFi's)
>>> offer certain guarantees themselves.
>>> 3. I'd suspect the one-way transfer/guard devices of creating issues.
>>> I'd remove them and try to reproduce the problem.
>>> 4. In Linux, a cp/mv is not atomic, as I understand it, if the data
>>> spans file systems, so you could potentially have partially written
>>> data scenarios here.
>>> 5. I'd be careful to avoid multiple-file scenarios such as original
>>> content plus a separate sha256 file. Instead, if the low side is a
>>> NiFi and the high side is a NiFi, I'd have the low-side NiFi write out
>>> flowfiles and pass those over the guard device. Why? Because this
>>> gives you your original content AND the flowfile attributes (where I'd
>>> put the sha256). On the high-side NiFi I'd unpack that flowfile and
>>> ensure the content matches the stated sha256.
>>>
>>> Joe
>>>
>>> On Tue, Oct 12, 2021 at 12:25 PM Jens M. Kofoed <[email protected]> wrote:
>>>>
>>>> Hi Joe
>>>>
>>>> I know what you are thinking, but that's not the case.
>>>> Check my very short description of my test flow.
>>>> In my loop the PutSFTP processor is using default settings, which
>>>> means it uploads files as .filename and renames them when done. The
>>>> next processor is FetchSFTP, which will load the file as filename.
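The bundle idea in point 5 (and in MergeContent/UnpackContent) is to serialize attributes and content into a single artifact so the checksum travels with the data it protects. As a minimal illustration of that principle, here is a hypothetical tar-based stand-in, not NiFi's actual FlowFile Stream packaging format; the function names and layout are invented for this sketch:

```python
import hashlib
import io
import json
import tarfile


def pack_flowfile(content: bytes, attributes: dict) -> bytes:
    """Bundle content and attributes into one archive, recording a content hash.

    Hypothetical stand-in for MergeContent's flowfile packaging: the
    SHA-256 of the content is stored as an attribute inside the bundle.
    """
    attributes = dict(attributes)
    attributes["content_SHA-256"] = hashlib.sha256(content).hexdigest()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in (
            ("attributes.json", json.dumps(attributes).encode()),
            ("content", content),
        ):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_flowfile(bundle: bytes):
    """Unpack a bundle and verify the content hash recorded at pack time."""
    with tarfile.open(fileobj=io.BytesIO(bundle)) as tar:
        attributes = json.load(tar.extractfile("attributes.json"))
        content = tar.extractfile("content").read()
    if hashlib.sha256(content).hexdigest() != attributes["content_SHA-256"]:
        raise ValueError("content corrupted in transit")
    return content, attributes
```

Because the hash rides inside the same file as the content, there is no window in which a poller can see the content without its checksum, which is exactly the failure mode a separate `.sha256` file invites.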
>>>> If PutSFTP is not finished uploading the file, it will still have
>>>> the wrong (dot-prefixed) filename, the flowfile will not go from
>>>> PutSFTP -> FetchSFTP, and therefore FetchSFTP can't fetch the file.
>>>> So in my test flow that is not the case.
>>>>
>>>> In our production flow, after NiFi gets its data it calculates the
>>>> sha256, then uploads the data to an SFTP server as .filename and
>>>> renames it when done (default settings for PutSFTP). Next it creates
>>>> a new file with the value of the hash and saves it as
>>>> filename.sha256.
>>>> At that SFTP server a bash script looks for NOT-hidden files every
>>>> 2 seconds with an ls command. If there are files, the bash script
>>>> does a cp filename /archive/filename and sends the data to server 3
>>>> via a data diode. On the other side another NiFi server reads
>>>> filename.sha256, reads in the hash value and the original data,
>>>> calculates a new sha256 and compares the two hashes.
>>>> Yesterday there was a corruption again, and we checked the file at
>>>> the first SFTP server, where the first NiFi saved it after creating
>>>> the first hash. Running sha256sum on /archive/filename produced a
>>>> different hash than NiFi's. So after the PutSFTP and a Linux cp
>>>> command the file was corrupted.
>>>> We have seen these issues in fewer than 1 per 1,000,000 files. But
>>>> we see them.
>>>> Now we are trying to investigate what causes the issue. Therefore I
>>>> created the small test flow, and already after nearly 9000 iterations
>>>> of the loop the file has been corrupted just by being uploaded and
>>>> downloaded again.
>>>>
>>>> Are we facing a network issue where a data packet is corrupted?
>>>> Are there very rare cases where the SFTP implementation does
>>>> something wrong?
>>>> We don't know yet, but we are running some more tests, and on
>>>> different systems, to narrow it down.
>>>>
>>>> Kind regards
>>>> Jens M. Kofoed
>>>>
>>>>> On 12 Oct 2021, at 19:39, Joe Witt <[email protected]> wrote:
>>>>>
>>>>> Hello
>>>>>
>>>>> How does NiFi grab the data from the file system? It sounds like it
>>>>> is doing partial reads due to a competing-consumer (data still being
>>>>> written) scenario.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Oct 11, 2021 at 10:36 PM Jens M. Kofoed <[email protected]> wrote:
>>>>>
>>>>>> Dear Developers
>>>>>>
>>>>>> We have a situation where we see corrupted files after using
>>>>>> PutSFTP and FetchSFTP in NiFi 1.13.2 with openjdk version
>>>>>> "1.8.0_292", OpenJDK Runtime Environment (build
>>>>>> 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10), OpenJDK 64-Bit Server VM
>>>>>> (build 25.292-b10, mixed mode), running on Ubuntu Server 20.04.
>>>>>>
>>>>>> We have a flow between 2 separated systems where we use PutSFTP to
>>>>>> export data from one NiFi instance to a data diode and FetchSFTP to
>>>>>> grab the data on the other end. To be sure data is not corrupted we
>>>>>> calculate a SHA256 on each side and transfer the flowfile metadata
>>>>>> in a separate file. In rare cases we have seen that the SHA256
>>>>>> doesn't match on both sides, and we are investigating where the
>>>>>> errors happen. We see 2 errors. Manually calculating a SHA256 on
>>>>>> both sides of the diode the file is OK, and we have found that the
>>>>>> errors happen between NiFi and the SFTP servers. And it can happen
>>>>>> on both sides.
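The dot-file-then-rename convention PutSFTP relies on, and the non-atomic cp Joe warns about in point 4, come down to whether a poller can ever observe a half-written file under its final name. A minimal sketch of the safe pattern, assuming POSIX semantics (this is illustrative local-filesystem code, not NiFi or the production bash script):

```python
import os
import tempfile


def publish_atomically(data: bytes, dest_path: str) -> None:
    """Write to a hidden temp file in the destination directory, then rename.

    os.replace() is atomic when source and destination are on the same
    filesystem, so a consumer listing the directory (e.g. an `ls` every
    2 seconds) never sees a partially written file under its final name.
    A plain cp across filesystems offers no such guarantee.
    """
    dest_dir = os.path.dirname(dest_path) or "."
    # Hidden (dot-prefixed) temp file, like PutSFTP's .filename convention.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are durable first
        os.replace(tmp_path, dest_path)  # atomic on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The key detail is that the temp file must live in the destination directory (or at least on the same filesystem); otherwise the final step degrades to a copy and the partial-write window reappears.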
>>>>>> So for testing I created this little flow:
>>>>>> GenerateFlowFile (size 100MB) (Run once) ->
>>>>>> CryptographicHashContent (SHA256) ->
>>>>>> UpdateAttribute ( hash.root = ${content_SHA-256} , iteration=1 ) ->
>>>>>> PutSFTP ->
>>>>>> FetchSFTP ->
>>>>>> CryptographicHashContent (SHA256) ->
>>>>>> RouteOnAttribute (compare hash.root vs. content_SHA-256)
>>>>>> If no match ->
>>>>>> route to a disabled processor, holding the corrupted file in a
>>>>>> queue
>>>>>> If match ->
>>>>>> UpdateAttribute ( iteration = ${iteration:plus(1)} ) -> loop back
>>>>>> to PutSFTP
>>>>>>
>>>>>> After 8992 iterations the file was corrupted. To test whether the
>>>>>> errors are in the calculation of the SHA256, I have a copy of the
>>>>>> flow without the Put/Fetch SFTP processors, which hasn't had any
>>>>>> errors yet.
>>>>>>
>>>>>> It is very rare that we see these errors; millions of files go
>>>>>> through without any issues, but sometimes it happens, which is not
>>>>>> good.
>>>>>>
>>>>>> Can anyone please help? Maybe try setting up the same test and see
>>>>>> if you also have a corrupted file after some days.
>>>>>>
>>>>>> Kind regards
>>>>>> Jens M. Kofoed
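The loop in that test flow (hash once, round-trip repeatedly, re-hash, route on mismatch) can be sketched outside NiFi. In this hypothetical stand-in, `transfer` abstracts the PutSFTP -> FetchSFTP hop, and the function names are invented for illustration:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Equivalent of CryptographicHashContent's content_SHA-256 attribute."""
    return hashlib.sha256(data).hexdigest()


def roundtrip_check(data: bytes, transfer, iterations: int):
    """Mimic the test loop: hash once up front (hash.root), then repeatedly
    push the data through `transfer` and re-hash the result.

    Returns the iteration number at which the hashes first diverge
    (the RouteOnAttribute "no match" branch), or None if all iterations
    matched.
    """
    root_hash = sha256_hex(data)
    for i in range(1, iterations + 1):
        data = transfer(data)  # stand-in for PutSFTP -> FetchSFTP
        if sha256_hex(data) != root_hash:
            return i
    return None
```

Running the same loop with `transfer` as the identity function corresponds to the control flow without the Put/Fetch SFTP processors: any mismatch there would point at the hashing itself rather than the transfer.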
