As long as you have a filesystem implementation [1] for your p2p fs, Hadoop (and other software such as Hive and Spark that use the Hadoop FileSystem API) should work just fine. Performance may be a concern, so you may need to tune your implementation accordingly.
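To make that concrete, here is a minimal sketch of how Hadoop is usually pointed at a custom FileSystem implementation through core-site.xml. The scheme name `p2pfs` and the class `org.example.P2PFileSystem` are hypothetical placeholders for whatever your implementation provides, not part of any real project:

```xml
<!-- core-site.xml: register a hypothetical p2pfs:// scheme.
     Scheme name and implementing class are illustrative assumptions. -->
<configuration>
  <property>
    <!-- Map the URI scheme to your FileSystem subclass -->
    <name>fs.p2pfs.impl</name>
    <value>org.example.P2PFileSystem</value>
  </property>
  <property>
    <!-- Optionally make it the cluster's default filesystem -->
    <name>fs.defaultFS</name>
    <value>p2pfs://my-network/</value>
  </property>
</configuration>
```

With a configuration like this on the classpath, clients can address paths as p2pfs://my-network/some/path, and anything built on the Hadoop FileSystem API (MapReduce, Hive, Spark) should resolve them through your implementation.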
1. https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html

Thanks,
Hariharan

On Wed, 23 Sep 2020, 22:27 Lauren Taylor, <chaylehdtra...@gmail.com> wrote:

> Hello!
>
> I am currently provisioning a P2P peer network (there are already a
> couple of live networks that have been created, but we do not want to
> test this in production, of course).
>
> In this p2p network, I was looking at the best ways in which one could
> distribute file storage (and access to it) in an efficient manner.
>
> The difference between this solution and BitTorrent (DHT / Mainline DHT)
> is *that all of the files that are uploaded to the network are meant to
> be stored and distributed*.
>
> Putting the complexities of that to the side (the sustainability of that
> proposal has been accounted for), I am wondering whether Apache Hadoop
> would be a good structure to run on top of that system.
>
> *Why I Ask*
> The p2p structure of this protocol is absolutely essential to its
> functioning. Thus, if I am going to leverage it for the purposes of
> storage / distribution, it is imperative that I ensure I'm not injecting
> something into the ecosystem that could ultimately harm it (e.g., a DoS
> vulnerability).
>
> *Hadoop-LAFS?*
> I was on the 'Tahoe-LAFS' website and I saw that there was a proposal
> for 'Hadoop-LAFS', which is a deployment of Apache Hadoop on top of the
> Tahoe-LAFS layer.
>
> According to the project description given by Google's Code Archive,
> it:
>
>> "Provides an integration layer between Tahoe LAFS and Hadoop so Map
>> Reduce jobs can be run over encrypted data stored in Tahoe."
>
> Any and all answers would help a ton, thank you!
>
> Sincerely,
> Buck Wiston