Hi, what's the best way to read a compressed (bz2 / gz) XML file and split it into records at a specific XML tag?
So far I've been using Hadoop's TextInputFormat in combination with Mahout's XmlInputFormat ([0]) via env.readHadoopFile(). The plain TextInputFormat handles compressed input, but for some reason the XmlInputFormat does not. Is there a Flink-ish way to accomplish this?

Best regards,
Sebastian

[0]: https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java
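One plausible explanation, not verified here: the linked record reader opens the file with `fs.open()` and seeks to the split offset, so it scans the raw (still compressed) bytes, whereas Hadoop's `LineRecordReader` (used by TextInputFormat) first looks up a codec via `CompressionCodecFactory` and reads the decompressed stream. The sketch below only illustrates the tag-matching idea applied to an already decompressed stream. It uses `java.util.zip.GZIPInputStream` from the JDK, so it covers gzip only (bz2 would need an extra codec such as Hadoop's). The class name `XmlSplitSketch`, the helper `gzip`, and the `<page>` tag are hypothetical, chosen purely for illustration. It also assumes the start tag has no attributes and that the tag does not nest inside itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class, not part of Flink, Hadoop or Mahout.
public class XmlSplitSketch {

    // Returns every startTag ... endTag record (tags included) found in the
    // decompressed stream. Assumes plain tags without attributes, no nesting.
    public static List<String> splitRecords(InputStream in, String startTag, String endTag)
            throws IOException {
        byte[] start = startTag.getBytes(StandardCharsets.UTF_8);
        byte[] end = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        while (readUntilMatch(in, start, null)) {
            buffer.reset();
            buffer.write(start);
            if (!readUntilMatch(in, end, buffer)) {
                break; // stream ended inside a record: drop the partial record
            }
            records.add(buffer.toString(StandardCharsets.UTF_8.name()));
        }
        return records;
    }

    // Reads byte by byte until `match` has been seen, copying bytes into `out`
    // if it is non-null. Returns false if the stream ends first. The restart on
    // mismatch is only correct because '<' occurs once, at position 0, in a tag.
    private static boolean readUntilMatch(InputStream in, byte[] match, ByteArrayOutputStream out)
            throws IOException {
        int i = 0;
        while (true) {
            int b = in.read();
            if (b == -1) {
                return false;
            }
            if (out != null) {
                out.write(b);
            }
            if (b == match[i]) {
                if (++i >= match.length) {
                    return true;
                }
            } else {
                i = (b == match[0]) ? 1 : 0;
            }
        }
    }

    // Helper that gzips a string in memory, standing in for a .gz file on disk.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<mediawiki><page><title>A</title></page>"
                + "<page><title>B</title></page></mediawiki>";
        // Decompress first, then split on the tag.
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzip(xml)));
        for (String record : splitRecords(in, "<page>", "</page>")) {
            System.out.println(record); // <page><title>A</title></page>, then B
        }
    }
}
```

Note that gzip is not splittable, so a whole .gz file would typically end up in a single input split; the design choice here is simply "decompress, then scan", not parallel splitting of the compressed file.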