Hi,

What's the best way to read a compressed (bz2 / gz) XML file and split
it at a specific XML tag?

So far I've been using Hadoop's TextInputFormat in combination with
Mahout's XmlInputFormat ([0]) via env.readHadoopFile(). While the
plain TextInputFormat can handle compressed data, the XmlInputFormat
can't for some reason.
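
To make the goal concrete, here is a minimal sketch of the splitting I
am after, using only JDK classes (no Flink or Hadoop). The class name
XmlTagSplitter, its methods, and the <page> tag are made up for
illustration. It matches the tags as literal strings and covers gz only,
since the JDK ships no bz2 decoder.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Flink, Hadoop, or Mahout.
public class XmlTagSplitter {

    // Returns every chunk that starts with startTag and ends with endTag.
    // Tags are matched literally, so a start tag carrying attributes
    // would need a looser match than this.
    static List<String> split(Reader in, String startTag, String endTag) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean inside = false;
        int c;
        while ((c = in.read()) != -1) {
            buf.append((char) c);
            if (!inside) {
                if (endsWith(buf, startTag)) {
                    // Drop whatever preceded the start tag and begin a record.
                    buf.setLength(0);
                    buf.append(startTag);
                    inside = true;
                } else if (buf.length() > startTag.length()) {
                    // Keep only the tail that could still grow into the start tag.
                    buf.delete(0, buf.length() - startTag.length());
                }
            } else if (endsWith(buf, endTag)) {
                records.add(buf.toString());
                buf.setLength(0);
                inside = false;
            }
        }
        return records;
    }

    // Wraps a gzip-compressed stream and splits the decompressed text.
    static List<String> splitGzip(InputStream compressed, String startTag, String endTag)
            throws IOException {
        try (Reader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            return split(r, startTag, endTag);
        }
    }

    private static boolean endsWith(StringBuilder sb, String s) {
        int off = sb.length() - s.length();
        return off >= 0 && sb.indexOf(s, off) == off;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny gzip payload in memory so the example needs no files.
        String xml = "<doc><page>a</page>\n<page>b</page></doc>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        for (String rec : splitGzip(new ByteArrayInputStream(bytes.toByteArray()),
                "<page>", "</page>")) {
            System.out.println(rec);
        }
    }
}
```

This reads the whole stream on a single thread, so it does not give me
the parallel input that an InputFormat would.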

Is there a Flink-ish way to accomplish this?

Best regards,
Sebastian

[0]:
https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java
