Small-files source - partitioning based on prefix of file

Averell Mon, 30 Jul 2018 03:58:40 -0700

Hi everyone,

We are collecting log files from tens of thousands of network nodes, and we
need to derive some insights from that data. Each file name contains the
corresponding node ID, and I want to do custom partitioning based on that
node ID.
As far as I can tell, Flink 1.5 does not support this. I have started reading
the code, but it will take me some time to understand it.
From the GUI, it looks like the first step of the file source (directory
monitoring) rebalances the stream to the second step (the file reader), and
according to the Flink documentation, rebalancing means round-robin. However,
I could not find a call to the "rebalance" method; "transform" is called
instead, and I could not find much information about that "transform" method.
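
To make the goal concrete, here is a rough sketch in plain Java (no Flink
dependency) of the mapping I have in mind. The file-name pattern
"<nodeId>_<rest>" and the names NodeIdPartitioning, nodeIdOf and partitionFor
are only assumptions for illustration; our real file names may differ. The
partitionFor method takes a key and a partition count and returns a subtask
index, so every file from the same node would go to the same reader subtask.

```java
class NodeIdPartitioning {

    // Assumption: the node ID is everything before the first underscore,
    // e.g. "node0042_20180730.log" -> "node0042".
    static String nodeIdOf(String fileName) {
        int sep = fileName.indexOf('_');
        return sep < 0 ? fileName : fileName.substring(0, sep);
    }

    // Map a node ID to a subtask index in [0, numPartitions).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String nodeId, int numPartitions) {
        return Math.floorMod(nodeId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String[] files = {
            "node0042_20180730.log",
            "node0042_20180731.log",
            "node0007_20180730.log"
        };
        int parallelism = 8; // hypothetical reader parallelism
        for (String f : files) {
            String nodeId = nodeIdOf(f);
            System.out.println(f + " -> node " + nodeId
                + " -> subtask " + partitionFor(nodeId, parallelism));
        }
    }
}
```

The two files from node0042 end up on the same subtask, which is the
behaviour I would like between the directory monitor and the file reader
instead of round-robin.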


Would it be possible to get some guidance on this?

Thanks for your help.
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
