I've implemented the alternating-two-files solution and everything works now.
Thanks a lot! You saved my day :)
Cheers,
Mihail
On 02.07.2015 12:37, Maximilian Michels wrote:
The problem is that your input and output paths are the same. Because
Flink executes in a pipelined fashion, all the operators come up at
once. When you set WriteMode.OVERWRITE for the sink, it deletes the
path before writing anything. That means that when your DataSource
reads the input, there is nothing left to read, so you get an empty
DataSet, which you then write to HDFS again. Any further loop
iterations just write nothing.
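To illustrate, here is a minimal sketch of the failing pattern (the
path and the text-line dataset are hypothetical, not taken from your
job):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.core.fs.FileSystem.WriteMode;

    public class SamePathProblem {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            // Source and sink share one path: the OVERWRITE sink deletes it
            // as the pipelined job starts, so the source finds nothing to read.
            String path = "hdfs://path/to/tempgraph"; // hypothetical path
            DataSet<String> data = env.readTextFile(path);
            data.writeAsText(path, WriteMode.OVERWRITE);
            env.execute();
        }
    }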
You can circumvent this problem by prefixing every output file with a
counter that you increment in your loop. Alternatively, if you only
want to keep the latest output, you can use two paths and alternate
which one serves as input and which as output; see the sketch below.
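A minimal sketch of the alternating approach, assuming a plain
text-line dataset (the paths, class name, and per-iteration
transformation are placeholders):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.core.fs.FileSystem.WriteMode;

    public class AlternatingPaths {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            String pathA = "hdfs://path/to/tempgraph-a"; // initial data lives here
            String pathB = "hdfs://path/to/tempgraph-b";
            int maxIterations = 10;

            for (int i = 0; i < maxIterations; i++) {
                // Even iterations read A and write B; odd iterations do the
                // reverse, so the sink never deletes the source's input.
                String in  = (i % 2 == 0) ? pathA : pathB;
                String out = (i % 2 == 0) ? pathB : pathA;

                DataSet<String> vertices = env.readTextFile(in);
                // ... per-iteration transformations would go here ...
                vertices.writeAsText(out, WriteMode.OVERWRITE);
                env.execute("iteration " + i);
            }
        }
    }

This way each iteration's sink only ever overwrites the
previous-but-one result, never the data it is currently reading.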
Let me know if you have any further questions.
Kind regards,
Max
On Thu, Jul 2, 2015 at 10:20 AM, Maximilian Michels <m...@apache.org> wrote:
Hi Mihail,
Thanks for the code. I'm trying to reproduce the problem now.
On Wed, Jul 1, 2015 at 8:30 PM, Mihail Vieru <vi...@informatik.hu-berlin.de> wrote:
Hi Max,
thank you for your reply. I wanted to revise and dismiss all
other factors before writing back. I've attached you my code
and sample input data.
I run the APSPNaiveJob using the following arguments:
0 100 hdfs://path/to/vertices-test-100 hdfs://path/to/edges-test-100
hdfs://path/to/tempgraph 10 0.5 hdfs://path/to/output-apsp 9
I was wrong: I originally thought that the first writeAsCsv call
(line 50) doesn't work. In fact, without WriteMode.OVERWRITE an
exception is thrown when the file already exists.
The problem lies with the second call (line 74), which tries to
write to the same path on HDFS.
This issue is blocking me, because I need to persist the
vertices dataset between iterations.
Cheers,
Mihail
P.S.: I'm using the latest 0.10-SNAPSHOT and HDFS 1.2.1.
On 30.06.2015 16:51, Maximilian Michels wrote:
Hi Mihail,
Thank you for your question. Do you have a short example that
reproduces the problem? It is hard to find the cause without
an error message or some example code.
I wonder how your loop works without WriteMode.OVERWRITE
because it should throw an exception in this case. Or do you
change the file names on every write?
Cheers,
Max
On Tue, Jun 30, 2015 at 3:47 PM, Mihail Vieru <vi...@informatik.hu-berlin.de> wrote:
I think my problem is related to a loop in my job.
Before the loop, the writeAsCsv method works fine, even in
overwrite mode.
In the loop's first iteration, however, it writes a folder of
empty files to HDFS, even though the DataSet it is supposed to
write contains elements.
Needless to say, this doesn't occur in a local execution
environment when writing to the local file system.
I would appreciate any input on this.
Best,
Mihail
On 30.06.2015 12:10, Mihail Vieru wrote:
Hi Till,
thank you for your reply.
I have the following code snippet:

    intermediateGraph.getVertices().writeAsCsv(tempGraphOutputPath,
        "\n", ";", WriteMode.OVERWRITE);
When I remove the WriteMode parameter, it works, so I can
conclude that the DataSet does contain data elements.
Cheers,
Mihail
On 30.06.2015 12:06, Till Rohrmann wrote:
Hi Mihail,
have you checked that the DataSet you want to write to HDFS
actually contains data elements? You can try calling collect(),
which retrieves the data to your client, to see what's in there.
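For example (a hedged snippet; myDataSet and the String element
type are placeholders for your actual dataset):

    import java.util.List;

    // Retrieves all elements to the client; fine for small debug runs.
    List<String> elements = myDataSet.collect();
    System.out.println("element count: " + elements.size());
    for (String e : elements) {
        System.out.println(e);
    }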
Cheers,
Till
On Tue, Jun 30, 2015 at 12:01 PM, Mihail Vieru <vi...@informatik.hu-berlin.de> wrote:
Hi,
the writeAsCsv method is not writing anything to
HDFS (version 1.2.1) when the WriteMode is set to
OVERWRITE.
A file is created, but it's empty, and there is no trace of
errors in the Flink or Hadoop logs on any node in the cluster.
What could cause this issue? I really need this feature.
Best,
Mihail