You're welcome. I'm glad I could help out :)

Cheers,
Max
On Thu, Jul 2, 2015 at 9:17 PM, Mihail Vieru <vi...@informatik.hu-berlin.de> wrote:

> I've implemented the alternating 2 files solution and everything works
> now.
>
> Thanks a lot! You saved my day :)
>
> Cheers,
> Mihail
>
>
> On 02.07.2015 12:37, Maximilian Michels wrote:
>
> The problem is that your input and output path are the same. Because
> Flink executes in a pipelined fashion, all the operators come up at
> once. When you set WriteMode.OVERWRITE for the sink, it deletes the
> path before writing anything. That means that when your DataSource reads
> the input, there is nothing left to read. Thus you get an empty
> DataSet, which you write to HDFS again. Any further loop iterations then
> just write nothing.
>
> You can circumvent this problem by prefixing every output file with a
> counter that you increment in your loop. Alternatively, if you only want
> to keep the latest output, you can use two files and let them alternate
> between being input and output.
>
> Let me know if you have any further questions.
>
> Kind regards,
> Max
>
> On Thu, Jul 2, 2015 at 10:20 AM, Maximilian Michels <m...@apache.org> wrote:
>
>> Hi Mihail,
>>
>> Thanks for the code. I'm trying to reproduce the problem now.
>>
>> On Wed, Jul 1, 2015 at 8:30 PM, Mihail Vieru <vi...@informatik.hu-berlin.de> wrote:
>>
>>> Hi Max,
>>>
>>> thank you for your reply. I wanted to rule out all other factors
>>> before writing back. I've attached my code and sample input data.
>>>
>>> I run the *APSPNaiveJob* using the following arguments:
>>>
>>> *0 100 hdfs://path/to/vertices-test-100 hdfs://path/to/edges-test-100
>>> hdfs://path/to/tempgraph 10 0.5 hdfs://path/to/output-apsp 9*
>>>
>>> I was wrong: I originally thought that the first writeAsCsv call (line
>>> 50) doesn't work. In fact, an exception is thrown without
>>> WriteMode.OVERWRITE when the file already exists.
>>>
>>> The problem lies with the second call (line 74), which tries to write
>>> to the same path on HDFS.
>>>
>>> This issue is blocking me, because I need to persist the vertices
>>> dataset between iterations.
>>>
>>> Cheers,
>>> Mihail
>>>
>>> P.S.: I'm using the latest 0.10-SNAPSHOT and HDFS 1.2.1.
>>>
>>>
>>> On 30.06.2015 16:51, Maximilian Michels wrote:
>>>
>>> Hi Mihail,
>>>
>>> Thank you for your question. Do you have a short example that
>>> reproduces the problem? It is hard to find the cause without an error
>>> message or some example code.
>>>
>>> I wonder how your loop works without WriteMode.OVERWRITE, because it
>>> should throw an exception in this case. Or do you change the file
>>> names on every write?
>>>
>>> Cheers,
>>> Max
>>>
>>> On Tue, Jun 30, 2015 at 3:47 PM, Mihail Vieru <vi...@informatik.hu-berlin.de> wrote:
>>>
>>>> I think my problem is related to a loop in my job.
>>>>
>>>> Before the loop, the writeAsCsv method works fine, even in overwrite
>>>> mode.
>>>>
>>>> In the loop, in the first iteration, it writes a folder containing
>>>> empty files to HDFS, even though the DataSet it is supposed to write
>>>> contains elements.
>>>>
>>>> Needless to say, this doesn't occur in a local execution environment
>>>> when writing to the local file system.
>>>>
>>>> I would appreciate any input on this.
>>>>
>>>> Best,
>>>> Mihail
>>>>
>>>>
>>>> On 30.06.2015 12:10, Mihail Vieru wrote:
>>>>
>>>> Hi Till,
>>>>
>>>> thank you for your reply.
>>>>
>>>> I have the following code snippet:
>>>>
>>>> *intermediateGraph.getVertices().writeAsCsv(tempGraphOutputPath, "\n",
>>>> ";", WriteMode.OVERWRITE);*
>>>>
>>>> When I remove the WriteMode parameter, it works. So I can conclude
>>>> that the DataSet contains data elements.
>>>>
>>>> Cheers,
>>>> Mihail
>>>>
>>>>
>>>> On 30.06.2015 12:06, Till Rohrmann wrote:
>>>>
>>>> Hi Mihail,
>>>>
>>>> have you checked that the DataSet you want to write to HDFS actually
>>>> contains data elements? You can try calling collect, which retrieves
>>>> the data to your client, to see what's in there.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>>
>>>> On Tue, Jun 30, 2015 at 12:01 PM, Mihail Vieru <vi...@informatik.hu-berlin.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> the writeAsCsv method is not writing anything to HDFS (version 1.2.1)
>>>>> when the WriteMode is set to OVERWRITE.
>>>>> A file is created, but it's empty, and there is no trace of errors in
>>>>> the Flink or Hadoop logs on any node in the cluster.
>>>>>
>>>>> What could cause this issue? I really need this feature.
>>>>>
>>>>> Best,
>>>>> Mihail
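For readers landing on this thread from the archives: the "two alternating files" workaround Max describes can be sketched in a few lines. This is a hypothetical simulation using plain local file I/O in place of Flink's DataSource and writeAsCsv(..., WriteMode.OVERWRITE); the function and path names are made up for illustration. The point is that each iteration reads the current state from one path and overwrites only the *other* path, then the two swap roles, so the sink's delete-before-write never races with the source's read.

```python
import os
import tempfile

def run_loop(path_a, path_b, iterations):
    """Simulate an iterative job that persists state between iterations
    by alternating between two output paths (hypothetical sketch)."""
    with open(path_a, "w") as f:
        f.write("v0;0\nv1;1\n")            # initial "vertices" dataset
    inp, out = path_a, path_b
    for _ in range(iterations):
        with open(inp) as f:
            vertices = f.read()            # read the previous iteration's output
        # ... transform 'vertices' here (the actual per-iteration work) ...
        with open(out, "w") as f:
            f.write(vertices)              # overwrite only the *other* path
        inp, out = out, inp                # swap input/output for the next round
    return inp                             # path holding the latest state

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    latest = run_loop(os.path.join(d, "tempgraph-a"),
                      os.path.join(d, "tempgraph-b"), 3)
    print(os.path.basename(latest))
```

Max's first suggestion, prefixing each output with an incrementing counter (tempgraph-0, tempgraph-1, ...), avoids the race the same way at the cost of keeping every intermediate result; the alternation keeps only the latest one.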