Would it be an option to just write the results of each job into a separate table, and then run a UNION over all of them into a final target table at the end? Just thinking of an alternative!
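Roughly something like the sketch below, assuming Spark SQL with Hive support; the staging table names (results_job_1 etc.) and the final table name are just placeholders:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Each concurrent job writes to its own staging table, so no two
// writers ever touch the same metastore entry, e.g.:
// df.write.mode(SaveMode.Overwrite).saveAsTable(s"results_job_$jobId")

// Once all jobs have finished, union the staging tables into the
// final target table in a single, sequential step.
val jobTables = Seq("results_job_1", "results_job_2", "results_job_3")
val combined = jobTables.map(spark.table).reduce(_ union _)
combined.write.mode(SaveMode.Append).saveAsTable("results_final")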
Thanks,
Subhash

Sent from my iPhone

> On Apr 20, 2017, at 3:48 AM, Rick Moritz <rah...@gmail.com> wrote:
>
> Hi List,
>
> I'm wondering if the following behaviour should be considered a bug, or whether it "works as designed":
>
> I'm starting multiple concurrent (FIFO-scheduled) jobs in a single SparkContext, some of which write into the same tables.
> When these tables already exist, it appears as though both jobs [at least believe that they] successfully appended to the table (i.e., both jobs terminate successfully, but I haven't checked whether the data from both jobs was actually written, or whether one job overwrote the other's data, despite Mode.APPEND). If the table does not exist, both jobs will attempt to create it, but whichever job's turn is second (or later) will then fail with an AlreadyExistsException (org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: AlreadyExistsException).
>
> I think the issue here is that both jobs determine early on that they will need to create the table, but neither registers it with the metastore until it actually starts writing. The slower job then obviously fails to create the table and, instead of falling back to appending the data to the existing table, crashes out.
>
> I would consider this a bit of a bug, but I'd like to make sure that it isn't merely a case of me doing something stupid elsewhere, or indeed simply an inherent architectural limitation of working with the metastore, before going to Jira with this.
>
> Also, I'm aware that running the jobs strictly sequentially would work around the issue, but that would require reordering jobs before sending them off to Spark, or it would kill efficiency.
>
> Thanks for any feedback,
>
> Rick
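For reference, this is roughly the pattern I understand the quoted mail to describe (a sketch only, assuming the jobs are submitted as concurrent actions on a shared SparkSession; the DataFrames and the table name shared_table are made up):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

def writeJob(df: DataFrame): Future[Unit] = Future {
  // Both jobs take this path. If shared_table already exists, both
  // simply append; if it does not, each decides it must create it,
  // and the slower one fails with the AlreadyExistsException above
  // instead of falling back to an append.
  df.write.mode(SaveMode.Append).saveAsTable("shared_table")
}

val dfA: DataFrame = spark.range(100).toDF("id")
val dfB: DataFrame = spark.range(100, 200).toDF("id")
val jobs = Seq(writeJob(dfA), writeJob(dfB))
jobs.foreach(f => Await.result(f, Duration.Inf))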