Hi, what about the DAG? Can you send that as well, from the resulting write call?
On Wed, Nov 1, 2017 at 5:44 AM, אורן שמון <[email protected]> wrote:

> The version is 2.2.0. The code for the write is:
>
>     sortedApiRequestLogsDataSet.write
>       .bucketBy(numberOfBuckets, "userId")
>       .mode(SaveMode.Overwrite)
>       .format("parquet")
>       .option("path", outputPath + "/")
>       .option("compression", "snappy")
>       .saveAsTable("sorted_api_logs")
>
> And the code for the read:
>
>     val df = sparkSession.read.parquet(path).toDF()
>
> The read code runs on a different cluster than the write.
>
> On Tue, Oct 31, 2017 at 7:02 PM Michael Artz <[email protected]> wrote:
>
>> What version of Spark? Do you have a code sample? A screenshot of the DAG,
>> or the printout from .explain?
>>
>> On Tue, Oct 31, 2017 at 11:01 AM, אורן שמון <[email protected]> wrote:
>>
>>> Hi all,
>>> I have Parquet files as the result of a job; the job saved them in
>>> bucketed mode by userId. How can I read the files in bucketed mode in
>>> another job? I tried to read them, but it didn't bucket the data (same
>>> user in the same partition).
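A likely explanation for the behavior described above, assuming standard Spark 2.2 semantics: `saveAsTable` records the bucketing spec in the table catalog (metastore), not in the Parquet files themselves, so `spark.read.parquet(path)` reads only the raw files and never sees the bucketing. A minimal sketch of reading through the catalog instead, using the table name `sorted_api_logs` from the thread (the app name is hypothetical, and this assumes the reading cluster can reach the same metastore, which the thread does not confirm):

```scala
// Sketch, assuming Spark 2.2+: bucketing metadata written by saveAsTable
// lives in the catalog, so read by table name rather than by path.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("read-bucketed-logs")   // hypothetical app name
  .enableHiveSupport()             // both jobs must share the same metastore
  .getOrCreate()

// Reads the table registered by saveAsTable("sorted_api_logs"); with the
// bucketing spec available, Spark can avoid a shuffle on userId.
val df = spark.table("sorted_api_logs")

// If bucketing was picked up, the plan for a userId aggregation should
// show no Exchange (shuffle) step on userId.
df.groupBy("userId").count().explain()
```

If the two clusters cannot share a metastore, an alternative is to re-register the table on the reading cluster over the same path (e.g. via `CREATE TABLE ... USING parquet ... CLUSTERED BY (userId) INTO n BUCKETS LOCATION ...`); reading the bare path with `spark.read.parquet` will always discard the bucketing.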
