Hi, what about the DAG? Can you send that as well, from the resulting write call?
On Wed, Nov 1, 2017 at 5:44 AM, אורן שמון <[email protected]> wrote:

> The version is 2.2.0. The code for the write is:
>
>     sortedApiRequestLogsDataSet.write
>       .bucketBy(numberOfBuckets, "userId")
>       .mode(SaveMode.Overwrite)
>       .format("parquet")
>       .option("path", outputPath + "/")
>       .option("compression", "snappy")
>       .saveAsTable("sorted_api_logs")
>
> And the code for the read:
>
>     val df = sparkSession.read.parquet(path).toDF()
>
> The read code runs on a different cluster than the write.
>
> On Tue, Oct 31, 2017 at 7:02 PM Michael Artz <[email protected]> wrote:
>
>> What version of Spark? Do you have a code sample? A screenshot of the DAG,
>> or the printout from .explain?
>>
>> On Tue, Oct 31, 2017 at 11:01 AM, אורן שמון <[email protected]> wrote:
>>
>>> Hi all,
>>> I have Parquet files as the result of a job; the job saved them in
>>> bucketed mode by userId. How can I read the files in bucketed mode in
>>> another job? I tried to read them, but it didn't bucket the data (same
>>> user in the same partition).
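A likely explanation for the behavior described above, assuming standard Spark 2.2 semantics: `saveAsTable` records the bucketing spec in the table catalog (metastore), not in the Parquet files themselves, so `spark.read.parquet(path)` reads only the raw files and never sees the bucketing. A minimal sketch of reading through the catalog instead, using the table name `sorted_api_logs` from the thread (the app name is hypothetical, and this assumes the reading cluster can reach the same metastore, which the thread does not confirm):

```scala
// Sketch, assuming Spark 2.2+: bucketing metadata written by saveAsTable
// lives in the catalog, so read by table name rather than by path.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("read-bucketed-logs")   // hypothetical app name
  .enableHiveSupport()             // both jobs must share the same metastore
  .getOrCreate()

// Reads the table registered by saveAsTable("sorted_api_logs"); with the
// bucketing spec available, Spark can avoid a shuffle on userId.
val df = spark.table("sorted_api_logs")

// If bucketing was picked up, the plan for a userId aggregation should
// show no Exchange (shuffle) step on userId.
df.groupBy("userId").count().explain()
```

If the two clusters cannot share a metastore, an alternative is to re-register the table on the reading cluster over the same path (e.g. via `CREATE TABLE ... USING parquet ... CLUSTERED BY (userId) INTO n BUCKETS LOCATION ...`); reading the bare path with `spark.read.parquet` will always discard the bucketing.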
