Almost all DataFrame work is tracked by this umbrella ticket:
https://issues.apache.org/jira/browse/SPARK-6116

For the reader/writer interface, it's here:

https://issues.apache.org/jira/browse/SPARK-7654

https://github.com/apache/spark/pull/6175
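
As a rough illustration of the directory layout discussed in this thread (for example year=1/month=10/day=3), here is a minimal standard-library sketch of how key=value partition directories can be built and parsed back. It is not Spark's implementation; the helper names partition_path and parse_partition_path are hypothetical and exist only for this example.

```python
import posixpath


def partition_path(base, partition_cols, row):
    """Build a key=value partition directory, e.g. base/year=1/month=10/day=3.

    Hypothetical helper for illustration; not part of Spark.
    """
    segments = [f"{col}={row[col]}" for col in partition_cols]
    return posixpath.join(base, *segments)


def parse_partition_path(base, path):
    """Recover partition column values (as strings) from a path under base.

    Hypothetical helper for illustration; not part of Spark.
    """
    relative = posixpath.relpath(path, base)
    values = {}
    for segment in relative.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


# A row with three partition columns and one data column.
row = {"year": 1, "month": 10, "day": 3, "value": 42}
path = partition_path("/path/to/output", ["year", "month", "day"], row)
print(path)
print(parse_partition_path("/path/to/output", path))
```

The partition columns appear only in the directory names, in the order they were listed, which is what lets a reader recover them from the path alone. In this sketch the recovered values come back as strings; inferring their types is left out.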

On Tue, Jun 2, 2015 at 3:57 PM, Matt Cheah <mch...@palantir.com> wrote:

> Excellent! Where can I find the code, pull request, and Spark ticket where
> this was introduced?
>
> Thanks,
>
> -Matt Cheah
>
> From: Reynold Xin <r...@databricks.com>
> Date: Monday, June 1, 2015 at 10:25 PM
> To: Matt Cheah <mch...@palantir.com>
> Cc: "dev@spark.apache.org" <dev@spark.apache.org>, Mingyu Kim <
> m...@palantir.com>, Andrew Ash <a...@palantir.com>
> Subject: Re: [SQL] Write parquet files under partition directories?
>
> There will be in 1.4.
>
> df.write.partitionBy("year", "month", "day").parquet("/path/to/output")
>
> On Mon, Jun 1, 2015 at 10:21 PM, Matt Cheah <mch...@palantir.com> wrote:
>
>> Hi there,
>>
>> I noticed in the latest Spark SQL programming guide
>> <https://spark.apache.org/docs/latest/sql-programming-guide.html>,
>> there is support for optimized reading of partitioned Parquet files that
>> have a particular directory structure (year=1/month=10/day=3, for example).
>> However, I see no analogous way to write DataFrames as Parquet files with
>> similar directory structures based on user-provided partitioning.
>>
>> Generally, is it possible to write DataFrames as partitioned Parquet
>> files that downstream partition discovery can take advantage of later? I
>> considered extending the Parquet output format, but it looks like
>> ParquetTableOperations.scala has fixed the output format to
>> AppendingParquetOutputFormat.
>>
>> Also, I was wondering if it would be valuable to contribute writing
>> Parquet in partition directories as a PR.
>>
>> Thanks,
>>
>> -Matt Cheah
>>
>
>
