[ https://issues.apache.org/jira/browse/SPARK-50616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906907#comment-17906907 ]
Yang Jie commented on SPARK-50616:
----------------------------------

This is an unresolved issue, so there is no need to fill in the 'Fix Version/s' field for now. I have therefore removed its content.

> Add File Extension Option to CSV DataSource Writer
> --------------------------------------------------
>
>                 Key: SPARK-50616
>                 URL: https://issues.apache.org/jira/browse/SPARK-50616
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.3
>            Reporter: James Baugh
>            Priority: Minor
>
> h3. What changes were proposed in this pull request?
> The existing CSV DataSource allows one to set the delimiter/separator but
> does not allow one to change the file extension. As a result, a file can
> have values separated by tabs but still be marked as a ".csv" file. This
> change allows the file extension to be set to match the delimiter/separator
> (e.g. ".tsv" for a tab-separated value file).
> PR: [https://github.com/apache/spark/pull/49233]
> h3. Why are the changes needed?
> This PR adds an option to set the fileExtension. As a result, when a
> separator other than a comma is set, the output file can have an extension
> that matches the separator (e.g. file.tsv, file.psv, etc.).
> Notes on Previous Pull Request
> [#17973|https://github.com/apache/spark/pull/17973]
> A pull request adding this option was discussed 7 years ago. One reason it
> was not added was:
> "I would like to suggest to leave this out if there is no better reason for
> now. Downside of this is, it looks this allows arbitrary name and it does not
> guarantee the extension is, say, tsv when the delimiter is a tab. It is purely
> up to the user."
> I don't believe this is a good reason to prevent the user from setting the
> extension. If we let users set the delimiter/separator to an arbitrary string
> or character, why not also let them set the file extension to indicate which
> separator the file uses (e.g. tsv, psv, etc.)?
> This addition keeps "csv" as the default file extension while allowing the
> extension to match other separators.
> h3. Does this PR introduce _any_ user-facing change?
> Yes. This PR adds a "fileExtension" option to the CSV DataSource writer and
> adds one row for it to the options table in the CSV DataSource documentation.
> h3. How was this patch tested?
> One unit test was added to verify that a file is written with the new
> extension.
> h3. Was this patch authored or co-authored using generative AI tooling?
> No