[jira] [Work logged] (HIVE-20917) OpenCSVSerde quotes all columns

ASF GitHub Bot (Jira) Mon, 31 Oct 2022 14:25:29 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-20917?focusedWorklogId=822115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-822115
 ]


ASF GitHub Bot logged work on HIVE-20917:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Oct/22 21:24
            Start Date: 31/Oct/22 21:24
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on PR #3718:
URL: https://github.com/apache/hive/pull/3718#issuecomment-1297703933

   
   # @check-spelling-bot Report
   ### :red_circle: Please review
   See the [files](3718/files/) view or the [action 
log](https://github.com/apache/hive/actions/runs/3364862660) for details.
   
   #### Unrecognized words (1)
   
   APPLYQUOTESTOALL
   
   <details><summary>Previously acknowledged words that are now absent
   </summary>aarry bytecode timestamplocal yyyy </details>
   
   <details><summary>To accept these unrecognized words as correct (and remove 
the previously acknowledged and now absent words),
   run the following commands</summary>
   
   ... in a clone of the 
[g...@github.com:gigem/hive.git](https://github.com/gigem/hive.git) repository
   on the `HIVE-20917` branch:
   
   ```
   update_files() {
   perl -e '
   my @expect_files=qw('".github/actions/spelling/expect.txt"');
   @ARGV=@expect_files;
   my @stale=qw('"$patch_remove"');
   my $re=join "|", @stale;
   my $suffix=".".time();
   my $previous="";
   sub maybe_unlink { unlink($_[0]) if $_[0]; }
   while (<>) {
   if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
   next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
   }; maybe_unlink($previous);'
   perl -e '
   my $new_expect_file=".github/actions/spelling/expect.txt";
   use File::Path qw(make_path);
   use File::Basename qw(dirname);
   make_path (dirname($new_expect_file));
   open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
   my @add=qw('"$patch_add"');
   my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
   @words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
   open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
   close FILE;
   system("git", "add", $new_expect_file);
   '
   }
   
   comment_json=$(mktemp)
   curl -L -s -S \
   -H "Content-Type: application/json" \
   "https://api.github.com/repos/apache/hive/issues/comments/1297703933" > "$comment_json"
   comment_body=$(mktemp)
   jq -r ".body // empty" "$comment_json" > "$comment_body"
   rm "$comment_json"
   
   patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")
   
   patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")
   
   update_files
   rm $comment_body
   git add -u
   ```
   </details>
   

Issue Time Tracking
-------------------

    Worklog Id:     (was: 822115)
    Time Spent: 20m  (was: 10m)

> OpenCSVSerde quotes all columns
> -------------------------------
>
>                 Key: HIVE-20917
>                 URL: https://issues.apache.org/jira/browse/HIVE-20917
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: nicolas paris
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The OpenCSVSerde produces a CSV with all of its columns quoted,
> regardless of their type or whether the string columns contain a separator.
>  
> The problem is that some readers (such as PostgreSQL) are not compatible with
> such CSV, in particular when bulk loading it through the COPY statement.
>  
> I propose a new CsvSerde, based on the Univocity parser (which is used by
> Apache Spark) and has been described as 2 times faster than OpenCSV:
> [https://github.com/uniVocity/csv-parsers-comparison]. This new CsvSerde
> would quote columns only when needed.
>  
> Regards,
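
To make the difference described in the issue concrete, here is a minimal sketch using Python's standard-library `csv` module. It is not Hive, OpenCSV, or Univocity code; the sample rows and the `render` helper are hypothetical and only illustrate "quote every column" versus "quote only when needed".

```python
import csv
import io


def render(rows, quoting):
    """Serialize rows to CSV text using the given csv quoting mode."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data: an integer column, a plain string column,
# and a string column where one value contains the separator.
rows = [[1, "alice", "plain text"], [2, "bob", "has, a comma"]]

# Every field is wrapped in quotes, numbers included.
quoted_all = render(rows, csv.QUOTE_ALL)

# Only the field that contains the separator is quoted.
quoted_minimal = render(rows, csv.QUOTE_MINIMAL)

print(quoted_all)
print(quoted_minimal)
```

In the second output only the value containing a comma is quoted, which is the behavior the proposal asks for.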



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
