details...

celeste:gg mtj$ ls -l ~/corpora/go/corpusZ.tar
-rw-r--r-- 1 mtj staff 92386304 Jul 4 02:27 /Users/mtj/corpora/go/corpusZ.tar

This is the Go corpus compressed in chunks and then gathered as a tar file: 92 MB at zstd level 19; the original size is 752 311 514 bytes. (Zstd compresses some large Go files 50:1.)

Here is gg (https://github.com/MichaelTJones/gg) looking for "pizza" in the strings of that tar's unarchived, decompressed Go code:

celeste:gg mtj$ gg -summary -log log -digits s pizza ~/corpora/go/corpusZ.tar
/Users/mtj/corpora/go/corpusZ.tar::blob_000022.go: add(pair("blake", "eats pizza"))
/Users/mtj/corpora/go/corpusZ.tar::blob_000022.go: t.Fatalf("after pizza, size = %d; want %d", d.dynTab.size, want)
/Users/mtj/corpora/go/corpusZ.tar::blob_000022.go: "pizza",
/Users/mtj/corpora/go/corpusZ.tar::blob_000022.go: "pizza",
/Users/mtj/corpora/go/corpusZ.tar::blob_000022.go: "piwatepizzapkonskowolayangroupharmacyshirahamatonbetsurgutsiracu" +
/Users/mtj/corpora/go/corpusZ.tar::blob_000031.go: data: `{"tags": [{"list": [{"one":"pizza"}]}]}`,
/Users/mtj/corpora/go/corpusZ.tar::blob_000031.go: output: "pizza",
/Users/mtj/corpora/go/corpusZ.tar::blob_000065.go: "raskaunbieidsvollpiwatepizzapkosaigawaplanetariuminanoplantation" +
/Users/mtj/corpora/go/corpusZ.tar::blob_000099.go: add(pair("blake", "eats pizza"))
/Users/mtj/corpora/go/corpusZ.tar::blob_000099.go: t.Fatalf("after pizza, size = %d; want %d", d.dynTab.size, want)
/Users/mtj/corpora/go/corpusZ.tar::blob_000099.go: "pizza",
/Users/mtj/corpora/go/corpusZ.tar::blob_000099.go: "pizza",
/Users/mtj/corpora/go/corpusZ.tar::blob_000099.go: "piwatepizzapkoninjamisonplanetariuminnesotaketakayamatsumaebashi" +
/Users/mtj/corpora/go/corpusZ.tar::blob_000137.go: {"test :\n```bash\nthis is a test\n```\n\ntest\n\n:cool::blush:::pizza:\\:blush : : blush: :pizza:", []byte("test :\n```bash\nthis is a test\n```\n\ntest\n\n🆒😊:🍕\\:blush : : blush: 🍕")},
/Users/mtj/corpora/go/corpusZ.tar::blob_000137.go: ":pizza:": "\U0001f355",
/Users/mtj/corpora/go/corpusZ.tar::blob_000137.go: pizzaMessage := emoji.Sprint("I like a :pizza: and :sushi:!!")
performance
  grep  16 matches
  work  752 311 514 bytes, 170 302 873 tokens, 22 927 078 lines, 176 files
  time  3.975375 sec elapsed, 25.803189 sec user + 0.985283 system
  rate  189 242 887 bytes/sec, 42 839 444 tokens/sec, 5 767 273 lines/sec, 44 files/sec
  cpus  7 workers (parallel speedup = 6.74x)
celeste:gg mtj$

This rate, 189 MB/sec, is on a 4 core, 8 thread MacBook Pro and is the average over the whole corpus, so it reflects not just lexical tokenization of 752 MB of Go code and grep-like regular expression matching for "pizza", but also Klaus' Go-only, native Zstandard decompression, which transforms the 92 MB tar file into the 752 MB of code.

Here are timing numbers, per file, for Zstd expansion (gg -log log):

celeste:gg mtj$ cat log
2019/07/11 10:05:31.433048 scan begins
2019/07/11 10:05:31.433240 processing files listed on command line
2019/07/11 10:05:31.433324 processing tar archive /Users/mtj/corpora/go/corpusZ.tar
2019/07/11 10:05:31.471641 206993 → 4023123 bytes (19.436×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000005.go.zst
2019/07/11 10:05:31.475658 97312 → 4889323 bytes (50.244×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000003.go.zst
2019/07/11 10:05:31.478435 588144 → 4058916 bytes ( 6.901×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000000.go.zst
2019/07/11 10:05:31.478751 252664 → 4343752 bytes (17.192×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000001.go.zst
2019/07/11 10:05:31.483804 254045 → 4245209 bytes (16.710×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000006.go.zst
2019/07/11 10:05:31.483819 93940 → 4666515 bytes (49.675×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000004.go.zst
2019/07/11 10:05:31.484640 212543 → 5266751 bytes (24.780×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000002.go.zst
2019/07/11 10:05:31.596666 796173 → 5097981 bytes ( 6.403×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000012.go.zst
2019/07/11 10:05:31.605878 222357 → 4149988 bytes (18.664×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000013.go.zst
2019/07/11 10:05:31.606986 115654 → 5244204 bytes (45.344×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000008.go.zst
2019/07/11 10:05:31.621446 202508 → 4353542 bytes (21.498×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000010.go.zst
2019/07/11 10:05:31.624954 326016 → 4010657 bytes (12.302×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000011.go.zst
2019/07/11 10:05:31.632554 207580 → 4065186 bytes (19.584×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000009.go.zst
2019/07/11 10:05:31.645472 174997 → 4008262 bytes (22.905×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000007.go.zst
2019/07/11 10:05:31.740442 426643 → 4001493 bytes ( 9.379×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000019.go.zst
2019/07/11 10:05:31.744056 264530 → 4046021 bytes (15.295×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000016.go.zst
2019/07/11 10:05:31.745338 627327 → 4004365 bytes ( 6.383×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000017.go.zst
2019/07/11 10:05:31.751263 258532 → 4042393 bytes (15.636×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000018.go.zst
2019/07/11 10:05:31.751756 471019 → 4008325 bytes ( 8.510×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000015.go.zst
2019/07/11 10:05:31.760835 379616 → 4320295 bytes (11.381×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000014.go.zst
2019/07/11 10:05:31.802144 556027 → 4003356 bytes ( 7.200×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000020.go.zst
2019/07/11 10:05:31.901767 372064 → 4193635 bytes (11.271×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000024.go.zst
2019/07/11 10:05:31.909306 397412 → 4554722 bytes (11.461×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000023.go.zst
2019/07/11 10:05:31.916362 1248756 → 8379069 bytes ( 6.710×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000026.go.zst
2019/07/11 10:05:31.932392 770723 → 4808476 bytes ( 6.239×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000022.go.zst
2019/07/11 10:05:31.970928 224452 → 4027062 bytes (17.942×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000025.go.zst
2019/07/11 10:05:32.002406 651210 → 4400457 bytes ( 6.757×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000027.go.zst
2019/07/11 10:05:32.004718 1231592 → 4316079 bytes ( 3.504×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000021.go.zst
2019/07/11 10:05:32.087856 353702 → 4029909 bytes (11.394×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000031.go.zst
2019/07/11 10:05:32.105936 848426 → 4039351 bytes ( 4.761×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000030.go.zst
2019/07/11 10:05:32.125037 650697 → 4003712 bytes ( 6.153×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000029.go.zst
2019/07/11 10:05:32.149261 518660 → 4008953 bytes ( 7.729×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000032.go.zst
2019/07/11 10:05:32.157127 654288 → 4007926 bytes ( 6.126×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000033.go.zst
2019/07/11 10:05:32.183090 896938 → 4703274 bytes ( 5.244×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000028.go.zst
2019/07/11 10:05:32.213361 744952 → 4053128 bytes ( 5.441×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000034.go.zst
2019/07/11 10:05:32.248550 491073 → 4008267 bytes ( 8.162×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000038.go.zst
2019/07/11 10:05:32.265478 680203 → 4001134 bytes ( 5.882×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000037.go.zst
2019/07/11 10:05:32.271071 734833 → 4000138 bytes ( 5.444×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000036.go.zst
2019/07/11 10:05:32.284729 360922 → 4005048 bytes (11.097×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000039.go.zst
2019/07/11 10:05:32.303961 308798 → 4018343 bytes (13.013×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000035.go.zst
2019/07/11 10:05:32.306066 391388 → 4010727 bytes (10.247×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000040.go.zst
2019/07/11 10:05:32.362341 484355 → 4001344 bytes ( 8.261×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000045.go.zst
2019/07/11 10:05:32.380463 375405 → 4000209 bytes (10.656×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000041.go.zst
2019/07/11 10:05:32.417273 358429 → 4046348 bytes (11.289×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000044.go.zst
2019/07/11 10:05:32.420960 375823 → 4378995 bytes (11.652×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000043.go.zst
2019/07/11 10:05:32.444477 730772 → 4246251 bytes ( 5.811×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000046.go.zst
2019/07/11 10:05:32.459183 604840 → 4027513 bytes ( 6.659×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000042.go.zst
2019/07/11 10:05:32.480410 518876 → 4008279 bytes ( 7.725×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000047.go.zst
2019/07/11 10:05:32.511231 330245 → 4667908 bytes (14.135×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000048.go.zst
2019/07/11 10:05:32.538113 641796 → 4017023 bytes ( 6.259×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000052.go.zst
2019/07/11 10:05:32.587558 686143 → 4358371 bytes ( 6.352×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000053.go.zst
2019/07/11 10:05:32.592458 1011230 → 4002547 bytes ( 3.958×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000051.go.zst
2019/07/11 10:05:32.597859 627406 → 4003373 bytes ( 6.381×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000050.go.zst
2019/07/11 10:05:32.605734 502047 → 4009282 bytes ( 7.986×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000049.go.zst
2019/07/11 10:05:32.655583 586706 → 4066973 bytes ( 6.932×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000054.go.zst
2019/07/11 10:05:32.679516 620595 → 4002107 bytes ( 6.449×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000059.go.zst
2019/07/11 10:05:32.687838 369297 → 4001749 bytes (10.836×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000058.go.zst
2019/07/11 10:05:32.691284 408358 → 4008987 bytes ( 9.817×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000055.go.zst
2019/07/11 10:05:32.744006 636283 → 4028244 bytes ( 6.331×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000060.go.zst
2019/07/11 10:05:32.768291 277219 → 4002291 bytes (14.437×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000057.go.zst
2019/07/11 10:05:32.781058 557989 → 4005679 bytes ( 7.179×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000056.go.zst
2019/07/11 10:05:32.822744 414283 → 4269034 bytes (10.305×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000061.go.zst
2019/07/11 10:05:32.835958 468418 → 4059427 bytes ( 8.666×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000066.go.zst
2019/07/11 10:05:32.847792 608908 → 4014328 bytes ( 6.593×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000062.go.zst
2019/07/11 10:05:32.888771 690474 → 4000820 bytes ( 5.794×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000065.go.zst
2019/07/11 10:05:32.897517 521619 → 4150632 bytes ( 7.957×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000067.go.zst
2019/07/11 10:05:32.921717 680148 → 4000246 bytes ( 5.881×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000064.go.zst
2019/07/11 10:05:32.956136 639151 → 4000706 bytes ( 6.259×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000063.go.zst
2019/07/11 10:05:32.962578 471848 → 4040865 bytes ( 8.564×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000073.go.zst
2019/07/11 10:05:32.991789 489327 → 4005132 bytes ( 8.185×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000068.go.zst
2019/07/11 10:05:33.020254 523091 → 4004755 bytes ( 7.656×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000069.go.zst
2019/07/11 10:05:33.025607 450401 → 4083159 bytes ( 9.066×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000072.go.zst
2019/07/11 10:05:33.029834 520708 → 4007521 bytes ( 7.696×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000074.go.zst
2019/07/11 10:05:33.050782 378552 → 4001925 bytes (10.572×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000071.go.zst
2019/07/11 10:05:33.118232 269084 → 4009476 bytes (14.900×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000080.go.zst
2019/07/11 10:05:33.138117 592627 → 4526083 bytes ( 7.637×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000075.go.zst
2019/07/11 10:05:33.140114 738182 → 4008928 bytes ( 5.431×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000070.go.zst
2019/07/11 10:05:33.167534 663139 → 4001954 bytes ( 6.035×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000079.go.zst
2019/07/11 10:05:33.177132 516721 → 4001675 bytes ( 7.744×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000081.go.zst
2019/07/11 10:05:33.189430 499878 → 4000568 bytes ( 8.003×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000078.go.zst
2019/07/11 10:05:33.216166 308152 → 4067621 bytes (13.200×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000076.go.zst
2019/07/11 10:05:33.249731 582032 → 4394445 bytes ( 7.550×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000082.go.zst
2019/07/11 10:05:33.269010 364537 → 4396798 bytes (12.061×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000087.go.zst
2019/07/11 10:05:33.283364 184365 → 4373843 bytes (23.724×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000088.go.zst
2019/07/11 10:05:33.287415 712220 → 4055369 bytes ( 5.694×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000077.go.zst
2019/07/11 10:05:33.321402 751294 → 4003781 bytes ( 5.329×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000086.go.zst
2019/07/11 10:05:33.324074 354366 → 4002277 bytes (11.294×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000085.go.zst
2019/07/11 10:05:33.334724 716464 → 4006384 bytes ( 5.592×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000094.go.zst
2019/07/11 10:05:33.355151 363508 → 4001510 bytes (11.008×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000083.go.zst
2019/07/11 10:05:33.400834 598521 → 4620592 bytes ( 7.720×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000095.go.zst
2019/07/11 10:05:33.409613 198910 → 4094836 bytes (20.586×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000089.go.zst
2019/07/11 10:05:33.435069 386309 → 4002560 bytes (10.361×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000084.go.zst
2019/07/11 10:05:33.445655 184047 → 4364073 bytes (23.712×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000090.go.zst
2019/07/11 10:05:33.457476 541657 → 4290849 bytes ( 7.922×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000092.go.zst
2019/07/11 10:05:33.470683 868726 → 4003490 bytes ( 4.608×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000093.go.zst
2019/07/11 10:05:33.495689 1260317 → 5556243 bytes ( 4.409×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000101.go.zst
2019/07/11 10:05:33.519851 579813 → 4005688 bytes ( 6.909×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000096.go.zst
2019/07/11 10:05:33.543813 672782 → 4002577 bytes ( 5.949×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000102.go.zst
2019/07/11 10:05:33.551926 566709 → 4006434 bytes ( 7.070×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000097.go.zst
2019/07/11 10:05:33.571590 569324 → 4041363 bytes ( 7.099×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000100.go.zst
2019/07/11 10:05:33.583279 200062 → 5051258 bytes (25.248×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000091.go.zst
2019/07/11 10:05:33.606004 472200 → 4024953 bytes ( 8.524×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000099.go.zst
2019/07/11 10:05:33.635531 575127 → 4001741 bytes ( 6.958×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000108.go.zst
2019/07/11 10:05:33.686650 534639 → 4012273 bytes ( 7.505×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000109.go.zst
2019/07/11 10:05:33.695229 1285860 → 5272770 bytes ( 4.101×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000103.go.zst
2019/07/11 10:05:33.720187 412864 → 4049274 bytes ( 9.808×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000107.go.zst
2019/07/11 10:05:33.720982 605166 → 4005251 bytes ( 6.618×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000098.go.zst
2019/07/11 10:05:33.737777 726153 → 4001435 bytes ( 5.510×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000104.go.zst
2019/07/11 10:05:33.755917 799097 → 4000015 bytes ( 5.006×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000106.go.zst
2019/07/11 10:05:33.773795 422867 → 4000786 bytes ( 9.461×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000115.go.zst
2019/07/11 10:05:33.810990 393810 → 4001894 bytes (10.162×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000110.go.zst
2019/07/11 10:05:33.847795 549097 → 4457876 bytes ( 8.119×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000114.go.zst
2019/07/11 10:05:33.863927 271390 → 4008218 bytes (14.769×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000105.go.zst
2019/07/11 10:05:33.867743 233106 → 4000444 bytes (17.161×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000111.go.zst
2019/07/11 10:05:33.867836 562574 → 4003299 bytes ( 7.116×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000116.go.zst
2019/07/11 10:05:33.911283 311625 → 4018660 bytes (12.896×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000112.go.zst
2019/07/11 10:05:33.925529 694084 → 4509171 bytes ( 6.497×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000113.go.zst
2019/07/11 10:05:33.929689 455910 → 4000240 bytes ( 8.774×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000122.go.zst
2019/07/11 10:05:33.943061 568708 → 4969481 bytes ( 8.738×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000117.go.zst
2019/07/11 10:05:33.999025 701356 → 4041892 bytes ( 5.763×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000121.go.zst
2019/07/11 10:05:34.019081 947328 → 5271725 bytes ( 5.565×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000123.go.zst
2019/07/11 10:05:34.046375 503850 → 4000020 bytes ( 7.939×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000118.go.zst
2019/07/11 10:05:34.067051 264587 → 4053813 bytes (15.321×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000120.go.zst
2019/07/11 10:05:34.088091 531199 → 4705989 bytes ( 8.859×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000129.go.zst
2019/07/11 10:05:34.091944 502142 → 4010187 bytes ( 7.986×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000119.go.zst
2019/07/11 10:05:34.121482 416870 → 4002183 bytes ( 9.601×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000124.go.zst
2019/07/11 10:05:34.126305 503153 → 4010237 bytes ( 7.970×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000130.go.zst
2019/07/11 10:05:34.137981 573336 → 4029106 bytes ( 7.027×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000128.go.zst
2019/07/11 10:05:34.203384 464844 → 4022011 bytes ( 8.652×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000127.go.zst
2019/07/11 10:05:34.209802 434146 → 4005918 bytes ( 9.227×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000125.go.zst
2019/07/11 10:05:34.226746 562023 → 4036194 bytes ( 7.182×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000126.go.zst
2019/07/11 10:05:34.253277 534846 → 4298152 bytes ( 8.036×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000136.go.zst
2019/07/11 10:05:34.283738 474416 → 4015827 bytes ( 8.465×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000135.go.zst
2019/07/11 10:05:34.297443 676273 → 4017143 bytes ( 5.940×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000137.go.zst
2019/07/11 10:05:34.350197 485533 → 4003784 bytes ( 8.246×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000134.go.zst
2019/07/11 10:05:34.367692 558522 → 4012951 bytes ( 7.185×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000133.go.zst
2019/07/11 10:05:34.379223 2771127 → 13048604 bytes ( 4.709×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000131.go.zst
2019/07/11 10:05:34.385873 2528170 → 10905639 bytes ( 4.314×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000132.go.zst
2019/07/11 10:05:34.420100 309923 → 4001685 bytes (12.912×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000143.go.zst
2019/07/11 10:05:34.440169 376289 → 4028740 bytes (10.707×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000142.go.zst
2019/07/11 10:05:34.447682 316988 → 4001667 bytes (12.624×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000144.go.zst
2019/07/11 10:05:34.496823 290607 → 4002723 bytes (13.774×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000141.go.zst
2019/07/11 10:05:34.522682 434315 → 4018276 bytes ( 9.252×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000150.go.zst
2019/07/11 10:05:34.529963 658292 → 4000448 bytes ( 6.077×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000140.go.zst
2019/07/11 10:05:34.543966 279896 → 4110833 bytes (14.687×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000151.go.zst
2019/07/11 10:05:34.572687 1086021 → 4494390 bytes ( 4.138×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000149.go.zst
2019/07/11 10:05:34.576312 611655 → 4095210 bytes ( 6.695×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000138.go.zst
2019/07/11 10:05:34.626874 317859 → 4172019 bytes (13.125×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000148.go.zst
2019/07/11 10:05:34.638622 376698 → 4003119 bytes (10.627×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000157.go.zst
2019/07/11 10:05:34.639242 533870 → 4003901 bytes ( 7.500×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000139.go.zst
2019/07/11 10:05:34.690782 274293 → 4083501 bytes (14.887×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000147.go.zst
2019/07/11 10:05:34.702534 744282 → 4006230 bytes ( 5.383×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000156.go.zst
2019/07/11 10:05:34.723371 291584 → 4168844 bytes (14.297×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000145.go.zst
2019/07/11 10:05:34.729076 422603 → 4006953 bytes ( 9.482×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000155.go.zst
2019/07/11 10:05:34.731728 711141 → 4008574 bytes ( 5.637×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000158.go.zst
2019/07/11 10:05:34.775064 603118 → 4136079 bytes ( 6.858×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000163.go.zst
2019/07/11 10:05:34.792702 340353 → 4424067 bytes (12.998×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000164.go.zst
2019/07/11 10:05:34.796162 618352 → 4002249 bytes ( 6.472×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000154.go.zst
2019/07/11 10:05:34.820524 289494 → 4036048 bytes (13.942×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000146.go.zst
2019/07/11 10:05:34.820605 309016 → 5260095 bytes (17.022×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000152.go.zst
2019/07/11 10:05:34.865440 131045 → 4280066 bytes (32.661×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000165.go.zst
2019/07/11 10:05:34.870480 406615 → 4000797 bytes ( 9.839×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000162.go.zst
2019/07/11 10:05:34.918977 252124 → 4012267 bytes (15.914×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000153.go.zst
2019/07/11 10:05:34.927295 456129 → 4042538 bytes ( 8.863×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000161.go.zst
2019/07/11 10:05:34.942748 717740 → 4319130 bytes ( 6.018×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000170.go.zst
2019/07/11 10:05:35.011325 731744 → 4013175 bytes ( 5.484×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000171.go.zst
2019/07/11 10:05:35.040012 565284 → 4163041 bytes ( 7.365×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000169.go.zst
2019/07/11 10:05:35.062413 289918 → 4000001 bytes (13.797×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000159.go.zst
2019/07/11 10:05:35.066697 703842 → 4000001 bytes ( 5.683×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000160.go.zst
2019/07/11 10:05:35.075955 190967 → 4197565 bytes (21.981×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000168.go.zst
2019/07/11 10:05:35.085458 290878 → 4010780 bytes (13.789×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000172.go.zst
2019/07/11 10:05:35.170622 97265 → 4243884 bytes (43.632×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000166.go.zst
2019/07/11 10:05:35.184665 171285 → 4069187 bytes (23.757×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000167.go.zst
2019/07/11 10:05:35.224262 534458 → 2880005 bytes ( 5.389×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000175.go.zst
2019/07/11 10:05:35.312594 619070 → 4002126 bytes ( 6.465×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000174.go.zst
2019/07/11 10:05:35.313823 518866 → 4350379 bytes ( 8.384×) decompress and scan /Users/mtj/corpora/go/corpusZ.tar::blob_000173.go.zst
2019/07/11 10:05:35.408361 scan ends
2019/07/11 10:05:35.408391 performance
2019/07/11 10:05:35.408400   grep  16 matches
2019/07/11 10:05:35.408408   work  752 311 514 bytes, 170 302 873 tokens, 22 927 078 lines, 176 files
2019/07/11 10:05:35.408417   time  3.975375 sec elapsed, 25.803189 sec user + 0.985283 system
2019/07/11 10:05:35.408424   rate  189 242 887 bytes/sec, 42 839 444 tokens/sec, 5 767 273 lines/sec, 44 files/sec
2019/07/11 10:05:35.408431   cpus  7 workers (parallel speedup = 6.74x)
celeste:gg mtj$

There is no doubt that Zstd is brilliant, nor that Klaus' library is excellent.

On Thu, Jul 11, 2019 at 9:55 AM Aliaksandr Valialkin <valy...@gmail.com> wrote:

>
> On Thu, Jul 11, 2019 at 7:29 PM Michael Jones <michael.jo...@gmail.com>
> wrote:
>
>> I use Klaus' library to decompress ~GiB files that have been compressed
>> by zstd command line (c/c++ code) at level 19. Works great.
>>
>
> Thanks for sharing this information!
>
>> On Thu, Jul 11, 2019 at 9:10 AM Klaus Post <klausp...@gmail.com> wrote:
>>
>>> On Thursday, 11 July 2019 17:37:09 UTC+2, Aliaksandr Valialkin wrote:
>>>>
>>>> This is really great idea! Will try implementing it.
>>>>
>>>> Does github.com/klauspost/compress support all the levels for data
>>>> decompression? VictoriaMetrics varies compression level depending on the
>>>> data type.
>>>> It would be great if github.com/klauspost/compress could
>>>> decompress data compressed by the upstream zstd on higher compression
>>>> levels.
>>>>
>>>
>>> Decompression will work for all input. It is implementing the full spec.
>>>
>>
> Great! I filed feature request for implementing pure Go builds for
> VictoriaMetrics -
> https://github.com/VictoriaMetrics/VictoriaMetrics/issues/94 .
>
>>
>>> Compression has "Fastest" and "Default" implemented, roughly matching
>>> level 1 and 3 in zstd in speed and performance. I plan on adding something
>>> around level 7-9 (as Better) and level 19 (as Best). But for it to be
>>> useful I have mainly focused on the fastest modes. I also am planning more
>>> concurrency in compression and decompression for streams - blocks will
>>> probably remain as single goroutines. For now I am taking a small break and
>>> having a bit of fun revisiting deflate and experimenting with Snappy.
>>>
>>> If there is anything I can do to help feel free to ping me.
>>>
>>> /Klaus
>>>
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Aliaksandr
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "golang-nuts" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to golang-nuts+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/golang-nuts/b12c7562-b3a6-426b-bb1c-a62fcfc41714%40googlegroups.com
>>> <https://groups.google.com/d/msgid/golang-nuts/b12c7562-b3a6-426b-bb1c-a62fcfc41714%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>> --
>>
>> *Michael T. Jones michael.jo...@gmail.com <michael.jo...@gmail.com>*
>>
>
> --
> Best Regards,
>
> Aliaksandr
>

--
*Michael T. Jones michael.jo...@gmail.com <michael.jo...@gmail.com>*
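P.S. For anyone who wants to check the arithmetic behind the summary figures in this message, here is a small Go sketch that re-derives them from the numbers printed by ls and gg. The type and function names (Summary, reported) are made up for illustration, and the speedup formula (user plus system CPU seconds divided by elapsed seconds) is only an assumption that happens to reproduce the 6.74x that gg printed; it is not taken from gg's source.

```go
package main

import "fmt"

// Summary holds figures copied from the ls and gg output in this message.
// The type itself is hypothetical; gg does not export anything like it.
type Summary struct {
	TarBytes   float64 // size of corpusZ.tar as reported by ls -l
	ScanBytes  float64 // decompressed bytes that gg scanned
	ElapsedSec float64 // wall-clock seconds
	UserSec    float64 // user CPU seconds
	SystemSec  float64 // system CPU seconds
}

// reported returns the values quoted in the message.
func reported() Summary {
	return Summary{
		TarBytes:   92386304,
		ScanBytes:  752311514,
		ElapsedSec: 3.975375,
		UserSec:    25.803189,
		SystemSec:  0.985283,
	}
}

// Ratio is the overall compression ratio of the corpus.
func (s Summary) Ratio() float64 { return s.ScanBytes / s.TarBytes }

// BytesPerSec is decompressed bytes scanned per wall-clock second.
func (s Summary) BytesPerSec() float64 { return s.ScanBytes / s.ElapsedSec }

// Speedup is total CPU time over wall-clock time (assumed formula).
func (s Summary) Speedup() float64 { return (s.UserSec + s.SystemSec) / s.ElapsedSec }

func main() {
	s := reported()
	fmt.Printf("overall ratio    %.3fx\n", s.Ratio())
	fmt.Printf("scan rate        %.1f MB/sec\n", s.BytesPerSec()/1e6)
	fmt.Printf("parallel speedup %.2fx\n", s.Speedup())
}
```

The rate computed this way lands within a few bytes/sec of what gg printed; the tiny difference presumably comes from gg timing the run at finer precision than the six decimals shown.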