On Sat, 8 Sep 2018 11:11:08 -0300 Antonio Terceiro wrote:

> On Thu, Sep 06, 2018 at 12:12:57AM +0200, Francesco Poli wrote:
> > Proposed strategy
> > =================
[...]
> >
> > What do you think? Is the above described strategy reasonable? Or do
> > you see a flaw which will backfire in the future?
>
> Looks OK to me, but it also looks a little bit too cautious, and
> complex.

Hello Antonio!   :-)
Thanks a lot for your kind reply.

I am glad my code doesn't look too crazy...   ;-)

I acknowledge that my proposed strategy is not the simplest possible
one.

> In this case you only care about the lines that are uncommented
> and only contain ASCII, so you can just ignore everything else:
>
> ----------------8<----------------8<----------------8<-----------------
> $ cat /tmp/ignore_bugs
> 123456
> # secönd bug
> 234567
> # a package
> my-package0+
> $ cat /tmp/read_bugs.rb
> ARGV.each do |f|
>   File.readlines(f, encoding: Encoding::BINARY).each do |line|
>     puts line if line !~ /^\s*#/
>   end
> end
> $ ruby /tmp/read_bugs.rb /tmp/ignore_bugs
> 123456
> 234567
> my-package0+
> $ LANG=C ruby /tmp/read_bugs.rb /tmp/ignore_bugs
> 123456
> 234567
> my-package0+
> ----------------8<----------------8<----------------8<-----------------

I must confess that I was skeptical about this simple strategy.
The reason is that I am not comfortable with the idea that the array
of ignored bugs and packages would contain strings tagged as
ASCII-8BIT encoded, that is to say, effectively tagged as binary data.

Actually, I tried to read the file with BINARY encoding in
apt-listbugs, and it seems to work (even in cases where the
"ignore_bugs" file includes non-comment lines with non-US-ASCII
characters, thus violating its format specification...).
As in:

  $ cat ignore_bugs
  # first bug
  123456
  # secönd bug
  234567
  # a package
  my-package0+
  # an invalid line
  tëxtø

If I understand correctly, the reason why it works is that the array
is only tested through its include?() method, which basically tests
for equality. As far as I can tell, the equality operator for strings
compares length and content and ignores the encoding tag, at least as
long as both strings contain only US-ASCII characters (which is the
case for bug numbers and package names).
Hence, it doesn't matter if the array contains binary strings or
actual text strings: the test works anyway.

Nonetheless, I still feel uneasy with the idea of carrying an array of
binary data objects around, when the array is instead supposed to
contain strings...
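
The equality behaviour discussed above can be checked with a small Ruby sketch. It is only an illustration, not the actual apt-listbugs code: the helper name read_ignore_list and the temporary file are assumptions made up for this example, and the sample data mirrors the "ignore_bugs" file shown earlier. It suggests that ASCII-only entries match regardless of the encoding tag, while an entry with non-ASCII bytes only matches a string that is also tagged as binary.

```ruby
require "tempfile"

# Sample contents mirroring the "ignore_bugs" example (this source is UTF-8).
SAMPLE = <<~EOS
  # first bug
  123456
  # secönd bug
  234567
  # a package
  my-package0+
  # an invalid line
  tëxtø
EOS

# Hypothetical helper (not taken from apt-listbugs): read the file as
# binary data, drop comment lines, and return the remaining entries.
def read_ignore_list(path)
  File.readlines(path, encoding: Encoding::BINARY)
      .reject { |line| line =~ /^\s*#/ }
      .map(&:chomp)
end

# Write the sample to a temporary file and read it back as binary.
entries = Tempfile.create("ignore_bugs") do |f|
  f.write(SAMPLE)
  f.flush
  read_ignore_list(f.path)
end

bug = "123456"   # UTF-8 tagged, ASCII-only content
odd = "tëxtø"    # UTF-8 tagged, non-ASCII content

# Every entry carries the binary (ASCII-8BIT) encoding tag.
puts entries.map(&:encoding).uniq.inspect

puts entries.include?(bug)    # true: ASCII-only strings compare by content
puts entries.include?(odd)    # false: non-ASCII content, different encodings
puts entries.include?(odd.b)  # true: same bytes, both tagged as binary
```

Since the lookups only ever involve bug numbers and package names, the second case should not arise in practice; it only shows that the encoding tag is not ignored in every situation.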

Maybe I am not being crystal clear, so I don't know whether you get
what I mean.
Any other comments on this?


P.S.: I am probably annoying everyone too much, hence you are
definitely authorized to tell me "come on! stop worrying and love the
BINARY encoding!"   ;-)

P.P.S.: One last question: why "encoding: Encoding::BINARY", instead
of "external_encoding: Encoding::BINARY"? I thought that "encoding"
was meant to set both external_encoding and internal_encoding, as
explained in
https://ruby-doc.org/core-2.5.1/IO.html#method-c-new-label-IO+Encoding
Am I misunderstanding something?


-- 
 http://www.inventati.org/frx/
 There's not a second to spare! To the laboratory!
..................................................... Francesco Poli .
 GnuPG key fpr == CA01 1147 9CD2 EFDF FB82 3925 3E1C 27E1 1F69 BFFE