Hi all,

I dug through the code and now know how the parts fits together.

Since it is a combination of Elixir & OTP, I would like to ask for your 
input before continuing:

When using the following code, the key code paths are:

*{:ok, io} = StringIO.open("", encoding: :unicode) # Starts a StringIO 
GenServer*
*IO.write(io, <<222>>)*

https://github.com/erlang/otp/blob/56240f084d80de80d5f1c7fef94db68a6f1b81cc/lib/stdlib/src/io.erl#L91
=> :io.request(pid, {:put_chars, :unicode,<<222>>}, _ref)

https://github.com/elixir-lang/elixir/blob/a64d42f5d3cb6c32752af9d3312897e8cd5bb7ec/lib/elixir/lib/string_io.ex#L289
=> {{:error, req}, state}

https://github.com/erlang/otp/blob/56240f084d80de80d5f1c7fef94db68a6f1b81cc/lib/stdlib/src/io.erl#L340
=> badarg

https://github.com/erlang/otp/blob/56240f084d80de80d5f1c7fef94db68a6f1b81cc/lib/stdlib/src/io.erl#L99
=> raises argument error without any additional info

Based on this, I think multiple things are needed:


   1. `io:conv_reason/1` needs to accept a new tuple for the error
      1. I would return the error directly from 
      `unicode:characters_to_binary/3`: {:error, {:encoding, result}}
      2. where result: {error, binary(), RestData} | {incomplete, binary(), 
      binary()}
   2. `io:o_request/2` needs to handle that new error and raise a better 
   error
      1. new error kind?
      2. existing badarg with more detail data?
   3. All those sources need to produce the new error (does not produce OTP 
   incompatibility in Elixir sind everything unknown is converted to badarg):
      1. `StringIO.put_chars/4`
      2. `file_io_server:put_chars/3`
      3. `group:io_request/5`
      4. `standard_error:wrap_characters_to_binary/3`
      5. `user_drv:io_command/1`
      6. `user:wrap_characters_to_binary/2`
      7. `ssh_cli:io_request/4`
   
Does that sound like I'm going into the right direction?

If yes: Should that be one PR in OTP or several?

Thanks for any input & Best,
Jonatan


On Thursday, April 7, 2022 at 2:14:39 PM UTC+8 José Valim wrote:

> Make sure to also check out this doc then: 
> https://github.com/erlang/otp/blob/master/HOWTO/DEVELOPMENT.md Have fun!
>
> On Thu, Apr 7, 2022 at 8:12 AM Jonatan Männchen <[email protected]> 
> wrote:
>
>> I’ve never done a PR to OTP so far. I’ll check it out and ping back here 
>> for updates.
>>
>> Sent from my iPhone
>>
>> On 7 Apr 2022, at 07:09, José Valim <[email protected]> wrote:
>>
>> 
>> I would start with 2 by submitting a PR to Erlang/OTP and then access 
>> their appetite from there. :)
>>
>> On Thu, Apr 7, 2022 at 8:03 AM Jonatan Männchen <[email protected]> 
>> wrote:
>>
>>> I guess leaving nr. 3 out would be fine.
>>>
>>> What would he the proper path to do this in erlang? Just writing some 
>>> erlang example code and opening issues?
>>>
>>> Sent from my iPhone
>>>
>>> On 7 Apr 2022, at 06:46, José Valim <[email protected]> wrote:
>>>
>>> 
>>> All three of them would require changes to Erlang and I would love for 
>>> someone to chase this path. 2 sounds doable with EEP 54 but 1 would 
>>> definitely require more discussion.
>>>
>>> I am not sure about 3. binwrite pretty much means "I know what I am 
>>> doing" and we should allow the user to write whatever they want as is.
>>>
>>> On Thu, Apr 7, 2022 at 7:25 AM Jonatan Männchen <[email protected]> 
>>> wrote:
>>>
>>>> Thanks, that solved my specific issue. I still think that some 
>>>> improvements are needed.
>>>>
>>>> Looking at the code for CaptureIO, I think those changes should be made 
>>>> directly in the StringIO / IO modules and not specifically for CaptureIO.
>>>>
>>>> The following things are not great today IMHO:
>>>>
>>>>    1. The encoding that lets everything through is called latin1. I 
>>>>    think we should introduce & properly document a new encoding called 
>>>>    something like raw_binary. It would work exactly the same though.
>>>>    2. IO.write with invalid characters into an io device with any 
>>>>    encoding should have a better error message. (Something like "<<222>> 
>>>> is 
>>>>    not a valid unicode string. Provide `encoding: :raw_binary` when 
>>>> opening 
>>>>    the io device")
>>>>    3. IO.binwrite with invalid characters into an encoding: :unicode 
>>>>    io device should probably at least warn if invalid characters are 
>>>> passed.
>>>>
>>>> This would something like this in the form of a test: 
>>>> https://gist.github.com/maennchen/f428360a71d23a323538d9b7d51e638b
>>>>
>>>> On Wednesday, April 6, 2022 at 9:29:12 PM UTC+2 Wojtek Mach wrote:
>>>>
>>>>> > Additionally, it would be good if there was a proper error for 
>>>>> invalid characters instead of the currently raised ArgumentError.
>>>>>
>>>>> Yeah the error is pretty bad:
>>>>>
>>>>> IO.puts(<<222>>)
>>>>> ** (ArgumentError) errors were found at the given arguments: unknown 
>>>>> error: put_chars
>>>>>
>>>>>
>>>>> It is slightly more informative when used inside capture io though:
>>>>>
>>>>>     assert capture_io(fn ->
>>>>>              IO.puts(<<222>>)
>>>>>            end) == <<222>>
>>>>> ** (ArgumentError) errors were found at the given arguments: unknown 
>>>>> error: {put_chars,unicode,[<<"Þ">>,10]}
>>>>>
>>>>>
>>>>> We get error because stdio is with unicode encoding:
>>>>>
>>>>> iex> :io.getopts(:standard_io)[:encoding]
>>>>> :unicode
>>>>>
>>>>>
>>>>> and we're writing <<222>> which isn't unicode.
>>>>>
>>>>> For writing in raw mode, use IO.binwrite. This and the previously 
>>>>> mentioned :encoding option will make the following test succeed:
>>>>>
>>>>>     assert capture_io([encoding: :latin1], fn ->
>>>>>              IO.binwrite(<<222>>)
>>>>>            end) == <<222>>
>>>>>
>>>>>
>>>>> On April 6, 2022, "maennchen.ch" <[email protected]> wrote:
>>>>>
>>>>> That unfortunately gives me the same result.
>>>>>
>>>>> On Wednesday, April 6, 2022 at 6:57:35 PM UTC+1 Wojtek Mach wrote:
>>>>>>
>>>>>> I believe capture_io(encoding: :latin1, fun) should do the trick, can 
>>>>>> you check?
>>>>>>
>>>>>> On April 6, 2022, "maennchen.ch" <[email protected]> wrote:
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> *Background*
>>>>>>
>>>>>> While developing tests for a mix task, that returns non UTF8 binaries 
>>>>>> into STDOUT (building block to be piped into a file / pipe), I found 
>>>>>> that 
>>>>>> ExUnit.CaptureIO can only handle UTF8 and Latin1.
>>>>>>
>>>>>> Example Test that does not work:
>>>>>> https://gist.github.com/maennchen/16d411eeda3255fa3d3152fe9d836a82
>>>>>>
>>>>>> *Proposal*
>>>>>>
>>>>>> For testing this use case, it would be good if any raw binary would 
>>>>>> also be passed through. (Maybe via option "encoding: :raw_binary")
>>>>>>
>>>>>> Additionally, it would be good if there was a proper error for 
>>>>>> invalid characters instead of the currently raised ArgumentError.
>>>>>>
>>>>>> *Real World Example*
>>>>>>
>>>>>> Here is a real test, that would be made possible by this change: 
>>>>>> https://github.com/elixir-gettext/expo/blob/9048fe242830614f6d4235cbd345de844693f28a/test/mix/tasks/expo.msgfmt_test.exs#L18
>>>>>>
>>>>>> *PR*
>>>>>>
>>>>>> I'm happy to provide a PR for this as well.
>>>>>>
>>>>>> Thanks & Kind Regards,
>>>>>> Jonatan
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "elixir-lang-core" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/e35e97cf-00d6-422e-b3c1-ec508ff1e36fn%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/e35e97cf-00d6-422e-b3c1-ec508ff1e36fn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "elixir-lang-core" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>>
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/elixir-lang-core/9e817fc9-6f61-4476-bb27-c062ed6167fan%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/9e817fc9-6f61-4476-bb27-c062ed6167fan%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elixir-lang-core" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elixir-lang-core/f2816480-4d91-4576-9241-3bdc9351a920n%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/elixir-lang-core/f2816480-4d91-4576-9241-3bdc9351a920n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>> -- 
>>> You received this message because you are subscribed to a topic in the 
>>> Google Groups "elixir-lang-core" group.
>>> To unsubscribe from this topic, visit 
>>> https://groups.google.com/d/topic/elixir-lang-core/RR7nbeHsluQ/unsubscribe
>>> .
>>> To unsubscribe from this group and all its topics, send an email to 
>>> [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4KdzYkX%3DL%3D3%3DYHSJk4R8iq1%3Dbe9262Fg54eX1yLrx-c9g%40mail.gmail.com
>>>  
>>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4KdzYkX%3DL%3D3%3DYHSJk4R8iq1%3Dbe9262Fg54eX1yLrx-c9g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elixir-lang-core" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elixir-lang-core/F3C0BE5C-EC1A-4574-8586-7FC5E7AFD39A%40maennchen.ch
>>>  
>>> <https://groups.google.com/d/msgid/elixir-lang-core/F3C0BE5C-EC1A-4574-8586-7FC5E7AFD39A%40maennchen.ch?utm_medium=email&utm_source=footer>
>>> .
>>>
>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "elixir-lang-core" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/elixir-lang-core/RR7nbeHsluQ/unsubscribe
>> .
>> To unsubscribe from this group and all its topics, send an email to 
>> [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4%2BSqS4j-VRG_aBfSfs3uc3c0Q_m3svGTKS%2Bw2OR5aO_rg%40mail.gmail.com
>>  
>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4%2BSqS4j-VRG_aBfSfs3uc3c0Q_m3svGTKS%2Bw2OR5aO_rg%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>>
> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elixir-lang-core/08729069-5CF7-4400-BAC3-A93E1DA43B1A%40maennchen.ch
>>  
>> <https://groups.google.com/d/msgid/elixir-lang-core/08729069-5CF7-4400-BAC3-A93E1DA43B1A%40maennchen.ch?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/d5886ad9-002e-48fd-948a-067a96770830n%40googlegroups.com.

Reply via email to