have you tried?
2011/5/9 Jasper Knulst
Hi Ankit,
I have this in my Java mapper code:
String oldSeperator = "�"; // the thorn as Java sees it
String newSeperator = "~";
In Eclipse it shows as �, which is the standard Java way of saying "I don't
know this multibyte character".
When you copy and paste this � into the Linux shell it depicts as
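For anyone following along, here is a minimal sketch of the same replacement idea using Unicode escapes instead of pasting the character into the source file, so the result does not depend on the editor's encoding. It is not the actual mapper; the class and method names (ThornSeparatorSketch, normalizeSeparator, decodeLatin1) are made up for illustration. It assumes the input file is ISO-8859-1, where the thorn is the single byte 0xFE. That byte is malformed when read as UTF-8, so Java substitutes U+FFFD (the � shown above).

```java
import java.nio.charset.StandardCharsets;

public class ThornSeparatorSketch {
    // U+00FE is the thorn; U+FFFD is what Java substitutes for undecodable bytes.
    static final char THORN = '\u00FE';
    static final char REPLACEMENT = '\uFFFD';

    // Assumption: the raw input is ISO-8859-1, so byte 0xFE maps to the thorn.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Replace the separator whether it arrived as a real thorn or as U+FFFD.
    // Caveat: any other undecodable byte also becomes U+FFFD, so the second
    // replace is lossy if the data contains more than one such byte value.
    static String normalizeSeparator(String line, char newSeparator) {
        return line.replace(THORN, newSeparator).replace(REPLACEMENT, newSeparator);
    }

    public static void main(String[] args) {
        byte[] raw = {'a', (byte) 0xFE, 'b', (byte) 0xFE, 'c'};

        // Decoded as UTF-8, each 0xFE is malformed and becomes U+FFFD.
        String asUtf8 = new String(raw, StandardCharsets.UTF_8);
        // Decoded as ISO-8859-1, each 0xFE becomes the thorn.
        String asLatin1 = decodeLatin1(raw);

        System.out.println(normalizeSeparator(asUtf8, '~'));
        System.out.println(normalizeSeparator(asLatin1, '~'));
    }
}
```

If the file turns out to be UTF-8 after all, the thorn is stored as two bytes and decodes straight to U+00FE, so the first replace alone is enough.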
Hi Jasper,
Could you please share your MR program?
I am not able to grab this character
Ankit
Hi Ankit,
It all depends on your environment, locale and encoding. This proved to
work in my case. I believe I have seen your characters as well, but
after all it is not your browser that has to do the work and interpret the
multibyte character. That is the main problem with the thorn; ever
Hi Jasper,
How did you find the 'þ'?
My browser shows this: �
Ankit
Hi Ankit,
I know your problem because I had to deal with a thorn ('þ') separated file
too. Hive, so far, cannot handle multibyte separators, so I turned to the
custom SerDe option myself. If you manage to capture the 'þ' in the regex
you could try
I tried:
ROW FORMAT SERDE 'org.apache.hadoop.hive.