Re: Custom Serde with thorn

2011-09-07 Thread Jasper Knulst
Have you tried it? On 2011/5/9, Jasper Knulst wrote:
> Hi Ankit,
>
> I got this in my Java mapper code:
>
> String oldSeparator = "�"; // the thorn as Java sees it
> String newSeparator = "~";
>
> In Eclipse it shows as �, which is the standard Java way of saying "I don't know this multibyte character".

Re: Custom Serde with thorn

2011-05-09 Thread Jasper Knulst
Hi Ankit, I got this in my Java mapper code:
String oldSeparator = "�"; // the thorn as Java sees it
String newSeparator = "~";
In Eclipse it shows as �, which is the standard Java way of saying "I don't know this multibyte character". When you copy-paste this � into the Linux shell, it is depicted as
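A minimal sketch of the separator swap described above, assuming the separator is the Latin small letter thorn (U+00FE). Writing it as the Unicode escape `\u00FE` instead of pasting the glyph keeps the Java source ASCII-only, so the editor's or shell's encoding cannot turn it into �. The class and method names are hypothetical, and a plain `String` stands in for the mapper's input value.

```java
public class ThornReplacer {
    // Old and new separators. The thorn is written as a Unicode escape so
    // the compiled constant does not depend on the .java file's encoding.
    static final String OLD_SEPARATOR = "\u00FE";
    static final String NEW_SEPARATOR = "~";

    // Replace every thorn in one input line with a tilde.
    static String replaceThorn(String line) {
        return line.replace(OLD_SEPARATOR, NEW_SEPARATOR);
    }

    public static void main(String[] args) {
        System.out.println(replaceThorn("1\u00FEfoo\u00FEbar")); // prints 1~foo~bar
    }
}
```

Inside a Hadoop mapper the same `replace` call would run on the line after converting the input value to a `String`.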

Re: Custom Serde with thorn

2011-05-09 Thread ankit bhatnagar
Hi Jasper, could you please share your MR program? I am not able to grab this character. Ankit

Re: Custom Serde with thorn

2011-05-09 Thread Jasper Knulst
Hi Ankit, It all depends on your environment, locale and encoding. This proved to work in my case. I believe I have seen your characters as well, but after all it is not your browser that has to do the work of interpreting the multibyte character. That is the main problem with the thorn; ever
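To illustrate why the result depends on the encoding, here is a sketch showing that the same thorn has different byte representations. It assumes the data is either UTF-8 or ISO-8859-1 (Latin-1), since the thread does not say which. In UTF-8 the thorn takes two bytes, while in Latin-1 it is the single byte 0xFE. That byte is never valid on its own in UTF-8, so decoding it as UTF-8 yields U+FFFD, the replacement character that displays as �. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ThornEncodings {
    static final String THORN = "\u00FE"; // Latin small letter thorn

    // Encode as Latin-1, then (wrongly) decode those bytes as UTF-8.
    // Malformed input is replaced with U+FFFD by the String constructor.
    static String misreadLatin1AsUtf8(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8: two bytes, 0xC3 0xBE (printed as signed Java bytes)
        System.out.println(Arrays.toString(THORN.getBytes(StandardCharsets.UTF_8)));      // [-61, -66]
        // Latin-1: one byte, 0xFE
        System.out.println(Arrays.toString(THORN.getBytes(StandardCharsets.ISO_8859_1))); // [-2]
        System.out.println(misreadLatin1AsUtf8(THORN).equals("\uFFFD"));                  // true
    }
}
```

A mismatch between how the file was written and how it is read is enough to turn the thorn into �, regardless of what the browser shows.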

Re: Custom Serde with thorn

2011-05-09 Thread ankit bhatnagar
Hi Jasper, How did you find the 'þ'? My browser shows this: � Ankit

Re: Custom Serde with thorn

2011-05-08 Thread Jasper Knulst
Hi Ankit, I know your problem because I had to deal with a thorn ('þ') separated file too. Hive, so far, cannot handle multibyte separators, so I turned to the custom SerDe option myself. If you manage to capture the 'þ' in the regex, you could try it. I tried: ROW FORMAT SERDE 'org.apache.hadoop.hive.
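A sketch of capturing the 'þ' in a Java regex, assuming a regex-based SerDe where, as with Hive's RegexSerDe, each capture group becomes one column. The three-column layout, the sample row, and the class name are hypothetical. The pattern uses the regex escape `\u00FE` (written `\\u00FE` in Java source), so the expression itself stays ASCII. Whether that escape survives Hive's DDL string quoting unchanged is untested here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThornRegex {
    // Hypothetical three-column row: each group matches everything up to the
    // next thorn. "\\u00FE" reaches Pattern as the regex escape for U+00FE.
    static final Pattern ROW = Pattern.compile(
        "([^\\u00FE]*)\\u00FE([^\\u00FE]*)\\u00FE([^\\u00FE]*)");

    // Returns the captured columns of one line, or null if it doesn't match.
    static String[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] cols = new String[m.groupCount()];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = m.group(i + 1); // group 0 is the whole match
        }
        return cols;
    }

    public static void main(String[] args) {
        String[] cols = parse("42\u00FEalice\u00FEamsterdam");
        System.out.println(String.join("|", cols)); // prints 42|alice|amsterdam
    }
}
```

Using a negated character class instead of `.*?` keeps each group from running past a separator, and a line with too few thorns simply fails to match.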