Yes, a bean is probably the best way to do this.
However, I tried injecting the exchange, getting the body as an InputStream and
reading the first 4 bytes from it (an InputStream is a raw byte
representation, so it should not be affected by any encoding). When I read a
file that is UTF-16 (big-endian) encoded, I get the output "Hex: efbfbdef":
    public void determineEncoding(Exchange exchange) throws Exception {
        InputStream is = exchange.getIn().getBody(InputStream.class);
        DataInputStream dis = new DataInputStream(is);
        int fourBytes = dis.readInt();
        String hex = Integer.toHexString(fourBytes);
        log.info("Hex: " + hex);
    }
But when I read the file directly, I get the output "Hex: feff003c":
    public void testUtf16BeBom() throws Exception {
        InputStream utf16FileStream =
            this.getClass().getClassLoader().getResourceAsStream("testfiles/XmlUtf16Be.xml");
        DataInputStream dis = new DataInputStream(utf16FileStream);
        int fourBytes = dis.readInt();
        String hex = Integer.toHexString(fourBytes);
        log.info("Hex: " + hex);
    }
The output of the direct read is correct, since "feff" is the UTF-16 BE BOM,
followed by "003c", which is the first character "<" in its 2-byte representation.
Any idea why the output through the Camel route/bean is wrong? Is it because
the body has already been encoded (with the wrong encoding)?
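
For what it's worth, "efbfbd" is the UTF-8 encoding of U+FFFD (the replacement character). One plausible explanation, which is an assumption and not verified against Camel's internals, is that the raw bytes were decoded as UTF-8 and then re-encoded before the bean saw them. A minimal plain-Java sketch of that round trip (the class and method names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

final class Utf16AsUtf8Demo {

    // Decodes the raw bytes as UTF-8 (invalid bytes become U+FFFD),
    // re-encodes the resulting String as UTF-8, and returns the first
    // four bytes of the result as lowercase hex.
    static String firstFourBytesHexAfterUtf8RoundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 4 && i < reencoded.length; i++) {
            hex.append(String.format("%02x", reencoded[i] & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Start of a UTF-16 BE file: BOM (FE FF) followed by "<" (00 3C).
        byte[] utf16BeStart = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x3C};
        System.out.println("Hex: " + firstFourBytesHexAfterUtf8RoundTrip(utf16BeStart));
    }
}
```

Neither FE nor FF is valid in UTF-8, so each is replaced by U+FFFD, and the re-encoded stream starts with EF BF BD EF, which matches the value logged by the bean.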
Thanks
Stephan
-----Original Message-----
From: souciance [mailto:[email protected]]
Sent: Thursday, May 4, 2017 12:13
To: [email protected]
Subject: Re: Charset on file poller endpoint
Probably the easiest approach is to read the file and send the exchange to a bean.
In the bean, read the body, determine the encoding, and check whether it starts
with a BOM. Finally, do your conversion and put the body back on the exchange.
    from("file:/myDir")
        .bean(DetermineEncoding.class, "determineEncoding")
        .to("activemq:queue:myQueue");
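
The detection and conversion step described above could look roughly like the following. This is only a sketch in plain Java with no Camel dependency: `BomSniffer` and its methods are hypothetical names, only UTF-8 and UTF-16 BOMs are recognized, and input without a BOM is simply assumed to be UTF-8 (BOM-less UTF-16 would not be detected).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomSniffer {

    // Returns true if data begins with the given byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xff) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Decodes the input according to its BOM (if any) and returns the
    // text re-encoded as UTF-8 without a BOM.
    static byte[] toUtf8WithoutBom(byte[] data) {
        Charset charset = StandardCharsets.UTF_8; // assumption when no BOM is present
        int skip = 0;
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            skip = 3;                              // UTF-8 BOM
        } else if (startsWith(data, 0xFE, 0xFF)) {
            charset = StandardCharsets.UTF_16BE;   // UTF-16 big-endian BOM
            skip = 2;
        } else if (startsWith(data, 0xFF, 0xFE)) {
            charset = StandardCharsets.UTF_16LE;   // UTF-16 little-endian BOM
            skip = 2;
        }
        // Skip the BOM bytes explicitly so they do not end up in the text.
        String text = new String(data, skip, data.length - skip, charset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf16Be = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x3C, 0x00, 0x61};
        byte[] utf8 = toUtf8WithoutBom(utf16Be);
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

Inside the bean, the raw bytes would need to come from the exchange body without any prior charset conversion; requesting the body as byte[] may be one way to do that, but that is an untested assumption here.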
On Thu, May 4, 2017 at 12:01 PM, Burkard Stephan [via Camel] <
[email protected]> wrote:
> Hi Camel users
>
> I read files with a Camel file poller and they can have different
> encodings (UTF-8 with or without BOM, UTF-16). Therefore I would like
> to determine the given encoding and convert the message body to UTF-8
> without BOM for the further processing.
>
> How can I do this, and what exactly ends up in the message payload
> in the exchange? Is the payload an InputStream (just bytes, no
> encoding), or is it already converted to a String or a Reader (already
> decoded)?
>
> And what does the "charset" option change? Does it overwrite the
> default encoding of the operating system?
>
> from("file:/myDir")
>     // can I read the first bytes of the file here?
>     .to("activemq:queue:myQueue");
>
> Thanks for any hints
> Stephan
--
View this message in context:
http://camel.465427.n5.nabble.com/Charset-on-file-poller-endpoint-tp5798625p5798627.html
Sent from the Camel - Users mailing list archive at Nabble.com.