I was able to solve my issue.
Once I verified that all my simulation logic was valid, I started looking
for reasons why the registrations in my decorator weren't being picked up.
Knowing that this was expected behavior (tuples are only serialized once they
leave a JVM), rather than a problem with adding workers as I originally
thought, helped greatly (thanks, Bill).
Ultimately, I found that chill's chill-storm module contains an implementation
of Storm's KryoFactory interface (IKryoFactory). Substituting it for the
default factory Storm provides solved my problem.
In case someone else happens upon this with the same issue, I first had to
add chill-storm as a dependency in sbt. Then I added the following to my
topology's configuration:
    conf.put("com.twitter.chill.config.configuredinstantiator",
      "com.twitter.chill.ScalaKryoInstantiator")
    conf.setKryoFactory(classOf[com.twitter.chill.storm.BlizzardKryoFactory])
The first statement tells chill which KryoInstantiator to use (it has several);
the second swaps in chill's BlizzardKryoFactory for Storm's default factory.
With this in place, the KryoDecorator I already had, which I still needed for
my custom serializer, worked fine as well.
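For anyone who wants the pieces in one place, a topology config combining the two settings above with a decorator might look roughly like this. It is an untested sketch that assumes storm-core 0.9.x, chill-storm, and chill (for ScalaKryoInstantiator) are on the classpath; `CustomSerializersDecorator` and `TopologyConf` are hypothetical names, and the decorator body is left for your own registrations.

```scala
// Sketch only: assumes storm-core 0.9.x, chill-storm and chill on the classpath.
import backtype.storm.Config
import backtype.storm.serialization.IKryoDecorator
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.storm.BlizzardKryoFactory

// Hypothetical decorator. Chill's Scala registrations now come from
// ScalaKryoInstantiator, so this only needs your own custom serializers.
class CustomSerializersDecorator extends IKryoDecorator {
  override def decorate(k: Kryo): Unit = {
    // e.g. k.register(classOf[YourClass], new YourClassSerializer)
  }
}

object TopologyConf {
  def build(numWorkers: Int): Config = {
    val conf = new Config()
    // Tell chill which KryoInstantiator to use.
    conf.put("com.twitter.chill.config.configuredinstantiator",
      "com.twitter.chill.ScalaKryoInstantiator")
    // Replace Storm's default Kryo factory with chill's.
    conf.setKryoFactory(classOf[BlizzardKryoFactory])
    // Custom serializers are still registered through the decorator.
    conf.registerDecorator(classOf[CustomSerializersDecorator])
    conf.setNumWorkers(numWorkers)
    conf
  }
}
```

You would then pass `TopologyConf.build(3)` (or however many workers you run) when submitting the topology.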
Matthew
On Mon, Mar 2, 2015 at 12:22 PM, Brunner, Bill <[email protected]>
wrote:
> Yeah, I use Storm's out-of-the-box serialization. I had a chill
> implementation a while back but didn't notice much of an improvement in my
> case. My particular use case isn't designed to be especially fast, though,
> so I can't really speak to a high-performance system. The only
> serialization problems I've run into are with Scala maps and the
> filterKeys method, which is documented as unserializable anyway (and
> simple enough to work around).
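A minimal sketch of one such workaround, in plain Scala with JDK serialization rather than Storm or Kryo. In Scala 2.x, `filterKeys` returns a map that wraps the original map and the predicate instead of copying entries, so building a strict map with `filter` avoids shipping that wrapper. `strictFilterKeys` and `javaRoundTrip` are illustrative helper names, not library APIs.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object FilterKeysWorkaround {
  // Strict replacement for m.filterKeys(keep): filter copies the matching
  // entries into an ordinary immutable Map instead of returning a wrapper
  // around m and the predicate.
  def strictFilterKeys[K, V](m: Map[K, V])(keep: K => Boolean): Map[K, V] =
    m.filter { case (k, _) => keep(k) }

  // JDK serialization round trip, used here only to show the result
  // can be written and read back.
  def javaRoundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

In Scala 2.13, `filterKeys` is deprecated and returns a `MapView`, so the strict `filter` form is also the more forward-compatible choice.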
>
> *From:* Matthew Waymost [mailto:[email protected]]
> *Sent:* Monday, March 02, 2015 2:50 PM
> *To:* [email protected]
> *Subject:* Re: KryoDecorator not working when setNumWorkers > 1
>
> I didn't realize that Storm optimizes away serialization when running
> locally, but that makes total sense and is extremely helpful to know.
>
> I've had issues in the past with Kryo not properly serializing Scala case
> classes, and I've solved them before by adding twitter/chill's Scala
> registrations. So I assumed I would need the same thing here, as I didn't
> see any documentation indicating that they were already included.
>
> The custom serializer is for a class that uses MapProxy (which, admittedly,
> I need to get away from). Neither Kryo nor chill has handled MapProxy
> properly in the past, which is why I wrote the custom serializer.
>
> I'll definitely take a much closer look at my serialization logic and see
> if I can isolate the problem there.
>
> Out of curiosity, do you typically use Java's built-in serialization
> instead of Kryo? I've read and heard that it's very slow and inefficient,
> so I'd be interested in hearing your experience.
>
> On Mon, Mar 2, 2015 at 6:49 AM, Brunner, Bill <[email protected]>
> wrote:
>
> Your code works locally or with a single worker because there is no need
> for serialization when everything runs in the same JVM. Once you add a
> worker, the executors created by your parallelism hint can be placed in
> another JVM, so tuples may be shipped between JVMs and serialization has
> to occur. So the issue is not with increasing the number of workers; it's
> with your serialization. I'm using Scala as well and have yet to find a
> case where I needed custom serialization; the out-of-the-box Java
> serialization seems to work well.
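Since a single worker never serializes tuples, one way to surface these failures before deploying is to push representative tuple values through a serialization round trip in a test. A minimal sketch using only JDK serialization, which, as I understand Storm 0.9.x's defaults, is what Storm falls back to for values without a Kryo registration when `topology.fall.back.on.java.serialization` is enabled. It does not exercise Kryo or your decorator, and a same-JVM round trip catches non-serializable fields but cannot reproduce class-version mismatches between JVMs. `Reading` is a hypothetical payload type.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, IOException, ObjectInputStream, ObjectOutputStream}

// Hypothetical tuple payload. Case classes are Serializable by default.
final case class Reading(sensor: String, values: Vector[Double], tags: Map[String, String])

object SerializationCheck {
  // Write the value with JDK serialization and read it back, roughly what
  // happens when a Java-serialized tuple value travels to another worker JVM.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // True if the value can be written, read back, and compares equal.
  // NotSerializableException and InvalidClassException are both IOExceptions.
  def survives[T <: AnyRef](value: T): Boolean =
    try roundTrip(value) == value
    catch { case _: IOException => false }
}
```

In practice you would call `survives` on a sample of each type your topology emits.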
>
> *From:* Matthew Waymost [mailto:[email protected]]
> *Sent:* Friday, February 27, 2015 4:14 PM
> *To:* [email protected]
> *Subject:* KryoDecorator not working when setNumWorkers > 1
>
> Hi everybody,
>
> I'm new to Storm and have hit a roadblock in getting my topology to run
> across multiple workers.
>
> Our codebase is in Scala and we send Scala classes to Storm, so I'm using
> a Kryo decorator that calls chill's Scala registrar to add all the
> serialization logic for Scala classes to Kryo. In addition, I have a custom
> serializer that I'm adding in the same decorator.
>
> This has worked perfectly fine for me so far, both locally and on our
> cluster, until I tried turning up the number of workers on which the
> topology runs. When I use conf.setNumWorkers to set the number of workers
> to more than 1, the topology throws InvalidClassExceptions when attempting
> to deserialize our classes. Removing the setNumWorkers call, so that the
> number of workers stays at the default of 1, resolves the problem and
> everything runs fine.
>
> I'm completely stumped as to why this is happening, and I'm not sure how
> to diagnose the issue. I've tried the following:
>
> * Configuring the decorator through storm.yaml on all worker nodes and
> nimbus, instead of in source code.
>
> * Killing the topology, shutting down all worker nodes, nimbus, and
> zookeeper, clearing all temporary data, and bringing it all back up.
>
> * Verifying that everything is using the same version of Storm.
>
> * Searching Google and staring at code.
>
> Judging from the UI, it doesn't fail at the very first opportunity
> either. It only appears to fail around the part of the topology where I
> have a parallelismHint set, which is a few steps in. So I'm guessing it's
> a direct result of running over multiple workers, but I don't know what to
> do with that information.
>
> We're running OpenJDK 7, ZooKeeper 3.4.6, and Storm 0.9.3 on GCE. We've
> got 1 ZooKeeper server, 1 Nimbus server, and 3 worker servers. The topology
> is invoked over DRPC, and the DRPC server is hosted on the Nimbus server.
> The topology is implemented using Trident.
>
> Thanks for any help you can provide.
>
> Matthew
> ------------------------------
>
> This message, and any attachments, is for the intended recipient(s) only,
> may contain information that is privileged, confidential and/or proprietary
> and subject to important terms and conditions available at
> http://www.bankofamerica.com/emaildisclaimer. If you are not the intended
> recipient, please delete this message.
>