Justin Deoliveira wrote:
> Hi all,
> 
> A while back I implemented some hacks on the xsd/xml encoder to
> improve GML encoding performance. I finally got around to benchmarking.
> Here are the results. What I actually did is described afterward.
> 
> Here they are:
> 
> Test 1: 100,000 multi polygons
> ------------------------------
> 
> The polygons are fairly big with lots of points. Basically the 
> topp:states layer duplicated ~ 2000 times.
> 
> First step was the baseline, using FeatureTransformer:
> 
> * GML2 Transformer: 540 M, 4.4 M/s, 124 s
> 
> - First number being the total amount of data encoded.
> - Second being the average encoding rate.
> - Third number being the total encoding time.
> 
> Next step was using the encoder as is, no optimizations:
> 
> * GML2 Normal: 528 M, 2.4 M/s, 255 s
> 
> Hmmm... twice as slow.
> 
> And finally with the optimizations:
> 
> * GML2 Optimized: 528 M, 4.3 M/s, 126 s
> 
> Much better. Still a bit slower, but not by much.
> 
> The last test I did was GML 3 with the optimizations, and similar results:
> 
> * GML3 Optimized: 518 M, 4.2 M/s, 126 s
> 
> Test 2: 500,000 line strings
> ----------------------------
> 
> The second test was encoding 500,000 line strings from tiger, so not 
> many coordinates, just two point line strings. And the numbers:
> 
> * GML2 Transformer: 466 M, 8.5 M/s, 56 s
> * GML2 Normal: 365 M, 1.1 M/s, 345 s
> * GML2 Optimized: 391 M, 6.2 M/s, 64 s
> * GML3 Optimized: 379 M, 5.4 M/s, 72 s
> 
> Yikes, the non-optimized encoder is almost 7 times as slow. The 
> optimized encoder is still slower, but again not by much.
> 
> So all in all good results with the optimizations. The two encoders are 
> now comparable for GML. I also ran the optimizations through the wfs 
> cite tests to ensure that with the optimizations the GML being produced 
> is still "correct".
> 
> What I did
> ----------
> 
> * A custom FeatureEncoderDelegate for feature collections
> 
> A while back I came up with an interface, EncoderDelegate. The original 
> purpose of this interface was to allow other XML encoders to be embedded in 
> the encoder. When the main encoding routine encounters one of these 
> objects, it fully delegates all encoding to it, rather than continue on 
> with the stack based schema assisted encoding.
> 
> So my idea for optimization was to make one of these implementations for 
> FeatureCollections. This would totally remove the walking up and down 
> the encoding stack that the encoder does for each feature that is encoded.
> 
> The problem is that walking up and down the stack is what looks up 
> the bindings based on type, using the correct binding to encode 
> attributes, etc. So what I did was basically simulate this inside the 
> encoder delegate. It grabs the feature type, figures out what 
> bindings would be used to encode each attribute, and rolls them into a list. 
> Then for each feature it looks up the binding directly and encodes.
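
For anyone following along, the "resolve the bindings once, then loop over
the features" idea might look roughly like the sketch below. Every name in
it (PrecomputedBindingsSketch, AttributeEncoder, resolve, encodeAll) is
made up for illustration and is not the actual xsd encoder API. It also
skips XML escaping and namespaces entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PrecomputedBindingsSketch {
    /** Hypothetical stand-in for a binding: turns one attribute value into element text. */
    interface AttributeEncoder {
        String encode(Object value);
    }

    /** Fallback used when no encoder is registered for an attribute type. */
    static final AttributeEncoder TO_STRING = value -> String.valueOf(value);

    /** Done once per feature type: pick one encoder per attribute, in attribute order. */
    static List<AttributeEncoder> resolve(List<Class<?>> attributeTypes,
                                          Map<Class<?>, AttributeEncoder> registry) {
        List<AttributeEncoder> encoders = new ArrayList<>();
        for (Class<?> type : attributeTypes) {
            encoders.add(registry.getOrDefault(type, TO_STRING));
        }
        return encoders;
    }

    /** Done per feature: no lookup by type, just index into the precomputed list. */
    static String encodeAll(List<String> names,
                            List<AttributeEncoder> encoders,
                            List<Object[]> features) {
        StringBuilder out = new StringBuilder();
        for (Object[] feature : features) {
            out.append("<feature>");
            for (int i = 0; i < encoders.size(); i++) {
                out.append('<').append(names.get(i)).append('>')
                   .append(encoders.get(i).encode(feature[i]))
                   .append("</").append(names.get(i)).append('>');
            }
            out.append("</feature>");
        }
        return out.toString();
    }
}
```

The point is only that the per-type lookup moves out of the per-feature
loop. How the real delegate finds its bindings is not shown here.
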
> 
> * A custom EncoderDelegate for geometries
> 
> The above gave quite a speed up, but not exactly what I was hoping for. 
> Initial benchmarks still came back about twice as slow. A bit of 
> profiling pointed to the geometry encoding bindings. The above strategy 
> of rolling the bindings into a list only works for simple content, 
> geometries still go through the main encoding routine.
> 
> So the next step was to break out EncoderDelegates for geometries as 
> well, and have them used directly. And it helped. After this the numbers 
> were closer, with the optimized encoder coming back just a bit slower.
> 
> * Respecting number of decimals
> 
> Analyzing the above results I noticed that the optimized xsd encoder was 
> delivering substantially more data than the transformer. Which puzzled 
> me since based on my optimizations it should actually be producing less. 
> After analyzing data from both, the answer was clear, the number of 
> decimals being encoded.
> 
> GML from the xsd encoder was not respecting a limited number of decimals 
> at all, which resulted in quite a bit more data being encoded than necessary.
> 
> Cutting off the decimals made the amount of data coming back much 
> smaller, and the total time increased, leaving the final results 
> quite close in the polygon case (lots of coordinates).
> 
> Things to note
> --------------
> 
> * This only works for simple feature data (sorry ben)
> * These speeds are only for GML, not for general encoding
> * The optimizations are engaged via explicitly setting a property, so if 
> you don't ask for them you won't get them
> 
> I have a bit of clean up to do with the patches but I plan to commit soon.

Wow, way to go Justin, the speedup is really interesting.

I was wondering if you noticed the GML2-specific optimization I made
some time ago to speed up decimal number writing. You can find it
in the CoordinateWriter.formatDecimal method. Basically it skips
the (slow) DecimalFormat and does simple math to do the formatting
instead in the common case, falling back on DecimalFormat only
for very big or very small coordinates (with a further optimization
that takes advantage of converting a long into a string being faster
than doing the same with a double).

If you still don't have it, it should give you a small extra speedup.
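
To give an idea of the approach, here is a minimal sketch. It is not the
actual CoordinateWriter code: the class name, the thresholds (1e9 and
1e-3) and the cap on decimals are assumptions picked for the example.
Note also that the fast path rounds with Math.round, while DecimalFormat
rounds half-even by default, so the two paths can differ on exact ties.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical sketch of "simple math first, DecimalFormat as fallback". */
final class FastDecimalSketch {
    private final int decimals;
    private final long scale;
    private final DecimalFormat fallback;

    FastDecimalSketch(int decimals) {
        if (decimals < 0 || decimals > 8) {
            throw new IllegalArgumentException("decimals must be in 0..8");
        }
        this.decimals = decimals;
        long s = 1;
        for (int i = 0; i < decimals; i++) {
            s *= 10;
        }
        this.scale = s;
        this.fallback = new DecimalFormat("0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        this.fallback.setMaximumFractionDigits(decimals);
    }

    String format(double value) {
        double abs = Math.abs(value);
        // Very big, very small, NaN or infinite values take the slow path.
        if (!(abs < 1e9) || (abs > 0 && abs < 1e-3)) {
            return fallback.format(value);
        }
        // Scale to a long so the digits are written from integers, not from a double.
        long scaled = Math.round(abs * scale);
        long whole = scaled / scale;
        long frac = scaled % scale;
        StringBuilder sb = new StringBuilder();
        if (value < 0 && scaled != 0) {
            sb.append('-');
        }
        sb.append(whole);
        if (frac != 0) {
            sb.append('.');
            String digits = Long.toString(frac);
            // Left-pad the fraction with zeros up to the requested width.
            for (int i = digits.length(); i < decimals; i++) {
                sb.append('0');
            }
            sb.append(digits);
            // Drop trailing zeros; frac != 0 guarantees a non-zero digit remains.
            int end = sb.length();
            while (sb.charAt(end - 1) == '0') {
                end--;
            }
            sb.setLength(end);
        }
        return sb.toString();
    }
}
```

With two decimals, new FastDecimalSketch(2).format(12.3456) goes through
the fast path, while something like 1.5e12 falls back to DecimalFormat.
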

Also, have you checked performance for cases in which there
are lots and lots of attributes as opposed to heavy geometries?

Cheers
andrea


-- 
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

_______________________________________________
Geotools-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel
