I've built a proof of concept and it seems to work okay; it's
here: https://github.com/tundra/capnprof.
The one issue I've run into is that, to estimate how much space a value
takes up in the zipped data, I need to know the location in memory of every
value so I can map it back to the input. I haven't found a way to get that
for the pointer section of a struct, except by reading past where I know
the data section ends. That seems a little hacky though -- is there a
non-hacky way to do this?
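
To make the mapping step concrete, here is a minimal sketch in Python. It is
not taken from capnprof. It assumes a zip profiler has already produced one
approximate weight per input byte, and that the offset and size of each value
in the uncompressed message are known. The names attribute_weights,
byte_weights and value_ranges, and the paths Stop.location and Stop.name, are
made up for the example.

```python
from collections import defaultdict


def attribute_weights(byte_weights, value_ranges):
    """Sum per-byte compressed weights over each value's byte range.

    byte_weights: one approximate compressed cost per input byte
                  (hypothetical output of a zip profiler).
    value_ranges: iterable of (path, offset, size) tuples saying where each
                  value lives in the uncompressed input.
    Returns a dict mapping path -> (bytes, weight).
    """
    totals = defaultdict(lambda: [0, 0.0])
    for path, offset, size in value_ranges:
        if offset < 0 or offset + size > len(byte_weights):
            raise ValueError(f"range for {path!r} falls outside the input")
        totals[path][0] += size
        totals[path][1] += sum(byte_weights[offset:offset + size])
    return {path: (b, w) for path, (b, w) in totals.items()}


# Toy input: 8 bytes, the first four compress poorly, the last four well.
weights = [1.0, 1.0, 1.0, 1.0, 0.25, 0.25, 0.25, 0.25]
ranges = [("Stop.location", 0, 4), ("Stop.name", 4, 4)]
profile = attribute_weights(weights, ranges)
```

Both toy values occupy four bytes, but the first is charged four times the
weight of the second. This only works if every value's byte range is known,
which is why the pointer section matters.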
I've attached two small examples of the output. They're profiles of the
same data, but one is ordered by zipped size and the other by unzipped size
(called weight and bytes respectively). They give a sense of the kind of
information you can derive. This is transit schedule information, and I
found it interesting that place names dominate the unzipped data by a fair
margin, whereas in the zipped version it's geographic locations (which rank
6th in the unzipped data) that dominate, and place names compress well
enough to sink to rank 3.
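
For readers who have not seen this kind of profiler, the sketch below shows
the general idea from the quoted message: walk the structure, track the path
to each value, and count its size against a fixed-size suffix of that path.
It is a Python illustration over plain dicts and lists, not the capnprof
implementation. The suffix length of four, the size model in leaf_size, and
the sample data are all assumptions made for the example.

```python
from collections import defaultdict

SUFFIX_LEN = 4  # assumed suffix length; the attached traces show at most four entries


def leaf_size(value):
    # Stand-in size model: UTF-8 length for text, 8 bytes for anything else.
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    return 8


def profile(value, path=(), totals=None):
    """Walk a nested dict/list and charge each leaf to the tail of its path."""
    if totals is None:
        totals = defaultdict(int)
    if isinstance(value, dict):
        for name, child in value.items():
            profile(child, path + (name,), totals)
    elif isinstance(value, list):
        for child in value:
            # List elements are recorded as "[]", as in the attached output.
            profile(child, path + ("[]",), totals)
    else:
        totals[path[-SUFFIX_LEN:]] += leaf_size(value)
    return totals


data = {"stops": [{"name": "Central Station", "id": 1},
                  {"name": "Harbour", "id": 2}]}
totals = profile(data)
# Rank path suffixes by the number of bytes charged to them, largest first.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Sorting the same totals by a compressed-size measure instead of raw bytes
would give the second kind of ordering.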
c
On Saturday, May 19, 2018 at 1:12:03 AM UTC+2, Kenton Varda wrote:
>
> Hi,
>
> This sounds neat. I'm not aware of anyone having built such a tool yet.
>
> It should indeed be straightforward using the Dynamic API, or maybe the
> "Any" API (AnyPointer/AnyList/AnyStruct), which gives you a lower-level
> view of the object tree.
>
> -Kenton
>
> On Mon, May 14, 2018 at 8:46 AM, <[email protected]> wrote:
>
>> Hi,
>> For the project I'm working on I need to distribute some zipped capnproto
>> data. I'd like the data itself to be fairly small, but in particular I'd
>> like the result of zipping it to be the absolute smallest I can make it.
>>
>> I used to use protocol buffers and implemented a size profiler for those.
>> It basically traversed the entire structure while keeping track of the path
>> that led to each point and counted the size of data encountered against a
>> fixed-size suffix of the path. It was pretty simple but really useful in
>> identifying where the problem points were. Now I've switched to capnproto
>> and am considering doing the same for that, possibly as a stand-alone tool
>> if I have time. I'm assuming it won't be all that hard to do with the
>> reflection api. The plan then is to use it separately but in particular to
>> combine it with a zip profiler I already have to find parts of the data
>> that don't compress well.
>>
>> My question is: has anyone already built something like this, or thought
>> about it enough to have input on how such a tool should work? I also
>> wonder whether this would be useful to anyone else.
>>
>>
>> c
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Cap'n Proto" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> Visit this group at https://groups.google.com/group/capnproto.
>>
>
>
rank  #trc    self   accum   zself  zaccum  path
   1   212  187.5K  187.5K  17.6zK  17.6zK  CfsSection.stopAttributes CfsSto...
   2    88  107.9K  107.9K   6.2zK   6.2zK  [] KeywordRange.buckets [] Posit...
   3   184   90.0K   90.0K   2.4zK   2.4zK  CfsSection.routeAttributes CfsRo...
   4   146   89.8K  179.7K   5.1zK   5.8zK  [] Pid.eids [] Eid.value
   5   147   89.8K   89.8K   717zB   717zB  Pid.eids [] Eid.value CfsStopApp...
   6   220   44.9K   44.9K  19.7zK  19.7zK  CfsSection.stopAttributes CfsSto...
   7   144   44.9K  224.6K   5.6zK  11.5zK  CfsPeidMap.pids [] Pid.eids []
   8   227   27.3K   27.3K   5.6zK   5.6zK  CfsSection.stopAttributes CfsSto...
   9   197   26.9K   26.9K  19.6zK  19.6zK  CfsSection.tripAttributes CfsTri...
  10   153   26.2K   39.4K     7zB   1.1zK  [] Pid.eids [] Eid.value
TRACE 88:
[]
KeywordRange.buckets
[]
PositionBucket.stopGids
TRACE 144:
CfsPeidMap.pids
[]
Pid.eids
[]
TRACE 146:
[]
Pid.eids
[]
Eid.value
TRACE 147:
Pid.eids
[]
Eid.value
CfsStopAppendix.foreignId
TRACE 153:
[]
Pid.eids
[]
Eid.value
TRACE 184:
CfsSection.routeAttributes
CfsRouteAttributesSection.stopBloomFilters
[]
TRACE 197:
CfsSection.tripAttributes
CfsTripAttributesSection.stopTimingPool
TRACE 212:
CfsSection.stopAttributes
CfsStopAttributesSection.shortNames
[]
TRACE 220:
CfsSection.stopAttributes
CfsStopAttributesSection.descendancies
TRACE 227:
CfsSection.stopAttributes
CfsStopAttributesSection.lineSeqPool
rank  #trc    self   accum   zself  zaccum  path
   1   220   44.9K   44.9K  19.7zK  19.7zK  CfsSection.stopAttributes CfsSto...
   2   197   26.9K   26.9K  19.6zK  19.6zK  CfsSection.tripAttributes CfsTri...
   3   212  187.5K  187.5K  17.6zK  17.6zK  CfsSection.stopAttributes CfsSto...
   4   207   25.8K   25.8K  13.3zK  13.3zK  CfsSection.lineAttributes CfsLin...
   5   195   25.8K   25.8K  12.8zK  12.8zK  CfsSection.tripAttributes CfsTri...
   6    88  107.9K  107.9K   6.2zK   6.2zK  [] KeywordRange.buckets [] Posit...
   7   144   44.9K  224.6K   5.6zK  11.5zK  CfsPeidMap.pids [] Pid.eids []
   8   114   22.5K   22.5K   5.6zK   5.6zK  CfsSection.placeAttributes CfsPl...
   9   227   27.3K   27.3K   5.6zK   5.6zK  CfsSection.stopAttributes CfsSto...
  10   146   89.8K  179.7K   5.1zK   5.8zK  [] Pid.eids [] Eid.value
TRACE 88:
[]
KeywordRange.buckets
[]
PositionBucket.stopGids
TRACE 114:
CfsSection.placeAttributes
CfsPlaceFragmentAttributesSection.stopGids
[]
TRACE 144:
CfsPeidMap.pids
[]
Pid.eids
[]
TRACE 146:
[]
Pid.eids
[]
Eid.value
TRACE 195:
CfsSection.tripAttributes
CfsTripAttributesSection.stopGidPool
TRACE 197:
CfsSection.tripAttributes
CfsTripAttributesSection.stopTimingPool
TRACE 207:
CfsSection.lineAttributes
CfsLineAttributesSection.stopGidPool
TRACE 212:
CfsSection.stopAttributes
CfsStopAttributesSection.shortNames
[]
TRACE 220:
CfsSection.stopAttributes
CfsStopAttributesSection.descendancies
TRACE 227:
CfsSection.stopAttributes
CfsStopAttributesSection.lineSeqPool