I've built a proof of concept, and it seems to work okay. It's here: 
https://github.com/tundra/capnprof

The one issue I've run into is this: to figure out (approximately) how much 
space a value takes up in the zipped data, I need to know the location in 
memory of every value so I can map it back to the input. I haven't been able 
to find a way to get that for the pointer sections of structs, except by 
reading past where I know the data section ends. That seems a little hacky, 
though -- is there a non-hacky way to do this?

I've attached two small examples of the output. They're profiles of the 
same data, but one is ordered by zipped size and the other by unzipped size 
(called weight and bytes respectively). They give a sense of the kind of 
information you can derive. This is transit schedule information, and I 
found it interesting that placenames dominate the unzipped data by a fair 
margin, whereas in the zipped version it's geographic locations (which rank 
6th in the unzipped data) that dominate, and placenames compress well enough 
to sink to rank 3.
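
To make the path-suffix accounting from my original message (quoted below) 
concrete, here is a minimal sketch in Python. It is not the capnprof code: it 
walks plain dicts and lists as a stand-in for a Cap'n Proto message, and the 
names profile, value_size and SUFFIX_LEN, as well as the 8-byte guess for 
scalars, are assumptions made for illustration. It only computes the "self" 
figure, not "accum" or the zipped sizes.

```python
from collections import defaultdict

# Assumed suffix length; the longest traces in the attached profiles
# have four entries.
SUFFIX_LEN = 4


def value_size(value):
    """Rough byte size of a leaf value (an assumption, not the wire size)."""
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    if isinstance(value, bytes):
        return len(value)
    return 8  # treat numbers, booleans and None as one 8-byte word


def profile(value, path=(), totals=None):
    """Walk nested dicts/lists, charging each leaf to its path suffix."""
    if totals is None:
        totals = defaultdict(int)
    if isinstance(value, dict):
        for name, child in value.items():
            profile(child, path + (name,), totals)
    elif isinstance(value, list):
        for child in value:
            # List elements all share the "[]" path entry.
            profile(child, path + ("[]",), totals)
    else:
        # Only the last SUFFIX_LEN path entries form the key.
        totals[path[-SUFFIX_LEN:]] += value_size(value)
    return totals


# Hypothetical toy data, loosely shaped like the Pid.eids traces.
doc = {"pids": [{"eids": [{"value": "abc"}, {"value": "de"}]}], "name": "x"}
totals = profile(doc)
for suffix, size in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(size, " ".join(suffix))
```

Keying on a suffix rather than the full path means values reached through 
different parents but the same last few fields are counted together, which 
keeps the number of distinct traces manageable.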


c

On Saturday, May 19, 2018 at 1:12:03 AM UTC+2, Kenton Varda wrote:
>
> Hi,
>
> This sounds neat. I'm not aware of anyone having built such a tool yet.
>
> It should indeed be straightforward using the Dynamic API, or maybe the 
> "Any" API (AnyPointer/AnyList/AnyStruct), which gives you a lower-level 
> view of the object tree.
>
> -Kenton
>
> On Mon, May 14, 2018 at 8:46 AM, <[email protected]> wrote:
>
>> Hi,
>> For the project I'm working on I need to distribute some zipped capnproto 
>> data. I'd like the data itself to be fairly small, but in particular I'd 
>> like the result of zipping it to be the absolute smallest I can make it.
>>
>> I used to use protocol buffers and implemented a size profiler for those. 
>> It basically traversed the entire structure while keeping track of the path 
>> that led to each point and counted the size of data encountered against a 
>> fixed-size suffix of the path. It was pretty simple but really useful in 
>> identifying where the problem points were. Now I've switched to capnproto 
>> and am considering doing the same for that, possibly as a stand-alone tool 
>> if I have time. I'm assuming it won't be all that hard to do with the 
>> reflection API. The plan is to use it on its own, but in particular to 
>> combine it with a zip profiler I already have to find parts of the data 
>> that don't compress well.
>>
>> My question is: has anyone already done this, or thought about it enough 
>> to have input into how such a tool should work? Also, I wonder whether 
>> this is something that might be useful to anyone else.
>>
>>
>> c
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Cap'n Proto" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> Visit this group at https://groups.google.com/group/capnproto.
>>
>
>

rank #trc     self    accum    zself   zaccum path
   1  212   187.5K   187.5K   17.6zK   17.6zK CfsSection.stopAttributes CfsSto...
   2   88   107.9K   107.9K    6.2zK    6.2zK [] KeywordRange.buckets [] Posit...
   3  184    90.0K    90.0K    2.4zK    2.4zK CfsSection.routeAttributes CfsRo...
   4  146    89.8K   179.7K    5.1zK    5.8zK [] Pid.eids [] Eid.value
   5  147    89.8K    89.8K    717zB    717zB Pid.eids [] Eid.value CfsStopApp...
   6  220    44.9K    44.9K   19.7zK   19.7zK CfsSection.stopAttributes CfsSto...
   7  144    44.9K   224.6K    5.6zK   11.5zK CfsPeidMap.pids [] Pid.eids []
   8  227    27.3K    27.3K    5.6zK    5.6zK CfsSection.stopAttributes CfsSto...
   9  197    26.9K    26.9K   19.6zK   19.6zK CfsSection.tripAttributes CfsTri...
  10  153    26.2K    39.4K      7zB    1.1zK [] Pid.eids [] Eid.value

TRACE 88:
    []
    KeywordRange.buckets
    []
    PositionBucket.stopGids

TRACE 144:
    CfsPeidMap.pids
    []
    Pid.eids
    []

TRACE 146:
    []
    Pid.eids
    []
    Eid.value

TRACE 147:
    Pid.eids
    []
    Eid.value
    CfsStopAppendix.foreignId

TRACE 153:
    []
    Pid.eids
    []
    Eid.value

TRACE 184:
    CfsSection.routeAttributes
    CfsRouteAttributesSection.stopBloomFilters
    []

TRACE 197:
    CfsSection.tripAttributes
    CfsTripAttributesSection.stopTimingPool

TRACE 212:
    CfsSection.stopAttributes
    CfsStopAttributesSection.shortNames
    []

TRACE 220:
    CfsSection.stopAttributes
    CfsStopAttributesSection.descendancies

TRACE 227:
    CfsSection.stopAttributes
    CfsStopAttributesSection.lineSeqPool

rank #trc     self    accum    zself   zaccum path
   1  220    44.9K    44.9K   19.7zK   19.7zK CfsSection.stopAttributes CfsSto...
   2  197    26.9K    26.9K   19.6zK   19.6zK CfsSection.tripAttributes CfsTri...
   3  212   187.5K   187.5K   17.6zK   17.6zK CfsSection.stopAttributes CfsSto...
   4  207    25.8K    25.8K   13.3zK   13.3zK CfsSection.lineAttributes CfsLin...
   5  195    25.8K    25.8K   12.8zK   12.8zK CfsSection.tripAttributes CfsTri...
   6   88   107.9K   107.9K    6.2zK    6.2zK [] KeywordRange.buckets [] Posit...
   7  144    44.9K   224.6K    5.6zK   11.5zK CfsPeidMap.pids [] Pid.eids []
   8  114    22.5K    22.5K    5.6zK    5.6zK CfsSection.placeAttributes CfsPl...
   9  227    27.3K    27.3K    5.6zK    5.6zK CfsSection.stopAttributes CfsSto...
  10  146    89.8K   179.7K    5.1zK    5.8zK [] Pid.eids [] Eid.value

TRACE 88:
    []
    KeywordRange.buckets
    []
    PositionBucket.stopGids

TRACE 114:
    CfsSection.placeAttributes
    CfsPlaceFragmentAttributesSection.stopGids
    []

TRACE 144:
    CfsPeidMap.pids
    []
    Pid.eids
    []

TRACE 146:
    []
    Pid.eids
    []
    Eid.value

TRACE 195:
    CfsSection.tripAttributes
    CfsTripAttributesSection.stopGidPool

TRACE 197:
    CfsSection.tripAttributes
    CfsTripAttributesSection.stopTimingPool

TRACE 207:
    CfsSection.lineAttributes
    CfsLineAttributesSection.stopGidPool

TRACE 212:
    CfsSection.stopAttributes
    CfsStopAttributesSection.shortNames
    []

TRACE 220:
    CfsSection.stopAttributes
    CfsStopAttributesSection.descendancies

TRACE 227:
    CfsSection.stopAttributes
    CfsStopAttributesSection.lineSeqPool
