Re: "financial reporting language"

Robert Graham Merkel Sat, 1 Jan 2000 18:57:37 -0800
Christopher Browne writes:
<snip>
(In the context of the functions report-value, report-date, report-total):
 > >  > >2) What should these functions return?  At this stage, they have to 
 > >  > >   return some kind of data structure which would probably be a vector
 > >  > >   containing 
 > >  > >   i.   the type (ie string, value, total, or date)
 > >  > >   ii.  the "value"
 > >  > >   iii. the linkname
 > >  > >   iv.  the column
 > >  > >   v.   the style
 > >  > >
 > >  > >   Do I understand this correctly?
 > >  > 
 > >  > Nope.  They return nothing; they function as side-effects.
 > >  > 
 > >  > They're more analogous to a file port, which consumes its input,
 > >  > sending the input to the appropriate place.
 > > 
 > > How do they know where to send stuff - and what to send?
 > > 
 > > Could you outline an implementation of a fairly simple case to show
 > > me what's going on here?
 > 
OK.  Consider the following case

(report-line my-port (report-string "Hello world" my-string-collector
                       0 "dummy link" default-style)
                       ... more report-*'s ............ )
 
Now, the first thing that happens is that the bracketed report-string
expression is evaluated.  my-port isn't bound to anything in the frame
where report-string is evaluated, is it?  So what does the code for
report-string look like?

Sorry if I seem to be asking dumb questions, but my fairly limited
knowledge of Scheme isn't hitting me round the head and saying "it's
implemented this way, stupid!"  It's a choice of dumb questions now,
while the design process is happening, or dumb questions later, when I
try implementing stuff . . .

 > The "report-port" should be a structure containing several parameters
 > including its type.
 > 
 > (define report-port-structure 
 >    (make-record-type "report-port"
 >        '(report-type report-device location-name page-structure
 >          style-info)))
 > 
 > ;;; Now, create a port structure based on the above
 > (define html-report-port 
 >    ((record-constructor report-port-structure)
 >     'html   ;;; Other values such as 'text, 'gnumeric, 'xml
 >             ;;; would be other options
 >     'port   ;;; It might be nice to open ports to other kinds
 >             ;;; of devices, such as a string, network
 >             ;;; socket, named pipes, ...  Although it's likely
 >             ;;; that clever use of ports can get output to go
 >             ;;; to those places without need for special support
 >     (open-output-file "/tmp/sampreport.html")
 >             ;;; This provides a link to the *specific* port in
 >             ;;; question, in this case, a file in /tmp
 >     (list    8 ;;; Report with 8 columns
 >             '("25%" "20%" "15%" "15%" "15%" "10%")  ;;; Percent for each
 >             ;;; There might also be merit to having definitions for
 >             ;;; tables used in header/footer
 >             )
 >     (list   )  ;;; Some sort of property list to reflect style info
 >     ))
 > 
Fine.  That's very much what I'd assumed.

 > Given this structure, you could define the functions that I
 > suggested using for dispatching thus:
 > 
 > ;;; True if the report port rp has report type x
 > (define (is-report-type-x? rp x)
 >   (eq? x
 >        ((record-accessor report-port-structure 'report-type) rp)))
 > 
 > (define (is-report-port-html? report-port)
 >    (is-report-type-x? report-port 'html))
 > (define (is-report-port-text? report-port)
 >    (is-report-type-x? report-port 'text))
 > (define (is-report-port-gnumeric? report-port)
 >    (is-report-type-x? report-port 'gnumeric))
 > 
Yep.

<snip>
 > > However, this means that the type of collector-thunk is dependent 
 > > on the output format of the report.  Therefore, you would need
 > > to pass a "collector-thunk generator" as an argument to the report
 > > generation code so that a HTML collector-thunk was generated when 
 > > a HTML report was needed, and a Gnumeric formula collector-thunk
 > > was generated when a Gnumeric-exported report was wanted etc.
 > > This would also mean that the report would essentially have to be
 > > rerun when a different output format was generated (instead of 
 > > being able to reprocess the generated report to a different 
 > > final output format).  This is not ideal.
 > > 
 > > Can you suggest a way around this?
 > 
 > I'm inclined to say, "run the report again," using a report-port
 > for the different medium.
 > 
 > The problem here is that we have two choices:
 > 
 > a) Define a report-writing language, or
 > b) Define a report-writing language *as well as a language for
 >    representing device-independent reports.*
 > 
 > The approach I'm suggesting is more like a).  It involves only 2.5
 > languages:
 > a) Scheme,
 > b) Report-writing functions layered atop Scheme (the 0.5), and
 > c) The output form.  (ASCII, HTML, LaTeX, Gnumeric XML, ...)
 > 
 > If we want to generate reports in a pure device-independent form,
 > and *then* transform them into the physical output form, then
 > the answer is probably to generate reports using XML, and then
 > use transformation tools on the XML.  
 > 
 > The problem is that this requires an extra programming language, as the
 > set of "languages" increases to four:
 > a) Whatever we use to generate the XML,
 > b) The XML DTD/Schema for the dev-independent intermediate form, 
 > c) Whatever language is used to transform XML into output forms, and
 > d) Output form languages.
 > 
 > I don't see a whole lot of merit in adding the extra layers.
 > Feel free to disagree, but also feel free to justify the need for
 > the extra "language layers."
 > 
OK, my initial goal was to represent device-independent reports.  The
reason that I was keen on doing this was that I was hoping to have the
UI able to initially display the HTML report, with a button that says
"export/print" which popped up a list of potential report targets.  
If we follow approach (a), we will need to totally re-do the report
when we do this, making engine calls in the process.  If we use
approach (b), this is not necessary.

However, looking at what we've already got, it's clear that we are
going down the road of approach (a).  (a) does simplify things
considerably, and the only significant downside I can see is the minor
performance hit that we take in the situation described above.  Yes,
we also lose the ability to use external tools to manipulate "raw"
reports, but there are many alternative ways of achieving the same
goal anyway (including defining an XML DTD and exporting XML if
such a thing is deemed necessary).


<snip>
 > Sure.  That function repeats code that is virtually identical several
 > times, which is certainly suggestive of there being "structure."
 > A decent alternative would be to build an association list:
 > ;;; Built with list/cons so each entry holds the procedure itself
 > (define report-line-dispatch-list
 >   (list (cons 'html report-line-html)
 >         (cons 'gnumeric report-line-gnumeric)
 >         (cons 'text report-line-text)))
 > 
 > (define (report-line report-port . list-of-line-items)
 >   ((cdr (assq
 >          ((record-accessor report-port-structure 'report-type) report-port)
 >          report-line-dispatch-list))
 >    report-port list-of-line-items))
 > 
 > Where you could register new methods by modifying report-line-dispatch-list.
 > 
 > This could be changed to a hash table if O(1) performance proved
 > important.
 Cool, could we go with this?  I doubt a hash table is necessary.
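
To check I've understood the dispatch idea, here is a minimal sketch of
it, assuming Guile's record procedures.  The two-field record and the
three report-line-* writers are stand-ins I made up for illustration
(they just tag their input), not a proposed implementation:

```scheme
;; Sketch only (Guile).  The record is trimmed to two fields, and the
;; report-line-* writers are hypothetical placeholders.
(define report-port-structure
  (make-record-type "report-port" '(report-type report-device)))

(define make-report-port (record-constructor report-port-structure))
(define report-port-type
  (record-accessor report-port-structure 'report-type))

;; Placeholder writers: a real one would format and emit the line.
(define (report-line-html rp items) (list 'html items))
(define (report-line-gnumeric rp items) (list 'gnumeric items))
(define (report-line-text rp items) (list 'text items))

;; Built with list/cons so the entries hold procedures, not symbols.
(define report-line-dispatch-list
  (list (cons 'html report-line-html)
        (cons 'gnumeric report-line-gnumeric)
        (cons 'text report-line-text)))

;; Look up the writer for this port's type and hand it the line items.
(define (report-line rp . list-of-line-items)
  (let ((entry (assq (report-port-type rp) report-line-dispatch-list)))
    (if entry
        ((cdr entry) rp list-of-line-items)
        (error "no report-line method for type" (report-port-type rp)))))
```

With that, (report-line (make-report-port 'html #f) "a" "b") ends up in
report-line-html, and a new output format only needs one more entry in
report-line-dispatch-list.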

<large snip>
 > This is the area where my "proposal" is particularly weak at this
 > point; I think we could do far worse than to consciously model (if not
 > replicate) CSS.
 > 
 > 1.  The model should involve *named* style components, with logical
 >     names.  
 > 
 >     This way, if the output mechanism supports some sort of "style
 >     sheet," and HTML and Gnumeric *do,* the style *controls* can be
 >     centralized in one part of the output, with names used to actually
 >     attach style to things that need to be centred, emboldened,
 >     italicized, or specially coloured.
 > 
 >     Look at the source to any of my web pages at
 >     <http://www.hex.net/~cbbrowne/>; all *sorts* of HTML tags have
 >     classes attached to them.  The ones presently in use include:
 >      "ABSTRACT" "ADDRESS" "AFFILIATION" "APPLICATION" "ARTICLE"
 >      "ATTRIBUTION" "AUTHOR" "BLOCKQUOTE" "BOOK" "CALSTABLE" "CAUTION"
 >      "COLOPHON" "COMMAND" "EMAIL" "EMPHASIS" "ENVAR" "FILENAME"
 >      "FUNCTION" "GLOSSLIST" "INLINEMEDIAOBJECT" "INTERFACE" "KEYCAP"
 >      "LITERAL" "LITERALLAYOUT" "NAVFOOTER" "NAVHEADER" "NOTE" "PART"
 >      "PHRASE" "PRODUCTNAME" "PROGRAMLISTING" "PROPERTY" "QUOTE"
 >      "SECT1" "SECT2" "SECT3" "SECT4" "SYSTEMITEM" "TABLE" "TEXTOBJECT"
 >      "TIP" "TITLE" "TITLEPAGE" "TOC" "USERINPUT" "WARNING"
 > 
 >     I can customize how each is displayed across the whole web site by
 >     messing with the contents of the file stdstyle.css, that all the
 >     web pages reference.
 > 
 >     Report-oriented named "style classes" would be sensible to
 >     construct for GnuCash.
 > 
 > This seems to me to be the first, and most important step.  Naming
 > things is *critical.*
 > 
Yep.  When I go through and look at the number of style classes that would
be needed for the typical accounting report, I doubt there would be
nearly as many as what you have above.

I intend to have a look at the existing reports, and the design of 
the transaction reports, and have a look at what styles are necessary
for those.  I intend to also examine what common elements they share,
to start to build up a list of what globally-available styles should
be available.

 > 2. *Then* we look at what sorts of properties we'd want to assign to
 >    the "style classes."
 > 
 >    This would include things like:
 >    - Alignment.
 >      (memq horiz-alignment '(left right center justify))
 >      (memq vert-alignment '(top bottom center))
 >      Note that vertical alignment becomes important if/when graphical
 >      images are introduced.
 > 
 >    - Borders.
 >      Properties including:
 >         - Single/double/triple lines
 >         - Thickness
 >         - Dotting
 >         - Color/greyscaling
 >         - K001 3D Effects
 > 
 >    - Underlining.
 >      Properties including:
 >         - How many lines?
 >         - Thickness/distance apart
 >         - Dotting
 >         - Color/greyscaling
 > 
 >    - Background
 >      Properties including:
 >         - Color/greyscaling
 >         - Watermark image
 > 
 >    - Fonts
 >      Properties including:
 >         - Family
 >         - Vendor
 >         - Name
 >         - Roman/sans serif
 >         - Spacing (memq spacing '(monospaced proportional))
 >         - Point size
 >         - Weight (normal, demibold, bold, ...)
 >         - Slant
 >         - Italicization
 >       Font properties will obviously have to get mapped to the nearest
 >       available font on the medium.  In raw ASCII, there may be only
 >       one font.
 > 
 >    - Color/greyscaling
 >         With multiple lists of color information so that things
 >         degrade gracefully  
 >         - Canonical Netscape 16 colors
 >         - RGB numbers
 > 
 > Add these together, and a "style" has the following properties:
 >       - A name
 >       and then, as defined above...
 >       - Text color
 >       - Background
 >       - Font
 >       - Underlining
 >       - Borders
 >       - Alignment
 > 
Yep, but there is an additional trick I would like to add: I would
like to specify a "parent style".  If a property is left unspecified,
it should be inherited from the parent.
This hierarchy should make it much easier to design reports with
lots of styles: if, say, you didn't like the font, you could change it
in one place and minimize the number of changes necessary.

It would allow neat things (with an appropriately designed style
hierarchy), like the user changing a font setting and every font in a
report changing appropriately in response.
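
A rough sketch of the parent-style lookup I have in mind.  The table
layout, the style names and the property names are all assumptions for
illustration; nothing here is settled:

```scheme
;; Sketch only: styles are association lists kept in a hypothetical
;; table keyed by style name; the 'parent entry names the style to
;; inherit unspecified properties from.
(define style-table
  (list
   (cons 'default  '((font . "Helvetica") (weight . normal) (align . left)))
   (cons 'amount   '((parent . default) (align . right)))
   (cons 'subtotal '((parent . amount) (weight . bold)))))

;; Return the value of property for style-name, walking up the parent
;; chain until some style specifies it.  Returns #f if nothing does.
;; Note: a cycle in the parent links would make this loop forever.
(define (style-property style-name property)
  (let ((style (assq style-name style-table)))
    (if (not style)
        #f
        (let ((found (assq property (cdr style)))
              (parent (assq 'parent (cdr style))))
          (cond (found (cdr found))
                (parent (style-property (cdr parent) property))
                (else #f))))))
```

Here 'subtotal specifies only its weight, takes its alignment from
'amount and its font from 'default, so changing the font in 'default
would change it for every style that doesn't override it.
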
 > This is, roughly speaking, the sorts of things that CSS1 offers; see,
 > for a not dramatically sophisticated example,
 > <http://www.hex.net/~cbbrowne/stdstyle.css>
 > 
 > The fact that this is somewhat complex, involving a fairly deep data
 > structure, is *not* going to be important to performance, shown
 > thusly:
 > 
 > - Outputting to ASCII, almost *all* of the style info gets thrown
 >   away, as there's no control over most of it.  We keep alignment and
 >   underlining, and the rest is largely ignored.
 > 
 > - Outputting to HTML, the fact that this is a complex structure does
 >   *not* present any performance problems, because all we do is to:
 >     a) Dump out all the style info to a CSS header/file, so that each
 >        style gets dealt with *ONCE,* and then
 >     b) Attach the NAME to bits of HTML via ``CLASS="NAME"''
 > 
 > - Similar is true for Gnumeric, as they've been putting a lot of work
 >   into their style system's performance and functionality lately.
 > 
 > - Outputting to LaTeX is probably similar; we can define macros to do
 >   "style stuff," and then output things like:
 >    \gcstyleSUBTOTAL {  255.00 }
Agreed, performance is not an issue here.
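
For the HTML case, the "dump each style once, then refer to it by name"
idea could look roughly like this.  The procedure names and the
representation of properties as (css-name . "value") pairs are my own
assumptions, purely for illustration:

```scheme
;; Sketch only: a style's properties are assumed to already be CSS
;; name/value pairs, e.g. (font-weight . "bold").
(define (property->css prop)
  (string-append (symbol->string (car prop)) ": " (cdr prop) "; "))

;; Emit one CSS rule for a named style; done once per style.
(define (style->css-rule name props)
  (string-append "." name " { "
                 (apply string-append (map property->css props))
                 "}"))

;; Attach the style to a table cell by name only.
(define (styled-cell class-name text)
  (string-append "<td class=\"" class-name "\">" text "</td>"))
```

The rule for a style goes into the CSS header once, and every cell that
uses it carries only the class name.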

What is at issue, though, is whether we can acceptably map style
definitions from your report-format-independent version described
above, to specific style information for the individual output
formats.

There are two main issues with doing this:

1)    Complexity of mapping from generic style information to
      report-format-specific style information.  I suspect this is
      probably acceptable, but we won't know for sure until we try.
2)    Translating styles in such a way that makes best use of
      the output medium's capabilities.

For instance, consider a situation where a report designer wishes
to use colour to distinguish report information.  This works fine for
Gnumeric output, but not for LaTeX.  Therefore, the report designer
will probably go with some method supported by both output media
so that the information can always be distinguished.  However,
if you let styles be specified in a medium-specific manner, you could
use colour in Gnumeric and italics in LaTeX and everyone is happy.

I am assuming, of course, that the number of output media is always
going to remain relatively small, and reports will mostly use globally
defined styles so that they will only have to specify output-medium
specific style information for one or two special styles at most.
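
To make the medium-specific idea concrete, here is one possible shape
for such a style.  The layout (a 'generic section plus optional
per-medium sections) and all the names are assumptions for the sake of
discussion:

```scheme
;; Sketch only: a style carries generic properties plus optional
;; sections keyed by output medium.
(define highlight-style
  '((generic  . ((weight . bold)))
    (gnumeric . ((color . "red")))
    (latex    . ((slant . italic)))))

;; Return the property list to use for a medium.  Medium-specific
;; properties are placed first, so an assq lookup on the result finds
;; them before any generic property of the same name.
(define (style-for-medium style medium)
  (let ((generic (cdr (assq 'generic style)))
        (specific (assq medium style)))
    (if specific
        (append (cdr specific) generic)
        generic)))
```

A medium with no section of its own (plain text, say) just gets the
generic properties, so only the one or two special styles need any
per-medium information.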

What do you think?  This seems to be the main area of disagreement
left, so we're not doing too badly.

Happy New Year to you (and everyone else on the list!)  

<OFFTOPIC>
Anyone have any Y2K hassles to report?
</OFFTOPIC>
-- 
---------------------------------------------------------------------------
Robert Merkel                                               [EMAIL PROTECTED]

Humanity has advanced, when it has advanced, not because it has been sober, 
responsible, and cautious, but because it has been playful, rebellious, and 
immature.
                -- Tom Robbins
---------------------------------------------------------------------------

--
Gnucash Developer's List 
To unsubscribe send empty email to: [EMAIL PROTECTED]