Re: "financial reporting language"

Christopher Browne Thu, 30 Dec 1999 15:36:20 -0800
On Fri, 31 Dec 1999 02:04:15 +1100, the world broke into rejoicing as
Robert Graham Merkel <[EMAIL PROTECTED]>  said:
> [EMAIL PROTECTED] writes:
>  > 
>  > Robert Merkel <[EMAIL PROTECTED]> wrote:
>  > >I've mostly completed the data-collection part of the transaction
>  > >reports code, and am now figuring out how to implement the formatted
>  > >data output. 
>  > >
>  > >I'm going over your mail of a few weeks ago, attempting to figure
>  > >out how to actually implement this code.
>  > 
>  > Preface this with one comment: the word "thunk" is not necessarily
>  > well-used throughout this posting.  I would speculate that the
>  > "financial reporting language" should be gone over once or twice
>  > more, fairly carefully, and that which falls out of it as *useful*
>  > should be rather more carefully specified so that we have, rather
>  > than some "ramblings," an actual design that can evolve without
>  > necessarily breaking.
>  > 
>  > Your suggestion of adding a "report-date" function is a good example
>  > of the sort of thing that should be added in to the design before
>  > things get out of hand.  
>  > - Page numbers would be another sensible addition; 
> 
> Yes.
>  > - Checksums might be another; 
> 
> Could you explain what you mean here?

Probably too obscure to worry about; I've known financial systems to put
sequential checksums on each page so that someone nefarious couldn't
safely insert a faked page.  Largely a "fancier page number," and not
likely needed any time soon.
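
For the record, a minimal sketch of what such chained page checksums
could look like. The procedure names, the multiplier 31 and the modulus
65521 are arbitrary choices for illustration, and the checksum is not
cryptographically strong; it only shows each page's sum depending on
all of the pages before it.

```scheme
;; Hypothetical helper: checksum of one page, seeded with the checksum
;; of the previous page so that the pages form a chain.
(define (page-checksum previous-sum page-text)
  (let loop ((chars (string->list page-text))
             (sum previous-sum))
    (if (null? chars)
        sum
        (loop (cdr chars)
              (modulo (+ (* sum 31) (char->integer (car chars)))
                      65521)))))

;; Returns the list of chained checksums, one per page, in page order.
(define (chain-checksums pages)
  (let loop ((pages pages) (sum 0) (result '()))
    (if (null? pages)
        (reverse result)
        (let ((next (page-checksum sum (car pages))))
          (loop (cdr pages) next (cons next result))))))
```

Inserting or altering a page changes the checksum printed on every
later page, which is all the "fancier page number" needs to do.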

>  > - Running balances still another (e.g. - each line or each header/footer
>  >   contains a running total);
> 
> Definitely.
>
>  > - Keeping room for a data-only output form for archival shouldn't
>  >   be too problematic
> 
> Yep.
>  > - Are any of the ideas *violently* incompatible with the notion of
>  >   generating TeX (or LaTeX) and thereby having typeset accounts?
> 
> I've been thinking about this from a LaTeX perspective, and I'm fairly
> confident that the kind of thing we're talking about could be
> implemented by using the "tabularx" package.

"tabularx" apparently postdates my *heavy* involvement with LaTeX;
much of the point is to simply keep it in the back of our minds.

>  > - Is there *violent* inherent incompatibility with the Postscript display
>  >   model or with GNOME Print/GNOME Canvas?
> I'm not an expert on PostScript, but as I understand it PostScript 
> is so flexible that it should be able to cope with anything we can
> fling at it.
> 
> GNOME print is designed to offer line-by-line layout, so at this stage
> I don't see that we're doing anything that will make our lives difficult.

And that's what we want to know at this stage.

>  > - Graphical images, anyone?
> 
> I could well see a need for this, but I'm not keen to include it in
> the initial design (for the purposes of having SOMETHING running 
> in the not-too-distant future).  So, yes, I'd make provision for
> inserting graphics.
>
> Another one that has occurred to me is the need to have horizontal
> lines below certain columns of certain rows of text.  Indeed, 
> it would be nice to be able to specify a variety of types of line -
> solid, dashed, double etc.

Good thought.  It's common to need single/double lines to indicate
subtotals/grand totals.  That probably gets attached to particular
"boxes" aka "records" aka "fields."

For things like invoices, people may want to draw rectangles around
some boxes, so I think it's not just horizontal lines that are needed.

>  > Not all may need to be "formally specified" from the beginning,
>  > but if there are such things that it might make sense to add in
>  > later, it's a good idea to think ahead so that the design doesn't
>  > make later extensions impossible without making violent changes to
>  > the design.
>  > 
> Agreed.
>  > In effect, any "violent" changes should take place now, when there's
>  > not functionality to throw away.
>  > 
>  > For instance, it would be seriously annoying to build up some
>  > functionality, and shortly thereafter determine that since we left *no*
>  > room for the insertion of graphical objects, it is necessary to create
>  > a new reporting scheme from scratch because we forgot that some folks
>  > might want to have graphical images embedded in the headers of reports...
>  > I may be personally dubious of the merits of such, but it'll doubtless
>  > be incredibly important to someone.
>  > 
>  > e.g. - While a graphical G/L may be of negative value [consider that
>  > this *forces* you to print in graphical mode, and there's considerable
>  > CPU cost to rendering a 50 page report when graphical images get added
>  > in...] someone might decide to use the reporting system to create invoices,
>  > and downright *NEED* pretty graphics.
>  > 
>  > I guess what I'm suggesting is that at this point, code should be
>  > written to establish the feasibility of techniques, not for full use.
>  > There needs to be a bit more design effort before trying to go all
>  > the way with the code.
> 
> Agreed.  Once we have agreed on a preliminary design, I'll try and
> design and then implement a HTML renderer for the parts needed 
> for the existing reports.
> 
> That way, hopefully, most of the remaining warts  can be spotted
> before we get too much further.

Spotting warts early is a good thing.

> <snip>
> 
>  > >2) What should these functions return?  At this stage, they have to 
>  > >   return some kind of data structure which would probably be a vector
>  > >   containing 
>  > >   i.   the type (ie string, value, total, or date)
>  > >   ii.  the "value"
>  > >   iii. the linkname
>  > >   iv.  the column
>  > >   v.   the style
>  > >
>  > >   Do I understand this correctly?
>  > 
>  > Nope.  They return nothing; they function as side-effects.
>  > 
>  > They're more analogous to a file port, which consumes its input,
>  > sending the input to the appropriate place.
> 
> How do they know where to send stuff - and what to send?
> 
> Could you outline an implementation of a fairly simple case to show
> me what's going on here?

The "report-port" should be a structure containing several parameters
including its type.

(define report-port-structure 
   (make-record-type "report-port"
       '(report-type report-device location-name page-structure
         style-info)))

;;; Now, create a port structure based on the above
(define html-report-port 
   ((record-constructor report-port-structure)
    'html   ;;; Other values such as 'text, 'gnumeric, 'xml
            ;;; would be other options
    'port   ;;; It might be nice to open ports to other kinds
            ;;; of devices, such as a string, network
            ;;; socket, named pipes, ...  Although it's likely
            ;;; that clever use of ports can get output to go
            ;;; to those places without need for special support
    (open-output-file "/tmp/sampreport.html")
            ;;; This provides a link to the *specific* port in
            ;;; question, in this case, a file in /tmp
    (list    6 ;;; Report with 6 columns
            '("25%" "20%" "15%" "15%" "15%" "10%")  ;;; Percent for each
            ;;; There might also be merit to having definitions for
            ;;; tables used in header/footer
            )
    (list   )  ;;; Some sort of property list to reflect style info
    ))

Given this structure, you could define the functions that I
suggested using for dispatching thus:

(define (is-report-type-x? rp x)
  ;; Fetch the report-type field of rp and compare it with x
  (eq? x
       ((record-accessor report-port-structure 'report-type) rp)))

(define (is-report-port-html? report-port)
   (is-report-type-x? report-port 'html))
(define (is-report-port-text? report-port)
   (is-report-type-x? report-port 'text))
(define (is-report-port-gnumeric? report-port)
   (is-report-type-x? report-port 'gnumeric))
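
As a rough sketch of how one of the per-format writers might then use
the structure: the version below assumes Guile's record procedures,
assumes that the location-name field holds an open output port, and
assumes that each line item has already been turned into a string.
None of that is settled design; report-line-html here is only an
illustration of "forwarding output to the physical port."

```scheme
(define report-port-structure
  (make-record-type "report-port"
    '(report-type report-device location-name page-structure style-info)))

(define make-report-port (record-constructor report-port-structure))
(define report-port-location
  (record-accessor report-port-structure 'location-name))

;; Writes one table row to the port stored in the report-port.
;; Assumption: items is a list of strings, one per column.
(define (report-line-html rp items)
  (let ((out (report-port-location rp)))
    (display "<tr>" out)
    (for-each (lambda (item)
                (display "<td>" out)
                (display item out)
                (display "</td>" out))
              items)
    (display "</tr>" out)
    (newline out)))
```

Because the writer only sees "a port," the same code works for a file
opened with open-output-file or for a string port used in testing.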

>  > For report-value, the "collector thunk" would be a function that grabs
>  > the value and adds it in to a total.
>  > 
>  > For report-total, it passes in the "input-collector-thunk," and gets,
>  > from that, the total being reported, and then passes on that total to
>  > the output-collector-thunk.
>  > 
>  > (define (create-non-spreadsheet-total-thunk)
>  >    (let*  
>  >        ((value 0)  ;;; This is the total that things add to
>  >         (adder-function 
>  >              (lambda (x) (set! value (+ value x))))
>  >         (report-value (lambda () value))
>  >    (dispatcher (lambda (method newval)
>  >                  (cond
>  >                    ((eq? method 'add)
>  >                     (adder-function newval))
>  >                    ((eq? method 'report)
>  >                     (report-value))
>  >                    (else
>  >                     'error)))))
>  >      dispatcher))
>  > 
>  > This thus defines:
>  > a) A slot for a value, which starts at 0,
>  > b) An "adder" function, which adds the value "newval" in, and
>  > c) A "report-value" function, which returns the total.
>  > 
>  > The "thunk definer" gets used thus:
>  > > (define total-thunk (create-non-spreadsheet-total-thunk))
>  > > (total-thunk 'add 25)
>  > > (total-thunk 'add 25)
>  > > (total-thunk 'add -15)
>  > > (total-thunk 'add 2.5)
>  > > (total-thunk 'report #f)
>  > 37.5       
>  > 
>  > This is the "simple" case, where all we're doing is to add values in
>  > to a total.
>  > 
>  > With the spreadsheet format, it wouldn't be values that get passed in,
>  > but rather cell identifiers, and (total-thunk 'report #f) would pass
>  > back the *formula.*
>  > 
>  > Something like:
>  > (define (create-spreadsheet-total-thunk)
>  >    (let*  
>  >        ((ilist '())  
>  >         (adder-function 
>  >              (lambda (x) (set! ilist (cons x ilist))))
>  >         (report-value (lambda () ilist))
>  >    (dispatcher (lambda (method newval)
>  >                  (cond
>  >                    ((eq? method 'add)
>  >                     (adder-function newval))
>  >                    ((eq? method 'report)
>  >                     (report-value))
>  >                    (else
>  >                     'error)))))
>  >      dispatcher))
>  > 
>  > Which is used thus:
>  > > (define ssthunk (create-spreadsheet-total-thunk))
>  > > (ssthunk 'add "C1")
>  > > (ssthunk 'add "C2")
>  > > (ssthunk 'add "C3")
>  > > (ssthunk 'add "C4")
>  > > (ssthunk 'report #f)
>  > ("C4" "C3" "C2" "C1")
>  > 
> OK, this is neater than what I suggested, but same idea applies - we
> have a pair of functions, but invoked like method calls on the one
> object.

That's fair; it's more like an object with two methods.

>  > The last step returns a list of cells in this case, just to keep
>  > exposition simple; but there might be some more useful behaviour, like
>  > actually assembling this into:
>  > "<formula>=C1+C2+C3+C4</formula>"
>  > or, in a more intelligent implementation, figuring out that these run
>  > together, and turning this into:
>  > "<formula>=SUM(C1:C4)</formula>"
>  > 
>  > A *really* cool thing to do would be some equivalent to Lotus 123
>  > "named regions," and generate:
>  > <namedregion>
>  >   <name>REGION4</name>
>  >   <celllist>
>  >    <cell>C1</cell>
>  >    <cell>C2</cell>
>  >    <cell>C3</cell>
>  >    <cell>C4</cell>
>  >   </celllist>
>  > </namedregion>
>  > as well as
>  > <formula>=SUM(REGION4)</formula>
>  > 
>  > although that may be taking things a step further than we need go just
>  > now...  I ought to fiddle with Gnumeric to see how it manages this
>  > sort of thing...
>  > 
> Agreed, this sort of thing will be *very* cool.  While I'm not
> interested in implementing it right now, I certainly want to be able
> to do it in the future.
> 
> However, this means that the type of collector-thunk is dependent 
> on the output format of the report.  Therefore, you would need
> to pass a "collector-thunk generator" as an argument to the report
> generation code so that a HTML collector-thunk was generated when 
> a HTML report was needed, and a Gnumeric formula collector-thunk
> was generated when a Gnumeric-exported report was wanted etc.
> This would also mean that the report would essentially have to be
> rerun when a different output format was generated (instead of 
> being able to reprocess the generated report to a different 
> final output format).  This is not ideal.
> 
> Can you suggest a way around this?

I'm inclined to say, "run the report again," using a report-port
for the different medium.
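
A minimal sketch of what "run it again" could mean in practice. The
names run-report, make-value-collector, make-formula-collector and
cells->formula are hypothetical, and the row format (a pair of cell
identifier and amount) is an assumption made so that one report body
can feed either kind of collector.

```scheme
;; Turns ("C1" "C2" "C3") into "=C1+C2+C3"; an empty list gives "=0".
(define (cells->formula cells)
  (if (null? cells)
      "=0"
      (let loop ((rest (cdr cells))
                 (acc (string-append "=" (car cells))))
        (if (null? rest)
            acc
            (loop (cdr rest) (string-append acc "+" (car rest)))))))

;; Collector for plain output: sums the amounts.
(define (make-value-collector)
  (let ((total 0))
    (lambda (method row)
      (cond ((eq? method 'add) (set! total (+ total (cdr row))))
            ((eq? method 'report) total)
            (else 'error)))))

;; Collector for spreadsheet output: remembers cells, reports a formula.
(define (make-formula-collector)
  (let ((cells '()))
    (lambda (method row)
      (cond ((eq? method 'add) (set! cells (cons (car row) cells)))
            ((eq? method 'report) (cells->formula (reverse cells)))
            (else 'error)))))

;; The report body is written once; only the collector generator varies.
(define (run-report make-collector rows)
  (let ((collector (make-collector)))
    (for-each (lambda (row) (collector 'add row)) rows)
    (collector 'report #f)))

(define sample-rows '(("C1" . 25) ("C2" . 25) ("C3" . -15)))
(run-report make-value-collector sample-rows)    ; total of the amounts
(run-report make-formula-collector sample-rows)  ; formula over the cells
```

The report code itself does not change between runs; the cost is that
the data gets traversed once per output format.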

The problem here is that we have two choices:

a) Define a report-writing language, or
b) Define a report-writing language *as well as a language for
   representing device-independent reports.*

The approach I'm suggesting is more like a).  It involves only 2.5
languages:
a) Scheme,
b) Report-writing functions layered atop Scheme (the 0.5), and
c) The output form.  (ASCII, HTML, LaTeX, Gnumeric XML, ...)

If we want to generate reports in a pure device-independent form,
and *then* transform them into the physical output form, then
the answer is probably to generate reports using XML, and then
use transformation tools on the XML.  

The problem is that this requires an extra programming language, as the
set of "languages" increases to four:
a) Whatever we use to generate the XML,
b) The XML DTD/Schema for the dev-independent intermediate form, 
c) Whatever language is used to transform XML into output forms, and
d) Output form languages.

I don't see a whole lot of merit in adding the extra layers.
Feel free to disagree, but also feel free to justify the need for
the extra "language layers."

>  > >;;; You pass a list of the thunks established above into (report-line)
>  > >;;; which combines them into a single line.
>  > >;;; (define (report-line report-port . list-of-thunks))
>  > >;;;
>  > >
>  > >1)report-line actually performs the formatting (then sends it to the
>  > >appropriate physical port), doesn't it?  
>  > 
>  > Close.  I'd have report-line forward responsibility to dispatch
>  > functions, suggested below...
>  > 
>  > >Would it make sense here for the report-port data structure to
>  > >contain a renderer function that does the transformation of the thunk
>  > >list into a string that can then be fed to the physical report port?
>  > 
>  > report-line should be implemented as a dispatcher that forwards the
>  > output to one of:
>  >  - report-line-text
>  >  - report-line-html
>  >  - report-line-gnumeric
>  > as needed.
>  > 
>  > Those dispatch functions would forward the output to the "physical
>  > port."
>  > 
>  > Thus, report-line would be defined as follows:
>  > (define (report-line report-port . list-of-thunks)
>  >   (cond
>  >    ((is-report-port-html? report-port)
>  >     (report-line-html report-port list-of-thunks))
>  >    ((is-report-port-text? report-port)
>  >     (report-line-text report-port list-of-thunks))
>  >    ((is-report-port-gnumeric? report-port)
>  >     (report-line-gnumeric report-port list-of-thunks))
>  >    (else
>  >     'error)))
> 
> I was going to try to be a little cuter to allow new report output
> formats to be registered dynamically.  Do you agree that this would
> be worthwhile?

Sure.  That function repeats code that is virtually identical several
times, which is certainly suggestive of there being "structure."
A decent alternative would be to build an association list:
;;; The writers must already be defined when this list is built
(define report-line-dispatch-list
  (list (cons 'html report-line-html)
        (cons 'gnumeric report-line-gnumeric)
        (cons 'text report-line-text)))

(define (report-line report-port . list-of-line-items)
  ((cdr (assq
         ((record-accessor report-port-structure 'report-type) report-port)
         report-line-dispatch-list))
   report-port list-of-line-items))

Where you could register new methods by modifying report-line-dispatch-list.

This could be changed to a hash table if O(1) performance proved
important.
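
A sketch of what registration might look like, kept self-contained by
dispatching on a bare type symbol rather than on a report-port. The
names register-report-line!, dispatch-report-line and join-items are
hypothetical, and the " | " separator is only there to give the text
writer something visible to do.

```scheme
(define report-line-dispatch-list '())

;; Adds a writer for report-type; a later registration for the same
;; type shadows an earlier one because assq finds the first match.
(define (register-report-line! report-type writer)
  (set! report-line-dispatch-list
        (cons (cons report-type writer) report-line-dispatch-list)))

;; Looks up the writer for report-type and applies it to the items.
(define (dispatch-report-line report-type items)
  (let ((entry (assq report-type report-line-dispatch-list)))
    (if entry
        ((cdr entry) items)
        'error)))

;; Hypothetical text writer: joins string items with " | ".
(define (join-items items)
  (if (null? items)
      ""
      (let loop ((rest (cdr items)) (line (car items)))
        (if (null? rest)
            line
            (loop (cdr rest) (string-append line " | " (car rest)))))))

(register-report-line! 'text join-items)
(dispatch-report-line 'text '("Rent" "500.00"))
```

An unregistered type falls through to 'error, matching the else
branch of the cond-based dispatcher.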

> <large snip>
> 
>  > >;;; style:
>  > >(define style-structure 
>  > >  (make-record-type 
>  > >   "style"
>  > >   '(alignment fontinfo color)))
>  > >
>  > >I still have concerns that allowing ourselves more than a fixed set
>  > >of styles that we can implement in all output forms is just going
>  > >to make life too hard.
>  > >
>  > >Can you suggest how your method might be implemented in a portable
>  > >fashion?
>  > 
>  > If we go with a pure "lowest common denominator," then we're left
>  > looking at raw text as the output form.  And that is *NOT* going to
>  > fly.
>  > 
>  > - On a monochrome monitor, or a laserprinter, there won't be color.
>  >   But spreadsheets can cope with it nicely.
>  > 
>  > - Any paper-oriented output form will forcibly need to have
>  >   headers/footers, and will worry about "pages."  HTML and spreadsheets
>  >   won't.
>  > 
>  > - Devices will vary as to what fonts they support.
>  > 
>  > If we have an acceptable way of degrading output gracefully on
>  > less-than-expressive media, that allows us to have the styles be
>  > pretty smart.
> 
> My concern is that font information varies greatly according to output
> format, and coming up with an acceptable translator for 
> a highly specific style specification is a quite complex task.  Also,
> available fonts vary *extremely* widely between output methods.

True.

Font selection might *need* to depend on the output method.  That's a
good reason to think this through first.

> In addition, I'm not convinced the output formatting of a style 
> should necessarily be embedded in the report.  Consider where
> all reports need to be in a certain "corporate style".  If we define
> what a style should look like in every document it's going to be a
> nightmare to maintain consistency.

If it's HTML, then it would be logical for the HTML to contain style
names, and for there to be a separate CSS file generated.  An
entertaining property that falls out of such is that web browsers are
allowed to have user configuration override what the document
specifies for CSS.  Thus, the "corporate CSS" might kick in to
override the behaviour.

I *don't* like the idea of having <font size=16> and the likes
embedded in the document, and the W3C people are deprecating that in
HTML 4.0; *FAR* better to attach CLASS="SUBTITLE" and have global CSS
configuration to centralize the formatting to one location that can be
managed.
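
A small sketch of the split between class names in the HTML and
formatting in the style sheet. The procedures html-cell and css-rule
are hypothetical, and declarations are assumed to be a list of
(property . value) pairs of strings.

```scheme
;; The document carries only the class name.
(define (html-cell class-name text)
  (string-append "<td class=\"" class-name "\">" text "</td>"))

;; The formatting lives in one rule per class, emitted once.
(define (css-rule class-name declarations)
  (let loop ((rest declarations)
             (body ""))
    (if (null? rest)
        (string-append "." class-name " {" body " }")
        (loop (cdr rest)
              (string-append body " " (caar rest) ": " (cdar rest) ";")))))

(html-cell "SUBTITLE" "Totals")
(css-rule "SUBTITLE" '(("font-size" . "16pt") ("font-weight" . "bold")))
```

Changing the look of every subtitle then means editing one rule, not
every report.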

> Again, what I'd propose is a limited set of styles that are guaranteed
> to be supported by *all* output methods.  The fonts, colours, etc used
> for these fixed styles for a specific output method would presumably
> be both globally and locally configurable.  If a specific report
> requires a custom style, they should specify output-method specific
> font information for any output methods that they intend to support,
> and should specify another style to default to if an unsupported
> output method is attempted - kind of like an object hierarchy, in fact.
> 
> something like
> 
> ;; (define custom-style default-style render-info)
> 
> Where default-style is the name of an already-defined style, 
> and render info is something like:
> 
> ( '(html .(*some html rendering info*)) '(text . (*text rendering
> info*)))
> 
> Whilst this method places a greater burden on people who write reports
> (if they use custom styles) and output formatters (as they have to
> provide font information for the default styles), I would be
> comfortable implementing such a method.
> 
> However, if you could broadly outline an implementation for your
> method (or point to an example of a similar method that's been 
> implemented in another context) I would be happy to go with it.
> While I'm not very familiar with CSS, it at least has the comfort of a
> common font model, something we are not blessed with in this case.
> 
> Anyway, I think we are getting closer.  If you could clear up
> my confusion and examine the remaining concerns, it won't be 
> too long before we have a good, implementable, design.

This is the area where my "proposal" is particularly weak at this
point; I think we could do far worse than to consciously model (if not
replicate) CSS.

1.  The model should involve *named* style components, with logical
    names.  

    This way, if the output mechanism supports some sort of "style
    sheet," and HTML and Gnumeric *do,* the style *controls* can be
    centralized in one part of the output, with names used to actually
    attach style to things that need to be centred, emboldened,
    italicized, or specially coloured.

    Look at the source to any of my web pages at
    <http://www.hex.net/~cbbrowne/>; all *sorts* of HTML tags have
    classes attached to them.  The ones presently in use include:
     "ABSTRACT" "ADDRESS" "AFFILIATION" "APPLICATION" "ARTICLE"
     "ATTRIBUTION" "AUTHOR" "BLOCKQUOTE" "BOOK" "CALSTABLE" "CAUTION"
     "COLOPHON" "COMMAND" "EMAIL" "EMPHASIS" "ENVAR" "FILENAME"
     "FUNCTION" "GLOSSLIST" "INLINEMEDIAOBJECT" "INTERFACE" "KEYCAP"
     "LITERAL" "LITERALLAYOUT" "NAVFOOTER" "NAVHEADER" "NOTE" "PART"
     "PHRASE" "PRODUCTNAME" "PROGRAMLISTING" "PROPERTY" "QUOTE"
     "SECT1" "SECT2" "SECT3" "SECT4" "SYSTEMITEM" "TABLE" "TEXTOBJECT"
     "TIP" "TITLE" "TITLEPAGE" "TOC" "USERINPUT" "WARNING"

    I can customize how each is displayed across the whole web site by
    messing with the contents of the file stdstyle.css, that all the
    web pages reference.

    Report-oriented named "style classes" would be sensible to
    construct for GnuCash.

This seems to me to be the first, and most important step.  Naming
things is *critical.*

2. *Then* we look at what sorts of properties we'd want to assign to
   the "style classes."

   This would include things like:
   - Alignment.
     (memq horiz-alignment '(left right center justify))
     (memq vert-alignment '(top bottom center))
     Note that vertical alignment becomes important if/when graphical
     images are introduced.

   - Borders.
     Properties including:
        - Single/double/triple lines
        - Thickness
        - Dotting
        - Color/greyscaling
        - K001 3D Effects

   - Underlining.
     Properties including:
        - How many lines?
        - Thickness/distance apart
        - Dotting
        - Color/greyscaling

   - Background
     Properties including:
        - Color/greyscaling
        - Watermark image

   - Fonts
     Properties including:
        - Family
        - Vendor
        - Name
        - Roman/sans serif
        - Spacing (memq spacing '(monospaced proportional))
        - Point size
        - Weight (normal, demibold, bold, ...)
        - Slant
        - Italicization
      Font properties will obviously have to get mapped to the nearest
      available font on the medium.  In raw ASCII, there may be only
      one font.

   - Color/greyscaling
        With multiple lists of color information so that things
        degrade gracefully  
        - Canonical Netscape 16 colors
        - RGB numbers

Add these together, and a "style" has the following properties:
      - A name
      and then, as defined above...
      - Text color
      - Background
      - Font
      - Underlining
      - Borders
      - Alignment
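
One possible shape for such named styles, as a sketch only: each style
is an association list of properties, looked up by name, with missing
properties falling back to a style called default. The style names,
property names and the fallback rule are all assumptions made for
illustration.

```scheme
(define report-styles
  (list (cons 'default
              '((alignment . left) (font-weight . normal)
                (text-color . black)))
        (cons 'subtotal
              '((alignment . right) (underlining . single)))
        (cons 'grand-total
              '((alignment . right) (underlining . double)
                (font-weight . bold)))))

;; Returns the value of property for the named style, falling back to
;; the default style, and to #f if the default does not define it.
(define (style-property style-name property)
  (let* ((style (assq style-name report-styles))
         (entry (and style (assq property (cdr style)))))
    (cond (entry (cdr entry))
          ((eq? style-name 'default) #f)
          (else (style-property 'default property)))))

(style-property 'subtotal 'underlining)  ; defined on the style itself
(style-property 'subtotal 'text-color)   ; inherited from default
```

An output method that cannot express a property simply never asks
for it.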

These are, roughly speaking, the sorts of things that CSS1 offers; see,
for a not dramatically sophisticated example,
<http://www.hex.net/~cbbrowne/stdstyle.css>

The fact that this is somewhat complex, involving a fairly deep data
structure, is *not* going to matter for performance, as the following
cases show:

- Outputting to ASCII, almost *all* of the style info gets thrown
  away, as there's no control over most of it.  We keep alignment and
  underlining, and the rest is largely ignored.

- Outputting to HTML, the fact that this is a complex structure does
  *not* present any performance problems, because all we do is to:
    a) Dump out all the style info to a CSS header/file, so that each
       style gets dealt with *ONCE,* and then
    b) Attach the NAME to bits of HTML via ``CLASS="NAME"''

- Similar is true for Gnumeric, as they've been putting a lot of work
  into their style system's performance and functionality lately.

- Outputting to LaTeX is probably similar; we can define macros to do
  "style stuff," and then output things like:
   \gcstyleSUBTOTAL {  255.00 }
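
The cases above can be sketched in a few lines. The procedure names
pad and render-styled are hypothetical, the style name is passed as a
string, and only alignment is honoured for plain text; the \gcstyle
macro prefix follows the LaTeX example above, but the macro itself
would still have to be defined somewhere.

```scheme
;; Pads text with spaces to width, on the left for right alignment
;; and on the right otherwise.
(define (pad text width alignment)
  (let ((gap (max 0 (- width (string-length text)))))
    (if (eq? alignment 'right)
        (string-append (make-string gap #\space) text)
        (string-append text (make-string gap #\space)))))

;; Renders one styled value for the given output format.
(define (render-styled format style-name alignment text width)
  (cond ((eq? format 'text)
         (pad text width alignment))          ; style mostly thrown away
        ((eq? format 'html)
         (string-append "<td class=\"" style-name "\">" text "</td>"))
        ((eq? format 'latex)
         (string-append "\\gcstyle" style-name "{" text "}"))
        (else 'error)))

(render-styled 'text  "SUBTOTAL" 'right "255.00" 10)
(render-styled 'html  "SUBTOTAL" 'right "255.00" 10)
(render-styled 'latex "SUBTOTAL" 'right "255.00" 10)
```

In each case the per-item work is a name lookup or less; the heavy
style definitions are emitted once, or not at all.
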
--
"All language designers are arrogant.  Goes with the territory..."  
-- Larry Wall
[EMAIL PROTECTED] - <http://www.hex.net/~cbbrowne/lsf.html>

--
Gnucash Developer's List 
To unsubscribe send empty email to: [EMAIL PROTECTED]