Is there any reason that we can't store the byte-stream data directly
in postmodern? We already have an efficient, mostly non-consing
byte-array serializer with the following format:

[btree_id][data_type][data_format]

If you used a new table for each btree, then you could strip the
btree_id and pass the type + format to Postgres. Integers are stored
big-endian, and strings are stored in left-to-right Lisp
character-code order, so they might just work without modification.
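Concretely, I imagine something like the following (a sketch only: the
table layout is made up, ENSURE-BTREE-TABLE and BTREE-PUT are
hypothetical names, and it assumes postmodern will pass an
(unsigned-byte 8) vector straight through as a bytea parameter):

;; One table per btree, storing the serializer's raw bytes in bytea
;; columns.  These functions are illustrative, not part of Elephant
;; or postmodern.
(defun ensure-btree-table (name)
  (postmodern:execute
   (format nil "CREATE TABLE ~a (key bytea PRIMARY KEY, value bytea)"
           name)))

(defun btree-put (table key-bytes value-bytes)
  ;; KEY-BYTES and VALUE-BYTES are the serializer's output with the
  ;; btree_id prefix stripped, i.e. [data_type][data_format] + payload.
  (postmodern:execute
   (format nil "INSERT INTO ~a (key, value) VALUES ($1, $2)" table)
   key-bytes value-bytes))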
However, I'm speculating, as I don't understand all the issues. As I
understand it, CL-SQL has the problem that the byte-storage methods
for different SQL engines are different enough to make a common API
difficult to implement.
Ian
On Nov 29, 2007, at 10:06 AM, Robert L. Read wrote:
On Thu, 2007-11-29 at 16:16 +0200, Alex Mizrahi wrote:
AP> Am I missing something really basic here?

Actually, it's quite a strange situation that you have *many*
employees with the same name but want just one (a random one); I
cannot imagine why one would need this in the real world.

Or are you saying that they all have different names, but it still
conses? That could be a bug then.
With respect to consing, it is important to point out that our
serializer conses heavily for the postmodern and CL-SQL back-ends.
This is because I used base64 to transform the byte streams into
character strings.
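To illustrate where the garbage comes from, here is the shape of that
round-trip (using the cl-base64 API for illustration; Elephant's
serializer may use its own encoder):

;; Each value serialized allocates a fresh base64 string on the way
;; out and a fresh byte vector on the way back in, on top of the
;; serializer's own output buffer.
(let* ((bytes (make-array 300 :element-type '(unsigned-byte 8)
                              :initial-element 42))
       ;; 300 bytes become a fresh 400-character string (4/3 expansion),
       (encoded (cl-base64:usb8-array-to-base64-string bytes))
       ;; and decoding allocates yet another fresh byte vector.
       (decoded (cl-base64:base64-string-to-usb8-array encoded)))
  (values (length encoded) (equalp bytes decoded)))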
Most relational databases (including Postgres) provide a way of
storing byte sequences directly. However, this is not standardized and
not portable. In fact, I spoke to Kevin Rosenberg, the author of
CL-SQL, and CL-SQL doesn't have a good way to do it across engines.
However, since postmodern is Postgres-specific, it could avoid this
step by using a back-end-specific serializer. I suspect this would
have a huge impact on performance, both by decreasing consing (minor)
and by decreasing the amount of disk I/O that has to be done (major).
(BDB doesn't have this problem, because it natively uses byte
sequences, not character sequences.)
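The back-end-specific step could hang off a generic function
dispatched on the store controller, along these lines (all names here
are hypothetical stand-ins, not Elephant's actual protocol):

;; POSTMODERN-CONTROLLER, CLSQL-CONTROLLER, and SERIALIZE-TO-BYTES are
;; placeholders for Elephant's real controller classes and serializer.
(defclass postmodern-controller () ())
(defclass clsql-controller () ())

(defun serialize-to-bytes (value)
  ;; Stand-in for the existing non-consing byte-array serializer.
  (declare (ignore value))
  (make-array 0 :element-type '(unsigned-byte 8)))

(defgeneric serialize-for-backend (controller value))

(defmethod serialize-for-backend ((controller postmodern-controller) value)
  ;; Postgres can take the raw bytes directly as bytea.
  (serialize-to-bytes value))

(defmethod serialize-for-backend ((controller clsql-controller) value)
  ;; Other SQL engines keep the portable base64 string path.
  (cl-base64:usb8-array-to-base64-string (serialize-to-bytes value)))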
Please see the code below, which demonstrates that pushing 1 million
bytes through the serializer (without even going to the database)
creates 8 million bytes of garbage in 0.433 seconds; that is roughly
8.7 bytes consed per byte serialized. (This is on a new, fast,
2-gigabyte 64-bit machine, against postmodern.)
;; Load the systems and the test suite.
(asdf:operate 'asdf:load-op :elephant)
(asdf:operate 'asdf:load-op :ele-clsql)
(asdf:operate 'asdf:load-op :postmodern)
(asdf:operate 'asdf:load-op :elephant-tests)

(in-package "ELEPHANT-TESTS")
(setq *default-spec* *testpm-spec*)

;; Push about 1 million bytes through the serializer.
(setq teststring "supercalifragiliciousexpialidocious")
(setq testint 42)
(setq totalserializationload (* 1000 1000))
(setq n (ceiling (/ totalserializationload (length teststring))))

(open-store *default-spec*)
(time
 (dotimes (x n)
   (in-out-value teststring)))
(close-store)
*****
Results in:
Evaluation took:
0.433 seconds of real time
0.172974 seconds of user run time
0.058991 seconds of system run time
0 calls to %EVAL
0 page faults and
8,731,728 bytes consed.
NIL
ELE-TESTS>
I personally think a back-end-specific serializer that avoids the
base64 encoding would make a significant performance difference. This
is not much of an issue for me personally, since I keep everything
cached in memory anyway.
--
Robert L. Read, PhD
http://konsenti.com
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel