On Sun, May 26, 2013 at 2:45 PM, Amirouche Boubekki <amirouche.boube...@gmail.com> wrote:

>
>
>
> 2013/5/26 Cedric Greevey <cgree...@gmail.com>
>
>> I may be developing an application which will need a persistent,
>>
>
>
>> ACID
>>
>
> which means at least transactional; are you sure you need that?
> Depending on the database, ACID means different things. Do you need data
> integrity across «documents»? That is, must a transaction be able to span
> modifications to several objects, so that if a failure happens everything
> is rolled back or nothing is persisted?
>

Yes, I need things not to be able to get left half-done. :)
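To make the requirement concrete, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for whichever embedded ACID store ends up being chosen. The `objects` table, its `is_x` column, and the helper names are hypothetical. A raised exception simulates a failure between two related updates, and the first update is rolled back along with it.

```python
import sqlite3

def make_db():
    # In-memory database for the sketch; a real application would pass a file path.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, is_x TEXT NOT NULL)")
    conn.executemany("INSERT INTO objects (id, is_x) VALUES (?, ?)",
                     [(1, "unevaluated"), (2, "unevaluated")])
    conn.commit()
    return conn

def update_pair(conn, fail_midway=False):
    # "with conn" commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute("UPDATE objects SET is_x = 'yes' WHERE id = 1")
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE objects SET is_x = 'no' WHERE id = 2")

def snapshot(conn):
    return conn.execute("SELECT id, is_x FROM objects ORDER BY id").fetchall()

conn = make_db()
try:
    update_pair(conn, fail_midway=True)
except RuntimeError:
    pass
print(snapshot(conn))  # [(1, 'unevaluated'), (2, 'unevaluated')]
update_pair(conn)
print(snapshot(conn))  # [(1, 'yes'), (2, 'no')]
```

This only exercises rollback inside one process; surviving an actual power cut is the job of the engine's on-disk journal, which an in-memory database does not test.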


>  local database (on the same disk as the application, rather than having
>> to be accessed over the network)
>>
>
> which means embedded
>
>
>> containing information about potentially 100,000-1,000,000 (or more)
>> objects.
>>
>
> which means relatively big
>

I expect a few GB to a few tens of GB in practice. Chump change,
disk-space-wise, but just a bit too big to want to try loading it all into
RAM at once, even on the 8GB development box here. It would probably work,
but run like a pig and make the rest of the system horribly slow due to
paging.

>> Much of that information will be of a quasi-boolean character: "is it an X
>> or not?" for various choices of X, but with "yes", "no", "borderline", and
>> "not yet evaluated" as the four possible values. It will be desirable to
>> query for these, for example to get a lazy seq of all objects for which
>> it's a borderline Y or for which it's not yet evaluated whether it's a Z or
>> for which it's either "yes" or "borderline" on whether it's an X or
>> whatever.
>>
>
> It seems like loosely structured data, for which a key/value store (also
> known as a kv store) might be a great fit.
>
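A sketch of the four-valued flags as ordinary relational columns, again assuming SQLite via Python's sqlite3 and hypothetical column names: each flag is a TEXT column restricted to the four values by a CHECK constraint and indexed, so "yes or borderline on X" becomes a plain IN query.

```python
import sqlite3

# The four values from the post; "unevaluated" is explicit rather than NULL.
STATUSES = ("yes", "no", "borderline", "unevaluated")
FLAG_COLUMNS = ("is_x", "is_y")  # hypothetical flag columns

def make_db():
    conn = sqlite3.connect(":memory:")
    allowed = ", ".join(f"'{s}'" for s in STATUSES)
    cols = ",\n".join(
        f"{c} TEXT NOT NULL DEFAULT 'unevaluated' CHECK ({c} IN ({allowed}))"
        for c in FLAG_COLUMNS)
    conn.execute(f"CREATE TABLE objects (id INTEGER PRIMARY KEY,\n{cols})")
    for c in FLAG_COLUMNS:
        conn.execute(f"CREATE INDEX objects_{c} ON objects ({c})")
    return conn

def ids_where(conn, column, *values):
    # The column name is interpolated into SQL, so accept only known columns.
    if column not in FLAG_COLUMNS:
        raise ValueError(f"unknown flag column: {column}")
    marks = ", ".join("?" for _ in values)
    sql = f"SELECT id FROM objects WHERE {column} IN ({marks}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, values)]

conn = make_db()
conn.executemany("INSERT INTO objects (id, is_x, is_y) VALUES (?, ?, ?)",
                 [(1, "yes", "no"), (2, "borderline", "borderline"),
                  (3, "no", "unevaluated"), (4, "unevaluated", "yes")])
conn.commit()
print(ids_where(conn, "is_x", "yes", "borderline"))  # [1, 2]
print(ids_where(conn, "is_y", "unevaluated"))        # [3]
```

Storing any other value (say 'maybe') fails with sqlite3.IntegrityError, so the database itself keeps the flags within the four allowed states.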
>
>>  I'm not that familiar with the local-DB solutions out there. I'd like a
>> recommendation for one which is
>>
>
>
>> a) a good fit for Clojure use
>>
>
> I'm not sure about the Clojure specifics of binding to C/C++ databases,
> but in Python it's just a few ctypes
> <http://docs.python.org/2/library/ctypes.html> (or similar) definitions
> away.
>
>
>> and b) a good fit for the type of data and queries noted above.
>>
>
> You are not very specific about the queries and the data.
>
> 1) Is it structured, i.e. can an object have several fields, possibly
> complex ones like lists or hashmaps, but also integers? Dates and UUIDs
> can be emulated with strings and integers.
> 2) Do objects have relations? A lot of relations?
> 3) Is the data schema fixed at compile time, or do you need the
> schema to be dynamic?
>

Much of the data is conditional in a certain sense -- if it's an X, it's
also a Y and it may be a W or a Z as well, but if it's a G it's certainly
not a W, etc.; though simply storing a large number of boolean columns that
may be unused by many of the table rows would be acceptable.
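Implications like these can be stated as CHECK constraints on a single table, so the database rejects inconsistent rows outright. A minimal sketch with Python's sqlite3, where the flag columns, the `try_insert` helper, and the two rules are hypothetical stand-ins for "every X is a Y" and "no G is a W":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE objects (
        id   INTEGER PRIMARY KEY,
        is_x TEXT NOT NULL DEFAULT 'unevaluated',
        is_y TEXT NOT NULL DEFAULT 'unevaluated',
        is_g TEXT NOT NULL DEFAULT 'unevaluated',
        is_w TEXT NOT NULL DEFAULT 'unevaluated',
        CHECK (is_x <> 'yes' OR is_y = 'yes'),       -- every X is also a Y
        CHECK (NOT (is_g = 'yes' AND is_w = 'yes'))  -- nothing is both a G and a W
    )""")

def try_insert(**flags):
    # Hypothetical helper: keyword names become column names (trusted input only).
    cols = ", ".join(flags)
    marks = ", ".join("?" for _ in flags)
    try:
        with conn:
            conn.execute(f"INSERT INTO objects ({cols}) VALUES ({marks})",
                         tuple(flags.values()))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(is_x="yes", is_y="yes"))  # True
print(try_insert(is_x="yes"))              # False: an X must also be a Y
print(try_insert(is_g="yes", is_w="yes"))  # False: a G is never a W
print(try_insert(is_g="yes", is_w="no"))   # True
```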

The thing that makes me slightly dubious about relational here is that
there will necessarily either be many columns unused by many rows, as
there's a lot of data that's N/A unless certain other conditions are met;
or else there will be many whole tables and a lot of expensive joins, as we
have a table of Foos, with an isBar? column with either a BarID or a null,
and a table of Bars with an isBaz? column, and a table of Bazzes with an
isQuux? column, and then a need to do joins on *all* of those tables to run
a query over a subset of Quuxes and have access to some Foo table columns
in the results.
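For comparison, a sketch of the chain-of-tables layout described above (hypothetical table and column names, SQLite via Python's sqlite3). Each join follows a nullable link column; indexing those columns gives the planner the option of doing each hop as an index lookup rather than a scan, though whether that is cheap enough at this scale would need measuring.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quuxes (id INTEGER PRIMARY KEY, score INTEGER NOT NULL);
    CREATE TABLE bazzes (id INTEGER PRIMARY KEY, quux_id INTEGER REFERENCES quuxes(id));
    CREATE TABLE bars   (id INTEGER PRIMARY KEY, baz_id  INTEGER REFERENCES bazzes(id));
    CREATE TABLE foos   (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         bar_id INTEGER REFERENCES bars(id));
    -- Index the nullable "is it a ...?" links so each hop can be a lookup.
    CREATE INDEX bazzes_quux_id ON bazzes (quux_id);
    CREATE INDEX bars_baz_id    ON bars (baz_id);
    CREATE INDEX foos_bar_id    ON foos (bar_id);

    INSERT INTO quuxes VALUES (1, 10), (2, 20);
    INSERT INTO bazzes VALUES (1, 1), (2, 2), (3, NULL);
    INSERT INTO bars   VALUES (1, 1), (2, 2), (3, 3), (4, NULL);
    INSERT INTO foos   VALUES (1, 'a', 1), (2, 'b', 2), (3, 'c', 3),
                              (4, 'd', 4), (5, 'e', NULL);
""")

# A subset of Quuxes, with a Foo column carried through all four tables.
rows = conn.execute("""
    SELECT q.id, q.score, f.name
    FROM quuxes q
    JOIN bazzes b ON b.quux_id = q.id
    JOIN bars   r ON r.baz_id  = b.id
    JOIN foos   f ON f.bar_id  = r.id
    WHERE q.score >= 15
    ORDER BY q.id
""").fetchall()
print(rows)  # [(2, 20, 'b')]
```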

This sort of thing points towards an object database more than any other
sort, with inherited fields from superclasses, or a map database that
performs well with lots of null/missing keys in most of the maps. But maybe
a relational DB table with very many columns but relatively few used by any
given row would perform OK.
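A sketch of the single wide-table alternative (hypothetical columns, SQLite via Python's sqlite3): columns that don't apply to a row are simply NULL, and a partial index covers only the rows where a column is actually set. As I understand SQLite's record format, a NULL costs about one header byte per row, so sparse columns are cheap to store there; other engines may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        bar_flavor TEXT,     -- NULL unless the thing is a Bar
        baz_size   INTEGER,  -- NULL unless it is also a Baz
        quux_score INTEGER   -- NULL unless it is also a Quux
    );
    -- Partial index: only rows that actually are Quuxes get an entry.
    CREATE INDEX things_quux ON things (quux_score) WHERE quux_score IS NOT NULL;
""")
conn.executemany("INSERT INTO things VALUES (?, ?, ?, ?, ?)", [
    (1, "plain foo", None, None, None),
    (2, "a bar", "sour", None, None),
    (3, "a quux", "sweet", 7, 20),
    (4, "another quux", "salty", 3, 5),
])
conn.commit()

quuxes = conn.execute("""
    SELECT id, name, bar_flavor FROM things
    WHERE quux_score IS NOT NULL AND quux_score >= 15
    ORDER BY id
""").fetchall()
print(quuxes)  # [(3, 'a quux', 'sweet')]
```

Compared with the chain of tables, the "inherited fields" live in the same row, so no joins are needed; the cost is a schema with many mostly-empty columns.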

>> The DB must be able to grow larger than available RAM without crashing the
>> JVM and the seqs resulting from queries like the above will also need to be
>> able to get bigger than RAM.
>>
>
>
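In Python's sqlite3 the closest analogue of a lazy seq is a generator over a cursor: rows are stepped out of the database as they are consumed, so the result set never has to fit in RAM. The `stream_rows` helper and its `batch_size` parameter are hypothetical; on the JVM, whether a JDBC driver streams rows like this depends on the driver and its fetch-size settings.

```python
import sqlite3
from typing import Iterator, Sequence, Tuple

def stream_rows(conn: sqlite3.Connection, sql: str, params: Sequence = (),
                batch_size: int = 1000) -> Iterator[Tuple]:
    # Hypothetical helper: yields rows lazily, holding at most one batch in memory.
    cur = conn.execute(sql, params)
    try:
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                return
            yield from batch
    finally:
        cur.close()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, is_z TEXT NOT NULL)")
conn.executemany("INSERT INTO objects (id, is_z) VALUES (?, ?)",
                 ((i, "unevaluated" if i % 3 == 0 else "yes") for i in range(1, 10001)))
conn.commit()

rows = stream_rows(conn, "SELECT id FROM objects WHERE is_z = ? ORDER BY id",
                   ("unevaluated",), batch_size=500)
print(next(rows))                # (3,)
print(1 + sum(1 for _ in rows))  # 3333
```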
>> My own research suggests that H2 may be a good choice, but it's a
>> standard SQL/relational DB and I'm not 100% sure that fits well with the
>> type of data and querying noted above. Note though that not all querying
>> will take that form; there'll also be strings, uuids, dates, and other such
>> field types and the need to query on these and to join on some of them;
>> also, to do less-than comparisons on dates.
>>
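Dates and UUIDs can be handled as plain strings, as suggested earlier in the thread: ISO-8601 dates (YYYY-MM-DD) compare correctly as text, so a less-than comparison on them is chronological, and UUID strings work as join keys. A sketch with hypothetical tables, SQLite via Python's sqlite3:

```python
import sqlite3
import uuid
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (uid TEXT PRIMARY KEY, evaluated_on TEXT NOT NULL);
    CREATE TABLE notes   (object_uid TEXT NOT NULL REFERENCES objects(uid),
                          body TEXT NOT NULL);
""")

# UUIDs and dates stored as text: str(uuid4()) and ISO-8601 'YYYY-MM-DD'.
a, b = str(uuid.uuid4()), str(uuid.uuid4())
conn.executemany("INSERT INTO objects VALUES (?, ?)",
                 [(a, date(2013, 1, 15).isoformat()),
                  (b, date(2013, 5, 20).isoformat())])
conn.executemany("INSERT INTO notes VALUES (?, ?)",
                 [(a, "first look"), (b, "re-check")])
conn.commit()

# Text comparison on ISO dates is chronological, so "<" means "before".
cutoff = date(2013, 3, 1).isoformat()
rows = conn.execute("""
    SELECT n.body
    FROM objects o JOIN notes n ON n.object_uid = o.uid
    WHERE o.evaluated_on < ?
""", (cutoff,)).fetchall()
print(rows)  # [('first look',)]
```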
>
> Depending on your speed needs and the speed of the database, a kv store
> can be enough: you serialize the data as strings and deserialize it when
> you need to do computation. The catch is that kv stores are not easy to deal
> with when you have complex queries, but again, it depends on the query.
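A sketch of the kv-store approach described above. A single key/value table in SQLite stands in for a dedicated kv store so the example runs with Python's standard library alone; the `KVStore` class and its methods are hypothetical. Each object is serialized to a JSON string on write and deserialized on read, and `scan` shows the drawback for complex queries: with no secondary indexes, every value has to be deserialized just to test it.

```python
import json
import sqlite3

class KVStore:
    """Hypothetical minimal kv store: keys are strings, values are JSON strings."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

    def put(self, key, obj):
        # Serialize on write; the store itself only ever sees an opaque string.
        with self.conn:
            self.conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
                              (key, json.dumps(obj)))

    def get(self, key):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return None if row is None else json.loads(row[0])

    def scan(self, predicate):
        # No secondary indexes: every value is deserialized just to test it.
        for key, value in self.conn.execute("SELECT k, v FROM kv ORDER BY k"):
            obj = json.loads(value)
            if predicate(obj):
                yield key, obj

store = KVStore()
store.put("obj-1", {"is_x": "yes", "is_y": "borderline"})
store.put("obj-2", {"is_x": "no"})
print(store.get("obj-2"))  # {'is_x': 'no'}
print([k for k, _ in store.scan(lambda o: o.get("is_y") == "borderline")])  # ['obj-1']
```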
>

I expect they'd also have problems with transactional integrity if, say,
there was a power cut during an update. Anything involving "serialize the
data as strings" sounds unsuited to either the volume I'm envisioning or
the need for consistency. It certainly wouldn't do to overwrite the file
with half of an updated version of itself and then lose power! Keeping the
previous version around as a .bak file is scarcely much better. It pretty
much needs to be ACID since there will need to be coordinated changes to
more than one bit of the data sometimes and having an update interrupted
with only half the changes done, and having it stay in that half-done
state, would potentially be disastrous.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en