On 01/18/2013 08:47 AM, Stefan Behnel wrote:
> Andrew Robinson, 18.01.2013 00:59:
>> I have a problem which may fit in a mysql database
> Everything fits in a MySQL database - not a reason to use it, though. Py2.5
> and later ship with sqlite3 and if you go for an external database, why use
> MySQL if you can have PostgreSQL for the same price?
MySQL is provided by the present server host; it's pretty standard at web hosting sites. It works through "import MySQLdb" -- and it means a network round trip for every action...
PostgreSQL isn't available :( otherwise I'd use it....

I'm mildly concerned about scaling issues, but I don't have a lot of time (just a few days) to come to a decision. I don't need high performance, just no grotesque degradation when the system is scaled up, and no maintenance nightmare. The votes table is going to get monstrous if all votes are held in one table....

Your comment about SQLite is interesting; I've never used it before.
At a glance, it uses individual files as databases, which is good... but it
locks the entire database file whenever a write is in progress, blocking
readers as well as other writers. Which is bad...

http://www.sqlite.org/different.html
http://www.sqlite.org/whentouse.html

> ... XML files are a rather static thing and meant to be
> processed from start to end on each run. That adds up if the changes are
> small and local while the file is ever growing. You seem to propose one
> file per article, which might work. That's unlikely to become too huge to
> process, and Python's cElementTree is a very fast XML processor.
Yes, that's exactly what I was thinking... one file per article.

It's attractive, I think, because many Python processes can read the XML file concurrently, while only one periodically updates it as a batch/cron/triggered process; i.e. the number/frequency of updates is actually controllable.

E.g.: MySQL accumulates a list of new votes and vote changes, and Python occasionally flushes them into the archive file. That way MySQL only maintains a small database of real-time changes, and the speed/accuracy of the vote tally can be tailored to the user's need.
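A rough sketch of that flush job, assuming Python 2's MySQLdb (the table, column, and archive-writer names here are all invented for illustration):

import MySQLdb

def flush_pending_votes(archive):
    # Drain the small real-time table into the per-article archive files.
    # "pending_votes" and its columns are hypothetical names.
    conn = MySQLdb.connect(host="localhost", user="web", passwd="...", db="site")
    try:
        cur = conn.cursor()
        cur.execute("SELECT article_id, voter_id, vote FROM pending_votes")
        for article_id, voter_id, vote in cur.fetchall():
            archive.apply_vote(article_id, voter_id, vote)  # hypothetical archive writer
        # A real job would delete only the rows it just read, so votes
        # arriving mid-flush aren't dropped.
        cur.execute("DELETE FROM pending_votes")
        conn.commit()
    finally:
        conn.close()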

> However, your problem sounds a lot like you could map it to one of the dbm
> databases that Python ships. They work like dicts, just on disk.
Doing a Google search, I see some of the modules you're mentioning -- yes, they may have some potential.
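For what it's worth, the dict-like dbm interface looks roughly like this (Python 2's anydbm; the filename and key scheme are made up):

import anydbm  # on Python 3 this became the "dbm" package

db = anydbm.open("article_0042_votes.db", "c")  # "c": create if missing
try:
    db["voter:123456"] = "+1"      # keys and values must be byte strings
    db["voter:987654"] = "-1"
    vote = db["voter:123456"]      # reads back from disk, just like a dict
finally:
    db.close()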

> IIUC, you want to keep track of comments and their associated votes, maybe
> also keep a top-N list of the highest voted comments. So, keep each comment
> and its votes in a dbm record, referenced by the comment's ID (which, I
> assume, you keep a list of in the article that it comments on).
The comments themselves are just ancillary information; the votes only apply to the article itself at this time. The two kinds of feedback are independent, though occasionally a user gives both. Statistically, there are many votes -- and few comments.

Each archive file has the same filename as the article being commented or voted on, but with a different extension (e.g. .xml, .db, ...), so there's no need to store article information with each vote or comment (unlike the MySQL database, which has to store all that information for every vote.... ugh!).
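Deriving the archive name from the article name is then a one-liner (the .votes extension is an arbitrary choice):

import os

def archive_path(article_path, ext=".votes"):
    # e.g. "articles/why-python.html" -> "articles/why-python.votes"
    root, _ = os.path.splitext(article_path)
    return root + ext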

> You can use
> pickle (see the shelve module) or JSON or whatever you like for storing
> that record. Then, on each vote update, look up the comment, change its
> votes and store it back. If you keep a top-N list for an article, update it
> at the same time. Consider storing it either as part of the article or in
> another record referenced by the article, depending on how you normally
> access it. You can also store the votes independent of the comment (i.e. in
> a separate record for each comment), in case you don't normally care about
> the votes but read the comments frequently. It's just a matter of adding an
> indirection for things that you use less frequently and/or that you use in
> more than one place (not in your case, where comments and votes are unique
> to an article).

> You see, lots of options, even just using the stdlib...
>
> Stefan

Yes, lots of options....
Let's see... you've noticed just about everything important, and have lots of helpful thoughts; thank you.
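For reference, a minimal sketch of the shelve approach you describe, with one pickled record of votes per article (the filename, key name, and layout are my own invention):

import shelve

db = shelve.open("article_0042.shelf")
try:
    votes = db.get("votes", {})    # {voter_id: vote}
    votes[123456] = 1              # record or overwrite one vote
    db["votes"] = votes            # pickle the record back to disk
finally:
    db.close()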

There are implementation details I'm not aware of regarding how the file-system dictionaries (dbm) work, and I wouldn't know how to compare them to XML access speed either... but I do know some general information about how the data might be handled algorithmically, which might suggest which Python module is better suited.

If I were to sort all votes by voter ID (a 32-bit number) and append the vote value (a 4- to 8-bit number), then a vote becomes a fixed-length chunk of 40 bits, and I can stack one right after another in a compact format.

Blocks of compacted votes are ideal for binary searching, since they have fixed length... and if I only want to change a vote, I don't need to re-write the entire file, because a changed vote doesn't change the file length. (So now I have COMPACT + FASTER.)
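That fixed-width layout is easy to express with the struct module. A rough sketch, assuming the file holds records sorted by voter ID and is opened in "r+b" mode (the function names are my own):

import struct

REC = struct.Struct(">IB")   # 32-bit voter ID + 8-bit vote = 5 bytes/record

def find_vote(f, voter_id, nrecs):
    # Binary-search a file of fixed-width records sorted by voter ID.
    # Returns (index, vote), or (insertion_index, None) if absent.
    # nrecs is just os.path.getsize(path) // REC.size.
    lo, hi = 0, nrecs
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * REC.size)
        vid, vote = REC.unpack(f.read(REC.size))
        if vid < voter_id:
            lo = mid + 1
        elif vid > voter_id:
            hi = mid
        else:
            return mid, vote
    return lo, None

def change_vote(f, index, voter_id, vote):
    # Overwrite one record in place; the file length never changes.
    f.seek(index * REC.size)
    f.write(REC.pack(voter_id, vote))

Note that ">I" is big-endian, so the packed ID bytes sort in the same order as the numbers -- which also matters for the padded-block trick below.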

If I wished to insert new votes into this sorted list, I would need to do something like a merge sort, which means changing the length of the file and bulk-copying most of it just to open up a "little" space for one or two votes; BUT:

It's also possible to break the vote list up into compact ranges of voter IDs, perhaps using indirection like you suggested... and I was thinking it might not be hard to add *some* limited blank space in the file -- like this XML example:

<vblock start="125" end="550"> 0xFFFFFABCDEFAFDADADADAFAFADADADAFADADAFAFAFADDAFAFAD................</vblock>
<vblock start="560" end="620"> 0xFEEEEAFFFFABCDEFAFDADADADAFAFADADADAFADADAFAFAFADDAFAFAD......</vblock>

So inserting a vote into block "125" could be done while simultaneously deleting some of the padding characters "....", and therefore the whole file rarely needs a re-write.
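Continuing the sketch above (REC as before; the block bookkeeping here is invented, and a real file would need a small index of block offsets and fill levels):

def insert_into_block(f, block_start, block_len, used, voter_id, vote):
    # Insert one sorted record into a padded block, consuming padding
    # instead of growing the file; "used" is the byte count of live records.
    if used + REC.size > block_len:
        raise ValueError("block full - rewrite or split it")
    f.seek(block_start)
    data = bytearray(f.read(used))
    new = REC.pack(voter_id, vote)
    pos = 0
    while pos < used and data[pos:pos + 4] < new[:4]:  # big-endian IDs sort bytewise
        pos += REC.size
    data[pos:pos] = new              # splice the record in
    f.seek(block_start)
    f.write(bytes(data))             # rewrite only this one block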

I don't want to waste a large amount of time optimizing this, as a fast C-level copy in one library might make a merge sort far faster than a Python-interpreter-based blank-space shrink/expand; but do any of the possible optimizations I just gave suggest that one standard Python library might be better suited than another?

Thanks for the help.  :)
