Re: Kind'a TL, but please DR - Need your thoughts

James Schneider Tue, 02 Feb 2016 00:48:53 -0800

On Mon, Feb 1, 2016 at 10:49 PM, Mario R. Osorio <nimbiot...@gmail.com>
wrote:


>> So this is effectively a feed aggregation engine. I would recommend
>> having a separate daemon running per media source, so that issues with one
>> media source do not affect the operations of another.
>>
>
>  I never would have thought of this application as a feed aggregation
> engine, but I'm not really sure it fits the definition, will be digging
> deeper into this
>

Maybe not in the traditional sense of pulling in RSS, Twitter, Facebook,
etc., but it sounds like you want to perform the same task with other
message types like email and an SMS gateway. There may be applications that
exist already in the SaaS space if your SMS gateways are publicly
accessible and support remote integration options (I assume they do since
you mentioned it as one of your sources). There's probably something out
there for ripping out email content and dropping it in a database as well,
or serializing that data into a format that can be easily parsed and
dropped into the database by a small integration script (perhaps JSON to
Postgres?). I'm not familiar with any, but I'd be shocked if they don't
exist. You can look at that as an alternative to rolling your own solution,
which may end up cheaper and easier in the long run since you wouldn't be
responsible for maintaining that portion of the system. The folks who run
those systems/applications are more familiar with the edge cases and
probably already have them handled. Worth a Google anyway to possibly avoid
re-inventing an already perfectly round wheel.
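
If you do end up writing that small integration script yourself, it could
look something like the rough sketch below. It uses only the Python standard
library to reduce a raw email to a JSON document. The field names (source,
sender, subject, body) and the sample message are made up for illustration,
and the actual insert into Postgres is left out.

```python
import json
from email import message_from_string, policy


def email_to_record(raw_message: str) -> dict:
    """Reduce a raw email message to a flat, JSON-friendly dict."""
    msg = message_from_string(raw_message, policy=policy.default)
    # Prefer the plain-text part; fall back to an empty body if there is none.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content().strip() if body_part is not None else ""
    return {
        "source": "email",  # hypothetical tag naming the media source
        "sender": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "body": body,
    }


# Made-up sample message, just to exercise the function.
raw = (
    "From: alice@example.com\n"
    "To: inbox@example.com\n"
    "Subject: status\n"
    "\n"
    "All sensors nominal.\n"
)
record = email_to_record(raw)
payload = json.dumps(record)  # ready to hand to whatever loads the database
```

An SMS gateway callback could be boiled down to the same dict shape, so the
loader on the database side only has to understand one format.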


>
>
>> It would be possible to do everything with one daemon, but would be much
>> trickier to implement.
>>
>
> I agree 120%
>
>
>
>>> A second(?) python daemon would be waiting for those messages to be in
>>> the DB, process them, act accordingly to the objective of the application,
>>> and update the DB as expected. This process(es) might include complicated
>>> and numerous mathematical calculations, which might take seconds and even
>>> minutes to process.
>>>
>>
>> Implementation here is less critical than your workflow design.
>>
>
> I agree, yet this is the heart of my application. I understand it
> basically only involves the (web) application and the DBMS w/o any other
> external element; It is here where the whole shebang happens, but it might
> just be the DB application programmer in me though.
>

Ah, now I see the bias. ;-) I'm a network administrator by trade, so I can
fix any problem with the right router. I totally get it. :-D

I mentioned workflow for exactly the reason you pointed out: this is the
heart of the application, and if it's wrong, the rest of the system fails.


>> This could be implemented as a simple cron script on the host that runs
>> every few minutes. The trick is to determine whether or not a) records have
>> already been processed, b) certain records are currently processing, c)
>> records are available that have yet to be processed/examined. You can use
>> extra DB columns with the data to flag whether or not a process has already
>> started examining that row, so any subsequent calls to look for new data
>> can ignore those rows, even if the data hasn't finished processing.
>>
>
> You gave me half my code there, but I'm not sure I want to trust a cron
> job for that. I know there are plenty of other options to do the dirty
> laundry here, such as queues, signals, sub-processes (and others?) but I
> kind'a feel comfortable leaving that communication exchange to the DBMS
> events as I see it; who would know better when 'something' happened but the
> DBMS itself?
>

For long-running processes, you'll want flags that are persistent,
especially if there is a failure along the way (process crash, power loss,
etc.). I wouldn't necessarily trust a long-running but transient process in
RAM to complete (although it may oftentimes be a necessary evil).
Short-running processes are usually fine, especially if they can be easily
recreated in the event of a processing failure. The DBMS also serves as the
central information point for what data has/hasn't been processed, which I
suppose counts as process management to some degree.


>
>>> The reason I want to do the application using Django is that all this HAS
>>> to have multiple web interfaces and, at the end of the day, most media will
>>> come through web, and have to be processed as http requests. Also, Django
>>> gives me a frame to make this work better organized and clean and I can
>>> make the application(s) DB agnostic.
>>>
>>
>>
>
>> What do you mean by 'multiple web interfaces'? You mean multiple daemons
>> running on different listening ports? Different sites using the sites
>> framework? End-user browser vs. API?
>>
>
> A combination of all that and probably a bit more ... This is something I
> left out trying to evade the TL;DNR responses: I'm considering having this
> app return nothing but probably json or xml code for other applications to
> "feed" from it. (here is that feed word again!), there are a myriad of
> possible ways this application can be used. This, BTW, would leave all the
> HTML/CSS/JavaScript/etc. "problems" to someone else ... it might just be the
> DB app programmer in me trying to avoid dealing with web issues, or I might
> just be trying to make things harder for me; this is something I haven't
> really thought much about.
>

If that's the route you are trying to go, build the API first. The Django
REST Framework is an excellent tool. If you want a human-friendly element
later, you can slap in your own web front-end and take advantage of the API
calls you've already created. I feel you on the JS/HTML/CSS issue, though.
I don't know who invented CSS, but it is a completely different mindset
from application or database programming, you know, where we expect
objective, consistent, and deterministic results. ;-)


>>> Wanting the application to be DB agnostic does not mean that I don't have
>>> a choice: I know I have many options to communicate among different python
>>> processes, but I prefer to leave that to the DBMS. Of the open source DBMS
>>> I know of, only Firebird and PostgreSQL have events that can provide the
>>> communication between all the processes involved. I was able to create a
>>> very similar application in 2012 with Firebird, but this time I am being
>>> restricted to PostgreSQL, which I don't oppose at all. That application
>>> did not involve http requests.
>>>
>>
>> Prefer to leave what to the DBMS? The DBMS is responsible for storing and
>> indexing data, not process management. Some DBMS' may have some tricks to
>> perform such tasks, but I wouldn't necessarily want to rely on them unless
>> really necessary. If you're going to the trouble of writing separate
>> listening daemons, then they can talk to whatever backend you choose with
>> the right drivers.
>>
>
> I understand I'm having the DBMS do some of the process management, but it
> only goes as far as letting other processes know there is some job to be
> done, not even what needs to be done. I don't think the overhead on the
> DBMS is going to be all that big.
>

I'd assume you are thinking of some kind of DBMS trigger functionality to
fire off other related processes. It may totally be appropriate for your
application. Most on this list will jump to Celery because it integrates
well with Django and probably is easier to introspect with regards to job
management. But hey, that may not even matter if you don't need to manage
that set of processes from the web. I'm sure the DB driver would have
support to interrogate those processes manually, though; you just don't get
the benefit of the ORM abstraction in that case, and it would make your
application much less DB-agnostic. I don't know enough about such features
to have a decent opinion either way.


>
> This whole application is an idea that's been in my mind for some 7 years
> now. I even got as far as having a working prototype. I was just starting
> to learn Python then and my code is a shameful non pythonic mess. But it
> worked. I used Firebird as my RDBMS, and all feeds (again?) would come in
> and out through an ad-hoc gmail account (with google voice for SMS
> messaging). I would get the input, process it and return the output within
> 10 to 40 seconds, with the average at around 20, which is satisfying if you
> consider the app is not really controlling the "medium". Of course, I never
> even considered any heavy testing as there were many limitations, the 500
> outgoing messages per day being just the first one.  It just proved my
> concept and served as a very good (and long) exercise in Python.
>

That's actually really good. Having a working baseline should make it easy
to improve, even if you end up rewriting the whole thing. The non-Pythonic
mess makes my eye twitch a bit, but I'm assuming that you'll be cleaning
that up along the way. It also has probably enlightened you to issues with
the existing workflow.


>
> I recently shared my thoughts with some close friends that linger around
> other branches of (IT related) knowledge and they liked the idea, hence
> the request for your input, for which I feel very much obliged.
>
> Thanks a BUNCH!
>
>
> ============
> DISCLAIMER!
> ============
> I do not mean to argue any of the ideas you and all others have shared
> with me, on the contrary; you have fed even more my curiosity and curiosity
> well managed usually turns into knowledge. I can't do different from
> thanking all of you for that gift.
>

I don't think you were being argumentative by any means. Start with what
you know, play to your strengths, and work outward from there. My only
other advice: keep it simple as long as you can. Occam's Razor proves true
for most of the programming (and life) problems I've faced (the simplest
solution is often the correct one).

-James

