On Fri, Aug 15, 2014 at 9:47 AM, AW <debian.list.trac...@1024bits.com> wrote:
> On Fri, 15 Aug 2014 09:11:19 +0900
> Joel Rees <joel.r...@gmail.com> wrote:
>
>  > When you're grep- or sed-searching a textual log file, you don't care
>  > whether all the log entries fit any particular relation or structure
>  > definition, and you don't have to think sideways to search on the
>  > keywords buried in the text of the actual log entry.
>
> Of course you think sideways...
> Step 1. Choose a log to view

Mixed logs. What then?
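
With files, that's just more names on the command line. Something
like (the pattern is only an example):

    # -h merges hits from several logs without filename prefixes
    grep -h 'oom-killer' /var/log/syslog /var/log/kern.log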

> Step 2. Decide which time frame you want to view.

Maybe I don't want to limit to a particular time frame, especially
when I'm trying to debug a problem that has been slowly corrupting
the logging database for I don't know how long.
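
And with files I can sweep the whole retained history, rotated and
compressed logs included, in one pass. A rough sketch, assuming the
usual Debian rotation names and a made-up corruption message:

    # zgrep reads the .gz rotations transparently; the glob picks
    # up syslog, syslog.1, syslog.2.gz, and so on
    zgrep -h 'marked as crashed' /var/log/syslog*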

> Step 3. Decide which column is important to you.

What columns? Who defined those columns? Why do I have to design a
database around all the unforeseeable conditions I will want to log,
many of them not errors or even warnings, and all the information I
want to record about them, before I can start coding and debugging
the application, which is how I find out what I want to log in the
first place?

And, again, what happens when a watchdog daemon can't get a socket
(heaven forbid a port) to the error logging daemon and wants to log
that fact? Now we're back to log files and we might just as well have
stuck with them in the first place.
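
The file fallback itself is a couple of lines of shell. A sketch,
with the daemon name and paths made up for illustration:

    # try the syslog socket first; if the logging daemon is down,
    # append to a plain file that needs nothing but a filesystem
    logger -t watchdogd "lost contact with backend" 2>/dev/null ||
        echo "$(date) watchdogd: lost contact with backend" \
            >> /var/log/watchdogd.fallback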

And if management wants them in a database, dump them to a database
after you've scanned through them to get an idea of any specific
columns you want to define beyond the free-form text bucket at the
end. But keep the logs in files and generate the database from the
files; otherwise, you're going to be stuck trying to log the fact
that you can't log because your database function is down or not yet
up, and that's going to happen a lot more often than trying to log
the fact that your file system is so corrupt you can't write the
logs.
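
That after-the-fact load is itself only a few lines of shell. A
rough sketch, assuming a syslog-style file and sqlite3 at hand:

    # split each line into timestamp, host, and free-form message,
    # tab-separated, then bulk-load the result
    awk '{ stamp = $1 " " $2 " " $3; host = $4;
           msg = ""; for (i = 5; i <= NF; i++) msg = msg $i " ";
           print stamp "\t" host "\t" msg }' /var/log/syslog \
        > /tmp/syslog.tsv
    sqlite3 logs.db 'CREATE TABLE IF NOT EXISTS syslog
                     (stamp TEXT, host TEXT, message TEXT);'
    printf '.mode tabs\n.import /tmp/syslog.tsv syslog\n' |
        sqlite3 logs.db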

> These are all relational searches.

You can design them, after the fact, as relational searches. And if
your design is good, it will catch a lot of similar searches. But you
still have to write down the queries if you want to use them again,
just like you have to write down the more complex grep queries if you
want to use them again.
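
In both cases the saved query is nothing more than a file. For the
grep side, a hypothetical saved search, good for version control
like any other script:

    #!/bin/sh
    # failed-logins: count failing source addresses in auth logs
    # usage: failed-logins [logfile ...]
    grep -h 'Failed password' "${@:-/var/log/auth.log}" |
        # $(NF - 3) is the "from" address in sshd's message
        awk '{ print $(NF - 3) }' |
        sort | uniq -c | sort -rn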

>  The fact that you decide as a human does
> not make the data non-relational.

Actually, the mathematician in me says, yes it does. No mathematical
model truly captures anything from the real world.

> It should be very clear that log data are
> strongly relational.

Only if there is a large text bucket at the end of most records.

> They conform to all the ideas regarding relational data, and you
> follow relational logic to retrieve the pared-down snippet of data
> you wish to view.

Only after you have had time to go back, analyze a few months or years
of logs, and design a database that fits.

> As far as keywords go, which column in an apache log shows the
> referrer?

You don't know unless you can see my httpd configuration files,
unless I happen not to have customized the logs very much. (And,
yes, I sometimes heavily customize the Apache logs to emphasize
stuff that needs to be seen in a specific application while
debugging a specific problem. Then I change the format again when
I'm done, because leaving it that way clutters the logs. And I leave
the format sitting in the configuration files in a comment, in case
I need to do that again.)
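
And for the record, the referrer's position is whatever the
LogFormat line says it is. The stock "combined" definition looks
something like this, with the referrer second from the end:

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    CustomLog /var/log/apache2/access.log combined

Move the %{Referer}i token and the "column" moves with it.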

You would have me design a new database and make the logs
discontinuous to do the same thing.

>  Which one shows the date?  Aren't these precisely keyword searches?

Depends on whether normalization makes them keywords. (See what I said above.)

> In fact, awk with grep usage is very similar to a database 'select' 
> statement...

Uhm, yeah, the early relational databases were little more than
constrained plaintext, with numeric indexes written as ASCII text
and searched with awk, sed, and grep. Then they started adding
specialty search functions, and then writing the indexes in binary.

Databases are a constrained use of text. Binary indexing and binary
blob fields are just optimizations.
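
It's easy to put the two side by side. Take an Apache combined log;
a select like

    SELECT request, status FROM access_log WHERE status = 404;

comes out in awk as

    # in the default combined format, field 9 is the status code
    # and field 7 the request path
    awk '$9 == 404 { print $7, $9 }' access.log

Same projection, same selection; the "schema" is whatever the line
format happens to be.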

> except the user must already know what the column headers are,

What headers?

You don't need headers in text logs. You want a date? You search
for a date. If you don't seem to be finding one, look at the log,
and you'll see the dates that are there; then you know what the
grep command should look like.
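
Concretely, with the stock syslog stamp at the front of each line:

    # syslog lines begin "Aug 15 09:47:01 hostname ...", so the
    # date anchors at the start of the line
    grep '^Aug 15' /var/log/syslog

One look at the file tells you what stamp to match; no schema
browser required.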

On the other hand, suppose you need to see some log where you wrote
out that the number of pink elephant toy queries seems to be greater
than the number of Pooh-Bear towel queries, and the managers think
that is meaningful because it probably means the customers this week
have been from a particular neighborhood, and they want to adjust
the signs and in-store sales accordingly. What columns in your log
database tell you that?
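
With the free-form text, the manager's question is two counts and
no schema change (the patterns and path here are made up, of
course):

    pink=$(grep -c 'pink elephant toy' /var/log/app/queries.log)
    pooh=$(grep -c 'Pooh-Bear towel' /var/log/app/queries.log)
    echo "pink elephant toys: $pink, Pooh-Bear towels: $pooh"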

And again, what columns do you look at when the whole system dies
before it can get up far enough to write to the log database?

> as that
> information is not available as it would be in an sql database...

But if you need the tabular data, you can parse the text logs and
generate it. Your parser can tell you when a table's definition
needs to be changed. You can add new tables for management. And your
logs are unaffected; they remain usable for your system purposes and
for the new things management dreams up.
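
The "tell you when the definition needs to change" part comes almost
for free. A rough sketch, assuming a log whose structured part is at
least ten space-separated fields ahead of the free text:

    # lines whose field count falls short of the expected shape
    # are candidates for a new column, or for a parser fix
    awk 'NF < 10 { printf "line %d looks off (%d fields): %s\n",
                   NR, NF, $0 }' /var/log/app/app.log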

Normalization has more to do with how you look at the data than it
has to do with how you store it.

-- 
Joel Rees

Be careful where you see conspiracy.
Look first in your own heart.

