On 04/05/2023 12:09, Jim Melton wrote:
You are confusing data representation with data presentation. The flaws in Excel are NOT
issues with the data format. As long as the data format clearly and consistently
represents the content, the representation is "good".

If you want to overcome limitations in Excel's presentation (import, 
interpretation), then that's an Excel issue. You can overcome it by manually 
doing the import and explicitly asserting the data type of each column, or you 
can create something more custom.
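A minimal sketch of that "manual import", done in Python rather than Excel: every column gets a converter declared up front, so nothing is guessed. The three-column layout and the name `read_typed_csv` are made up for illustration.

```python
import csv
import io

# Hypothetical schema: one converter per column, in column order.
# Nothing is inferred; a value that doesn't fit its converter raises ValueError.
SCHEMA = [float, int, str]

def read_typed_csv(text, schema):
    """Parse CSV text, applying schema[i] to column i of every data row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first row holds the column names
    rows = []
    for rowno, fields in enumerate(reader, start=1):
        if len(fields) != len(schema):
            raise ValueError(f"data row {rowno}: expected {len(schema)} fields, got {len(fields)}")
        rows.append([convert(value) for convert, value in zip(schema, fields)])
    return header, rows

data = "power,channel,label\n0.1212,0,12.2021\n0.01444,1,Cygnus-B\n"
header, rows = read_typed_csv(data, SCHEMA)
# The label "12.2021" stays a string because column 3 is declared as str.
print(rows)
```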

Realize that more and more your data is likely to be consumed by some other 
data science tool (R, Python/numpy/pandas, etc.) and you quickly see how 
pushing Excel issues into the data representation layer is a losing proposition.

---
Jim Melton
For myself, I'm not actually committed to any particular text format, but .csv was helpful for some of the "customers" of my apps, because they're used to doing "casual" science extraction using Excel. For relatively low-rate/low-volume data, I find that text formats let me explore the data for first-order sanity-checking, and they also allow processing with a number of different tools without having to bring in great lumbering packages and libraries.

I can process my .csv data with AWK, ad-hoc Python, GnuPlot, Octave, and R without too much fuss, because the columns are regular and in known places, and typically floating-point or integers.
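Because the columns sit in known places, the processing really can stay tiny. As an illustration, the AWK one-liner `awk -F, '{ s += $2 } END { if (NR) print s / NR }' log.csv` (mean of the second column) translates to a few lines of plain Python; the sample log below is made up.

```python
def column_mean(lines, column, sep=","):
    """Mean of one numeric column, addressed by 0-based position (AWK's $2 is column 1)."""
    total = 0.0
    count = 0
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines (the AWK version would count them as 0)
            continue
        total += float(line.split(sep)[column])
        count += 1
    return total / count if count else float("nan")

# Made-up log: Unix timestamp, integrated power
log = ["1683158400,2.0\n", "1683158460,4.0\n", "1683158520,6.0\n"]
print(column_mean(log, 1))
```

This naive `split` only works because the fields are plain numbers with no quoting, which is exactly the regularity described above.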

When you move towards data that has much higher rates/volumes and more exotic semantics, like pulsar processing, you start to abandon text. But 99% of my "user community" are amateur science types whose experiments are somewhat ad-hoc and usually confined to simple observing modes, where low-rate textual logging makes considerable sense...




-----Original Message-----
Sent: Thursday, May 4, 2023 01:40
To: discuss-gnuradio@gnu.org
Subject: [EXTERNAL] Re: Getting GPS data into stream

Hey Marcus,

as you say, for a lot of science you don't get high rates – so I'm really less worried
about that. I'm more worried about Excel interpreting some singular data point as a date;
or, as soon as we involve textual data, all the fun with encodings,
quoting/delimiting/escaping… (not to mention that an Excel set to German might interpret
different things as numbers than a North American one).
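The German-vs-North-American point is easy to make concrete: the same characters denote different numbers depending on which decimal and grouping separators the reader assumes. A sketch with a hypothetical `parse_number` helper that takes the separators as explicit arguments instead of reading them from the system locale:

```python
def parse_number(text, decimal_sep=".", group_sep=","):
    """Parse a number using explicitly chosen separators, not the OS locale."""
    # Drop grouping separators first, then map the decimal separator to '.'
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return float(cleaned)

# North American convention: '.' is the decimal point, ',' groups thousands
us = parse_number("1.234", decimal_sep=".", group_sep=",")
# German convention: ',' is the decimal point, '.' groups thousands
de = parse_number("1.234", decimal_sep=",", group_sep=".")
print(us, de)  # same text, two different values
```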

I wish there were just one good CSV standard that tools adhered to. Alas, that's not the
case, and Excel especially has a habit of autoconverting input and losing data at that
point. So I'm looking for an alternative that has these well-defined constraints and isn't
focused on hierarchical data (JSON, YAML, XML), far too verbose though excellent to query
with command line tools (XML), or completely impossible to parse correctly, as human or
parser, in its full beauty (YAML)… Just some tabular data notation that's textual,
appendable, and not a guessing game for the reading tool.
We could just canonicalize calling all our files

marcusdata.utf8.textalwaysquoted.iso8601.headerspecifies_fieldname_parentheses_type.csv

but even that wouldn't solve the issue of Excel seeing an unquoted 12.2021 and deciding
the field is about Christmases past.
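Part of that file-name convention can at least be made machine-checkable. The sketch below, using Python's standard `csv` module, writes a header of `name(type)` fields, quotes every non-numeric field, and reads the file back by trusting the declared types rather than guessing. The header syntax, the three type names, and the helper names are assumptions for illustration; and, as noted above, none of this stops a spreadsheet from ignoring the quotes.

```python
import csv
import io
import re

# Hypothetical header convention taken from the file name: "fieldname(type)".
CONVERTERS = {"float": float, "int": int, "str": str}
HEADER_RE = re.compile(r"^(?P<name>.+)\((?P<type>\w+)\)$")

def write_convention_csv(fh, columns, rows):
    """columns: list of (name, type) pairs. Non-numeric fields are always quoted."""
    writer = csv.writer(fh, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow([f"{name}({typ})" for name, typ in columns])
    writer.writerows(rows)

def read_convention_csv(fh):
    """Read the header, then convert every field using the type it declares."""
    reader = csv.reader(fh)
    names, converters = [], []
    for field in next(reader):
        match = HEADER_RE.match(field)
        if match is None:
            raise ValueError(f"header field without a (type): {field!r}")
        names.append(match["name"])
        converters.append(CONVERTERS[match["type"]])
    rows = [[conv(value) for conv, value in zip(converters, row)] for row in reader]
    return names, rows

buf = io.StringIO()
columns = [("power", "float"), ("channel", "int"), ("label", "str")]
write_convention_csv(buf, columns, [[0.1212, 0, "12.2021"], [0.01444, 1, 'say "hi", ok']])
buf.seek(0)
names, rows = read_convention_csv(buf)
print(buf.getvalue())
```

For real files, opening with `encoding="utf-8", newline=""` covers the "utf8" part of the name as well.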

So, maybe we just do some rootless JSON format that starts with a SigMF object describing
the file and its columns, and then is basically just a sequence of JSON arrays:

[ 1.212e-1, 0, "Müller", 24712388823 ]
[ 1.444e-2, 1, "📡🔭  \"👽\"!", 11111111111 ]
[ 2.0115e-1, 0, "Cygnus-B", 0 ]

(I'm not even sure that's not valid JSON; gut feeling tells me we should be putting []
around the whole document, but we don't want that for streaming purposes. ECMA-404 doesn't
seem to *forbid* it.)
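Strictly speaking, ECMA-404 defines a JSON text as a single value, so the file as a whole is not one JSON text; every line is, though, and this one-value-per-line layout is the convention usually called JSON Lines or NDJSON (RFC 7464 describes a related variant that separates values with a record-separator character). A minimal Python sketch of writing and reading such a stream; the metadata dictionary is a made-up stand-in, not the actual SigMF schema.

```python
import io
import json

def write_stream(fh, metadata, rows):
    """Write one metadata object, then one JSON array per line; no enclosing [ ]."""
    # json.dumps escapes any newline inside strings, so each value stays on one line
    fh.write(json.dumps(metadata, ensure_ascii=False) + "\n")
    for row in rows:
        fh.write(json.dumps(row, ensure_ascii=False) + "\n")

def read_stream(lines):
    """Return (metadata, rows); every non-blank line is parsed as its own JSON value."""
    it = iter(lines)
    metadata = json.loads(next(it))
    rows = [json.loads(line) for line in it if line.strip()]
    return metadata, rows

# Hypothetical stand-in for a SigMF-style header describing the columns.
meta = {"columns": [{"name": "power", "type": "float"},
                    {"name": "flag", "type": "int"},
                    {"name": "label", "type": "string"},
                    {"name": "count", "type": "int"}]}
rows = [[0.1212, 0, "Müller", 24712388823],
        [0.01444, 1, '📡🔭  "👽"!', 11111111111]]

buf = io.StringIO()
write_stream(buf, meta, rows)
buf.seek(0)
meta_back, rows_back = read_stream(buf)
print(buf.getvalue())
```

Since `jq` reads a stream of concatenated JSON values, something like `jq -c 'select(type == "array")' data.jsonl` would keep only the data rows and skip the metadata object.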

That way, we get the metadata in a format that's easy for simpler tools to skip, but
trivial to parse with the right tools (I've grown to like `jq`), and the data in a
well-defined format. Sure, you still can't dump that into Excel, but you know what, if it
comes down to it, we can have a Python script that takes these files and actually converts
them to valid XLSX without the misconversion footguns, and that same tool could also be
run in a browser for those having a hard time executing Python on their machines.

Cheers,
Marcus
On 03.05.23 23:05, Marcus D. Leech wrote:
On 03/05/2023 16:51, Marcus Müller wrote:
Do agree, but really don't like CSV: too underspecified a format, too many ways that it
comes back to bite you (aside from a thousand SDR users writing emails that their PC
can't keep up with writing a few MS/s of CSV…)
I like CSV because you can hand your data files to someone who doesn't have a complete
suite of astrophysics tools, and they can slurp it into Excel and play with it.

How important is plain-textness in your applications?
I (and many others in my community) tend to throw ad-hoc tools at data from ad-hoc
experiments. In the past, I used a lot of AWK to post-process data, and these days, I use
a lot of Python. Text-based formats lend themselves well to this kind of processing.
Rates are typically quite low: logging an integrated power spectrum a few times a minute,
for example.
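At those rates, an appendable text log costs essentially nothing. A sketch, assuming a hypothetical layout of one line per integration: an ISO 8601 UTC timestamp followed by the power values. Opening the file in append mode means each new spectrum is simply added to the end.

```python
from datetime import datetime, timezone

def format_log_line(timestamp, powers):
    """One CSV line: ISO 8601 UTC timestamp, then each power value."""
    if timestamp.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    stamp = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # repr() gives the shortest string that round-trips the float exactly
    return ",".join([stamp] + [repr(float(p)) for p in powers]) + "\n"

def append_log(path, timestamp, powers):
    """Append a single line; existing contents of the file are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_log_line(timestamp, powers))

# Example: format a (made-up) 4-bin spectrum stamped with the current UTC time
print(format_log_line(datetime.now(timezone.utc), [1.5, 2.25, 3.0, 2.75]), end="")
```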

There are other observing modes where text-based formats aren't quite so obvious, like
pulsar observations, where filterbank outputs might be recorded at 10s of kHz and then
post-processed with any of a number of pulsar tools.
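A back-of-the-envelope comparison shows why text stops being attractive at filterbank rates. The sketch serializes one spectrum both as packed 32-bit floats (Python's standard `array` module) and as a CSV line with 7 significant digits, then scales by a rate. The 256 channels and 20,000 spectra per second are hypothetical figures, chosen only to sit in the "10s of kHz" range.

```python
from array import array

def bytes_per_spectrum(values):
    """Return (binary_bytes, text_bytes) for one spectrum serialized two ways."""
    binary = array("f", values).tobytes()  # 'f' is a C float, 4 bytes on common platforms
    text = (",".join(f"{v:.6e}" for v in values) + "\n").encode("ascii")
    return len(binary), len(text)

CHANNELS = 256               # hypothetical channel count
SPECTRA_PER_SECOND = 20_000  # hypothetical rate

spectrum = [1.0] * CHANNELS
binary_len, text_len = bytes_per_spectrum(spectrum)
print(f"binary: {binary_len * SPECTRA_PER_SECOND / 1e6:.2f} MB/s")
print(f"text:   {text_len * SPECTRA_PER_SECOND / 1e6:.2f} MB/s")
```

Negative values add a sign character per field, and formatting text also costs CPU time for every value, which is what bites at MS/s rates.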

In all of this, part of the "science" is extracted in "real-time" and part in
post-processing.


Best,
Marcus



