Dear fellow fedora users,

If I have a data file called 15.dat with the following content:

$ cat 15.dat
1
3
1
0
2
6

and I want to find the minimum, first quartile, median, third quartile and
maximum (the five-number summary), I can use datamash like this:

$ cat 15.dat | datamash min 1 q1 1 median 1 q3 1 max 1
0       1       1.5     2.75    6

Q3 is reported as 2.75, but if we split the sorted data in half and take the
median of the upper half (2, 3, 6), the result is 3.

$ sort 15.dat
0
1
1
2
3
6

$ cat GF19.dat
14
0
4
0
0
1
1
7
1
0
3
1
2
0
$

$ sort GF19.dat
0
0
0
0
0
1
1
1
1
14
2
3
4
7
$
This order is incorrect: 14 is the largest value, but without options sort
compares the lines as text, so "14" lands before "2".
We use -n for a numeric sort:

$ sort GF19.dat -n
0
0
0
0
0
1
1
1
1
2
3
4
7
14

The numeric sort works, but datamash again reports q3 as 2.75, while by hand
(the median of the upper half 1, 1, 2, 3, 4, 7, 14) it is 3.

$ cat GF19.dat | datamash min 1 q1 1 median 1 q3 1 max 1
0       0       1       2.75    14
$
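
For what it is worth, 2.75 is what we get for both files if the quartile is
taken by linear interpolation at the 0-based position (n-1)*p in the sorted
data, instead of as the median of the upper half. Whether datamash works
exactly this way is an assumption here, not something checked against its
documentation, and interp_q is a made-up helper name. A minimal sketch:

```shell
# interp_q: hypothetical helper (not part of datamash). Reads one number per
# line on stdin and prints q1 and q3 by linear interpolation at the 0-based
# position (n-1)*p in the sorted data. That datamash uses this rule is an
# assumption that merely matches the outputs shown above.
interp_q() {
    sort -n | awk '
        # quantile p of the sorted values a[0..i-1]
        function q(p,    pos, lo, frac) {
            pos = (i - 1) * p
            lo = int(pos)                    # index just below the position
            frac = pos - lo                  # how far to move toward a[lo+1]
            if (lo + 1 >= i) return a[i - 1] # position is the last value
            return a[lo] + frac * (a[lo + 1] - a[lo])
        }
        { a[i++] = $1 }
        END { if (i > 0) print q(0.25), q(0.75) }'
}

# The 14 values of GF19.dat
printf '%s\n' 14 0 4 0 0 1 1 7 1 0 3 1 2 0 | interp_q
```

For the GF19.dat values this prints 0 2.75: the position 13*0.75 = 9.75 falls
three quarters of the way between the sorted values 2 and 3.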

Using the sort and awk one-liner from

https://unix.stackexchange.com/questions/13731/is-there-a-way-to-get-the-min-max-median-and-average-of-a-list-of-numbers-in

we can get the min, max, median and average:

sort -n | awk '{a[i++]=$0;s+=$0}END{print a[0],a[i-1],(a[int(i/2)]+a[int((i-1)/2)])/2,s/i}'

How can we find q1 and q3 this way to generate the five-number summary, and
does it give 3 for q3 for both files? I would like to use datamash, but why
does it output 2.75 and not 3?

$ cat 15.dat | sort -n | awk '{a[i++]=$0;s+=$0}END{print a[0],a[i-1],(a[int(i/2)]+a[int((i-1)/2)])/2,s/i}'
0 6 1.5 2.16667
$

It outputs min, max, median and average. The average is optional; only min,
q1, median, q3 and max are needed.
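
One way the one-liner might be extended is sketched below. It assumes the
"by hand" rule used above: q1 is the median of the lower half of the sorted
data and q3 the median of the upper half, with the middle value left out of
both halves when the count is odd. five_num is a made-up helper name, and
this is only a sketch of that rule, not what datamash does:

```shell
# five_num: hypothetical helper. Reads one number per line on stdin and
# prints min, q1, median, q3, max, with q1 and q3 taken as the medians of
# the lower and upper halves of the sorted data.
five_num() {
    sort -n | awk '
        # median of the sorted values a[lo..hi] (1-based, inclusive)
        function med(lo, hi,    n) {
            n = hi - lo + 1
            return (a[lo + int((n - 1) / 2)] + a[lo + int(n / 2)]) / 2
        }
        { a[++i] = $1 }
        END {
            if (i == 0) exit 1
            # size of each half; the middle value is excluded when i is odd
            h = (i > 1) ? int(i / 2) : 1
            print a[1], med(1, h), med(1, i), med(i - h + 1, i), a[i]
        }'
}

# The 14 values of GF19.dat
printf '%s\n' 14 0 4 0 0 1 1 7 1 0 3 1 2 0 | five_num
```

For the GF19.dat values this prints 0 0 1 3 14, and for the six sorted values
of 15.dat (0 1 1 2 3 6) it prints 0 1 1.5 3 6, so q3 is 3 in both cases.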

Thank you in advance,


Antonio
