Re: [Pharo-users] Can it do this way ?

Richard O'Keefe Mon, 07 Sep 2020 19:24:19 -0700

There are two quite different questions.
(1) Where may dashes occur in a real ISBN-10?
(2) What does Exercism require in the specification and check in the
test cases?


For (1) the rules are

Each ISBN consists of 5 elements with each section being separated by
spaces or hyphens. Three of the five elements may be of varying length:

   - *Prefix element* – currently this can only be either 978 or 979. It is
   always 3 digits in length
   - *Registration group element* – this identifies the particular country,
   geographical region, or language area participating in the ISBN system.
   This element may be between 1 and 5 digits in length
   - *Registrant element* - this identifies the particular publisher or
   imprint. This may be up to 7 digits in length


   - *Publication element* – this identifies the particular edition and
   format of a specific title. This may be up to 6 digits in length
   - *Check digit* – this is always the final single digit that
   mathematically validates the rest of the number. It is calculated using a
   Modulus 10 system with alternate weights of 1 and 3.

An ISBN-10 does not have the three-digit prefix.  So we have
  [0-9]{1,5}   -- prefix
  [0-9]{1,7}   -- registrant
  [0-9]{1,6}   -- publication
  [0-9X]
      -- check digit

As an examplw, "Standard C++ IOStreams and Locales" by Langer & Kreft has
ISBN-10 0-201-18395-1
ISBN-13 9780201183955
so I shall assume the separators are optional.
/^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/
Of course the elements cannot all have their maximum length at the same
time.  In AWK I would write
   x = a_putative_ISBN_10
   y = x
   gsub(/[- ]+/, "", y)
   if (x ~ /^[0-9]{1,5}[- ]?[0-9]{1,7}[- ]?[0-9]{1,6}[- ]?[0-9X]$/ \
    && y ~ /^[0-9]{9,9}[0-9X]$/ \
   ) {
       x *might* be valid, we still need to check the checksum
   }

For (2), there appear to be no restrictions on where dashes may occur
or how many: "These may be communicated with or without hyphens".
Exercism doesn't allow spaces.

Regular expressions are elegant in their own way, BUT for this problem
they are (a) excessive, (b) inefficient, and (c) insufficient.

   digit count := 0.
   check sum := 0.
   for each character c of the string
       if c is not a hyphen then
           if c is a digit then
               digit value := c's value as a digit
           else if c is X and digit count = 9 then
               digit value := 10
           else
               return false.
           digit count := digit count + 1.
           if digit count > 10 then return false.
           check sum := (11 - digit count) * digit value + check sum.
    return check sum mod 11 is zero.

Part of the insight here is "don't DO it, just PRETEND you did."
That is, instead of copying the string without the hyphens,
just ignore the hyphens as they turn up.
Another part is "if you are only going to use it once, don't store it."
That is, we need a digit's value just once, in the update to check sum,
so we should compute it just before we need it, not store it.

Now the pseudo-code above is classic sequential imperative coding.

Classic functional coding does something like
   let no_dashes = filter (/= '-') (explode string) in
   length no_dashes = 10 and
   let check = last no_dashes in
   (is_digit check or check = 'X') and
   all is_digit (take 9 no_dashes) and
   let xval c = if x = 'X' then 10 else digit_value c in
   dot (map xval no_dashes) [10,9..1]) mod 11 = 0

This pseudo-code translates nicely to Smalltalk too.
You might want to add

SequenceableCollection>>
  with: other inject: initial into: aBlock
    |r|
    r := initial.
    self with: other do: [:x :y |
      r := aBlock value: r value: x value: y].
    ^r
  dot: other
    ^self with: other inject: 0 into: [:acc :x :y | x*y + acc]

(These methods are so obvious that it would be absurd to claim
any intellectual property rights to them.)

I also have "fusion" methods like
SequenceableCollection>>
from: start to: finish allSatisfy: testBlock
  self from: start to: finish do: [:each |
    (aBlock value: each) ifFalse: [^false]].
  ^true

so that ((seq copyFrom: a to: z) allSatisfy: blk)
can be done as (seq from: a to: z allSatisfy: blk)
without making a copy.

Fusion methods are useful because Smalltall compilers
don't work as hard at eliminating intermediate data
structures as functional language compilers.  (Having
other priorities.)

   (isdigit (last no_dashes
           return false if digit count > 10.




On Tue, 8 Sep 2020 at 03:02, Steffen Märcker <merk...@web.de> wrote:

> No problem. I am not knowledgeable about isbn numbers. At which places may
> a dash occur?
>
> Kind regards,
> Steffen
>
> 07.09.2020 16:18:22 Roelof Wobben via Pharo-users <
> pharo-users@lists.pharo.org>:
>
> > Op 6-9-2020 om 10:07 schreef Steffen Märcker:
> >> Maybe this is a naive question, but can you just split the task into the
> >> following two?
> >>
> >> 1. Check whether whether the string is syntactically an ISBN number.
> >> This can be done, e.g., using a regex.
> >>
> >> 2. Check the the check character.
> >> Calculate the check character from the (now to be known) syntactically
> >> valid string.
> >>
> >> ISBNValidator>>isValidISBN: aString
> >> ^(self isSyntacticallyValid: aString) and: [self isCheckCharacterValid:
> >> aString]
> >>
> >> Kind regards,
> >> Steffen nder if the code will not be too big. I learned that it is good
> >>
> > Sorry to respond not earlier but your respons seems to be in the spam
> folder of my provider.
> >
> > I could do that but if very bad in regex so I do not know a regex which
> van validate 123456789 or 123-456-78-9
> >
> > Roelof
> >
>
>

Re: [Pharo-users] Can it do this way ?

Reply via email to