This is a more business-logic as opposed to syntax/technique question than
usually shows up on this list, but hopefully folks won't mind.

I started off to ask "what's the best way to find contiguous sequences of
numbers in a list?" and then realized that maybe I should answer the "What
are you trying to achieve?" question first.

Short version:
Tactical problem:  Given a list of sorted numbers, how best to identify
contiguous sequences within the list?
Strategic problem:   How best to put a chunked-up file back together given
a (possibly incomplete) set of chunks and the index number of those chunks?

Long version:
I have a file that's been broken up into chunks of roughly standard size
and I want to put the file back together.  I have the chunk number for each
chunk so I know what order they should be concatenated in.  I may not have
all the chunks, but I don't want to wait for all of them to arrive before
starting the assembly, so I'll need to be able to be able to generate
intermediate products and add to them later when the assembly process is
re-run.

My database has filepath and chunk-number data for all chunks (even the
ones that haven't arrived yet; the filepaths are predictable), so I know
where the chunks will be and what order to assemble them in.

My first thought is to get the sorted list of chunk numbers for all the
chunks that I currently have, locate contiguous sub-sequences, and assemble
those sub sequences, then assemble the sub-sequences once they become
contiguous.  Something like this:

Available chunk nums:  '(1 2 3 5 7 200 201 202 203)

Group them by contiguous:  '( (1 2 3) (5) (7) (200 201 202 203))

Generate superchunks:  1-3, 200-203

Later...

Available chunk nums:  '(1-3 4 5 7 190 193 200-203)

Group them by contiguous:  '( (1-3 4 5) (7) (190) (193) (200-203))

Merge 4 and 5 into 1-3, rename 1-3 as 1-5.

...etc

I realized I don't actually know a simple way to identify contiguous
sub-sequences.  I came up with the following, but it feels clumsy.

(define-values (n final)
  (for/fold ((prev (car nums))
             (acc '())
             )
            ((n (cdr nums)))
    (values n
            (if (= n (add1 prev))
               (cons n acc)
               (begin
                  (set! result (cons (reverse acc) result))
                  (list n))))
    ))
(cons (reverse final) result)

Given this:  '(1 2 3 5 7 200 201 202 203)
It yields this: '((200 201 202 203) (7) (5) (2 3))

Which is fine -- I don't care about the order of the sublists within the
overall list, as long as each sublist is itself sorted ascending.

Does anyone have any advice to offer, on either the tactical or strategic
problem?

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to