Johny wrote:
I have a text and would like to split it into smaller parts,
say 100 characters each. But if the 100th character is not a
blank (i.e. it falls inside a word), the part must be shorter than
100 characters, because the word itself cannot be split.
Each smaller part must contain only whole (unsplit) words.
I was thinking about a regex but do not know how to find the correct
regular expression.

While I suspect you can come close with a regular expression:

  import re, random
  size = 100
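  # greedily take up to `size` characters, then back off to the
  # nearest word boundary ('.' does not match newlines)
  r = re.compile(r'.{1,%i}\b' % size)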
  # generate a random text string with a mix of word-lengths
  words = ['a', 'an', 'the', 'four', 'fives', 'sixsix']
  data = ' '.join(random.choice(words) for _ in range(200))
  # for each chunk of 100 characters (or fewer
  # if on a word-boundary), do something
  for bit in r.finditer(data):
    chunk = bit.group(0)
    print("%i: [%s]" % (len(chunk), chunk))

it may have an EOF fencepost error, so you might have to clean up the last item. My simple test seemed to show it worked without cleanup, though.
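
To make the fencepost concern concrete, here is a minimal sketch that wraps the same pattern in a helper and checks the result. The name split_on_words and the sample inputs are hypothetical, chosen only for illustration. It assumes the text contains no newlines and no word longer than the size limit.

```python
import re

def split_on_words(text, size=100):
    """Return pieces of at most `size` characters, each ending on a word boundary."""
    pattern = re.compile(r'.{1,%d}\b' % size)
    return [m.group(0) for m in pattern.finditer(text)]

# Deterministic sample text that ends in a word character.
words = ['a', 'an', 'the', 'four', 'fives', 'sixsix']
text = ' '.join(words * 30)
chunks = split_on_words(text, 100)

# Every piece respects the size limit.
print(all(len(c) <= 100 for c in chunks))
# Nothing is lost when the text ends in a word character.
print(''.join(chunks) == text)
# When the text ends in punctuation there is no \b after the last
# character, so the trailing "." is silently dropped.
print(split_on_words("hello.", 100))  # ['hello']
```

So the cleanup is only needed when the text ends in a non-word character such as a period or a space. Comparing ''.join(chunks) against the original text is a cheap way to detect that case.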

-tkc



