Okay, since I wasn't clear the first time, let me try again. Sorry, I'm not a professional programmer; I'm a true beginner.

SHORT VERSION: What I want is Joseph's option 2b, if I understand him correctly. Given a string of the form :M260: I want to get a count of its occurrences in a single other string of the form :L000:W000:M260:B271:8:A:

(Incidentally, the first string, :M260:, came from a line like :L121:M260:B250:L000:, but saying so probably only confused the issue. Since I want to compare each colon-delimited "chunk" of the source string to the target string, I assume I will need to loop through each chunk in the source string, comparing each "chunk" to the target string.)

LONG VERSION: I think John W. Krahn's post is very close to what I was actually asking. John's solution showed me that the context of the problem may affect the choice of solution, so let me explain the entire problem, in case someone wants to look at the whole thing:

I am comparing a "source" set of oil well names to a "target" set of oil well names to find the closest matches. Each well name consists of two parts: a name (like "McAffey" or "Idd El Shargi") and a number (like B-1). I print out the 50 or so best-matching "target" names for each name in the "source" list.

I split the well name into "chunks" based on whitespace and punctuation. I also remove leading zeroes. If letters and numbers adjoin, I split those into separate chunks as well. I reduce each word in the well name to a Soundexed string and chain them together in a single string delimited by colons. I divide the well number into its separate numeric and alpha parts and include them in the string, but I don't Soundex them.
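
The chunking steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: Python is used because the thread names no language, `soundex` is a plain implementation of standard American Soundex (which may differ in edge cases from whatever Soundex routine is actually in use), and `make_key` is a hypothetical helper name. How digits inside the name part are handled is a guess; here they are kept as plain numeric chunks.

```python
import re

# Standard American Soundex digit groups (assumed variant).
_CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        _CODES[ch] = str(digit)

def soundex(word):
    """Return the 4-character Soundex code of an alphabetic word."""
    word = word.lower()
    out = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (out + "000")[:4]

def split_chunks(text):
    """Split on whitespace/punctuation and at letter/digit boundaries."""
    return re.findall(r"[A-Za-z]+|\d+", text)

def strip_zeroes(digits):
    return digits.lstrip("0") or "0"

def make_key(name, number):
    """Build a colon-delimited key: Soundexed name words, raw number parts."""
    parts = []
    for chunk in split_chunks(name):
        # Assumption: numeric chunks in the name are kept as-is, not Soundexed.
        parts.append(soundex(chunk) if chunk.isalpha() else strip_zeroes(chunk))
    for chunk in split_chunks(number):
        parts.append(chunk.upper() if chunk.isalpha() else strip_zeroes(chunk))
    return ":" + ":".join(parts) + ":"

print(make_key("Idd El Shargi", "B-01"))
```

Keeping the leading and trailing colons means every chunk in the key is delimited on both sides, which matters if the key is later searched as a string.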

I end up with data of the form

  :L520:T400:C000:S000:L200:8:
  :L520:T400:C000:S000:L200:8:
  :L520:T400:C000:S000:L200:24:E214:
  :L520:T400:C000:S000:M:24:E214:
  :L520:T400:C000:S000:L200:14:E214:
  :L520:T400:C000:S000:L200:14:E214:
  :L520:M260:C000:S000:L200:14:E214:
  :L520:T400:M260:S000:M260:14:E214:
  :L520:T400:C000:S000:L200:14:E214:

To compare one well name to another to see how well they match, I compared each "chunk" from each name in the "source" list to each "chunk" in each name in the "target" list in a doubly-nested loop, which is quite expensive. I accumulate a score, which is just the number of matches. Then I sort by the score and print the top 100.

I thought string comparison would eliminate the need to loop across the "target" string and make the whole thing run much faster, but I need to know how many times the source is found in the target. That's what I couldn't figure out how to do without looping.

I have also realized I need to do two separate comparison passes: one for the well name part (the Soundexed chunks), and a second pass on the high-scoring well names to compare their well number parts (the pure alpha or pure numeric chunks). I need to do this because the well number "chunks" give a lot of artificially high scores, since simple strings like "B" and "1" are pretty common in well numbers.

I think John Krahn's post is just about what I was asking for. I just need a way to process the Soundexed "chunks" in the first comparison, and only the non-Soundexed (pure alpha or pure numeric) "chunks" in the second comparison.

Thanks,

Scott

Scott E. Robinson
SWAT Team
UTC Onsite User Support
RR-690 -- 281-654-5169
EMB-2813N -- 713-656-3629


"R. Joseph Newton" <[EMAIL PROTECTED]>
03/12/03 07:19 PM

To: [EMAIL PROTECTED]
cc: Mark Anderson <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: COunting the number of times a string matches in another string


[EMAIL PROTECTED] wrote:

> Thanks, Rob and Mark, but I'm pretty sure I'm trying to do something a
> little different from a count hash. Each token in the candidate string
> needs to be compared separately to all the target strings, and then count
> the number of matches. So take any token out of the first string --
> :M260: for example -- and count its matches against one of the target
> strings, like :L520:M260:C000:S000:L200:14:E214:. I don't think a count
> hash does that??
>
> Thanks,
>
> Scott
>
> Scott E. Robinson
> SWAT Team
> UTC Onsite User Support
> RR-690 -- 281-654-5169
> EMB-2813N -- 713-656-3629

Hi Scott,

The first task of a programmer is to develop a clear specification for the functionality desired. It looks like you need to do some work here, because the specification is somewhat ambiguous. Right offhand, I can think of two interpretations for the functionality you describe:

1) You wish to get the count of occurrences, in each other line, of each line in some basis line. You indicate something about the first line. Is this line somehow distinct from the others, that it is used as a basis?

2) a) You wish to get the count, for each unique token in each line, of occurrences in each other line.
   b) You wish to get the count, for each token in each line, of occurrences in each other line.

Either of these would produce a lot more output, and require much more processing. Alternative b would also be redundant.

What precisely are you trying to get?

Joseph
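
One way to read option 2b, together with the two-pass idea from the message at the top, is sketched below. It is an illustration, not anyone's posted solution: Python is used only as an example language, and `chunks`, `is_soundex`, and `match_score` are hypothetical names. It tallies the target's chunks once, so each source chunk is then a single lookup instead of an inner loop over the target. The rule "one letter followed by digits means Soundexed" is an assumption drawn from the sample data.

```python
from collections import Counter

def chunks(key):
    """Split ':L520:M260:8:' into ['L520', 'M260', '8']."""
    return [c for c in key.split(":") if c]

def is_soundex(chunk):
    # Assumption: a Soundexed chunk is one letter followed by digits,
    # while well-number chunks are pure alpha or pure numeric.
    return len(chunk) > 1 and chunk[0].isalpha() and chunk[1:].isdigit()

def match_score(source, target, keep=lambda chunk: True):
    """Sum, over the source chunks, how often each occurs in the target."""
    target_counts = Counter(c for c in chunks(target) if keep(c))
    # A Counter returns 0 for chunks that never appear in the target.
    return sum(target_counts[c] for c in chunks(source) if keep(c))

source = ":L520:T400:C000:S000:L200:14:E214:"
target = ":L520:M260:C000:S000:L200:14:E214:"

name_score = match_score(source, target, keep=is_soundex)
number_score = match_score(source, target, keep=lambda c: not is_soundex(c))
print(name_score, number_score)

# Caution: plain substring counting misses adjacent repeats, because the
# two matches would have to share a colon.
print(":M260:M260:".count(":M260:"))
```

With the sample keys above this prints `5 1`, and the substring count prints `1` even though the chunk appears twice, which is one reason to count whole chunks rather than search the raw string.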