On Oct 11, 10:02 am, [EMAIL PROTECTED] wrote:
> On Oct 11, 12:49 pm, Matimus <[EMAIL PROTECTED]> wrote:
>
> > On Oct 11, 9:11 am, brad <[EMAIL PROTECTED]> wrote:
> >
> > > [EMAIL PROTECTED] wrote:
> > > > However...how can you know it is a name...
> > >
> > > OK, I admitted in my first post that it was a crazy question, but if one
> > > could find an answer, one would be onto something. Maybe it's not a 100%
> > > answerable question, but I would guess that it is an 80% answerable
> > > question... I just don't know how... yet :)
> > >
> > > Besides admitting that it's a crazy question, I should stop and explain
> > > how it would be useful to me at least. Is a credit card number itself
> > > valuable? I would think not. One can easily re and luhn check for credit
> > > card numbers located in files with a great degree of accuracy, but a
> > > number without a name is not very useful to me. So, if one could
> > > associate names to luhn checked numbers automatically, then one would be
> > > onto something. Or at least say, "hey, this file has luhn validated CCs
> > > *AND* it seems to have people's names in it as well." Now then, I'd have
> > > less to review or perhaps as much as I have now, but I could push the
> > > files with numbers and names to the top of the list so that they would
> > > be reviewed first.
> > >
> > > Brad
> >
> > What the hell are you doing? Your post sounds to me like you have a
> > huge amount of stolen, or at the very least misapprehended, data. Now
> > you want to search it for credit card numbers and names so that you
> > can use them.
> >
> > I am not cool with this! This is a public forum about a programming
> > language. What makes you think that anybody in this forum will be cool
> > with that. Perhaps you aren't doing anything illegal, but it sure is
> > coming off that way. If you are doing something illegal I hope you get
> > caught.
> >
> > At the very least, you might want to clarify why you are looking for
> > such capability so that you don't get effectively black-listed (well,
> > by me at least).
> >
> > Matt
>
> Go have a beer and calm down a bit :) It's a legitimate purpose,
> although it could (and probably is being used by bad guys right now).
> My intent, as you can see from the links below, is to catch it before
> the bad guys do.
>
> http://filebox.vt.edu/users/rtilley/public/find_ccns/
> http://filebox.vt.edu/users/rtilley/public/find_ssns/
>
> Brad

It's just past 10:00 am where I am... I know customs vary, but generally
beer before lunch is frowned upon :). I know the tone of posts does not
carry well over the web, but I was really just trying to point out that
your previous post sounded very shady, and at the very least some
clarification was in order. I wasn't standing on my desk frothing at the
mouth or anything.

On to my suggestion. I think you are going to have to use statistical
analysis. That is, you won't get something that reliably returns a
boolean, but maybe something that says there is a 75% chance that there
are names in a given file. You can't know that a given string is or
isn't a name; you can only know that it is probably a name, based on how
often it is used as one in that context.

Either way, this isn't a simple problem to solve, and it probably
involves creating a database of words that records what percentage of
the time each one is used as a name. How such a database is created...
that is the hard part. There may be tools out there for such analysis,
but that isn't an area I have any experience in.

Matt
--
http://mail.python.org/mailman/listinfo/python-list
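
P.S. To make the statistical idea a bit more concrete, here is a rough sketch of what I mean, not a tested solution. Everything in it is an assumption for illustration: the table NAME_FREQ and its numbers are made up (building a real one is the hard part), chance_of_names is a hypothetical helper, and it treats every word as independent of its neighbours, which is crude and ignores context entirely.

```python
import re

# Hypothetical toy table: the fraction of the time each word is used as a
# name. These numbers are invented for illustration only.
NAME_FREQ = {
    "brad": 0.95,
    "matt": 0.90,
    "mark": 0.40,
    "bill": 0.30,
    "will": 0.05,
}


def chance_of_names(text, name_freq=NAME_FREQ):
    """Estimate the chance that text contains at least one name.

    Assumes each word occurrence is independent (a crude simplification),
    so the chance that no word is a name is the product of (1 - p) over
    every word, and the result is one minus that product.
    """
    chance_no_names = 1.0
    for word in re.findall(r"[A-Za-z]+", text):
        p = name_freq.get(word.lower(), 0.0)  # unknown words count as 0
        chance_no_names *= 1.0 - p
    return 1.0 - chance_no_names


if __name__ == "__main__":
    sample = "Bill will pay the bill"
    print("chance of names: %.0f%%" % (100 * chance_of_names(sample)))
```

You would then sort files by that score, so the ones most likely to hold names next to numbers get reviewed first. It never gives you a yes or no, only a ranking.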