> > > Hasn't someone already fixed this problem? Isn't there a CPAN module to
> > > perform standardized bibliographic reference formatting/parsing? I
> > > haven't looked at CPAN; did either of you? If a CPAN module doesn't
> > > exist, one should!
> >
> > What standard?
> >
> > Kalthoff K (2001) Analysis of biological development. McGraw-Hill, NY.
> >
> > Or
> >
> > Manning JT, Barley L, Walton J, Lewis-Jones DI, Trivers RL, Singh D,
> > Thornhill R, Rohde P, Bereczkei T, Henzi P, Soler M, Szwed A. (2000) The
> > 2nd:4th digit ratio, sexual dimorphism, population differences, and
> > reproductive success. evidence for sexually antagonistic genes? Evol Hum
> > Behav. 21(3):163-183.
> >
> > Or
> >
> > Berger, M., Lawrence, M., Demichelis, F., Drier, Y., Cibulskis, K.,
> > Sivachenko, A., Sboner, A., Esgueva, R., Pflueger, D., Sougnez, C.,
> > Onofrio, R., Carter, S., Park, K., Habegger, L., Ambrogio, L., Fennell, T.,
> > Parkin, M., Saksena, G., Voet, D., Ramos, A., Pugh, T., Wilkinson, J.,
> > Fisher, S., Winckler, W., Mahan, S., Ardlie, K., Baldwin, J., Simons, J.,
> > Kitabayashi, N., MacDonald, T., Kantoff, P., Chin, L., Gabriel, S.,
> > Gerstein, M., Golub, T., Meyerson, M., Tewari, A., Lander, E., Getz, G.,
> > Rubin, M., & Garraway, L. (2011). The genomic complexity of primary human
> > prostate cancer Nature, 470 (7333), 214-220 DOI: 10.1038/nature09744
> >
> > ?
>
> If there's a standard, then sure, someone has probably put that into CPAN.
> The problem is that I don't think that there is, though I'd be glad to be
> proven wrong.

> > What I want to be able to do eventually is parse each name separately and
> > associate that with the title. I am not sure how yet, but I haven't even
> > got there.
>
> That can range from pretty simple to fairly complex, depending on how much
> you want to squeeze out of that relationship. If you just want to be able to
> say "Morgan, M.J. wrote an article for X journal, titled Y", then that's just
> a hash (of hashes), and you need to look no further than this mail. But if
> you also want to say, "Journal X has these authors. One of them is Wilson,
> C.E., who co-wrote article Y, where Crim, L.W. was also a collaborator, and
> whose primary author is Morgan, M.J.", then hashes will probably not cut it
> anymore (a cyclical hash of hashes might do, but that's pretty tough to
> handle, and _very_ rough on the eyes). You'll probably want an object model
> there, or some database interaction.
>
> But we are getting ahead of ourselves for now :)

I figured that eventually it would be easier to somehow pass the results into
MySQL tables, but I left that bridge to be crossed once I get there.

> > It works fine for the first name, but as expected, if @entries contains
> > several strings with authors' names (I did that by matching the year and
> > storing $` in @entries), it will match the first author and then move on
> > to the next element of @entries. Is there a way to match the pattern more
> > than once, but to store each match separately?
>
> You are looking for the /g switch. You can look it up in perlretut[0].

I actually remember reading in the Llama book that the /g modifier could be
used with m// and not only with s///, and thinking, "but when would you need
it with m//?" :)

> > For example, would I be able to store Morgan, M.J. as one item in an
> > array and Wilson, C.E. as another one?
>
> Sure. The my @names = ... from above will suffice for that. But chances are
> you want more than that. In general, you have two options.
> Either you make several small regexes to extract the data piece by piece,
> or you create a grammar to do the job for you. For the latter, there are
> two main options: a (?(DEFINE)) pattern, which is pure Perl and has been in
> the language since 5.10, or Regexp::Grammars from CPAN. They are pretty
> similar, but Regexp::Grammars is much more powerful, letting you access the
> full parse tree, so what I'll have to do in two steps in the next snippet,
> R::G would do in one.
>
> Here's my stab at it, using (?(DEFINE))[1], named captures[2], Unicode
> character properties[3], and a probably unnecessary lookbehind[1] in the
> split near the end. I made some arbitrary assumptions about the data, like
> saying that a title can't be longer than 52 characters, or can't have a
> period in it, or that the journal's name can't have digits in it, which I
> suppose is a tad disingenuous, but take it as an example, not a solution :P

Thanks! This gives me a lot to read on.

Cheers,
T.

-- 
"Education is not to be used to promote obscurantism." - Theodosius Dobzhansky.

"Gracias a la vida que me ha dado tanto
Me ha dado el sonido y el abecedario
Con él, las palabras que pienso y declaro
Madre, amigo, hermano
Y luz alumbrando la ruta del alma del que estoy amando

Gracias a la vida que me ha dado tanto
Me ha dado la marcha de mis pies cansados
Con ellos anduve ciudades y charcos
Playas y desiertos, montañas y llanos
Y la casa tuya, tu calle y tu patio"

Violeta Parra - Gracias a la Vida

Tiago S. F. Hori
PhD Candidate - Ocean Science Center-Memorial University of Newfoundland
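
For reference, the two options discussed in the thread (a small regex with /g,
or a (?(DEFINE)) grammar) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the snippet mentioned in the quoted
message: the sample entry, the variable names, and the idea that every name
looks like "Surname, I.N." are all made up for the example.

```perl
use strict;
use warnings;
use 5.010;    # (?(DEFINE)) and named captures need Perl 5.10 or later

# Hypothetical sample entry, loosely modelled on the names in this thread.
my $entry = 'Morgan, M.J., Wilson, C.E., Crim, L.W. (1999) Some illustrative title.';

# Assumption: everything before "(year)" is the author list.
my ($authors, $year, $title) = $entry =~ /^(.*?)\s*\((\d{4})\)\s*(.*)$/
    or die "Entry does not look like 'authors (year) title'\n";

# Option 1: a small regex with /g. In list context, m//g returns every
# match rather than stopping at the first one.
my @names = $authors =~ /\p{Lu}[\p{L}'-]+,\s*(?:\p{Lu}\.)+/g;

# Option 2: the same pattern written as a tiny grammar with (?(DEFINE)).
# Sub-patterns called through (?&name) do not keep their own captures,
# so the whole name is wrapped in a named capture and read from %+.
my $name_re = qr{
    (?<name> (?&surname) , \s* (?&initials) )
    (?(DEFINE)
        (?<surname>  \p{Lu} [\p{L}'-]+ )
        (?<initials> (?: \p{Lu} \. )+  )
    )
}x;

my @grammar_names;
while ($authors =~ /$name_re/g) {
    push @grammar_names, $+{name};
}

# A hash of arrays is enough to tie a title to its authors.
my %authors_of = ( $title => [@names] );

say "$title ($year)";
say "  $_" for @{ $authors_of{$title} };
```

Both @names and @grammar_names should end up holding one element per author.
Real reference lists are far messier than this sample (names without commas,
"&" before the last author, missing periods), so the name pattern would need
to be adapted to whatever format the entries actually use.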