Just wondering if this will help you: https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/time/DateUtils.html#parseDate-java.lang.String-java.util.Locale-java.lang.String...-
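In Groovy that would look roughly like this (just a sketch: the @Grab version and the pattern list are only examples, and parseDate() throws a ParseException if none of the patterns match):

    // sketch only: grab commons-lang3 (any recent version should do)
    @Grab('org.apache.commons:commons-lang3:3.12.0')
    import org.apache.commons.lang3.time.DateUtils

    // the pattern list is just an example; extend it to whatever
    // candidate formats you expect to pull out of the raw content
    String[] patterns = ['yyyy-MM-dd', 'yyyy/MM/dd', 'yyyyMMdd', 'MMddyyyy']

    Date d = DateUtils.parseDate('2023/06/13', Locale.ENGLISH, patterns)
    println d   // throws ParseException if no pattern matches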
You'll still need to extract the candidate date strings, but once you have them, this can parse them using various formats.

Perhaps... since we are now in "The Age of AI" (:-)), you could use Apache OpenNLP, per this: https://stackoverflow.com/questions/27182040/how-to-detect-dates-with-opennlp
I've used NLP in other situations... it's not popular but it does the job nicely.

A bit more of a general discussion: https://www.baeldung.com/cs/finding-dates-addresses-in-emails

Hope this helps.

BOB

________________________________
From: Jochen Theodorou <blackd...@gmx.org>
Sent: Wednesday, 14 June 2023 4:42 AM
To: users@groovy.apache.org <users@groovy.apache.org>
Subject: Re: Existing resources to seek date patterns from raw data and normalize them

On 13.06.23 16:52, James McMahon wrote:
> Hello. I have a task to parse dates out of incoming raw content. Of
> course the date patterns can assume any number of forms - YYYY-MM-DD,
> YYYY/MM/DD, YYYYMMDD, MMDDYYYY, etc. I can build myself a robust
> regex to match a broad set of such patterns in the raw data, but I
> wonder if there is a project or library available for Groovy that
> already offers this?

I always wanted to try https://github.com/joestelmach/natty/tree/master
or at least https://github.com/sisyphsu/dateparser at some point... never
got around to it ;)

> Assuming I get pattern matches parsed out of my raw data, I will have a
> collection of strings representing year-month-days in a variety of
> formats. I'd then like to normalize them to a standard form so that I
> can sort and compare them. I intend to identify the range of dates in
> the raw data as a sorted Groovy list.

Once the library has identified the format, this is the easy step.

[...]

> I intend to write a Groovy script that will run from an Apache NiFi
> ExecuteScript processor. I'll read in my data flowfile content using a
> buffered reader so I can handle flowfiles that may be large.

What does "large" mean? 1 TB? Then BufferedReader may not be the right
choice ;)

bye Jochen
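To make the "easy step" above concrete, a rough Groovy sketch of the normalize-and-sort part could look like this (the candidate strings and pattern list are made up for the example; java.time.LocalDate prints as ISO yyyy-MM-dd and sorts chronologically):

    import java.time.LocalDate
    import java.time.format.DateTimeFormatter
    import java.time.format.DateTimeParseException

    // example patterns only; real data may need more, and ambiguous
    // 8-digit forms (MMddyyyy vs yyyyMMdd) can still collide
    def formats = ['yyyy-MM-dd', 'yyyy/MM/dd', 'yyyyMMdd', 'MMddyyyy'].collect {
        DateTimeFormatter.ofPattern(it)
    }

    def candidates = ['2023/06/13', '20230101', '12312022']   // made-up matches

    def dates = candidates.collect { s ->
        formats.findResult { fmt ->
            try { LocalDate.parse(s, fmt) }
            catch (DateTimeParseException ignored) { null }
        }
    }.findAll().sort()

    println dates                                   // [2022-12-31, 2023-01-01, 2023-06-13]
    println "${dates.first()} .. ${dates.last()}"   // the date range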
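And for the last point, the usual streaming pattern inside an ExecuteScript Groovy body looks roughly like this (a sketch only; session, log and REL_SUCCESS are the standard ExecuteScript bindings, and the regex is just a placeholder). Whether even that is comfortable at 1 TB is exactly Jochen's question:

    import org.apache.nifi.processor.io.InputStreamCallback

    def flowFile = session.get()
    if (flowFile == null) return

    def datePattern = ~/\d{4}[-\/]?\d{2}[-\/]?\d{2}/   // placeholder, not a complete pattern
    def hits = [] as Set

    session.read(flowFile, { inputStream ->
        // process line by line instead of pulling the whole flowfile into memory
        inputStream.eachLine('UTF-8') { line ->
            hits.addAll(line.findAll(datePattern))
        }
    } as InputStreamCallback)

    log.info("found ${hits.size()} candidate date strings")
    session.transfer(flowFile, REL_SUCCESS)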