After adding extensive logging, I identified the problem: the combined regex pattern was not matching the entire value of Unix timestamps (seconds since the epoch). I fixed that in the Groovy script, and it then ran as expected.
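Below is a minimal sketch of how a partial match like this can happen. It uses a cut-down version of the pattern that keeps only the 8-digit and 10-digit alternatives. The sample line and the reordered "fixed" pattern are illustrative assumptions, not necessarily the exact fix applied in the script. Java regex alternation takes the first alternative that matches, not the longest one. As a result, `\b(\d{8})` consumes the first eight digits of the 10-digit timestamp before `\b\d{10}\b` is ever tried. The string 16521352 is not a valid yyyyMMdd date (month 13), so parseDate rejects it and parseUnixTimestamp then reads it as seconds. That lands on July 11, 1970 (day 192, a Saturday), which appears to be exactly the bogus result in the quoted question.

```groovy
import java.time.Instant
import java.time.ZoneOffset
import java.util.regex.Pattern

// Simplified stand-ins for two alternatives of the original combinedPattern:
// an 8-digit yyyyMMdd date and a 10-digit Unix timestamp.
def buggy = Pattern.compile(/\b(\d{8})|\b\d{10}\b/)
// Assumed fix for illustration: try the longer timestamp alternative first
// and require a word boundary after the 8-digit date.
def fixed = Pattern.compile(/\b\d{10}\b|\b(\d{8})\b/)

// Hypothetical input line resembling the FlowFile content
def line = '"viewLastModified": 1652135219'

// Return the first full match (group 0) of a pattern in the sample line
def firstMatch = { Pattern p ->
    def m = p.matcher(line)
    m.find() ? m.group(0) : null
}

def buggyMatch = firstMatch(buggy)   // only the first 8 digits
def fixedMatch = firstMatch(fixed)   // the whole 10-digit timestamp

// Interpret each match as seconds since the epoch, viewed in UTC
def truncatedDate = Instant.ofEpochSecond(buggyMatch.toLong()).atZone(ZoneOffset.UTC).toLocalDate()
def fullDate = Instant.ofEpochSecond(fixedMatch.toLong()).atZone(ZoneOffset.UTC).toLocalDate()

println "buggy: $buggyMatch -> $truncatedDate"
println "fixed: $fixedMatch -> $fullDate"
```

Putting the longest alternative first, or anchoring each alternative with a trailing `\b`, keeps a shorter alternative from matching the prefix of a longer value.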
Thank you both for your comments, Paul and Christopher.

On Thu, Jun 20, 2024 at 6:19 PM Paul King <pa...@asert.com.au> wrote:
> This would be my expectation:
>
> import java.time.Instant
> import java.time.ZoneId
> import groovy.json.JsonBuilder
>
> def lastModifiedView = '1652135219'.toLong()
> def zoneId = ZoneId.of('America/Los_Angeles')
> def date = Instant.ofEpochSecond(lastModifiedView).atZone(zoneId).toLocalDate()
> def result = [lastModifiedView: date]
> assert new JsonBuilder(result).toPrettyString() == '''{
>     "lastModifiedView": {
>         "year": 2022,
>         "month": "MAY",
>         "chronology": {
>             "calendarType": "iso8601",
>             "id": "ISO"
>         },
>         "dayOfMonth": 9,
>         "dayOfWeek": "MONDAY",
>         "dayOfYear": 129,
>         "era": "CE",
>         "leapYear": false,
>         "monthValue": 5
>     }
> }'''
>
> And works fine for me. It wasn't clear if you wanted different
> information in the serialization or just flagging that somewhere your
> code is differing from above because of the different values in the
> output.
>
> Paul.
>
> On Fri, Jun 21, 2024 at 7:25 AM James McMahon <jsmcmah...@gmail.com> wrote:
> >
> > Hello. I have a json key named viewLastModified. It has a value of
> > 1652135219. Using an Epoch Converter manually (https://www.epochconverter.com/),
> > I expect to convert this with my Groovy script to something in this ballpark:
> > GMT: Monday, May 9, 2022 10:26:59 PM
> > Your time zone: Monday, May 9, 2022 6:26:59 PM GMT-04:00 DST
> > Relative: 2 years ago
> >
> > But my code fails, and I'm not sure why.
> >
> > Using the code I wrote, I process it and get this result:
> > "viewLastModified": [
> >     {
> >         "chronology": {
> >             "calendarType": "iso8601",
> >             "id": "ISO",
> >             "isoBased": true
> >         },
> >         "dayOfMonth": 11,
> >         "dayOfWeek": "SATURDAY",
> >         "dayOfYear": 192,
> >         "era": "CE",
> >         "leapYear": false,
> >         "month": "JULY",
> >         "monthValue": 7,
> >         "year": 1970
> >     }
> > ]
> >
> > Can anyone see where I have an error when I try to process a pattern
> > that is seconds since the epoch?
> >
> > My code:
> > import java.util.regex.Pattern
> > import java.time.LocalDate
> > import java.time.LocalDateTime
> > import java.time.format.DateTimeFormatter
> > import java.time.format.DateTimeParseException
> > import java.time.Instant
> > import java.time.ZoneId
> > import groovy.json.JsonSlurper
> > import groovy.json.JsonBuilder
> > import org.apache.nifi.processor.io.StreamCallback
> > import org.apache.nifi.flowfile.FlowFile
> >
> > // Combined regex pattern to match various date formats including Unix timestamp
> > def combinedPattern = Pattern.compile(/\b(\d{8})|\b(\d{4}[' ,-\\/]+\d{2}[' ,-\\/]+\d{2})|\b(\d{2}[' ,-\\/]+\d{2}[' ,-\\/]+\d{4})|\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[' ,-\\/]+\d{2}[' ,-\\/]+\d{4}|\b(?:January|February|March|April|May|June|July|August|September|October|November|December)[' ,-\\/]+\d{2}[' ,-\\/]+\d{4}\b|\b\d{10}\b/)
> >
> > // Precompile date formats for faster reuse
> > def dateFormats = [
> >     DateTimeFormatter.ofPattern('yyyyMMdd'),
> >     DateTimeFormatter.ofPattern('dd MMM, yyyy'),
> >     DateTimeFormatter.ofPattern('MMM dd, yyyy'),
> >     DateTimeFormatter.ofPattern('yyyy MMM dd'),
> >     DateTimeFormatter.ofPattern('MMMM dd, yyyy')
> > ]
> >
> > // Helper function to parse a date string using predefined formats
> > def parseDate(String dateStr, List<DateTimeFormatter> dateFormats) {
> >     for (format in dateFormats) {
> >         try {
> >             return LocalDate.parse(dateStr, format)
> >         } catch (DateTimeParseException e) {
> >             // Continue trying other formats if the current one fails
> >         }
> >     }
> >     return null
> > }
> >
> > // Helper function to parse a Unix timestamp
> > def parseUnixTimestamp(String timestampStr) {
> >     try {
> >         long timestamp = Long.parseLong(timestampStr)
> >         // Validate if the timestamp is in a reasonable range
> >         if (timestamp >= 0 && timestamp <= Instant.now().getEpochSecond()) {
> >             return Instant.ofEpochSecond(timestamp).atZone(ZoneId.systemDefault()).toLocalDateTime().toLocalDate()
> >         }
> >     } catch (NumberFormatException e) {
> >         // If parsing fails, return null
> >     }
> >     return null
> > }
> >
> > // Helper function to validate date within a specific range
> > boolean validateDate(LocalDate date) {
> >     def currentYear = LocalDate.now().year
> >     def year = date.year
> >     return year >= currentYear - 120 && year <= currentYear + 40
> > }
> >
> > // Function to process and normalize dates
> > def processDates(List<String> dates, List<DateTimeFormatter> dateFormats) {
> >     dates.collect { dateStr ->
> >         def parsedDate = parseDate(dateStr, dateFormats)
> >         if (parsedDate == null) {
> >             parsedDate = parseUnixTimestamp(dateStr)
> >         }
> >         log.info("Parsed date: ${parsedDate}")
> >         parsedDate
> >     }.findAll { it != null && validateDate(it) }
> >     .unique()
> >     .sort()
> > }
> >
> > // Define the list of substrings to check in key names
> > def dateRelatedSubstrings = ['birth', 'death', 'dob', 'date', 'updated', 'modified', 'created', 'deleted', 'registered', 'times', 'datetime', 'day', 'month', 'year', 'week', 'epoch', 'period']
> >
> > // Start of NiFi script execution
> > def ff = session.get()
> > if (!ff) return
> >
> > try {
> >     log.info("Starting processing of FlowFile: ${ff.getId()}")
> >
> >     // Extract JSON content for processing
> >     String jsonKeys = ff.getAttribute('payload.json.keys')
> >     log.info("JSON keys: ${jsonKeys}")
> >     def keysMap = new JsonSlurper().parseText(jsonKeys)
> >     def results = [:]
> >
> >     // Process each key-value pair in the JSON map
> >     keysMap.each { key, value ->
> >         def datesForThisKey = []
> >         log.info("Processing key: ${key}")
> >
> >         // Check if the key contains any of the specified substrings
> >         if (dateRelatedSubstrings.any { key.toLowerCase().contains(it) }) {
> >             // Read and process the content of the FlowFile
> >             ff = session.write(ff, { inputStream, outputStream ->
> >                 def bufferedReader = new BufferedReader(new InputStreamReader(inputStream))
> >                 def bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
> >                 String line
> >
> >                 // Read each line of the input stream
> >                 while ((line = bufferedReader.readLine()) != null) {
> >                     // Check if the line contains the key
> >                     if (line.contains(key)) {
> >                         def matcher = combinedPattern.matcher(line)
> >                         // Find all matching date patterns in the line
> >                         while (matcher.find()) {
> >                             datesForThisKey << matcher.group(0)
> >                         }
> >                     }
> >                     bufferedWriter.write(line)
> >                     bufferedWriter.newLine()
> >                 }
> >
> >                 bufferedReader.close()
> >                 bufferedWriter.close()
> >             } as StreamCallback)
> >
> >             // Process and store dates for the current key
> >             if (!datesForThisKey.isEmpty()) {
> >                 log.info("Found dates for key ${key}: ${datesForThisKey}")
> >                 results[key] = processDates(datesForThisKey, dateFormats)
> >                 log.info("Processed dates for key ${key}: ${results[key]}")
> >             }
> >         } else {
> >             log.info("Key ${key} does not contain date-related substrings, skipping.")
> >             results[key] = []
> >         }
> >     }
> >
> >     // Serialize results to JSON and store in FlowFile attribute
> >     def jsonBuilder = new JsonBuilder(results)
> >     ff = session.putAttribute(ff, 'payload.json.dates', jsonBuilder.toPrettyString())
> >     log.info("Successfully processed FlowFile: ${ff.getId()}")
> >     session.transfer(ff, REL_SUCCESS)
> > } catch (Exception e) {
> >     log.error("Failed processing FlowFile: ${ff.getId()}", e)
> >     session.transfer(ff, REL_FAILURE)
> > }
> >
> > I'm producing something, but it isn't the correct something.
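For comparison, here is a small sketch that reproduces the epochconverter values from the question. It converts 1652135219 with explicit offsets and serializes ISO-8601 strings rather than a LocalDate object, which JsonBuilder otherwise expands into its properties (chronology, era, leapYear, and so on). The fixed GMT-04:00 offset is taken from the converter output quoted above. A region ID, such as the America/Los_Angeles used in Paul's example, would be more robust in real code. The `utc`, `local`, and `date` field names are hypothetical.

```groovy
import groovy.json.JsonBuilder
import java.time.Instant
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter

def seconds = '1652135219'.toLong()
def instant = Instant.ofEpochSecond(seconds)

// The same instant viewed in UTC and at a fixed GMT-04:00 offset
// (assumed from the epochconverter "Your time zone" line)
def utc = instant.atZone(ZoneOffset.UTC)
def gmtMinus4 = instant.atOffset(ZoneOffset.ofHours(-4))

// Serialize plain ISO-8601 strings instead of LocalDate objects
def result = [viewLastModified: [
        utc  : utc.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME),
        local: gmtMinus4.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME),
        date : utc.toLocalDate().toString()
]]
def json = new JsonBuilder(result).toString()
println json
```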