[ https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983597#comment-13983597 ]
Ann Burgess commented on TIKA-1274:
-----------------------------------

Hi Nick,

Thank you for the git repo tips. I added the 'target' directory because I was mimicking the directory structure of the Tika build; consider it removed. On that note, I'd appreciate any documentation on the dos and don'ts of building a git repo for Tika or other Apache projects, if such documentation exists.

As for the file contents, ENVI header files <http://www.exelisvis.com/docs/ENVIHeaderFiles.html> are plain text documents. The contents of an ENVI header file are, in fact, metadata for a corresponding data file: reading a file named some_file.img requires the corresponding file some_file.img.hdr. In other words, because the entire contents of some_file.img.hdr are metadata for some_file.img, they do NOT describe the .hdr file itself; rather, they describe the .img file. That is why I didn't think it appropriate to move parts of the 'raw content' into metadata.

Does that make sense? I'm also very open to hearing how this sort of thing is normally treated, or to opening a conversation about how to treat one file type describing another file type.

Thanks for the input and any further suggestions.
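The naming convention described above (some_file.img.hdr describes some_file.img) can be sketched in a few lines of Java. This is only an illustration of that one convention, not code from the parser attached to this issue; the class name EnviHeaderPairing and the method dataFileFor are hypothetical, and headers named differently (for example MOD09GA_test_header.hdr) would need their own rule.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

final class EnviHeaderPairing {

    /**
     * Returns the data file that a header describes, ASSUMING the header
     * is named "<data file name>.hdr" (e.g. some_file.img.hdr).
     */
    static Path dataFileFor(Path header) {
        String name = header.getFileName().toString();
        if (!name.toLowerCase(Locale.ROOT).endsWith(".hdr")) {
            throw new IllegalArgumentException("Not an ENVI header name: " + name);
        }
        // Drop the trailing ".hdr" and look for the result next to the header.
        String dataName = name.substring(0, name.length() - ".hdr".length());
        return header.resolveSibling(dataName);
    }

    public static void main(String[] args) {
        System.out.println(dataFileFor(Paths.get("some_file.img.hdr")));
    }
}
```

The point of the sketch is that every field in the header belongs to the file returned by dataFileFor, not to the .hdr file that was actually opened.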
-- 
------------------------------------------------------------------------------------------
Ann Bryant Burgess, PhD
Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA

Alaska Science Center/USGS
Anchorage, AK

Cell: (585) 738-7549
Office: (907) 786-7059
Fax: (907) 786-7150
E-mail: anniebryant.burg...@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626
-------------------------------------------------------------------------------------------

> ENVI header parser
> ------------------
>
>                 Key: TIKA-1274
>                 URL: https://issues.apache.org/jira/browse/TIKA-1274
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Ann Burgess
>            Assignee: Chris A. Mattmann
>              Labels: mime, newbie, parser, patch
>
> I have written a parser that extracts text and metadata from ENVI header
> files, currently called at the command line as:
> abryant:tika abryant$ java -classpath
> annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar
> org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
> Content-Encoding: ISO-8859-1
> Content-Length: 818
> Content-Type: application/envi.hdr
> resourceName: MOD09GA_test_header.hdr
> abryant:tika abryant$ java -classpath
> annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar
> org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
> ENVI
> description = {
> GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
> samples = 2400
> lines = 2400
> bands = 7
> header offset = 0
> file type = ENVI Standard
> data type = 2
> interleave = bip
> sensor type = Unknown
> byte order = 0
> map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856,
> 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
> projection info = {16, 6371007.2, 0.000000, 0.0, 0.0, Sinusoidal,
> units=Meters}
> coordinate system string =
> {PROJCS["Sinusoidal",GEOGCS["GCS_ELLIPSE_BASED_1",DATUM["D_ELLIPSE_BASED_1",SPHEROID["S_ELLIPSE_BASED_1",6371007.181,0.0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Sinusoidal"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",0.0],UNIT["Meter",1.0]]}
> wavelength units = Unknown
> ______________
> As a current non-certified committer, could someone enlighten me to the steps
> needed to submit this new parser for review.
> The parser is located in my directory structure as:
> /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class
> My custom mimetypes.xml file is located at:
> /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml

--
This message was sent by Atlassian JIRA (v6.2#6252)
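For illustration, a header like the one quoted in the issue can be read as a set of "key = value" fields. The following is a plain-Java sketch of that idea only. It is not the parser submitted with this issue and it does not use Tika's Parser API. The names EnviHeaderSketch and parse are hypothetical, and the sketch assumes that a value opened with { continues across lines until the braces balance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EnviHeaderSketch {

    /** Parses "key = value" fields; brace-wrapped values may span several lines. */
    static Map<String, String> parse(String headerText) {
        Map<String, String> fields = new LinkedHashMap<>();
        String key = null;
        StringBuilder value = new StringBuilder();
        boolean open = false; // true while a { ... } value is still unclosed

        for (String rawLine : headerText.split("\\R")) {
            String line = rawLine.trim();
            if (!open) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue; // skips the leading "ENVI" marker and blank lines
                }
                // Split on the FIRST '=' so "units=Meters" inside a value is kept.
                key = line.substring(0, eq).trim();
                value.setLength(0);
                value.append(line.substring(eq + 1).trim());
            } else {
                value.append(' ').append(line); // continuation of a braced value
            }
            open = braceDepth(value) > 0;
            if (!open) {
                fields.put(key, value.toString());
            }
        }
        return fields;
    }

    /** Counts unmatched '{' characters in the text collected so far. */
    private static int braceDepth(CharSequence s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        String header = "ENVI\n"
                + "description = {\n"
                + "  GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}\n"
                + "samples = 2400\n"
                + "lines = 2400\n"
                + "bands = 7\n"
                + "map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, "
                + "4.6331271653e+02, 4.6331271653e+02, , units=Meters}\n";
        parse(header).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Whether fields such as samples or bands should then be reported as metadata of the .hdr file or of the data file it describes is the open question raised in the comment; the sketch takes no position on it.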