Hi Joe,

On Feb 26, 2012, at 11:06 AM, Joe White wrote:

> Hi, Chris,
> I would agree that we probably should come up with a more comprehensive
> solution for this wrt the metadata object and the resulting XHTML. That
> would make this feel a little more like the geospatial stuff is more of a
> first class citizen in the metadata hierarchy.

+1.

>
> We will probably need to support more coordinate systems than just WGS 84, as
> there are a number of systems that have no transformation to WGS 84.

+1, agreed, WGS84 was just the first one that came to mind.

> The encoding of the WKT is also pretty important. Would you rather break it
> down to its component parts, probably datum and projection for starters, or
> leave it whole? Obviously, the more metadata we have, the more powerful Tika
> becomes, but there is a point where you have too much data that is not as
> useful.

Let's start out with its component parts, datum and projection, and encode
those as metadata fields. So we'd likely update the existing Geographic
metadata interface with these new keys as a starter.

>
> On another note, I took a look at the code for your 605 patch, and I have a
> suggestion. Reading the notes on the checkins for the patch, I noticed that
> no one had suggested using the in-memory Dataset as the default type. There
> is no reason why the stream used to open the Tika parser could not be used to
> fill a buffer with the file data, and then use that to create a dataset.

Hmm, so your suggestion is to use the in-memory Dataset API, and that would
be streamable via Tika? That would be great. I just wasn't familiar enough
with GDAL to know how to do that, so a coding example in Java, if you have
one, would help me wrap my head around it.

>
> As it is, I'm trying to get GDAL to cooperate with me on my Mac. Being a
> newcomer to Mac seems to be a drawback when trying to be productive. It just
> takes a little more fight to get the bits to do what I really want.
>

Heh, yeah I was trying to do this too.
At one point I had it running, but a few OS upgrades have nixed that. Let's
see if I can get it up and running again too so we can co-develop this.

> In any case, once I get GDAL whipped into shape, I'll see if I can't get a
> test file to recognize any geospatial data, and then we will be off and
> running.

Great!

Cheers,
Chris

> On Feb 26, 2012, at 1:10 PM, Mattmann, Chris A (388J) wrote:
>
>> Hi Joe,
>>
>> Awesome! Thanks for picking this up and getting interested in this work.
>> Right now, the only use case we've had so far is to represent lats and
>> lons (WGS84). It would be great to extract more information and come up
>> with a policy for representing more WKTs and so forth. We should probably
>> start by coming up with a scheme for encoding the extracted information
>> in the Tika metadata object and in its output XHTML. Do you have any ideas
>> about how to do that? Right now in the existing patch on TIKA-605, I simply
>> intended to use the met object and its key-multi-value structure to
>> represent the extracted information, but to take advantage of streaming
>> and of content handlers, we ought to encode this information in the
>> output XHTML.
>>
>> Thoughts?
>>
>> Cheers,
>> Chris
>>
>> On Feb 26, 2012, at 9:39 AM, Joe White wrote:
>>
>>> Hi,
>>> I'm looking into implementing a bridge/link between Tika and GDAL so that
>>> geospatial information can be saved from georeferenced images and vector
>>> types. One thing that I have noticed while going through the code is that
>>> the code only defines geographic coordinate types, using latitudes and
>>> longitudes. Is this by design? If GDAL is wrapped into Tika, and a
>>> projected image is imported, are the geospatial extents meant to be held in
>>> the metadata as geographic points, possibly as WGS 84?
>>>
>>> Thanks
>>>
>>> Joe White
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>
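
A minimal sketch of the buffering step discussed in the thread (draining the
stream handed to a Tika parser into memory so a dataset could be built from
it), in Java. The class name StreamBuffer and the path "/vsimem/tika-input"
are hypothetical and not part of Tika or the TIKA-605 patch. The GDAL calls
appear only in comments and are an assumption about how the GDAL Java
bindings would be used; they are not exercised here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of Tika or GDAL.
class StreamBuffer {

    // Drain the stream into a byte array so the whole file sits in memory.
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for the InputStream a Tika parser receives.
        byte[] data = readFully(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(data.length + " bytes buffered");

        // Assumption: with the GDAL Java bindings on the classpath, the
        // buffer could then be exposed as a virtual file and opened, roughly:
        //   gdal.AllRegister();
        //   gdal.FileFromMemBuffer("/vsimem/tika-input", data);
        //   Dataset ds = gdal.Open("/vsimem/tika-input");
        //   ... read projection and geotransform from ds ...
        //   ds.delete();
        //   gdal.Unlink("/vsimem/tika-input");
    }
}
```

Buffering the whole file trades memory for not having to write a temporary
file to disk, so for very large rasters a size cap or a temp-file fallback
would probably be needed.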