On Jul 11, 2014, at 8:01am, Avi Hayun wrote:
> Hi,
>
> Scenario:
> 1. I use tika-core in my app
> 2. I use the following to detect the stream's media type:
>
> byte[] bytes = IOUtils.toByteArray(new URL("http://www.amazon.com/sitemap_video.xml"));
> String contentType = new Tika().detect(by
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> ... 25 more
>
> --
> --
> Hong-Thai
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
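Avi's snippet detects the type from the fetched bytes alone. Tika's MIME registry can recognize XML formats by their root element (for a sitemap, `urlset` in the sitemaps.org namespace). A rough, JDK-only sketch of reading that root element with StAX follows; the class name `XmlRootSniffer` is hypothetical, and this is not Tika's detector code:

```java
import java.io.ByteArrayInputStream;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Sketch: report the root element of an XML byte array (not Tika's implementation). */
final class XmlRootSniffer {
    private XmlRootSniffer() {}

    /** Returns the root element's qualified name, or null if the bytes are not XML up to the root. */
    static QName rootElement(byte[] bytes) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities while sniffing untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new ByteArrayInputStream(bytes));
            try {
                while (reader.hasNext()) {
                    // Stop at the first start tag: only the root is needed for detection.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getName();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not XML, or malformed before the first element: report "no root".
        }
        return null;
    }
}
```

DTD and external-entity handling are switched off because the bytes come from an arbitrary URL.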
> - ppt (14)
> - xls (9)
> - dwg (4)
> - odp (495)
> - odt (839)
> - pps (2)
> - ods (1)
>
> 1.7-SNAPSHOT:
> - pdf (7)
> - pptx (10)
> - doc (6)
> - ppt (14)
> - xls (9)
> - dwg (4)
> - odp (2)
> - pps (2)
>
>
> On Thu, Sep 11, 2014 at 8:55 PM, Ken Krugler
> wrote:
>
> Can you tell me what I can do to parse all the tags of an HTML document?
>
> Thanks in advance!
>
> Regards,
> Tang Thi Phuong Linh.
> --
> P.Linh
effort SVN-using committers would have to expend?
>
> I don't mean to incite a VCS war. ;)
git v. svn is more like a brushfire that flares up every few months, at least
on the @members list :)
-- Ken
://svn.apache.org/repos/asf/tika/trunk'
> svn: E000111: Error running context: Connection refused
>
> Could it be related to the recent infra-related issue, or is it just a temporary
> problem?
Working for me, just tried.
-- Ken
33a/tika-parsers/src/main/java/org/apache/tika/parser/txt/CharsetDetector.java>.
> Is this correct and OK to use?
>
> Thanks,
> Tyler
yone have any last minute issues they'd like to finish and see in
> Tika 1.X? I'd like to get the example working with CORS (TIKA-1585 and
> TIKA-1586). Any others?
>
> Have a good weekend,
> Tyler
> gpg: There is no indication that the signature belongs to the owner.
> Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4 183E 8810 BB19 D4F1 0117
>
> Not sure if Chris, Lewis et al. are near you and could do this quickly?
>
> Cheers,
> Dave
--
.8
> [ ] ±0 I don't object to this release, but I haven't checked it
> [ ] -1 Do not release this package because...
>
> Thanks,
> Tyler
ae2d7fdd31.
>>
>> In addition, a staged maven repository is available here:
>> https://repository.apache.org/content/repositories/orgapachetika-1009
>>
>> Please vote on releasing this package as Apache Tika 1.8. The vote is
> open for the next 72 hours and pass
> https://dist.apache.org/repos/dist/dev/tika/
>>>
>>> The release candidate is a zip archive of the sources in:
>>> http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>>>
>>> The SHA1 checksum of the archive is
>>> 5e22fee9079370398472e59082
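The vote mail gives a SHA1 checksum for the source archive. A minimal JDK-only sketch of checking a downloaded archive against that value; `ChecksumCheck` and its methods are hypothetical names, and checking the GPG signature (as in the gpg output above) remains a separate step:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: compares a file's SHA-1 digest with a published hex string. */
final class ChecksumCheck {
    private ChecksumCheck() {}

    /** Streams the input through SHA-1 and returns the digest as lowercase hex. */
    static String sha1Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    /** True if the file's digest equals the published value (case-insensitive, trimmed). */
    static boolean matches(Path file, String publishedHex) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sha1Hex(in).equalsIgnoreCase(publishedHex.trim());
        }
    }
}
```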
e: text/x-java-source
> LoC: 70
> X-Parsed-By: org.apache.tika.parser.DefaultParser
> X-Parsed-By: org.apache.tika.parser.code.SourceCodeParser
> resourceName: UrlParser.java
>
> Should I build a parser for each file format to get an exact content-type, as
> Java has Sour
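On the "parser per format" question: in Tika the content type typically comes from detection rather than from a format-specific parser, and the MIME registry can match file-name globs (such as `*.java`) as well as magic bytes. A toy, JDK-only sketch of the name-based half; `NameTypeLookup` and its two-entry table are hypothetical, and Tika's real registry is far larger:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Toy name-based detector: maps a file extension to a MIME type (hypothetical table). */
final class NameTypeLookup {
    private static final String UNKNOWN = "application/octet-stream";
    private static final Map<String, String> BY_EXTENSION = new LinkedHashMap<>();

    static {
        // Tiny illustrative subset; not Tika's registry.
        BY_EXTENSION.put("java", "text/x-java-source");
        BY_EXTENSION.put("pdf", "application/pdf");
    }

    private NameTypeLookup() {}

    /** Returns the type for the name's extension, or application/octet-stream if unknown. */
    static String detect(String resourceName) {
        int dot = resourceName.lastIndexOf('.');
        if (dot < 0 || dot == resourceName.length() - 1) {
            return UNKNOWN;
        }
        String ext = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, UNKNOWN);
    }
}
```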
vely reverse engineering (when we
> find that Tika is wrong) from a non-Apache project?
>
> Any other sensitivities I should be aware of?
>
> Best,
>
> Tim
ind of previous project are you looking into?
It's the Krugle code search product.
Being sold as enterprise software, but they might be willing to open source the
parsing code.
-- Ken
> ____________
> From: Ken Krugler [kkrugler_li...@transpac.com]
>
dit that, but I don't know where in the sequence it makes sense. I assume
it should be in step 13, "Update Tika site"
Thanks,
-- Ken
++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
ority of at least
> three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.10
>
> [ ] -1 Do not release this package because...
>
> Here is my +1!
>
> Cheers,
> Dave
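The rule quoted in these vote mails (passes if a majority of at least three +1 PMC votes are cast) can be expressed as a small check. This sketch assumes only binding PMC votes are passed in, that ±0 counts toward neither side, and that "majority" means more +1 than -1 votes; `ReleaseVote` is a hypothetical name:

```java
import java.util.List;

/** Sketch of the release-vote rule: at least three binding +1 votes and more +1 than -1. */
final class ReleaseVote {
    private ReleaseVote() {}

    /** Each entry is +1, 0 (for ±0) or -1, one per binding PMC vote. */
    static boolean passes(List<Integer> bindingVotes) {
        int plus = 0;
        int minus = 0;
        for (int v : bindingVotes) {
            if (v > 0) {
                plus++;
            } else if (v < 0) {
                minus++;
            }
            // 0 (±0) counts toward neither side.
        }
        return plus >= 3 && plus > minus;
    }
}
```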
Hi all,
As part of integrating language-detector into Tika (see TIKA-1723), I noticed
TIKA-546 ("Add ability to create language profiles to tika-app")
If we switch over to language-detector, then this code no longer makes sense.
Also note that many language detectors require the full set of lan
k
>> Components: parser
>>Reporter: Madhav Sharan
>>
>>
>> As of now tika uses lucene-geo-gazetteer CLI to extract co-ordinates of a
>> location. CLI requires jvm and lucene to instantiate for every request.
>> With all new REST api
flicting dependencies managed by maven.
I don't have any experience with moving classes around to create modules, so my
natural inclination is to move the sources.
As far as shared code, I think moving something like commons-codec into core
(100K) is fine.
-- Ken
VN", and
> http://tika.apache.org/contribute.html still talks about SVN being our master.
>
> What's the status? Have we switched? Still in progress? Where should we
> commit to? Is it time to delete our SVN checkouts and re-checkout from git?
>
> Cheers
>
>>> -Original Message-
>>> From: Markus Jelsma
>>> Reply-To: "u...@tika.apache.org"
on releasing this package as Apache Tika 1.12.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.12
> [ ] -1 Do not release this package because…
>
> Cheers,
>
.
Thanks,
-- Ken
/tika/trunk/tika-langdetect
scm:svn:https://svn.apache.org/repos/asf/tika/trunk/tika-langdetect
What's the plan (if any) for switching to git details in poms?
Thanks,
-- Ken
- Ken
Is there a document where we're tracking what (breaking) API changes are
occurring in the 2.x branch, and the migration path from 1.x for Tika users?
If not, should this be a wiki page that we all edit iteratively?
Thanks,
-- Ken
ServiceLoader require that these be interfaces? I assume not, as
isAssignableFrom() should work with either interfaces or abstract classes,
right?
Asking because I'm looking at the language detector API for 2.x.
Thanks,
-- Ken
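On the ServiceLoader question: the JDK documentation describes a service as a well-known interface or class (typically abstract), so an abstract base class can serve as the service type, and `Class.isAssignableFrom` treats superclasses and interfaces alike. A minimal check; `DetectorBase`, `EnglishOnlyDetector` and `ServiceTypeCheck` are hypothetical names, not the 2.x language detector API:

```java
import java.util.ServiceLoader;

/** Hypothetical abstract service type (an abstract class, not an interface). */
abstract class DetectorBase {
    abstract String detect(String text);
}

/** Hypothetical implementation of the abstract service type. */
class EnglishOnlyDetector extends DetectorBase {
    public EnglishOnlyDetector() {} // ServiceLoader instantiates providers via a no-arg constructor

    @Override
    String detect(String text) {
        return "en";
    }
}

final class ServiceTypeCheck {
    private ServiceTypeCheck() {}

    /** isAssignableFrom works for abstract superclasses exactly as for interfaces. */
    static boolean isProvider(Class<?> candidate) {
        return DetectorBase.class.isAssignableFrom(candidate);
    }

    /** Counts providers ServiceLoader finds for the abstract class (0 without a services file). */
    static int countProviders() {
        int n = 0;
        for (DetectorBase ignored : ServiceLoader.load(DetectorBase.class)) {
            n++;
        }
        return n;
    }
}
```

A real provider would also need to be a public class with a public no-argument constructor, listed in a `META-INF/services/` file named after the service's fully qualified class name.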
ith Tika 1.11 language detector.
> https://docs.google.com/spreadsheets/d/1cW6S2WpiN08pZ3UMVGMyQkO-fotUiUyGRemCrbC1miY/edit?usp=sharing
>
> I was also looking at the work done by Ken Krugler on Tika's 2.x branch
> language detection and I was planning to fork that project and add the
> Text
community to add our method there, wait for a new release and use
> that!
See https://issues.apache.org/jira/browse/TIKA-1706 for the issue. It seems
like 2.0 is a fine place to make the clean switch to just using Commons IOUtils.
-- Ken
th-america/program/schedule
evelopers/github.html>
Isn't this something we’d want to do as well?
Thanks,
— Ken
't make it to Vancouver this week, the slides from my
> "What's new with Apache Tika 2.0" talk are now available online:
> http://www.slideshare.net/NickBurch2/apache-tika-whats-new-with-20
>
> The audio was recorded, hopefully that will be available to go with the
>
org/browse/OSSRH-22250, looks like it’s
https://in.linkedin.com/in/meetabhishekjindal
— Ken
est coverage to ensure common use cases won't be broken, of course.
>
> [1]:
> https://issues.apache.org/jira/browse/TIKA-2007?focusedCommentId=15435206&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15435206
> --
>
> Be
his issue doesn't
> affect me directly.
>
> [1]: http://proguard.sourceforge.net/index.html#manual/usage.html
> [2]: http://www.oracle.com/technetwork/java/javase/clopts-139448.html#gbmtm
>
>
> On Wed, Aug 24, 2016 at 21:16, Ken Krugler wrote:
>
>> I think excluding mor
s.apache.org/jira/browse/INFRA-12186, it will help
> us to reduce major bugs in Tika over time.
> Thanks
> Lewis
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
Hi Lewis,
> On Sep 21, 2016, at 2:32pm, lewis john mcgibbney wrote:
>
> Hi Ken,
> Good question. Answer below
>
> On Wed, Sep 21, 2016 at 2:16 PM, wrote:
>
>>
>> From: Ken Krugler
>> To: dev@tika.apache.org
>> Cc:
>> Date: Tu
-1 Do not release this package because..
>
> Cheers,
> Chris
>
> P.S. Of course here is my +1.
[ ] +1 Release this package as Apache Tika 1.14
> [ ] -1 Do not release this package because..
>
> Cheers,
> Chris
>
> P.S. Of course here is my +1.
for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.20
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Cheers,
>
> Tim
nl
> ruru
> zhlt
>
> Is there something that needs to be done to enable the detection of Asian
> languages or should I file this as a bug report?
>
> Thanks,
>
> Mike
at
it then.
Regards,
— Ken
> On Jan 17, 2019, at 1:48 PM, Mike Thomsen wrote:
>
> Ken,
>
> Here's a Gist version of it:
>
> https://gist.github.com/MikeThomsen/84abb89aab903a8b21d64af532cc369b
>
> Thanks,
>
> Mike
>
> On Thu, Jan 17, 2019 at
s to do so. Maybe only Tim as PMC Chair can.
>
> --
> Best regards,
> Konstantin Gribov.
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
ing wiki migration (from moin to
>> confluence)?
>>>
>>> I can try it via selfservice.a.o if you consent, but I'm not sure if I
>>> have enough access to do so. Maybe only Tim as PMC Chair can.
>>>
>>> --
>>> Best regards,
>>> Konstantin Gribov.
>>
failed.
>
> In short, this is an area for improvement. I suspect our current
> mechanism would also be pretty awful on UTF-16.
>
> On Tue, Jun 18, 2019 at 4:26 PM Ken Krugler
> wrote:
>>
>> Hi devs,
>>
>> I’m trying to remember the history of how Tika’s cu
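On UTF-16: one common heuristic, not necessarily what Tika's detector does, is to honor a byte-order mark if there is one and otherwise look at where zero bytes fall, since mostly-ASCII text in UTF-16 has a zero in every other byte. A JDK-only sketch; `Utf16Sniffer` and the 40%/10% cut-offs are arbitrary assumptions:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Heuristic UTF-16 sniffer (sketch): BOM first, then the zero-byte pattern. */
final class Utf16Sniffer {
    private Utf16Sniffer() {}

    /** Returns UTF_16BE or UTF_16LE if the bytes look like UTF-16, else null. */
    static Charset guess(byte[] b) {
        if (b.length >= 2) {
            int b0 = b[0] & 0xFF;
            int b1 = b[1] & 0xFF;
            // Note: FF FE 00 00 is also the UTF-32LE BOM; this sketch ignores UTF-32.
            // A caller decoding with the returned charset should skip the 2-byte BOM.
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        }
        int pairs = b.length / 2;
        if (pairs == 0) return null;
        int evenZeros = 0; // zeros at offsets 0, 2, 4, ... (high byte of ASCII chars in BE)
        int oddZeros = 0;  // zeros at offsets 1, 3, 5, ... (high byte of ASCII chars in LE)
        for (int i = 0; i + 1 < b.length; i += 2) {
            if (b[i] == 0) evenZeros++;
            if (b[i + 1] == 0) oddZeros++;
        }
        if (evenZeros > 0.4 * pairs && oddZeros < 0.1 * pairs) return StandardCharsets.UTF_16BE;
        if (oddZeros > 0.4 * pairs && evenZeros < 0.1 * pairs) return StandardCharsets.UTF_16LE;
        return null;
    }
}
```

Text in scripts outside Latin-1 has few zero bytes, so without a BOM this only covers the easy case.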
+1
— Ken
> On Jul 15, 2019, at 2:37 PM, Tim Allison wrote:
>
> Anyone have anything they want to get into 1.22? If not, I’ll kick off the
> regression tests shortly.
>
> Cheers,
> Tim
ika.apache.org/
>
> -- Tim Allison, on behalf of the Apache Tika community
Cheers,
>
> Tim
------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
o the text to PDF
> (for a start, something on top of that transformer), and then maybe even
> for other formats?
>
> Sergey
ent(someImage);
> creator.complete();
>
> It would be consistent with the Tika approach on the read side.
>
> Cheers, Sergey
> On Mon, Oct 14, 2019 at 4:13 PM Ken Krugler wrote:
>
>> If you’re suggesting ways to make it easier to use something like
>> YaHPConver
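The write-side idea quoted above (push content into a creator, then call `complete()`) mirrors the read side, where parsers push SAX events into a `ContentHandler`. A hypothetical sketch of that shape; `DocumentCreator`, `addText` and `PlainTextCreator` are invented for illustration and are not a Tika API:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical write-side API: callers push content, then complete() finishes the output. */
interface DocumentCreator {
    void addText(String text) throws IOException;

    void complete() throws IOException;
}

/** Trivial implementation that writes each added text as a UTF-8 line. */
final class PlainTextCreator implements DocumentCreator {
    private final OutputStream out;
    private boolean completed;

    PlainTextCreator(OutputStream out) {
        this.out = out;
    }

    @Override
    public void addText(String text) throws IOException {
        if (completed) {
            throw new IllegalStateException("already completed");
        }
        out.write(text.getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    @Override
    public void complete() throws IOException {
        if (!completed) {
            completed = true;
            out.flush(); // a PDF or DOCX creator would write trailers/containers here
        }
    }
}
```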
ve for
> change notifications double-check!)
>
> Thanks
> Nick
a wiki page, making a wholesale set of
> edits, getting review of those edits from the community, and assuming it
> passes muster, then bringing the edits back to the original page?
>
>
>
> Eric
>
>> On Oct 29, 2019, at 7:00 PM, Ken Krugler wrote:
>>
ime to start working on integrating Bob's
>> work on the current main branch. I'll have to ignore most of the incoming
>> issues for a bit...unlike the last 4 years...this time I mean it. :)
>>
>> Let me know if there are any objections to heading down this path now.
>>
>> Cheers,
>>
>> Tim
ache.org/jira/browse/TIKA-2917
>>>Project: Tika
>>> Issue Type: Improvement
>>> Reporter: Tim Allison
>>> Assignee: Tim Allison
>>> Priority: Minor
>>>
>>> Inline images may have XMP associated with them. We are not currently
>>> extracting this metadata.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v7.6.14#76016)
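For TIKA-2917: XMP is stored as an XML packet, conventionally an `x:xmpmeta` element (often wrapped in `<?xpacket ...?>` processing instructions). A naive, JDK-only sketch that pulls the first packet out of raw bytes; it assumes the conventional `x:` prefix and an ASCII-compatible encoding, `XmpScanner` is a hypothetical name, and it is not Tika code:

```java
import java.nio.charset.StandardCharsets;

/** Naive XMP packet locator (sketch): returns the first x:xmpmeta element as a string. */
final class XmpScanner {
    private static final byte[] START = "<x:xmpmeta".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] END = "</x:xmpmeta>".getBytes(StandardCharsets.US_ASCII);

    private XmpScanner() {}

    /** Returns the packet text, or null if no complete packet is present. */
    static String firstPacket(byte[] data) {
        int start = indexOf(data, START, 0);
        if (start < 0) return null;
        int end = indexOf(data, END, start + START.length);
        if (end < 0) return null; // truncated packet
        int stop = end + END.length;
        return new String(data, start, stop - start, StandardCharsets.UTF_8);
    }

    /** Plain byte-array search; O(n*m), fine for a sketch. */
    private static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```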
e/lib/security/
> sudo cp ~/Downloads/UnlimitedJCEPolicyJDK8/local_policy.jar
> $JAVA_HOME/jre/lib/security/
— Ken
>
> On Fri, Nov 20, 2020 at 1:43 PM Ken Krugler
> wrote:
>
>> Hi all,
>>
>> I was trying to build the 1.25-rc1 branch, and ran into this same issue
asing the XMLReaderUtils.POOL_SIZE
Nov 21, 2020 10:39:07 PM org.apache.tika.utils.XMLReaderUtils acquireSAXParser
WARNING: Contention waiting for a SAXParser. Consider increasing the XMLReaderUtils.POOL_SIZE
… and so on…
Any suggestions?
Thanks!
— Ken
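The warning means every pooled parser was checked out and the caller had to wait. A JDK-only sketch of that pattern (a fixed-size pool whose acquire logs contention and then waits with a timeout); `SaxParserPool` is a hypothetical name and this is not Tika's `XMLReaderUtils` implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

/** Fixed-size SAXParser pool (sketch); acquire() logs when it has to wait. */
final class SaxParserPool {
    private static final Logger LOG = Logger.getLogger(SaxParserPool.class.getName());
    private final BlockingQueue<SAXParser> pool;

    SaxParserPool(int size) throws ParserConfigurationException, SAXException {
        pool = new ArrayBlockingQueue<>(size);
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        for (int i = 0; i < size; i++) {
            pool.add(factory.newSAXParser());
        }
    }

    /** Returns a parser, waiting up to timeoutMillis if all are in use; null on timeout. */
    SAXParser acquire(long timeoutMillis) throws InterruptedException {
        SAXParser parser = pool.poll(); // fast path: a parser is free
        if (parser != null) {
            return parser;
        }
        LOG.warning("Contention waiting for a SAXParser; consider a larger pool");
        return pool.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Resets the parser's state and returns it to the pool. */
    void release(SAXParser parser) {
        parser.reset();
        pool.offer(parser);
    }
}
```

With more concurrent parsing threads than pooled parsers, callers queue up; that is the trade-off a pool-size setting controls.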
131-b11, mixed mode)
— Ken
> On Mon, Nov 23, 2020 at 1:40 PM Ken Krugler
> wrote:
>
>> Hi all,
>>
>> I got past the JCE issue, but now some tests are failing with timeouts.
>>
>> For this test:
>>
>> [INFO] Running org.apache.tika.parser.micr
rent SAXParser which is not handled correctly in
> XMLReaderUtils? What OS, what version of java?
>
> Thank you, again.
>
> Best,
>
> Tim
>
> On Mon, Nov 23, 2020 at 1:40 PM Ken Krugler
> wrote:
>
>> Hi
>
> Please vote on releasing this package as Apache Tika 1.25.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.25
> [ ] -1 Do not release this package bec
Allison
>> Priority: Minor
>>Fix For: 2.0.0
>>
>>
>> We or our dependencies use 4? json parsers last time I looked. It feels like
>> a majority of our dependencies use jackson. I used to have a preference for
>> GSON, which is why we h
dency, etc.
>
> Some options for classic-> basic, base, ...what else?
>
> Any other recommendations for these names? Thank you!
>
> Best,
>
> Tim
but was:<[
>
> ]>
> Tests in error:
> ODFParserTest.testNullStylesInODTFooter:367 » WriteLimitReached Your
> document ...
>
> ODFParserTest.testParagraphLevelFontStyles:388->TikaTest.getXML:191->TikaTest.getXML:205
> » SAX
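`WriteLimitReached` is Tika signalling that extracted text exceeded the configured write limit. The mechanism can be sketched with the JDK alone: an `Appendable` wrapper that keeps what fits and then throws. `LimitedAppendable` and `LimitReachedException` are hypothetical names, not Tika classes:

```java
import java.io.IOException;

/** Hypothetical exception signalling that the character budget was exhausted. */
class LimitReachedException extends IOException {
    LimitReachedException(int limit) {
        super("Write limit of " + limit + " characters reached");
    }
}

/** Appendable wrapper that forwards text until writeLimit chars, then throws (sketch). */
final class LimitedAppendable implements Appendable {
    private final Appendable delegate;
    private final int writeLimit;
    private int written;

    LimitedAppendable(Appendable delegate, int writeLimit) {
        this.delegate = delegate;
        this.writeLimit = writeLimit;
    }

    @Override
    public Appendable append(CharSequence csq) throws IOException {
        CharSequence s = (csq == null) ? "null" : csq; // Appendable contract for null
        return append(s, 0, s.length());
    }

    @Override
    public Appendable append(CharSequence csq, int start, int end) throws IOException {
        CharSequence s = (csq == null) ? "null" : csq;
        int len = end - start;
        int room = writeLimit - written;
        if (len > room) {
            // Keep the part that fits, then signal truncation to the caller.
            delegate.append(s, start, start + room);
            written = writeLimit;
            throw new LimitReachedException(writeLimit);
        }
        delegate.append(s, start, end);
        written += len;
        return this;
    }

    @Override
    public Appendable append(char c) throws IOException {
        return append(String.valueOf(c), 0, 1);
    }
}
```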
tracking issue
> [5]: http://checkstyle.sourceforge.net/
> [6]: https://maven.apache.org/plugins/maven-checkstyle-plugin/
>
>
>
> --
>
> Best regards,
> Konstantin Gribov
> (Maven/Ant+Ivy/Gradle/SBT/whatever) in their projects,
> so it shouldn't be something bothersome for end user.
>
> What do you think, folks?
>
> [1]: https://issues.apache.org/jira/browse/TIKA-2314
>
> --
>
> Best regards,
> Konstantin Gribov
a step in the release?
No, I don’t believe so.
> Does it take a few weeks for the sync?
Here’s what I’ve heard (from a forum post):
> Also FYI, mvnrepository.com is unaffiliated with Maven Central, and lags it
> by anywhere from a few hours to a few days
So potentially a few days.
Hi devs,
I’m curious about the occasional use of java.util.logging in Tika:
> ./tika-core/src/main/java/org/apache/tika/config/InitializableProblemHandler.java:import java.util.logging.Logger;
> ./tika-core/src/main/java/org/apache/tika/config/LoadErrorHandler.java:import java.util.logging.
Thanks for the ref, Tim.
I’m curious why SolrCell doesn’t fire up threads when parsing docs with Tika
(or use the fork parser), to mitigate issues with hangs & crashes?
— Ken
> On May 29, 2018, at 11:54 AM, Tim Allison wrote:
>
> All,
>
> Over the weekend, Shawn Heisey very kindly drafted a
> 2018-05-29 16:11 GMT-03:00 Ken Krugler :
>
>> Thanks for the ref, Tim.
>>
>> I’m curious why SolrCell doesn’t fire up threads when parsing docs with
>> Tika (or use the fork parser), to mitigate issues with hangs & crashes?
>>
>> — Ken
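On running parses in separate threads: a worker thread plus a timeout lets the caller give up on a hung parse, but it cannot stop code that ignores interruption and does nothing for JVM crashes or out-of-memory errors, which is the case a separate process (the fork parser) covers. A JDK-only sketch; `TimedParse.parseWithTimeout` is a hypothetical helper and the `Callable` stands in for the actual Tika call:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Runs a parse task on a worker thread and stops waiting after a timeout (sketch). */
final class TimedParse {
    private TimedParse() {}

    /** Returns the task's result, or null if it did not finish within timeoutMillis. */
    static <T> T parseWithTimeout(Callable<T> parseTask, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck parse must not keep the JVM alive
            return t;
        });
        try {
            Future<T> future = worker.submit(parseTask);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupts the worker; only helps if the parser checks for it
                return null;
            }
        } finally {
            worker.shutdownNow();
        }
    }
}
```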
e target? This would allow us to bake modularity in now.
> Given that I haven't actually tried modularizing/jigsawizing Tika yet, this
> could be a complete disaster, of course. :)
>
> Cheers,
>
> Tim
M Nicholas DiPiazza
>>> wrote:
>>>>
>>>> +1 on 1.27 release.
>>>>
>>>> On Mon, Jun 28, 2021, 10:57 AM Tim Allison wrote:
>>>>>
>>>>> All,
>>>>> The recent release of PDFBox fixed 2 DoS CVEs
est this with more
> recent versions of the surefire plugin, or is there a recommended
> workaround?
>
> Thank you.
>
>Best,
>
> Tim
>
> [0]
> http://maven.apache.org/surefire/maven-surefire-plugin/faq.html#vm-termination
--
> a) tika-pipes hands-on workshop
> b) get to know the users -- 5 minute go-around the room "this is how
> we use it; these are our pain points"
> c) ???
>
> Again, thank you!
>
> Best,
>
> Tim
s successfully.
>>
>>>
>>> [X] +1 Release this package as Apache Tika 2.2.0
>>
>> I did notice that the tika DL's module(s) are pulling in the entire Hadoop
>> dependency chain. I wonder if we can cut down on this... that is however a
>> concern outside of this release candidate review.
>>
>> Thanks for the quick turnaround.
>> lewismc
>>
--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch
with
> ASF projects. I'd want to copy the header pretty much literally about
> no endorsements, etc.
> What would you think of adding something similar to our wiki or our website?
>
>Best,
>
> Tim
e, Sep 12, 2023 at 10:49 AM Tim Allison <mailto:talli...@apache.org>> wrote:
>> >If Tika users will be happy to move on and drop Java 8 and/or javax. Please
>> >drop them :)))
>>
>> Fellow devs and broader Tika community, are we ok with EOL'ing Tika
) Keep Java 11 in "main"/3.x now and set the EOL for Tika 2.x/Java 8 in say
> 6 months or fewer?
>
> Thank you, all, for your feedback!
>
> Best,
>
> Tim
Hi Tika devs,
Check out Magika at https://github.com/google/magika
Wondering if we could leverage Deeplearning4j to run the model from that project.
— Ken
--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink & Pinot
Hi devs,
I saw this dataset on Hugging Face, seems useful for evaluating Tika OCR…
— Ken
https://huggingface.co/datasets/pixparse/idl-wds
n that's found in most
web pages, from what I see.
--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions
ined.
Regards,
-- Ken
----------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
at would take a tag like:
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
and put it into the metadata map as "og:url" =>
"http://www.imdb.com/title/tt0117500/"
Thoughts on this?
Thanks,
-- Ken
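A naive sketch of the proposed mapping, using regular expressions for brevity (a real implementation would hook into the HTML parser's handling of `<meta>` elements instead of scanning markup). It assumes double-quoted attributes, and `OpenGraphMeta` is a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Collects Open Graph meta property/content pairs into a map (sketch). */
final class OpenGraphMeta {
    private static final Pattern META = Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ATTR = Pattern.compile("([\\w:-]+)\\s*=\\s*\"([^\"]*)\"");

    private OpenGraphMeta() {}

    /** Returns og:* property names mapped to their content values, in document order. */
    static Map<String, String> extract(String html) {
        Map<String, String> metadata = new LinkedHashMap<>();
        Matcher tag = META.matcher(html);
        while (tag.find()) {
            String property = null;
            String content = null;
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                String name = attr.group(1).toLowerCase(Locale.ROOT);
                if (name.equals("property")) {
                    property = attr.group(2);
                } else if (name.equals("content")) {
                    content = attr.group(2);
                }
            }
            if (property != null && content != null && property.startsWith("og:")) {
                metadata.put(property, content);
            }
        }
        return metadata;
    }
}
```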
On Sep 23, 2011, at 3:24am, Jukka Zitting wrote:
> Hi,
>
> On Fri, Sep 23, 2011 at 2:23 AM, Ken Krugler
> wrote:
>> The reason why is that Open Graph uses RDFa
>
> Instead of mapping the RDFa tags to Tika's Metadata and then
> back to normal XHTML tags, we
say, me, where the end result is likely to be horribly wrong.
For better or worse, RDF has never been an itch that I've needed to scratch.
-- Ken
pom.xml
> Path: /tika-bundle-it
> Location: line 87
> Type: Maven Project Build Lifecycle Mapping Problem
>
> I looked up the problem and came up with this link:
> http://wiki.eclipse.org/M2E_plugin_execution_not_covered
>
> However, I don't understand what is actually g
hrome/trunk/src/third_party/cld/
>
> Best regards
>
> Jérôme
>
> --
> @jcharron
> http://motre.ch/
> http://jcharron.posterous.com/
> http://www.shopreflex.fr/
> http://www.staragora.com/
>
va language detect library
> (http://code.google.com/p/language-detection)... hoping to finish that
> soon and do a followon blog post.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Oct 24, 2011 at 9:45 AM, Ken Krugler
> wrote:
>> I took a qui
py, and every three characters triggers a new String()
-- Ken
> http://blog.mikemccandless.com
>
> On Mon, Oct 24, 2011 at 4:53 PM, Michael McCandless
> wrote:
>> On Mon, Oct 24, 2011 at 2:15 PM, Ken Krugler
>> wrote:
>>
>>> Sounds like a great idea - see the recent comment thr
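On the per-trigram `new String()`: a trigram is three 16-bit chars, so it fits in a `long` key and no substring has to be allocated per position. A JDK-only sketch of counting that way; `TrigramCounter` is a hypothetical name, and the boxed `Long` keys still allocate, so a primitive-keyed map would be the next step:

```java
import java.util.HashMap;
import java.util.Map;

/** Counts character trigrams using packed long keys instead of substrings (sketch). */
final class TrigramCounter {
    private TrigramCounter() {}

    /** Packs three chars (16 bits each) into one long. */
    static long pack(char a, char b, char c) {
        return ((long) a << 32) | ((long) b << 16) | c;
    }

    /** Unpacks a key back to its three-char string (for display or debugging only). */
    static String unpack(long key) {
        return new String(new char[] {
            (char) (key >>> 32), (char) ((key >>> 16) & 0xFFFF), (char) (key & 0xFFFF)
        });
    }

    /** Counts every overlapping trigram in the text. */
    static Map<Long, Integer> count(CharSequence text) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 < text.length(); i++) {
            long key = pack(text.charAt(i), text.charAt(i + 1), text.charAt(i + 2));
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```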
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
hem, I fear there may
> not be anyone left in their project who's interested in charset detectors any
> more. I'd love to be proved wrong though, if anyone has any personal contacts
> on the project they could prod about it?
>
> Nick
logspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr