I've gone with a virtual, hands-on tika-pipes workshop on Dec 2 at noon (EST). I'll try to trim the content a bit so there's more time for chatting, but otherwise it will be similar to the initial meeting.
https://www.meetup.com/apache-tika-community/events/282123231/

On Tue, Nov 16, 2021 at 11:21 AM Tim Allison <talli...@apache.org> wrote:
>
> Arky,
> This sounds great. Are you using the new language models I added to
> Tika 2.x? Those include the languages you mention and a couple more
> you requested earlier (?).
>
> Cheers,
>
> Tim
>
> On Wed, Nov 10, 2021 at 2:13 PM Arky <hitmana...@gmail.com> wrote:
> >
> > Hi,
> >
> > My personal interest is to get Tika to work better with South and
> > Southeast Asian languages.
> >
> > Any conversation on how we could contribute corpora to help train the
> > models for languages like Burmese, Thai, Khmer and Vietnamese would be
> > great.
> >
> > Apart from general introductions, I would be happy to give a use case of
> > how downstream projects use Tika in their work to ingest and extract
> > data from multilingual documents.
> >
> > Cheers
> >
> > --arky
> >
> > On 11/11/21 1:18 AM, Tim Allison wrote:
> > > But seriously... how about a hands-on workshop on tika-pipes for the
> > > first week of December (focus on fileshare to Solr)? We can follow
> > > Eric's recommendation of having a brief go-around the room to introduce
> > > each other and then a smaller actual tutorial.
> > >
> > > Was the day of week/time of day ok? I realize that TWTh can be heavy
> > > meeting days for some, but I also know that folks take MF off. :D
> > >
> > > On Tue, Nov 9, 2021 at 3:53 PM Tim Allison <talli...@apache.org> wrote:
> > >>
> > >> Will sign up Ken for next week....kidding. Yes, that sounds great
> > >> when you're ready!
> > >>
> > >> On Tue, Nov 9, 2021 at 3:16 PM Ken Krugler <kkrugler_li...@transpac.com>
> > >> wrote:
> > >>>
> > >>> Hi Tim,
> > >>>
> > >>> Maybe how to embed Tika in a scalable processing framework (Flink,
> > >>> Spark, AWS Lambda???) to process a large corpus in parallel?
> > >>>
> > >>> — Ken
> > >>>
> > >>>> On Nov 9, 2021, at 11:00 AM, Tim Allison <talli...@apache.org> wrote:
> > >>>>
> > >>>> All,
> > >>>> Many thanks to those who attended today. It was great to e-meet
> > >>>> old friends and users from around the world. Many thanks to Lewis
> > >>>> McGibbney for getting the ball rolling on these.
> > >>>> Let's use this thread to discuss possible topics and scheduling for
> > >>>> the next meetups.
> > >>>>
> > >>>> Question 1: Pace...one a month or so?
> > >>>>
> > >>>> Question 2: Topics?
> > >>>> a) tika-pipes hands-on workshop
> > >>>> b) get to know the users -- 5 minute go-around the room "this is how
> > >>>> we use it; these are our pain points"
> > >>>> c) ???
> > >>>>
> > >>>> Again, thank you!
> > >>>>
> > >>>> Best,
> > >>>>
> > >>>> Tim
> > >>>
> > >>> --------------------------
> > >>> Ken Krugler
> > >>> http://www.scaleunlimited.com
> > >>> Custom big data solutions
> > >>> Flink, Pinot, Solr, Elasticsearch