Arky,

This sounds great. Are you using the new language models I added to Tika 2.x? Those include the languages you mention and a couple more you requested earlier (?).

Cheers,

Tim

On Wed, Nov 10, 2021 at 2:13 PM Arky <hitmana...@gmail.com> wrote:
>
> Hi,
>
> My personal interest is to get Tika to work better with Southern and
> Southeast Asian languages.
>
> Any conversation on how we could contribute corpora to help train the
> models for languages like Burmese, Thai, Khmer and Vietnamese would be
> great.
>
> Apart from general introductions, I would be happy to give use cases of
> how downstream projects use Tika for their work to ingest and extract
> data from multi-lingual documents.
>
> Cheers
>
> --arky
>
> On 11/11/21 1:18 AM, Tim Allison wrote:
> > But seriously... how about a hands-on workshop on tika-pipes for the
> > first week of December (focus on fileshare to Solr)? We can follow
> > Eric's recommendation of having a brief around the room to introduce
> > each other and then a smaller actual tutorial.
> >
> > Was the day of week/time of day ok? I realize that TWTh can be heavy
> > meeting days for some, but I also know that folks take MF off. :D
> >
> > On Tue, Nov 9, 2021 at 3:53 PM Tim Allison <talli...@apache.org> wrote:
> >>
> >> Will sign up Ken for next week....kidding. Yes, that sounds great
> >> when you're ready!
> >>
> >> On Tue, Nov 9, 2021 at 3:16 PM Ken Krugler <kkrugler_li...@transpac.com>
> >> wrote:
> >>>
> >>> Hi Tim,
> >>>
> >>> Maybe how to embed Tika in a scalable processing framework (Flink, Spark,
> >>> AWS Lambda???) to process a large corpus in parallel?
> >>>
> >>> -- Ken
> >>>
> >>>> On Nov 9, 2021, at 11:00 AM, Tim Allison <talli...@apache.org> wrote:
> >>>>
> >>>> All,
> >>>> Many thanks to those who attended today. It was great to e-meet
> >>>> old friends and users from around the world. Many thanks to Lewis
> >>>> McGibbney for getting the ball rolling on these.
> >>>> Let's use this thread to discuss possible topics and scheduling for
> >>>> the next meetups?
> >>>>
> >>>> Question 1: Pace...one a month or so?
> >>>>
> >>>> Question 2: Topics?
> >>>> a) tika-pipes hands-on workshop
> >>>> b) get to know the users -- 5 minute go-around the room "this is how
> >>>> we use it; these are our pain points"
> >>>> c) ???
> >>>>
> >>>> Again, thank you!
> >>>>
> >>>> Best,
> >>>>
> >>>> Tim
> >>>
> >>> --------------------------
> >>> Ken Krugler
> >>> http://www.scaleunlimited.com
> >>> Custom big data solutions
> >>> Flink, Pinot, Solr, Elasticsearch