bahadirborasahin commented on PR #14549: URL: https://github.com/apache/lucene/pull/14549#issuecomment-2838277355
@stefanvodita

> I think we need to know more around how the list came about and we need some evidence that the new list is better. I'm also unable to tell if the new words are reasonably considered stop words, but maybe a Turkish speaker could weigh in.

I had concerns earlier about malformed entries like `keţke` or `onlarýn`, but they seem to be fixed in this revision. The suggested words make sense as stopwords in Turkish. However, some of them seem very unlikely to occur in practice, if that matters: `cuppadak`, `cumburlok`, `cumbadak`?

@HakanBayazitHabes

> Based analysis of multiple Turkish NLP resources

Can we give references to those studies? I am specifically wondering whether we should add all kinds of adverbs/*zarf* as stopwords, since they can provide context (for example `doğru`/*accurate*/*true*/*factual*).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
