bahadirborasahin commented on PR #14549:
URL: https://github.com/apache/lucene/pull/14549#issuecomment-2838277355

   @stefanvodita 
   
   > I think we need to know more around how the list came about and we need 
some evidence that the new list is better. I'm also unable to tell if the new 
words are reasonably considered stop words, but maybe a Turkish speaker could 
weigh in.
   
   I had concerns earlier about malformed entries like `keţke` or `onlarýn`, but 
they seem to be fixed in this revision. The suggested words make sense as 
stopwords in Turkish; however, some of them seem very unlikely to occur in 
practice, if that matters: `cuppadak`, `cumburlok`, `cumbadak`? 
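
   As a rough sketch of how malformed entries like the ones above could be 
caught automatically, the snippet below flags any word containing a character 
outside the modern Turkish alphabet. The helper name `suspicious_entries`, the 
extra circumflexed vowels, and the assumption that entries are already 
lowercased are mine and are not part of the PR.

```python
# Lowercase letters of the modern Turkish alphabet (29 letters).
TURKISH_LETTERS = set("abcçdefgğhıijklmnoöprsştuüvyz")
# Assumption: circumflexed vowels are tolerated, since some loanwords use them.
EXTRA_LETTERS = set("âîû")


def suspicious_entries(words):
    """Return the words containing a character outside the allowed set.

    Assumes the entries are already lowercased, as stopword lists usually are.
    """
    allowed = TURKISH_LETTERS | EXTRA_LETTERS
    return [w for w in words if any(ch not in allowed for ch in w)]


if __name__ == "__main__":
    # Hypothetical sample mixing malformed and well-formed entries.
    sample = ["keţke", "onlarýn", "keşke", "onların", "cuppadak"]
    print(suspicious_entries(sample))
```

   A check like this only catches encoding damage; it says nothing about whether 
a well-formed word is a sensible stopword.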
   
   @HakanBayazitHabes 
   
   > Based analysis of multiple Turkish NLP resources
   
   Can we cite those resources? I am specifically wondering whether we should 
add all kinds of adverbs (*zarf*) as stopwords, since they can provide context 
(for example, `doğru`: *accurate*/*true*/*factual*). 
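
   To illustrate the concern, here is a minimal sketch of plain stopword 
filtering in Python. It does not use Lucene; `remove_stopwords`, the tiny 
`base_stopwords` set, and the sample query are hypothetical and only show how 
a context-carrying word disappears once it is on the list.

```python
def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]


# Hypothetical, tiny stopword set for illustration only.
base_stopwords = {"ve", "bir", "bu"}

# "doğru cevap" roughly means "correct answer".
query = ["doğru", "cevap"]

if __name__ == "__main__":
    # Without `doğru` on the list, both tokens survive.
    print(remove_stopwords(query, base_stopwords))
    # With `doğru` added, the query loses the word that carried the context.
    print(remove_stopwords(query, base_stopwords | {"doğru"}))
```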


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


