Hi Alexey,

I am stuck on the preprocessing step itself: I cannot find any API that takes 
a sentence, reads its tokens, and calculates their counts, whereas 
scikit-learn provides such APIs out of the box.

I am attaching the sample data that I need to categorize on the basis of user 
experience. Please find the Python code snippet below:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training comments and their labels, one label per comment.
comments = [
    "pizza was soft, very nice",
    "good ambience and excellent service",
    "took a long time, service needs improvement",
    "toppings were very less, but bread was excellent",
]
targets = [
    "Good Experience",
    "Good Experience",
    "Bad Experience",
    "Good Experience",
]

# Tokenize the comments and build a document-term count matrix.
vect = CountVectorizer()
counts = vect.fit_transform(comments)

classifier = MultinomialNB()
classifier.fit(counts, targets)

# Vectorize new comments with the same vocabulary, then predict.
predict_comments = ["soft bread, nice toppings"]
predict_data = vect.transform(predict_comments)
predictions = classifier.predict(predict_data)
print(predictions)
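
For reference, the counting step that CountVectorizer performs can be sketched 
in plain Python. This is only an illustration under stated assumptions: it 
mirrors scikit-learn's default settings (lowercasing, tokens of two or more 
word characters, alphabetically ordered vocabulary), and the helper names 
tokenize, build_vocabulary and count_vector are made up for this example.

```python
import re
from collections import Counter

# Same token rule as CountVectorizer's default: words of 2+ characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return TOKEN_RE.findall(sentence.lower())


def build_vocabulary(sentences):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for s in sentences for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def count_vector(sentence, vocabulary):
    """Return per-token counts; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocabulary)
    for tok, n in Counter(tokenize(sentence)).items():
        if tok in vocabulary:
            vec[vocabulary[tok]] = n
    return vec


comments = ["pizza was soft, very nice", "good ambience and excellent service"]
vocabulary = build_vocabulary(comments)
vectors = [count_vector(c, vocabulary) for c in comments]
```

Each row of vectors has one column per vocabulary word, which is the shape a 
multinomial Naive Bayes trainer expects as input.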


Thanks,
Priya


From: Alexey Zinoviev <[email protected]>
Sent: Sunday, September 6, 2020 6:41 PM
To: Igor Belyakov <[email protected]>
Cc: user <[email protected]>
Subject: Re: Preprocessing of data to use in Naive-Bayes

Very interesting case!

We have 3 different implementations for NaiveBayes algorithm
https://apacheignite.readme.io/docs/naive-bayes

I suppose this one is the best fit for this task:
https://apacheignite.readme.io/docs/naive-bayes#discrete-bernoulli-naive-bayes
The data should be stored as Vectors in an Ignite cache before training starts.
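
To illustrate the kind of rows a Bernoulli-style model works with, here is a 
rough Python sketch that turns sentences into 0/1 word-presence vectors. It 
does not use any Ignite API: the dict stands in for a cache, the keyword list 
is hypothetical, and putting the label in the first position is only an 
assumed layout.

```python
def presence_vector(sentence, vocabulary):
    """1.0 if the vocabulary word occurs in the sentence, else 0.0."""
    words = set(sentence.lower().replace(",", " ").split())
    return [1.0 if w in words else 0.0 for w in vocabulary]


# Hypothetical keyword list; in practice it would be built from the data.
vocabulary = ["bread", "excellent", "nice", "service", "soft", "time"]

# (text, label) pairs: 1.0 = good experience, 0.0 = bad experience.
rows = [
    ("pizza was soft, very nice", 1.0),
    ("took a long time, service needs improvement", 0.0),
]

# Stand-in for a cache: row id -> [label, feature_1, ..., feature_n].
dataset = {
    i: [label] + presence_vector(text, vocabulary)
    for i, (text, label) in enumerate(rows)
}
```

Only the presence of a word matters here, not how often it occurs, which is 
the difference from the count-based features used with a multinomial model.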

Dear Priya Yadav, could you please provide code or pseudocode showing how you 
populate your Ignite cache with the sentence data? A few sample sentences would 
be useful too.
It would also help to see how you would solve this task in scikit-learn. I'll 
try to help with the preprocessing code for this case.

Sincerely yours,
       Alexey

On Fri, Sep 4, 2020 at 19:40, Igor Belyakov <[email protected]> wrote:
Alexey,

Do you have any thoughts regarding that?

Igor

On Fri, Sep 4, 2020 at 10:03 AM Priya Yadav <[email protected]> wrote:
Hi,


Problem Statement: I have feedback sentences whose words are separated by 
spaces, like normal English sentences. I need to classify these sentences into 
categories based on some keywords. How should I preprocess my data in order to 
use it in Naive Bayes?

Any leads would be helpful.

Thanks in advance.

This email and any files transmitted with it are confidential, proprietary and 
intended solely for the individual or entity to whom they are addressed. If you 
have received this email in error please delete it immediately.

Attachment: FeedbackData
Description: FeedbackData
