Re: Prediction using Classification with text attributes in Apache Spark MLLib

2017-10-20 Thread lmk
Trying to improve the old solution. Do we have a better text classifier now in Spark MLlib? Regards, lmk -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr

Re: Bad Digest error while doing aws s3 put

2016-02-08 Thread lmk
Hi Dhimant, As I had indicated in my next mail, my problem was due to disk getting full with log messages (these were dumped into the slaves) and did not have anything to do with the content pushed into s3. So, looks like this error message is very generic and is thrown for various reasons. You may

Re: SchemaRDD saveToCassandra

2014-09-16 Thread lmk
Hi Michael, Please correct me if I am wrong. The error seems to originate from spark only. Please have a look at the stack trace of the error which is as follows: [error] (run-main-0) java.lang.NoSuchMethodException: Cannot resolve any suitable constructor for class org.apache.spark.sql.catalyst.e

SchemaRDD saveToCassandra

2014-09-11 Thread lmk
cassandra just like the regular rdd? If that is not possible, is there any way to convert the schema RDD to a regular RDD ? Please advise. Regards, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-saveToCassandra-tp13951.html Sent from the

Re: Apache Spark- Cassandra - NotSerializable Exception while saving to cassandra

2014-08-27 Thread lmk
Hi Yana, I have done take and confirmed existence of data. Also checked that it is getting connected to Cassandra. That is why I suspect that this particular rdd is not serializable. Thanks, Lmk On Aug 28, 2014 5:13 AM, "Yana [via Apache Spark User List]" < ml-node+s1001560n12960...@

NotSerializableException while doing rdd.saveToCassandra

2014-08-27 Thread lmk
che.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkConf data.saveToCassandra("test", "test_data", Seq("x1", "x2", "x3", "x4", "x5", "x6"
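The stack trace above points at a SparkConf instance being captured by the task closure: Spark must serialize everything a closure references, and SparkConf is not serializable. A minimal pure-Python sketch of that pattern, using pickle as a stand-in for Spark's closure serializer (the `Config` class and field names here are illustrative, not from the thread):

```python
import pickle

class Config:
    """Stand-in for a non-serializable object such as SparkConf."""
    def __reduce__(self):
        raise TypeError("Config is not serializable")

conf = Config()

def serializes(captured_env):
    """Return True if the closure's captured environment can be serialized."""
    try:
        pickle.dumps(captured_env)
        return True
    except Exception:
        return False

# A closure environment that drags in the whole Config object fails,
# which is what Spark surfaces as NotSerializableException:
assert not serializes({"conf": conf, "table": "test_data"})

# Copying out only the plain values the task actually needs succeeds:
keyspace, table = "test", "test_data"
assert serializes({"keyspace": keyspace, "table": table})
```

The usual fix in Spark code follows the same shape: extract the primitive values you need from SparkConf into local vals before the transformation, so the closure captures only those.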

Re: Bad Digest error while doing aws s3 put

2014-08-07 Thread lmk
This was a completely misleading error message.. The problem was due to a log message getting dumped to the stdout. This was getting accumulated in the workers and hence there was no space left on device after some time. When I re-tested with spark-0.9.1, the saveAsTextFile api threw "no space le

Re: Bad Digest error while doing aws s3 put

2014-08-05 Thread lmk
files when the cluster is of 4 m3.2xlarge slaves; it throws Bad Digest error after writing 86/100 files when the cluster is of 5 m3.2xlarge slaves; it succeeds writing all the 100 files when the cluster is of 6 m3.2xlarge slaves. Please clarify. Regards, lmk -- View this message in context

Re: Bad Digest error while doing aws s3 put

2014-08-04 Thread lmk
m3.2xlarge slaves to 6), or reduce the data size. Is there a possibility that the data is getting corrupt when the load increases? Please advise. I am stuck with this problem for the past couple of weeks. Thanks, lmk -- View this message in context: http://apache-spark-user-list.1001560.n

Re: Bad Digest error while doing aws s3 put

2014-08-04 Thread lmk
Anyone has any thoughts on this? Regards, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p11313.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Bad Digest error while doing aws s3 put

2014-07-28 Thread lmk
ing to some partitions only, say while writing to 240 partitions, it might succeed for 156 files and then it will start throwing the Bad Digest Error and then it hangs. Please advise. Regards, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-er

Re: Bad Digest error while doing aws s3 put

2014-07-25 Thread lmk
Can someone look into this and help me resolve this error, please. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p10644.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: save to HDFS

2014-07-24 Thread lmk
Thanks Akhil. I was able to view the files. Actually I was trying to list the same using regular ls and since it did not show anything I was concerned. Thanks for showing me the right direction. Regards, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com

Re: save to HDFS

2014-07-24 Thread lmk
hdfs://masteripaddress:9000/root/test-app/test1/ after I login to the cluster? Thanks, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/save-to-HDFS-tp10578p10581.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

save to HDFS

2014-07-24 Thread lmk
ccessfully and log says that save is complete also. But I am not able to find the file I have saved anywhere. Is there a way I can access this file? Please advise. Regards, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/save-to-HDFS-tp10578.html Sent from

Bad Digest error while doing aws s3 put

2014-07-17 Thread lmk
d the same command to write similar content with lesser data to s3 without any problem. When I googled this error message, they say it might be due to md5 checksum mismatch. But will this happen due to load? Regards, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.
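The MD5-mismatch explanation mentioned here refers to how S3 validates a put: the client sends a base64-encoded MD5 of the body in the Content-MD5 header, S3 recomputes the digest over the bytes it actually received, and a mismatch (e.g. a truncated or corrupted upload) is rejected as BadDigest. A small sketch of that check, assuming a simple single-part put:

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """Base64-encoded MD5 digest, as sent in the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

body = b"part-00000 contents"
header = content_md5(body)  # computed client-side when the request is built

# S3 recomputes the digest over the received bytes; if the payload was
# altered or cut short in transit, the digests disagree -> BadDigest.
received = body[:-1]  # simulate bytes lost in transit
assert content_md5(received) != header
assert content_md5(body) == header
```

This is why the error can look load-dependent: under heavy load a retry or partial write is more likely, even though the checksum logic itself is deterministic.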

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-26 Thread lmk
Thanks Alexander, That gave me a clear idea of what I can look for in MLLib. Regards, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apache-Spark-MLLib-tp8166p8395.html Sent from the Apache Spark

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-25 Thread lmk
Hi Alexander, Just one more question on a related note. Should I be following the same procedure even if my data is nominal (categorical), but having a lot of combinations? (In Weka I used to have it as nominal data) Regards, -lmk -- View this message in context: http://apache-spark-user-list

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread lmk
? Thanks, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apache-Spark-MLLib-tp8166p8168.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread lmk
Hi, I am trying to predict an attribute with binary value (Yes/No) using SVM. All my attributes which belong to the training set are text attributes. I understand that I have to convert my outcome as double (0.0/1.0). But I do not understand how to deal with my explanatory variables which are also
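A common answer to the question above is to one-hot encode each categorical text attribute into 0.0/1.0 features, with the label mapped to a double the same way. A pure-Python sketch of the encoding (not the MLlib API; the column names and values are made up for illustration):

```python
def one_hot_encode(rows, columns):
    """Map each (column, value) pair to a feature index, then emit
    0.0/1.0 vectors suitable for a linear model such as SVM."""
    index = {}
    for row in rows:
        for col, val in zip(columns, row):
            index.setdefault((col, val), len(index))
    vectors = []
    for row in rows:
        vec = [0.0] * len(index)
        for col, val in zip(columns, row):
            vec[index[(col, val)]] = 1.0
        vectors.append(vec)
    return index, vectors

# The Yes/No outcome becomes a double the same way: {"Yes": 1.0, "No": 0.0}.
index, vectors = one_hot_encode(
    [("red", "small"), ("blue", "small")], ("color", "size"))
```

With many categorical combinations this produces wide, sparse vectors, which is why sparse vector representations (or feature hashing) are usually preferred at scale.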

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Thanks a lot. That solved my problem. Thanks again for the quick response and solution. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7047.html Sent from the Apache Spark User List mailing list

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Sorry Again. In this method, i see that the values for a <- positions.iterator b <- positions.iterator always remain the same. I tried to do a b <- positions.iterator.next, it throws an error: value filter is not a member of (Double, Double) Is there something I
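The symptom described here — both generator variables appearing to draw the same values — is the classic single-pass iterator gotcha: an iterator can be consumed only once, so the nested loop must traverse a materialized collection, not a shared iterator. A Python sketch of the difference (an analogy to the Scala discussion, not the thread's actual code):

```python
# Nesting two loops over the SAME iterator object goes wrong: the inner
# loop drains it during the first outer pass, so most pairs never appear.
same_iter = iter([1, 2, 3])
pairs_broken = [(a, b) for a in same_iter for b in same_iter]
# yields only [(1, 2), (1, 3)]

# Materializing the values first allows repeated traversal,
# producing the full cross product.
values = [1, 2, 3]
pairs_ok = [(a, b) for a in values for b in values]  # 9 pairs
```

In the Scala for-comprehension, calling `.iterator` on a materialized collection inside each generator avoids this, which is consistent with the working solution acknowledged later in the thread.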

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Thank you for your response. While I tried your solution, .mapValues { positions => for { a <- positions.iterator b <- positions.iterator if lessThan(a, b) && distance(a, b) < 100 } yield { (a, b) } } I got the result
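The Scala for-comprehension quoted above enumerates ordered pairs of positions and keeps those closer than 100, with `lessThan(a, b)` ensuring each pair is emitted once. An equivalent Python sketch using `itertools.combinations` (the thread's `lessThan`/`distance` helpers are not shown, so plain Euclidean distance here is an assumption):

```python
import math
from itertools import combinations

def distance(a, b):
    # Placeholder for the thread's distance(); Euclidean on (x, y) pairs.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def close_pairs(positions, threshold=100.0):
    """Return each unordered pair of positions closer than the threshold.

    combinations() plays the role of the lessThan(a, b) guard: each pair
    is produced exactly once, in input order, with no (a, a) pairs.
    """
    return [(a, b) for a, b in combinations(positions, 2)
            if distance(a, b) < threshold]

pts = [(0.0, 0.0), (30.0, 40.0), (500.0, 500.0)]
# (0,0)-(30,40) is 50 apart; every pair involving (500,500) exceeds 100.
```

Note the quadratic cost per group: with thousands of positions per key, the number of candidate pairs grows as n(n-1)/2, which matters for the scaling concern raised elsewhere in this thread.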

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread lmk
Hi Oleg/Andrew, Thanks much for the prompt response. We expect thousands of lat/lon pairs for each IP address. And that is my concern with the Cartesian product approach. Currently for a small sample of this data (5000 rows) I am grouping by IP address and then computing the distance between lat/

Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread lmk
Hi, I am a new spark user. Please let me know how to handle the following scenario: I have a data set with the following fields: 1. DeviceId 2. latitude 3. longitude 4. ip address 5. Datetime 6. Mobile application name With the above data, I would like to perform the following steps: 1. Collect all
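The group-then-compare pipeline asked about here (collect records per IP, then measure distances between the lat/lon points within each group) can be sketched in plain Python; in Spark the same shape is a groupBy on the key followed by a per-group pairwise pass. The haversine formula is an assumption, since the thread does not show how distance is computed:

```python
import math
from collections import defaultdict

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def pairwise_by_ip(records):
    """records: (ip, lat, lon) tuples -> {ip: [((p, q), km), ...]}."""
    groups = defaultdict(list)
    for ip, lat, lon in records:
        groups[ip].append((lat, lon))
    out = {}
    for ip, pts in groups.items():
        out[ip] = [((pts[i], pts[j]), haversine_km(pts[i], pts[j]))
                   for i in range(len(pts))
                   for j in range(i + 1, len(pts))]
    return out

records = [("1.2.3.4", 0.0, 0.0), ("1.2.3.4", 0.0, 1.0),
           ("5.6.7.8", 10.0, 10.0)]
result = pairwise_by_ip(records)
```

This avoids a full Cartesian product across the whole data set (the concern raised in the replies) by pairing points only within each IP's group.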