I am trying to improve the old solution.
Do we have a better text classifier now in Spark MLlib?
Regards,
lmk
Hi Dhimant,
As I indicated in my next mail, my problem was due to the disk getting full
with log messages (these were dumped onto the slaves) and did not have
anything to do with the content pushed to S3. So it looks like this error
message is very generic and is thrown for various reasons. You may
Hi Michael,
Please correct me if I am wrong, but the error seems to originate from Spark
itself. Please have a look at the stack trace of the error, which is as
follows:
[error] (run-main-0) java.lang.NoSuchMethodException: Cannot resolve any
suitable constructor for class org.apache.spark.sql.catalyst.e
Cassandra just like a regular RDD?
If that is not possible, is there any way to convert the SchemaRDD to a
regular RDD?
Please advise.
Regards,
lmk
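A minimal sketch of one way this can be done, assuming a Spark version in which a SchemaRDD is itself an RDD[Row] and a spark-cassandra-connector release that accepts column names as a Seq (the same saveToCassandra signature quoted later in this thread); the keyspace, table, and column names are placeholders, and schemaRdd stands for the SchemaRDD in question:

    import com.datastax.spark.connector._

    // schemaRdd is the SchemaRDD to be saved (assumption). Because it is an RDD[Row],
    // its rows can be mapped to plain tuples before calling saveToCassandra.
    val plainRdd = schemaRdd.map(row => (row.getString(0), row.getString(1)))

    // The tuple arity and column names must match the target Cassandra table (placeholders here).
    plainRdd.saveToCassandra("test", "test_data", Seq("x1", "x2"))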
Hi Yana
I have done a take and confirmed the existence of data. I have also checked
that it is getting connected to Cassandra. That is why I suspect that this
particular RDD is not serializable.
Thanks,
Lmk
On Aug 28, 2014 5:13 AM, "Yana [via Apache Spark User List]" <
ml-node+s1001560n12960...@
org.apache.spark.SparkException: Job aborted due to stage failure: Task not
serializable: java.io.NotSerializableException: org.apache.spark.SparkConf
data.saveToCassandra("test", "test_data", Seq("x1", "x2", "x3", "x4", "x5",
"x6"
This was a completely misleading error message.
The problem was due to a log message being dumped to stdout. This accumulated
on the workers, and hence there was no space left on the device after some
time.
When I re-tested with spark-0.9.1, the saveAsTextFile API threw a "no space
left on device" error.
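As an aside, Spark releases newer than the 0.9.1 mentioned above expose rolling of executor stdout/stderr logs, which is one way to keep worker disks from filling up with such output; a minimal sketch, where the exact values are placeholders and availability of these properties depends on the Spark version:

    import org.apache.spark.SparkConf

    // Cap the growth of executor stdout/stderr on each worker by rolling the files by size
    // and keeping only a few generations (values here are placeholders).
    val conf = new SparkConf()
      .setAppName("s3-put-example")
      .set("spark.executor.logs.rolling.strategy", "size")
      .set("spark.executor.logs.rolling.maxSize", "134217728")        // 128 MB per file
      .set("spark.executor.logs.rolling.maxRetainedFiles", "3")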
files when the cluster consists of 4 m3.2xlarge slaves;
it throws a Bad Digest error after writing 86/100 files when the cluster
consists of 5 m3.2xlarge slaves;
and it succeeds in writing all 100 files when the cluster consists of 6
m3.2xlarge slaves.
Please clarify.
Regards,
lmk
m3.2xlarge slaves to 6), or reduce the data size.
Is there a possibility that the data is getting corrupted when the load
increases?
Please advise. I have been stuck on this problem for the past couple of weeks.
Thanks,
lmk
Does anyone have any thoughts on this?
Regards,
lmk
writing to some partitions only: for example, while writing to 240
partitions, it might succeed for 156 files, then start throwing the Bad
Digest error, and then it hangs.
Please advise.
Regards,
lmk
Can someone please look into this and help me resolve this error?
Thanks, Akhil.
I was able to view the files. I had been trying to list them with a regular
ls, and since it did not show anything, I was concerned.
Thanks for showing me the right direction.
Regards,
lmk
hdfs://masteripaddress:9000/root/test-app/test1/
after I log in to the cluster?
Thanks,
lmk
successfully, and the log also says that the save is complete. But I am not
able to find the saved file anywhere. Is there a way I can access this file?
Please advise.
Regards,
lmk
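A minimal sketch of how the saved output can be located, assuming the HDFS path quoted in the follow-up above and that sc is the existing SparkContext; a plain ls only shows the local filesystem, so the HDFS namespace has to be listed through the Hadoop FileSystem API (or with hadoop fs -ls):

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    // saveAsTextFile writes a directory of part-* files under the given path, not a single file.
    val target = "hdfs://masteripaddress:9000/root/test-app/test1/"
    val fs = FileSystem.get(new URI(target), sc.hadoopConfiguration)
    fs.listStatus(new Path(target)).foreach(status => println(status.getPath))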
used the same command to write similar content with less data to S3 without
any problem. When I googled this error message, the suggestions said it might
be due to an MD5 checksum mismatch. But will this happen due to load?
Regards,
lmk
Thanks, Alexander. That gave me a clear idea of what I can look for in MLlib.
Regards,
lmk
Hi Alexander,
Just one more question on a related note. Should I follow the same procedure
even if my data is nominal (categorical) but has a large number of distinct
combinations? (In Weka I used to treat it as nominal data.)
Regards,
-lmk
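A minimal sketch of one common way to handle a nominal attribute with many distinct values: index the categories once and one-hot encode each value as a sparse vector. The attribute values and the name sc (an existing SparkContext) are assumptions for illustration:

    import org.apache.spark.mllib.linalg.Vectors

    // A hypothetical nominal attribute; in practice this would be one column of the data set.
    val colour = sc.parallelize(Seq("red", "green", "blue", "green"))

    // Build an index of the distinct categories once on the driver...
    val index = colour.distinct().collect().zipWithIndex.toMap

    // ...then encode each value as a sparse 0/1 vector with a single non-zero entry.
    val encoded = colour.map(c => Vectors.sparse(index.size, Array(index(c)), Array(1.0)))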
?
Thanks,
lmk
Hi,
I am trying to predict an attribute with a binary value (Yes/No) using SVM.
All the attributes in my training set are text attributes.
I understand that I have to convert my outcome to a double (0.0/1.0), but I
do not understand how to deal with my explanatory variables, which are also
text.
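A minimal sketch of one way to turn text attributes into numeric features for an SVM, assuming a Spark release that ships org.apache.spark.mllib.feature.HashingTF (term-frequency feature hashing); the sample records and the name sc are placeholders:

    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    // Hypothetical training records: (Yes/No outcome, free-text attributes).
    val raw = sc.parallelize(Seq(
      ("Yes", "frequent user premium segment"),
      ("No", "occasional user basic segment")
    ))

    // Hash each record's tokens into a fixed-size term-frequency vector.
    val tf = new HashingTF(1000)
    val training = raw.map { case (outcome, text) =>
      LabeledPoint(if (outcome == "Yes") 1.0 else 0.0, tf.transform(text.split(" ").toSeq))
    }.cache()

    val model = SVMWithSGD.train(training, 100)   // 100 iterations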
Hi Cheng,
Thanks a lot. That solved my problem.
Thanks again for the quick response and solution.
Hi Cheng,
Sorry to bother you again.
In this method, I see that the values for
a <- positions.iterator
b <- positions.iterator
always remain the same. When I tried b <- positions.iterator.next, it
throws the error: value filter is not a member of (Double, Double).
Is there something I am missing?
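For what it may be worth, a minimal sketch of why that particular change fails: the if guard in a for-comprehension desugars into a withFilter (or filter) call on whatever stands to the right of <-, so the generator source must be a collection-like value, not a single element:

    val positions = Seq((1.0, 2.0), (3.0, 4.0))

    // Compiles: positions.iterator is an Iterator, which supports the desugared withFilter/filter.
    val ok = for (a <- positions.iterator if a._1 < 2.0) yield a

    // Does not compile: positions.iterator.next is a single (Double, Double), which has no
    // filter method, hence "value filter is not a member of (Double, Double)".
    // val bad = for (b <- positions.iterator.next if b._1 < 2.0) yield b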
Hi Cheng,
Thank you for your response. When I tried your solution,
.mapValues { positions =>
for {
a <- positions.iterator
b <- positions.iterator
if lessThan(a, b) && distance(a, b) < 100
} yield {
(a, b)
}
}
I got the result
Hi Oleg/Andrew,
Thanks very much for the prompt response.
We expect thousands of lat/lon pairs for each IP address, and that is my
concern with the Cartesian product approach.
Currently, for a small sample of this data (5,000 rows), I am grouping by IP
address and then computing the distance between lat/lon pairs.
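A minimal, self-contained sketch of that grouped pairwise approach, following the for-comprehension quoted earlier in this thread; the sample records, the lessThan ordering, and the distance function (a plain Euclidean placeholder, not a real geographic distance) are assumptions for illustration:

    // sc is an existing SparkContext (assumption); the sample (ip, (lat, lon)) records are made up.
    val records = sc.parallelize(Seq(
      ("10.0.0.1", (12.97, 77.59)),
      ("10.0.0.1", (12.98, 77.60)),
      ("10.0.0.2", (51.50, -0.12))
    ))

    // Deduplicate pairs by keeping only (a, b) with a "less than" b.
    def lessThan(a: (Double, Double), b: (Double, Double)): Boolean =
      a._1 < b._1 || (a._1 == b._1 && a._2 < b._2)

    // Placeholder metric; a real implementation would use a geographic (e.g. haversine) distance.
    def distance(a: (Double, Double), b: (Double, Double)): Double =
      math.sqrt(math.pow(a._1 - b._1, 2) + math.pow(a._2 - b._2, 2))

    val closePairs = records.groupByKey().mapValues { positions =>
      (for {
        a <- positions.iterator
        b <- positions.iterator
        if lessThan(a, b) && distance(a, b) < 100
      } yield (a, b)).toList          // materialise so the result can be collected or reused
    }

Note that with thousands of positions per IP address this is still quadratic within each group, which matches the concern about the Cartesian product raised above.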
Hi,
I am a new Spark user. Please let me know how to handle the following scenario:
I have a data set with the following fields:
1. DeviceId
2. latitude
3. longitude
4. ip address
5. Datetime
6. Mobile application name
With the above data, I would like to perform the following steps:
1. Collect all
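A minimal sketch of how such records might be represented and keyed by IP address before the grouped distance step discussed above; the input format (comma-separated lines in an RDD named lines) and the field types are assumptions:

    // One record per line of the data set described above.
    case class Event(
      deviceId: String,
      latitude: Double,
      longitude: Double,
      ipAddress: String,
      datetime: String,       // kept as a string here for simplicity
      appName: String)

    // lines: RDD[String] of comma-separated records (hypothetical format).
    val events = lines.map { line =>
      val f = line.split(",")
      Event(f(0), f(1).toDouble, f(2).toDouble, f(3), f(4), f(5))
    }

    // Key the positions by IP address, ready for the grouped pairwise-distance step.
    val positionsByIp = events.map(e => (e.ipAddress, (e.latitude, e.longitude)))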