sparkContext.setCheckpointDir(checkpointPath);
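There doesn't seem to be a dedicated --conf key for this; it has to be set on the context after startup. A minimal sketch in Scala (the HDFS path is hypothetical):

import org.apache.spark.sql.SparkSession

// Minimal sketch: set the RDD checkpoint directory right after creating
// the session. The path below is made up; any directory on a
// fault-tolerant filesystem (e.g. HDFS) works.
val spark = SparkSession.builder().appName("checkpoint-example").getOrCreate()
spark.sparkContext.setCheckpointDir("hdfs:///tmp/spark-checkpoints")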
Asher Krim
Senior Software Engineer
On Tue, May 30, 2017 at 12:37 PM, Everett Anderson wrote:
> Still haven't found a --conf option.
>
> Regarding a temporary HDFS checkpoint directory, it looks like when using
> -
Text feature spaces tend to be enormous and sparse, so any bag-of-words
approach to clustering will likely fail unless you first convert the
features to a smaller and denser space.
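For illustration, a minimal sketch of one such conversion in Spark ML: hash term counts, weight them with IDF, then project the sparse tf-idf vectors into a small dense space with PCA before clustering. The DataFrame `docs`, its "text" column, and all dimensions are assumptions:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.{HashingTF, IDF, PCA, Tokenizer}

// Assumes a DataFrame `docs` with a string column "text".
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val tf = new HashingTF().setInputCol("words").setOutputCol("tf").setNumFeatures(4096)
val idf = new IDF().setInputCol("tf").setOutputCol("tfidf")
// Project the sparse tf-idf vectors down to a small, dense space.
val pca = new PCA().setInputCol("tfidf").setOutputCol("features").setK(50)
val kmeans = new KMeans().setK(20).setFeaturesCol("features")

val model = new Pipeline()
  .setStages(Array(tokenizer, tf, idf, pca, kmeans))
  .fit(docs)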
Asher Krim
Senior Software Engineer
On Wed, Mar 29, 2017 at 5:49 PM, Reth RM wrote:
> Hi Krim,
>
> The dataset that I am experimenting with is gold-truth
There are many ways to get a smaller, denser representation (LSA, LDA, document2vec, etc.).
Other than that, this isn't a Spark question.
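As a concrete example of the document2vec-style route, Spark ML's Word2Vec averages the learned word vectors into one small dense vector per document, which can then be fed straight to k-means. A sketch, with the DataFrame and column names assumed:

import org.apache.spark.ml.feature.{Tokenizer, Word2Vec}

// Assumes a DataFrame `docs` with a string column "text".
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val words = tokenizer.transform(docs)

// One dense 100-dimensional vector per document -- already small and
// dense, so k-means can work with it directly.
val w2v = new Word2Vec()
  .setInputCol("words")
  .setOutputCol("features")
  .setVectorSize(100)
  .setMinCount(5)
val features = w2v.fit(words).transform(words)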
Asher Krim
Senior Software Engineer
On Fri, Mar 24, 2017 at 9:37 PM, Reth RM wrote:
> Hi,
>
> I am using Spark k-means for clustering records that consist of news
> documents; vectors are created by ap
03 PM, Benjamin Kim wrote:
> Asher,
>
> You’re right. I don’t see anything but 2.11 being pulled in. Do you know
> where I can change this?
>
> Cheers,
> Ben
>
>
> On Feb 3, 2017, at 10:50 AM, Asher Krim wrote:
>
> Sorry for my persistence, but did you actually run
> Ben
>
>
> On Feb 3, 2017, at 8:16 AM, Asher Krim wrote:
>
> Did you check the actual Maven dep tree? Something might be pulling in a
> different version. Also, if you're seeing this locally, you might want to
> check which version of the Scala SDK your IDE is using.
>
What are the performance differences between MLeap and vanilla Spark?
What does Tensorflow support look like? I would love to serve models from a
java stack while being agnostic to what framework was used to train them.
Thanks,
Asher Krim
Senior Software Engineer
On Fri, Feb 3, 2017 at 11:53 AM, Hollin Wilkins wrote:
Did you check the actual Maven dep tree? Something might be pulling in a
different version. Also, if you're seeing this locally, you might want to
check which version of the Scala SDK your IDE is using.
Asher Krim
Senior Software Engineer
On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim wrote:
Ben,
That looks like a Scala version mismatch. Have you checked your dep tree?
Asher Krim
Senior Software Engineer
On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim wrote:
> Elek,
>
> Can you give me some sample code? I can’t get mine to work.
>
> import org.apache.spark.
Have you tried using an alias? You should be able to replace
("dbtable", "sometable")
with ("dbtable", "(SELECT utc_timestamp AS my_timestamp FROM sometable) tmp")
--
Asher Krim
Senior Software Engineer
On Thu, Jan 12, 2017 at 10:49 AM, Jorge Machado wrote:
> Hi Guy
>> exception is thrown.
>>
>>
>> java.lang.UnsupportedOperationException: Pipeline write will fail on
>> this Pipeline because it contains a stage which does not implement
>> Writable. Non-Writable stage: rfc_98f8c9e0bd04 of type class
>> org.apache.spark.ml.classification.Rand
>>
>>
>> Here is my code segment.
>>
>>
>> model.write().overwrite().save("mypath");
>>
>>
>> How can I resolve this?
>>
>> Thanks and regards!
>>
>> Minudika
>>
>>
>
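For reference, on Spark 2.x the tree-ensemble models implement MLWritable, so a fitted pipeline round-trips like this (paths hypothetical); on older versions the non-Writable stage has to be persisted some other way:

import org.apache.spark.ml.PipelineModel

// Assumes `model` is a fitted org.apache.spark.ml.PipelineModel on
// Spark 2.x, where RandomForest models support ML persistence.
model.write.overwrite().save("/tmp/my-pipeline-model")

// Restore the whole pipeline, all stages included:
val restored = PipelineModel.load("/tmp/my-pipeline-model")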
--
Asher Krim
Senior Software Engineer
searched, but haven't found anything.
>
> Thanks!
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com relato.io
>
--
Asher Krim
Senior Software Engineer
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
> at org.apache.spark.scheduler.Task.run(Task.scala:85)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ... 1 more
>
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/example-LDA-code-ClassCastException-tp28009.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
--
Asher Krim
Senior Software Engineer
We have also found LIMIT to take an unacceptable amount of time when
reading Parquet-formatted data from S3.
LIMIT was not strictly needed for our use case, so we worked around it.
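The thread doesn't spell out the workaround, but one generic option when you only need a peek at the data is to skip LIMIT entirely and read a single part-file; the bucket and file name here are made up:

// Sketch of one possible workaround (not necessarily what was done
// here): read one Parquet part-file instead of the whole dataset.
val sample = spark.read
  .parquet("s3a://my-bucket/my-table/part-00000-c000.snappy.parquet")
sample.show(20)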
--
Asher Krim
Senior Software Engineer
On Fri, Oct 28, 2016 at 5:36 AM, Liz Bai wrote:
> Sorry for the late reply.
Yes, absolutely. Take a look at:
https://spark.apache.org/docs/1.4.1/mllib-statistics.html#summary-statistics
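A minimal sketch of that API against an RDD of vectors (the data is a toy stand-in):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

// Toy rows standing in for the CSV-derived features.
val observations = sc.parallelize(Seq(
  Vectors.dense(1.0, 10.0, 100.0),
  Vectors.dense(2.0, 20.0, 200.0),
  Vectors.dense(3.0, 30.0, 300.0)
))

// Column-wise summary: mean, variance, nonzero counts, max/min, etc.
val summary = Statistics.colStats(observations)
println(summary.mean)
println(summary.variance)
println(summary.numNonzeros)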
On Fri, Aug 28, 2015 at 8:39 AM, ashensw wrote:
> Hi all,
>
> I have a dataset which consists of a large number of features (columns). It
> is in CSV format, so I loaded it into a Spark data
Did you get a thread dump? We have experienced similar problems during
shuffle operations due to a deadlock in InetAddress. Specifically, look for
a runnable thread at something like
"java.net.Inet6AddressImpl.lookupAllHostAddr(Native
Method)".
Our "solution" has been to put a timeout around the c
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
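The exact wrapper isn't shown on the thread, but the idea is roughly this: run the lookup on a side thread and stop waiting after a deadline. The helper name and timeout handling are made up:

import java.net.InetAddress
import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

// Hypothetical helper: resolve a hostname on a separate thread so the
// caller can give up after timeoutMs instead of blocking forever in the
// native lookup. The native call may keep running; the caller simply
// stops waiting for it.
def lookupWithTimeout(host: String, timeoutMs: Long): Option[InetAddress] = {
  val executor = Executors.newSingleThreadExecutor()
  try {
    val future = executor.submit(new Callable[InetAddress] {
      override def call(): InetAddress = InetAddress.getByName(host)
    })
    try Some(future.get(timeoutMs, TimeUnit.MILLISECONDS))
    catch { case _: TimeoutException => future.cancel(true); None }
  } finally {
    executor.shutdownNow()
  }
}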
Thanks,
Asher Krim