You can file a feature request at
https://issues.apache.org/jira/projects/SPARK/
As a workaround you can create a user defined function like so:
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1526931011080774/2518747644544276/6320440561800420/late
-deep-learning/blob/f088de45daec06865ac02a9ec1323eb2c9eebb89/src/main/scala/com/databricks/sparkdl/ImageUtils.scala
You could potentially reuse this code.
Richard Garris
Principal Architect
Databricks, Inc
650.200.0840
rlgar...@databricks.com
On December 17, 2017 at 3:12:41 PM, Don Drake (dondr
storing it as a vector or Array vs a large Java class object?
That might be the more prudent approach.
-RG
On December 14, 2017 at 10:23:00 AM, Marcelo Vanzin (van...@cloudera.com)
wrote:
This sounds like
Hi Frank,
Two suggestions:
1. I would recommend caching the corpus before running LDA.
2. If you are using the online optimizer, I would tweak the sample size with the setMiniBatchFraction parameter to decrease the sample per iteration (setMiniBatchFraction belongs to OnlineLDAOptimizer, not to the EM optimizer).
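A minimal sketch of both suggestions, assuming the RDD-based spark.mllib API and a corpus that is already in (document id, term-count vector) form. The object name LdaTuningSketch, the helper trainLda, the toy corpus, and the parameter values are hypothetical. Because setMiniBatchFraction is defined on OnlineLDAOptimizer, the sketch uses the online optimizer rather than EM:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{LDA, LDAModel, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical object and helper names, for illustration only.
object LdaTuningSketch {

  // Trains LDA on a corpus of (document id, term-count vector) pairs.
  def trainLda(corpus: RDD[(Long, Vector)], k: Int, miniBatchFraction: Double,
               maxIterations: Int): LDAModel = {
    // Suggestion 1: cache the corpus so each iteration reads it from memory
    // (spilling to disk if needed) instead of recomputing its lineage.
    corpus.persist(StorageLevel.MEMORY_AND_DISK)

    // Suggestion 2: sample only a fraction of the documents per iteration.
    // Spark's docs advise keeping maxIterations * miniBatchFraction >= 1
    // so that the whole corpus is used over the run.
    val optimizer = new OnlineLDAOptimizer().setMiniBatchFraction(miniBatchFraction)

    new LDA()
      .setK(k)
      .setMaxIterations(maxIterations)
      .setOptimizer(optimizer)
      .run(corpus)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lda-sketch").setMaster("local[2]"))
    try {
      // Tiny made-up corpus over a 4-term vocabulary.
      val corpus: RDD[(Long, Vector)] = sc.parallelize(Seq(
        (0L, Vectors.dense(3.0, 1.0, 0.0, 0.0)),
        (1L, Vectors.dense(2.0, 2.0, 0.0, 1.0)),
        (2L, Vectors.dense(0.0, 0.0, 4.0, 2.0)),
        (3L, Vectors.dense(0.0, 1.0, 3.0, 3.0))
      ))
      val model = trainLda(corpus, k = 2, miniBatchFraction = 0.5, maxIterations = 20)
      println(s"k=${model.k}, vocabSize=${model.vocabSize}")
    } finally {
      sc.stop()
    }
  }
}
```

On a real corpus the fraction and the iteration count would need to be tuned together: a smaller fraction makes each iteration cheaper but needs more iterations to cover every document.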
-Richard
On Tue, Sep 20, 2016 at 10:27 AM, Frank Zhang <
dataminin...@yahoo