Hi Riccardo,
Right now, Spark does not support low-latency predictions in production.
MLeap is an alternative and has been used in many scenarios, but it's good
to see that the Spark community has decided to provide such support.
On Wed, Jan 23, 2019 at 7:53 AM Riccardo Ferrari wrote:
> Felix, tha
> "org.apache.hadoop.util.ShutdownHookManager"
> thread, but I don't see that one in your list.
>
> On Wed, Jan 16, 2019 at 12:08 PM Pola Yao wrote:
> >
> > Hi Marcelo,
> >
> > Thanks for your response.
> >
> > I have dumped the threads on the server where
lication?
On Wed, Jan 16, 2019 at 8:31 AM Marcelo Vanzin wrote:
> If System.exit() doesn't work, you may have a bigger problem
> somewhere. Check your threads (using e.g. jstack) to see what's going
> on.
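As an aside for readers following along: the same check can be done from inside
the application. Below is a minimal Scala sketch (an illustration, not code from
this thread) that prints whatever non-daemon threads are still alive:
'''
import scala.collection.JavaConverters._

// Print every live thread that is NOT a daemon; any such thread besides
// "main" can keep the JVM from exiting after the Spark work is done.
Thread.getAllStackTraces.keySet.asScala
  .filterNot(_.isDaemon)
  .foreach(t => println(s"${t.getName} -> ${t.getState}"))
'''
jstack gives the same list plus full stack traces, which is usually what you
want when diagnosing a hang.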
>
> On Wed, Jan 16, 2019 at 8:09 AM Pola Yao wrote:
> >
> or if
> something is creating a non-daemon thread that stays alive somewhere,
> you'll see that.
>
> Or you can force quit with sys.exit.
>
> On Tue, Jan 15, 2019 at 1:30 PM Pola Yao wrote:
> >
> > I submitted a Spark job through ./spark-submit command, the code was
I submitted a Spark job through the ./spark-submit command; the code was
executed successfully, but the application got stuck when trying to
quit Spark.
My code snippet:
'''
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master(...).getOrCreate
val pool = Executors.newFixedThreadPool(3)
implicit val xc = ExecutionContext.fromExecutorService(pool)
'''
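For anyone who hits the same hang, here is a minimal, self-contained sketch of
the shutdown sequence that the rest of this thread converges on. The master
URL, the dummy futures, and the object name QuitCleanly are placeholders, not
code from the original message:
'''
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
import org.apache.spark.sql.SparkSession

object QuitCleanly {
  def main(args: Array[String]): Unit = {
    // "local[*]" is a placeholder for whatever master the real job uses.
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    val pool  = Executors.newFixedThreadPool(3)
    implicit val xc: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    try {
      // Placeholder for the real work (e.g. training models in parallel futures).
      val work = Future.sequence((1 to 3).map(i => Future { s"job $i done" }))
      Await.result(work, 1.hour).foreach(println)
    } finally {
      // newFixedThreadPool creates non-daemon threads; without shutdown()
      // they keep the JVM alive even after main() returns and Spark stops.
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
      spark.stop()
    }
    // Last resort if some other non-daemon thread still refuses to die:
    // sys.exit(0)
  }
}
'''
The important part is the finally block: Executors.newFixedThreadPool creates
non-daemon threads, so unless the pool is shut down (or sys.exit is called),
the JVM stays alive after the Spark work has finished, which matches the
behaviour described above.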
Hi Spark Community,
I was using XGBoost-Spark to train a machine learning model. The dataset
was not large (around 1G). I used the following command to submit my
application:
'''
./bin/spark-submit --master yarn --deploy-mode client --num-executors 50 \
  --executor-cores 2 --executor-memory 3g -
'''
Hello Spark Community,
I have a dataset of size 20G with 20 columns. Each column is categorical, so I
applied string-indexer and one-hot-encoding on every column. After that, I
applied vector-assembler on all the newly derived columns to form a feature
vector for each record, and then fed the feature vect
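Since the message is cut off, here is a rough sketch of the preprocessing
pipeline it describes. The DataFrame name df and the derived column suffixes
are assumptions, and OneHotEncoder stands in for whichever encoder class your
Spark version provides (e.g. OneHotEncoderEstimator in 2.3/2.4):
'''
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.sql.DataFrame

// Assumed: df has 20 string-typed categorical columns and nothing else.
def buildFeatures(df: DataFrame): DataFrame = {
  val catCols = df.columns

  // One StringIndexer and one OneHotEncoder per categorical column.
  val indexers = catCols.map { c =>
    new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx")
  }
  val encoders = catCols.map { c =>
    new OneHotEncoder().setInputCol(s"${c}_idx").setOutputCol(s"${c}_vec")
  }
  // Assemble all encoded columns into a single feature vector per record.
  val assembler = new VectorAssembler()
    .setInputCols(catCols.map(c => s"${c}_vec"))
    .setOutputCol("features")

  val stages = (indexers.toSeq ++ encoders :+ assembler).toArray[PipelineStage]
  new Pipeline().setStages(stages).fit(df).transform(df)
}
'''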
Hi Community,
I have a 1T dataset that contains records for 50 users. Each user has about
20G of data on average.
I wanted to use Spark to train a machine learning model (e.g., an XGBoost
tree model) for each user. Ideally, the result should be 50 models. However,
it'd be infeasible to submit 50 Spark jobs
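The message is cut off here, but as a point of reference, here is a rough
illustration of the "one model per user" loop done inside a single application
rather than 50 separate submissions. The user_id/features/label column names
and the GBTClassifier (standing in for the XGBoost estimator) are assumptions,
not from the original message:
'''
import org.apache.spark.ml.classification.{GBTClassificationModel, GBTClassifier}
import org.apache.spark.sql.DataFrame

// Assumed schema: a "user_id" column plus "features"/"label" prepared upstream.
def trainPerUser(df: DataFrame, userIds: Seq[String]): Map[String, GBTClassificationModel] =
  userIds.map { uid =>
    // Restrict the 1T dataset to one user's ~20G slice before fitting.
    val userDf = df.filter(df("user_id") === uid).cache()
    val model = new GBTClassifier()
      .setFeaturesCol("features")
      .setLabelCol("label")
      .fit(userDf)
    userDf.unpersist()
    uid -> model
  }.toMap
'''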