Hi all,
I am new to machine learning, and I am working on a recommender system. The RMSE on the training dataset is 0.08, while on the test data it is 2.345.
What conclusion can I draw from this, and what steps can I take to improve it?
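A training RMSE (0.08) far below the test RMSE (2.345) is the classic signature of overfitting: the model has memorized the training ratings but does not generalize. A minimal sketch of the diagnostic, in plain Python with made-up ratings (the numbers below are illustrative, not from your data):

```python
import math

def rmse(predictions, actuals):
    """Root-mean-square error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical ratings: near-perfect fit on train, poor fit on test
train_actual = [4.0, 3.0, 5.0, 2.0]
train_pred   = [4.1, 2.9, 5.0, 2.1]
test_actual  = [4.0, 1.0, 5.0, 3.0]
test_pred    = [1.5, 4.0, 2.0, 5.5]

train_rmse = rmse(train_pred, train_actual)
test_rmse = rmse(test_pred, test_actual)

# A test RMSE an order of magnitude above the train RMSE signals overfitting
print(test_rmse / train_rmse > 10)
```

Typical remedies are stronger regularization (e.g. raising `regParam` if you are using Spark's ALS), fewer latent factors, and tuning those knobs with cross-validation rather than against the training set.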
Thank you, Takeshi. As far as I can see from the code you pointed to, the default number of bytes to pack into a partition is set to 128 MB, the Parquet block size.

Daniel, it seems you do need to modify the number of bytes to pack per partition. I am curious to know the scenario. Please share.
Hi,
1. Can we use Spark Structured Streaming for stateless transformations, just as we would with DStreams, or is Structured Streaming only meant for stateful computations?
2. When we use groupBy and window operations for event-time processing and specify a watermark, does this mean the t
I think this document points to the relevant logic here:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L418
This logic merges small files into a partition, and you can control the threshold via `spark.sql.files.maxPartitionBytes`.
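For anyone looking for the knob itself, a sketch of setting that threshold in PySpark follows; the application name and the 32 MB value are illustrative, not recommendations (the default is 128 MB):

```python
from pyspark.sql import SparkSession

# Illustrative: lower the per-partition packing threshold from the
# default 128 MB to 32 MB, so fewer small files are merged into each
# partition (yielding more, smaller partitions).
spark = (SparkSession.builder
         .appName("partition-bytes-demo")  # hypothetical app name
         .config("spark.sql.files.maxPartitionBytes", 32 * 1024 * 1024)
         .getOrCreate())
```

The same setting can also be passed at launch time via `--conf spark.sql.files.maxPartitionBytes=33554432`.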
On 20 May 2017, at 01:44, Bajpai, Amit X. -ND <n...@disney.com> wrote:
Hi,
I have a Hive external table whose S3 location contains no files (though the S3 location directory itself does exist). When I try to use Spark SQL to count the number of records in the table, it is throwing error s