Hi, could you please clarify if you are running a YARN cluster when you see
this problem? I tried on Spark standalone and could not reproduce. If
it's on a YARN cluster, please file a JIRA and I can try to investigate
further.
Thanks,
Bryan
On Sat, Dec 15, 2018 at 3:42 AM 李斌松 wrote:
> spark2.
I was working with a custom Spark listener library. There, I am not able to
figure out a way to break into the details of a task. I only have a listener
which runs on task start. I want to calculate the time my executor took
to read input data from the remote data source for that task, but as Spark doe
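A minimal sketch of what the listener API does expose, assuming Spark 2.x
and that your data source populates input metrics: per-task input volume
and launch-to-finish duration are available on task end, even though there
is no dedicated "remote read time" field (the class name and the println
below are illustrative):

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs input volume and launch-to-finish duration for every finished task.
class InputMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
        s"bytesRead=${m.inputMetrics.bytesRead} " +
        s"recordsRead=${m.inputMetrics.recordsRead} " +
        s"durationMs=${taskEnd.taskInfo.duration}")
    }
  }
}

// Register it on the driver:
// spark.sparkContext.addSparkListener(new InputMetricsListener)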
I actually tried that first. I moved away from it because the algorithm
needs to evaluate all records for all models; for instance, a model trained
on (2,4) needs to be evaluated on a record whose true label is 8. I found
that if I apply the filter in the label-creation transformer, then a record
w
In your custom transformer that produces labels, can you filter out null
labels? A transformer doesn't always need to do a 1:1 mapping.
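A minimal sketch of that suggestion, assuming the input carries a "digit"
column and the pair being trained is (posDigit, negDigit); all names here
are illustrative, not from this thread:

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, when}
import org.apache.spark.sql.types.{DoubleType, StructType}

// Labels one digit pair and drops every row it cannot label, so the
// output has fewer rows than the input (not a 1:1 mapping).
class PairLabeler(posDigit: Int, negDigit: Int, override val uid: String)
    extends Transformer {

  def this(pos: Int, neg: Int) =
    this(pos, neg, Identifiable.randomUID("pairLabeler"))

  override def transform(ds: Dataset[_]): DataFrame =
    ds.toDF()
      .withColumn("label",
        when(col("digit") === posDigit, 1.0)
          .when(col("digit") === negDigit, 0.0)) // other digits -> null
      .filter(col("label").isNotNull)            // drop the unlabeled rows

  override def transformSchema(schema: StructType): StructType =
    schema.add("label", DoubleType, nullable = true)

  override def copy(extra: ParamMap): PairLabeler = defaultCopy(extra)
}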
On Thu, Jan 10, 2019, 7:53 AM Patrick McCarthy wrote:
> I'm trying to implement an algorithm on the MNIST digits that runs like so:
>
> - for every pair of digits (0,1), (
Hi Tzahi,
by using GROUP BY without any aggregate columns, are you just trying to find
the DISTINCT values of the columns?
Also, it may help (I do not know whether the SQL optimiser
automatically takes care of this) to have the LEFT JOIN run on a smaller data
set by having joined on the device_id b
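To illustrate both points (raw_e and device_id are from this thread; the
small "devices" table is hypothetical):

// GROUP BY with no aggregate columns is just SELECT DISTINCT:
spark.sql("SELECT DISTINCT device_id FROM parquet_files.raw_e")

// Shrinking the right-hand side to its join key before the LEFT JOIN,
// in case the optimiser does not already do this:
spark.sql("""
  SELECT re.*
  FROM parquet_files.raw_e AS re
  LEFT JOIN (SELECT DISTINCT device_id FROM devices) AS d
    ON re.device_id = d.device_id
""")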
I'm trying to implement an algorithm on the MNIST digits that runs like so:
- for every pair of digits (0,1), (0,2), (0,3)... assign a 0/1 label to
the digits and build a LogisticRegression Classifier -- 45 in total
- Fit every classifier on the test set separately
- Aggregate the res
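A sketch of the pairwise training step described above, assuming a
DataFrame named train with a "features" vector column and a "digit"
column (both names are assumptions, not from the original post):

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.functions.{col, when}

// All 45 unordered digit pairs.
val pairs = for (a <- 0 to 9; b <- (a + 1) to 9) yield (a, b)

// One binary LogisticRegression per pair, fit only on the records
// carrying one of the pair's two digits.
val models = pairs.map { case (a, b) =>
  val labeled = train
    .filter(col("digit") === a || col("digit") === b)
    .withColumn("label", when(col("digit") === a, 0.0).otherwise(1.0))
  (a, b) -> new LogisticRegression().fit(labeled)
}.toMap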
scala> spark.read.schema(StructType(Seq(StructField("_1", StringType, false),
StructField("_2", StringType, true)))).parquet(
"hdfs://---/MY_DIRECTORY/*_1=201812030900*").show()
+----+--------------------+
|  _1|                  _2|
+----+--------------------+
|null|ba1ca2dc033440125...|
|null|ba1ca2dc033440125...|
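One thing that may be worth trying while this is looked at (a hedged
sketch, not verified against this report): read from the base directory
and filter, so partition discovery derives _1 from the directory names
rather than from the user-supplied schema:

import org.apache.spark.sql.functions.col

// Partition discovery fills _1 from the `_1=...` directory names; the
// filter still prunes down to the partition the glob was selecting.
spark.read
  .parquet("hdfs://---/MY_DIRECTORY")
  .where(col("_1") === "201812030900")
  .show()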
Hi Gourav,
My version of Spark is 2.1.
The data is stored in an S3 directory in parquet format.
I sent you an example of a query I would like to run (the raw_e table is
stored as parquet files and event_day is the partition field):
SELECT *
FROM (select *
from parquet_files.raw_e as re
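Since event_day is the partition field, a filter on it (the value below is
hypothetical) lets Spark prune whole partitions before the rest of the
query runs:

spark.sql("""
  SELECT *
  FROM parquet_files.raw_e
  WHERE event_day = '2018-12-03'
""")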