https://issues.apache.org/jira/browse/SPARK-23576 ?
Hi Steffen,
Thanks for sharing your results about MLlib — this sounds like a useful tool.
However, I wanted to point out that some of the results may be expected for
certain machine learning algorithms, so it might be good to design those tests
with that in mind. For example:
> - The classific
Hello Friends,
I’ve encountered a bug where Spark silently corrupts data when reading from a
Parquet Hive table whose table schema does not match the file schema. I’d
like to take a shot at adding some extra validations to the code to handle this
corner case and I was wondering if anyone h
Thanks for your responses Saisai and Marco.
I agree that the "rename" operation can be time-consuming on object storage,
which can potentially delay the shutdown.
I also agree that customers/users have a way to use log appenders to write
log files and then send them along with YARN application logs b
FYI. The Spark GitHub sync was 10 hours behind this morning. You might get
failed merges because of this. I just triggered a re-sync. It should work now.
Thanks,
Xiao
Certainly if your tests have found a problem, open a JIRA and/or pull
request with the fix and relevant tests.
More tests generally can't hurt, though I guess we should maybe have a look
at them first. If they're a lot of boilerplate and covering basic functions
already covered by other tests, the
Given the popularity of related SO questions:
- https://stackoverflow.com/q/41670103/1560062
- https://stackoverflow.com/q/42465568/1560062
it is probably more a case of "nobody thought about asking" than "it is not used
often".
On Wed, 22 Aug 2018 at
Hi Reynold/Ivan,
People familiar with pandas and R dataframes will likely have used the
dataframe "melt" idiom, which is the functionality I believe you are
referring to:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html
I have had to write this function myself in my own wor
Dear developers,
I am writing to you because I applied an approach for the automated testing
of classification algorithms to Spark MLlib and would like to forward
the results to you.
The approach is a combination of smoke testing and metamorphic testing.
The smoke tests try to find problems by
Manu,
thank you very much for your response.
1. Your post helps to further optimize Spark jobs for wide data.
(https://medium.com/@manuzhang/the-hidden-cost-of-spark-withcolumn-8ffea517c015)
The suggested change of code:
df.select(df.columns.map { col =>
df(col).isNotNull
}: _*)
provides
I agree with Saisai. You can also configure log4j to append anywhere else
other than the console. Many companies have their own systems for collecting and
monitoring logs, and they just customize the log4j configuration. I am not
sure how necessary this change would be.
Thanks,
Marco
On Wed, 22 Aug
11 matches