It should be in the first email in this chain.
On Tue, Dec 12, 2017, 7:10 PM Ryan Blue wrote:
> Great. What's the JIRA issue?
>
> On Mon, Dec 11, 2017 at 8:12 PM, Jason White
> wrote:
>
>> Yes, the fix has been merged and should make it into the 2.3 release.
>>
>> On ... 2017 at 12:59 PM, Jason White wrote:
>>
>>> It doesn't look like the insert command has any metrics in it. I don't
>>> see any commands with metrics, but I could be missing something.
It doesn't look like the insert command has any metrics in it. I don't see
any commands with metrics, but I could be missing something.
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
I think the difference lies somewhere in here:
- RDD writes are done with SparkHadoopMapReduceWriter.executeTask, which
calls outputMetrics.setRecordsWritten
- DF writes are done with InsertIntoHadoopFsRelationCommand.run, though I'm
not entirely sure how that one works.
executeTask appears to be run on
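To make the two paths concrete, here's a minimal sketch of the user-facing calls that I believe end up on them (local master and /tmp paths are just for illustration, and the mapping to the internal classes above is my reading of the code, so take it with a grain of salt):

import org.apache.spark.sql.SparkSession

object WritePathSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("write-paths").getOrCreate()
    val df = spark.range(100).toDF("id")

    // RDD-style save: flows through the Hadoop writer path, where per-task
    // output metrics (records/bytes written) get set.
    df.rdd.map(_.toString).saveAsTextFile("/tmp/rdd-write-sketch")

    // DataFrame save: planned as InsertIntoHadoopFsRelationCommand.
    df.write.mode("overwrite").parquet("/tmp/df-write-sketch")

    spark.stop()
  }
}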
I'd like to use the SparkListenerInterface to listen for some metrics for
monitoring/logging/metadata purposes. The first ones I'm interested in
hooking into are recordsWritten and bytesWritten as a measure of throughput.
I'm using PySpark to write Parquet files from DataFrames.
I'm able to extrac
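The rough shape of the listener I have in mind (Scala, assuming Spark 2.x where the task-end event exposes taskMetrics.outputMetrics directly; untested, so treat it as a starting point):

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sums records/bytes written across finished tasks so the totals can be
// logged or shipped to a monitoring system.
class OutputMetricsListener extends SparkListener {
  private var recordsWritten = 0L
  private var bytesWritten = 0L

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {
      recordsWritten += metrics.outputMetrics.recordsWritten
      bytesWritten += metrics.outputMetrics.bytesWritten
    }
  }

  def totals: (Long, Long) = (recordsWritten, bytesWritten)
}

// Registered with: spark.sparkContext.addSparkListener(new OutputMetricsListener)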
Have you looked at t-digests?
Calculating percentiles (including medians) is something that is inherently
difficult/inefficient to do in a distributed system. T-digests provide a
useful probabilistic structure to allow you to compute any percentile with a
known (and tunable) margin of error.
http
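A toy example of the API (assuming the com.tdunning:t-digest Java library; single-process here, but digests built per partition can be merged to get cluster-wide quantiles):

import com.tdunning.math.stats.TDigest

object TDigestSketch {
  def main(args: Array[String]): Unit = {
    // Compression trades accuracy for size: larger values keep more centroids.
    val digest = TDigest.createMergingDigest(100)
    (1 to 1000000).foreach(i => digest.add(i.toDouble))

    println(s"approx median: ${digest.quantile(0.5)}")
    println(s"approx p99:    ${digest.quantile(0.99)}")
  }
}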
Thanks for pointing to those JIRA tickets, I hadn't seen them. Encouraging
that they are recent. I hope we can find a solution there.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Why-are-DataFrames-always-read-with-nullable-True-tp21207p21218.html
S
If I create a dataframe in Spark with non-nullable columns, and then save
that to disk as a Parquet file, the columns are properly marked as
non-nullable. I confirmed this using parquet-tools. Then, when loading it
back, Spark forces the nullable back to True.
https://github.com/apache/spark/blob/
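A small repro sketch of what I'm describing (Scala, assuming a local SparkSession; the output path is just for illustration):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object NullableRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("nullable-repro").getOrCreate()

    val schema = StructType(Seq(StructField("id", IntegerType, nullable = false)))
    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row(1), Row(2))), schema)

    df.printSchema()  // id: integer (nullable = false)
    df.write.mode("overwrite").parquet("/tmp/nullable-repro")

    // Reading it back, Spark reports the column as nullable = true again.
    spark.read.parquet("/tmp/nullable-repro").printSchema()

    spark.stop()
  }
}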
Continuing to dig, I encountered:
https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala#L125
// TODO(davies): add tests for ArrayType, MapType and StructType
I guess others have thought of this already, jus
It seems that `functions.lit` doesn't support ArrayTypes. To reproduce:
org.apache.spark.sql.functions.lit(2 :: 1 :: Nil)
java.lang.RuntimeException: Unsupported literal type class
scala.collection.immutable.$colon$colon List(2, 1)
at
org.apache.spark.sql.catalyst.expressions.Literal$.apply(lit
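In the meantime, two workarounds that I believe work (array-of-lits on any version; typedLit only on newer releases, 2.2+ if I remember correctly):

import org.apache.spark.sql.functions.{array, lit, typedLit}

// Build the array column from individual scalar literals.
val fromLits = array(lit(2), lit(1))

// Newer versions add typedLit, which handles Seq/Map/case class literals directly.
val fromTypedLit = typedLit(Seq(2, 1))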
Compiling with `build/mvn -Pyarn -Phadoop-2.4 -Phive -Dhadoop.version=2.4.0
-DskipTests clean package` followed by `python/run-tests` seemed to do the
trick! Thanks!
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-run-PySpark-tests-tp16357p16362
Hi,
I'm trying to finish up a PR (https://github.com/apache/spark/pull/10089)
which is currently failing PySpark tests. The instructions to run the test
suite seem a little dated. I was able to find these:
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
http://spark.apache.org/
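For reference, the script supports selecting modules and interpreters; something along these lines (flag names come from python/run-tests --help, so double-check them on your checkout):

./python/run-tests --python-executables=python2.7 --modules=pyspark-sql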