Hi,
I wonder when to enable Spark's off-heap settings. Shouldn't Tungsten enable
these automatically in 2.1?
http://stackoverflow.com/questions/43330902/spark-off-heap-memory-config-and-tungsten
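As far as I can tell the off-heap path is opt-in rather than automatic; for
reference, a minimal sketch of setting the two relevant options explicitly
(Spark 2.1 assumed; the 2g size is only a placeholder):

import org.apache.spark.sql.SparkSession

// Off-heap allocation has to be switched on and sized explicitly;
// Tungsten does not enable it by itself.
val spark = SparkSession.builder()
  .appName("offheap-sketch")
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "2g")
  .getOrCreate()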
Regards,
Georg
http://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-dataset
describes the problem. Actually, I have the same problem. Is there a simple
way to build such an Encoder which serializes into multiple fields? I would
not want to replicate the whole JTS geometry class hierarchy.
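One workaround I am aware of is a kryo-based encoder, which stores the object
in a single binary column rather than multiple fields; a minimal sketch,
assuming the com.vividsolutions JTS package name:

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import com.vividsolutions.jts.geom.Geometry // assumed JTS dependency

val spark = SparkSession.builder().appName("jts-encoder-sketch").getOrCreate()

// Kryo serializes the whole geometry into one binary column, so the JTS
// class hierarchy does not have to be re-modelled as case classes.
implicit val geometryEncoder: Encoder[Geometry] = Encoders.kryo[Geometry]

// With the implicit in scope:
// val ds = spark.createDataset(Seq(someGeometry))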
Starting out with GraphFrames, I would like to understand stateful motifs
better. There is a nice example in the documentation.
How can I explicitly return the counts, and how could it be extended to count
- the friends of each vertex with age > 30
- the percentage friendsGreater30 / allFriends?
A sketch of what I mean follows below.
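A minimal sketch, assuming a GraphFrame g whose vertices carry an age column
(column names are placeholders, not taken from the documentation example):

import org.apache.spark.sql.functions._
import org.graphframes.GraphFrame

val g: GraphFrame = ??? // assumed graph with vertex columns "id" and "age"

// One row per (vertex, friend) pair.
val friends = g.find("(a)-[e]->(b)")

val stats = friends
  .groupBy("a.id")
  .agg(
    count("*").as("allFriends"),
    count(when(col("b.age") > 30, true)).as("friendsGreater30"))
  .withColumn("pctGreater30", col("friendsGreater30") / col("allFriends"))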
I optimized a Spark SQL script but have come to the conclusion that the SQL
API is not ideal here, as the generated tasks are slow and require too much
shuffling. So the script should be converted to the RDD API:
http://stackoverflow.com/q/41445571/2587904
How can I formulate this more efficiently?
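To make the intent concrete, a rough sketch of the kind of RDD rewrite I mean
(the table name, key/value columns and the sum are placeholders, not the
actual script):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rdd-rewrite-sketch").getOrCreate()
val df = spark.table("some_table") // placeholder for the SQL intermediate result

// Dropping to the RDD API: map to (key, value) pairs and aggregate with a
// single shuffle; reduceByKey combines map-side before shuffling.
val byKey = df.rdd
  .map(row => (row.getString(0), row.getDouble(1)))
  .reduceByKey(_ + _)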
I wrote it up here
http://stackoverflow.com/questions/41298550/spark-threadpoolexecutor-very-often-called-in-tasks
as well as in a minimal example at
https://github.com/geoHeil/sparkContrastCoding
Looking forward to any input to speed up this Spark job.
Cheers,
Georg
To force Spark to use Kryo serialization, I set
spark.kryo.registrationRequired to true.
Now Spark complains: "Class is not registered:
org.apache.spark.sql.types.DataType[]".
How can I fix this? So far I could not successfully register this class.
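For reference, a sketch of the registration I would expect to work, using
classOf[Array[DataType]] to name the array class (the extra classes in the
list are an assumption, not an exhaustive set):

import org.apache.spark.SparkConf
import org.apache.spark.sql.types.{DataType, StructField, StructType}

// classOf[Array[DataType]] is the JVM class that Kryo reports as
// org.apache.spark.sql.types.DataType[]; related schema types are
// registered as well since they tend to show up next.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  .registerKryoClasses(Array(
    classOf[Array[DataType]],
    classOf[Array[StructField]],
    classOf[StructType],
    classOf[StructField]))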
Hi
I am curious how to dynamically generate Spark SQL in the Scala API.
http://stackoverflow.com/q/41102347/2587904
From this list val columnsFactor = Seq("bar", "baz")
I want to generate multiple withColumn statements, i.e.
dfWithNewLabels.withColumn("replace", lit(null: String))
  .withColumn(...)
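In other words, something like the following fold over the column list
(the generated column names are placeholders):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

val columnsFactor = Seq("bar", "baz")

// Fold over the column list so that one withColumn call is generated
// per entry instead of writing each statement by hand.
def addReplacementColumns(df: DataFrame): DataFrame =
  columnsFactor.foldLeft(df) { (acc, c) =>
    acc.withColumn(s"${c}_replace", lit(null: String))
  }

// usage: val result = addReplacementColumns(dfWithNewLabels)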
Hi,
I am writing my first own Spark pipeline components with persistence and
have trouble debugging them.
https://github.com/geoHeil/sparkCustomEstimatorPersistenceProblem holds a
minimal example where
`sbt run` and `sbt test` result in "different" errors.
When I tried to debug it in
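For comparison, a minimal sketch of the persistence pattern I am trying to
follow (not the actual code from the repository; note that
DefaultParamsWritable/DefaultParamsReadable are package-private in some 2.x
releases, in which case the class may need to live under
org.apache.spark.ml):

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

// A do-nothing transformer that can be saved and loaded via the
// default-params persistence mechanism.
class NoopTransformer(override val uid: String)
    extends Transformer with DefaultParamsWritable {
  def this() = this(Identifiable.randomUID("noop"))
  override def transform(ds: Dataset[_]): DataFrame = ds.toDF()
  override def transformSchema(schema: StructType): StructType = schema
  override def copy(extra: ParamMap): NoopTransformer = defaultCopy(extra)
}

// The companion object supplies load() for reading the saved transformer back.
object NoopTransformer extends DefaultParamsReadable[NoopTransformer]

// usage: new NoopTransformer().write.overwrite().save("/tmp/noop")
//        val loaded = NoopTransformer.load("/tmp/noop")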
I am facing a strange issue when trying to correct some errors in my raw data.
The problem is reported here:
https://issues.apache.org/jira/browse/SPARK-18532
How can I fill NaN values with the last (good) value?
For me, it would be enough to fill them with the previous value via a window
function. So far I could not get it to work, as my window function only
returns NaN values.
Here is code for a minimal example:
http://stackoverflow.com/questions/40592207
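For reference, a sketch of the window-based last-value fill I am aiming for
(the columns group, time and value are placeholders, and df is the assumed
input DataFrame):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Running window from the start of each partition up to the current row.
val w = Window.partitionBy("group").orderBy("time")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

// last(..., ignoreNulls = true) carries the most recent non-null value
// forward. NaN values would first need to be turned into nulls, e.g. via
// when(isnan(col("value")), lit(null)).otherwise(col("value")).
val filled = df.withColumn("value_filled",
  last(col("value"), ignoreNulls = true).over(w))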