Fwd: Saving RDD as Kryo (broken in 2.1)

2017-06-26 Thread Alexander Krasheninnikov
… at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- *Alexander Krasheninnikov* Head of Data Team

Saving RDD as Kryo (broken in 2.1)

2017-06-21 Thread Alexander Krasheninnikov
… at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- *Alexander Krasheninnikov* Head of Data Team

Re: JavaDStream to Dataframe: Java

2016-06-10 Thread Alexander Krasheninnikov
Hello! While operating on a JavaDStream you may use the transform() or foreach() methods, which give you access to the underlying RDD. JavaDStream<Row> dataFrameStream = ctx.textFileStream("source").transform(new Function2<JavaRDD<String>, Time, JavaRDD<Row>>() { @Override public JavaRDD<Row> call(JavaRDD<String> incomingRdd, Time batchTim…
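
A minimal sketch of the approach described above, assuming Spark 1.x Java APIs; the Row output type and the line-parsing logic are illustrative guesses, since the original code is truncated:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.Time;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class DStreamToRows {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("DStreamToRows");
            JavaStreamingContext ctx = new JavaStreamingContext(conf, Durations.seconds(10));

            // transform() exposes the RDD behind each batch, so it can be
            // mapped to Rows (and turned into a DataFrame downstream).
            JavaDStream<Row> dataFrameStream = ctx.textFileStream("source").transform(
                new Function2<JavaRDD<String>, Time, JavaRDD<Row>>() {
                    @Override
                    public JavaRDD<Row> call(JavaRDD<String> incomingRdd, Time batchTime) {
                        return incomingRdd.map(new Function<String, Row>() {
                            @Override
                            public Row call(String line) {
                                // Real parsing depends on the input format.
                                return RowFactory.create(line);
                            }
                        });
                    }
                });

            dataFrameStream.print();
            ctx.start();
            ctx.awaitTermination();
        }
    }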

Re: Profiling a spark job

2016-04-11 Thread Alexander Krasheninnikov
If you are profiling in standalone mode, I recommend trying Java Mission Control. You just need to start the app with these params: -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=$YOUR_PORT -Dcom.sun.management.jmxremo…
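
For reference, a hedged sketch of wiring those JVM flags into the executors from code rather than the command line; the port value and the authenticate/ssl settings are placeholders, not quoted from the thread (the message is truncated):

    import org.apache.spark.SparkConf;

    public class JmcProfilingConf {
        public static void main(String[] args) {
            // Flight Recorder / JMX options as in the post; the last two
            // properties are common-practice guesses, not from the thread.
            String jvmOpts = "-XX:+UnlockCommercialFeatures -XX:+FlightRecorder"
                + " -Dcom.sun.management.jmxremote=true"
                + " -Dcom.sun.management.jmxremote.port=9999"
                + " -Dcom.sun.management.jmxremote.authenticate=false"
                + " -Dcom.sun.management.jmxremote.ssl=false";

            // The same string can be passed via spark-submit --conf instead.
            SparkConf conf = new SparkConf()
                .setAppName("profiled-app")
                .set("spark.executor.extraJavaOptions", jvmOpts);

            System.out.println(conf.get("spark.executor.extraJavaOptions"));
        }
    }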

Re: problem with a very simple word count program

2015-09-16 Thread Alexander Krasheninnikov
Collect all your RDDs from the single files into a List<JavaRDD<String>>, then call context.union(context.emptyRdd(), YOUR_LIST); otherwise, with a greater number of elements to union, you will get a stack overflow exception. On Wed, Sep 16, 2015 at 10:17 PM, Shawn Carroll wrote: > Your loop is deciding the files to proces…
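
A sketch of the suggested multi-way union, assuming the Spark 1.x JavaSparkContext.union(first, rest) overload; taking file paths from args is illustrative:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class UnionManyFiles {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setMaster("local[4]").setAppName("UnionManyFiles"));

            // Collect per-file RDDs into a list instead of chaining
            // rdd = rdd.union(next) in a loop, which builds a deep lineage
            // and can overflow the stack.
            List<JavaRDD<String>> perFile = new ArrayList<>();
            for (String path : args) {
                perFile.add(sc.textFile(path));
            }

            // One multi-way union over the whole list.
            JavaRDD<String> all = sc.union(sc.<String>emptyRDD(), perFile);
            System.out.println(all.count());
            sc.close();
        }
    }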

Terminate streaming app on cluster restart

2015-08-06 Thread Alexander Krasheninnikov
Hello, everyone! I have a case when running a standalone cluster: when stop-all.sh/start-all.sh are invoked on the master, the streaming app loses all its executors but does not terminate. Since it is a streaming app, expected to deliver its results ASAP, downtime is undesirable. Is there any workaround to…

Re: How to set log level in spark-submit ?

2015-07-30 Thread Alexander Krasheninnikov
I saw such an example in the docs: --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file://$path_to_file" but, unfortunately, it does not work for me. On 30.07.2015 05:12, canan chen wrote: Yes, that should work. What I mean is: is there any option in the spark-submit command that I can specify…
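
A plainly different workaround (not the one discussed in the thread) is to set the level programmatically once the context exists; JavaSparkContext.setLogLevel is available from Spark 1.4:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class LogLevelFromCode {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setMaster("local[2]").setAppName("log-level-demo"));
            // Overrides the effective log level without touching log4j files
            // or extraJavaOptions (driver side; executors keep their own config).
            sc.setLogLevel("WARN");
            sc.close();
        }
    }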

Re: Count of distinct values in each column

2015-07-29 Thread Alexander Krasheninnikov
I made a naive implementation like this: SparkConf conf = new SparkConf(); conf.setMaster("local[4]").setAppName("Stub"); final JavaSparkContext ctx = new JavaSparkContext(conf); JavaRDD<String> input = ctx.textFile("path_to_file"); // explode each line into a list of column values JavaRDD<List<String>> rowValues = input.map(new…
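
A complete version of this naive approach, reconstructed under assumptions: Spark 1.x Java API (where PairFlatMapFunction returns an Iterable; 2.x returns an Iterator), comma-separated input, and a distinct-then-countByKey finish in place of the truncated map-based continuation:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.PairFlatMapFunction;

    import scala.Tuple2;

    public class DistinctPerColumn {
        public static void main(String[] args) {
            JavaSparkContext ctx = new JavaSparkContext(
                new SparkConf().setMaster("local[4]").setAppName("Stub"));
            JavaRDD<String> input = ctx.textFile("path_to_file");

            // Explode each line into (columnIndex, value) pairs.
            JavaPairRDD<Integer, String> pairs = input.flatMapToPair(
                new PairFlatMapFunction<String, Integer, String>() {
                    @Override
                    public Iterable<Tuple2<Integer, String>> call(String line) {
                        List<Tuple2<Integer, String>> out = new ArrayList<>();
                        String[] cols = line.split(",");
                        for (int i = 0; i < cols.length; i++) {
                            out.add(new Tuple2<>(i, cols[i]));
                        }
                        return out;
                    }
                });

            // Distinct (column, value) pairs, then count per column index.
            Map<Integer, Long> distinctPerColumn = pairs.distinct().countByKey();
            System.out.println(distinctPerColumn);
            ctx.close();
        }
    }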

Re: Override Logging with spark-streaming

2015-06-05 Thread Alexander Krasheninnikov
Have you tried putting this file on the local disk of each executor node? That worked for me. On 05.06.2015 16:56, nib...@free.fr wrote: Hello, I want to override the log4j configuration when I start my Spark job. I tried: .../bin/spark-submit --class --conf "spark.executor.extraJavaOptio…

Nested reduceByKeyAndWindow losing data

2015-06-05 Thread Alexander Krasheninnikov
Hello, everyone! I've experienced a problem when using nested reduceByKeyAndWindow. My task is to parse JSON-formatted events from textFileStream and create aggregations for each field. E.g., having such input: {"type":"EventOne", "attr1":10,"attr2":20} I have projections: {"type":"EventOne",…
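
A sketch of the nested pattern being described, assuming per-key Long counts and two cascaded windows; the JSON parsing and per-field projections are omitted since that part of the message is truncated:

    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    public class NestedWindows {
        // Reusable sum reducer for both window stages.
        static final Function2<Long, Long, Long> SUM = new Function2<Long, Long, Long>() {
            @Override
            public Long call(Long a, Long b) {
                return a + b;
            }
        };

        // First stage: per-key counts over a short window; second stage
        // windows the already-windowed stream again (the "nested" pattern).
        static JavaPairDStream<String, Long> nest(JavaPairDStream<String, Long> counts) {
            JavaPairDStream<String, Long> perTenSeconds = counts.reduceByKeyAndWindow(
                SUM, Durations.seconds(10), Durations.seconds(10));
            return perTenSeconds.reduceByKeyAndWindow(
                SUM, Durations.minutes(1), Durations.minutes(1));
        }
    }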

Re: kafka + Spark Streaming with checkPointing fails to start with

2015-05-15 Thread Alexander Krasheninnikov
I had the same problem. The solution I found was to use: JavaStreamingContext streamingContext = JavaStreamingContext.getOrCreate("checkpoint_dir", contextFactory); ALL configuration should be performed inside the contextFactory. If you try to configure the streaming context after ::getOrCreate, you receiv…
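
A minimal sketch of that setup, against the Spark 1.x Java API (JavaStreamingContextFactory was removed in 2.0 in favor of a Function0):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

    public class CheckpointedContext {
        public static void main(String[] args) {
            // ALL setup (conf, DStream graph, checkpoint dir) lives in the
            // factory; on restart the graph is restored from the checkpoint
            // and the factory is not called, so configuration performed after
            // getOrCreate is ignored.
            JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
                @Override
                public JavaStreamingContext create() {
                    SparkConf conf = new SparkConf().setAppName("checkpointed-app");
                    JavaStreamingContext jssc =
                        new JavaStreamingContext(conf, Durations.seconds(10));
                    // ... build the whole DStream graph here ...
                    jssc.checkpoint("checkpoint_dir");
                    return jssc;
                }
            };

            JavaStreamingContext streamingContext =
                JavaStreamingContext.getOrCreate("checkpoint_dir", contextFactory);
            streamingContext.start();
            streamingContext.awaitTermination();
        }
    }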

Streaming app with windowing and persistence

2015-04-27 Thread Alexander Krasheninnikov
Hello, everyone. I am developing a streaming application that works with window functions: each window creates a table and performs some SQL operations on the extracted data. I ran into the following problem: when using window operations together with checkpointing, the application does not start the next time. Here is the code: ---
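
A hedged sketch of the window-plus-SQL pattern described above, assuming Spark 1.x APIs (SQLContext, DataFrame, registerTempTable); the schema, table name and query are illustrative, since the original code is not included:

    import java.util.Arrays;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;

    public class WindowedSql {
        // For each window, build a DataFrame, register it as a table, query it.
        static void attach(JavaDStream<String> lines) {
            JavaDStream<String> windowed =
                lines.window(Durations.minutes(1), Durations.minutes(1));
            windowed.foreachRDD(new Function<JavaRDD<String>, Void>() {
                @Override
                public Void call(JavaRDD<String> rdd) {
                    SQLContext sqlContext = new SQLContext(rdd.context());
                    StructType schema = DataTypes.createStructType(Arrays.asList(
                        DataTypes.createStructField("line", DataTypes.StringType, false)));
                    JavaRDD<Row> rows = rdd.map(new Function<String, Row>() {
                        @Override
                        public Row call(String s) {
                            return RowFactory.create(s);
                        }
                    });
                    DataFrame df = sqlContext.createDataFrame(rows, schema);
                    df.registerTempTable("window_events");
                    sqlContext.sql("SELECT COUNT(*) FROM window_events").show();
                    return null;
                }
            });
        }
    }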