spark read from http endpoint?

2016-07-07 Thread Robert Towne
Do any of the Spark SQL 1.x or 2.0 APIs allow reading from a REST endpoint and consuming JSON or XML? For example, in the 2.0 context I’ve tried the following, with varying errors: val conf = new SparkConf().setAppName("http test").setMaster("local[2]") val builder = SparkSession.builder.c

Apache Arrow + Spark examples?

2016-02-23 Thread Robert Towne
I have been reading some of the news this week about Apache Arrow as a new top-level project. It appears to be a common data layer between Spark and other systems (Cassandra, Drill, Impala, etc.). Has anyone seen any sample Spark code that integrates with Arrow? Thanks

Re: Spark Streaming having trouble writing checkpoint

2015-12-14 Thread Robert Towne
I forgot to include the data node logs for this time period: 2015-12-14 00:14:52,836 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: server51:50010:DataXceiver error processing unknown operation src: /127.0.0.1:39442 dst: /127.0.0.1:50010 java.io.EOFException at java.io.DataInputStream.

Spark Streaming having trouble writing checkpoint

2015-12-14 Thread Robert Towne
I have a Spark Streaming app (1.5.2 compiled for Hadoop 2.6) that occasionally has problems writing its checkpoint file. This is a YARN (yarn cluster) app running as user mapred. What I see in my streaming app logs is: App log 15/12/15 00:14:08 server51: /mnt/hdfs/hdfs01/data4/yarn-lo

Re: Problems w/YARN Spark Streaming app reading from Kafka

2015-12-14 Thread Robert Towne
Cody, sorry I didn’t get back sooner; I never saw the response pass by. I was looking at the Spark UI. I’ll see if I can recreate the issue w/version 1.5.2. Thanks. From: Cody Koeninger <c...@koeninger.org> Date: Friday, October 16, 2015 at 12:48 To: robert towne mailto:rob

Dynamic Allocation & Spark Streaming

2015-10-19 Thread robert towne
I have watched a few videos from Databricks/Andrew Or around the Spark 1.2 release and it seemed that dynamic allocation was not yet available for Spark Streaming. I now see SPARK-10955 which is tied to 1.5.2 and allows disabling of Spark Streami

Problems w/YARN Spark Streaming app reading from Kafka

2015-10-16 Thread Robert Towne
I have a Spark Streaming app that reads using a receiver-less connection (KafkaUtils.createDirectStream) with an interval of 1 minute. For about 15 hours it was running fine, with input sizes ranging from 16,836 to 3,861,758 events. Then about 3 hours ago, every one-minute batch brought in the same numb

how to clear state in Spark Streaming based on emitting

2015-06-09 Thread Robert Towne
With Spark Streaming, I am maintaining state (updateStateByKey every 30s) and, every 5 minutes, emitting to file the parts of that state that have been closed, but I only care about the last state collected. In 5m, there will be 10 updateStateByKey iterations called. For example: … val ssc = new St