Re: Get filename in Spark Streaming

2015-02-06 Thread Subacini B
3.nabble.com/access-hdfs-file-name-in-map-td6551.html -- Emre Sevinç. On Fri, Feb 6, 2015 at 2:16 AM, Subacini B wrote: Hi All, We have a filename with a timestamp, say ABC_1421893256000.txt, and the timestamp needs to be extracted
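
The linked thread covers reading the HDFS file name inside a map; below is a minimal sketch of that idea adapted to a Spark 1.1/1.2-era streaming job. It leans on developer-level internals (the batch RDD produced by fileStream is a UnionRDD of one NewHadoopRDD per new file), and the directory path and timestamp pattern are illustrative assumptions, not taken from the thread.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, TextInputFormat}
import org.apache.spark.SparkConf
import org.apache.spark.rdd.{NewHadoopRDD, RDD, UnionRDD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileNameInStreaming {
  // Matches names like ABC_1421893256000.txt and captures the embedded timestamp.
  private val TimestampPattern = """.*_(\d+)\.txt""".r

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("FileNameInStreaming"), Seconds(10))

    // fileStream (unlike textFileStream) keeps the Hadoop key/value types, so the
    // underlying NewHadoopRDDs and their input splits (file paths) stay reachable.
    val stream = ssc.fileStream[LongWritable, Text, TextInputFormat]("hdfs:///input/dir")

    val withNames = stream.transform { rdd =>
      // A batch covering several files is a UnionRDD of one NewHadoopRDD per file.
      val perFile: Seq[RDD[(LongWritable, Text)]] = rdd match {
        case u: UnionRDD[_] => u.rdds.asInstanceOf[Seq[RDD[(LongWritable, Text)]]]
        case single         => Seq(single)
      }
      val named = perFile.collect { case h: NewHadoopRDD[_, _] =>
        h.asInstanceOf[NewHadoopRDD[LongWritable, Text]]
          .mapPartitionsWithInputSplit { (split, iter) =>
            val fileName = split.asInstanceOf[FileSplit].getPath.getName
            val ts = fileName match {
              case TimestampPattern(t) => t.toLong
              case _                   => -1L
            }
            iter.map { case (_, line) => (fileName, ts, line.toString) }
          }
      }
      if (named.isEmpty) rdd.sparkContext.parallelize(Seq.empty[(String, Long, String)])
      else rdd.sparkContext.union(named)
    }

    withNames.print()
    ssc.start()
    ssc.awaitTermination()
  }
}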

Get filename in Spark Streaming

2015-02-05 Thread Subacini B
Hi All, We have a filename with a timestamp, say ABC_1421893256000.txt, and the timestamp needs to be extracted from the file name for further processing. Is there a way to get the input file name picked up by the Spark Streaming job? Thanks in advance, Subacini

Improve performance using spark streaming + sparksql

2015-01-24 Thread Subacini B
Hi All, I have a cluster of 3 nodes [each 8 cores/32 GB memory]. My program uses Spark Streaming with Spark SQL [Spark 1.1] and writes incoming JSON to Elasticsearch and HBase. Below is my code; I receive JSON files [input data varies from 30 MB to 300 MB] every 10 seconds. Irrespective of 3 nodes
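
The code referred to in the message is not included in this preview; the following is only a rough sketch of a Spark 1.1-style streaming plus Spark SQL pipeline of the kind described, with hypothetical paths and table names, and with the Elasticsearch/HBase writes left as a placeholder. One relevant performance note is that jsonRDD re-infers the schema on every batch.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object JsonStreamPipeline {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("JsonStreamPipeline"), Seconds(10))
    val sqlContext = new SQLContext(ssc.sparkContext)

    val lines = ssc.textFileStream("hdfs:///incoming/json")

    lines.foreachRDD { rdd =>
      if (rdd.take(1).nonEmpty) {
        // jsonRDD infers the schema from the data for each batch; supplying a
        // fixed schema is a common first optimisation for 30-300 MB batches
        // arriving every 10 seconds.
        val events = sqlContext.jsonRDD(rdd)
        events.registerTempTable("events")
        val selected = sqlContext.sql("SELECT * FROM events")
        // The Elasticsearch / HBase writes from the original program would go here.
        println(s"rows in this batch: ${selected.count()}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}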

Re: SchemaRDD to Hbase

2014-12-20 Thread Subacini B
Hi, can someone help me? Any pointers would help. Thanks, Subacini. On Fri, Dec 19, 2014 at 10:47 PM, Subacini B wrote: Hi All, Is there any API that can be used directly to write a SchemaRDD to HBase? If not, what is the best way to write a SchemaRDD to HBase? Thanks, Subacini

SchemaRDD to Hbase

2014-12-19 Thread Subacini B
Hi All, Is there any API that can be used directly to write a SchemaRDD to HBase? If not, what is the best way to write a SchemaRDD to HBase? Thanks, Subacini
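
There is no dedicated SchemaRDD-to-HBase API in this Spark/HBase generation; one common route is to map each Row to an HBase Put and write through TableOutputFormat with saveAsNewAPIHadoopDataset. The sketch below assumes a reachable HBase cluster with a table "my_table" and column family "cf"; the record shape and column names are illustrative.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record shape; replace with the real schema.
case class Record(id: String, colA: String, colB: String)

object SchemaRddToHBase {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SchemaRddToHBase"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[Record] -> SchemaRDD

    val records = sc.parallelize(Seq(Record("r1", "a1", "b1"), Record("r2", "a2", "b2")))
    records.registerTempTable("records")
    val schemaRdd = sqlContext.sql("SELECT id, colA, colB FROM records")

    // TableOutputFormat needs the target table name in the Hadoop configuration.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")
    val job = Job.getInstance(hbaseConf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

    // Each Row becomes a Put keyed by the first column, written into family "cf".
    val puts = schemaRdd.map { row =>
      val put = new Put(Bytes.toBytes(row.getString(0)))
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("colA"), Bytes.toBytes(row.getString(1)))
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("colB"), Bytes.toBytes(row.getString(2)))
      (new ImmutableBytesWritable, put)
    }
    puts.saveAsNewAPIHadoopDataset(job.getConfiguration)
  }
}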

Processing multiple request in cluster

2014-09-24 Thread Subacini B
Hi All, How do I run multiple requests concurrently on the same cluster? I have a program using a Spark Streaming context which reads streaming data and writes it to HBase. It works fine; the problem is that when multiple requests are submitted to the cluster, only the first request is processed, as the entire cluster
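
In standalone mode an application grabs all available cores by default, which is why a second submission just queues. A minimal sketch of the usual workaround follows: cap each application's cores and executor memory so the master keeps free capacity for concurrent submissions. The values and the input path are illustrative.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CappedStreamingApp {
  def main(args: Array[String]): Unit = {
    // Capping spark.cores.max leaves cores free on the standalone master, so a
    // second application submitted later can be scheduled instead of waiting.
    val conf = new SparkConf()
      .setAppName("CappedStreamingApp")
      .set("spark.cores.max", "8")        // illustrative: part of a larger cluster
      .set("spark.executor.memory", "4g") // leave memory for other applications too

    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.textFileStream("hdfs:///incoming/data")
    lines.print() // placeholder for the real streaming-to-HBase logic

    ssc.start()
    ssc.awaitTermination()
  }
}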

Re: Spark SQL - groupby

2014-07-03 Thread Subacini B
Hi, can someone provide me pointers for this issue? Thanks, Subacini. On Wed, Jul 2, 2014 at 3:34 PM, Subacini B wrote: Hi, The code below throws a compilation error, "not found: value Sum". Can someone help me with this? Do I need to add any jars or imports? Even for Count, the same error is thrown.

Shark Vs Spark SQL

2014-07-02 Thread Subacini B
Hi, http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3cb75376b8-7a57-4161-b604-f919886cf...@gmail.com%3E This says the Shark backend will be replaced with the Spark SQL engine in the future. Does that mean Spark will continue to support Shark + Spark SQL for the long term? Or, after some p

Spark SQL - groupby

2014-07-02 Thread Subacini B
Hi, The code below throws a compilation error, "not found: value Sum". Can someone help me with this? Do I need to add any jars or imports? Even for Count, the same error is thrown. val queryResult = sql("select * from Table") queryResult.groupBy('colA)('colA, Sum('colB) as 'totB).aggregate(Sum
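
For reference, the aggregate expressions used by the language-integrated DSL (Sum, Count, ...) live in org.apache.spark.sql.catalyst.expressions and are not in scope by default, while the plain-SQL form needs no extra import. A small sketch under that assumption, with illustrative table and column names (Spark 1.0-era API, hence registerAsTable):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.expressions.{Count, Sum} // brings Sum/Count into scope for the DSL
import org.apache.spark.{SparkConf, SparkContext}

case class Rec(colA: String, colB: Int)

object GroupBySum {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("GroupBySum"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._ // sql(), implicit RDD -> SchemaRDD and Symbol -> attribute conversions

    sc.parallelize(Seq(Rec("a", 1), Rec("a", 2), Rec("b", 5))).registerAsTable("myTable")

    // Plain-SQL form: no extra imports needed.
    sql("SELECT colA, SUM(colB) AS totB FROM myTable GROUP BY colA").collect().foreach(println)

    // DSL form of the original query, relying on the catalyst expressions import above.
    val queryResult = sql("SELECT * FROM myTable")
    queryResult.groupBy('colA)('colA, Sum('colB) as 'totB).collect().foreach(println)
  }
}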

Spark SQL : Join throws exception

2014-07-01 Thread Subacini B
Hi All, Running this join query sql("SELECT * FROM A_TABLE A JOIN B_TABLE B WHERE A.status=1").collect().foreach(println) throws Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1.0:3 failed 4 times, most recent failure: Exception failure in T
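
Whatever the underlying task failure was, note that the query as posted has no ON clause, so it amounts to a cartesian product filtered on A.status. A sketch with an explicit equi-join condition is below; the row types and the join key "id" are hypothetical.

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical row types standing in for A_TABLE / B_TABLE; "id" is an assumed join key.
case class ARow(id: Int, status: Int)
case class BRow(id: Int, name: String)

object JoinWithCondition {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JoinWithCondition"))
    val sqlContext = new SQLContext(sc)
    import sqlContext._ // sql() and implicit RDD -> SchemaRDD conversion

    sc.parallelize(Seq(ARow(1, 1), ARow(2, 0))).registerAsTable("A_TABLE")
    sc.parallelize(Seq(BRow(1, "x"), BRow(2, "y"))).registerAsTable("B_TABLE")

    // With an explicit equi-join condition the plan is a real join rather than a
    // filtered cartesian product.
    sql("SELECT * FROM A_TABLE A JOIN B_TABLE B ON A.id = B.id WHERE A.status = 1")
      .collect()
      .foreach(println)
  }
}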

Re: Spark Worker Core Allocation

2014-06-08 Thread Subacini B
se it will instead try to take all resources from a few nodes. On Jun 8, 2014 1:55 AM, "Subacini B" wrote: Hi All, My cluster has 5 workers, each having 4 cores (so 20 cores total). It is in standalone mode (not using Mesos or Yarn). I want two

Re: Spark Worker Core Allocation

2014-06-08 Thread Subacini B
Hi, I am stuck here; my cluster is not efficiently utilized. Appreciate any input on this. Thanks, Subacini. On Sat, Jun 7, 2014 at 10:54 PM, Subacini B wrote: Hi All, My cluster has 5 workers, each having 4 cores (so 20 cores total). It is in standalone mode (not using Mesos or Yarn)

Spark Worker Core Allocation

2014-06-07 Thread Subacini B
Hi All, My cluster has 5 workers, each having 4 cores (so 20 cores total). It is in standalone mode (not using Mesos or Yarn). I want two programs to run at the same time, so I have configured "spark.cores.max=3", but when I run the program it allocates three cores, taking one core from each worker, making
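
For reference, two settings govern this behaviour in the standalone scheduler: spark.cores.max caps the application's total cores, and spark.deploy.spreadOut (a master-side setting, default true) decides whether those cores are spread one per worker or packed onto as few workers as possible. A small sketch with illustrative values:

import org.apache.spark.{SparkConf, SparkContext}

object CoreAllocation {
  def main(args: Array[String]): Unit = {
    // Per-application cap: with spark.cores.max=3 on a 5 x 4-core standalone
    // cluster, the default spreadOut behaviour places one core on each of three
    // workers rather than three cores on one worker.
    val conf = new SparkConf()
      .setAppName("CoreAllocation")
      .set("spark.cores.max", "3")

    // Master-side setting (not effective when set here in the application; it is
    // typically placed in the master's spark-defaults.conf or SPARK_MASTER_OPTS):
    //   spark.deploy.spreadOut=false
    // makes the master pack the 3 cores onto as few workers as possible instead.

    val sc = new SparkContext(conf)
    println(s"defaultParallelism = ${sc.defaultParallelism}")
    sc.stop()
  }
}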