Re: Event generation

2015-05-10 Thread Akhil Das
Have a look over here https://storm.apache.org/community.html Thanks Best Regards On Sun, May 10, 2015 at 3:21 PM, anshu shukla wrote: > > http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps > > -- > Thanks & Regards, > Anshu Shukla >

Re: Is it possible to set the akka specific properties (akka.extensions) in spark

2015-05-10 Thread Akhil Das
Try SparkConf.set("spark.akka.extensions", "Whatever"); underneath, I think Spark won't ship properties that don't start with spark.* to the executors. Thanks Best Regards On Mon, May 11, 2015 at 8:33 AM, Terry Hole wrote: > Hi all, > > I'd like to monitor the akka using kamon, which need to set
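A minimal sketch of that suggestion, assuming the extension list can be passed as a single string value; note that "spark.akka.extensions" is just the spark.*-prefixed key suggested above, not a documented Spark setting:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: "spark.akka.extensions" is the spark.*-prefixed key suggested
    // above, not a documented Spark property; the point is simply that keys
    // starting with "spark." are shipped to executors while other keys are not.
    val conf = new SparkConf()
      .setAppName("akka-extensions-example")
      .set("spark.akka.extensions", "kamon.system.SystemMetrics,kamon.statsd.StatsD")

    val sc = new SparkContext(conf)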

Re: guardian failed, shutting down system

2015-05-10 Thread Akhil Das
What kind of application are you running? Can you look in the worker logs and see what is going on? Thanks Best Regards On Mon, May 11, 2015 at 11:26 AM, 董帅阳 <917361...@qq.com> wrote: > *when i run spark application longtime.* > > > > 15/05/11 12:50:47 INFO spark.SparkContext: Created broadcast

Re: Spark cannot access jar from HDFS !!

2015-05-10 Thread Ravindra
Hi All, thanks for the suggestions. What I tried is hiveContext.sql("add jar ") and that helps to complete the "create temporary function", but while using this function I get ClassNotFound for the class handling this function. The same class is present in the jar that was added. Please note that the s
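For reference, a minimal sketch of the sequence being described; the jar path, function name, and class name are placeholders:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext()          // assumes an existing cluster configuration
    val hiveContext = new HiveContext(sc)

    // Ship the jar containing the UDF implementation, then register the function
    // against a class inside that jar.
    hiveContext.sql("ADD JAR hdfs:///path/to/my-udfs.jar")                        // placeholder path
    hiveContext.sql("CREATE TEMPORARY FUNCTION my_func AS 'com.example.MyUDF'")   // placeholder class

    // If the class really is in the jar, a ClassNotFoundException at call time
    // usually means the jar never reached the executors' classpath.
    hiveContext.sql("SELECT my_func(col) FROM some_table").show()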

Python -> SQL (geonames dataset)

2015-05-10 Thread Tyler Mitchell
I'm using Python to set up a DataFrame, but for some reason it is not being made available to SQL. Code (from Zeppelin) below. I don't get any error when loading/prepping the data or the DataFrame. Any tips? (Originally I was not hardcoding the Row() structure, as my other tutorial added it by d

guardian failed, shutting down system

2015-05-10 Thread 董帅阳
When I run a Spark application for a long time: 15/05/11 12:50:47 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:839 15/05/11 12:50:47 INFO storage.BlockManagerInfo: Removed input-0-1431345102800 on dn2.vi:45688 in memory (size: 27.6 KB, free: 524.2 MB) 15/05/11 1

Spark on top of YARN Compression in iPython notebook

2015-05-10 Thread Bin Wang
I started an AWS cluster (1 master + 3 core) and downloaded the prebuilt Spark binary. I downloaded the latest Anaconda Python and started an iPython notebook server by running the command below: ipython notebook --port --profile nbserver --no-browser Then, I try to develop a Spark application

Re: Can NullWritable not be used in Spark?

2015-05-10 Thread donhoff_h
Hi, I am using Hadoop 2.5.2. My code is listed below. Besides, I also made some further tests and found the following interesting result: 1. I hit those exceptions when I set the Key Class to NullWritable, LongWritable, or IntWritable and used the PairRDDFunctions.saveAsNewAPIHad

Is it possible to set the akka specific properties (akka.extensions) in spark

2015-05-10 Thread Terry Hole
Hi all, I'd like to monitor Akka using Kamon, which needs the akka.extensions setting to be a list like this in Typesafe config format: akka { extensions = ["kamon.system.SystemMetrics", "kamon.statsd.StatsD"] } But I cannot find a way to do this. I have tried these: 1. SparkConf.set("akka

Re: Nullable is true for the schema of parquet data

2015-05-10 Thread dsgriffin
Ran into this same issue. The only solution seems to be to coerce the DataFrame's schema back into the right state. It looks like you have to convert the DF to an RDD, which has an overhead. But otherwise this worked for me: val newDF = sqlContext.createDataFrame(origDF.rdd, new StructType(origDF.schema.
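The snippet above is cut off; a sketch of one way to finish the idea, assuming the goal is to force every field back to nullable = false:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.{StructField, StructType}

    // Rebuild the schema with nullable = false on every field, then re-create the
    // DataFrame from the underlying RDD (which is where the overhead comes from).
    def withNonNullableSchema(origDF: DataFrame): DataFrame = {
      val newSchema = StructType(origDF.schema.map {
        case StructField(name, dataType, _, metadata) =>
          StructField(name, dataType, nullable = false, metadata)
      })
      origDF.sqlContext.createDataFrame(origDF.rdd, newSchema)
    }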

Re: Spark Cassandra connector number of Tasks

2015-05-10 Thread vijaypawnarkar
Looking for help with this. Thank you! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-connector-number-of-Tasks-tp22820p22839.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: [SparkSQL] cannot filter by a DateType column

2015-05-10 Thread Haopu Wang
Sorry, I was using Spark 1.3.x. I cannot reproduce it in master. But should I still open a JIRA so that I can request it be back-ported to the 1.3.x branch? Thanks again! From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Saturday, May 09, 201

Re: Multiple DataFrames per Parquet file?

2015-05-10 Thread ayan guha
Hi, in that case read the entire folder as an RDD and give it some reasonable number of partitions. Best Ayan On 11 May 2015 01:35, "Peter Aberline" wrote: > Hi > > Thanks for the quick response. > > No I'm not using Streaming. Each DataFrame represents tabular data read > from a CSV file. They have the
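A minimal sketch of that suggestion, assuming the CSV files share a schema and sit under one directory; the path and partition count are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("folder-as-rdd"))

    // Read every CSV under the directory as one RDD and control the parallelism
    // with an explicit minimum number of partitions.
    val lines = sc.textFile("hdfs:///data/csv-folder/*.csv", minPartitions = 64)

    // Parse once, then the whole dataset can be treated as a single logical table.
    val rows = lines.map(_.split(","))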

Re: Multiple DataFrames per Parquet file?

2015-05-10 Thread Peter Aberline
Hi, thanks for the quick response. No, I'm not using Streaming. Each DataFrame represents tabular data read from a CSV file. They have the same schema. There is also the option of appending each DF to the Parquet file, but then I can't maintain them as separate DFs when reading back in without filt
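For what it's worth, a sketch of the append option mentioned here, written against the Spark 1.3-style save API and with an added tag column (a made-up name) so the individual DataFrames can still be separated when reading back:

    import org.apache.spark.sql.{DataFrame, SaveMode}
    import org.apache.spark.sql.functions.lit

    // Append each DataFrame into the same Parquet directory, tagging rows with a
    // source id so they can be filtered back out later.
    def appendWithTag(df: DataFrame, sourceId: String, path: String): Unit = {
      df.withColumn("source_id", lit(sourceId))
        .save(path, "parquet", SaveMode.Append)
    }

    // Reading one logical DataFrame back (assuming sqlContext and implicits in scope):
    // val one = sqlContext.parquetFile("/data/all.parquet").filter($"source_id" === "df-0001")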

Re: Multiple DataFrames per Parquet file?

2015-05-10 Thread ayan guha
How did you end up with thousands of DataFrames? Are you using streaming? In that case you can do foreachRDD and keep merging the incoming RDDs into a single RDD, and then save it through your own checkpoint mechanism. If not, please share your use case. On 11 May 2015 00:38, "Peter Aberline" wrote: > Hi > > I

Re: Does NullWritable can not be used in Spark?

2015-05-10 Thread Ted Yu
Looking at ./core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala : * Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and * BytesWritable values that contain a serialized partition. This is still an experimental storage ... def objec
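The objectFile path Ted points at avoids handling the Writable classes in user code at all; a minimal sketch of that round trip, with placeholder paths:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("objectfile-roundtrip"))

    val data = sc.parallelize(Seq("a" -> 1, "b" -> 2, "c" -> 3))

    // saveAsObjectFile serializes each partition into a SequenceFile with
    // NullWritable keys and BytesWritable values under the hood.
    data.saveAsObjectFile("hdfs:///tmp/pairs-object")

    // objectFile reads it back as a typed RDD; no Writable handling in user code.
    val restored = sc.objectFile[(String, Int)]("hdfs:///tmp/pairs-object")
    restored.collect().foreach(println)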

Multiple DataFrames per Parquet file?

2015-05-10 Thread Peter Aberline
Hi, I have many thousands of small DataFrames that I would like to save to a single Parquet file to avoid the HDFS 'small files' problem. My understanding is that there is a 1:1 relationship between DataFrames and Parquet files if a single partition is used. Is it possible to have multiple DataFram

RE: Spark streaming closes with Cassandra Conector

2015-05-10 Thread Evo Eftimov
And in case you are running in local mode, try giving more cores to Spark with e.g. local[5] – a low number could be interfering with the tuning params, which you can try to play with as well – all this is in the context of how those params interact with the Connection Pool and what that pool is doing i
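A quick sketch of what giving more cores to Spark in local mode looks like, assuming a streaming job like the one in this thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // local[5] asks for five worker threads; with receivers in the picture, a
    // too-low number can starve the processing side of the job.
    val conf = new SparkConf()
      .setMaster("local[5]")
      .setAppName("cassandra-streaming-test")

    val ssc = new StreamingContext(conf, Seconds(5))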

Re: Loading file content based on offsets into the memory

2015-05-10 Thread in4maniac
As far as I know, that is not possible. If the file is too big to load onto one node, what I would do is use an RDD.map() function instead to load the file into distributed memory and then filter the lines that are relevant to me. I am not sure how to read just part of a single file. Sorry I'm unab
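A sketch of the load-then-filter approach described here, using line numbers as the offsets; the file path and range are arbitrary:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("offset-filter"))

    // Load the whole file into distributed memory, attach a line index, and keep
    // only the slice of interest; Spark still scans the full file to do this.
    val lines = sc.textFile("hdfs:///data/big-file.txt")

    val slice = lines.zipWithIndex()
      .filter { case (_, idx) => idx >= 100000L && idx < 200000L }
      .map { case (line, _) => line }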

RE: Spark streaming closes with Cassandra Conector

2015-05-10 Thread Evo Eftimov
And one other suggestion in relation to the connection pool line of enquiry: check whether your Cassandra service is configured to allow only one session per user, for example. I think the error is generated inside the connection pool when it tries to initialize a connection after the first one. Sent

RE: Spark streaming closes with Cassandra Conector

2015-05-10 Thread Evo Eftimov
Hmm, there is also a Connection Pool involved, and such things (especially while still rough around the edges) may behave erratically in a distributed multithreaded environment. Can you try foreachPartition and foreach together – this will create a slightly different multithreading execution and
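A sketch of the foreachPartition-plus-foreach shape being suggested; the write call is a hypothetical placeholder, not a connector API:

    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical helper standing in for whatever performs the Cassandra insert;
    // the point is only the threading/connection shape, not the API.
    def writeRow(row: (String, Int)): Unit = ???

    def save(stream: DStream[(String, Int)]): Unit = {
      stream.foreachRDD { rdd =>
        // One pass per partition: connections/pools are exercised per partition
        // rather than per record, which changes the multithreading behaviour.
        rdd.foreachPartition { partition =>
          partition.foreach(writeRow)
        }
      }
    }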

Re: Spark streaming closes with Cassandra Conector

2015-05-10 Thread Gerard Maas
I'm familiar with the TableWriter code and that log only appears if the write actually succeeded. (See https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/writer/TableWriter.scala ) Thinking infrastructure, we see

RE: Spark streaming closes with Cassandra Conector

2015-05-10 Thread Evo Eftimov
I think the message that it has written 2 rows is misleading. If you look further down you will see that it could not initialize a connection pool for Cassandra (presumably while trying to write the previously mentioned 2 rows). Another confirmation of this hypothesis is the phrase “error d

Re: Spark streaming closes with Cassandra Conector

2015-05-10 Thread Gerard Maas
It successfully writes some data and fails afterwards, like the host or connection goes down. Weird. Maybe you should post this question on the Spark-Cassandra connector group: https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user -kr, Gerard. On Sun, May 10, 2015 a

Event generation

2015-05-10 Thread anshu shukla
http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps -- Thanks & Regards, Anshu Shukla

RE: spark streaming and computation

2015-05-10 Thread Evo Eftimov
You can implement a custom partitioner -Original Message- From: skippi [mailto:skip...@gmx.de] Sent: Sunday, May 10, 2015 10:19 AM To: user@spark.apache.org Subject: spark streaming and computation Assuming a web server access log shall be analyzed and target of computation shall be csv
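A minimal sketch of a custom partitioner for this use case, keying on (day, hour) so all records for one hour land in the same partition; the key type is made up for illustration:

    import org.apache.spark.Partitioner

    // Routes records keyed by (day, hour) so every record for a given hour ends
    // up in the same partition, which keeps the per-hour CSV output together.
    class HourPartitioner(val numPartitions: Int) extends Partitioner {
      override def getPartition(key: Any): Int = key match {
        case (day: String, hour: Int) =>
          math.abs((day.hashCode * 31 + hour) % numPartitions)
        case _ => 0
      }

      override def equals(other: Any): Boolean = other match {
        case p: HourPartitioner => p.numPartitions == numPartitions
        case _ => false
      }
    }

    // Usage on a keyed DStream (implicits for pair RDD functions in scope):
    // keyedStream.transform(_.partitionBy(new HourPartitioner(24)))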

spark streaming and computation

2015-05-10 Thread skippi
Assume a web server access log is to be analyzed and the target of the computation is CSV files per time period, e.g. one per day containing the per-minute statistics and one per month containing the per-hour statistics. Incoming statistics are computed as discretized streams using the Spark streaming context. Basic
