Re: Question about Serialization in Storage Level

2015-05-21 Thread Todd Nist
From the docs, https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence: Storage Level MEMORY_ONLY: Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're n

spark.executor.extraClassPath - Values not picked up by executors

2015-05-22 Thread Todd Nist
I'm using the spark-cassandra-connector from DataStax in a spark streaming job launched from my own driver. It is connecting to a standalone cluster on my local box which has two workers running. This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I have added the following entry to

Re: spark.executor.extraClassPath - Values not picked up by executors

2015-05-23 Thread Todd Nist
I'm just starting out with Cassandra and discovered > https://datastax-oss.atlassian.net/browse/SPARKC-98 is still open... > > On Fri, May 22, 2015 at 6:15 PM, Todd Nist wrote: > >> I'm using the spark-cassandra-connector from DataStax in a spark >> streaming job laun

Re: Spark SQL and Streaming Results

2015-06-05 Thread Todd Nist
There used to be a project, StreamSQL ( https://github.com/thunderain-project/StreamSQL), but it appears a bit dated and I do not see it in the Spark repo, though I may have missed it. @TD Is this project still active? I'm not sure what the status is but it may provide some insights on how to achieve w

Re: How to pass arguments dynamically, that needs to be used in executors

2015-06-11 Thread Todd Nist
Hi Gaurav, Seems like you could use a broadcast variable for this if I understand your use case. Create it in the driver based on the CommandLineArguments and then use it in the workers. https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables So something like: Broadcas
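
A minimal sketch of the broadcast-variable approach described above (the case class, argument values, and app name are illustrative, not from the original thread):

  import org.apache.spark.{SparkConf, SparkContext}

  case class CommandLineArgs(threshold: Int, label: String)

  val sc = new SparkContext(new SparkConf().setAppName("broadcast-example"))
  val args = Array("10", "above-threshold")          // stand-in for real command-line arguments
  val cliArgs = CommandLineArgs(args(0).toInt, args(1))

  // Created once on the driver; executors only ever read broadcastArgs.value
  val broadcastArgs = sc.broadcast(cliArgs)

  val matches = sc.parallelize(1 to 100)
    .filter(_ > broadcastArgs.value.threshold)       // read on the executors
    .count()
  println(s"${broadcastArgs.value.label}: $matches")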

Re: Spark 1.4 release date

2015-06-12 Thread Todd Nist
It was released yesterday. On Friday, June 12, 2015, ayan guha wrote: > Hi > > When is official spark 1.4 release date? > Best > Ayan >

Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows

2015-06-15 Thread Todd Nist
Hi Proust, Is it possible to see the query you are running and can you run EXPLAIN EXTENDED to show the physical plan for the query. To generate the plan you can do something like this from $SPARK_HOME/bin/beeline: 0: jdbc:hive2://localhost:10001> explain extended select * from YourTableHere;

Re: Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Todd Nist
You can get HDP with at least 1.3.1 from Horton: http://hortonworks.com/hadoop-tutorial/using-apache-spark-technical-preview-with-hdp-2-2/ for your convenience from the docs: wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.4/hdp.repo -O /etc/yum.repos.d/HDP-TP.repo

Re: Setting JVM heap start and max sizes, -Xms and -Xmx, for executors

2015-07-02 Thread Todd Nist
You should use: spark.executor.memory. From the docs: spark.executor.memory (default 512m) - Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. 512m, 2g). -Todd On Thu, Jul 2, 2015 at 3:36 PM, Mulugeta Mammo

Re: Setting JVM heap start and max sizes, -Xms and -Xmx, for executors

2015-07-02 Thread Todd Nist
limitation at this time. -Todd On Thu, Jul 2, 2015 at 4:13 PM, Mulugeta Mammo wrote: > thanks but my use case requires I specify different start and max heap > sizes. Looks like spark sets start and max sizes same value. > > On Thu, Jul 2, 2015 at 1:08 PM, Todd Nist wrote: > &g

Re: [X-post] Saving SparkSQL result RDD to Cassandra

2015-07-09 Thread Todd Nist
foreachRDD returns Unit: def foreachRDD(foreachFunc: (RDD[T]) ⇒ Unit): Unit. Apply a function to each RDD in this DStream. This is an output operator, so 'this' DStream will be registered as an output stream and ther
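
A sketch of how the write typically ends up inside foreachRDD, assuming the DataStax spark-cassandra-connector is on the classpath and a keyspace/table ks.word_counts(word text, count int) already exists (host, port, and names are illustrative):

  import com.datastax.spark.connector._   // adds saveToCassandra to RDDs
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("stream-to-cassandra")
    .set("spark.cassandra.connection.host", "127.0.0.1")
  val ssc = new StreamingContext(conf, Seconds(10))

  val counts = ssc.socketTextStream("localhost", 9999)
    .flatMap(_.split(" "))
    .map((_, 1))
    .reduceByKey(_ + _)

  // foreachRDD returns Unit; the save happens as a side effect inside it
  counts.foreachRDD { rdd =>
    rdd.saveToCassandra("ks", "word_counts", SomeColumns("word", "count"))
  }

  ssc.start()
  ssc.awaitTermination()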

Re: Saving RDD into cassandra keyspace.

2015-07-10 Thread Todd Nist
I would strongly encourage you to read the docs; they are very useful in getting up and running: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/0_quick_start.md For your use case shown above, you will need to ensure that you include the appropriate version of the spark-c

Re: spark streaming job to hbase write

2015-07-15 Thread Todd Nist
There are three connector packages listed on the spark-packages web site: http://spark-packages.org/?q=hbase HTH. -Todd On Wed, Jul 15, 2015 at 2:46 PM, Shushant Arora wrote: > Hi > > I have a requirement of writing in hbase table from Spark streaming app > after some processing. > Is Hbase put o

Re: Use rank with distribute by in HiveContext

2015-07-16 Thread Todd Nist
Did you take a look at the excellent write up by Yin Huai and Michael Armbrust? It appears that rank is supported in the 1.4.x release. https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html Snippet from above article for your convenience: To answer the first ques
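
For reference, a minimal sketch of rank over a window in 1.4.x (assumes a HiveContext-backed DataFrame named sales with columns category and revenue; all names are illustrative):

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions._

  val w = Window.partitionBy("category").orderBy(desc("revenue"))
  val ranked = sales.withColumn("rank", rank().over(w))
  ranked.filter(col("rank") <= 2).show()   // top 2 rows per category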

Re: Building a REST Service with Spark back-end

2016-03-02 Thread Todd Nist
Have you looked at Apache Toree, http://toree.apache.org/. This was formerly the Spark Kernel from IBM but has been contributed to Apache. https://github.com/apache/incubator-toree You can find a good overview on the spark-kernel here: http://www.spark.tc/how-to-enable-interactive-applications-against-ap

Re: Spark Streaming, very slow processing and increasing scheduling delay of kafka input stream

2016-03-10 Thread Todd Nist
Hi Vinti, All of your tasks are failing based on the screen shots provided. I think a few more details would be helpful. Is this YARN or a Standalone cluster? How much overall memory is on your cluster? On each machine where workers and executors are running? Are you using the Direct (KafkaUt

Re: "bootstrapping" DStream state

2016-03-10 Thread Todd Nist
The updateStateByKey can be supplied an initialRDD to populate it with. Per code ( https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L435-L445 ). Provided here for your convenience. /** * Return a new "state" D
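
A small sketch of passing an initial RDD through that overload (the socket source, checkpoint path, keys, and counts are all illustrative):

  import org.apache.spark.{HashPartitioner, SparkConf}
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val ssc = new StreamingContext(new SparkConf().setAppName("bootstrap-state"), Seconds(10))
  ssc.checkpoint("/tmp/checkpoints")

  // State bootstrapped from a prior snapshot of counts
  val initialState = ssc.sparkContext.parallelize(Seq(("alpha", 100L), ("beta", 42L)))

  val updateFunc = (values: Seq[Long], state: Option[Long]) =>
    Some(values.sum + state.getOrElse(0L))

  val counts = ssc.socketTextStream("localhost", 9999)
    .map(word => (word, 1L))
    .updateStateByKey(updateFunc,
      new HashPartitioner(ssc.sparkContext.defaultParallelism),
      initialState)

  counts.print()
  ssc.start()
  ssc.awaitTermination()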

Re: Correct way to use spark streaming with apache zeppelin

2016-03-12 Thread Todd Nist
Below is a link to an example which Silvio Fiorito put together demonstrating how to link Zeppelin with Spark Streaming for real-time charts. I think the original thread was back in early November 2015, subject: Real time chart in Zeppelin, if you care to try to find it. https://gist.github.com/grant

Re: Apache Flink

2016-04-17 Thread Todd Nist
So there is an offering from Stratio, https://github.com/Stratio/Decision Decision CEP engine is a Complex Event Processing platform built on Spark > Streaming. > > It is the result of combining the power of Spark Streaming as a continuous > computing framework and Siddhi CEP engine as complex e

Re: How to change akka.remote.startup-timeout in spark

2016-04-21 Thread Todd Nist
I believe you can adjust it by setting the following: spark.akka.timeout 100s Communication timeout between Spark nodes. HTH. -Todd On Thu, Apr 21, 2016 at 9:49 AM, yuemeng (A) wrote: > When I run a spark application,sometimes I get follow ERROR: > > 16/04/21 09:26:45 ERROR SparkContext: Er

Re: Spark 1.6.1. How to prevent serialization of KafkaProducer

2016-04-21 Thread Todd Nist
Have you looked at these: http://allegro.tech/2015/08/spark-kafka-integration.html http://mkuthan.github.io/blog/2016/01/29/spark-kafka-integration2/ Full example here: https://github.com/mkuthan/example-spark-kafka HTH. -Todd On Thu, Apr 21, 2016 at 2:08 PM, Alexander Gallego wrote: > Than
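
One common workaround, not necessarily the exact approach in the linked posts: build the producer on the executor inside foreachPartition so it never has to be serialized from the driver (the broker address and topic are illustrative, and stream is assumed to be a DStream[String]):

  import java.util.Properties
  import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

  stream.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      // Created per partition on the executor, never shipped from the driver
      val producer = new KafkaProducer[String, String](props)
      partition.foreach(msg => producer.send(new ProducerRecord[String, String]("out-topic", msg)))
      producer.close()
    }
  }

The linked posts go further and reuse a lazily created producer per executor, which avoids paying the producer setup cost on every batch.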

Re: Spark SQL Transaction

2016-04-23 Thread Todd Nist
I believe the class you are looking for is org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala. By default in savePartition(...) , it will do the following: if (supportsTransactions) { conn.setAutoCommit(false) // Everything in the same db transaction. } Then at line 224, it will issu

Re: Unit testing framework for Spark Jobs?

2016-05-18 Thread Todd Nist
Perhaps these may be of some use: https://github.com/mkuthan/example-spark http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/ https://github.com/holdenk/spark-testing-base On Wed, May 18, 2016 at 2:14 PM, swetha kasireddy wrote: > Hi Lars, > > Do you have any examples for the methods
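
A dependency-light sketch of the same idea: run a local SparkContext inside a ScalaTest suite (spark-testing-base wraps this pattern up in reusable traits; the suite and data here are illustrative):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.scalatest.{BeforeAndAfterAll, FunSuite}

  class WordCountSuite extends FunSuite with BeforeAndAfterAll {
    @transient private var sc: SparkContext = _

    override def beforeAll(): Unit = {
      // local[2] keeps the test self-contained, no cluster required
      sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("unit-test"))
    }

    override def afterAll(): Unit = {
      if (sc != null) sc.stop()
    }

    test("counts words") {
      val counts = sc.parallelize(Seq("a", "b", "a"))
        .map((_, 1)).reduceByKey(_ + _).collectAsMap()
      assert(counts("a") == 2)
    }
  }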

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Todd Nist
What version of Spark are you using? I do not believe that 1.6.x is compatible with 0.9.0.1 due to changes in the kafka clients between 0.8.2.2 and 0.9.0.x. See this for more information: https://issues.apache.org/jira/browse/SPARK-12177 -Todd On Tue, Jun 7, 2016 at 7:35 AM, Dominik Safaric w

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Todd Nist
fig, running in standalone mode > (org.apache.zookeeper.server.quorum.QuorumPeerMain) > > Any indication onto why the channel connection might be closed? Would it > be Kafka or Zookeeper related? > > On 07 Jun 2016, at 14:07, Todd Nist wrote: > > What version of Spark are you

Re: Load selected rows with sqlContext in the dataframe

2016-07-21 Thread Todd Nist
You can set the dbtable to this: .option("dbtable", "(select * from master_schema where 'TID' = '100_0')") HTH, Todd On Thu, Jul 21, 2016 at 10:59 AM, sujeet jog wrote: > I have a table of size 5GB, and want to load selective rows into dataframe > instead of loading the entire table in memor
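
Put together, the read would look roughly like this (the JDBC URL, driver, and credentials are placeholders):

  val df = sqlContext.read
    .format("jdbc")
    .option("url", "jdbc:mysql://dbhost:3306/mydb")
    .option("driver", "com.mysql.jdbc.Driver")
    .option("dbtable", "(select * from master_schema where TID = '100_0') as t")
    .option("user", "user")
    .option("password", "secret")
    .load()   // only the subquery's rows are pulled into the DataFrame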

Re: HiveThriftServer2.startWithContext no more showing tables in 1.6.2

2016-07-21 Thread Todd Nist
This is due to a change in 1.6: by default the Thrift server now runs in multi-session mode. You would want to set the following to true in your spark config (spark-defaults.conf): spark.sql.hive.thriftServer.singleSession Good write up here: https://community.hortonworks.com/questions/29090/i-cant

Re: Securing objects on the thrift server

2015-12-15 Thread Todd Nist
see https://issues.apache.org/jira/browse/SPARK-11043, it is resolved in 1.6. On Tue, Dec 15, 2015 at 2:28 PM, Younes Naguib < younes.nag...@tritondigital.com> wrote: > The one coming with spark 1.5.2. > > > > y > > > > *From:* Ted Yu [mailto:yuzhih...@gmail.com] > *Sent:* December-15-15 1:59 PM

Re: looking for a easier way to count the number of items in a JavaDStream

2015-12-16 Thread Todd Nist
Another possible alternative is to register a StreamingListener and then reference the BatchInfo.numRecords; good example here, https://gist.github.com/akhld/b10dc491aad1a2007183. After registering the listener, Simply implement the appropriate "onEvent" method where onEvent is onBatchStarted, onB
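
A minimal sketch of that listener (assumes ssc is your StreamingContext; the class name is illustrative):

  import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

  class RecordCountListener extends StreamingListener {
    override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
      val info = batchCompleted.batchInfo
      println(s"Batch ${info.batchTime} processed ${info.numRecords} records")
    }
  }

  ssc.addStreamingListener(new RecordCountListener)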

Re: problem building spark on centos

2016-01-06 Thread Todd Nist
Hi Jade, I think you "--name" option. The makedistribution should look like this: ./make-distribution.sh --name hadoop-2.6 --tgz -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests. As for why it failed to build with scala 2.11, did you run the ./dev/change-scala-ve

Re: problem building spark on centos

2016-01-06 Thread Todd Nist
That should read "I think your missing the --name option". Sorry about that. On Wed, Jan 6, 2016 at 3:03 PM, Todd Nist wrote: > Hi Jade, > > I think you "--name" option. The makedistribution should look like this: > > ./make-distribution.sh --name h

Re: problem building spark on centos

2016-01-06 Thread Todd Nist
y/MAVEN/PluginExecutionException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :spark-launcher_2.10 > > Do you think it’s java problem? I’m using oracle JDK 1.7. Should I update > it to 1.8 instead? I just

Re: write new data to mysql

2016-01-08 Thread Todd Nist
It is not clear from the information provided why the insertIntoJDBC failed in #2. I would note that method on the DataFrame has been deprecated since 1.4; not sure what version you're on. You should be able to do something like this: DataFrame.write.mode(SaveMode.Append).jdbc(MYSQL_CONNECTION_URL
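
Filled out, that would look something like the following (the URL, credentials, and table name are placeholders, and df is assumed to be the DataFrame holding the new rows):

  import java.util.Properties
  import org.apache.spark.sql.SaveMode

  val url = "jdbc:mysql://dbhost:3306/mydb"
  val props = new Properties()
  props.put("user", "user")
  props.put("password", "secret")
  props.put("driver", "com.mysql.jdbc.Driver")

  // Appends df's rows to the existing MySQL table "results"
  df.write.mode(SaveMode.Append).jdbc(url, "results", props)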

Re: write new data to mysql

2016-01-08 Thread Todd Nist
Sorry, did not see your update until now. On Fri, Jan 8, 2016 at 3:52 PM, Todd Nist wrote: > Hi Yasemin, > > What version of Spark are you using? Here is the reference, it is off of > the DataFrame > https://spark.apache.org/docs/latest/api/java/index.html#org.apache.spark.sql.D

Re: write new data to mysql

2016-01-08 Thread Todd Nist
cant find it. > The code and error are in gist > <https://gist.github.com/yaseminn/f5a2b78b126df71dfd0b>. Could you check > it out please? > > Best, > yasemin > > 2016-01-08 18:23 GMT+02:00 Todd Nist : > >> It is not clear from the information provided why the i

Re: GroupBy on DataFrame taking too much time

2016-01-11 Thread Todd Nist
Hi Rajeshwar Gaini, dbtable can be any valid sql query; simply define it as a subquery, something like: val query = "(SELECT country, count(*) FROM customer group by country) as X" val df1 = sqlContext.read .format("jdbc") .option("url", url) .option("user", username) .opti

Re: NPE when using Joda DateTime

2016-01-14 Thread Todd Nist
I had a similar problem a while back and leveraged these Kryo serializers, https://github.com/magro/kryo-serializers. I had to fall back to version 0.28, but that was a while back. You can add these to the org.apache.spark.serializer.KryoRegistrator and then set your registrator in the spark con
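
A sketch of wiring that up; the serializer class name is the one the kryo-serializers project ships for Joda DateTime, so verify it against the version you actually pull in, and adjust the registrator's package to your own:

  import com.esotericsoftware.kryo.Kryo
  import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer
  import org.apache.spark.SparkConf
  import org.apache.spark.serializer.KryoRegistrator
  import org.joda.time.DateTime

  class JodaKryoRegistrator extends KryoRegistrator {
    override def registerClasses(kryo: Kryo): Unit = {
      // Tells Kryo how to (de)serialize Joda DateTime instead of falling over on it
      kryo.register(classOf[DateTime], new JodaDateTimeSerializer())
    }
  }

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", "com.example.JodaKryoRegistrator")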

Re: Passing binding variable in query used in Data Source API

2016-01-21 Thread Todd Nist
Hi Satish, You should be able to do something like this: val props = new java.util.Properties() props.put("user", username) props.put("password", pwd) props.put("driver", "org.postgresql.Driver") val deptNo = 10 val where = Some(s"dept_number = $deptNo") val df = sqlContext.rea

Re: Saving Kafka Offsets to Cassandra at begining of each batch in Spark Streaming

2016-02-16 Thread Todd Nist
You could use the "withSessionDo" of the SparkCassandrConnector to preform the simple insert: CassandraConnector(conf).withSessionDo { session => session.execute() } -Todd On Tue, Feb 16, 2016 at 11:01 AM, Cody Koeninger wrote: > You could use sc.parallelize... but the offsets are already

Re: Spark Integration Patterns

2016-02-28 Thread Todd Nist
Define your SparkConf to set the master: val conf = new SparkConf().setAppName(AppName) .setMaster(SparkMaster) .set() Where SparkMaster = "spark://SparkServerHost:7077". So if your spark server hostname is "RADTech" then it would be "spark://RADTech:7077". Then when you create

Re: Spark Integration Patterns

2016-02-28 Thread Todd Nist
hing obvious ? > > > Le dim. 28 févr. 2016 à 19:01, Todd Nist a écrit : > >> Define your SparkConfig to set the master: >> >> val conf = new SparkConf().setAppName(AppName) >> .setMaster(SparkMaster) >> .set() >> >> Where Spark

Re: Spark for client

2016-03-01 Thread Todd Nist
You could also look at Apache Toree, http://toree.apache.org/ , github: https://github.com/apache/incubator-toree. This used to be the Spark Kernel from IBM but has been contributed to Apache. Good overview here on its features, http://www.spark.tc/how-to-enable-interactive-applications-against-a

Re: Spark SQL Thriftserver and Hive UDF in Production

2015-10-19 Thread Todd Nist
From Tableau, you should be able to use the Initial SQL option to support this. So in Tableau add the following to the “Initial SQL”: create function myfunc AS 'myclass' using jar 'hdfs:///path/to/jar'; HTH, Todd On Mon, Oct 19, 2015 at 11:22 AM, Deenar Toraskar wrote: > Reece > > You can

Re: java.lang.NegativeArraySizeException? as iterating a big RDD

2015-10-23 Thread Todd Nist
Hi Yifan, You could also try increasing spark.kryoserializer.buffer.max.mb (64 MB by default): useful if your buffer needs to go beyond 64 MB. Per doc: Maximum allowable size of Kryo serialization buffer. This must be larger than any object y

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Todd Nist
Hi Bilnmek, Spark 1.5.x does not support Scala 2.11.7 so the easiest thing to do is build it like you're trying. Here are the steps I followed to build it on a Mac OS X 10.10.5 environment, should be very similar on ubuntu. 1. set the JAVA_HOME environment variable in my bash session via export JA

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Todd Nist
It does. > > It is not even this difficult; you just need a source distribution, > and then run "./dev/change-scala-version.sh 2.11" as you say. Then > build as normal > > On Sun, Oct 25, 2015 at 4:00 PM, Todd Nist > wrote: > > Hi Bilnmek, > > > > Spark 1.

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Todd Nist
published: > http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-parent_2.11%22 > > On Sun, Oct 25, 2015 at 7:37 PM, Todd Nist wrote: > > Sorry Sean you are absolutely right it supports 2.11 all o meant is > there is > > no release available as a standard download and that on

Re: Maven build failed (Spark master)

2015-10-27 Thread Todd Nist
I issued the same basic command and it worked fine. RADTech-MBP:spark $ ./make-distribution.sh --name hadoop-2.6 --tgz -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests Which created: spark-1.6.0-SNAPSHOT-bin-hadoop-2.6.tgz in the root directory of the project. FW

Re: Getting the batch time of the active batches in spark streaming

2015-11-24 Thread Todd Nist
Hi Abhi, You should be able to register an org.apache.spark.streaming.scheduler.StreamingListener. There is an example here that may help: https://gist.github.com/akhld/b10dc491aad1a2007183 and the spark api docs here, http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SparkListe

Re: Getting the batch time of the active batches in spark streaming

2015-11-24 Thread Todd Nist
(StreamingListenerBatchSubmitted batchSubmitted) { system.out.println("Start time: " + batchSubmitted.batchInfo.processingStartTime) } Sorry for the confusion. -Todd On Tue, Nov 24, 2015 at 7:51 PM, Todd Nist wrote: > Hi Abhi, > > You should be able to register a > org.apache.spark.streaming.sc

Re: [Spark Streaming] How to clear old data from Stream State?

2015-11-25 Thread Todd Nist
Perhaps the new trackStateByKey targeted for version 1.6 may help you here. I'm not sure if it made it into 1.6, as the jira does not specify a fix version. The jira describing it is here: https://issues.apache.org/jira/browse/SPARK-2629, and the design doc that discusses the API chang

Re: Spark Driver Port Details

2015-11-25 Thread Todd Nist
The default is to start applications with port 4040 and then increment by 1 as you are seeing; see docs here: http://spark.apache.org/docs/latest/monitoring.html#web-interfaces You can override this behavior by passing --conf spark.ui.port=4080 or in your code; something like thi

Re: Does Spark streaming support is there with RabbitMQ

2015-07-20 Thread Todd Nist
There is one package available on the spark-packages site, http://spark-packages.org/package/Stratio/RabbitMQ-Receiver The source is here: https://github.com/Stratio/RabbitMQ-Receiver Not sure that meets your needs or not. -Todd On Mon, Jul 20, 2015 at 8:52 AM, Jeetendra Gangele wrote: > Do

Re: Starting Spark SQL thrift server from within a streaming app

2015-08-05 Thread Todd Nist
Hi Danniel, It is possible to create an instance of the SparkSQL Thrift server, however it seems like this project is what you may be looking for: https://github.com/Intel-bigdata/spark-streamingsql Not 100% sure what your use case is, but you can always convert the data into a DF then issue a query ag

Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Todd Nist
They are covered here in the docs: http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.functions$ On Thu, Aug 6, 2015 at 5:52 AM, Netwaver wrote: > Hi All, > I am using Spark 1.4.1, and I want to know how can I find the > complete function list supported in Sp

Re: Starting Spark SQL thrift server from within a streaming app

2015-08-06 Thread Todd Nist
> server on a streaming app ? > > Thanks again. > Daniel > > > On Thu, Aug 6, 2015 at 1:53 AM, Todd Nist wrote: > >> Hi Danniel, >> >> It is possible to create an instance of the SparkSQL Thrift server, >> however seems like this project is what yo

Re: Tungsten and Spark Streaming

2015-09-10 Thread Todd Nist
https://issues.apache.org/jira/browse/SPARK-8360?jql=project%20%3D%20SPARK%20AND%20text%20~%20Streaming -Todd On Thu, Sep 10, 2015 at 10:22 AM, Gurvinder Singh < gurvinder.si...@uninett.no> wrote: > On 09/10/2015 07:42 AM, Tathagata Das wrote: > > Rewriting is necessary. You will have to convert

Re: Replacing Esper with Spark Streaming?

2015-09-14 Thread Todd Nist
Stratio offers a CEP implementation based on Spark Streaming and the Siddhi CEP engine. I have not used the below, but they may be of some value to you: http://stratio.github.io/streaming-cep-engine/ https://github.com/Stratio/streaming-cep-engine HTH. -Todd On Sun, Sep 13, 2015 at 7:49 PM, O

Re: KafkaProducer using Cassandra as source

2015-09-23 Thread Todd Nist
Hi Kali, If you do not mind sending JSON, you could do something like this, using json4s: val rows = p.collect() map ( row => TestTable(row.getString(0), row.getString(1)) ) val json = parse(write(rows)) producer.send(new KeyedMessage[String, String]("trade", writePretty(json))) // or for eac

Re: Writing to Hbase table from Spark

2016-08-30 Thread Todd Nist
Have you looked at spark-packages.org? There are several different HBase connectors there, not sure if any meet your need or not. https://spark-packages.org/?q=hbase HTH, -Todd On Tue, Aug 30, 2016 at 5:23 AM, ayan guha wrote: > You can use rdd level new hadoop format api and pass on appropr

Re: Design patterns involving Spark

2016-08-30 Thread Todd Nist
Have not tried this, but looks quite useful if one is using Druid: https://github.com/implydata/pivot - An interactive data exploration UI for Druid On Tue, Aug 30, 2016 at 4:10 AM, Alonso Isidoro Roman wrote: > Thanks Mitch, i will check it. > > Cheers > > > Alonso Isidoro Roman > [image: htt

Re: Creating HiveContext withing Spark streaming

2016-09-08 Thread Todd Nist
Hi Mich, Perhaps the issue is having multiple SparkContexts in the same JVM ( https://issues.apache.org/jira/browse/SPARK-2243). While it is possible, I don't think it is encouraged. As you know, the call you're currently invoking to create the StreamingContext also creates a SparkContext. /** * C
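
In other words, reuse the SparkContext that is already running rather than letting the streaming constructor make a second one; a rough sketch, assuming sc is the existing context and the batch interval is arbitrary:

  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // Builds the streaming context on top of the existing SparkContext
  val ssc = new StreamingContext(sc, Seconds(30))

  // The HiveContext can likewise be built from the same SparkContext
  val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)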

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-17 Thread Todd Nist
Hi Mich, Have you looked at Apache Ignite? https://apacheignite-fs.readme.io/docs. This looks like something that may be what you're looking for: http://apacheignite.gridgain.org/docs/data-analysis-with-apache-zeppelin HTH. -Todd On Sat, Sep 17, 2016 at 12:53 PM, Mich Talebzadeh wrote: > Hi

Re: is there any bug for the configuration of spark 2.0 cassandra spark connector 2.0 and cassandra 3.0.8

2016-09-20 Thread Todd Nist
These types of questions would be better asked on the user mailing list for the Spark Cassandra connector: http://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user Version compatibility can be found here: https://github.com/datastax/spark-cassandra-connector#version-compa

Re: Tableau BI on Spark SQL

2017-01-30 Thread Todd Nist
Hi Mich, You could look at http://www.exasol.com/. It works very well with Tableau without the need to extract the data. Also in V6, it has the virtual schemas which would allow you to access data in Spark, Hive, Oracle, or other sources. May be outside of what you are looking for, it works wel

Re: Backpressure initial rate not working

2018-07-26 Thread Todd Nist
Hi Biplob, How many partitions are on the topic you are reading from, and have you set the maxRatePerPartition? IIRC, Spark back pressure is calculated off of the following: • maxRatePerPartition=200 • batchInterval 30s • 3 parti
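
With those illustrative figures, the cap on a single batch works out to roughly: 200 records/sec/partition x 3 partitions x 30 sec batch interval = 18,000 records per batch (plug in your own partition count and rate to see the limit back pressure will enforce).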

Re: Backpressure initial rate not working

2018-07-26 Thread Todd Nist
uration=log4j-spark.properties" \ >--files "${JAAS_CONF},${KEYTAB}" \ >--class "${MAIN_CLASS}" \ >"${ARTIFACT_FILE}" > > > The first batch is huge, even if it worked for the first batch I would've > tried researching more. Th

Re: cache table vs. parquet table performance

2019-01-16 Thread Todd Nist
Hi Tomas, Have you considered using something like https://www.alluxio.org/ for your cache? Seems like a possible solution for what you're trying to do. -Todd On Tue, Jan 15, 2019 at 11:24 PM 大啊 wrote: > Hi ,Tomas. > Thanks for your question give me some prompt.But the best way use cache > usual

Re: spark.submit.deployMode: cluster

2019-03-29 Thread Todd Nist
A little late, but have you looked at https://livy.incubator.apache.org/, works well for us. -Todd On Thu, Mar 28, 2019 at 9:33 PM Jason Nerothin wrote: > Meant this one: https://docs.databricks.com/api/latest/jobs.html > > On Thu, Mar 28, 2019 at 5:06 PM Pat Ferrel wrote: > >> Thanks, are you

Re: Using P4J Plugins with Spark

2020-04-21 Thread Todd Nist
You may want to make sure you include the jar of P4J and your plugins as part of the following so that both the driver and executors have access. If HDFS is out then you could make a common mount point on each of the executor nodes so they have access to the classes. - spark-submit --jars /com

Re: Exception handling in Spark

2020-05-05 Thread Todd Nist
Could you do something like this prior to calling the action. // Create FileSystem object from Hadoop Configuration val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration) // This method returns Boolean (true if the file exists, false if it doesn't) val fileExists = fs.exists(new P
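
Completing that sketch (the input path and the follow-on action are illustrative, and spark is assumed to be the SparkSession):

  import org.apache.hadoop.fs.{FileSystem, Path}

  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  val fileExists = fs.exists(new Path("/data/input"))

  if (fileExists) {
    val df = spark.read.parquet("/data/input")
    df.count()                              // safe to run the action now
  } else {
    println("Input path missing, skipping this run")
  }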

Re: Link existing Hive to Spark

2015-02-06 Thread Todd Nist
Hi Ashu, Per the documents: Configuration of Hive is done by placing your hive-site.xml file in conf/. For example, you can place something like this in your $SPARK_HOME/conf/hive-site.xml file:
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://HostNameHere:9083</value>
    <description>URI for client to contact metastore</description>
  </property>

Re: Link existing Hive to Spark

2015-02-06 Thread Todd Nist
e.xml > there? > > If I build Spark from source code , I can put the file in conf/ but I am > avoiding that. > -- > *From:* Todd Nist > *Sent:* Friday, February 6, 2015 8:32 PM > *To:* Ashutosh Trivedi (MT2013030) > *Cc:* user@spark.apache.org &

SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
Hi, I'm trying to understand how and what the Tableau connector to SparkSQL is able to access. My understanding is it needs to connect to the thriftserver and I am not sure how or if it exposes parquet, json, schemaRDDs, or does it only expose schemas defined in the metastore / hive. For exampl

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
; fashion, sort of related to question 2 you would need to configure thrift > to read from the metastore you expect it read from - by default it reads > from metastore_db directory present in the directory used to launch the > thrift server. > On 11 Feb 2015 01:35, "Todd Nist"

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
options > (path 'examples/src/main/resources/people.json’) > cache table people > > create temporary table users using org.apache.spark.sql.parquet options > (path 'examples/src/main/resources/users.parquet’) > cache table users > > From: Todd Nist > Date

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
'examples/src/main/resources/kv1.txt' INTO TABLE src") // Queries are expressed in HiveQL sqlContext.sql("FROM src SELECT key, value").collect().foreach(println) Or did you have something else in mind? -Todd On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist wrote: > Arush, > &g

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
'examples/src/main/resources/json/*') > > ; > Time taken: 0.34 seconds > > spark-sql> select * from people; > NULL Michael > 30 Andy > 19 Justin > NULLMichael > 30 Andy > 19 Justin > Time taken: 0.576 seconds > > F

Re: SparkSQL + Tableau Connector

2015-02-11 Thread Todd Nist
or >> sharp-shell using the same option. >> >> >> >> On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist wrote: >> >>> Arush, >>> >>> As for #2 do you mean something like this from the docs: >>> >>> // sc is an existi

Re: SparkSQL + Tableau Connector

2015-02-11 Thread Todd Nist
> From: ar...@sigmoidanalytics.com > To: tsind...@gmail.com > CC: user@spark.apache.org > > Hi > > I used this, though its using a embedded driver and is not a good > approch.It works. You can configure for some other metastore type also. I > have not tried the metastor

Is it possible to expose SchemaRDD’s from thrift server?

2015-02-12 Thread Todd Nist
I have a question with regards to accessing SchemaRDD’s and Spark SQL temp tables via the thrift server. It appears that a SchemaRDD when created is only available in the local namespace / context and are unavailable to external services accessing Spark through thrift server via ODBC; is this corr

Re: Is it possible to expose SchemaRDD’s from thrift server?

2015-02-12 Thread Todd Nist
On Thu, Feb 12, 2015 at 7:24 AM, Todd Nist wrote: > >> I have a question with regards to accessing SchemaRDD’s and Spark SQL >> temp tables via the thrift server. It appears that a SchemaRDD when >> created is only available in the local namespace / context and are &g

Re: Unable to query hive tables from spark

2015-02-15 Thread Todd Nist
What does your hive-site.xml look like? Do you actually have a directory at the location shown in the error? i.e. does "/user/hive/warehouse/src" exist? You should be able to override this by specifying the following: --hiveconf hive.metastore.warehouse.dir=/location/where/your/warehouse/exists

Re: Tableau beta connector

2015-02-19 Thread Todd Nist
I am able to connect by doing the following using the Tableau Initial SQL and a custom query: 1. First ingest csv file or json and save out to file system: import org.apache.spark.sql.SQLContext import com.databricks.spark.csv._ val sqlContext = new SQLContext(sc) val demo = sq

Re: SparkSQL + Tableau Connector

2015-02-19 Thread Todd Nist
are actually in the metastore so Tableau can discover them > in the schema. In that case you will either have to generate the Hive > tables externally from Spark or use Spark to process the data and save them > using a HiveContext. > > >From: Todd Nist > Date: Wednesday, February

Re: No suitable driver found error, Create table in hive from spark sql

2015-02-19 Thread Todd Nist
Hi Dhimant, I believe it will work if you change your spark-shell invocation to pass "--driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar" instead of putting it in "--jars". -Todd On Wed, Feb 18, 2015 at 10:41 PM, Dhimant wrote: > Found solution from one of the post found on internet. > I updated spar

Re: Where to look for potential causes for Akka timeout errors in a Spark Streaming Application?

2015-02-20 Thread Todd Nist
Hi Emre, Have you tried adjusting these: .set("spark.akka.frameSize", "500").set("spark.akka.askTimeout", "30").set("spark.core.connection.ack.wait.timeout", "600") -Todd On Fri, Feb 20, 2015 at 8:14 AM, Emre Sevinc wrote: > Hello, > > We are building a Spark Streaming application that listen

Re: Is SPARK_CLASSPATH really deprecated?

2015-02-26 Thread Todd Nist
Hi Kannan, I believe you should be able to use --jars for this when you invoke the spark-shell or perform a spark-submit. Per docs: --jars JARS: Comma-separated list of local jars to include on the driver and executor classpaths. HTH. -Todd On Thu, Feb

Re: Is SPARK_CLASSPATH really deprecated?

2015-02-26 Thread Todd Nist
Hi Kannan, Issues with using --jars make sense. I believe you can set the classpath via the use of --conf spark.executor.extraClassPath= or in your driver with .set("spark.executor.extraClassPath", ".") I believe you are correct with the localize as well, as long as you're guaranteed that

Re: What joda-time dependency does spark submit use/need?

2015-02-27 Thread Todd Nist
You can specify these jars (joda-time-2.7.jar, joda-convert-1.7.jar) either as part of your build and assembly or via the --jars option to spark-submit. HTH. On Fri, Feb 27, 2015 at 2:48 PM, Su She wrote: > Hello Everyone, > > I'm having some issues launching (non-spark) applications via the >

Re: Spark Monitoring UI for Hadoop Yarn Cluster

2015-03-03 Thread Todd Nist
Hi Srini, If you start the $SPARK_HOME/sbin/start-history-server.sh, you should be able to see the basic spark ui. You will not see the master, but you will be able to see the rest as I recall. You also need to add an entry into the spark-defaults.conf, something like this: ## Make sure the host

Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-05 Thread Todd Nist
I am running Spark on a HortonWorks HDP Cluster. I have deployed their prebuilt version but it is only for Spark 1.2.0, not 1.2.1, and there are a few fixes and features in there that I would like to leverage. I just downloaded the spark-1.2.1 source and built it to support Hadoop 2.6 by doing the f

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-05 Thread Todd Nist
-Djackson.version=1.9.3 > > > > Cheers > > > > On Thu, Mar 5, 2015 at 10:04 AM, Todd Nist wrote: > >> > >> I am running Spark on a HortonWorks HDP Cluster. I have deployed there > >> prebuilt version but it is only for Spark 1.2.0 not 1.2.1 and the

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-06 Thread Todd Nist
to > work: > > https://github.com/apache/spark/pull/3938 > > On Thu, Mar 5, 2015 at 10:04 AM, Todd Nist wrote: > > > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:166) >at > org.apache.hadoop.service.AbstractService.init(A

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-06 Thread Todd Nist
shell failed in the first place. > > Thanks. > > Zhan Zhang > > On Mar 6, 2015, at 9:59 AM, Todd Nist wrote: > > First, thanks to everyone for their assistance and recommendations. > > @Marcelo > > I applied the patch that you recommended and am no

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-06 Thread Todd Nist
Mar 6, 2015, at 11:40 AM, Zhan Zhang wrote: > > You are using 1.2.1 right? If so, please add java-opts in conf > directory and give it a try. > > [root@c6401 conf]# more java-opts > -Dhdp.version=2.2.2.0-2041 > > Thanks. > > Zhan Zhang > > On Mar 6, 2015, at 11:35 AM, Todd Nist wrote: > > -Dhdp.version=2.2.0.0-2041 > > > >

Re: hbase sql query

2015-03-12 Thread Todd Nist
Have you considered using the spark-hbase-connector for this: https://github.com/nerdammer/spark-hbase-connector On Thu, Mar 12, 2015 at 5:19 AM, Udbhav Agarwal wrote: > Thanks Akhil. > > Additionaly if we want to do sql query we need to create JavaPairRdd, then > JavaRdd, then JavaSchemaRdd

Re: hbase sql query

2015-03-12 Thread Todd Nist
scala, I was looking for some help with > java Apis. > > > > *Thanks,* > > *Udbhav Agarwal* > > > > *From:* Todd Nist [mailto:tsind...@gmail.com] > *Sent:* 12 March, 2015 5:28 PM > *To:* Udbhav Agarwal > *Cc:* Akhil Das; user@spark.apache.org > *Subject:*

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread Todd Nist
There is the PR https://github.com/apache/spark/pull/2077 for doing this. On Fri, Mar 13, 2015 at 6:42 AM, t1ny wrote: > Hi all, > > We are looking for a tool that would let us visualize the DAG generated by > a > Spark application as a simple graph. > This graph would represent the Spark Job, i

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-16 Thread Todd Nist
Hi Bharath, I ran into the same issue a few days ago, here is a link to a post on Horton's forum. http://hortonworks.com/community/forums/search/spark+1.2.1/ In case anyone else needs to perform this, these are the steps I took to get it to work with Spark 1.2.1 as well as Spark 1.3.0-RC3: 1. Pul
