Re: Appropriate Apache Users List Uses

2016-02-09 Thread Pierce Lamb
s this appropriate list use? Note: This was >> unsolicited. >> >> Thanks >> John >> >> >> >> From: Pierce Lamb >> 11:57 AM (1 hour ago) >> to me >> >> Hi John, >> >> I saw you on the Spark Mailing List and noticed you work

Re: How best we can store streaming data on dashboards for real time user experience?

2017-03-30 Thread Pierce Lamb
SnappyData should work well for what you want: it deeply integrates an in-memory database with Spark, which supports ingesting streaming data and concurrently querying it from a dashboard. SnappyData currently has an integration with Apache Zeppelin (notebook visualization) and soon it will have one

Re: Apache Drill vs Spark SQL

2017-04-07 Thread Pierce Lamb
Hi Kant, If you are interested in using Spark alongside a database to serve real time queries, there are many options. Almost every popular database has built some sort of connector to Spark. I've listed a majority of them and tried to delineate them in some way in this StackOverflow answer: http

Re: Spark Streaming. Real-time save data and visualize on dashboard

2017-04-11 Thread Pierce Lamb
Hi, It is possible to use Mongo or Cassandra to persist results from Spark. In fact, a wide variety of data stores are available to use with Spark and many are aimed at serving queries for dashboard visualizations. I cannot comment on which work well with Grafana or Kibana; however, I've listed (w

Re: [Spark Streaming] Dynamic Broadcast Variable Update

2017-05-05 Thread Pierce Lamb
Hi Nipun, To expand a bit, you might find this StackOverflow answer useful: http://stackoverflow.com/a/39753976/3723346 Most Spark + database combinations can handle a use case like this. Hope this helps, Pierce On Thu, May 4, 2017 at 9:18 AM, Gene Pang wrote: > As Tim pointed out, Alluxio

Re: "Sharing" dataframes...

2017-06-21 Thread Pierce Lamb
Hi Jean, Since many in this thread have mentioned datastores from what I would call the "Spark datastore ecosystem", I thought I would link you to a StackOverflow answer I posted a while back that tried to capture the majority of this ecosystem. Most would claim to allow you to do something like you

Re: using Kudu with Spark

2017-07-24 Thread Pierce Lamb
Hi Mich, I tried to compile a list of datastores that connect to Spark and provide a bit of context. The list may help you in your research: https://stackoverflow.com/a/39753976/3723346 I'm going to add Kudu, Druid and Ampool from this thread. I'd like to point out SnappyData

Re: Update MySQL table via Spark/SparkR?

2017-08-22 Thread Pierce Lamb
Hi Jake, There is another option among the third-party projects in the Spark database ecosystem that have combined Spark with a DBMS in such a way that the DataFrame API has been extended to include UPDATE operations

Re: Streaming Analytics/BI tool to connect Spark SQL

2017-12-07 Thread Pierce Lamb
Hi Umar, While this answer is a bit dated, you may find it useful in diagnosing a store for Spark SQL tables: https://stackoverflow.com/a/39753976/3723346 I don't know much about Pentaho or Arcadia, but I assume many of the listed options have a JDBC or ODBC client. Hope this helps, Pierce O

Re: [Arrow][Dremio]

2018-05-14 Thread Pierce Lamb
Hi Xavier, Along the lines of connecting to multiple sources of data and replacing ETL tools you may want to check out Confluent's blog on building a real-time streaming ETL pipeline on Kafka as well as SnappyDat

MLlib/kmeans newbie question(s)

2015-03-07 Thread Pierce Lamb
Hi all, I'm very new to machine learning algorithms and Spark. I'm following the Twitter Streaming Language Classifier found here: http://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/README.html Specifically this code: http://databricks.gitbooks.io/data

Help with updateStateByKey

2014-12-17 Thread Pierce Lamb
I am trying to run stateful Spark Streaming computations over (fake) apache web server logs read from Kafka. The goal is to "sessionize" the web traffic similar to this blog post: http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/

Re: Help with updateStateByKey

2014-12-18 Thread Pierce Lamb
weeks at a > time, processing up to 15M events per day with fluctuating traffic. > > Thanks, > Silvio > > > > On 12/17/14, 10:07 PM, "Pierce Lamb" > wrote: > > >I am trying to run stateful Spark Streaming computations over (fake) > >apache web server lo

Re: Help with updateStateByKey

2014-12-18 Thread Pierce Lamb
returns None. > > Instead, try: > > Some(currentValue.getOrElse(Seq.empty) ++ newValues) > > I think that should give you the expected result. > > > From: Pierce Lamb > Date: Thursday, December 18, 2014 at 2:31 PM > To: Silvio Fiorito > Cc: "user@spark.a
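
For readers skimming this thread, the suggestion quoted above can be sketched as a standalone update function. This is only an illustration and not the code from the thread: the object name `SessionState`, the function name `updateState`, and the choice of `String` as the element type are all assumptions. The shape (new values for the key, then the previous state as an `Option`) is the one `updateStateByKey` expects from its update function.

```scala
// Illustrative sketch only: names and the String element type are assumptions,
// not taken from the original thread.
object SessionState {
  // newValues:    values that arrived for this key in the current batch
  // currentValue: state kept from earlier batches (None the first time a key is seen)
  // Wrapping the result in Some(...) keeps the key's state alive; returning None
  // would drop the key from the state.
  def updateState(newValues: Seq[String],
                  currentValue: Option[Seq[String]]): Option[Seq[String]] =
    Some(currentValue.getOrElse(Seq.empty[String]) ++ newValues)
}
```

In a streaming job, a function like this would typically be passed as `pairs.updateStateByKey(SessionState.updateState _)`, where `pairs` is a hypothetical keyed DStream of `String` values.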