I can add it; just send me the info once it's available.
On Sat, Oct 8, 2016 at 7:45 PM Karim, Md. Rezaul <
rezaul.ka...@insight-centre.org> wrote:
> Hi,
>
> I am writing a book on machine learning using Spark, which is going to be
> published soon.
>
> Could anyone tell me how to include the title on the Spark website?
Cloudera 5.8 has a very old version of Hive without Tez, but Mich has already
provided a good alternative. However, you should check whether it contains
recent versions of HBase and Phoenix. That being said, I just wonder what your
dataflow, data model and planned analysis look like. Maybe there are
Hi users, I need to test the performance of a query in Hive and Spark. Can
anyone convert this SQL to Spark SQL? Here is the SQL:
SELECT split(DTD.TRAN_RMKS,'/')[0] AS TRAB_RMK1,
split(DTD.TRAN_RMKS,'/')[1] AS ATM_ID,
DTD.ACID,
G.FORACID,
DTD.REF_NUM,
DTD.TRAN_ID,
DTD.TRAN_DATE,
DTD.VALUE_DATE
Have you tried it in Spark? It should work as it is.
On Sun, Oct 9, 2016 at 7:55 PM, Sree Eedupuganti wrote:
> Hi users, I need to test the performance of a query in Hive and Spark.
> Can anyone convert this SQL to Spark SQL? Here is the SQL:
>
>
> SELECT split(DTD.TRAN_RMKS,'/')[0] AS TRAB_RMK1,
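A minimal sketch of what running it unchanged might look like, assuming
Spark < 2 with a HiveContext; the FROM/JOIN part of the query was truncated
in the thread, so the table names below (daily_tran_detail, gam) are
hypothetical stand-ins:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// HiveQL built-ins such as split() work unchanged in Spark SQL;
// daily_tran_detail and gam are hypothetical table names
val df = hiveContext.sql("""
  SELECT split(DTD.TRAN_RMKS,'/')[0] AS TRAB_RMK1,
         split(DTD.TRAN_RMKS,'/')[1] AS ATM_ID,
         DTD.ACID, G.FORACID, DTD.REF_NUM,
         DTD.TRAN_ID, DTD.TRAN_DATE, DTD.VALUE_DATE
  FROM daily_tran_detail DTD
  JOIN gam G ON G.ACID = DTD.ACID""")
df.show()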
Hi Owen,
Thanks so much for the quick response. The book is already available online
as an Alpha. It would be great and appreciated if you could add the title
to the Spark website.
Here's the related information about the book:
*Title:* Large Scale Machine Learning with Spark
*Author:* Md. Rezaul Karim
Please also keep in mind that Tableau Server can store data in-memory and
refresh the in-memory data only when needed. This means you can import it
from any source and let your users work only on the in-memory data in
Tableau Server.
On Sun, Oct 9, 2016 at 9:22 AM, Jörn Franke
Ayan is correct. In Spark < 2 you can do
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> var sqltext =
| """
| select count(1) from prices
| """
sqltext: String =
"
select count(1) from prices
"
scala> HiveContext.sql(sqltext).show
+--------+
|count(1)|
+--------+
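For completeness, a hedged Spark 2.x equivalent goes through SparkSession
instead of HiveContext:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()   // needed so Hive tables like prices are visible
  .getOrCreate()

spark.sql("select count(1) from prices").show()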
Hi,
please, I would like to see my debug/error info logged in the Spark web UI.
The problem is that in my current setup, running the master locally (and
connecting my PC as a worker node), I can see the output in my console (my
debug info) but not in the stderr.
I have tried different setups: Logger, RootLogger
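One hedged way to get output into the per-executor stderr files that the web
UI shows, assuming the stock log4j setup: obtain the logger inside the
closure, so the logging happens on the executor rather than the driver (rdd
stands for whatever RDD you are processing):

import org.apache.log4j.Logger

rdd.foreachPartition { iter =>
  // Created inside the closure, so this logger lives on the executor;
  // its output lands in that executor's stderr, visible in the web UI
  val log = Logger.getLogger("MyApp")
  iter.foreach(x => log.warn(s"processing $x"))
}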
I'll probably add this behavior. It's a good balance between not having to
rely on another external system just for offset management and reducing
duplicates.
I was more worried about the underlying framework using the consumer in
parallel. Will watch out for ConcurrentModificationException.
BTW, the commitQueue
Thanks for all the suggestions. It would seem you guys are right about the
Tableau side of things. The reports don’t need to be real-time, and they won’t
be directly feeding off of the main DMP HBase data. Instead, it’ll be batched
to Parquet or Kudu/Impala or even PostgreSQL.
I originally thought
That's cool, just be aware that all you're affecting is the time
between commits, not overall correctness.
Good call on the iterator not draining the queue, I'll fix that.
On Sun, Oct 9, 2016 at 12:22 PM, Srikanth wrote:
> I'll probably add this behavior. It's a good balance between not having to
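For reference, a sketch of the commit-after-output pattern under discussion,
using the spark-streaming-kafka-0-10 API; stream is assumed to be the direct
stream returned by KafkaUtils.createDirectStream:

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  // Capture offsets first, while the RDD is still the original KafkaRDD
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... write the results out here ...
  // Enqueue the commit only after output succeeds; commitAsync batches it
  // with a later poll, so duplicates between commits remain possible
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}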
The Spark SQL Row trait has a schema that is null by default. When the
schema is null, operations that rely on fieldIndex, such as
getAs[T](fieldName: String): T, do not work.
I noticed that when I convert a DataFrame to RDD[Row], the Row objects
do have schemas. Can I rely on this?
When can I b
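A small sketch of the difference, assuming a Spark 2.x session named spark;
the id/name columns are made up for illustration:

import org.apache.spark.sql.Row
import spark.implicits._

val bare = Row(1, "a")           // built by hand: bare.schema is null
// bare.getAs[String]("name")    // would throw; fieldIndex needs a schema

val df = Seq((1, "a")).toDF("id", "name")
val row = df.rdd.first()         // rows from a DataFrame carry its schema
row.getAs[String]("name")        // returns "a"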
Hi Ishizaki san,
Thanks for the reply.
So, when I pre-cache the dataframe, the cache is being used during the job
execution.
Actually there are 3 events:
1. call res.collect
2. job started
3. job completed
I am concerned about the longer time taken between the 1st and 2nd events.
It seems like the
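One hedged way to tell whether that gap is first-time caching work or
driver-side planning is to materialize the cache before timing (df stands
for the pre-cached DataFrame):

val res = df.cache()
res.count()                      // forces the cache to be populated now
val t0 = System.nanoTime()
res.collect()                    // any gap left before the job starts is
                                 // driver-side planning/serialization
println(s"elapsed: ${(System.nanoTime() - t0) / 1e6} ms")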
I tried spanBy, but it looks like there is a strange error happening no
matter which way I try it, like the one described here for the Java solution.
http://qaoverflow.com/question/how-to-use-spanby-in-java/
java.lang.ClassCastException: cannot assign instance of
scala.collection.immutable.List$SerializationProxy
You should probably check with DataStax, who build the Cassandra connector
for Spark.
On Sun, Oct 9, 2016 at 8:13 PM, kant kodali wrote:
>
> I tried spanBy, but it looks like there is a strange error happening no
> matter which way I try it, like the one described here for the Java solution.
>
> http://qaoverflow.com/question/how-to-use-spanby-in-java/
Hi Reynold,
Actually, I did that well before posting my question here.
Thanks, kant
On Sun, Oct 9, 2016 8:48 PM, Reynold Xin r...@databricks.com
wrote:
You should probably check with DataStax, who build the Cassandra connector for
Spark.
On Sun, Oct 9, 2016 at 8:13 PM, kant kodali wrote:
A value of a row can be accessed through generic access by ordinal, which
will incur boxing overhead for primitives, or through native primitive
access. An example of generic access by ordinal:
import org.apache.spark.sql._
val row = Row(1, true, "a string", null)
// row: Row = [1,true,a string,null]
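Continuing that snippet, the native primitive accessors skip the boxing; a
sketch against the row built above:

val firstValue = row.getInt(0)     // native primitive access, no boxing
val flag = row.getBoolean(1)
val str = row.getString(2)
val isNull = row.isNullAt(3)       // check before reading a nullable slot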
Hi all,
I tried to use the window function DataFrame API this weekend and found it
awkward to use, especially with respect to specifying frame boundaries. I
wrote down some options here and am curious about your thoughts. If you have
suggestions on the API beyond what's already listed in the JIRA ticket
Hi Reynold
Thanks for asking. I am from the SQL world and use Spark SQL with analytical
functions pretty heavily.
IMHO, Window.rowsBetween() as a function name looks fine. What I would
propose is:
Window.rowsBetween(startFrom=UNBOUNDED, endTo=CURRENT_ROW, preceding=0, following=0)
startFrom, en
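For comparison, a sketch of the frame-boundary API as it stands in the
DataFrame DSL, assuming Spark 2.1+'s named constants (earlier releases spell
them 0L and Long.MinValue); df and the account/tran_date/amount columns are
hypothetical:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// Running total per account: current row plus all preceding rows
val w = Window
  .partitionBy("account")
  .orderBy("tran_date")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

val withTotal = df.withColumn("running_total", sum("amount").over(w))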
You can use Kryo. StateMap also implements KryoSerializable, which is
supported by Kryo.
On Fri, Oct 7, 2016 at 11:39 AM, Joey Echeverria wrote:
> Looking at the source code for StateMap[1], which is used by
> JavaPairDStream#mapWithState(), it looks like state keys are
> serialized using an ObjectOutputStream
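A hedged sketch of switching the job to Kryo; SessionState here is a
hypothetical state class standing in for whatever your keys/values are:

import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

case class SessionState(count: Long, lastSeen: Long)  // hypothetical state type

val conf = new SparkConf()
  .set("spark.serializer", classOf[KryoSerializer].getName)
  // Registering classes keeps the serialized form compact by avoiding
  // full class names in the output
  .registerKryoClasses(Array(classOf[SessionState]))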
It seems the Spark version at runtime is different from the one you compiled
against. You should mark the Spark components "provided". See
https://issues.apache.org/jira/browse/SPARK-9219
On Sun, Oct 9, 2016 at 8:13 PM, kant kodali wrote:
>
> I tried spanBy, but it looks like there is a strange error happening no
> matter
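In sbt terms, the "provided" marking looks roughly like this (version
numbers are illustrative):

// build.sbt -- "provided" keeps Spark out of the assembly jar, so only the
// cluster's own Spark classes are on the classpath at runtime
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.1" % "provided"
)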