Hi Manoj,
Yes, you've already hit the point. I think the timestamp type support in the
in-memory columnar storage can be a good reference for you. Also, you may
want to enable compression support for the decimal type by adding the DECIMAL
column type to RunLengthEncoding.supports and DictionaryEncoding.supports.
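For concreteness, here is a hedged, approximate fragment of what those supports predicates look like in the Spark 1.2-era org.apache.spark.sql.columnar.compression code (not standalone code, and the exact list of cases may differ in your checkout):

override def supports(columnType: ColumnType[_, _]): Boolean = columnType match {
  // adding DECIMAL to this case, in both RunLengthEncoding and DictionaryEncoding,
  // is the kind of change being described above
  case INT | LONG | SHORT | BYTE | STRING | BOOLEAN => true
  case _ => false
}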
Hi,
I tried some quick and simple tests, and ISTM the vertices below were
cached correctly.
Could you point out the differences between my code and yours?
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib._
object Prog {
def processInt(d: Int) = d * 2
}
val g = GraphLoader.edge
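For comparison, here is a hedged, self-contained variant of that kind of test for the spark-shell; the edge-list path is an assumption, not the original poster's. GraphLoader.edgeListFile gives Int vertex attributes, so Prog.processInt above applies directly:

val g = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
val mapped = g.mapVertices((id, attr) => Prog.processInt(attr))
mapped.vertices.cache()
mapped.vertices.count()  // forces evaluation; the cached vertices should then appear in the web UI's Storage tab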
Applying a schema is a pretty low-level operation, and I would expect most
users to use the type-safe interfaces. If you are unsure you can always
run:
import org.apache.spark.sql.execution.debug._
schemaRDD.typeCheck()
and it will tell you if you have made any mistakes.
Michael
On Sat, Feb 1
I think Xuefeng Wu's suggestion is likely correct. This difference is more
likely explained by the compression library changing versions than by sort vs.
hash shuffle (which should not affect output size significantly). Others
have reported that switching to lz4 fixed their issue.
We should document th
Hi guys,
I deployed BlinkDB (built atop Shark) and am running Spark 0.9.
I tried to run several TPCDS shark queries taken from
https://github.com/cloudera/impala-tpcds-kit/tree/master/queries-sql92-modified/queries/shark.
However, the following exceptions were encountered. Do you have any idea why
t
I have seen the same behavior! I would love to hear an update on this...
Thanks,
Ami
On Thu, Feb 5, 2015 at 8:26 AM, Anubhav Srivastav <
anubhav.srivas...@gmail.com> wrote:
> Hi Kevin,
> We seem to be facing the same problem as well. Were you able to find
> anything after that? The ticket does not
I'm using Spark 1.1.0 and found that *ImmutableBytesWritable* can be
serialized by Kryo, but *Array[ImmutableBytesWritable]* can't be serialized
even when I register both of them with Kryo.
The code is as follows:
val conf = new SparkConf()
.setAppName("Hello Spark")
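For what it's worth, here is a hedged sketch of a typical Spark 1.1-style Kryo setup registering both the class and its Array form; MyRegistrator is a made-up name and the HBase import is assumed:

import com.esotericsoftware.kryo.Kryo
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  // Register the element class and the array class explicitly.
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[ImmutableBytesWritable])
    kryo.register(classOf[Array[ImmutableBytesWritable]])
  }
}

val conf = new SparkConf()
  .setAppName("Hello Spark")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[MyRegistrator].getName)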
That sounds right to me. Cheng could elaborate if you are missing something.
On Fri, Feb 13, 2015 at 11:36 AM, Manoj Samel
wrote:
> Thanks Michael for the pointer & Sorry for the delayed reply.
>
> Taking a quick inventory of scope of change - Is the column type for
> Decimal caching needed only
Hi there,
Is there a way to specify an AWS AMI with a particular OS (say Ubuntu) when
launching Spark on the Amazon cloud with the provided scripts?
What are the default AMI and operating system launched by the EC2 script?
Thanks
Are you using the SQLContext? I think the HiveContext is recommended.
Cheng Hao
From: Wush Wu [mailto:w...@bridgewell.com]
Sent: Thursday, February 12, 2015 2:24 PM
To: u...@spark.incubator.apache.org
Subject: Extract hour from Timestamp in Spark SQL
Dear all,
I am new to Spark SQL and have no
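A hedged example of Cheng Hao's suggestion above: with a HiveContext you get Hive's built-in UDFs, including hour(). The table and column names below are made up:

import org.apache.spark.sql.hive.HiveContext

val hiveCtx = new HiveContext(sc)
hiveCtx.sql("SELECT hour(event_time) AS h, count(*) FROM events GROUP BY hour(event_time)")
  .collect()
  .foreach(println)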
Hi,
You can use -a or --ami to launch the cluster with a specific
AMI.
If I remember correctly, the default system is Amazon Linux.
Hope it helps
Cheers
Gen
On Sun, Feb 15, 2015 at 6:20 AM, olegshirokikh wrote:
> Hi there,
>
> Is there a way to specify the AWS AMI with particular OS (say Ubun
I was looking at https://github.com/twitter/chill
It seems this would achieve what you want:
chill-scala/src/main/scala/com/twitter/chill/WrappedArraySerializer.scala
Cheers
On Sat, Feb 14, 2015 at 6:36 PM, Tao Xiao wrote:
> I'm using Spark 1.1.0 and find that *ImmutableBytesWritable* can be
>
Hi,
My Spark cluster contains a mix of machines: Pentium 4, dual-core, and quad-core.
I am trying to run a character frequency count application. The
application contains several threads, each submitting a job (action) that
counts the frequency of a single character. But my problem is that I get
dif
Hello,
I am a newbie to Spark and trying to figure out how to compute a percentile over a
big data set. I googled this topic but did not find any very useful code
example or explanation. It seems that I can use the sortByKey transformation to get my
data set in order, but I am not quite sure how I can ge
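Not from the original thread, but one hedged way to do this with the plain RDD API is to sort, index, and look up the position of the desired percentile; for very large data you may prefer sampling or a histogram-based approximation:

import org.apache.spark.SparkContext._  // pair-RDD functions such as lookup (pre-1.3)

val data = sc.parallelize(Seq(5.0, 1.0, 9.0, 3.0, 7.0))  // toy data
val indexed = data.sortBy(identity).zipWithIndex().map { case (v, i) => (i, v) }
val n = indexed.count()
val p = 0.95                                              // 95th percentile, for example
val pos = math.min(n - 1, math.ceil(p * (n - 1)).toLong)
val percentile = indexed.lookup(pos).head
println(s"~${p * 100}th percentile: $percentile")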
Dear Spark User List,
I'm fairly new to Spark, trying to use it for multi-dimensional clustering
(using the k-means clustering from MLlib). However, based on the examples,
the clustering seems to work only for a single dimension (KMeans.train()
accepts an RDD[Vector], which is a vector of doubles -
Clustering operates on a large number of n-dimensional vectors. That
seems to be what you are describing, and that is what the MLlib API
accepts. What are you expecting that you don't find?
Did you have a look at the KMeansModel that this method returns? It
has a "clusterCenters" method that gives
Hi,
HCatalog allows you to specify the path pattern for partitions, which
is used by dynamic partition loading.
https://cwiki.apache.org/confluence/display/Hive/HCatalog+DynamicPartitions#HCatalogDynamicPartitions-ExternalTables
Can we have a similar feature in Spark SQL?
Jira is here: h
I'd suggest updating your Spark to the latest version and trying Spark SQL
instead of Shark.
Thanks
Best Regards
On Sun, Feb 15, 2015 at 7:36 AM, Grandl Robert
wrote:
> Hi guys,
>
> I deployed BlinkDB(built atop Shark) and running Spark 0.9.
>
> I tried to run several TPCDS shark queries taken
Thanks Enno, let me have a look at the streaming parser version of Jackson.
Thanks
Best Regards
On Sat, Feb 14, 2015 at 9:30 PM, Enno Shioji wrote:
> Huh, that would come to 6.5ms per one JSON. That does feel like a lot but
> if your JSON file is big enough, I guess you could get that sort of
> proces
Hi,
If I have a table in the Hive metastore saved as Parquet and I want to use it
in Spark, it seems Spark will use Hive's Parquet SerDe to load the actual
data.
So is there any difference here? Will predicate pushdown, pruning and
future Parquet optimizations in Spark SQL work when going through the Hive SerDe?
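I have not verified this for every release, but in the Spark 1.2 era the relevant knobs looked roughly like the following; treat the config names as something to check against your version's docs, and the table name is made up:

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)
hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")  // use Spark's native Parquet path instead of the Hive SerDe
hc.setConf("spark.sql.parquet.filterPushdown", "true")        // enable Parquet predicate pushdown
hc.sql("SELECT * FROM my_parquet_table WHERE id > 100").collect()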
Hi,
I am running a biggish ALS job on Spark 1.2.0 on YARN (CDH 5.3.0). ALS is using about
3 billion ratings, and I am doing several trainImplicit() runs in a loop
within one Spark session. I have a four-node cluster with 3 TB of disk space on each.
Before starting the job, less than 8% of the disk
Hi Sean,
Thanks for the quick answer. I had not realized that I can make an
RDD[Vector] with e.g.
val dataSet = sparkContext.makeRDD(List(Vectors.dense(10.0,20.0),
Vectors.dense(20.0,30.0)))
Using this KMeans.train works as it should.
So my bad. Thanks again!
Attila
2015-02-15 17:29 GMT+01:00
What does your hive-site.xml look like? Do you actually have a directory
at the location shown in the error, i.e., does "/user/hive/warehouse/src"
exist? You should be able to override this by specifying the following:
--hiveconf
hive.metastore.warehouse.dir=/location/where/your/warehouse/exists
Thanks for the reply, Akhil. I cannot update the Spark version and run Spark SQL due
to some old dependencies and a specific project I want to run.
I was wondering if you have any clue why that exception might be triggered, or
if you have seen it before.
Thanks, Robert
On Sunday, February 15, 20
spark.cleaner.ttl ?
On Sunday, 15 February 2015, 18:23, Antony Mayi
wrote:
Hi,
I am running bigger ALS on spark 1.2.0 on yarn (cdh 5.3.0) - ALS is using about
3 billions of ratings and I am doing several trainImplicit() runs in loop
within one spark session. I have four node clus
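A hedged sketch of the spark.cleaner.ttl suggestion above: the setting (in seconds) periodically cleans out old metadata and shuffle data in long-running sessions. The value below is only an example, and note that anything older than the TTL, including persisted RDDs you still need, can be cleaned away:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("ALS loop")
  .set("spark.cleaner.ttl", "3600")  // example value; make it longer than any single run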
We want to monitor the Spark master and Spark slaves using monit, but we want to
use the sbin scripts to do so. The scripts create the Spark master and
slave processes independently of themselves, so monit would not know the
PID of the started process to watch. Is this correct? Should we watch the ports?
How
Hi,
I am new to Spark and planning on writing a machine learning application
with Spark MLlib. My dataset is in JSON format. Is it possible to load the data
into Spark without using any external JSON libraries? I have explored the
option of Spark SQL, but I believe that is only for interactive use or
lo
Hi,
In fact, you can use sqlCtx.jsonFile(), which loads a text file storing one
JSON object per line as a SchemaRDD.
Or you can use sc.textFile() to load the text file into an RDD and then use
sqlCtx.jsonRDD(), which loads an RDD storing one JSON object per string as a
SchemaRDD.
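For example (the path is made up):

import org.apache.spark.sql.SQLContext

val sqlCtx = new SQLContext(sc)

// Option 1: one JSON object per line, loaded directly as a SchemaRDD
val people = sqlCtx.jsonFile("hdfs:///data/people.json")

// Option 2: load the text yourself, then parse it as JSON
val people2 = sqlCtx.jsonRDD(sc.textFile("hdfs:///data/people.json"))

people.printSchema()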
Hope it helps.
Cheers
It works now using 1.2.1. Thanks for all the help. Spark rocks !!
-
Thanks,
Roy
I used the latest assembly jar and the below as suggested by Akhil to fix
this problem...
temp.saveAsHadoopFiles("DailyCSV", ".txt", String.class, String.class,
    (Class) TextOutputFormat.class);
Thanks All for the help !
On Wed, Feb 11, 2015 at 1:38 PM, Sean Owen wrote:
> That kinda dodges the
Hi,
I am sometimes getting this WARN while running a similarity calculation:
15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: 66435ms
exceeds 45000ms
Do I need to increase the default 45 s to larger values for cases wh
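If memory serves, the 45000 ms in that warning came from a setting along the lines of spark.storage.blockManagerSlaveTimeoutMs in Spark of that vintage; treat the name as an assumption and verify it against your version before relying on it. Also note that raising the timeout only hides the symptom if executors are genuinely stuck in long GC pauses:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.storage.blockManagerSlaveTimeoutMs", "120000")  // assumed config name; example value of 2 minutes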