>
> But it's really weird to be setting SPARK_HOME in the environment of
> your node managers. YARN shouldn't need to know about that.
> On Thu, Oct 4, 2018 at 10:22 AM Jianshi Huang
> wrote:
> >
> >
> https://github.com/apache/spark/blob/88e7e87bd5c052e10f52d4
so it does not get expanded by the shell).
Yes, that's right.
On Fri, Oct 5, 2018 at 3:35 AM Gourav Sengupta
wrote:
> Hi Marcelo,
> it would be great if you could illustrate what you mean; I would be
> interested to know.
>
> Hi Jianshi,
> so just to be sure you want to work on SPARK 2.3 while having SPARK 2.1
>
-installed 2.2.1 path.
I don't want to make any changes to the worker node configuration, so is
there any way to override the order?
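A hedged sketch of what I mean by overriding, assuming values set through
spark.yarn.appMasterEnv.* / spark.executorEnv.* take precedence over whatever
the node managers inject (the path here is just a placeholder):

    import org.apache.spark.SparkConf

    // Point SPARK_HOME inside the YARN containers at the distribution we ship,
    // instead of the cluster-installed one (placeholder path).
    val conf = new SparkConf()
      .set("spark.yarn.appMasterEnv.SPARK_HOME", "/path/to/spark-2.3")
      .set("spark.executorEnv.SPARK_HOME", "/path/to/spark-2.3")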
Jianshi
On Fri, Oct 5, 2018 at 12:11 AM Marcelo Vanzin wrote:
> Normally the version of Spark installed on the cluster does not
> matter, since Spark is uploade
'2g')
> ,('spark.executor.cores', '4')
> ,('spark.executor.instances', '2')
> ,('spark.executor.memory', '4g')
> ,('spark.network.timeout', '800')
> ,('spark.scheduler.mode
01:18:00.0,user_test,11)
(2016-10-19 01:17:00.0,2016-10-19 01:19:00.0,user_test,11)
(2016-10-19 01:18:00.0,2016-10-19 01:20:00.0,user_test,11)
(2016-10-19 01:19:00.0,2016-10-19 01:21:00.0,user_test,11)
(2016-10-19 01:20:00.0,2016-10-19 01:22:00.0,user_test,11)
Does anyone have an idea of the
Hmm... all files under the event log folder have permission 770, but strangely
my account cannot read other users' files. Permission denied.
I'll sort it out with our Hadoop admin. Thanks for the help!
Jianshi
On Thu, May 28, 2015 at 12:13 AM, Marcelo Vanzin
wrote:
> Then:
>
BTW, is there an option to set file permission for spark event logs?
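I'm not aware of a Spark option for it; here's a hedged workaround sketch that
relaxes permissions on already-written logs through the Hadoop FileSystem API
(the log directory path is a placeholder):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.fs.permission.FsPermission

    val fs = FileSystem.get(new Configuration())
    val logDir = new Path("/user/spark/applicationHistory")  // placeholder
    fs.listStatus(logDir).foreach { status =>
      // e.g. open up group/other read access on each event log file
      fs.setPermission(status.getPath, new FsPermission(Integer.parseInt("775", 8).toShort))
    }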
Jianshi
On Thu, May 28, 2015 at 11:25 AM, Jianshi Huang
wrote:
> Hmm... all files under the event log folder have permission 770 but
> strangely my account cannot read other users' files. Permission denied.
>
>
Yes, all written to the same directory on HDFS.
Jianshi
On Wed, May 27, 2015 at 11:57 PM, Marcelo Vanzin
wrote:
> You may be the only one not seeing all the logs. Are you sure all the
> users are writing to the same log directory? The HS can only read from a
> single log directory.
&
Is no one using the History Server? :)
Am I the only one who needs to see all users' logs?
Jianshi
On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang
wrote:
> Hi,
>
> I'm using Spark 1.4.0-rc1 and I'm using default settings for history
> server.
>
> But I can only see my own
Hi,
I'm using Spark 1.4.0-rc1 and I'm using default settings for history server.
But I can only see my own logs. Is it possible to view all users' logs? The
permission is fine for the user group.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
>= 2014-04-30))
PhysicalRDD [meta#143,nvar#145,date#147], MapPartitionsRDD[6] at
explain at <console>:32
Jianshi
On Tue, May 12, 2015 at 10:34 PM, Olivier Girardot
wrote:
> can you post the explain too ?
>
> Le mar. 12 mai 2015 à 12:11, Jianshi Huang a
> écrit :
>
>> Hi,
s like https://issues.apache.org/jira/browse/SPARK-5446 is still open,
when can we have it fixed? :)
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
I'm using the default settings.
Jianshi
On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva wrote:
> Hi,
>
> Can you please share your compression etc settings, which you are using.
>
> Thanks,
> Twinkle
>
> On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang
> wrot
I'm facing this error in Spark 1.3.1:
https://issues.apache.org/jira/browse/SPARK-4105
Does anyone know of a workaround? Changing the compression codec for
shuffle output?
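For reference, a hedged sketch of the codec change I have in mind (assuming
the failure is codec-related; "lzf" is just one alternative to the default):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.io.compression.codec", "lzf")   // instead of the default snappy
    // or, more drastically, turn shuffle compression off:
    // .set("spark.shuffle.compress", "false")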
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
:BINARY GZIP DO:0 FPO:265489723 SZ:240588/461522/1.92 VC:62173
ENC:PLAIN_DICTIONARY,RLE
Jianshi
On Mon, Apr 27, 2015 at 12:40 PM, Cheng Lian wrote:
> Had an offline discussion with Jianshi, the dataset was generated by Pig.
>
> Jianshi - Could you please attach the output of "
Hi Huai,
I'm using Spark 1.3.1.
You're right. The dataset is not generated by Spark. It's generated by Pig
using Parquet 1.6.0rc7 jars.
Let me see if I can send a testing dataset to you...
Jianshi
On Sat, Apr 25, 2015 at 2:22 AM, Yin Huai wrote:
> oh, I missed that. It
at
parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:126)
at
parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Oh, I found it out. Need to import sql.functions._
Then I can do
table.select(lit("2015-04-22").as("date"))
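For reference, a minimal end-to-end sketch of what worked (assuming a
SQLContext named sqlContext and a registered table named "table"; both names
are just placeholders):

    import org.apache.spark.sql.functions._

    val df = sqlContext.table("table")
    // add a constant column via lit() and alias it
    val withDate = df.select(lit("2015-04-22").as("date"))
    withDate.show()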
Jianshi
On Wed, Apr 22, 2015 at 7:27 PM, Jianshi Huang
wrote:
> Hi,
>
> I want to do this in Spark SQL DSL:
>
> select '2015-04-22
Hi,
I want to do this in Spark SQL DSL:
select '2015-04-22' as date
from table
How to do this?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Hi,
I want to write this in Spark SQL DSL:
select map('c1', c1, 'c2', c2) as m
from table
Is there a way?
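I don't know of a DSL function for this in the current release; a hedged
fallback sketch going through SQL on a HiveContext (map() is a Hive built-in),
assuming the table is registered as "table":

    // build a map column from existing columns via plain SQL
    val withMap = sqlContext.sql("select map('c1', c1, 'c2', c2) as m from table")
    // (later releases add org.apache.spark.sql.functions.map for the DSL form)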
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Thanks everyone for the reply.
Looks like foreachRDD + filtering is the way to go. I'll have 4 independent
Spark streaming applications so the overhead seems acceptable.
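For the record, a rough sketch of the filtering approach within a single
application (the (topic, message) pairing, the function name, and the topic
names are my own placeholders):

    import org.apache.spark.streaming.dstream.DStream

    // derive one filtered DStream per topic from a unified input stream
    def splitByTopic(input: DStream[(String, String)],
                     topics: Seq[String]): Map[String, DStream[(String, String)]] =
      topics.map(t => t -> input.filter { case (topic, _) => topic == t }).toMap

    // val streams = splitByTopic(unifiedStream, Seq("topicA", "topicB", "topicC", "topicD"))
    // streams("topicA").foreachRDD { rdd => ... }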
Jianshi
On Fri, Apr 17, 2015 at 5:17 PM, Evo Eftimov wrote:
> Good use of analogies :)
>
>
>
> Yep fri
ne DStream
-> multiple DStreams)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Hi,
Anyone has similar request?
https://issues.apache.org/jira/browse/SPARK-6561
When we save a DataFrame into Parquet files, we also want to have it
partitioned.
The proposed API looks like this:
def saveAsParquet(path: String, partitionColumns: Seq[String])
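For comparison, a hedged sketch of how the proposed method could be written
on top of the DataFrameWriter API (my recollection is that write.partitionBy
arrived around Spark 1.4; the path and column names in the usage comment are
placeholders):

    import org.apache.spark.sql.DataFrame

    def saveAsParquet(df: DataFrame, path: String, partitionColumns: Seq[String]): Unit =
      df.write.partitionBy(partitionColumns: _*).parquet(path)

    // e.g. saveAsParquet(df, "/path/to/output", Seq("date", "country"))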
--
Jianshi Huang
LinkedIn
Oh, by default it's set to 0L.
I'll try setting it to 3 immediately. Thanks for the help!
Jianshi
On Mon, Mar 16, 2015 at 11:32 PM, Jianshi Huang
wrote:
> Thanks Shixiong!
>
> Very strange that our tasks were retried on the same executor again and
Thanks Shixiong!
Very strange that our tasks were retried on the same executor again and
again. I'll check spark.scheduler.executorTaskBlacklistTime.
Jianshi
On Mon, Mar 16, 2015 at 6:02 PM, Shixiong Zhu wrote:
> There are 2 cases for "No space left on device":
>
>
I created a JIRA: https://issues.apache.org/jira/browse/SPARK-6353
On Mon, Mar 16, 2015 at 5:36 PM, Jianshi Huang
wrote:
> Hi,
>
> We're facing "No space left on device" errors lately from time to time.
> The job will fail after retries. Obviously in such a case, retry w
he problematic datanode before retrying it.
And maybe dynamically allocate another datanode if dynamic allocation is
enabled.
I think there needs to be a class of fatal errors that can't be recovered
with retries. And it's best if Spark can handle them nicely.
Thanks,
--
Jianshi Huang
LinkedIn:
Liancheng also found out that the Spark jars are not included in the
classpath of URLClassLoader.
Hmm... we're very close to the truth now.
Jianshi
On Fri, Mar 13, 2015 at 6:03 PM, Jianshi Huang
wrote:
> I'm almost certain the problem is the ClassLoader.
>
> So adding
I'm almost certain the problem is the ClassLoader.
So adding
fork := true
solves problems for test and run.
The problem is: how can I fork a JVM for the sbt console? fork in console :=
true seems not to work...
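For context, the relevant lines in my build.sbt look roughly like this
(sbt 0.13-style keys; a sketch, not a full build file):

    fork := true          // forking takes effect for run
    fork in Test := true  // and for test

    // As far as I can tell, the console task always runs inside sbt's own JVM,
    // so "fork in console := true" has no effect; a small REPL-like main
    // launched via "run", or a plain spark-shell, avoids the classloader issue.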
Jianshi
On Fri, Mar 13, 2015 at 4:35 PM, Jianshi Huang
wrote:
> I gues
I guess it's a ClassLoader issue. But I have no idea how to debug it. Any
hints?
Jianshi
On Fri, Mar 13, 2015 at 3:00 PM, Eric Charles wrote:
> i have the same issue running spark sql code from eclipse workspace. If
> you run your code from the command line (with a packaged j
Forget about my last message. I was confused. Spark 1.2.1 + Scala 2.10.4
started by SBT console command also failed with this error. However running
from a standard spark shell works.
Jianshi
On Fri, Mar 13, 2015 at 2:46 PM, Jianshi Huang
wrote:
> Hmm... look like the console command st
Hmm... looks like the console command still starts Spark 1.3.0 with Scala
2.11.6 even though I changed them in build.sbt.
So the test with 1.2.1 is not valid.
Jianshi
On Fri, Mar 13, 2015 at 2:34 PM, Jianshi Huang
wrote:
> I've confirmed it only failed in console started by SBT.
>
>
intln("Created spark context as sc.")
[info]
[info] def time[T](f: => T): T = {
[info] import System.{currentTimeMillis => now}
[info] val start = now
[info] try { f } finally { println("Elapsed: " + (now - start)/1000.0 + " s") }
[info] }
[info]
[info]
BTW, I was running tests from SBT when I got the errors. One test turns a
Seq of case classes into a DataFrame.
I also tried running similar code in the console, but it failed with the same error.
I tested both Spark 1.3.0-rc2 and 1.2.1 with Scala 2.11.6 and 2.10.4
Any idea?
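For reference, a stripped-down sketch of what the failing test does (the
Person case class and the local master setting are placeholders of mine):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)

    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("toDF-repro"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // This is the line that triggers the ScalaReflectionException under SBT for me.
    val df = Seq(Person("a", 1), Person("b", 2)).toDF()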
Jianshi
On Fri, Mar 13, 2015 at 2
Same issue here. But the classloader in my exception is somehow different.
scala.ScalaReflectionException: class
org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with
java.net.URLClassLoader@53298398 of type class java.net.URLClassLoader with
classpath
Jianshi
On Sun, Mar 1, 2015 at
Thanks Sean. I'll ask our Hadoop admin.
Actually I didn't find hadoop.tmp.dir in the Hadoop settings... using the user
home was suggested by other users.
Jianshi
On Wed, Mar 11, 2015 at 3:51 PM, Sean Owen wrote:
> You shouldn't use /tmp, but it doesn't mean you should use
Unfortunately /tmp mount is really small in our environment. I need to
provide a per-user setting as the default value.
I hacked bin/spark-class for the similar effect. And spark-defaults.conf
can override it. :)
Jianshi
On Wed, Mar 11, 2015 at 3:28 PM, Patrick Wendell wrote:
> We do
Hi,
I need to set per-user spark.local.dir, how can I do that?
I tried both
/x/home/${user.name}/spark/tmp
and
/x/home/${USER}/spark/tmp
And neither worked. Looks like it has to be a constant setting in
spark-defaults.conf. Right?
Any ideas how to do that?
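What I'd like to express, roughly, if it could be set programmatically
instead of in spark-defaults.conf (a hedged sketch; the path is just our
environment's convention):

    import org.apache.spark.SparkConf

    // Expand the user name in application code rather than in the static conf file.
    val conf = new SparkConf()
      .set("spark.local.dir", s"/x/home/${sys.props("user.name")}/spark/tmp")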
Thanks,
--
Jianshi Huang
Thanks. I was about to submit a ticket for this :)
Also there's a ticket for sort-merge based groupbykey
https://issues.apache.org/jira/browse/SPARK-3461
BTW, any idea why the run with netty didn't output OOM error messages? It's
very confusing when troubleshooting.
Jianshi
On Thu, M
There's some skew; for example, these two tasks:
646 / 1640  SUCCESS  PROCESS_LOCAL  200 /  2015/03/04 23:45:47  1.1 min  6 s  198.6 MB  21.1 GB  240.8 MB
596 / 1590  SUCCESS  PROCESS_LOCAL  30 /  2015/03/04 23:45:47  44 s  5 s  200.7 MB  4.8 GB  154.0 MB
But I expect this kind of skewness to be quite common.
Jianshi
On Thu, Mar 5, 2015 at 3:
I see. I'm using core's join. The data might have some skewness (checking).
I understand shuffle can spill data to disk, but when consuming it, say in
cogroup or groupByKey, it still needs to read all the elements of a group,
right? I guess the OOM happened there when reading very large groups
's still in open status. Does that mean
groupByKey/cogroup/join (implemented as cogroup + flatMap) will still read
the whole group when consuming it?
How can I deal with key skewness in joins? Is there a skew-join
implementation?
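In case it helps the discussion, a hedged sketch of a hand-rolled skew join
by salting keys (the salt count and key/value types are arbitrary here; the
skew-free side is the one that gets replicated):

    import scala.reflect.ClassTag
    import scala.util.Random
    import org.apache.spark.SparkContext._   // pair-RDD functions
    import org.apache.spark.rdd.RDD

    def saltedJoin[V: ClassTag, W: ClassTag](skewed: RDD[(String, V)],
                                             other: RDD[(String, W)],
                                             numSalts: Int): RDD[(String, (V, W))] = {
      // each skewed record gets one random salt; the other side is replicated over all salts
      val saltedLeft  = skewed.map { case (k, v) => ((k, Random.nextInt(numSalts)), v) }
      val saltedRight = other.flatMap { case (k, w) => (0 until numSalts).map(i => ((k, i), w)) }
      saltedLeft.join(saltedRight).map { case ((k, _), vw) => (k, vw) }
    }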
Jianshi
On Thu, Mar 5, 2015 at 2:44 PM, Shao, Saisai
One really interesting thing is that when I'm using the
netty-based spark.shuffle.blockTransferService, there are no OOM error
messages (java.lang.OutOfMemoryError: Java heap space).
Any idea why they don't show up?
I'm using Spark 1.2.1.
Jianshi
On Thu, Mar 5, 2015 at 1:56 PM, Jiansh
Checkpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
Is join/cogroup still memory bound?
Jianshi
On Wed, Mar 4, 2015
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
And I checked the executor on container host-; everything looks good.
Jianshi
On Wed, Mar 4, 2015 at 12:28 PM, Aaron
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:724)
Jianshi
On Wed, Mar 4, 2015 at 3:25 AM, Aaron Davidson wrote:
> "Failed to connect" implies that the executor at that host died, please
> check
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
Jianshi
On Wed, Mar 4, 2015 at 2:55 AM, Jianshi Huang
wrote:
> Hi,
>
> I got this error message:
>
&
SNAPSHOT I built around Dec. 20. Are there any
bug fixes related to shuffle block fetching or index files after that?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
serde?
Loading tables using parquetFile vs. loading tables from Hive metastore
with Parquet serde
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
: https://issues.apache.org/jira/browse/SPARK-5828
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Get it. Thanks Reynold and Andrew!
Jianshi
On Thu, Feb 12, 2015 at 12:25 AM, Andrew Or wrote:
> Hi Jianshi,
>
> For YARN, there may be an issue with how a recent patch changes the
> accessibility of the shuffle files by the external shuffle service:
> https://issues.apache.
, 1.3.0)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Hi,
Has anyone implemented the default Pig Loader in Spark? (loading delimited
text files with .pig_schema)
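Nothing built-in that I know of; here's the rough shape of what I'd try,
skipping the .pig_schema file and hard-coding a tab delimiter and schema
(the field names, path, and the Spark 1.3+ createDataFrame API are my
assumptions):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructField, StructType, StringType}

    // hand-written schema instead of parsing .pig_schema
    val schema = StructType(Seq(
      StructField("user", StringType),
      StructField("event", StringType)))

    val rows = sc.textFile("/path/to/pig/output")   // placeholder path
      .map(_.split("\t", -1))                       // keep trailing empty fields
      .map(fields => Row(fields: _*))

    val df = sqlContext.createDataFrame(rows, schema)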
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Ah, thx Ted and Yin!
I'll build a new version. :)
Jianshi
On Wed, Jan 14, 2015 at 7:24 AM, Yin Huai wrote:
> Yeah, it's a bug. It has been fixed by
> https://issues.apache.org/jira/browse/SPARK-3891 in master.
>
> On Tue, Jan 13, 2015 at 2:41 PM, Ted Yu wrote:
>
>
org.apache.spark.sql.catalyst.plans.logical.Aggregate$$anonfun$output$6.apply(basicOperators.scala:143)
I'm using the latest branch-1.2.
I found in a PR that percentile and percentile_approx are supported. A bug?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
FYI,
Latest hive 0.14/parquet will have column renaming support.
Jianshi
On Wed, Dec 10, 2014 at 3:37 AM, Michael Armbrust
wrote:
> You might also try out the recently added support for views.
>
> On Mon, Dec 8, 2014 at 9:31 PM, Jianshi Huang
> wrote:
>
>> Ah... I see. T
Ah... I see. Thanks for pointing it out.
Then it means we cannot mount an external table using customized column names.
Hmm...
Then the only option left is to use a subquery to add a bunch of column
aliases. I'll try it later.
Thanks,
Jianshi
On Tue, Dec 9, 2014 at 3:34 AM, Michael Armbrust
Hi Huai,
Exactly, I'll probably implement one using the new data source API when I
have time... I've found the utility functions in JsonRDD.
Jianshi
On Tue, Dec 9, 2014 at 3:41 AM, Yin Huai wrote:
> Hello Jianshi,
>
> You meant you want to convert a Map to a Struct, ri
I checked the source code for inferSchema. Looks like this is exactly what
I want:
val allKeys = rdd.map(allKeysWithValueTypes).reduce(_ ++ _)
Then I can do createSchema(allKeys).
Jianshi
On Sun, Dec 7, 2014 at 2:50 PM, Jianshi Huang
wrote:
> Hmm..
>
> I've created
Hmm..
I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782
Jianshi
On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang
wrote:
> Hi,
>
> What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
>
> I'm currently converting ea
Hi,
What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
I'm currently converting each Map to a JSON String and doing
JsonRDD.inferSchema.
How about adding inferSchema support to Map[String, Any] directly? It would
be very useful.
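Roughly what the current workaround looks like, as a hedged sketch (assumes
flat maps of simple values; JSONObject lives in the standard library on
Scala 2.10 and in scala-parser-combinators on 2.11):

    import scala.util.parsing.json.JSONObject

    val maps = sc.parallelize(Seq(
      Map[String, Any]("name" -> "a", "n" -> 1),
      Map[String, Any]("name" -> "b", "n" -> 2)))

    val json = maps.map(m => JSONObject(m).toString())
    val schemaRDD = sqlContext.jsonRDD(json)   // infers the schema from the JSON strings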
Thanks,
--
Jianshi Huang
LinkedI
a> sql("select cre_ts from pmt limit 1").collect
res16: Array[org.apache.spark.sql.Row] = Array([null])
I created a JIRA for it:
https://issues.apache.org/jira/browse/SPARK-4781
Jianshi
On Sun, Dec 7, 2014 at 1:06 AM, Jianshi Huang
wrote:
> Hmm... another issue I found
Hmm... another issue I found with this approach is that ANALYZE TABLE ...
COMPUTE STATISTICS will fail to attach the metadata to the table, and later
broadcast joins and such will fail...
Any idea how to fix this issue?
Jianshi
On Sat, Dec 6, 2014 at 9:10 PM, Jianshi Huang
wrote:
> V
Very interesting: the line doing the drop table throws an exception. After
removing it, everything works.
Jianshi
On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang
wrote:
> Here's the solution I got after talking with Liancheng:
>
> 1) using backquote `..` to wrap up all illegal characte
to drop and register the table:
val t = table(name)
val newSchema = StructType(t.schema.fields.map(s =>
  s.copy(name = s.name.replaceAll(".*?::", ""))))
sql(s"drop table $name")
applySchema(t, newSchema).registerTempTable(name)
I'm testing it for now.
Tha
create external table pmt (
sorted::id bigint
)
stored as parquet
location '...'
Obviously it didn't work; I also tried removing the sorted:: identifier,
but the resulting rows contained only nulls.
Any idea how to create a table in HiveContext from these Parquet files?
Thanks,
xception in the logs, but that exception does not propagate to user code.
>>
>> On Thu, Dec 4, 2014 at 11:31 PM, Jianshi Huang
>> wrote:
>>
>> > Hi,
>> >
>> > I got exception saying Hive: NoSuchObjectException(message: table
>> > not found)
With Liancheng's suggestion, I've tried setting
spark.sql.hive.convertMetastoreParquet to false,
but ANALYZE ... NOSCAN still returns -1 in rawDataSize.
Jianshi
On Fri, Dec 5, 2014 at 3:33 PM, Jianshi Huang
wrote:
> If I run ANALYZE without NOSCAN, then Hive can successfully
If I run ANALYZE without NOSCAN, then Hive can successfully get the size:
parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417764589,
COLUMN_STATS_ACCURATE=true, totalSize=0, numRows=1156, rawDataSize=76296}
Is Hive's PARQUET support broken?
Jianshi
On Fri, Dec 5, 2014 at 3:
Hi,
I got an exception saying Hive: NoSuchObjectException(message: table
not found)
when running "DROP TABLE IF EXISTS "
Looks like a new regression in the Hive module.
Can anyone confirm this?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
E.g.
CREATE EXTERNAL TABLE table1 (
code int,
desc string
)
STORED AS PARQUET
LOCATION '/user/jianshuang/data/dim_tables/table1.parquet'
Does anyone know what went wrong?
Thanks,
Jianshi
On Fri, Nov 28, 2014 at 1:24 PM, Cheng, Hao wrote:
> Hi Jianshi,
>
> I couldn’t repr
I created a ticket for this:
https://issues.apache.org/jira/browse/SPARK-4757
Jianshi
On Fri, Dec 5, 2014 at 1:31 PM, Jianshi Huang
wrote:
> Correction:
>
> According to Liancheng, this hotfix might be the root cause:
>
>
> https://github.com/a
Correction:
According to Liancheng, this hotfix might be the root cause:
https://github.com/apache/spark/commit/38cb2c3a36a5c9ead4494cbc3dde008c2f0698ce
Jianshi
On Fri, Dec 5, 2014 at 12:45 PM, Jianshi Huang
wrote:
> Looks like the datanucleus*.jar shouldn't appear in the hdfs
Looks like the datanucleus*.jar shouldn't appear in the hdfs path in
Yarn-client mode.
Maybe this patch broke yarn-client.
https://github.com/apache/spark/commit/a975dc32799bb8a14f9e1c76defaaa7cfbaf8b53
Jianshi
On Fri, Dec 5, 2014 at 12:02 PM, Jianshi Huang
wrote:
> Act
Actually my HADOOP_CLASSPATH has already been set to include
/etc/hadoop/conf/*
export
HADOOP_CLASSPATH=/etc/hbase/conf/hbase-site.xml:/usr/lib/hbase/lib/hbase-protocol.jar:$(hbase
classpath)
Jianshi
On Fri, Dec 5, 2014 at 11:54 AM, Jianshi Huang
wrote:
> Looks like somehow Spark failed
SPATH?
Jianshi
On Fri, Dec 5, 2014 at 11:37 AM, Jianshi Huang
wrote:
> I got the following error during Spark startup (Yarn-client mode):
>
> 14/12/04 19:33:58 INFO Client: Uploading resource
> file:/x/home/jianshuang/spark/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar
> -&g
ter HEAD yesterday. Is this a bug?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Hi Hao,
I'm using an inner join, as broadcast join didn't work for left joins (thanks
for the links to the latest improvements).
And I'm using HiveContext, and it worked in a previous build (10/12) when
joining 15 dimension tables.
Jianshi
On Thu, Nov 27, 2014 at 8:35 AM, Cheng, Hao
se has met similar situation?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
/usr/lib/hive/lib doesn’t show any of the parquet
jars, but ls /usr/lib/impala/lib shows the jar we’re looking for as
parquet-hive-1.0.jar
Was it removed from the latest Spark?
Jianshi
On Wed, Nov 26, 2014 at 2:13 PM, Jianshi Huang
wrote:
> Hi,
>
> Looks like the latest SparkSQL with Hive 0
)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
Using the same DDL and Analyze script above.
Jianshi
On Sat, Oct 11, 2014 at 2:18 PM, Jianshi Huang
wrote:
> It works fine, thanks for the help Michael.
>
> Liancheng also told m
Ah yes. I found it too in the manual. Thanks for the help anyway!
Since BigDecimal is just a wrapper around BigInt, let's also convert
BigInt to Decimal.
I created a ticket. https://issues.apache.org/jira/browse/SPARK-4549
Jianshi
On Fri, Nov 21, 2014 at 11:30 PM, Yin Huai wrote:
&g
Hi,
I got an error during rdd.registerTempTable(...) saying scala.MatchError:
scala.BigInt.
Looks like BigInt cannot be used in a SchemaRDD, is that correct?
So what would you recommend for dealing with it?
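The workaround I'd probably try, as a hedged sketch: map the BigInt field to
a supported type (BigDecimal, or Long if the values fit) before registering
(Txn/TxnRow and the sample data are made-up names):

    case class Txn(id: String, amount: BigInt)
    case class TxnRow(id: String, amount: BigDecimal)

    val txns = sc.parallelize(Seq(Txn("t1", BigInt(123)), Txn("t2", BigInt(456))))
    // convert BigInt -> BigDecimal so schema inference can handle it
    val rows = txns.map(t => TxnRow(t.id, BigDecimal(t.amount)))
    // rows.registerTempTable("txn")   // via the createSchemaRDD implicit in 1.x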
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog:
ption saying that SparkContext is not serializable,
which is totally irrelevant to txnSentTo.
I heard that in Scala 2.11 there will be much better REPL support to solve
this issue. Is that true?
Could anyone explain why we're having this problem?
Thanks,
--
Jianshi Huang
LinkedIn: jians
Ok, I'll wait until -Pscala-2.11 is more stable and used by more people.
Thanks for the help!
Jianshi
On Tue, Nov 18, 2014 at 3:49 PM, Ye Xianjin wrote:
> Hi Prashant Sharma,
>
> It's not even ok to build with scala-2.11 profile on my machine.
>
>
Any notable issues for using Scala 2.11? Is it stable now?
Or can I use Scala 2.11 in my Spark application with a Spark dist built
with 2.10?
I'm looking forward to migrating to 2.11 for some quasiquote features. I
couldn't make it run in 2.10...
Cheers,
--
Jianshi Huang
LinkedI
I see. I agree that lazy eval is not suitable for proper setup and teardown.
We also abandoned it due to the inherent incompatibility between implicit and
lazy. It was fun to come up with this trick though.
Jianshi
On Tue, Nov 18, 2014 at 10:28 AM, Tobias Pfeiffer wrote:
> Hi,
>
> On Fri, Nov
wrap: scala.reflect.internal.MissingRequirementError: object scala.runtime in
compiler mirror not found. -> [Help 1]
Does anyone know what the problem is?
I'm building it on OSX. I didn't have this problem a month ago.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshu
Ok, then we need another trick.
Let's have an *implicit lazy var connection/context* around our code, and
setup() will trigger the eval and initialization.
The implicit lazy val/var trick was actually invented by Kevin. :)
Jianshi
On Fri, Nov 14, 2014 at 1:41 PM, Cheng Lian wrote:
So can I write it like this?
rdd.mapPartitions { i => setup(); i }.map(...).mapPartitions { i => cleanup(); i }
So I don't need to mess up the logic and can still use map, filter and
other transformations on the RDD.
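And the variant I'd probably end up with, as a hedged sketch: fold setup and
cleanup into a single mapPartitions so the resource lives and dies with the
partition (openConnection/process are hypothetical stand-ins, and an
RDD[String] input is assumed):

    import java.sql.Connection
    import org.apache.spark.rdd.RDD

    def openConnection(): Connection = ???                      // hypothetical setup
    def process(conn: Connection, line: String): String = ???   // hypothetical per-record work

    def withConnection(lines: RDD[String]): RDD[String] =
      lines.mapPartitions { iter =>
        val conn = openConnection()                        // once per partition, on the worker
        try iter.map(process(conn, _)).toVector.iterator   // materialize before closing
        finally conn.close()                               // cleanup, even on failure
      }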
Jianshi
On Fri, Nov 14, 2014 at 12:20 PM, Cheng Lian wrote:
> If you’
Where do you want the setup and cleanup functions to run? Driver or the
worker nodes?
Jianshi
On Fri, Nov 14, 2014 at 10:44 AM, Dai, Kevin wrote:
> HI, all
>
>
>
> Are there setup and cleanup functions in Spark, as in Hadoop MapReduce, which
> do some initializatio
needs to be
collected to the driver; is there a way to avoid doing this?
Thanks
Jianshi
On Mon, Oct 27, 2014 at 4:57 PM, Jianshi Huang
wrote:
> Sure, let's still focus on the streaming simulation use case. It's a very
> useful problem to solve.
>
> If we're going to use th
Hi Srinivas,
Here are the versions I'm using:
1.2.0-SNAPSHOT
1.3.2
1.3.0
org.spark-project.akka
2.3.4-spark
I'm using Spark built from master, so it's 1.2.0-SNAPSHOT.
Jianshi
On Tue, Nov 11, 2014 at 4:06 AM, Srinivas Chamart
Hi Preshant, Chester, Mohammed,
I switched to Spark's Akka and now it works well. Thanks for the help!
(Need to exclude Akka from Spray dependencies, or specify it as provided)
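For anyone hitting the same thing, the build.sbt shape that worked for me,
roughly (artifact versions and coordinates here are illustrative):

    libraryDependencies ++= Seq(
      // drop Spray's own Akka and rely on Spark's repackaged one
      "io.spray" %% "spray-client" % "1.3.2" exclude("com.typesafe.akka", "akka-actor"),
      "org.spark-project.akka" %% "akka-actor" % "2.3.4-spark",
      "org.apache.spark" %% "spark-core" % "1.2.0-SNAPSHOT" % "provided"
    )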
Jianshi
On Thu, Oct 30, 2014 at 3:17 AM, Mohammed Guller
wrote:
> I am not sure about that.
>
>
&
I'm using Spark built from HEAD; I think it uses a modified Akka 2.3.4, right?
Jianshi
On Wed, Oct 29, 2014 at 5:53 AM, Mohammed Guller
wrote:
> Try a version built with Akka 2.2.x
>
>
>
> Mohammed
>
>
>
> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.co
org.spark-project.akka
2.3.4-spark
it should solve the problem. Makes sense? I'll give it a shot when I have time;
for now I'll probably just not use the Spray client...
Cheers,
Jianshi
On Tue, Oct 28, 2014 at 6:02 PM, Jianshi Huang
wrote:
> Hi,
>
> I got the following exc
e exception.
Does anyone have an idea what went wrong? Need help!
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Thanks Ted and Cheng for the in-memory Derby solution. I'll check it out. :)
And to me, using in-memory by default makes sense; if a user wants a shared
metastore, it needs to be specified. An 'embedded' local metastore in the
working directory barely has a use case.
Jianshi
On Mo
Any suggestion? :)
Jianshi
On Thu, Oct 23, 2014 at 3:49 PM, Jianshi Huang
wrote:
> The Kafka stream has 10 topics and the data rate is quite high (~ 100K/s
> per topic).
>
> Which configuration do you recommend?
> - 1 Spark app consuming all Kafka topics
> - 10 separ