Hi,
I'm using Spark 1.4.0-rc1 with the default settings for the history server.
But I can only see my own logs. Is it possible to view all users' logs? The
permissions are fine for the user group.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Is no one using the history server? :)
Am I the only one who needs to see all users' logs?
Jianshi
On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang
wrote:
> Hi,
>
> I'm using Spark 1.4.0-rc1 with the default settings for the history
> server.
>
> But I can only see my own
>
> On Wed, May 27, 2015 at 5:33 AM, Jianshi Huang
> wrote:
>
>> Is no one using the history server? :)
>>
>> Am I the only one who needs to see all users' logs?
>>
>> Jianshi
>>
>> On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang
>> wrote:
>&
BTW, is there an option to set file permissions for Spark event logs?
Jianshi
On Thu, May 28, 2015 at 11:25 AM, Jianshi Huang
wrote:
> Hmm... all files under the event log folder have permission 770, but
> strangely my account cannot read other users' files. Permission denied.
>
>
> - Are all files readable by the user running the history server?
> - Did all applications call sc.stop() correctly (i.e. files do not have
> the ".inprogress" suffix)?
>
> Other than that, always look at the logs first, looking for any errors
> that may be thrown.
>
>
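For reference (not from the thread): the knobs involved are the per-application view ACLs, which get recorded into the event logs, plus spark.history.ui.acls.enable on the history server side. A minimal sketch, assuming the colleague user names and the event log location:

import org.apache.spark.{SparkConf, SparkContext}

// Application side: record who may view this app's UI and history.
// (The user names and the event log dir below are assumptions.)
val conf = new SparkConf()
  .setAppName("acl-example")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///spark-history")
  .set("spark.acls.enable", "true")
  .set("spark.ui.view.acls", "jianshi,teammate1,teammate2")
val sc = new SparkContext(conf)

// History server side (spark-defaults.conf of the history server):
//   spark.history.ui.acls.enable  true
// Note this governs UI authorization only; the HDFS permissions on the event
// log files themselves (the 770 issue above) still have to allow the user
// running the history server to read them.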
> ', 'FAIR')
> ,('spark.shuffle.service.enabled', 'true')
> ,('spark.dynamicAllocation.enabled', 'true')
> ])
> py_files =
> ['hdfs://emr-header-1.cluster-68492:9000/lib/py4j-0.10.7-src.zip']
> sc = pyspark.SparkContext(appName="Jianshi", master="yarn-client",
> conf=sparkConf, pyFiles=py_files)
>
>
Thanks,
--
Jianshi Huang
d from your gateway machine to YARN by
> default.
>
> You probably have some configuration (in spark-defaults.conf) that
> tells YARN to use a cached copy. Get rid of that configuration, and
> you can use whatever version you like.
> On Thu, Oct 4, 2018 at 2:19 AM Jianshi Huang
wrote:
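The configuration being referred to is not quoted in the snippet; the usual suspects are spark.yarn.jars and spark.yarn.archive (an assumption on my part). A small diagnostic sketch, run inside spark-shell or a submitted app, to see what the application actually picks up:

import org.apache.spark.SparkConf

// Print the cached-jar settings the application sees (empty output means YARN
// will upload the jars from the gateway machine as described above).
val conf = new SparkConf()
Seq("spark.yarn.jars", "spark.yarn.archive").foreach { key =>
  println(s"$key = ${conf.getOption(key).getOrElse("<not set>")}")
}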
d to be setting SPARK_HOME in the environment of
>> your node managers. YARN shouldn't need to know about that.
>> On Thu, Oct 4, 2018 at 10:22 AM Jianshi Huang
>> wrote:
>> >
>> >
>> https://github.com/apache/spark/blob/88e7e87bd5c052e10f52d4bb97a9d
so it does not get
> expanded by the shell).
>
> But it's really weird to be setting SPARK_HOME in the environment of
> your node managers. YARN shouldn't need to know about that.
> On Thu, Oct 4, 2018 at 10:22 AM Jianshi Huang
> wrote:
> >
> >
> https://github.c
Hi,
Has anyone implemented the default Pig Loader in Spark? (loading delimited
text files with .pig_schema)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
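I'm not aware of a stock loader for this; below is a minimal sketch of reading PigStorage-style tab-delimited text with a hand-written schema, assuming sc and sqlContext as provided by spark-shell. It does not parse the .pig_schema file itself, and the field names, types and path are assumptions:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Assumed schema; a complete solution would derive this from the JSON in .pig_schema.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("count", LongType, nullable = true)))

val rows = sc.textFile("hdfs:///data/pig_output/part-*")   // illustrative path
  .map(_.split("\t", -1))                                  // PigStorage's default delimiter
  .map(f => Row(f(0), f(1).toLong))

val df = sqlContext.createDataFrame(rows, schema)          // Spark 1.3+ API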
, 1.3.0)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Reynold Xin:
>
> I think we made the binary protocol compatible across all versions, so you
>> should be fine with using any one of them. 1.2.1 is probably the best since
>> it is the most recent stable release.
>>
>> On Tue, Feb 10, 2015 at 8:43 PM, Jianshi Huang
: https://issues.apache.org/jira/browse/SPARK-5828
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
serde?
Loading tables using parquetFile vs. loading tables from Hive metastore
with Parquet serde
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
SNAPSHOT I built around Dec. 20. Is there any
bug fixes related to shuffle block fetching or index files after that?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
Jianshi
On Wed, Mar 4, 2015 at 2:55 AM, Jianshi Huang
wrote:
> Hi,
>
> I got this error message:
>
its logs as well.
>
> On Tue, Mar 3, 2015 at 11:03 AM, Jianshi Huang
> wrote:
>
>> Sorry that I forgot the subject.
>>
>> And in the driver, I got many FetchFailedException. The error messages are
>>
>> 15/03/03 10:34:32 WARN TaskSetManager: Lost task 31.0 in
Davidson wrote:
> Drat! That doesn't help. Could you scan from the top to see if there were
> any fatal errors preceding these? Sometimes an OOM will cause this type of
> issue further down.
>
> On Tue, Mar 3, 2015 at 8:16 PM, Jianshi Huang
> wrote:
>
>> The failed
at 2:11 PM, Jianshi Huang
wrote:
> Hmm... ok, previous errors are still block fetch errors.
>
> 15/03/03 10:22:40 ERROR RetryingBlockFetcher: Exception while beginning
> fetch of 11 outstanding blocks
> java.io.IOException: Failed to connect to host-xxx
One really interesting thing is that when I'm using the netty-based
spark.shuffle.blockTransferService, there are no OOM error messages
(java.lang.OutOfMemoryError: Java heap space).
Any idea why it doesn't show up here?
I'm using Spark 1.2.1.
Jianshi
On Thu, Mar 5, 2015 at 1:56 PM, Jiansh
e issues when the join key is skewed or the number of keys is small, so you
> will hit OOM.
>
>
>
> Maybe you could monitor each stage or task’s shuffle and GC status also
> system status to identify the problem.
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* Jianshi
park core side, all the shuffle-related operations can spill the
> data to disk and do not need to read the whole partition into memory. But if
> you use Spark SQL, it depends on how Spark SQL uses these operators.
>
>
>
> CC @hao if he has some thoughts on it.
>
>
>
> Than
48 PM, Jianshi Huang
wrote:
> I see. I'm using core's join. The data might have some skewness
> (checking).
>
> I understand shuffle can spill data to disk but when consuming it, say in
> cogroup or groupByKey, it still needs to read the whole group of elements,
> right? I gues
ar 5, 2015 at 4:01 PM, Shao, Saisai wrote:
> I think there are a lot of JIRAs trying to solve this problem (
> https://issues.apache.org/jira/browse/SPARK-5763). Basically sort merge
> join is a good choice.
>
>
>
> Thanks
>
> Jerry
>
>
>
*From:* Jianshi Huang
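One standard mitigation for a skewed join key, not spelled out in the quoted advice, is salting: spread the hot keys of the big side over several synthetic sub-keys and replicate the small side accordingly. A minimal RDD-level sketch, with big and small as assumed pair RDDs:

import scala.util.Random
import org.apache.spark.SparkContext._   // pair-RDD implicits (needed on Spark < 1.3)

// Assumed inputs: big: RDD[(String, Long)] is skewed on its key,
// small: RDD[(String, String)] is the smaller side of the join.
val buckets = 16   // number of salt values; tune to the observed skew
val saltedBig   = big.map { case (k, v) => ((k, Random.nextInt(buckets)), v) }
// Replicate the small side once per salt bucket so every salted key finds its match.
val saltedSmall = small.flatMap { case (k, w) => (0 until buckets).map(i => ((k, i), w)) }
val joined = saltedBig.join(saltedSmall).map { case ((k, _), vw) => (k, vw) }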
Hi,
I need to set spark.local.dir per user; how can I do that?
I tried both
/x/home/${user.name}/spark/tmp
and
/x/home/${USER}/spark/tmp
And neither worked. Looks like it has to be a constant setting in
spark-defaults.conf. Right?
Any ideas how to do that?
Thanks,
--
Jianshi Huang
n't support expressions or wildcards in that configuration. For
> each application, the local directories need to be constant. If you
> have users submitting different Spark applications, those can each set
> spark.local.dir.
>
> - Patrick
>
> On Wed, Mar 11, 2015 at 12:14 AM, J
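A per-application workaround consistent with that advice (an assumption, not something from the thread) is to resolve the user on the gateway at submit time and set the property explicitly, since Spark does not expand placeholders in it:

import org.apache.spark.{SparkConf, SparkContext}

val user = sys.props("user.name")
val conf = new SparkConf()
  .setAppName("per-user-local-dir")
  .set("spark.local.dir", s"/x/home/$user/spark/tmp")   // path pattern from the question
val sc = new SparkContext(conf)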
user home
> directories either. Typically, like in YARN, you would have a number of
> directories (on different disks) mounted and configured for local
> storage for jobs.
>
> On Wed, Mar 11, 2015 at 7:42 AM, Jianshi Huang
> wrote:
> > Unfortunately /tmp mount is really small in ou
th boot classpath [.] not found
>>>
>>>
>>> Here's more info on the versions I am using -
>>>
>>> Scala binary version: 2.11
>>> Spark version: 1.2.1
>>> Scala version: 2.11.5
>>>
>>> Please let me know how can I resolve this problem.
>>>
>>> Thanks
>>> Ashish
>>>
>>
>>
>
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
:23 AM, Jianshi Huang
wrote:
> Same issue here. But the classloader in my exception is somehow different.
>
> scala.ScalaReflectionException: class
> org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with
> java.net.URLClassLoader@53298398 of type class java.net.URLCla
@transient val sqlc = new org.apache.spark.sql.SQLContext(sc)
[info] implicit def sqlContext = sqlc
[info] import sqlc._
Jianshi
On Fri, Mar 13, 2015 at 3:10 AM, Jianshi Huang
wrote:
> BTW, I was running tests from SBT when I got the errors. One test turns a
> Seq of case classes into a Data
Hmm... looks like the console command still starts Spark 1.3.0 with Scala
2.11.6 even though I changed them in build.sbt.
So the test with 1.2.1 is not valid.
Jianshi
On Fri, Mar 13, 2015 at 2:34 PM, Jianshi Huang
wrote:
> I've confirmed it only failed in console started by SBT.
>
>
Forget my last message; I was confused. Spark 1.2.1 + Scala 2.10.4
started by the SBT console command also failed with this error. However, running
from a standard spark-shell works.
Jianshi
On Fri, Mar 13, 2015 at 2:46 PM, Jianshi Huang
wrote:
> Hmm... look like the console command st
nction is throwing exception
>>>
>>> Exception in thread "main" scala.ScalaReflectionException: class
>>> org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with primordial
>>> classloader with boot classpath [.] not found
>>>
>>
I'm almost certain the problem is the ClassLoader.
So adding
fork := true
solves problems for test and run.
The problem is: how can I fork a JVM for the sbt console? fork in console :=
true doesn't seem to work...
Jianshi
On Fri, Mar 13, 2015 at 4:35 PM, Jianshi Huang
wrote:
> I gues
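For what it's worth: sbt honors fork for run and test, but as far as I know the console REPL always runs inside the sbt JVM, which would explain why fork in console := true has no visible effect. A build.sbt sketch of the settings that do take effect (sbt 0.13 syntax):

fork in run := true
fork in Test := true
// Give the forked JVMs enough headroom for Spark tests (the value is an assumption).
javaOptions ++= Seq("-Xmx2g")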
Liancheng also found out that the Spark jars are not included in the
classpath of URLClassLoader.
Hmm... we're very close to the truth now.
Jianshi
On Fri, Mar 13, 2015 at 6:03 PM, Jianshi Huang
wrote:
> I'm almost certain the problem is the ClassLoader.
>
> So adding
he problematic datanode before retrying it.
And maybe dynamically allocate another datanode if dynamic allocation is
enabled.
I think there needs to be a class of fatal errors that can't be recovered
by retries. And it would be best if Spark handled them nicely.
Thanks,
--
Jianshi Huang
LinkedIn:
I created a JIRA: https://issues.apache.org/jira/browse/SPARK-6353
On Mon, Mar 16, 2015 at 5:36 PM, Jianshi Huang
wrote:
> Hi,
>
> We're facing "No space left on device" errors lately from time to time.
> The job will fail after retries. Obviously, in such a case, retry w
of our cases are the second one, we set
> "spark.scheduler.executorTaskBlacklistTime" to 3 to solve such "No
> space left on device" errors. So if a task runs unsuccessfully in some
> executor, it won't be scheduled to the same executor in 30 seconds.
>
>
> Best Regards,
> Shi
Oh, by default it's set to 0L.
I'll try setting it to 30000 immediately. Thanks for the help!
Jianshi
On Mon, Mar 16, 2015 at 11:32 PM, Jianshi Huang
wrote:
> Thanks Shixiong!
>
> Very strange that our tasks were retried on the same executor again and
Hi,
Does anyone have a similar request?
https://issues.apache.org/jira/browse/SPARK-6561
When we save a DataFrame into Parquet files, we also want to have it
partitioned.
The proposed API looks like this:
def saveAsParquet(path: String, partitionColumns: Seq[String])
--
Jianshi Huang
LinkedIn: jianshi
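For reference, later Spark releases added this capability on DataFrameWriter rather than with the exact signature proposed above; a sketch, with df and the output path assumed:

// Spark 1.4+ equivalent of the proposal.
df.write
  .partitionBy("year", "month")
  .parquet("hdfs:///user/jianshi/tables/events")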
ne DStream
-> multiple DStreams)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
m lime / the big picture – in some models,
> friction can be a huge factor in the equations in some other it is just
> part of the landscape
>
>
>
> *From:* Gerard Maas [mailto:gerard.m...@gmail.com]
> *Sent:* Friday, April 17, 2015 10:12 AM
>
> *To:* Evo Eftimov
> *Cc:* Tath
Hi,
I want to write this in Spark SQL DSL:
select map('c1', c1, 'c2', c2) as m
from table
Is there a way?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
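One way to express this without writing the full SQL statement is selectExpr, which parses each argument as a SQL expression; a sketch, assuming df is the DataFrame for "table" and that the map(...) function is available in the SQL dialect in use (it is in HiveQL):

// Equivalent of: select map('c1', c1, 'c2', c2) as m from table
val result = df.selectExpr("map('c1', c1, 'c2', c2) as m")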
Hi,
I want to do this in Spark SQL DSL:
select '2015-04-22' as date
from table
How to do this?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Oh, I figured it out. I need to import sql.functions._
Then I can do
table.select(lit("2015-04-22").as("date"))
Jianshi
On Wed, Apr 22, 2015 at 7:27 PM, Jianshi Huang
wrote:
> Hi,
>
> I want to do this in Spark SQL DSL:
>
> select '2015-04-22
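Spelled out, assuming table is a DataFrame:

import org.apache.spark.sql.functions._   // brings lit() into scope

// select '2015-04-22' as date from table
val withDate = table.select(lit("2015-04-22").as("date"))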
at
parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:126)
at
parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
; Fix Version
>>
>> On Fri, Apr 24, 2015 at 11:00 AM, Yin Huai wrote:
>>
>>> The exception looks like the one mentioned in
>>> https://issues.apache.org/jira/browse/SPARK-4520. What is the version
>>> of Spark?
>> Fix Version of SPARK-4520 is not set.
>> I assume it was fixed in 1.3.0
>>
>> Cheers
I'm facing this error in Spark 1.3.1
https://issues.apache.org/jira/browse/SPARK-4105
Does anyone know the workaround? Change the compression codec for
shuffle output?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
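A commonly tried workaround (an assumption on my part, not a confirmed fix for SPARK-4105) is switching the I/O compression codec away from snappy, which also covers shuffle outputs:

import org.apache.spark.SparkConf

// "lz4" and "lzf" are the usual alternatives to the default "snappy".
val conf = new SparkConf().set("spark.io.compression.codec", "lz4")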
I'm using the default settings.
Jianshi
On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva wrote:
> Hi,
>
> Can you please share your compression etc settings, which you are using.
>
> Thanks,
> Twinkle
>
> On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang
> wrot
s like https://issues.apache.org/jira/browse/SPARK-5446 is still open,
when can we have it fixed? :)
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
>= 2014-04-30))
PhysicalRDD [meta#143,nvar#145,date#147], MapPartitionsRDD[6] at
explain at :32
Jianshi
On Tue, May 12, 2015 at 10:34 PM, Olivier Girardot
wrote:
> can you post the explain too ?
>
> Le mar. 12 mai 2015 à 12:11, Jianshi Huang a
> écrit :
>
>> Hi,
01005082020.jar:META-INF/ECLIPSEF.RSA
[error]
/Users/jianshuang/.ivy2/cache/org.eclipse.jetty.orbit/javax.activation/orbits/javax.activation-1.1.0.v201105071233.jar:META-INF/ECLIPSEF.RSA
I googled it and it looks like I need to exclude some JARs. Has anyone done
that? Your help is really appreciated.
Cheers,
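Assuming the deduplicate error comes from sbt-assembly, the usual fix is a merge strategy that discards the colliding signature files rather than excluding whole JARs; a build.sbt sketch (newer plugin versions use assemblyMergeStrategy, older ones mergeStrategy in assembly):

assemblyMergeStrategy in assembly := {
  // JAR signature files are safe to drop in a fat jar.
  case PathList("META-INF", xs @ _*)
      if xs.lastOption.exists(_.toUpperCase.matches(".*\\.(SF|DSA|RSA)")) =>
    MergeStrategy.discard
  case other =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(other)
}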
Das
wrote:
> Hi
>
> Check in your driver programs Environment, (eg:
> http://192.168.1.39:4040/environment/). If you don't see this
> commons-codec-1.7.jar jar then that's the issue.
>
> Thanks
> Best Regards
>
>
> On Mon, Jun 16, 2014 at 5:07 PM, Jia
1.jar
gson.jar
guava.jar
joda-convert-1.2.jar
joda-time-2.3.jar
kryo-2.21.jar
libthrift.jar
quasiquotes_2.10-2.0.0-M8.jar
scala-async_2.10-0.9.1.jar
scala-library-2.10.4.jar
scala-reflect-2.10.4.jar
Does anyone have a hint about what went wrong? I'm really confused.
Cheers,
--
Jianshi Huang
Linke
l.com:7077...
14/06/17 04:15:32 ERROR Worker: Worker registration failed: Attempted to
re-register worker at same address: akka.tcp://
sparkwor...@lvshdc5dn0321.lvs.paypal.com:41987
Is that a bug?
Jianshi
On Tue, Jun 17, 2014 at 5:41 PM, Jianshi Huang
wrote:
> Hi,
>
> I've
spark-submit from within the cluster, or
> outside of it? If the latter, could you try running it from within the
> cluster and see if it works? (Does your rtgraph.jar exist on the machine
> from which you run spark-submit?)
>
>
> 2014-06-17 2:41 GMT-07:00 Jianshi Huang :
>
> Hi
It would be convenient if Spark's textFile, parquetFile, etc. could support
paths with wildcards, such as:
hdfs://domain/user/jianshuang/data/parquet/table/month=2014*
Or is there already a way to do it now?
Jianshi
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
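sc.textFile hands path resolution to Hadoop's FileInputFormat, which already understands glob patterns, so at least the textFile case works out of the box; whether parquetFile did at the time is what this thread is probing. A sketch (paths are illustrative), with a manual fallback via the Hadoop FileSystem API:

import org.apache.hadoop.fs.Path

// Glob handling comes from Hadoop's FileInputFormat.
val lines = sc.textFile("hdfs://domain/user/jianshuang/data/logs/month=2014*")

// Fallback: expand the glob yourself, then load and union the matches individually.
val pattern = new Path("hdfs://domain/user/jianshuang/data/parquet/table/month=2014*")
val matched = pattern.getFileSystem(sc.hadoopConfiguration)
  .globStatus(pattern)
  .map(_.getPath.toString)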
rogram and it worked..
>> My code was like this
>> b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*")
>>
>> Thanks & Regards,
>> Meethu M
>>
>>
>> On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang <
>> jianshi.hu...@gmail.com
Hi all,
Thanks for the reply. I'm using parquetFile as input; is that a problem? In
hadoop fs -ls, the path
(hdfs://domain/user/jianshuang/data/parquet/table/month=2014*)
lists all the files.
I'll test it again.
Jianshi
On Wed, Jun 18, 2014 at 2:23 PM, Jianshi Huang
wr
string as part of their name?
>
>
>
> On Wed, Jun 18, 2014 at 2:25 AM, Jianshi Huang
> wrote:
>
>> Hi all,
>>
>> Thanks for the reply. I'm using parquetFile as input, is that a problem?
>> In hadoop fs -ls, the path (hdfs://domain/user/
>>
>
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
eSortReducer or PutSortReducer)
But in Spark, it seems I have to do the sorting and partitioning myself, right?
Can anyone show me how to do it properly? Is there a better way to ingest
data quickly into HBase from Spark?
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
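A sketch of the usual Spark-side pattern for HFile bulk loading. Assumptions: rdd: RDD[(Array[Byte], Array[Byte])] holds rowkey -> value pairs, a single column family "cf" with qualifier "q", HBase 0.98+ on the classpath, and Spark 1.2+ for repartitionAndSortWithinPartitions. A production job should partition by the table's region boundaries (what HFileOutputFormat2.configureIncrementalLoad sets up in MapReduce) instead of a plain HashPartitioner, and then hand the staging directory to LoadIncrementalHFiles:

import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._   // pair-RDD implicits (needed on Spark < 1.3)

val hbaseConf = HBaseConfiguration.create()

// HFileOutputFormat2 expects cells sorted by rowkey within each output file.
implicit val rowOrdering: Ordering[ImmutableBytesWritable] =
  new Ordering[ImmutableBytesWritable] {
    def compare(a: ImmutableBytesWritable, b: ImmutableBytesWritable): Int = a.compareTo(b)
  }

val hfileReady = rdd
  .map { case (row, value) => (new ImmutableBytesWritable(row), value) }
  .repartitionAndSortWithinPartitions(new HashPartitioner(16))
  .map { case (row, value) =>
    (row, new KeyValue(row.get(), Bytes.toBytes("cf"), Bytes.toBytes("q"), value))
  }

hfileReady.saveAsNewAPIHadoopFile(
  "hdfs:///tmp/hfiles-staging",          // staging directory for the generated HFiles
  classOf[ImmutableBytesWritable],
  classOf[KeyValue],
  classOf[HFileOutputFormat2],
  hbaseConf)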
$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
're transformed to a KeyValue to be inserted into HBase, so I need to
do a .reduce(_.union(_)) to combine them into one RDD[(key, value)].
I cannot see what's wrong in my code.
Jianshi
On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang
wrote:
> I can successfully run my code in local
transformed to a KeyValue to be inserted into HBase, so I need to
> do a .reduce(_.union(_)) to combine them into one RDD[(key, value)].
>
> I cannot see what's wrong in my code.
>
> Jianshi
>
>
>
> On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang
> wrote:
>
This would be helpful. I personally like Yarn-Client mode as all the
running status can be checked directly from the console.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
ems not to be working. What are the other possible reasons? How can I fix
it?
Jianshi
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
the doc says it always
shuffles and recommends using coalesce for reducing partitions.
Can anyone help me here?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
your current settings
> -n to set open files limit
> (and other limits also)
>
> And I set -n to 10240.
>
> I see spark.shuffle.consolidateFiles helps by reusing open files.
> (so I don't know to what extent it helps)
>
> Hope it helps.
>
> Larry
>
>
&
I created this JIRA issue; could somebody please pick it up?
https://issues.apache.org/jira/browse/SPARK-2728
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Looks like I cannot assign it.
On Thu, Jul 31, 2014 at 11:56 AM, Larry Xiao wrote:
> Hi
>
> Can you assign it to me? Thanks
>
> Larry
>
>
> On 7/31/14, 10:47 AM, Jianshi Huang wrote:
>
>> I created this JIRA issue, somebody please pick it up?
>>
>>
ion can reduce the total number of shuffle files, but the
> number of concurrently opened files is the same as with basic hash-based shuffle.
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
> *Sent:* Thursday, July 31, 2014 10:34 AM
You could try something like the following:
>
> val rdd: RDD[(WrapWithComparable[(Array[Byte], Array[Byte], Array[Byte])],
> Externalizer[KeyValue])] = ...
> val rdd_coalesced = rdd.coalesce(Math.min(1000, rdd.partitions.length),
> false, null)
>
>
>
>
>
> Thanks
> B
1.1.
>
>
> On Thu, Jul 31, 2014 at 12:40 PM, Jianshi Huang
> wrote:
>
>> I got the number from the Hadoop admin. It's 1M actually. I suspect the
>> consolidation didn't work as expected? Any other reason?
>>
>>
>> On Thu, Jul 31, 2014 at
ns in my
select clause.
I made the duplication on purpose so that my code parses correctly. I think
we should allow users to specify duplicated columns as return values.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
ker(MapOutputTracker.scala:81)
>>>>> ... 25 more
>>>>>
>>>>>
>>>>> Before the error I can see this kind of logs:
>>>>>
>>>>> 14/03/11 14:29:40 INFO MapOutputTracker: Don't have map outputs for
>&
To make my shell experience merrier, I need to import several packages, and
define implicit sparkContext and sqlContext.
Is there a startup file (e.g. ~/.sparkrc) that the Spark shell will load when
it starts?
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & B
Hi,
How can I list all registered tables in a sql context?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
I see. Thanks Prashant!
Jianshi
On Wed, Sep 3, 2014 at 7:05 PM, Prashant Sharma
wrote:
> Hey,
>
> You can use spark-shell -i sparkrc, to do this.
>
> Prashant Sharma
>
>
>
>
> On Wed, Sep 3, 2014 at 2:17 PM, Jianshi Huang
> wrote:
>
>> To make my
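Prashant's suggestion just evaluates the given file in the REPL at startup, so the "rc" file is plain Scala. An example of what it might contain, along the lines of the definitions mentioned in the question (sc is already provided by spark-shell):

// Contents of e.g. ~/sparkrc, used as: spark-shell -i ~/sparkrc
import org.apache.spark.SparkContext._
@transient val sqlc = new org.apache.spark.sql.SQLContext(sc)
implicit def sqlContext = sqlc
import sqlc._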
Err... there's no such feature?
Jianshi
On Wed, Sep 3, 2014 at 7:03 PM, Jianshi Huang
wrote:
> Hi,
>
> How can I list all registered tables in a sql context?
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://hua
Thanks Tobias,
I also found this: https://issues.apache.org/jira/browse/SPARK-3299
Looks like it's being worked on.
Jianshi
On Mon, Sep 8, 2014 at 9:28 AM, Tobias Pfeiffer wrote:
> Hi,
>
> On Sat, Sep 6, 2014 at 1:40 AM, Jianshi Huang
> wrote:
>
>> Err... there
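For reference, the API that eventually landed is available from Spark 1.3 on:

// List what is registered in the SQLContext (Spark 1.3+).
val names: Array[String] = sqlContext.tableNames()
val tablesDf = sqlContext.tables()   // DataFrame with tableName / isTemporary columns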
un(Executor.scala:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
--
Jianshi Huang
LinkedIn: jianshi
Twit
cheduler.Task.run(Task.scala:54)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.la
tch$$
> anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:179)
>
> Can you reveal what HbaseRDDBatch.scala does ?
>
> Cheers
>
> On Wed, Sep 24, 2014 at 8:46 AM, Jianshi Huang
> wrote:
>
>> One of my big Spark programs always gets stuck at 99%, where a few tasks
>&
ark: have you checked region server (logs) to see if
> region server had trouble keeping up ?
>
> Cheers
>
> On Wed, Sep 24, 2014 at 8:51 AM, Jianshi Huang
> wrote:
>
>> Hi Ted,
>>
>> It converts RDD[Edge] to HBase rowkey and columns and insert them to
>>
to be balanced... you might have some skewness in
> row keys and one regionserver is under pressure... try finding that key and
> replicate it using random salt
>
> On Wed, Sep 24, 2014 at 8:51 AM, Jianshi Huang
> wrote:
>
>> Hi Ted,
>>
>> It converts RDD[Edge]
pressure... try finding that key
>> and replicate it using random salt
>>
>> On Wed, Sep 24, 2014 at 8:51 AM, Jianshi Huang
>> wrote:
>>
>>> Hi Ted,
>>>
>>> It converts RDD[Edge] to HBase rowkey and columns and insert them to
>>&g
Looks like it's an HDFS issue, pretty new.
https://issues.apache.org/jira/browse/HDFS-6999
Jianshi
On Thu, Sep 25, 2014 at 12:10 AM, Jianshi Huang
wrote:
> Hi Ted,
>
> See my previous reply to Debasish, all region servers are idle. I don't
> think it's caused by hot
op 2.6.0
>
> Any chance of deploying 2.6.0-SNAPSHOT to see if the problem goes away ?
>
> On Wed, Sep 24, 2014 at 10:54 PM, Jianshi Huang
> wrote:
>
>> Looks like it's an HDFS issue, pretty new.
>>
>> https://issues.apache.org/jira/browse/HDFS-6999
>
I cannot find it in the documentation. And I have a dozen dimension tables
to (left) join...
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
wrote:
> Have you looked at SPARK-1800 ?
>
> e.g. see sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
> Cheers
>
> On Sun, Sep 28, 2014 at 1:55 AM, Jianshi Huang
> wrote:
>
>> I cannot find it in the documentation. And I have a dozen dimension
>> tabl
ep 29, 2014 at 1:24 AM, Jianshi Huang
wrote:
> Yes, looks like it can only be controlled by the
> parameter spark.sql.autoBroadcastJoinThreshold, which is a little bit weird
> to me.
>
> How am I supposed to know the exact size of a table in bytes? Let me specify the
> join algorithm
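Two common ways to steer this (hedged, since the explicit hint only landed in later releases): raise the auto-broadcast size threshold, or mark the dimension table with the broadcast() hint from Spark 1.5 on. factDF and dimDF are assumed:

import org.apache.spark.sql.functions.broadcast

// Option 1: raise the size threshold (in bytes) below which Spark SQL broadcasts a table.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (50L * 1024 * 1024).toString)

// Option 2 (Spark 1.5+): hint the planner explicitly.
val joined = factDF.join(broadcast(dimDF), factDF("dim_key") === dimDF("dim_key"), "left_outer")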
at 2:18 PM, Jianshi Huang
wrote:
> Looks like https://issues.apache.org/jira/browse/SPARK-1800 is not merged
> into master?
>
> I cannot find spark.sql.hints.broadcastTables in latest master, but it's
> in the following patch.
>
>
> https://github.com/apache/spark/commit/7
MAT 'parquet.hive.DeprecatedParquetInputFormat'
> |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
> |LOCATION '$file'""".stripMargin
> sql(ddl)
> setConf("spark.sql.hive.convertMetastoreParquet", "true"
dozen dim tables (using
HiveContext) and then map it to my class object. It failed a couple of
times, and now I've cached the intermediate table and currently it seems to be
working fine... I had no idea why until I found SPARK-3106
Cheers,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & B
Hmm... it failed again, just lasted a little bit longer.
Jianshi
On Mon, Oct 13, 2014 at 4:15 PM, Jianshi Huang
wrote:
> https://issues.apache.org/jira/browse/SPARK-3106
>
> I'm having the same errors described in SPARK-3106 (no other types of
> errors confirmed), running a
Turned out it was caused by this issue:
https://issues.apache.org/jira/browse/SPARK-3923
Setting spark.akka.heartbeat.interval to 100 solved it.
Jianshi
On Mon, Oct 13, 2014 at 4:24 PM, Jianshi Huang
wrote:
> Hmm... it failed again, just lasted a little bit longer.
>
> Jianshi
>
>
On Tue, Oct 14, 2014 at 4:36 AM, Jianshi Huang
wrote:
> Turned out it was caused by this issue:
> https://issues.apache.org/jira/browse/SPARK-3923
>
> Set spark.akka.heartbeat.interval to 100 solved it.
>
> Jianshi
>
> On Mon, Oct 13, 2014 at 4:24 PM, Jianshi Huang