Sent: Friday, May 08, 2015 4:25 PM
To: Taeyun Kim; user@spark.apache.org
Subject: Re: YARN mode startup takes too long (10+ secs)
So does this sleep occur before resources are allocated for the first few executors
to start the job?
On Fri, May 8, 2015 at 6:23 AM Taeyun Kim wrote:
I think I've found the cause: spark.yarn.scheduler.heartbeat.interval-ms.
I hope the additional overhead it incurs will be negligible.
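For reference, a minimal sketch of lowering that interval from the application side (assumptions: Spark 1.x Java API; the app name and the 1000 ms value are only illustrative, not recommendations):

SparkConf conf = new SparkConf()
    .setAppName("MyApp")                                          // hypothetical app name
    .set("spark.yarn.scheduler.heartbeat.interval-ms", "1000");   // value is in milliseconds
JavaSparkContext sc = new JavaSparkContext(conf);

The same key can also go in conf/spark-defaults.conf or be passed with --conf to spark-submit.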
From: Zoltán Zvara [mailto:zoltan.zv...@gmail.com]
Sent: Thursday, May 07, 2015 10:05 PM
To: Taeyun Kim; user@spark.apache.org
Subject: Re: YARN mode startup takes too long (10+ secs)
Without
are not marked. Is this really intentional?
From: Haopu Wang [mailto:hw...@qilinsoft.com]
Sent: Friday, May 08, 2015 11:37 AM
To: Taeyun Kim; Ted Yu; Todd Nist; user@spark.apache.org
Subject: RE: Spark does not delete temporary directories
I think the temporary folders are used to store block data.
On Thu, May 7, 2015 at 6:19 AM, Todd Nist wrote:
Have you tried to set the following?
spark.worker.cleanup.enabled=true
spark.worker.cleanup.appDataTtl=<ttl in seconds>
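(These are worker-side settings for the standalone deployment; one common place to put them is SPARK_WORKER_OPTS in conf/spark-env.sh on each worker, e.g. export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=604800", where the TTL is in seconds and the value here is only illustrative.)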
On Thu, May 7, 2015 at 2:39 AM, Taeyun Kim wrote:
Hi,
After a spark program completes, there are 3 temporary directories
Hi,
After a Spark program completes, 3 temporary directories remain in
the temp directory.
The directory names are like this: spark-2e389487-40cc-4a82-a5c7-353c0feefbb7
And since the Spark program runs on Windows, a snappy DLL file also remains in the
temp directory.
The file name is like
Hi,
I'm running a Spark application in YARN-client or YARN-cluster mode.
But it seems to take too long to start up.
It takes 10+ seconds to initialize the spark context.
Is this normal? Or can it be optimized?
The environment is as follows:
- Hadoop: Hortonworks HDP 2.2 (Hadoop 2.6)
-
Hi,
I used CombineTextInputFormat to read many small files.
The Java code is as follows (I've written it as a utility function):
public static JavaRDD<String> combineTextFile(JavaSparkContext sc,
String path, long maxSplitSize, boolean recursive)
{
Configuration conf = new Configuration();
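// (The preview cuts the helper off here; a hedged reconstruction of the rest, assuming the
// new-API org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat, the Writable types
// from org.apache.hadoop.io, and Java 8 lambdas:)
conf.setLong("mapreduce.input.fileinputformat.split.maxsize", maxSplitSize);
conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", recursive);
return sc.newAPIHadoopFile(path, CombineTextInputFormat.class,
        LongWritable.class, Text.class, conf)
    .map(tuple -> tuple._2().toString());   // JavaPairRDD<LongWritable, Text> -> JavaRDD<String>
}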
spark.executor.extraClassPath is especially useful when the output is
written to HBase, since the data nodes on the cluster already have the HBase
library jars.
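As a minimal sketch of setting it (assumptions: Spark 1.x Java API; the jar path is hypothetical and must already exist on every node; multiple entries can be joined with the platform path separator):

SparkConf conf = new SparkConf()
    .set("spark.executor.extraClassPath", "/usr/lib/hbase/lib/hbase-client.jar");  // hypothetical path
JavaSparkContext sc = new JavaSparkContext(conf);

The same key can also go in spark-defaults.conf or be passed with --conf to spark-submit.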
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Friday, February 27, 2015 5:22 PM
To: Kannan Rajah
Cc: Marcelo Va
(Sorry if this mail is duplicate, but it seems that my previous mail could
not reach the mailing list.)
Hi,
When my spark program calls JavaSparkContext.stop(), the following errors
occur.
14/12/11 16:24:19 INFO Main: sc.stop {
14/12/11 16:24:20 ERROR Conne
Hi,
When my spark program calls JavaSparkContext.stop(), the following errors
occur.
14/12/11 16:24:19 INFO Main: sc.stop {
14/12/11 16:24:20 ERROR ConnectionManager: Corresponding
SendingConnection to ConnectionManagerId(cluster02,38918) not found
Hi,
An information about the error.
In the File | Project Structure window, the following error message is displayed
with a pink background:
Library 'Maven: org.scala-lang:scala-compiler-bundle:2.10.4' is not used
Could this be a hint?
From: Taeyun Kim [mailto:taeyun@innowi
Hi,
I'm trying to open the Spark source code with IntelliJ IDEA.
I opened pom.xml on the Spark source code root directory.
Project tree is displayed in the Project tool window.
But when I open a source file, say
org.apache.spark.deploy.yarn.ClientBase.scala, a lot of red marks show in
the
), but it’s Ok.
From: Sandy Ryza [mailto:sandy.r...@cloudera.com]
Sent: Thursday, November 20, 2014 2:44 PM
To: innowireless TaeYun Kim
Cc: user
Subject: Re: How to view log on yarn-client mode?
While the app is running, you can find logs from the YARN web UI by navigating
to containers
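(Once the application has finished, the aggregated container logs can usually also be fetched from the command line with "yarn logs -applicationId <application id>", assuming YARN log aggregation is enabled; anything a task writes to System.out ends up in the stdout file of its executor's container.)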
Hi,
How can I view log on yarn-client mode?
When I insert the following line in a mapToPair function, for example,
System.out.println("TEST TEST");
it is displayed on the console in local mode.
But in yarn-client mode it does not appear anywhere.
When I use the YARN resource manager web UI, the siz
(At that time, my program did
not include the HBase export task.)
BTW, I use Spark 1.0.0.
Thank you.
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, September 22, 2014 6:26 PM
To: innowireless TaeYun Kim
Cc: user
Subject: Re: Bulk-load to HBase
On Mon, S
2014 5:46 PM
To: innowireless TaeYun Kim
Cc: user
Subject: Re: Bulk-load to HBase
I see a number of potential issues:
On Mon, Sep 22, 2014 at 8:42 AM, innowireless TaeYun Kim
wrote:
> JavaPairRDD rdd =
> // MyKey has a byte[] member for rowkey
Two byte[] with the same content are not equals() to each other, so they will not behave as the same key.
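To make that concrete, here is a minimal, self-contained illustration of the byte[] equality pitfall (this is not code from the thread):

byte[] a = new byte[] { 1, 2, 3 };
byte[] b = new byte[] { 1, 2, 3 };
System.out.println(a == b);                        // false: different objects
System.out.println(a.equals(b));                   // false: arrays inherit identity equals()
System.out.println(a.hashCode() == b.hashCode());  // false in general: identity hash codes
System.out.println(java.util.Arrays.equals(a, b)); // true: element-by-element comparison

So a key class whose equals()/hashCode() depend on a raw byte[] (or that does not override them at all) will not group or reduce as intended; wrapping the rowkey, e.g. in ImmutableBytesWritable or a class with proper equals()/hashCode(), avoids this.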
@spark.apache.org; innowireless TaeYun Kim
Subject: Re: Possibly a dumb question: differences between
saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset?
File takes a filename (output path) to write to, while Dataset takes only a job
configuration with the output format and destination already set on it. This
means that Dataset is more general (it can also save to storage systems that
correction would be very helpful.
Thanks.
-Original Message-
From: Soumitra Kumar [mailto:kumar.soumi...@gmail.com]
Sent: Saturday, September 20, 2014 1:44 PM
To: Ted Yu
Cc: innowireless TaeYun Kim; user; Aniket Bhatnagar
Subject: Re: Bulk-load to HBase
I successfully did this once.
Hi,
I'm confused with saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset.
What's the difference between the two?
What are the individual use cases of the two APIs?
Could you describe the internal flows of the two APIs briefly?
I've used Spark for several months, but I have no experience on M
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Friday, September 19, 2014 9:20 PM
To: user@spark.apache.org
Subject: RE: Bulk-load to HBase
Thank you for the example code.
Currently I use foreachPartition() + Put(), but your example code can be used
to clean up my code
bypasses the write
path.
Thanks.
From: Aniket Bhatnagar [mailto:aniket.bhatna...@gmail.com]
Sent: Friday, September 19, 2014 9:01 PM
To: innowireless TaeYun Kim
Cc: user
Subject: Re: Bulk-load to HBase
I have been using saveAsNewAPIHadoopDataset but I use TableOutputFormat instead
of
Hi,
Sorry, I just found saveAsNewAPIHadoopDataset.
Then, can I use HFileOutputFormat with saveAsNewAPIHadoopDataset? Is there
any example code for that?
Thanks.
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Friday, September 19, 2014 8:18 PM
To: user
Am I right?
If so, is there another method to bulk-load to HBase from RDD?
Thanks.
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Friday, September 19, 2014 7:17 PM
To: user@spark.apache.org
Subject: Bulk-load to HBase
Hi,
Is there a way to bulk-load to
Hi,
Is there a way to bulk-load to HBase from RDD?
HBase offers HFileOutputFormat class for bulk loading by MapReduce job, but
I cannot figure out how to use it with saveAsHadoopDataset.
Thanks.
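For reference, a hedged sketch of the non-bulk alternative mentioned above (writing through TableOutputFormat with saveAsNewAPIHadoopDataset, which goes through the normal HBase write path rather than HFile bulk loading). Assumptions: Spark 1.x Java API, the HBase client classes on the classpath, a JavaPairRDD<ImmutableBytesWritable, Put> named puts already built from the data, a hypothetical table name, and a method that is allowed to throw IOException:

// imports (assumed): org.apache.hadoop.conf.Configuration, org.apache.hadoop.hbase.HBaseConfiguration,
// org.apache.hadoop.hbase.client.Put, org.apache.hadoop.hbase.io.ImmutableBytesWritable,
// org.apache.hadoop.hbase.mapreduce.TableOutputFormat, org.apache.hadoop.mapreduce.Job
Configuration hbaseConf = HBaseConfiguration.create();
hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "my_table");   // hypothetical table name
Job job = Job.getInstance(hbaseConf);
job.setOutputFormatClass(TableOutputFormat.class);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Put.class);
puts.saveAsNewAPIHadoopDataset(job.getConfiguration());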
Hi,
In the Spark Configuration document, spark.executor.extraClassPath is described
as a backwards-compatibility option. It also says that users typically
should not need to set this option.
Now, I must add a classpath to the executor environment (as well as to the
driver in the future, but for
Hi,
I'm trying to split one large multi-field text file into many single-field
text files.
My code is like this: (somewhat simplified)
final Broadcast bcSchema = sc.broadcast(schema);
final String outputPathName = env.outputPathName;
sc.textFile(env.inputFileName)
.ma
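The preview cuts the pipeline off; one possible shape for the per-field split, purely as a hedged sketch (assumptions: tab-delimited input, Java 8 lambdas, and a hypothetical getFieldCount() accessor on the schema object from the snippet above):

JavaRDD<String> lines = sc.textFile(env.inputFileName).cache();  // cached because it is re-read once per field
for (int i = 0; i < schema.getFieldCount(); i++) {               // getFieldCount() is a hypothetical accessor
    final int fieldIndex = i;
    lines.map(line -> line.split("\t", -1)[fieldIndex])
         .saveAsTextFile(outputPathName + "/field_" + fieldIndex);
}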
I believe this was already fixed last week in SPARK-2414:
https://github.com/apache/spark/commit/7c23c0dc3ed721c95690fc49f435d9de6952523c
On Fri, Jul 25, 2014 at 1:34 PM, innowireless TaeYun Kim
wrote:
> Hi,
> I'm using Spark 1.0.0.
>
> On filter() - map() - coalesce() - saveAsText() sequence,
(Sorry for resending, I've reformatted the text as HTML.)
Hi,
I'm using Spark 1.0.0.
On filter() - map() - coalesce() - saveAsText() sequence, the following
exception is thrown.
Exception in thread "main" java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scal
Hi,
I'm using Spark 1.0.0.
On filter() - map() - coalesce() - saveAsText() sequence, the following
exception is thrown.
Exception in thread "main" java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:313)
at scala.None$.get(Option.scala:311)
at
org.apache.spark.r
Thanks.
Indeed, now that I compare the stage data of the two jobs, ‘core7-exec3’ spends about
12.5 minutes more on GC than ‘core2-exec12’.
From: Nishkam Ravi [mailto:nr...@cloudera.com]
Sent: Wednesday, July 16, 2014 5:28 PM
To: user@spark.apache.org
Subject: Re: executor-cores vs. num-executors
Hi,
On running yarn-client mode, the following options can be specified:
- --executor-cores
- --num-executors
If we have the following machines:
- 3 data nodes
- 8 cores on each node
Which is better?
1. --executor-cores 7 --num-executors 3 (more cores for each executor,
leavi
Hi,
A task failed with java.lang.ArrayIndexOutOfBoundsException at
com.ning.compress.lzf.impl.UnsafeChunkDecoder.copyOverlappingLong,
and the whole job was terminated after repeated task failures.
It ran without any problem several days ago.
Currently we suspect that the cluster i
Thank you for your response.
Maybe that applies to my case.
In my test case, the types of almost all of the data are either primitive
types, joda DateTime, or String.
But I'm somewhat disappointed with the speed.
At the very least it should not be slower than the default Java serializer...
-Original Messa
Hi,
For my test case, using the Kryo serializer does not help.
It is slower than the default Java serializer, and the size saving is minimal.
I've registered almost all classes with the Kryo registrator.
What is happening in my test case?
Has anyone experienced a case like this?
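For comparison, a minimal sketch of how Kryo is typically wired up (assumptions: Spark 1.x; the registrator class name and the record type are hypothetical):

SparkConf conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", "com.example.MyKryoRegistrator");  // hypothetical class name

public class MyKryoRegistrator implements org.apache.spark.serializer.KryoRegistrator {
    @Override
    public void registerClasses(com.esotericsoftware.kryo.Kryo kryo) {
        kryo.register(MyRecord.class);              // hypothetical record class
        kryo.register(org.joda.time.DateTime.class);
    }
}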
Hi,
I need some advice on an implementation best practice.
The operating environment is as follows:
- Log data files arrive irregularly.
- The size of a log data file ranges from 3.9KB to 8.5MB. The average is about
1MB.
- The number of records in a data file ranges from 13 to 22,000 lines.
For your information, I've attached the Ganglia monitoring screen capture to
the Stack Overflow question.
Please see:
http://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores
-vs-the-number-of-executors
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.
Hi,
I'm trying to understand the relationship between the number of cores and the
number of executors when running a Spark job on YARN.
The test environment is as follows:
- # of data nodes: 3
- Data node machine spec:
- CPU: Core i7-4790 (# of cores: 4, # of threads: 8)
- RAM: 32GB (8GB
f the driver program constantly grows?
-Original Message-
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Wednesday, July 02, 2014 6:05 PM
To: user@spark.apache.org
Subject: RE: Help: WARN AbstractNioSelector: Unexpected exception in the
selector
.
-Original Message-
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Wednesday, July 02, 2014 5:58 PM
To: user@spark.apache.org
Subject: Help: WARN AbstractNioSelector: Unexpected exception in the
selector loop. java.lang.OutOfMemoryError: Java heap space
Hi
Hi,
When running a Spark job, the following warning message is displayed and the job
no longer seems to be progressing.
(Detailed log messages are at the bottom of this message.)
---
14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError
(I've clarified statement (1) of my previous mail. See below.)
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Friday, June 13, 2014 10:05 AM
To: user@spark.apache.org
Subject: RE: Question about RDD cache, unpersist, materialization
Currently I use
, unpersist, materialization
FYI: Here is a related discussion
<http://apache-spark-user-list.1001560.n3.nabble.com/Persist-and-unpersist-td6437.html>
about this.
On Thu, Jun 12, 2014 at 8:10 PM, innowireless TaeYun Kim
wrote:
Maybe it would be nice if unpersist() 'triggers
014 at 2:26 AM, Nick Pentreath
wrote:
If you want to force materialization, use .count().
Also, if you can, simply don't unpersist anything unless you really need to free
the memory.
On Wed, Jun 11, 2014 at 5:13 AM, innowireless
Hi,
How to use SequenceFileRDDFunctions.saveAsSequenceFile() in Java?
A simple example will be a great help.
Thanks in advance.
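In case it helps, a hedged sketch of one way to do this from Java (assumption: going through saveAsNewAPIHadoopFile with SequenceFileOutputFormat, since the saveAsSequenceFile helper is added implicitly only for Scala pair RDDs; the output path is hypothetical):

// imports (assumed): java.util.Arrays, scala.Tuple2, org.apache.hadoop.io.IntWritable,
// org.apache.hadoop.io.Text, org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
JavaPairRDD<Text, IntWritable> pairs = sc.parallelize(Arrays.asList("a", "bb"))
    .mapToPair(s -> new Tuple2<>(new Text(s), new IntWritable(s.length())));  // build Writables inside the task
pairs.saveAsNewAPIHadoopFile("hdfs:///tmp/seq-out",   // hypothetical output path
    Text.class, IntWritable.class, SequenceFileOutputFormat.class);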
Hi,
Maybe this is a newbie question: how do I read a snappy-compressed text file?
The OS is Windows 7.
Currently, I've done the following steps:
1. Built Hadoop 2.4.0 with snappy option.
The 'hadoop checknative' command displays the following line:
snappy: true D:\hadoop-2.4.0\bin\snappy.d
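(If the native codec does load, reading the file itself usually needs nothing special. As a hedged sketch with a hypothetical path:

JavaRDD<String> lines = sc.textFile("D:/data/input.txt.snappy");  // codec chosen from the .snappy extension

since Hadoop's TextInputFormat picks the decompression codec from the file extension.)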
BTW, it is possible that rdd.first() does not compute all of the partitions.
So first() cannot be used for the situation below.
-Original Message-
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Wednesday, June 11, 2014 11:40 AM
To: user@spark.apache.org
Hi,
What I seem to know about the RDD persisting API is as follows:
- cache() and persist() are not actions. They only mark the RDD.
- unpersist() is also not an action. It only removes the marking. But if the
RDD is already in memory, it is unloaded.
And there seems to be no API to forcefully materializ
Without (C), what is the best practice to implement the following scenario?
1. rdd = sc.textFile(FileA)
2. rdd = rdd.map(...) // actually modifying the rdd
3. rdd.saveAsTextFile(FileA)
Since RDD transformations are 'lazy', the RDD will not materialize until
saveAsTextFile(), so FileA must still ex
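One hedged sketch of a workaround (assumptions: there is room for a second copy; the paths and the transform() helper are hypothetical; note that cache() plus count() alone is not a hard guarantee, since cached partitions can still be evicted and recomputed from the input):

JavaRDD<String> rdd = sc.textFile("FileA").map(line -> transform(line));  // transform() is hypothetical
rdd.saveAsTextFile("FileA_new");   // materialize to a different path while FileA still exists
// then swap the paths outside Spark (e.g. FileSystem.rename()) once the job has succeeded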
Hi,
How can I dispose of an Accumulator?
It has no method like the 'unpersist()' that Broadcast provides.
Thanks.
I'm trying to run spark-shell on Hadoop yarn.
Specifically, the environment is as follows:
- Client
- OS: Windows 7
- Spark version: 1.0.0-SNAPSHOT (git cloned 2014.5.8)
- Server
- Platform: hortonworks sandbox 2.1
I modified the spark code to apply
https://issues.apache.org/jira/browse/YAR