Re: RDD block has negative value in Spark UI

2022-12-07 Thread Stelios Philippou
Already a known minor issue https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-10141 On Wed, 7 Dec 2022, 15:09 K B M Kaala Subhikshan, < kbmkaalasubhiks...@gmail.com> wrote: > Could you explain why the RDD block has a negat

Memory leak while caching in foreachBatch block

2022-08-10 Thread kineret M
Hi, We have a structured streaming application, and we face a memory leak while caching in the foreachBatch block. We do unpersist every iteration, and we also verify via "spark.sparkContext.getPersistentRDDs" that we don't have unnecessary cached data. We also noted in the pro
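For context, a minimal sketch of the cache-then-unpersist pattern inside foreachBatch that the poster describes; the streamingDf name and the sink paths are illustrative assumptions, not the poster's actual code:

    // Cache the micro-batch, reuse it for several sinks, then release it every iteration.
    streamingDf.writeStream.foreachBatch { (batchDf: org.apache.spark.sql.DataFrame, batchId: Long) =>
      batchDf.persist()
      try {
        batchDf.write.mode("append").parquet("/tmp/out-a")   // hypothetical sink A
        batchDf.write.mode("append").parquet("/tmp/out-b")   // hypothetical sink B
      } finally {
        batchDf.unpersist()   // matches the "unpersist every iteration" described in the post
      }
    }.start()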

Block fetching fails due to change in local address

2020-08-17 Thread Samik R
Hello, Recently faced a strange problem. I was running a job on my laptop with deploy mode as client and context as local[*]. In between I lost connection to my router, and when I got back the connection, the laptop was assigned a different internal IP address. The j

[Debug] [Spark Core 2.4.4] org.apache.spark.storage.BlockException: Negative block size -9223372036854775808

2020-06-29 Thread Adam Tobey
Hi, I'm encountering a strange exception in spark 2.4.4 (on AWS EMR 5.29): org.apache.spark.storage.BlockException: Negative block size -9223372036854775808. I've seen this mostly from this line (for remote blocks) org.apache.spark.storage.ShuffleBlockFetcherIterato

Re: [pyspark 2.3+] read/write huge data with smaller block size (128MB per block)

2020-06-19 Thread Rishi Shah
Yes you'll generally get 1 partition per block, and 1 task per partition. > The amount of RAM isn't directly relevant; it's not loaded into memory. > But you may nevertheless get some improvement with larger partitions / > tasks, though typically only if your tasks are very s

Re: [pyspark 2.3+] read/write huge data with smaller block size (128MB per block)

2020-06-19 Thread Sean Owen
Yes you'll generally get 1 partition per block, and 1 task per partition. The amount of RAM isn't directly relevant; it's not loaded into memory. But you may nevertheless get some improvement with larger partitions / tasks, though typically only if your tasks are very small and very
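As a concrete illustration of the "larger partitions" point for file-based DataFrame sources, a sketch; the 256 MB figure and the input path are arbitrary assumptions:

    // Pack up to ~256 MB of input into each read partition instead of the 128 MB default.
    spark.conf.set("spark.sql.files.maxPartitionBytes", 256L * 1024 * 1024)
    val df = spark.read.parquet("/data/huge")        // hypothetical input
    println(df.rdd.getNumPartitions)                 // fewer, larger partitions -> fewer, larger tasks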

Re: Spark SQL met "Block broadcast_xxx not found"

2019-05-07 Thread Jacek Laskowski
Hi, I'm curious about "I found the bug code". Can you point me at it? Thanks. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https:

Re: Spark SQL met "Block broadcast_xxx not found"

2019-05-07 Thread Xilang Yan
Ok... I am sure it is a bug in Spark. I found the bug code, but the code is removed in 2.2.3, so I just upgraded Spark to fix the problem. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Spark SQL met "Block broadcast_xxx not found"

2019-04-27 Thread Xilang Yan
- Error opening block StreamChunkId{streamId=365584526097, chunkIndex=0} for request from /10.33.46.33:19866 org.apache.spark.storage.BlockNotFoundException: Block broadcast_334_piece0 not found at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:361) ~[spark-core_2.11-2.2.1

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Shivam Sharma
Thanks Arnaud On Mon, Jan 21, 2019 at 2:07 PM Arnaud LARROQUE wrote: > Hi Shivam, > > At the end, the file is taking its own space regardless of the block size. > So if your file is just a few KB, it will take only those few KB. > But I've noticed t

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Arnaud LARROQUE
Hi Shivam, At the end, the file is taking its own space regardless of the block size. So if your file is just a few KB, it will take only those few KB. But I've noticed that when the file is written, somehow a block is allocated and the Namenode considers that all the blo

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Shivam Sharma
Don't we have any property for it? One more quick question: if files created by Spark are smaller than the HDFS block size, will the rest of the block space become unavailable and remain unutilized, or will it be shared with other files? On Mon, Jan 21, 2019 at 1:30 PM Shivam Sharma <28s

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Hichame El Khalfi
You can do this in 2 passes (not one): A) save your dataset into HDFS as you have it. B) calculate the number of partitions, n = (size of your dataset) / (HDFS block size). Then run a simple Spark job to read it and repartition based on 'n'. Hichame From: felixcheun...@hotmail.com Sent: January 19, 20
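A rough Scala sketch of that two-pass idea (the path and the 128 MB block size are assumptions; the original was described only in prose):

    import org.apache.hadoop.fs.{FileSystem, Path}

    // Pass A has already written the data; Pass B measures it and rewrites ~1 partition per HDFS block.
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val sizeBytes = fs.getContentSummary(new Path("/warehouse/mytable")).getLength
    val n = math.max(1, (sizeBytes / (128L * 1024 * 1024)).toInt)
    spark.read.parquet("/warehouse/mytable").repartition(n).write.parquet("/warehouse/mytable_compacted")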

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Felix Cheung
You can call coalesce to combine partitions.. From: Shivam Sharma <28shivamsha...@gmail.com> Sent: Saturday, January 19, 2019 7:43 AM To: user@spark.apache.org Subject: Persist Dataframe to HDFS considering HDFS Block Size. Hi All, I wanted to persist dat

Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Shivam Sharma
Hi All, I wanted to persist a dataframe on HDFS. Basically, I am inserting data into a Hive table using Spark. Currently, at the time of writing to the Hive table I have set total shuffle partitions = 400, so 400 files are being created, which does not even take the HDFS block size into account. How can I tell

Re: ALS block settings

2018-10-23 Thread evanzamir
I have the same question. Trying to figure out how to get ALS to complete with a larger dataset. It seems to get stuck on "Count" from what I can tell. I'm running 8 r4.4xlarge instances on Amazon EMR. The dataset is 80 GB (just to give some idea of size). I assumed Spark could handle this, but maybe

Limit the block size of data received by spring streaming receiver

2018-01-07 Thread Xilang Yan
Hey, We use a customized receiver to receive data from our MQ. We use def store(dataItem: T) to store data; however, I found the block sizes can vary a lot, from 0.5K to 5M, so the per-partition processing time is very different. Shuffle is an option, but I want to avoid it. I
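Two settings that influence receiver block sizes without introducing a shuffle, shown as a sketch (the values are illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Blocks are cut every blockInterval (default 200ms); a smaller interval yields smaller blocks.
      .set("spark.streaming.blockInterval", "50ms")
      // Capping the per-receiver rate keeps any single block from growing to several MB.
      .set("spark.streaming.receiver.maxRate", "10000")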

java IllegalStateException: unread block data Exception - setBlockDataMode

2017-07-11 Thread Kanagha
TID 3, ..): java.lang.IllegalStateException: unread block data at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2449) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1385) at java.io.ObjectInputStream.defaultReadF

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread Marco Mistroni
w what is a connection between RDD and blocks (I >> know that for every batch one RDD is produced)? what is a block in this >> context? is it a disk block ? if so, what is it default size? and Finally, >> why does the following error happens so often? >> >> java.lang.Except

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread kant kodali
nection between RDD and blocks (I know > that for every batch one RDD is produced)? what is a block in this context? > is it a disk block ? if so, what is it default size? and Finally, why does > the following error happens so often? > > java.lang.Exception: Could not comput

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread kant kodali
My batch interval is 1s, slide interval is 1s, and window interval is 1 minute. I am using a standalone cluster. I don't have any storage layer like HDFS, so I don't know what the connection between RDDs and blocks is (I know that for every batch one RDD is produced). What is a block in this co

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-12-01 Thread kant kodali
public void call(JavaPairRDD >> stringIntegerJavaPairRDD) throws Exception { >> Map map = new HashMap<>(); >> Gson gson = new Gson(); >> stringIntegerJavaPairRDD >> .col

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
.forEach((Tuple2 KV) -> { > String status = KV._1(); > Integer count = KV._2(); > map.put(status, count); > } >

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
} ); NSQReceiver.send(producer, "output_777", gson.toJson(map).getBytes()); } }); Thanks, kant On Wed, Nov 30, 2016 at 2:11 PM, Marco Mistroni wrote: > Could you paste reproducible snippet code? > Kr > > On 30 Nov 2016 9

Re: java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread Marco Mistroni
Could you paste reproducible snippet code? Kr On 30 Nov 2016 9:08 pm, "kant kodali" wrote: > I have lot of these exceptions happening > > java.lang.Exception: Could not compute split, block input-0-1480539568000 > not found > > > Any ideas what this could be? >

java.lang.Exception: Could not compute split, block input-0-1480539568000 not found

2016-11-30 Thread kant kodali
I have a lot of these exceptions happening java.lang.Exception: Could not compute split, block input-0-1480539568000 not found Any ideas what this could be?

WARN 1 block locks were not released with MLlib ALS

2016-11-04 Thread Mikael Ståldal
I get a few warnings like this in Spark 2.0.1 when using org.apache.spark.mllib.recommendation.ALS: WARN org.apache.spark.executor.Executor - 1 block locks were not released by TID = 1448: [rdd_239_0] What can be the reason for that? -- *Mikael Ståldal* Senior software

Re: ALS.trainImplicit block sizes

2016-10-21 Thread Nick Pentreath
k nodes). All are 7.5 > gig machines. > > On Fri, Oct 21, 2016 at 12:15 AM, Nick Pentreath > wrote: > > How many nodes are you using in the cluster? > > > > On Fri, 21 Oct 2016 at 08:58 Nikhil Mishra > wrote: > > Thanks Nick. > > So we do partition U

Re: ALS.trainImplicit block sizes

2016-10-21 Thread Nick Pentreath
> > So we do partition U x I matrix into BxB matrices, each of size around U/B > and I/B. Is that correct? Do you know whether a single block of the matrix > is represented in memory as a full matrix or as sparse matrix? I ask this > because my job has been failing for block sizes

Re: ALS.trainImplicit block sizes

2016-10-21 Thread Nick Pentreath
How many nodes are you using in the cluster? On Fri, 21 Oct 2016 at 08:58 Nikhil Mishra wrote: > Thanks Nick. > > So we do partition U x I matrix into BxB matrices, each of size around U/B > and I/B. Is that correct? Do you know whether a single block of the matrix > is repres

Re: ALS.trainImplicit block sizes

2016-10-20 Thread Nick Pentreath
about the block size to be specified in > ALS.trainImplicit() in pyspark (Spark 1.6.1). There is only one block size > parameter to be specified. I want to know if that would result in > partitioning both the users as well as the items axes. > > For example, I am using the following c

ALS.trainImplicit block sizes

2016-10-20 Thread Nikhil Mishra
Hi, I have a question about the block size to be specified in ALS.trainImplicit() in pyspark (Spark 1.6.1). There is only one block size parameter to be specified. I want to know whether that would result in partitioning both the user and the item axes. For example, I am using the following
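For reference, the equivalent RDD-based MLlib call in Scala; the single blocks argument is applied to both the user and the item axes when only one value is given (the numbers below are placeholders):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    import org.apache.spark.rdd.RDD

    val ratings: RDD[Rating] = ???   // implicit-feedback ratings, assumed to exist
    val model = ALS.trainImplicit(
      ratings,
      10,     // rank
      10,     // iterations
      0.01,   // lambda
      100,    // blocks: partitions both the user and the item axes
      1.0)    // alpha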

Spark 2.0.0 Error Caused by: java.lang.IllegalArgumentException: requirement failed: Block broadcast_21_piece0 is already present in the MemoryStore

2016-10-11 Thread sandesh deshmane
nd.java:128) App > at py4j.commands.CallCommand.execute(CallCommand.java:79) App > at py4j.GatewayConnection.run(GatewayConnection.java:211) App > at java.lang.Thread.run(Thread.java:745) App > Caused by: java.io.IOException: java.lang.IllegalArgumentException: requirement failed: Block broad

Re: Fw: Spark + Parquet + IBM Block Storage at Bluemix

2016-09-25 Thread Mario Ds Briggs
14/09/2016 01:19 am Subject: Re: Fw: Spark + Parquet + IBM Block Storage at Bluemix Hi Mario, Thanks for your help, so I will keep using CSVs. Best, Daniel Lopes Chief Data and Analytics Officer | OneMatch c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes www.onematch.c

Re: Fw: Spark + Parquet + IBM Block Storage at Bluemix

2016-09-13 Thread Daniel Lopes
in case you've not seen this... > > From: Adam Roberts/UK/IBM > To: Mario Ds Briggs/India/IBM@IBMIN > Date: 12/09/2016 09:37 pm > Subject: Fw: Spark + Parquet + IBM Block Storage at Bluemix > -- > > > Mario, in case you've not seen th

Re: Spark + Parquet + IBM Block Storage at Bluemix

2016-09-13 Thread Steve Loughran
nematch.com.br>> wrote: Hi, someone can help I'm trying to use parquet in IBM Block Storage at Spark but when I try to load get this error: using this config credentials = { "name": "keystone", "auth_url": "https://ident

Re: Fw: Spark + Parquet + IBM Block Storage at Bluemix

2016-09-12 Thread Mario Ds Briggs
hanks Mario From: Adam Roberts/UK/IBM To: Mario Ds Briggs/India/IBM@IBMIN Date: 12/09/2016 09:37 pm Subject: Fw: Spark + Parquet + IBM Block Storage at Bluemix Mario, in case you've not

Re: Spark + Parquet + IBM Block Storage at Bluemix

2016-09-12 Thread Daniel Lopes
=daniel-lopes> On Sun, Sep 11, 2016 at 3:28 PM, Steve Loughran wrote: > > On 9 Sep 2016, at 17:56, Daniel Lopes wrote: > > Hi, someone can help > > I'm trying to use parquet in IBM Block Storage at Spark but when I try to > load get this error: > > using this config

Re: Spark + Parquet + IBM Block Storage at Bluemix

2016-09-11 Thread Steve Loughran
On 9 Sep 2016, at 17:56, Daniel Lopes mailto:dan...@onematch.com.br>> wrote: Hi, someone can help I'm trying to use parquet in IBM Block Storage at Spark but when I try to load get this error: using this config credentials = { "name": "keysto

Spark + Parquet + IBM Block Storage at Bluemix

2016-09-09 Thread Daniel Lopes
Hi, can someone help? I'm trying to use Parquet in IBM Block Storage with Spark, but when I try to load I get this error, using this config: credentials = { "name": "keystone", "auth_url": "https://identity.open.softlayer.com <https:/

Spark 2.0.0 RC 5 -- java.lang.AssertionError: assertion failed: Block rdd_[*] is not locked for reading

2016-07-24 Thread Ameen Akel
Hello, I'm working with Spark 2.0.0-rc5 on Mesos (v0.28.2) on a job with ~600 cores. Every so often, depending on the task that I've run, I'll lose an executor to an assertion. Here's an example error: java.lang.AssertionError: assertion failed: Block rdd_2659_0 is not lock

Specifying Fixed Duration (Spot Block) for AWS Spark EC2 Cluster

2016-07-04 Thread nsharkey
find anything that works with the Spark script. My current (working) script will allow me to get Spot requests but I can't specify a duration: ./spark-ec2 \ --key-pair= \ --identity-file= \ --instance-type=r3.8xlarge \ -s 2 \ --spot-price=0.75 \ --block-duration-minutes 12

Re: Dataframe to parquet using hdfs or parquet block size

2016-04-07 Thread Buntu Dev
I tried setting both the hdfs and parquet block size but write to parquet did not seem to have had any effect on the total number of blocks or the average block size. Here is what I did: sqlContext.setConf("dfs.blocksize", "134217728") sqlContext.setConf("parque

Dataframe to parquet using hdfs or parquet block size

2016-04-06 Thread bdev
I need to save the dataframe to parquet format and need some input on choosing the appropriate block size to help efficiently parallelize/localize the data to the executors. Should I be using parquet block size or hdfs block size and what is the optimal block size to use on a 100 node cluster

Re: Parquet block size from spark-sql cli

2016-01-28 Thread Ted Yu
Have you tried the following (sc is SparkContext)? sc.hadoopConfiguration.setInt("parquet.block.size", BLOCK_SIZE) On Thu, Jan 28, 2016 at 9:16 AM, ubet wrote: > Can I set the Parquet block size (parquet.block.size) in spark-sql. We are > loading about 80 table partitions in p
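Ted's suggestion expanded into a minimal sketch (the 128 MB value, table name, and output path are assumptions):

    // Parquet reads its row-group ("block") size from the Hadoop configuration.
    val blockSize = 128 * 1024 * 1024
    sc.hadoopConfiguration.setInt("parquet.block.size", blockSize)
    sc.hadoopConfiguration.setInt("dfs.blocksize", blockSize)   // keep the HDFS block aligned with it
    sqlContext.table("my_table").write.parquet("/tmp/out")      // hypothetical table and path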

Parquet block size from spark-sql cli

2016-01-28 Thread ubet
Can I set the Parquet block size (parquet.block.size) in spark-sql. We are loading about 80 table partitions in parallel on 1.5.2 and run OOM. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Parquet-block-size-from-spark-sql-cli-tp26097.html Sent from the

[Spark Streaming] "Could not compute split, block input-0-1452563923800 not found” when trying to recover from checkpoint data

2016-01-13 Thread Collin Shi
Hi, I was doing a simple updateByKey transformation and print on the data received from a socket; the Spark version is 1.4.0. The first submit went all right, but after I killed (CTRL + C) the job and submitted again, Spark apparently tried to recover from the checkpoint data, but then the except

Some tasks take a long time to find local block

2015-12-17 Thread patrick256
15/12/16 09:44:37 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 10 15/12/16 09:44:37 INFO storage.MemoryStore: ensureFreeSpace(1777) called with curMem=908793307, maxMem=5927684014 15/12/16 09:44:37 INFO storage.MemoryStore: Block broadcast_10_piece0 stored as bytes in m

Re: How to use collections inside foreach block

2015-12-10 Thread Madabhattula Rajesh Kumar
>>> one step. >>> >>> I'm planning to join this table in a chunks. For example, each step I >>> will join 5000 ids. >>> >>> Below code is not working. I'm not able to add result to ListBuffer. >>> Result s giving always ZER

Re: How to use collections inside foreach block

2015-12-09 Thread Ted Yu
>> Below code is not working. I'm not able to add result to ListBuffer. >> Result s giving always ZERO >> >> *Code Block :-* >> >> var listOfIds is a ListBuffer with 2 records >> >> listOfIds.grouped(5000).foreach { x => >> { >&

Re: How to use collections inside foreach block

2015-12-09 Thread Rishi Mishra
ample, each step I will > join 5000 ids. > > Below code is not working. I'm not able to add result to ListBuffer. > Result s giving always ZERO > > *Code Block :-* > > var listOfIds is a ListBuffer with 2 records > > listOfIds.grouped(5000).foreach { x =>

How to use collections inside foreach block

2015-12-08 Thread Madabhattula Rajesh Kumar
0 ids. Below code is not working: I'm not able to add results to the ListBuffer; the result is always ZERO. *Code Block :-* var listOfIds is a ListBuffer with 2 records listOfIds.grouped(5000).foreach { x => { var v1 = new ListBuffer[String]() val r = sc.parallelize(x).toDF() r.registe
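The usual cause is that the values are appended inside an RDD/DataFrame foreach, which runs on the executors, so the driver-side ListBuffer never sees the updates. One sketch of a workaround is to collect each chunk's result back to the driver before appending (variable names follow the snippet; the actual join SQL is elided, as in the original):

    import scala.collection.mutable.ListBuffer
    import sqlContext.implicits._                            // needed for toDF(), as in the original snippet

    val v1 = new ListBuffer[String]()
    listOfIds.grouped(5000).foreach { x =>
      val r = sc.parallelize(x).toDF()
      r.registerTempTable("ids_chunk")                       // hypothetical temp table name
      val joined = sqlContext.sql("SELECT * FROM ids_chunk") // real join SQL elided
      v1 ++= joined.collect().map(_.mkString(","))           // collect on the driver, then append
    }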

Re: Error in block pushing thread puts the KinesisReceiver in a stuck state

2015-12-02 Thread Akhil Das
r >> it encounters a the following exception "Error in block pushing thread - >> java.util.concurrent.TimeoutException: Futures timed out". >> I am running the application on spark-1.4.1 and using kinesis-asl-1.4. >> >> When this happens, the observation is th

Re: Error in block pushing thread puts the KinesisReceiver in a stuck state

2015-11-30 Thread Spark Newbie
Pinging again to see if anyone has any thoughts or prior experience with this issue. On Wed, Nov 25, 2015 at 3:56 PM, Spark Newbie wrote: > Hi Spark users, > > I have been seeing this issue where receivers enter a "stuck" state after > it encounters a the following exc

Error in block pushing thread puts the KinesisReceiver in a stuck state

2015-11-25 Thread Spark Newbie
Hi Spark users, I have been seeing this issue where receivers enter a "stuck" state after they encounter the following exception: "Error in block pushing thread - java.util.concurrent.TimeoutException: Futures timed out". I am running the application on spark-1.4.1 and u

streaming+sql with block has been removed error

2015-11-05 Thread ZhuGe
Hi all: I am trying to implement the "spark streaming + sql and dataframe" case described in this post: https://databricks.com/blog/2015/07/30/diving-into-spark-streamings-execution-model.html I use RabbitMQ as the data source. My code sample is like this: countByValueAndWindow(Seconds(5), Second

Re: Spark Streaming: Some issues (Could not compute split, block —— not found) and questions

2015-08-25 Thread Akhil Das
You hit block-not-found issues when your processing time exceeds the batch duration (this happens with receiver-oriented streaming). If you are consuming messages from Kafka then try to use the directStream, or you can also set StorageLevel to MEMORY_AND_DISK with a receiver-oriented consumer. (This
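A sketch of the receiver-oriented option: the storage level is set in the custom receiver's constructor, so MEMORY_AND_DISK(_SER) lets blocks spill to disk instead of being dropped (the receiver itself is a placeholder):

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class MyQueueReceiver(host: String, port: Int)               // hypothetical receiver
        extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {
      def onStart(): Unit = { /* connect and call store(record) */ }
      def onStop(): Unit = { }
    }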

Spark Streaming: Some issues (Could not compute split, block —— not found) and questions

2015-08-19 Thread jlg
gates (there are a lot of repeated keys across this time frame, and we want to combine them all -- we do this using reduceByKeyAndWindow). But even when trying to do 5 minute windows, we have issues with "Could not compute split, block —— not found". This is being run on a YARN cluster an

RE: Failed to fetch block error

2015-08-19 Thread java8964
e.org > Subject: Failed to fetch block error > > Hi, > > I see the following error in my Spark Job even after using like 100 cores > and 16G memory. Did any of you experience the same problem earlier? > > 15/08/18 21:51:23 ERROR shuffle.RetryingBlockFetcher: Failed to f

Failed to fetch block error

2015-08-18 Thread swetha
Hi, I see the following error in my Spark Job even after using like 100 cores and 16G memory. Did any of you experience the same problem earlier? 15/08/18 21:51:23 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block input-0-1439959114400, and will not retry (0 retries

Unread block data error

2015-07-17 Thread Jem Tucker
Hi, I have been running a batch of data through my application for the last couple of days and this morning discovered it had fallen over with the following error. java.lang.IllegalStateException: unread block data at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode

Re: java.lang.IllegalStateException: unread block data

2015-07-14 Thread Arthur Chan
I found the reason, it is about sc. Thanks On Tue, Jul 14, 2015 at 9:45 PM, Akhil Das wrote: > Someone else also reported this error with spark 1.4.0 > > Thanks > Best Regards > > On Tue, Jul 14, 2015 at 6:57 PM, Arthur Chan > wrote: > >> Hi, Below is the log form the worker. >> >> >> 15/07/14

Re: java.lang.IllegalStateException: unread block data

2015-07-14 Thread Akhil Das
Someone else also reported this error with spark 1.4.0 Thanks Best Regards On Tue, Jul 14, 2015 at 6:57 PM, Arthur Chan wrote: > Hi, Below is the log form the worker. > > > 15/07/14 17:18:56 ERROR FileAppender: Error writing stream to file > /spark/app-20150714171703-0004/5/stderr > > java.io.I

Re: java.lang.IllegalStateException: unread block data

2015-07-14 Thread Arthur Chan
Hi, Below is the log form the worker. 15/07/14 17:18:56 ERROR FileAppender: Error writing stream to file /spark/app-20150714171703-0004/5/stderr java.io.IOException: Stream closed at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) at java.io.BufferedInputStream.read1(Buf

Re: java.lang.IllegalStateException: unread block data

2015-07-14 Thread Akhil Das
NFO MemoryStore: ensureFreeSpace(135360) called with > curMem=14724380, maxMem=280248975 > > 15/07/14 18:27:40 INFO MemoryStore: Block broadcast_256 stored as values > in memory (estimated size 132.2 KB, free 253.1 MB) > > 15/07/14 18:27:40 INFO MemoryStore: ensureFreeSpace(46231) ca

java.lang.IllegalStateException: unread block data

2015-07-14 Thread Arthur Chan
4 18:27:40 INFO MemoryStore: ensureFreeSpace(135360) called with curMem=14724380, maxMem=280248975 15/07/14 18:27:40 INFO MemoryStore: Block broadcast_256 stored as values in memory (estimated size 132.2 KB, free 253.1 MB) 15/07/14 18:27:40 INFO MemoryStore: ensureFreeSpace(46231) called with curMe

Re: Re: Re: Re: How to decrease the time of storing block in memory

2015-06-10 Thread luohui20001
w to decrease the time of storing block in memory Date: 2015-06-09 18:05 Hi 罗辉, I think you are interpreting the logs wrong. Your program actually runs from this point (the rest is just startup and connecting): 15/06/08 16:14:22 INFO broadcast.TorrentBroadcast: Started reading broad

Re: Re: Re: How to decrease the time of storing block in memory

2015-06-09 Thread Akhil Das
(1561) called with curMem=0, maxMem=370503843 15/06/08 16:14:23 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1561.0 B, free 353.3 MB) 15/06/08 16:14:23 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0 15/06/08 16:14:23 INFO

Re: Re: Re: How to decrease the time of storing block in memory

2015-06-09 Thread luohui20001
llcompare/data/user" + j + "/pgs/intermediateResult/result" + i + ".txt 600") pipeModify2.collect() sc.stop() } } ---- Thanks&Best regards! San.Luo - Original Message - From: Akhil Das To: 罗辉 Cc: user Subject: Re: Re: How to decrease the time of storing block i

Re: Re: How to decrease the time of storing block in memory

2015-06-09 Thread Akhil Das
Best Regards On Tue, Jun 9, 2015 at 2:09 PM, wrote: > Only 1 minor GC, 0.07s. > > > > > Thanks&Best regards! > San.Luo > > - Original Message - > From: Akhil Das > To: 罗辉 > Cc: user > Subject: Re: How to decrease the time of storing

Re: Re: How to decrease the time of storing block in memory

2015-06-09 Thread luohui20001
Only 1 minor GC, 0.07s. Thanks&Best regards! San.Luo - Original Message - From: Akhil Das To: 罗辉 Cc: user Subject: Re: How to decrease the time of storing block in memory Date: 2015-06-09 15:02 Maybe you should check in your driver UI and see if there's an

Re: How to decrease the time of storing block in memory

2015-06-09 Thread Akhil Das
t time-wasting part is below: > > 15/06/08 16:14:23 INFO storage.MemoryStore: Block broadcast_0 stored as > values in memory (estimated size 2.1 KB, free 353.3 MB) > 15/06/08 16:14:42 INFO executor.Executor: Finished task 0.0 in stage 0.0 > (TID 0). 693 bytes result sent to driver &

How to decrease the time of storing block in memory

2015-06-08 Thread luohui20001
hi there, I am trying to decrease my app's running time on the worker node. I checked the log and found the most time-wasting part is below: 15/06/08 16:14:23 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.1 KB, free 353.3 MB) 15/06/08 16:14:42

Unread block data

2015-05-11 Thread Guy Needham
File at :24 scala> textInput take 10 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 17, hadoop-kn-t503.systems.private): java.lang.IllegalStateException: unread block data at java

Re: How to deal with code that runs before foreach block in Apache Spark?

2015-05-06 Thread Emre Sevinc
Imran, Gerard, Indeed your suggestions were correct and it helped me. Thank you for your replies. -- Emre On Tue, May 5, 2015 at 4:24 PM, Imran Rashid wrote: > Gerard is totally correct -- to expand a little more, I think what you > want to do is a solrInputDocumentJavaRDD.foreachPartition, in

Re: How to deal with code that runs before foreach block in Apache Spark?

2015-05-05 Thread Imran Rashid
Gerard is totally correct -- to expand a little more, I think what you want to do is a solrInputDocumentJavaRDD.foreachPartition, instead of solrInputDocumentJavaRDD.foreach: solrInputDocumentJavaRDD.foreachPartition( new VoidFunction>() { @Override public void call(Iterator docItr) {
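The same pattern in a Scala sketch, since the key point is that the batch is built and sent once per partition, on the executor, instead of mutating a driver-side singleton (the RDD name and the sendToSolr helper are placeholders):

    solrInputDocumentRDD.foreachPartition { docs =>
      val batch = docs.toList          // runs on the executor, one call per partition
      if (batch.nonEmpty) {
        sendToSolr(batch)              // hypothetical helper wrapping the SolrJ client
      }
    }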

Re: How to deal with code that runs before foreach block in Apache Spark?

2015-05-04 Thread Gerard Maas
I'm not familiar with the Solr API but provided that ' SolrIndexerDriver' is a singleton, I guess that what's going on when running on a cluster is that the call to: SolrIndexerDriver.solrInputDocumentList.add(elem) is happening on different singleton instances of the SolrIndexerDriver on diffe

How to deal with code that runs before foreach block in Apache Spark?

2015-05-04 Thread Emre Sevinc
I'm trying to deal with some code that runs differently on Spark stand-alone mode and Spark running on a cluster. Basically, for each item in an RDD, I'm trying to add it to a list, and once this is done, I want to send this list to Solr. This works perfectly fine when I run the following code in

Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-28 Thread Calvin Jia
in core/pom.xml, make-distribution.sh and try to compile again, > many compilation errors raised. > > Thanks, > > > > > -- > View this message in context: > http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-1-3-1-saveAsParquetFile-will-output-tachyo

Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-14 Thread Cheng Lian
loading a bunch of Parquet Files in Tachyon: val ta3 = sqlContext.parquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m"); 2. Second, set the "fs.local.block.size" to 256M to make sure that the block size of the output file

Re: "Could not compute split, block not found" in Spark Streaming Simple Application

2015-04-13 Thread Saiph Kappa
ta Das >> wrote: >> >>> If it is deterministically reproducible, could you generate full DEBUG >>> level logs, from the driver and the workers and give it to me? Basically I >>> want to trace through what is happening to the block that is not being >

Re: "Could not compute split, block not found" in Spark Streaming Simple Application

2015-04-09 Thread Tathagata Das
Mar 27, 2015 at 5:32 PM, Tathagata Das > wrote: > >> If it is deterministically reproducible, could you generate full DEBUG >> level logs, from the driver and the workers and give it to me? Basically I >> want to trace through what is happening to the block that is not be

Re: Continuous WARN messages from BlockManager about block replication

2015-04-09 Thread Tathagata Das
I'm running a spark streaming job in local mode (--master local[4]), and > I'm seeing tons of these messages, roughly once every second - > > WARN BlockManager: Block input-0-1428527584600 replicated to only 0 > peer(s) instead of 1 peers > > We're using spark 1.

Continuous WARN messages from BlockManager about block replication

2015-04-09 Thread Nandan Tammineedi
Hi, I'm running a spark streaming job in local mode (--master local[4]), and I'm seeing tons of these messages, roughly once every second - WARN BlockManager: Block input-0-1428527584600 replicated to only 0 peer(s) instead of 1 peers We're using spark 1.2.1. Even with TRACE
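In local mode there is only one block manager, so the default *_2 receiver storage levels have no peer to replicate to. A sketch of asking for a single replica instead (the socket source is just an example; the thread does not show the actual input source):

    import org.apache.spark.storage.StorageLevel

    // Default for many receivers is MEMORY_AND_DISK_SER_2, which wants a second copy on another peer.
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)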

Re: "Could not compute split, block not found" in Spark Streaming Simple Application

2015-04-09 Thread Saiph Kappa
ver and the workers and give it to me? Basically I > want to trace through what is happening to the block that is not being > found. > And can you tell what Cluster manager are you using? Spark Standalone, > Mesos or YARN? > > > On Fri, Mar 27, 2015 at 10:09 AM, Saiph Kappa &

Re: Spark Streaming Error in block pushing thread

2015-04-02 Thread Dean Wampler
e >>>> RDD 18 - Ask timed out on >>>> [Actor[akka.tcp:// >>>> sparkExecutor@10.1.242.221:43018/user/BlockManagerActor1#-1913092216]] >>>> after [3 ms]} >>>> WARN 2015-04-01 21:00:53,952 >>>> org.apache.spark.storage.BlockManagerMaster

Re: Spark Streaming Error in block pushing thread

2015-04-02 Thread Bill Young
ogWarning.71: Failed to >> remove >> RDD 17 - Ask timed out on >> [Actor[akka.tcp:// >> sparkExecutor@10.1.242.221:43018/user/BlockManagerActor1#-1913092216]] >> after [3 ms]} >> WARN 2015-04-01 21:00:53,952 >> org.apache.spark.storage.Bl

Re: Spark Streaming Error in block pushing thread

2015-04-02 Thread Bill Young
t;> WARN 2015-04-01 21:00:53,952 >>> org.apache.spark.storage.BlockManagerMaster.logWarning.71: Failed to >>> remove >>> RDD 17 - Ask timed out on >>> [Actor[akka.tcp:// >>> sparkExecutor@10.1.242.221:43018/user/BlockManagerActor1#-191309221

Re: Spark Streaming Error in block pushing thread

2015-04-02 Thread Dean Wampler
orage.BlockManagerMaster.logWarning.71: Failed to remove > RDD 16 - Ask timed out on > [Actor[akka.tcp:// > sparkExecutor@10.1.242.221:43018/user/BlockManagerActor1#-1913092216]] > after [3 ms]} > WARN 2015-04-01 21:00:54,151 > org.apache.spark.streaming.scheduler.ReceiverTr

Spark Streaming Error in block pushing thread

2015-04-02 Thread byoung
/BlockManagerActor1#-1913092216]] after [3 ms]} WARN 2015-04-01 21:00:54,151 org.apache.spark.streaming.scheduler.ReceiverTracker.logWarning.71: Error reported by receiver for stream 0: Error in block pushing thread - java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at

Spark streaming error in block pushing thread

2015-04-02 Thread Bill Young
che.spark.storage.BlockManagerMaster.logWarning.71: Failed to remove > RDD 16 - Ask timed out on [Actor[ > akka.tcp://sparkExecutor@10.1.242.221:43018/user/BlockManagerActor1#-1913092216]] > after [3 ms]} > WARN 2015-04-01 21:00:54,151 > org.apache.spark.streaming.sched

Re: "Could not compute split, block not found" in Spark Streaming Simple Application

2015-03-27 Thread Tathagata Das
If it is deterministically reproducible, could you generate full DEBUG level logs, from the driver and the workers and give it to me? Basically I want to trace through what is happening to the block that is not being found. And can you tell what Cluster manager are you using? Spark Standalone

"Could not compute split, block not found" in Spark Streaming Simple Application

2015-03-27 Thread Saiph Kappa
ID 140) 15/03/27 16:21:35 INFO MemoryStore: ensureFreeSpace(1886) called with curMem=47117, maxMem=280248975 15/03/27 16:21:35 INFO MemoryStore: Block broadcast_24_piece0 stored as bytes in memory (estimated size 1886.0 B, free 267.2 MB) 15/03/27 16:21:35 INFO BlockManagerMaster: Updated info of bloc

Re: ShuffleBlockFetcherIterator: Failed to get block(s)

2015-03-20 Thread Imran Rashid
e > beginning fetch of 10 outstanding blocks (after 3 retries) > > 15/03/19 23:29:45 ERROR storage.ShuffleBlockFetcherIterator: Failed to get > block(s) from : >

ShuffleBlockFetcherIterator: Failed to get block(s)

2015-03-20 Thread Eric Friedman
storage.ShuffleBlockFetcherIterator: Failed to get block(s) from :

Re: delay between removing the block manager of an executor, and marking that as lost

2015-03-04 Thread Akhil Das
You can look at the following - spark.akka.timeout - spark.akka.heartbeat.pauses from http://spark.apache.org/docs/1.2.0/configuration.html Thanks Best Regards On Tue, Mar 3, 2015 at 4:46 PM, twinkle sachdeva wrote: > Hi, > > Is there any relation between removing block mana
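A sketch of raising those two settings programmatically (the values are illustrative; both properties are documented on the 1.2 configuration page linked above):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.akka.timeout", "300")            // communication timeout, in seconds
      .set("spark.akka.heartbeat.pauses", "6000")  // acceptable heartbeat pause, in seconds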

delay between removing the block manager of an executor, and marking that as lost

2015-03-03 Thread twinkle sachdeva
Hi, Is there any relation between removing the block manager of an executor and marking that executor as lost? In my setup, even after removing the block manager (after failing to do some operation), it is taking more than 20 minutes to mark that executor as lost. Following are the logs: *15/03/03 10:26:49

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-27 Thread Mukesh Jha
ng task 30.1 in >>> stage 451.0 (TID 22517, chsnmphbase30.usdc2.cloud.com, RACK_LOCAL, 1288 >>> bytes) >>> 15/02/25 05:32:43 INFO scheduler.TaskSetManager: Starting task 33.1 in >>> stage 451.0 (TID 22518, chsnmphbase26.usdc2.cloud.com, RACK_LOCAL, 1288 >>> b

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-27 Thread Mukesh Jha
k 35.1 in >> stage 451.0 (TID 22519, chsnmphbase19.usdc2.cloud.com, RACK_LOCAL, 1288 >> bytes) >> 15/02/25 05:32:43 INFO scheduler.TaskSetManager: Starting task 38.1 in >> stage 451.0 (TID 22520, chsnmphbase23.usdc2.cloud.com, RACK_LOCAL, 1288 >> bytes)

Re: SparkStreaming failing with exception Could not compute split, block input

2015-02-27 Thread Akhil Das
tarting task 38.1 in > stage 451.0 (TID 22520, chsnmphbase23.usdc2.cloud.com, RACK_LOCAL, 1288 > bytes) > 15/02/25 05:32:43 WARN scheduler.TaskSetManager: Lost task 32.1 in stage > 451.0 (TID 22511, chsnmphbase19.usdc2.cloud.com): java.lang.Exception: > Could not compute split, block inp
