Hi,
I followed the information at
https://www.mail-archive.com/reviews@spark.apache.org/msg141113.html to
save an ORC file with Spark 1.2.1.
I can save data to a new ORC file. How can I save data to an
existing, partitioned ORC file? Any suggestions?
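For reference, a minimal sketch of appending to an existing partitioned ORC dataset, assuming a later release (Spark 1.4+) where the DataFrameWriter API is available; the staging table, path and partition columns below are placeholders:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
val newRows = hiveContext.table("staging_table")   // hypothetical source of the new rows

newRows.write
  .format("orc")
  .mode(SaveMode.Append)                 // keep the existing data, add the new rows
  .partitionBy("year", "month")          // placeholder partition columns
  .save("hdfs:///data/orc_table")        // placeholder path of the existing dataset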
BR,
Patcharee
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Best,
Patcharee
Hi,
What can cause this error: ERROR cluster.YarnScheduler: Lost
executor? How can I fix it?
Best,
Patcharee
at com.sun.proxy.$Proxy37.alter_partition(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:469)
... 26 more
BR,
Patcharee
943, chunkIndex=1},
buffer=FileSegmentManagedBuffer{file=/hdisk3/hadoop/yarn/local/usercache/patcharee/appcache/application_1432633634512_0213/blockmgr-12d59e6b-0895-4a0e-9d06-152d2f7ee855/09/shuffle_0_56_0.data,
offset=896, length=1132499356}} to /10.10.255.238:35430; closing connection
It is 1.3.1. Is the problem from
https://issues.apache.org/jira/browse/SPARK-4516?
Best,
Patcharee
On 3 June 2015 10:11, Akhil Das wrote:
Which version of Spark? It looks like you are hitting this one:
https://issues.apache.org/jira/browse/SPARK-4516
Thanks
Best Regards
On Wed, Jun 3, 2015 at 1
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Best,
Patcharee
Hi,
I had this problem before; in my case it was because the
executor/container was killed by YARN when it used more memory than was
allocated. You can check whether your case is the same by looking at the
YARN node manager log.
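If that turns out to be the cause, one common mitigation is to leave more off-heap headroom per executor. A sketch, assuming the Spark 1.x YARN property names; the values are placeholders to tune for your job:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "4g")                  // heap requested per executor
  .set("spark.yarn.executor.memoryOverhead", "1024")   // extra MB YARN reserves on top of the heap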
Best,
Patcharee
On 5 June 2015 07:25, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
I see this
# partitions). At foreach there are > 1000 tasks as
well, but only 50 tasks (the same as the number of all key combinations) get
datasets. How can I fix this problem? Any suggestions are appreciated.
BR,
Patcharee
Hi,
I am trying to insert data into a partitioned Hive table. The groupByKey is
to combine the dataset into a partition of the Hive table. After the
groupByKey, I converted the Iterable[X] to a DataFrame with X.toList.toDF(),
but hiveContext.sql throws a NullPointerException, see below. Any
suggestions? What c
Hi,
How can I expect HiveContext to work on the executors? If only the
driver can see the HiveContext, does that mean I have to collect all the
datasets (very large) to the driver and use the HiveContext there? That
would overload the driver's memory and fail.
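A sketch of the driver-side pattern that avoids needing HiveContext on executors, assuming hiveContext and a DataFrame df (built on the driver, e.g. with rdd.toDF()) that still contains the partition columns; the table and column names are placeholders:

import org.apache.spark.sql.SaveMode

// Let Hive dynamic partitioning route each row to its partition on write,
// so no groupByKey or executor-side HiveContext is needed.
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

df.write
  .mode(SaveMode.Append)
  .partitionBy("zone", "year", "month")
  .saveAsTable("partitioned_table")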
BR,
Patcharee
On 7 June 2015 11:51
Hi,
Thanks for your guidelines. I will try it out.
By the way, how do you know that HiveContext.sql (and also
DataFrame.registerTempTable) is only expected to be invoked on the driver
side? Where can I find the documentation?
BR,
Patcharee
On 7 June 2015 16:40, Cheng Lian wrote:
Spark SQL supports Hive dynamic
...\":true,\"metadata\":{}},{\"name\":\"v\",\"type\":\"float\",\"nullable\":true,\"metadata\":{}},{\"name\":\"zone\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:59)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28)
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410)
at org.apache.spark.sql.SQLContext$implicits$.rddToDataFrameHolder
")
.mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("test4DimBySpark")
---
The table contains 23 columns (more than the maximum Tuple arity of 22), so
I use a Row object to store the raw data, not a Tuple.
I found that if I move the partitioned columns in schemaString and in the
Row to the end of the sequence, then it works correctly...
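A sketch of that column ordering, with only a few of the 23 columns shown and the data column names used as placeholders; the data columns come first, the partition columns last, and the Row fields follow the same order as the schema (hiveContext is assumed to exist):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("u", FloatType, nullable = true),       // data columns first
  StructField("v", FloatType, nullable = true),
  StructField("zone", IntegerType, nullable = true),  // partition columns last
  StructField("z", IntegerType, nullable = true),
  StructField("year", IntegerType, nullable = true),
  StructField("month", IntegerType, nullable = true)))

val rows = sc.parallelize(Seq(Row(1.0f, 2.0f, 2, 42, 2015, 6)))
hiveContext.createDataFrame(rows, schema)
  .write
  .mode(org.apache.spark.sql.SaveMode.Append)
  .partitionBy("zone", "z", "year", "month")
  .saveAsTable("test4DimBySpark")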
On 16 June 2015 11:14, patcharee wrote:
Hi,
I am using Spark 1.4 and HiveContext to append data into a partitioned
Hive table. I found that the data inserted into the
Hi,
I am having this problem on Spark 1.4. Do you have any ideas on how to
solve it? I tried to use spark.executor.extraClassPath, but it did not help.
BR,
Patcharee
On 4 May 2015 23:47, Imran Rashid wrote:
Oh, this seems like a real pain. You should file a jira, I didn't see
an open
I can also use dataframe. Any suggestions?
Best,
Patcharee
On 20 April 2016 10:43, Gourav Sengupta wrote:
Is there any reason why you are not using data frames?
Regards,
Gourav
On Tue, Apr 19, 2016 at 8:51 PM, pth001 <patcharee.thong...@uni.no> wrote:
Hi,
How ca
Thanks in advance!
Patcharee
Hi,
How can I visualize real-time data (in a graph/chart) from Spark Streaming?
Any tools?
Best,
Patcharee
rises up to 10,000, stays at
10,000 for a while and drops to about 7,000-8,000.
- When clients = 20,000 the event rate rises up to 20,000, stays at
20,000 for a while and drops to about 15,000-17,000. The same pattern.
Processing time is only about 400 ms.
Any ideas/suggestions?
Thanks,
Patcharee
().print()
The problem is that sometimes the data received from ssc.textFileStream is
ONLY ONE line, but in fact there are multiple lines in the new file
found in that interval. See the log below, which shows three intervals. In
the 2nd interval, the new file is:
hdfs://helmhdfs/user/patcharee/cerdata
I moved them every interval to the monitored directory.
Patcharee
On 25. jan. 2016 22:30, Shixiong(Ryan) Zhu wrote:
Did you move the file into "hdfs://helmhdfs/user/patcharee/cerdata/",
or write into it directly? `textFileStream` requires that files must
be written to the monitored
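A sketch of the move-into-place pattern that requirement implies, assuming the files are first written to a staging directory on the same HDFS filesystem; the file names here are placeholders:

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)

// Write the complete file outside the monitored directory, then rename it in.
// A rename within one HDFS filesystem is atomic, so textFileStream sees the
// whole file at once instead of a partially written one.
val staged    = new Path("hdfs://helmhdfs/user/patcharee/staging/batch-001.txt")
val monitored = new Path("hdfs://helmhdfs/user/patcharee/cerdata/batch-001.txt")
fs.rename(staged, monitored)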
Hi,
In PySpark, how do I filter rows where a DataFrame column is not empty?
I tried:
dfNotEmpty = df.filter(df['msg']!='')
It did not work.
Thanks,
Patcharee
the topic's partitions). However, some executors are given more than one
task and work on these tasks sequentially.
Why does Spark not distribute these 10 tasks to 10 executors? How can I do
that?
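One knob that is often suggested for this symptom, as a sketch only and assuming the tasks are stacking on a few executors because of locality preferences rather than a resource shortage:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.locality.wait", "0")   // do not hold a task waiting for a data-local executor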
Thanks,
Patcharee
to force Spark SQL to use fewer tasks?
BR,
Patcharee
Hi,
I am using Spark SQL 1.5 to query a Hive table stored as partitioned ORC
files. We have about 6000 files in total, and each file is about 245 MB.
What is the difference between these two query methods below?
1. Querying the Hive table directly:
hiveContext.sql("select col1,
Yes, the predicate pushdown is enabled, but it still takes longer than
the first method.
BR,
Patcharee
On 8 Oct. 2015 18:43, Zhan Zhang wrote:
Hi Patcharee,
Did you enable the predicate pushdown in the second method?
Thanks.
Zhan Zhang
On Oct 8, 2015, at 1:43 AM, patcharee wrote:
Hi
this time a pushdown predicate was generated in the log, but the results
were wrong (no results at all):
15/10/09 18:36:06 INFO OrcInputFormat: ORC pushdown predicate: leaf-0 =
(EQUALS x 320)
expr = leaf-0
Any ideas what is wrong with this? Why is the ORC pushdown predicate not
applied by the system?
BR
I set hiveContext.setConf("spark.sql.orc.filterPushdown", "true"), but
according to the log there is no ORC pushdown predicate for my query with a
WHERE clause:
15/10/09 19:16:01 DEBUG OrcInputFormat: No ORC pushdown predicate
I do not understand what is wrong here.
BR,
Patcharee
On
Hi,
Is it possible to execute native system commands (in parallel) from Spark,
like scala.sys.process?
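A minimal sketch of one common way to do this with scala.sys.process inside an RDD transformation; the command and inputs below are placeholders:

import scala.sys.process._

// One external command per element; each task shells out on whichever
// executor it runs on, so the commands run in parallel across the cluster.
val inputs  = sc.parallelize(1 to 4)
val results = inputs.map { i => (i, Seq("echo", s"processing $i").!!.trim) }
results.collect().foreach { case (i, out) => println(s"$i -> $out") }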
Best,
Patcharee
The problem is that each group after filtering is handled by an executor one
by one. How can I change the code to allow each group to run in parallel?
I looked at groupBy, but it seems to be only for aggregation.
Thanks,
Patcharee
Hi,
In Spark Streaming, how can I count the total number of messages (from a
socket) in one batch?
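A minimal sketch, assuming a socket source on a placeholder host/port; DStream.count() yields one element per batch holding the number of records received in that batch:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc   = new StreamingContext(sc, Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)   // placeholder host/port

lines.count().print()   // prints the per-batch record count

// Or, if the number is needed inside the job:
lines.foreachRDD { rdd => println(s"messages in this batch: ${rdd.count()}") }

ssc.start()
ssc.awaitTermination()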
Thanks,
Patcharee
Hi,
On my history server UI, I cannot see the "Streaming" tab for any streaming
jobs. I am using version 1.5.1. Any ideas?
Thanks,
Patcharee
I meant there is no Streaming tab at all. It looks like I need version 1.6.
Patcharee
On 2 Dec. 2015 11:34, Steve Loughran wrote:
The history UI doesn't update itself for live apps (SPARK-7889), though I'm
working on it.
Are you trying to view a running streaming job?
On 2 Dec 2
Hi,
How can I see the summary of data read/write, shuffle read/write,
etc. for an application as a whole, not per stage?
Thanks,
Patcharee
do I need to configure
the history UI somehow to get such an interface?
Thanks,
Patcharee
I ran streaming jobs, but no streaming tab appeared for those jobs.
Patcharee
On 4 Dec. 2015 18:12, PhuDuc Nguyen wrote:
I believe the "Streaming" tab is dynamic - it appears once you have a
streaming job running, not when the cluster is simply up. It does not
depend on 1.6 and h
log of these two input splits (check python.PythonRunner:
Times: total ... )
15/12/08 07:37:15 INFO rdd.NewHadoopRDD: Input split:
hdfs://helmhdfs/user/patcharee/ntap-raw-20151015-20151126/html2/budisansblog.blogspot.com.html:39728447488+134217728
15/12/08 08:49:30 INFO python.PythonRunner
y configuration explicitly? Any suggestions?
BR,
Patcharee
and low GC
time as the others.
What can impact the executor computing time? Any suggestions on which
parameters I should monitor/configure?
BR,
Patcharee
.scala:143
15/09/16 11:21:08 INFO DAGScheduler: Got job 2 (saveAsTextFile at
GenerateHistogram.scala:143) with 1 output partitions
15/09/16 11:21:08 INFO DAGScheduler: Final stage: ResultStage
2(saveAsTextFile at GenerateHistogram.scala:143)
BR,
could not find function "rbga"
at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:51)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala
Any ide
differently on job submit and shell?
Best,
Patcharee
inferred type arguments
[no.uni.computing.io.WRFIndex,no.uni.computing.io.WRFVariable,no.uni.computing.io.input.NetCDFFileInputFormat]
do not conform to method newAPIHadoopFile's type parameter bounds [K,V,F
<: org.apache.hadoop.mapreduce.InputFormat[K,V]]
What is the correct syntax for the Scala API?
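A sketch using the overload that takes the classes explicitly, assuming NetCDFFileInputFormat ultimately extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat[WRFIndex, WRFVariable] (the key/value classes must match the InputFormat's type parameters); the input path is a placeholder:

import no.uni.computing.io.{WRFIndex, WRFVariable}
import no.uni.computing.io.input.NetCDFFileInputFormat

val rdd = sc.newAPIHadoopFile(
  "hdfs:///path/to/netcdf",            // placeholder input path
  classOf[NetCDFFileInputFormat],
  classOf[WRFIndex],
  classOf[WRFVariable],
  sc.hadoopConfiguration)

rdd.take(5).foreach { case (k, v) => println(s"$k -> $v") }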
Best
This is the declaration of my custom inputformat
public class NetCDFFileInputFormat extends ArrayBasedFileInputFormat
public abstract class ArrayBasedFileInputFormat extends
org.apache.hadoop.mapreduce.lib.input.FileInputFormat
Best,
Patcharee
On 25. feb. 2015 10:15, patcharee wrote:
Hi
complain. Please let me know if this solution is
not good enough.
Patcharee
On 25. feb. 2015 10:57, Sean Owen wrote:
OK, from the declaration you sent me separately:
public class NetCDFFileInputFormat extends ArrayBasedFileInputFormat
public abstract class ArrayBasedFileInputFormat extends
thread "main" org.apache.spark.SparkException: Job aborted
due to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not
serializable result: no.uni.computing.io.WRFVariableText
Any ideas?
Best,
Patcharee
belongs to a method of a case class, it should be executed
sequentially? Any ideas?
Best,
Patcharee
---
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:313)
at scala.None
Hi,
How can I insert data from an RDD into an existing Hive table?
Any examples?
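A minimal sketch, assuming Spark 1.3+ (where toDF() is available) and a hypothetical record type whose fields match the existing table's columns; the table names are placeholders:

import org.apache.spark.sql.hive.HiveContext

case class Record(key: Int, value: String)   // hypothetical row type

val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

val rdd = sc.parallelize(Seq(Record(1, "a"), Record(2, "b")))

// Either register the data and insert with HiveQL ...
rdd.toDF().registerTempTable("staging")
hiveContext.sql("INSERT INTO TABLE existing_table SELECT * FROM staging")

// ... or insert the DataFrame directly (column order must match the table).
rdd.toDF().insertInto("existing_table")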
Best,
Patcharee
Hi,
I guess the toDF() API is in Spark 1.3, which requires building from
source code?
Patcharee
On 3 March 2015 13:42, Cheng, Hao wrote:
Using the SchemaRDD / DataFrame API via HiveContext
Assume you're using the latest code, something probably like:
val hc = new HiveContext(sc)
i
month, zone) is from user input. If I want to get the value of the
partitioned column from the temporary table, how can I do that?
BR,
Patcharee
I would like to insert into the table, and the value of the partition column
to be inserted must come from the temporarily registered table/DataFrame.
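A sketch of a dynamic-partition insert where the partition column values come from the registered temporary table itself, assuming hiveContext and a DataFrame df; the table and non-partition column names are placeholders:

hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

df.registerTempTable("staging")

// The partition columns are listed without fixed values and are filled from
// the last columns of the SELECT, row by row.
hiveContext.sql(
  """INSERT INTO TABLE target_table PARTITION (year, month, zone)
    |SELECT value, year, month, zone FROM staging""".stripMargin)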
Patcharee
On 16 March 2015 15:26, Cheng Lian wrote:
Not quite sure whether I understand your question properly. But if you
just want to read the
spark.yarn.historyServer.address sandbox.hortonworks.com:19888
But got Exception in thread "main" java.lang.ClassNotFoundException:
org.apache.spark.deploy.yarn.history.YarnHistoryProvider
What class is really needed? How can I fix it?
Br,
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:183)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Patcharee
On 18 March 2015 11:35, Akhil Das wrote:
You can simply
Hi,
My Spark was compiled with the yarn profile, and I can run Spark on YARN
without problems.
For the Spark job history server problem, I checked
spark-assembly-1.3.0-hadoop2.4.0.jar and found that the package
org.apache.spark.deploy.yarn.history is missing. I don't know why.
BR,
Patc
Hello,
How can I override log4j.properties for a specific Spark job?
BR,
Patcharee
table(key INT, value STRING) stored as orc")
hiveContext.hql("INSERT INTO table orc_table select * from testtable")
--> Caused by:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Permission denied: user=patcharee, access=WRITE,
inod
factor of the time spent on these steps?
BR,
Patcharee
not sorted / indexed
- the split strategy hive.exec.orc.split.strategy
BR,
Patcharee
On 10/09/2015 08:01 PM, Zhan Zhang wrote:
That is weird. Unfortunately, there is no debug info available on this
part. Can you please open a JIRA to add some debug information on the
driver side?
Thanks.
Zhan
Hi Zhan Zhang,
Here is the issue https://issues.apache.org/jira/browse/SPARK-11087
BR,
Patcharee
On 10/13/2015 06:47 PM, Zhan Zhang wrote:
Hi Patcharee,
I am not sure which side is wrong, driver or executor. If it is
executor side, the reason you mentioned may be possible. But if the
?
Thanks,
Patcharee
Hi,
Is there a counter for data-local reads? I thought it was the locality
level counter, but it seems not.
Thanks,
Patcharee
Hi,
In Python, how can I use an InputFormat / custom RecordReader?
Thanks,
Patcharee
)
Any ideas? I tested the same code in the Spark shell and it worked.
Best,
Patcharee
tory.value / "lib"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql&quo
der.java:177)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:102)
at org.apache.spark.launcher.Main.main(Main.java:74)
Any ideas?
Patcharee