Re: Spark1.3.1 build issue with CDH5.4.0 getUnknownFields

2015-05-29 Thread Chen Song
l.proto.ClientNamenodeProtocolProtos$SetOwnerRequestProto >> overrides final method *getUnknownFields*.()Lcom/google/protobuf/UnknownFieldSet; >> at java.lang.ClassLoader.defineClass1(Native Method) >> at java.lang.ClassLoader.defineClass(ClassLoader.java:800) -- Chen Song

(de)serialize DStream

2015-07-07 Thread Chen Song
In Spark Streaming, when using updateStateByKey, it requires the generated DStream to be checkpointed. It seems that it always uses JavaSerializer, no matter what I set for spark.serializer. Can I use KryoSerializer for checkpointing? If not, I assume the key and value types have to be Serializable
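A minimal Scala sketch of the pattern under discussion, assuming a socket source and a hypothetical checkpoint path; per the thread, the checkpoint data goes through Java serialization regardless of spark.serializer, so the key and value types must be java.io.Serializable (String and Int are):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("update-state-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///tmp/state-checkpoint")  // hypothetical path

    // updateStateByKey forces checkpointing of the state DStream; the state
    // types here (String, Int) are Serializable, so Java serialization works.
    val pairs = ssc.socketTextStream("localhost", 9999).map(word => (word, 1))
    val counts = pairs.updateStateByKey[Int] { (newValues: Seq[Int], state: Option[Int]) =>
      Some(newValues.sum + state.getOrElse(0))
    }
    counts.print()
    ssc.start()
    ssc.awaitTermination()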

Spark serialization in closure

2015-07-09 Thread Chen Song
ion is, how can I prevent Spark from serializing the parent class where the RDD is defined, while still supporting functions defined in other classes? -- Chen Song

Re: Spark serialization in closure

2015-07-09 Thread Chen Song
ing even though I only pass foo (another serializable > > object) in the closure? > > > > A more general question is, how can I prevent Spark from serializing the > > parent class where the RDD is defined, while still supporting > > functions defined in other classes? > > > > -- > > Chen Song -- Chen Song

Re: Spark serialization in closure

2015-07-09 Thread Chen Song
t 4:09 PM, Chen Song wrote: > Thanks Erik. I saw the document too. That is why I am confused because as > per the article, it should be good as long as *foo* is serializable. > However, what I have seen is that it would work if *testing* is > serializable, even if foo is not serializable,

Re: Spark serialization in closure

2015-07-09 Thread Chen Song
further in the foreachPartition closure. So following >> that article, Scala will attempt to serialize the containing object/class >> testing to get the foo instance. >> >> On Thu, Jul 9, 2015 at 4:11 PM, Chen Song wrote: >> >>> Repost the code exa
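The workaround this thread converges on is to copy the needed field into a local val, so the closure captures only that value rather than the enclosing instance. A sketch, with Foo and Testing as stand-ins for the classes discussed:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    class Foo extends Serializable {
      def process(s: String): Unit = println(s)
    }

    class Testing(sc: SparkContext) {  // not Serializable, and no longer needs to be
      val foo = new Foo

      def run(rdd: RDD[String]): Unit = {
        // The local val is what the closure captures, so Spark never
        // attempts to serialize the enclosing Testing instance.
        val localFoo = foo
        rdd.foreachPartition { iter =>
          iter.foreach(localFoo.process)
        }
      }
    }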

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
ce of data loss which can >>>>>>> be alleviated using Write Ahead Logs (see streaming programming guide for >>>>>>> more >>>>>>> details, or see my talk [Slides PDF >>>>>>> <http://www.slideshare.net/SparkSummit/recipes-for-running-spark-streaming-apploications-in-production-tathagata-daspptx> >>>>>>> , Video >>>>>>> <https://www.youtube.com/watch?v=d5UJonrruHk&list=PL-x35fyliRwgfhffEpywn4q23ykotgQJ6&index=4> >>>>>>> ] from last Spark Summit 2015). But that approach can give >>>>>>> duplicate records. The direct approach gives exactly-once guarantees, so >>>>>>> you should try it out. >>>>>>> >>>>>>> TD >>>>>>> >>>>>>> On Fri, Jun 26, 2015 at 5:46 PM, Cody Koeninger >>>>>>> wrote: >>>>>>>> Read the spark streaming guide and the kafka integration guide for a >>>>>>>> better understanding of how the receiver-based stream works. >>>>>>>> Capacity planning is specific to your environment and what the job >>>>>>>> is actually doing; you'll need to determine it empirically. >>>>>>>> On Friday, June 26, 2015, Shushant Arora >>>>>>>> wrote: >>>>>>>>> In 1.2, how to handle offset management after the stream application >>>>>>>>> starts in each job? Should I commit the offset manually after job completion? >>>>>>>>> And what is the recommended number of consumer threads? Say I have 300 >>>>>>>>> partitions in the kafka cluster. Load is ~1 million events per second. Each >>>>>>>>> event is ~500 bytes. Are 5 receivers with 60 partitions each receiver >>>>>>>>> sufficient for spark streaming to consume? >>>>>>>>> On Fri, Jun 26, 2015 at 8:40 PM, Cody Koeninger < >>>>>>>>> c...@koeninger.org> wrote: >>>>>>>>>> The receiver-based kafka createStream in spark 1.2 uses zookeeper >>>>>>>>>> to store offsets. If you want finer-grained control over offsets, >>>>>>>>>> you can >>>>>>>>>> update the values in zookeeper yourself before starting the job. >>>>>>>>>> createDirectStream in spark 1.3 is still marked as experimental, >>>>>>>>>> and subject to change. That being said, it works better for me in >>>>>>>>>> production than the receiver-based api. >>>>>>>>>> On Fri, Jun 26, 2015 at 6:43 AM, Shushant Arora < >>>>>>>>>> shushantaror...@gmail.com> wrote: >>>>>>>>>>> I am using spark streaming 1.2. >>>>>>>>>>> If processing executors get crashed, will the receiver reset the >>>>>>>>>>> offset back to the last processed offset? >>>>>>>>>>> If the receiver itself crashed, is there a way to reset the >>>>>>>>>>> offset without restarting the streaming application, other than smallest or >>>>>>>>>>> largest? >>>>>>>>>>> Is spark streaming 1.3, which uses the low-level consumer api, >>>>>>>>>>> stable? And which is recommended for handling data loss, 1.2 or 1.3? >>>> -- >>>> Best Regards, >>>> Ayan Guha -- Chen Song

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
the offset ranges for a given rdd in the stream by >> typecasting to HasOffsetRanges. You can then store the offsets wherever >> you need to. >> >> On Tue, Jul 14, 2015 at 5:00 PM, Chen Song >> wrote: >> >>> A follow up question.
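A sketch of the HasOffsetRanges typecast described above, assuming `stream` was built with KafkaUtils.createDirectStream (Spark 1.3+) and with the offset store left as a placeholder:

    import org.apache.spark.streaming.kafka.HasOffsetRanges

    stream.foreachRDD { rdd =>
      // Only the direct stream's RDDs implement HasOffsetRanges.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { r =>
        println(s"${r.topic}/${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
      }
      // ... process rdd, then persist `ranges` to whatever offset store you use
    }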

rest on streaming

2015-07-14 Thread Chen Song
batch interval. * Is that even possible? * The reason I think of this is because a user can get a list of RDDs by DStream.window.slice but I cannot find a way to get the most recent RDD in the DStream. -- Chen Song

Re: rest on streaming

2015-07-14 Thread Chen Song
n before asynchronous query has completed. Use this. > > streamingContext.remember(<duration to remember the latest RDD>) > > On Tue, Jul 14, 2015 at 6:57 PM, Chen Song wrote: >> I have been doing a POC adding a rest service in a Spark Streaming job. Say I >> create a stateful DStream X by us
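Putting the advice together, a sketch with the retention window and batch interval as assumptions (`ssc` and the stateful `stateStream` come from the surrounding job):

    import org.apache.spark.streaming.{Minutes, Time}

    // Keep generated RDDs from being released before the asynchronous
    // REST query has run against them.
    ssc.remember(Minutes(5))

    // In the REST handler: slice out the most recent batch's RDD.
    val batchMs = 10000L  // assumed batch interval in ms
    val now = System.currentTimeMillis() / batchMs * batchMs  // align to a batch boundary
    val latest = stateStream.slice(new Time(now - batchMs), new Time(now)).lastOption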

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
Thanks TD. As for 1), if timing is not guaranteed, how are exactly-once semantics supported? It feels like exactly-once receiving is not necessarily exactly-once processing. Chen On Tue, Jul 14, 2015 at 10:16 PM, Tathagata Das wrote: > > > On Tue, Jul 14, 2015 at 6:42 PM, Chen So

NotSerializableException in spark 1.4.0

2015-07-15 Thread Chen Song
(StreamingContext.scala:586) -- Chen Song

Re: NotSerializableException in spark 1.4.0

2015-07-15 Thread Chen Song
fixed in 1.4.1, about to come out in the > next couple of days. That should help you narrow down the culprit > preventing serialization. > > On Wed, Jul 15, 2015 at 1:12 PM, Ted Yu wrote: >> Can you show us your function(s)? >> >> Thanks

Re: NotSerializableException in spark 1.4.0

2015-07-15 Thread Chen Song
RK-8091] Fix a number of >> SerializationDebugger bugs and limitations >> ... >> Closes #6625 from tdas/SPARK-7180 and squashes the following commits: >> >> On Wed, Jul 15, 2015 at 2:37 PM, Chen Song >> wrote: >> >>> Thanks >>> >>> Can y

spark classloader question

2016-07-06 Thread Chen Song
Hi, I ran into problems using class loaders in Spark. In my code (run within an executor), I explicitly load classes using the ContextClassLoader as below. Thread.currentThread().getContextClassLoader() The jar containing the classes to be loaded is added via the --jars option in spark-shell/spark-s

Re: spark classloader question

2016-07-07 Thread Chen Song
Sorry to spam people who are not interested. Greatly appreciate it if anyone who is familiar with this can share some insights. On Wed, Jul 6, 2016 at 2:28 PM Chen Song wrote: > Hi > > I ran into problems using class loaders in Spark. In my code (run within > an executor), I exp

Re: spark classloader question

2016-07-07 Thread Chen Song
marco > > On Thu, Jul 7, 2016 at 5:05 PM, Chen Song wrote: > >> Sorry to spam people who are not interested. Greatly appreciate it if >> anyone who is familiar with this can share some insights. >> >> On Wed, Jul 6, 2016 at 2:28 PM Chen Song wrote: >>

Re: spark classloader question

2016-07-07 Thread Chen Song
y > break other things (like dependencies that the Spark master required at > initialization being overridden by the Spark app, and so on), so you will need to verify. > > [1] https://spark.apache.org/docs/latest/configuration.html > > On Thu, Jul 7, 2016 at 4:05 PM, Chen Song wrote:
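For reference, a sketch of the two pieces this thread touches: loading a --jars class from the executor's context classloader, and the (experimental) userClassPathFirst switches on the configuration page linked above. The class name is hypothetical:

    // Inside a task on the executor: classes shipped with --jars are visible
    // to the context classloader Spark installs for task threads.
    val cls = Thread.currentThread().getContextClassLoader
      .loadClass("com.example.MyPlugin")  // hypothetical class
    val instance = cls.newInstance()

    // If a conflicting version ships with Spark itself, these switches may
    // help, with the caveats described above:
    //   spark.driver.userClassPathFirst=true
    //   spark.executor.userClassPathFirst=true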

question on spark.streaming.kafka.maxRetries

2016-02-02 Thread Chen Song
For Kafka direct stream, is there a way to set the time between successive retries? From my testing, it looks like it is 200ms. Any way I can increase the time?
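A guess worth verifying against your version's DirectKafkaInputDStream: the 200 ms looks like Kafka's refresh.leader.backoff.ms default, which the direct stream sleeps between retries, so raising it in kafkaParams should stretch the interval:

    // Assumption: the retry backoff comes from the Kafka consumer config
    // refresh.leader.backoff.ms (default 200 ms). Broker list is a placeholder.
    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092",
      "refresh.leader.backoff.ms" -> "1000"
    )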

Access batch statistics in Spark Streaming

2016-02-08 Thread Chen Song
Apologize in advance if someone has already asked and addressed this question. In Spark Streaming, how can I programmatically get the batch statistics like scheduling delay, total delay and processing time (they are shown in the streaming tab of the job UI)? I need such information to raise alerts in some
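One programmatic route is a StreamingListener, which receives the same per-batch numbers the streaming tab renders. A sketch; the threshold and alert hook are assumptions:

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    ssc.addStreamingListener(new StreamingListener {
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
        val info = batch.batchInfo
        val schedulingDelay = info.schedulingDelay.getOrElse(-1L)  // ms
        val processingDelay = info.processingDelay.getOrElse(-1L)  // ms
        val totalDelay      = info.totalDelay.getOrElse(-1L)       // ms
        if (totalDelay > 60000L) {
          println(s"ALERT: total delay $totalDelay ms")  // hypothetical alert hook
        }
      }
    })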

Re: Notification on Spark Streaming job failure

2015-09-28 Thread Chen Song
What's your opinion about my options? How do you people solve this > problem? Anything Spark specific? > I'll be grateful for any advice on this subject. > Thanks! > Krzysiek > > -- Chen Song

question on make multiple external calls within each partition

2015-10-05 Thread Chen Song
We have a use case with the following design in Spark Streaming. Within each batch, * data is read and partitioned by some key * forEachPartition is used to process the entire partition * within each partition, there are several REST clients created to connect to different REST services * for the
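A sketch of the per-partition client pattern described, with a stub standing in for the real REST clients (the client class and endpoints are hypothetical, and `stream` is assumed to be a DStream[String]):

    // Stand-in for whatever blocking HTTP client the job really uses.
    class RestClient(baseUrl: String) {
      def lookup(key: String): String = scala.io.Source.fromURL(baseUrl + key).mkString
      def close(): Unit = ()
    }

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Clients are built on the executor, once per partition; they never
        // travel through a closure, so they need not be serializable.
        val serviceA = new RestClient("http://service-a/lookup/")
        val serviceB = new RestClient("http://service-b/lookup/")
        try {
          records.foreach { r =>
            val a = serviceA.lookup(r)
            val b = serviceB.lookup(r)
            // ... combine a and b and emit the result
          }
        } finally {
          serviceA.close(); serviceB.close()
        }
      }
    }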

Re: question on make multiple external calls within each partition

2015-10-07 Thread Chen Song
6:07 PM, Ashish Soni wrote: > >> Need more details, but you might want to filter the data first (create >> multiple RDDs) and then process. >> >> > On Oct 5, 2015, at 8:35 PM, Chen Song wrote: >> > >> > We have a use case with the followin

kerberos question

2015-11-03 Thread Chen Song
so, how to handle failover of namenodes for a streaming job in Spark. Thanks for your feedback in advance. -- Chen Song

Re: kerberos question

2015-11-04 Thread Chen Song
is right? The relevant class is below: https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala Chen On Tue, Nov 3, 2015 at 11:57 AM, Chen Song wrote: > We saw the following error happening in Spark Streaming job. Our job is
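The AMDelegationTokenRenewer linked above is driven by shipping a keytab; a sketch of the corresponding Spark 1.4+ YARN settings (principal and path are placeholders; --principal and --keytab on spark-submit are the equivalent flags):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Lets the YARN AM re-login periodically and renew HDFS delegation
      // tokens, which is what long-running streaming jobs need.
      .set("spark.yarn.principal", "user@EXAMPLE.COM")
      .set("spark.yarn.keytab", "/path/to/user.keytab")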

build spark 1.4.1 with JDK 1.6

2015-08-21 Thread Chen Song
that is because the hadoop jar for cdh5.4.0 is built with JDK 7. Has anyone done this before? Thanks, -- Chen Song

Re: build spark 1.4.1 with JDK 1.6

2015-08-21 Thread Chen Song
Thanks Sean. So how is PySpark supported? I thought PySpark needed JDK 1.6. Chen On Fri, Aug 21, 2015 at 11:16 AM, Sean Owen wrote: > Spark 1.4 requires Java 7. > > On Fri, Aug 21, 2015, 3:12 PM Chen Song wrote: >> I tried to build Spark 1.4.1 on cdh 5.4.0. Because

application logs for long lived job on YARN

2015-08-26 Thread Chen Song
gated logs on hdfs after the job is killed or failed. Is there a YARN config to keep the logs from being deleted for a long-lived streaming job? -- Chen Song

Re: JDBC Streams

2015-08-26 Thread Chen Song
h spark streaming > (which kind of blocks your resources). On the other hand, if you use job > server then you can actually use the resources (CPUs) for other jobs also > when your dbjob is not using them. > > Thanks > Best Regards > > On Sun, Jul 5, 2015 at 5:28 AM, ayan guha wrote: > > Hi All > > I have a requirement to connect to a DB every few minutes and bring data to > HBase. Can anyone suggest if spark streaming would be appropriate for this > scenario or I should look into jobserver? > > Thanks in advance > > -- > Best Regards, > Ayan Guha -- Chen Song

Re: JDBC Streams

2015-08-26 Thread Chen Song
nly changes every few days, why not restart the job every > few days, and just broadcast the data? > > Or you can keep a local per-jvm cache with an expiry (e.g. guava cache) to > avoid many mysql reads > > On Wed, Aug 26, 2015 at 9:46 AM, Chen Song wrote: > >> Piggyback on
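The per-JVM cache suggestion sketched with Guava, one instance per executor JVM so MySQL is hit at most once per expiry window; the expiry and the JDBC lookup are assumptions:

    import java.util.concurrent.TimeUnit
    import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

    object DimCache {
      // A Scala object is instantiated once per JVM, i.e. once per executor.
      val cache: LoadingCache[String, String] =
        CacheBuilder.newBuilder()
          .expireAfterWrite(10, TimeUnit.MINUTES)  // assumed expiry
          .build(new CacheLoader[String, String] {
            override def load(key: String): String = loadFromMysql(key)
          })

      def loadFromMysql(key: String): String = ???  // hypothetical JDBC lookup
    }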

Re: application logs for long lived job on YARN

2015-08-27 Thread Chen Song
Anyone have a similar problem or thoughts on this? On Wed, Aug 26, 2015 at 10:37 AM, Chen Song wrote: > When running a long-lived job on YARN like Spark Streaming, I found that > container logs are gone after days on executor nodes, although the job itself > is still running. > > > I

Kafka Direct Stream join without data shuffle

2015-09-02 Thread Chen Song
data. Each partition in derivedStream only needs to be joined with the corresponding partition in the original parent inputStream it is generated from. My questions are: 1. Is there a Partitioner defined in KafkaRDD at all? 2. How would I preserve the partitioning scheme and avoid data shuffle? --
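For what it's worth, KafkaRDD does not expose a Partitioner, so a plain join will shuffle. If derivedStream really is a per-partition transform of inputStream (same number and order of partitions), a shuffle-free combine can be sketched with zipPartitions; the merge logic is a placeholder and both streams are assumed to be DStream[String]:

    import org.apache.spark.rdd.RDD

    // Valid only if the two RDDs have identical partitioning lineage.
    val combined = derived.transformWith(input, (dRdd: RDD[String], iRdd: RDD[String]) =>
      dRdd.zipPartitions(iRdd) { (dIter, iIter) =>
        dIter.zip(iIter)  // placeholder per-partition merge
      }
    )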

Re: Does it has a way to config limit in query on STS by default?

2016-08-30 Thread Chen Song
I tried both of the following with STS but neither works for me. Starting STS with --hiveconf hive.limit.optimize.fetch.max=50 and setting common.max_count in Zeppelin. Without setting such limits, a query that outputs lots of rows could cause the driver to OOM and makes STS unusable. Any workaro

Spark Streaming action not triggered with Kafka inputs

2015-01-23 Thread Chen Song
quot;Streaming" tab, it show messages below: Batch Processing Statistics No statistics have been generated yet. Am I doing anything wrong on the parallel receiving part? -- Chen Song

org.apache.spark.SparkException Error sending message

2015-03-13 Thread Chen Song
-- Chen Song

shuffle write size

2015-03-17 Thread Chen Song
I have a map reduce job that reads from three logs and joins them on some key column. The underlying data is protobuf messages in sequence files. Between mappers and reducers, the underlying raw byte arrays for protobuf messages are shuffled. Roughly, for 1G input from HDFS, there is 2G data outpu

Re: shuffle write size

2015-03-26 Thread Chen Song
Can anyone shed some light on this? On Tue, Mar 17, 2015 at 5:23 PM, Chen Song wrote: > I have a map reduce job that reads from three logs and joins them on some > key column. The underlying data is protobuf messages in sequence > files. Between mappers and reducers, the underlying

FetchFailedException during shuffle

2015-03-26 Thread Chen Song
Using spark 1.3.0 on cdh5.1.0, I was running into a fetch failed exception. I searched this email list but found nothing like this reported. What could be the reason for the error? org.apache.spark.shuffle.FetchFailedException: [EMPTY_INPUT] Cannot decompress empty stream at org.apach

spark streaming driver hang

2015-03-27 Thread Chen Song
g. Has anyone seen this before? -- Chen Song

Fwd: spark streaming questions

2014-06-16 Thread Chen Song
imply windowing scenario, as which date the data belongs to is not the same thing as when the data arrives). How can I handle this to tell Spark to discard data for old dates? Thank you, Best Chen -- Chen Song

spark streaming questions

2014-06-17 Thread Chen Song
Hey I am new to spark streaming and apologize if these questions have been asked. * In StreamingContext, reduceByKey() seems to only work on the RDDs of the current batch interval, not including RDDs of previous batches. Is my understanding correct? * If the above statement is correct, what func
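On the first question: yes, reduceByKey is scoped to a single batch. Aggregating across batches is what the windowed variants (or updateStateByKey) are for; a sketch, with the window sizes as assumptions and `pairs` an assumed DStream[(String, Int)]:

    import org.apache.spark.streaming.Seconds

    val windowedCounts = pairs.reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b,  // associative reduce over the window
      Seconds(300),               // window length (assumption)
      Seconds(10))                // slide interval (assumption)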

Re: spark streaming questions

2014-06-25 Thread Chen Song
Thanks Anwar. On Tue, Jun 17, 2014 at 11:54 AM, Anwar Rizal wrote: > > On Tue, Jun 17, 2014 at 5:39 PM, Chen Song wrote: > >> Hey >> >> I am new to spark streaming and apologize if these questions have been >> asked. >> >> * In StreamingContext, re

semi join spark streaming

2014-06-25 Thread Chen Song
job to load the set of values from disk and cache to each worker? -- Chen Song
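The usual answer is to load the small side once on the driver and broadcast it, so each worker filters locally with no shuffle; a sketch with a hypothetical path and `stream` assumed to be a DStream[(String, String)]:

    val keys = sc.textFile("hdfs:///path/to/keys").collect().toSet  // hypothetical path
    val keysBc = sc.broadcast(keys)

    // A broadcast-set filter is effectively a shuffle-free semi join.
    val matched = stream.filter { case (k, _) => keysBc.value.contains(k) }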

spark streaming counter metrics

2014-06-30 Thread Chen Song
I am new to spark streaming and wondering if spark streaming tracks counters (e.g., how many rows in each consumer, how many rows routed to an individual reduce task, etc.) in any form so I can get an idea of how data is skewed? I checked the Spark job page but didn't find any. -- Chen Song

spark streaming rate limiting from kafka

2014-07-01 Thread Chen Song
ually for spark streaming and doesn't seem to be a question for this group, but I am raising it here anyway.) I have looked at the code example below but it doesn't seem to be supported. KafkaUtils.createStream ... Thanks, All -- Chen Song
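Later Spark releases grew throttling knobs for exactly this; a sketch (the values are examples):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Receiver-based streams (KafkaUtils.createStream): records/sec per receiver.
      .set("spark.streaming.receiver.maxRate", "10000")
      // Direct streams (Spark 1.3+): records/sec per Kafka partition.
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")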

Re: spark streaming counter metrics

2014-07-02 Thread Chen Song
t a lot of pieces to > make sense of the flow. > > Mayur Rustagi > Ph: +1 (760) 203 3257 > http://www.sigmoidanalytics.com > @mayur_rustagi <https://twitter.com/mayur_rustagi> > > On Mon, Jun 30, 2014 at 9:58 PM, Chen Song wrote: >> I am new to

how to pass extra Java opts to workers for spark streaming jobs

2014-07-17 Thread Chen Song
. Any idea on how I should do this? -- Chen Song

Re: spark streaming rate limiting from kafka

2014-07-17 Thread Chen Song
Thanks Luis and Tobias. On Tue, Jul 1, 2014 at 11:39 PM, Tobias Pfeiffer wrote: > Hi, > > On Wed, Jul 2, 2014 at 1:57 AM, Chen Song wrote: >> >> * Is there a way to control how far Kafka Dstream can read on >> topic-partition (via offset for example). By setting th

Re: how to pass extra Java opts to workers for spark streaming jobs

2014-07-17 Thread Chen Song
t works. > Andrew > > > 2014-07-17 18:15 GMT-07:00 Tathagata Das : > > Can you check in the environment tab of Spark web ui to see whether this >> configuration parameter is in effect? >> >> TD >> >> >> On Thu, Jul 17, 2014 at 2:05 PM, Che

Re: how to pass extra Java opts to workers for spark streaming jobs

2014-07-18 Thread Chen Song
-XX:+UseConcMarkSweepGC -Dspark.config.one=value > -Dspark.config.two=value" > > (Please note that this is only for Spark 0.9. The part where we set Spark > options within SPARK_JAVA_OPTS is deprecated as of 1.0) > > > 2014-07-17 21:08 GMT-07:00 Chen Song : > > Thanks Andr
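The Spark 1.0+ equivalent of the deprecated SPARK_JAVA_OPTS route quoted above, as a sketch (the option values are examples; note that spark.* settings don't belong in this property):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.extraJavaOptions",
           "-XX:+UseConcMarkSweepGC -Dconfig.one=value -Dconfig.two=value")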

Re: spark streaming rate limiting from kafka

2014-07-18 Thread Chen Song
Jul 18, 2014 at 8:11 AM, Bill Jay >> wrote: >> >>> I also have an issue consuming from Kafka. When I consume from Kafka, >>> there are always a single executor working on this job. Even I use >>> repartition, it seems that there is still a single executor. Does an

Re: Distribute data from Kafka evenly on cluster

2014-07-18 Thread Chen Song
many connections are made to Kafka. Note that the number of >> Kafka partitions for that topic must be at least N for this to work. >> >> Thanks >> Tobias >> > > -- Chen Song

spark streaming multiple file output paths

2014-08-07 Thread Chen Song
/day. -- Chen Song

increase parallelism of reading from hdfs

2014-08-08 Thread Chen Song
, no matter what value I set dfs.blocksize on driver/executor. I am wondering if there is a way to increase parallelism, say let each map read 128M data and increase the number of map tasks? -- Chen Song

Re: saveAsTextFiles file not found exception

2014-08-11 Thread Chen Song
, Jul 25, 2014 at 1:32 PM, Bill Jay >> wrote: >> >>> Hi, >>> >>> I am running a Spark Streaming job that uses saveAsTextFiles to save >>> results into hdfs files. However, it hit an exception after 20 batches: >>> >>> result-140631234/_temporary/0/task_201407251119__m_03 does not >>> exist. >>> >>> When the job is running, I do not change any file in the folder. Does >>> anyone know why the file cannot be found? >>> >>> Thanks! >>> >>> Bill -- Chen Song

Re: saveAsTextFiles file not found exception

2014-08-11 Thread Chen Song
The exception was thrown in the application master (spark streaming driver) and the job shut down after this exception. On Mon, Aug 11, 2014 at 10:29 AM, Chen Song wrote: > I got the same exception after the streaming job runs for a while. The > ERROR message was complaining about a tem

Re: saveAsTextFiles file not found exception

2014-08-11 Thread Chen Song
Bill Did you get this resolved somehow? Anyone has any insight into this problem? Chen On Mon, Aug 11, 2014 at 10:30 AM, Chen Song wrote: > The exception was thrown out in application master(spark streaming driver) > and the job shut down after this exception. > > > On Mon,

Re: saveAsTextFiles file not found exception

2014-08-11 Thread Chen Song
n seeing similar stacktraces on Spark core (not streaming) > and have a theory it's related to spark.speculation being turned on. Do > you have that enabled by chance? > > > On Mon, Aug 11, 2014 at 8:10 AM, Chen Song wrote: > >> Bill >> >> Did you get this r

Re: increase parallelism of reading from hdfs

2014-08-11 Thread Chen Song
could probably set the system property > "mapreduce.input.fileinputformat.split.maxsize". > > Regards, > Paul Hamilton > > From: Chen Song > Date: Friday, August 8, 2014 at 9:13 PM > To: "user@spark.apache.org" > Subject: increase parallelism of reading from hdfs > > > In S
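Both routes from this thread, sketched with example sizes and counts:

    // Shrink the max split size so more map tasks are created...
    sc.hadoopConfiguration.set(
      "mapreduce.input.fileinputformat.split.maxsize",
      (128L * 1024 * 1024).toString)

    // ...or request a minimum partition count directly.
    val lines = sc.textFile("hdfs:///path/to/input", minPartitions = 400)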

Re: saveAsTextFiles file not found exception

2014-08-12 Thread Chen Song
re's no reliable way to run large jobs on Spark >> right now! >> >> I'm going to file a bug for the _temporary and LeaseExpiredExceptions as >> I think these are widespread enough that we need a place to track a >> resolution. >> >> >&

spark job hung up

2014-09-20 Thread Chen Song
/16 06:48:22 INFO BlockManagerMaster: Trying to register BlockManager 14/09/16 06:48:22 INFO BlockManagerMaster: Registered BlockManager 14/09/16 06:48:22 INFO BlockManager: Reporting 63 blocks to the master. -- Chen Song

exception in spark 1.1.0

2014-09-20 Thread Chen Song
) -- Chen Song

Re: exception in spark 1.1.0

2014-09-22 Thread Chen Song
:56 libsnappy.so -> libsnappy.so.1.1.3 Chen On Sat, Sep 20, 2014 at 9:31 AM, Ted Yu wrote: > Can you tell us how you installed native snappy ? > > See D.3.1.5 in: > http://hbase.apache.org/book.html#snappy.compression.installation > > Cheers > > On Sat, Sep 20, 2014 at 2:1

spark time out

2014-09-22 Thread Chen Song
I am using Spark 1.1.0 and have seen a lot of Fetch Failures due to the following exception. java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:854)

Re: spark time out

2014-09-23 Thread Chen Song
I am running the job on 500 executors, each with 8G and 1 core. I see lots of fetch failures in the reduce stage when running a simple reduceByKey (map tasks: 4000, reduce tasks: 200). On Mon, Sep 22, 2014 at 12:22 PM, Chen Song wrote: > I am using Spark 1.1.0 and have seen a lot

Re: Spark SQL and Hive tables

2014-09-30 Thread Chen Song
> val trainingDataTable = sql("""SELECT prod.prod_num, > demographics.gender, demographics.birth_year, demographics.income_group > FROM prod p JOIN demographics d ON d.user_id = p.user_id""") > > 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch > MultiInstanceRelations > 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch > CaseInsensitiveAttributeReferences > java.lang.RuntimeException: Table Not Found: prod. > > I have these tables in hive. I used the show tables command to confirm this. > Can someone please let me know how do I make them accessible here? > > -- Chen Song

Does SparkSQL work with custom defined SerDe?

2014-10-13 Thread Chen Song
ion.executedPlan(SQLContext.scala:406) at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438) at com.appnexus.data.spark.sql.Test$.main(Test.scala:23) at com.appnexus.data.spark.sql.Test.main(Test.scala) ... 5 more -- Chen Song

Re: Does SparkSQL work with custom defined SerDe?

2014-10-14 Thread Chen Song
attributes) when running simple selects on valid column names and malformed column names. This led me to suspect that something breaks syntactically somewhere. select [valid_column] from table limit 5; select [malformed_typo_column] from table limit 5; On Mon, Oct 13, 2014 at 6:04 PM, Chen Song wrote: > In H

Re: Does SparkSQL work with custom defined SerDe?

2014-10-14 Thread Chen Song
Looks like it may be related to https://issues.apache.org/jira/browse/SPARK-3807. I will build from branch 1.1 to see if the issue is resolved. Chen On Tue, Oct 14, 2014 at 10:33 AM, Chen Song wrote: > Sorry for bringing this out again, as I have no clue what could have > caused this.

Re: Shuffle files

2014-10-20 Thread Chen Song
his message in context: >>> http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-files-tp15185p15869.html >>> Sent from the Apache Spark User List mailing list archive at Nabble.com. -- Chen Song

serialize protobuf messages

2014-12-23 Thread Chen Song
Silly question, what is the best way to shuffle protobuf messages in a Spark (Streaming) job? Can I use Kryo on top of the protobuf Message type? -- Chen Song

Re: "Fetch Failure"

2014-12-23 Thread Chen Song
User List mailing list archive at Nabble.com. -- Chen Song

Re: serialize protobuf messages

2014-12-30 Thread Chen Song
Anyone have suggestions? On Tue, Dec 23, 2014 at 3:08 PM, Chen Song wrote: > Silly question, what is the best way to shuffle protobuf messages in a Spark > (Streaming) job? Can I use Kryo on top of the protobuf Message type? > > -- > Chen Song > > -- Chen Song
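One known approach, sketched under the assumption that the chill-protobuf artifact is on the classpath and MyProto.Event is a generated message class: register the protobuf types with Kryo via Twitter chill's ProtobufSerializer:

    import com.esotericsoftware.kryo.Kryo
    import com.twitter.chill.protobuf.ProtobufSerializer
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    class ProtoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        // Hypothetical generated protobuf class.
        kryo.register(classOf[MyProto.Event], new ProtobufSerializer)
      }
    }

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "com.example.ProtoRegistrator")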