Hi,
The method readAllFootersInParallel is implemented in Parquet's
ParquetFileReader, so the Spark config "spark.sql.files.ignoreCorruptFiles"
doesn't apply to it.
Reading all footers in parallel can speed up the task; however, we can't
control whether corrupt files are ignored or not.
Of course we ca
I forgot to say: another option is that we can replace readAllFootersInParallel
with our own parallel reading logic, so we can ignore corrupt files.
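Something along these lines (a rough Scala sketch, not actual Spark code; the
helper name, the IOException-only handling, and the use of parallel collections
are just assumptions for illustration):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileStatus
import org.apache.parquet.format.converter.ParquetMetadataConverter
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.metadata.ParquetMetadata

// Read footers ourselves in parallel and optionally skip files whose footer
// cannot be read, instead of relying on readAllFootersInParallel.
def readFootersIgnoringCorrupt(
    conf: Configuration,
    files: Seq[FileStatus],
    ignoreCorruptFiles: Boolean): Seq[ParquetMetadata] = {
  files.par.flatMap { stat =>
    try {
      Some(ParquetFileReader.readFooter(
        conf, stat.getPath, ParquetMetadataConverter.NO_FILTER))
    } catch {
      case _: java.io.IOException if ignoreCorruptFiles =>
        // Corrupt footer: skip this file instead of failing the whole read.
        None
    }
  }.seq
}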
Liang-Chi Hsieh wrote
> Hi,
>
> The method readAllFootersInParallel is implemented in Parquet's
> ParquetFileReader. So the spark config
> "spark.sql.files.
Hi Imran,
Yes, you're right. I stand corrected! Thanks.
This is the part that opened my eyes:
> By the time that task has been assigned a location, and it's running on an
> executor, it doesn't matter anymore.
That's why a task does not have to have it after deserialization (!)
Thanks a lot.
O
Ryan,
I agree that Hive 1.2.1 works reliably with Spark 2.x, but I went ahead with
the current stable version of Hive, which is 2.0.1, and I am working with
that. It seems good, but I want to make sure which version of Hive is more
reliable with Spark 2.x, and I think @Ryan you replied the same, which I
Lars,
Thank you. I want to use DI for configuring all the properties (wiring) for
the architectural approach below:
Oracle -> Kafka Batch (Event Queuing) -> Spark Jobs (Incremental load from
HBase -> Hive with Transformation) -> Spark Transformation -> PostgreSQL
Thanks.
On Thu, Dec 29, 2016 at 3:2
Ted Yu,
You understood it wrong; I said incremental load from HBase to Hive.
Individually, you could call it incremental import from HBase.
On Wed, Dec 21, 2016 at 10:04 PM, Ted Yu wrote:
> Incremental load traditionally means generating hfiles and
> using org.apache.hadoop.hbase.mapreduce.LoadIncrementa
Hi,
another nice approach is to use the Reader monad instead, together with some
framework that supports this approach (e.g. Grafter -
https://github.com/zalando/grafter). It's lightweight and helps a bit with
dependency issues.
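To make the idea concrete, a minimal hand-rolled sketch of the Reader pattern
in Scala (this is not Grafter's actual API, and the JobEnv fields are made-up
placeholders):

// A tiny Reader monad: a computation that needs an environment Env to run.
case class Reader[Env, A](run: Env => A) {
  def map[B](f: A => B): Reader[Env, B] = Reader(env => f(run(env)))
  def flatMap[B](f: A => Reader[Env, B]): Reader[Env, B] =
    Reader(env => f(run(env)).run(env))
}

// The environment carries every dependency a component needs.
case class JobEnv(kafkaBrokers: String, jdbcUrl: String)

def readBrokers: Reader[JobEnv, String] = Reader(_.kafkaBrokers)
def readJdbcUrl: Reader[JobEnv, String] = Reader(_.jdbcUrl)

// Components compose without ever touching a concrete config...
val describeJob: Reader[JobEnv, String] =
  for {
    brokers <- readBrokers
    url     <- readJdbcUrl
  } yield s"consume from $brokers, write to $url"

// ...and the wiring happens once, at the edge of the program.
val description =
  describeJob.run(JobEnv("broker:9092", "jdbc:postgresql://db/warehouse"))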
2016-12-28 22:55 GMT+01:00 Lars Albertsson :
> Do you really need dependency
Just saw that there are many people with >= 8 open PRs. Some are
legitimately in flight but many are probably stale. To set a good example,
would (everyone) mind flicking through what they've got open and see if
some PRs are stale and should be closed?
https://spark-prs.appspot.com/users
Ok. I will go through and check my open PRs.
Sean Owen wrote
> Just saw that there are many people with >= 8 open PRs. Some are
> legitimately in flight but many are probably stale. To set a good example,
> would (everyone) mind flicking through what they've got open and see if
> some PRs are st
Hi, I would like to know more about typed aggregations in Spark.
http://stackoverflow.com/questions/40596638/inquiries-about-spark-2-0-dataset/40602882?noredirect=1#comment70139481_40602882
An example of these is
https://blog.codecentric.de/en/2016/07/spark-2-0-datasets-case-classes/
ds.groupByK
Hi all,
(cc-ing dev since I've hit some developer API corner)
What's the best way to convert an InternalRow to a Row if I've got an
InternalRow and the corresponding schema?
Code snippet:
@Test
public void foo() throws Exception {
Row row = RowFactory.create(1);
StructType
preliminary findings: seems to be transient, and affecting 4% of
builds from late december until now (which is as far back as we keep
build records for the PRB builds).
408 builds
16 builds.gc <--- failures
it's also happening across all workers at about the same rate.
and best of all, the
Your understanding is correct - it is indeed slower due to extra
serialization. In some cases we can get rid of the serialization if the
value is already deserialized.
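To illustrate the difference, a small sketch in Scala against Spark 2.x (the
Sale class and column names are made up for the example):

import org.apache.spark.sql.SparkSession

// Hypothetical record type, only for illustration.
case class Sale(shop: String, amount: Double)

val spark = SparkSession.builder()
  .appName("typed-agg-example").master("local[*]").getOrCreate()
import spark.implicits._

val ds = Seq(Sale("a", 1.0), Sale("a", 2.0), Sale("b", 3.0)).toDS()

// Typed path: groupByKey/mapGroups hands deserialized Sale objects to the
// lambda, which is where the extra serialization cost comes from.
val typedTotals = ds.groupByKey(_.shop)
  .mapGroups { case (shop, sales) => (shop, sales.map(_.amount).sum) }

// Untyped equivalent: the aggregation can stay in the internal binary format.
val untypedTotals = ds.groupBy($"shop").sum("amount")

typedTotals.show()
untypedTotals.show()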
On Wed, Jan 4, 2017 at 7:19 AM, geoHeil wrote:
> Hi, I would like to know more about typed aggregations in Spark.
>
> http://
Thanks for the clarification.
rxin [via Apache Spark Developers List] <
ml-node+s1001551n20462...@n3.nabble.com> wrote on Wed, Jan 4, 2017 at 23:37:
> Your understanding is correct - it is indeed slower due to extra
> serialization. In some cases we can get rid of the serialization if the
> valu
Let me double-check mine too.
2017-01-04 21:57 GMT+09:00 Liang-Chi Hsieh :
>
> Ok. I will go through and check my open PRs.
>
>
> Sean Owen wrote
> > Just saw that there are many people with >= 8 open PRs. Some are
> > legitimately in flight but many are probably stale. To set a good
> example,
>
You need to resolve and bind the encoder.
ExpressionEncoder<Row> encoder = RowEncoder.apply(struct).resolveAndBind();
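For reference, a minimal sketch of the round trip in Scala (assuming Spark 2.x
APIs; the single int column is just an example schema):

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val struct = StructType(Seq(StructField("value", IntegerType)))

// Resolve and bind the encoder to the schema before using it.
val encoder: ExpressionEncoder[Row] = RowEncoder(struct).resolveAndBind()

val internalRow: InternalRow = encoder.toRow(Row(1))   // Row -> InternalRow
val row: Row = encoder.fromRow(internalRow)            // InternalRow -> Row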
Andy Dang wrote
> Hi all,
> (cc-ing dev since I've hit some developer API corner)
>
> What's the best way to convert an InternalRow to a Row if I've got an
> InternalRow and the co
After checking the code, I think there are a few issues regarding this
ignoreCorruptFiles config, so you can't actually use it with Parquet files
now.
I opened a JIRA https://issues.apache.org/jira/browse/SPARK-19082 and also
submitted a PR for it.
khyati wrote
> Hi Reynold Xin,
>
> In spark 2.
Hi Chetan,
What do you mean by incremental load from HBase? There is a timestamp
marker for each cell, but not at the row level.
On Wed, Jan 4, 2017 at 10:37 PM, Chetan Khatri
wrote:
> Ted Yu,
>
> You understood wrong, i said Incremental load from HBase to Hive,
> individually you can say Incrementa
We've been able to use the iPOPO dependency injection framework in our PySpark
system and deploy .egg PySpark apps that resolve and wire up all the components
(like a kernel architecture; also similar to Spring) during an initial
bootstrap sequence, then invoke those components across Spark.
Just re
I believe that these two were indeed originally related. In the old
hash-based shuffle, we wrote objects out immediately to disk as they were
generated by an RDD's iterator. On the other hand, with the original
version of the new sort-based shuffle, Spark buffered a bunch of objects
before writing
I've noticed a bunch of the recent builds failing because of GC limits, for
seemingly unrelated changes (e.g. 70818, 70840, 70842). Shane, have there
been any recent changes in the build configuration that might be causing
this? Does anyone else have any ideas about what's going on here?
-Kay
I have spent a lot of time trying to figure out the following problem. I need
to consume messages from a topic on a remote Kafka queue using Scala and
Spark. The Kafka port on the remote machine is set to `7072`, not the default
`9092`. Also, on the remote machine there are the following versions install
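Without knowing the exact versions, here is a hedged sketch of how the
non-default port is usually passed, assuming the spark-streaming-kafka-0-10
integration (host, topic, and group id are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("kafka-7072")
val ssc = new StreamingContext(conf, Seconds(10))

// The non-default port goes into bootstrap.servers; nothing else is port-specific.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "remote-host:7072",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "latest"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))

stream.map(record => record.value()).print()
ssc.start()
ssc.awaitTermination()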