Re: Whether Spark is appropriate for our use case.

2015-11-07 Thread Igor Berman
1. If you join on some specific field (e.g. user id or account id or whatever), you may try to partition the parquet file by this field; the join will then be more efficient (see the sketch below).
2. You need to look in the Spark metrics at the performance of that particular join: how many partitions there are and what the shuffle size is.
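A minimal sketch of point 1, using the modern SparkSession API. The column name ("accountId") and the paths are hypothetical placeholders; substitute your actual join key and data locations.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("partitioned-join").getOrCreate()

    val events   = spark.read.parquet("/data/events")    // hypothetical input path
    val accounts = spark.read.parquet("/data/accounts")  // hypothetical input path

    // Write the large table partitioned by the join key, so reads that
    // filter or join on accountId only touch the relevant directories.
    events.write
      .partitionBy("accountId")
      .parquet("/data/events_by_account")

    val eventsByAccount = spark.read.parquet("/data/events_by_account")

    // Join on the partitioning column, then inspect the plan and the
    // shuffle metrics in the Spark UI (point 2 above).
    val joined = eventsByAccount.join(accounts, Seq("accountId"))
    joined.explain()
    println(joined.count())
  }
}
```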

Re: Whether Spark is appropriate for our use case.

2015-10-20 Thread Adrian Tanase
Can you share your approximate data size? These should all be valid use cases for Spark; I'm wondering if you are providing enough resources. Also, do you have some expectations in terms of performance? What does "slow down" mean? For this use case I would personally favor parquet over a DB, and SQL/DataFrames (see the sketch below).
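A minimal sketch of the parquet + SQL/DataFrames approach suggested above. The path, view name, and query are hypothetical placeholders for whatever the actual workload looks like.

```scala
import org.apache.spark.sql.SparkSession

object ParquetSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parquet-sql").getOrCreate()

    // Load parquet directly into a DataFrame instead of querying an external DB.
    val df = spark.read.parquet("/data/events")  // hypothetical path
    df.createOrReplaceTempView("events")

    // Express the workload as Spark SQL over the parquet-backed view.
    val counts = spark.sql(
      """SELECT accountId, count(*) AS cnt
        |FROM events
        |GROUP BY accountId""".stripMargin)

    counts.show()
  }
}
```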