@Ayan - Creating temp table dynamically based on dataset name. I will
explore df.saveAsTable option.
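For what it's worth, the saveAsTable route can be sketched like this (assuming a database named `mydb`; the `hive_table_for` helper and the naming convention are my own, not from this thread):

```python
def hive_table_for(dataset, db="mydb"):
    # Hypothetical helper: derive the per-dataset Hive table name.
    return f"{db}.{dataset.lower()}"

def ingest(df, dataset):
    # Append the dataframe straight into the dataset's Hive table,
    # skipping the intermediate temp table entirely.
    df.write.mode("append").saveAsTable(hive_table_for(dataset))

# e.g. ingest(spark.read.parquet(path), "Sales") would append into mydb.sales
```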
On Mon, Apr 17, 2017 at 9:53 PM, Ryan wrote:
It shouldn't be a problem then. We've done a similar thing in Scala. I
don't have much experience with Python threads, but maybe the code related
to reading/writing the temp table isn't thread safe.
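One way to rule that out is to give each thread its own uniquely named temp view instead of sharing one name (a sketch; the `uuid` suffix and helper name are my own, not from the thread):

```python
import uuid

def temp_view_name(dataset):
    # Unique per call, so concurrent threads never collide on one view name.
    return f"tmp_{dataset}_{uuid.uuid4().hex}"

def stage(df, dataset):
    # Each thread stages its dataframe under its own name and can then
    # read it back without racing other threads.
    name = temp_view_name(dataset)
    df.createOrReplaceTempView(name)
    return name
```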
On Mon, Apr 17, 2017 at 9:45 PM, Amol Patil wrote:
> Thanks Ryan,
>
> Each dataset has a separate hive table. [...]
What happens if you do not use the temp table, but directly do
df.saveAsTable with mode append? If I have to guess without looking at the
code of your task function, I would think the name of the temp table is
evaluated statically, so all threads are referring to the same table. In
other words, your app is n
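That failure mode is easy to reproduce without Spark at all; in this sketch a plain dict stands in for the temp-table namespace, and a statically evaluated name makes every thread clobber the same entry:

```python
import threading

registry = {}  # stands in for the shared temp-table namespace

def task(dataset, fixed_name=True):
    # With a fixed name, every thread writes to the same slot: last one wins.
    name = "temp_table" if fixed_name else f"temp_{dataset}"
    registry[name] = dataset

threads = [threading.Thread(target=task, args=(d,)) for d in ("sales", "orders")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# registry now holds a single key, "temp_table" -- one dataset's data is lost
```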
Thanks Ryan,
Each dataset has a separate hive table. All hive tables belong to the same
hive database.
The idea is to ingest data in parallel into the respective hive tables.
If I run the code sequentially for each data source, it works fine, but it
will take a lot of time. We are planning to process around 30-40 data
sources.
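The per-dataset fan-out can be sketched with a thread pool (the body of `ingest` is a placeholder; in the real job it would read the source and `saveAsTable` into that dataset's Hive table):

```python
from concurrent.futures import ThreadPoolExecutor

def ingest(dataset):
    # Placeholder for: read the source, then
    # df.write.mode("append").saveAsTable(<that dataset's table>)
    return f"ingested {dataset}"

datasets = ["sales", "orders", "users"]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order and re-raises any worker exception here
    results = list(pool.map(ingest, datasets))
```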
I don't think you can insert into a hive table in parallel without dynamic
partitioning; for hive locking, please refer to
https://cwiki.apache.org/confluence/display/Hive/Locking.
Other than that, it should work.
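For the dynamic-partition route, the statements would look roughly like this (the two `SET` names are the standard Hive settings; the table, partition column, and source names here are made up for illustration):

```python
def dynamic_partition_statements(table, partition_col, source):
    # Statements a driver would run one by one via spark.sql(); sketch only.
    return [
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        f"INSERT INTO TABLE {table} PARTITION ({partition_col}) "
        f"SELECT * FROM {source}",
    ]

# e.g.
# for stmt in dynamic_partition_statements("mydb.events", "ds", "staging_events"):
#     spark.sql(stmt)
```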
On Mon, Apr 17, 2017 at 6:52 AM, Amol Patil wrote:
> Hi All,
>
> I'm writing a generic PySpark [...]