Dear Spark Users,
If you want to search for a list of phrases, approximately 10,000 of them,
each 1 to 6 words long, in a large amount of text (approximately 10 GB),
how do you go about it?
I ended up writing a small RDD-based library:
https://github.com/cloudxlab/phrasesearch
I would like to get feedback.
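For context, the core matching logic behind such a phrase search can be sketched in plain Python, independently of Spark: build a set of tokenized phrases, then slide over the text and check every n-gram of length 1 to 6 against it. This is only an illustrative sketch (the function names and the tokenization are made up, not taken from the library above); in Spark you would typically broadcast the phrase set and apply the matcher with flatMap over partitions of the text.

```python
def build_phrase_set(phrases):
    # Normalize each phrase to a tuple of lower-cased tokens for set lookup.
    return {tuple(p.lower().split()) for p in phrases}

def find_phrases(text, phrase_set, max_len=6):
    # Return every phrase from phrase_set that occurs in text,
    # checking n-grams of 1..max_len tokens at each position.
    tokens = text.lower().split()
    hits = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and gram in phrase_set:
                hits.append(" ".join(gram))
    return hits
```

With 10,000 short phrases the set fits easily in memory, so broadcasting it to every executor and scanning each partition once keeps the job to a single pass over the 10 GB of text.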
OK, that worked. Thanks for the suggestion.
> On May 24, 2019, at 11:53 AM, SNEHASISH DUTTA
> wrote:
>
> Hi,
> All the keys are similar, so they all go to the same partition.
> The key-to-partition mapping depends on the hash of the key; add a
> random number (a salt) to your keys to spread the load.
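The salting advice above can be illustrated with a small, Spark-free Python sketch (the partition count and key names are made up for the example): identical keys all hash to one partition, while salted keys spread across several.

```python
import random

NUM_PARTITIONS = 8

def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Mimics hash partitioning: partition index = hash(key) mod partition count.
    return hash(key) % num_partitions

keys = ["user_1"] * 1000  # all identical -> every record lands in one partition
unsalted = {partition_for(k) for k in keys}

# Salting: append a random suffix so the hash (and thus the partition) varies.
salted = {partition_for(f"{k}_{random.randrange(NUM_PARTITIONS)}") for k in keys}
```

After the salted stage you aggregate per salted key, then strip the salt and aggregate once more to get the final per-key result.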
After a while it's possible to see this error too:
19/05/28 11:11:18 ERROR executor.Executor: Exception in task 35.1 in
stage 0.0 (TID 265)
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 122 actions: my_table: 122 times,
at
org.apache.hadoop.hbase.client.AsyncP
I'm executing a load process into HBase with Spark (around 150M records).
At the end of the process there are a lot of failed tasks.
I get this error:
19/05/28 11:02:31 ERROR client.AsyncProcess: Failed to get region location
org.apache.hadoop.hbase.TableNotFoundException: my_table
at
org.
Hi,
How many files do you read? Are they splittable?
If you have 4 non-splittable files, your dataset will have 4 partitions,
and you will only see one task per partition, each handled by one executor.
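The point about partitions can be shown with a toy model (this is only a sketch; Spark's actual split planning depends on the input format and configuration, and the 128 MB block size is an assumed default): splittable files contribute roughly one partition per block, while non-splittable formats such as gzip contribute exactly one partition regardless of size.

```python
def estimate_partitions(files, block_size=128 * 1024 * 1024):
    # files: list of (size_in_bytes, splittable) tuples.
    total = 0
    for size, splittable in files:
        if splittable:
            # Roughly one split per block (ceiling division).
            total += max(1, -(-size // block_size))
        else:
            # Non-splittable formats (e.g. gzip) yield a single partition.
            total += 1
    return total
```

Since each partition is processed by one task, 4 non-splittable files cap the job at 4 concurrent tasks no matter how many executor cores are available; repartitioning after the read (or using a splittable format) restores parallelism.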
Regards,
Arnaud
On Tue, May 28, 2019 at 10:06 AM Sachit Murarka
wrote:
> Hi All,
>
> I am using spa
Hi All,
I am using Spark 2.2.
I have enabled Spark dynamic allocation with executor cores 4, driver cores
4, executor memory 12 GB, and driver memory 10 GB.
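For reference, that configuration corresponds to a spark-submit invocation roughly like the following (the class and jar names are placeholders, not from the original message):

```shell
# Placeholder application class and jar.
# On Spark 2.x, dynamic allocation also requires the external shuffle service.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --executor-cores 4 \
  --driver-cores 4 \
  --executor-memory 12G \
  --driver-memory 10G \
  --class com.example.MyApp \
  my-app.jar
```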
In Spark UI, I see only 1 task is launched per executor.
Could anyone please help on this?
Kind Regards,
Sachit Murarka