Hi,
I produced Avro data to a Kafka topic using Schema Registry, and now I want to
use Spark Streaming to read that data and do some computation in real time.
Can someone please share some sample code for doing that? I couldn't find any
working code online. I am using Spark version 2.2.0 and
spark-stre
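A minimal sketch of one way to do this with the Structured Streaming Kafka source. It assumes the spark-sql-kafka-0-10 package and the fastavro library; the broker, topic, and schema below are placeholders, and fetching the writer schema from Schema Registry is left out (Confluent's wire format prepends a 5-byte header: a magic byte plus a 4-byte schema id):

import io
import fastavro
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("kafka-avro-sketch").getOrCreate()

# Placeholder broker and topic names.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "avro-topic")
       .load())

# Example writer schema; in practice it would come from Schema Registry.
writer_schema = fastavro.parse_schema({
    "type": "record", "name": "Event",
    "fields": [{"name": "id", "type": "string"}]})

def decode(value):
    # Skip the 5-byte Confluent header, then decode the Avro payload.
    return fastavro.schemaless_reader(io.BytesIO(value[5:]), writer_schema)["id"]

decode_udf = udf(decode, StringType())

query = (raw.select(decode_udf(col("value")).alias("id"))
         .writeStream.format("console").start())
query.awaitTermination()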
Hi, I want to know whether it is possible to customize the TF-IDF logic in
Apache Spark.
In typical TF-IDF, the TF is computed for each word within its own document.
For example, the TF of the word "A" can differ between documents D1 and D2,
but I want the TF to be the term frequency across the whole document collection.
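For what it's worth, a corpus-wide term frequency can be computed with a plain DataFrame aggregation instead of HashingTF's per-document counts; a minimal sketch (the "tokenized" column and sample data are made up):

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, col

spark = SparkSession.builder.appName("global-tf").getOrCreate()
df = spark.createDataFrame([(["a", "b", "a"],), (["a", "c"],)], ["tokenized"])

# Count each term over the whole corpus rather than per document.
global_tf = (df.select(explode(col("tokenized")).alias("term"))
               .groupBy("term")
               .count())
global_tf.show()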
Any clue? Thanks.
On Wed, Oct 31, 2018 at 8:29 PM Lian Jiang wrote:
> We have jsonl files, each of which is compressed as a gz file. Is it possible
> to make Spark Structured Streaming (SSS) handle such files? Appreciate any help!
>
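A minimal sketch, assuming the Structured Streaming file source: Spark's JSON reader decompresses .gz files transparently (each gzip file is read as a single, non-splittable partition), so pointing the stream at the directory is usually enough. The path and schema below are placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("jsonl-gz-stream").getOrCreate()

schema = StructType([StructField("id", StringType()),
                     StructField("payload", StringType())])

stream = (spark.readStream
          .schema(schema)            # streaming file sources require a schema
          .json("/data/incoming/"))  # directory containing *.jsonl.gz files

query = stream.writeStream.format("console").start()
query.awaitTermination()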
Hi,
I want to compute cosine similarities between vectors using Apache Spark. In
a simple example, I created a vector for each document using the built-in
TF-IDF. Here is the code:
from pyspark.ml.feature import HashingTF, IDF

hashingTF = HashingTF(inputCol="tokenized", outputCol="tf")
tf = hashingTF.transform(df)
idf = IDF(inputCol="tf", outputCol="tfidf")
tfidf = idf.fit(tf).transform(tf)
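A hedged sketch of one way to get pairwise cosine similarities from such TF-IDF vectors: L2-normalize them with Normalizer, after which the dot product of two normalized vectors is their cosine similarity. The sample data, column names, and the id column are made up:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import DoubleType
from pyspark.ml.feature import HashingTF, IDF, Normalizer

spark = SparkSession.builder.appName("cosine-sim").getOrCreate()
df = spark.createDataFrame(
    [(0, ["spark", "streaming"]), (1, ["spark", "kafka"])], ["id", "tokenized"])

tf = HashingTF(inputCol="tokenized", outputCol="tf").transform(df)
tfidf = IDF(inputCol="tf", outputCol="tfidf").fit(tf).transform(tf)
normed = (Normalizer(inputCol="tfidf", outputCol="norm", p=2.0)
          .transform(tfidf).select("id", "norm"))

dot = udf(lambda a, b: float(a.dot(b)), DoubleType())  # cosine of unit vectors

pairs = (normed.alias("x").crossJoin(normed.alias("y"))
         .where(col("x.id") < col("y.id"))
         .select(col("x.id").alias("id_x"), col("y.id").alias("id_y"),
                 dot(col("x.norm"), col("y.norm")).alias("cosine")))
pairs.show()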
Got it, thanks!
On Fri, Nov 2, 2018 at 7:18 PM Eike von Seggern
wrote:
> Hi,
>
> Soheil Pourbafrani wrote on Fri., Nov. 2, 2018 at 15:43:
>
>> Hi, I have an RDD of the form (((a), (b), (c), (d)), (e)) and I want to
>> transform every row to a dictionary of the form a:(b, c, d, e)
>>
>> H
Hi,
Soheil Pourbafrani wrote on Fri., Nov. 2, 2018 at 15:43:
> Hi, I have an RDD of the form (((a), (b), (c), (d)), (e)) and I want to
> transform every row to a dictionary of the form a:(b, c, d, e)
>
> Here is my code, but it's giving an error!
>
> map(lambda row : {row[0][0] : (row[1], row[0][1
Hi, I have an RDD of the form (((a), (b), (c), (d)), (e)) and I want to
transform every row to a dictionary of the form a:(b, c, d, e)
Here is my code, but it's giving an error!
map(lambda row : {row[0][0] : (row[1], row[0][1], row[0][2], row[0][3]))
Is it possible to do such a transformation?
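For reference, a minimal sketch of the transformation (the snippet above never closes the dictionary's brace); the sample data and the rdd name are made up:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-to-dict").getOrCreate()
rdd = spark.sparkContext.parallelize([(("a1", "b1", "c1", "d1"), "e1"),
                                      (("a2", "b2", "c2", "d2"), "e2")])

# Key each dict by the first element of the inner tuple: a -> (b, c, d, e).
dicts = rdd.map(lambda row: {row[0][0]: (row[0][1], row[0][2], row[0][3], row[1])})
print(dicts.collect())
# [{'a1': ('b1', 'c1', 'd1', 'e1')}, {'a2': ('b2', 'c2', 'd2', 'e2')}]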
Hello,
Is there a way for a Spark Streaming application to know when it starts and
finishes reading the data of a dataset partition?
I want to create a partition-specific cache when the read starts and delete it
once the partition has been read completely.
Thanks for your help in advance.
regards,
Robin Kut
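A rough sketch of one way to get per-partition setup and teardown, using mapPartitions (the names and the computation are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-cache").getOrCreate()
rdd = spark.sparkContext.parallelize(range(10), numSlices=2)

def process_partition(records):
    cache = {}                 # partition-specific cache, created at the start
    for r in records:
        cache[r] = r * r       # placeholder computation that uses the cache
        yield cache[r]
    cache.clear()              # partition fully read: drop the cache

print(rdd.mapPartitions(process_partition).collect())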
Agree. Spark is not designed to be embedded in business applications (those
traditional J2EE ones) for real-time interaction.
Thanks,
Gabriel
On Fri, Nov 2, 2018 at 2:36 PM 张万新 wrote:
> I think you should investigate Apache Zeppelin and Livy.
崔苗 (Data and AI Product Development Department) <0049003...@znv.com> wrote on Fri, Nov 2, 2018 at 11:01