ate instead of a pivot, and assembling the vector using
> a UDF.
>
> On Thu, Oct 29, 2020 at 10:19 PM Daniel Chalef
> wrote:
>
>> Hello,
>>
>> I have a very large long-format dataframe (several billion rows) that I'd
>> like to pivot and vectorize (using th
Hello,
I have a very large long-format dataframe (several billion rows) that I'd
like to pivot and vectorize (using the VectorAssembler), with the aim of
reducing dimensionality using something akin to TF-IDF. Once pivoted, the
dataframe will have ~130 million columns.
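
For illustration, here is a minimal plain-Python sketch of the alternative suggested in the reply: group the long-format rows by entity and build one sparse vector per entity, rather than materializing ~130 million columns. The row layout (entity, feature, value), the `feature_index` mapping, and the function name are hypothetical, since the actual schema is not shown here. In Spark, the grouping step would presumably be a groupBy with collect_list, and the final step a UDF returning Vectors.sparse(size, indices, values); this sketch only shows the logic.

```python
from collections import defaultdict


def to_sparse_rows(long_rows, feature_index):
    """Group (entity, feature, value) triples into per-entity sparse vectors.

    Returns {entity: (sorted_indices, values)}. Duplicate (entity, feature)
    pairs are summed, which is an assumption about the desired aggregate.
    """
    grouped = defaultdict(dict)
    for entity, feature, value in long_rows:
        idx = feature_index[feature]  # hypothetical feature -> column index map
        grouped[entity][idx] = grouped[entity].get(idx, 0.0) + value

    result = {}
    for entity, cells in grouped.items():
        # Sparse vector constructors generally expect indices in ascending order.
        indices = sorted(cells)
        result[entity] = (indices, [cells[i] for i in indices])
    return result


# Hypothetical toy data standing in for the long-format dataframe.
rows = [("u1", "a", 1.0), ("u1", "c", 2.0), ("u2", "b", 3.0), ("u1", "a", 0.5)]
feature_index = {"a": 0, "b": 1, "c": 2}
sparse = to_sparse_rows(rows, feature_index)
```

Only the non-zero cells are stored per entity, so the memory cost scales with the number of long-format rows rather than with the number of distinct features.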
The source, long-format schem