And you have to write your own input format, but this is not so complicated
(probably recommended anyway for the PDF case)
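For the custom-input-format route, here is a minimal sketch, hedged: `WholePdfInputFormat` is a hypothetical class name, it assumes the Hadoop `mapreduce` API, and it emits one (path, bytes) record per file so a PDF is never split across records:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical whole-file input format: key = file path, value = raw bytes.
public class WholePdfInputFormat extends FileInputFormat<Text, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // a PDF must be read as a single record, never split
    }

    @Override
    public RecordReader<Text, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    static class WholeFileRecordReader extends RecordReader<Text, BytesWritable> {
        private FileSplit split;
        private Configuration conf;
        private boolean processed = false;
        private final Text key = new Text();
        private final BytesWritable value = new BytesWritable();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) return false;
            // Read the entire file into one value
            Path path = split.getPath();
            byte[] contents = new byte[(int) split.getLength()];
            FileSystem fs = path.getFileSystem(conf);
            try (FSDataInputStream in = fs.open(path)) {
                IOUtils.readFully(in, contents, 0, contents.length);
            }
            key.set(path.toString());
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override public Text getCurrentKey() { return key; }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
        @Override public void close() {}
    }
}
```

This is the classic "whole file as one record" pattern; it is only suitable while the individual PDFs fit comfortably in memory.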
> On 20.11.2018, at 08:06, Jörn Franke wrote:
>
Well, I am not so sure about the use cases, but what about using
StreamingContext.fileStream?
https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/streaming/StreamingContext.html#fileStream-java.lang.String-scala.Function1-boolean-org.apache.hadoop.conf.Configuration-scala.reflect.ClassTa
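Wiring that up could look roughly like the sketch below. Hedged: `WholePdfInputFormat` is a hypothetical custom input format (the one you would have to write) emitting (path, bytes) pairs per whole file; the directory, batch interval, and `.pdf` filter are assumptions:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class PdfFileStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("pdf-stream").setMaster("local[2]");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // WholePdfInputFormat is hypothetical: a custom whole-file input
        // format producing one (path, bytes) record per PDF.
        JavaPairInputDStream<Text, BytesWritable> pdfs = ssc.fileStream(
                "hdfs:///landing/pdf",                    // directory to monitor (assumed)
                Text.class, BytesWritable.class, WholePdfInputFormat.class,
                path -> path.getName().endsWith(".pdf"),  // only pick up PDFs
                true);                                    // newFilesOnly

        pdfs.foreachRDD(rdd ->
                rdd.foreach(kv ->
                        System.out.println(kv._1() + ": " + kv._2().getLength() + " bytes")));

        ssc.start();
        ssc.awaitTermination();
    }
}
```

Note that fileStream only picks up files atomically moved or renamed into the monitored directory, so the producer has to write elsewhere and move the finished PDF in.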
On Mon, Nov 19, 2018 at 07:23:10AM +0100, Jörn Franke wrote:
> Why does it have to be a stream?
>
Right now I manage the pipelines as spark batch processing. Moving to
stream would add some improvements, such as:
- simplification of the pipeline
- more frequent data ingestion
- better resource management
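For comparison, the batch side does not need a custom format at all: `SparkContext.binaryFiles` already reads each small binary file whole. A minimal sketch of that current-style batch load (the input path is an assumption):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.input.PortableDataStream;

public class PdfBatchLoad {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("pdf-batch").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // binaryFiles returns (path, stream) pairs, one per file;
            // the stream is read lazily on the executor.
            JavaPairRDD<String, PortableDataStream> pdfs =
                    sc.binaryFiles("hdfs:///landing/pdf");  // assumed path
            pdfs.foreach(kv -> {
                byte[] bytes = kv._2().toArray(); // whole PDF in memory
                System.out.println(kv._1() + ": " + bytes.length + " bytes");
            });
        }
    }
}
```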
Why does it have to be a stream?
> On 18.11.2018, at 23:29, Nicolas Paris wrote:
Hi
I have pdf to load into spark with at least
format. I have considered some options:
- spark streaming does not provide a native file stream for binary files of
  variable size (binaryRecordsStream specifies a constant record size) and I
  would have to write my own receiver.
- Structured streaming allow