Hi,

Similar to Dominik's approach of checking the file base for parsing errors,
I'd like to scan for certain file constellations, e.g. the typical
"left over bytes" error or other record combinations which I can't
reproduce with my MS Office/LibreOffice versions.

I haven't thought about how to actually do it yet, but I think logging the
file location in the integration tests and manually checking the
corresponding files afterwards should be sufficient.
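
To make the idea a bit more concrete, here is a rough sketch in Java of what
such logging could look like. None of it is existing POI code: the class
LeftoverScan, the FileParser interface and the marker text are placeholders
made up for illustration, and the real integration tests would plug in their
own parsing call and the actual error message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LeftoverScan {

    /** Hypothetical stand-in for whatever the integration test calls to parse one file. */
    public interface FileParser {
        void parse(Path file) throws Exception;
    }

    /**
     * Parses every regular file below root and returns the locations of those
     * files whose failure message contains the given marker text.
     */
    public static List<Path> scan(Path root, FileParser parser, String marker)
            throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            // Sort so that the log output is stable between runs.
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        List<Path> hits = new ArrayList<>();
        for (Path file : files) {
            try {
                parser.parse(file);
            } catch (Exception e) {
                // getMessage() may be null, String.valueOf turns that into "null".
                String msg = String.valueOf(e.getMessage());
                if (msg.contains(marker)) {
                    // Log the location only; the file is inspected by hand later.
                    System.err.println("CHECK MANUALLY: " + file + " -> " + msg);
                    hits.add(file);
                }
            }
        }
        return hits;
    }
}
```

A call could then look like scan(corpusDir, f -> myParse(f), "bytes remaining"),
where corpusDir, myParse and the marker string are again only assumptions.
Failures that don't match the marker are ignored here on purpose, so the log
only contains the files worth a manual look.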

Best wishes,
Andi



On 03.04.2015 17:51, Dominik Stadler wrote:
> Hi,
>
> I am very interested as I am following the Common Crawl activity for
> some time already. It sounds like a neat idea to do the check already
> when the crawl is done, are the binary documents already part of the
> crawl-data?
>
> ...
>
> Dominik.
>
> On Fri, Apr 3, 2015 at 4:28 PM, Allison, Timothy B. <[email protected]> 
> wrote:
>> All,
>>
>> What do you think?
>>
>>
>>


