Thanks to Jane for providing examples that make the discussion clearer.
Thanks to Lincoln and Xuyang for your feedback. I agree wholeheartedly that
it is better to throw an error rather than silently ignoring the setting.
Extending datagen to generate variable-length values is an excellent
idea; I will create another Jira to follow up.

Taking the examples provided:

  1.  For fixed-length data types (CHAR, BINARY), both of the following DDLs,
which specify a custom length, should throw an exception like 'User-defined
length of the fixed-length field f0 is not supported.'

CREATE TABLE foo (
f0 CHAR(5)
) WITH ('connector' = 'datagen', 'fields.f0.length' = '10');

CREATE TABLE bar (
f0 CHAR(5)
) WITH ('connector' = 'datagen', 'fields.f0.length' = '1');

  2.  For variable-length data types (VARCHAR, VARBINARY), the DDL below can
be executed legally. If an illegal user-defined length is configured, an
exception is thrown, like 'User-defined length of the VARCHAR field %s should
be shorter than the schema definition.'

CREATE TABLE meow (
f0 VARCHAR(20)
) WITH ('connector' = 'datagen', 'fields.f0.length' = '10');

  3.  For the special variable-length data types STRING and BYTES, the maximum
length is very large (2^31 - 1). If users do not specify a smaller field
length, fields that occupy a huge amount of memory (estimated at more than
2GB) would be generated by default, which can easily lead to
"java.lang.OutOfMemoryError: Java heap space". I therefore recommend that the
default length of these two types remain 100, as before, while the length can
be configured to any value less than 2^31-1.

CREATE TABLE purr (
f0 STRING
) WITH ('connector' = 'datagen', 'fields.f0.length' = '10');
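
To make the three rules above concrete, here is a minimal Java sketch of how
the length could be resolved. It is only an illustration and not the code in
the pull request: the class DataGenLengthSketch, the method resolveLength and
its parameters are hypothetical names. It also assumes that a VARCHAR or
VARBINARY field without a user-defined length falls back to the declared
length, and that a user-defined length equal to the declared length is
accepted; neither point is settled in this thread.

```java
import java.util.Locale;

/** Hypothetical sketch of the three length rules; not the code in the pull request. */
final class DataGenLengthSketch {

    /** Default length proposed in this mail for STRING and BYTES. */
    static final int DEFAULT_UNBOUNDED_LENGTH = 100;

    private DataGenLengthSketch() {}

    /**
     * @param fieldName      column name, used only in error messages
     * @param typeName       e.g. "CHAR", "VARCHAR", "STRING"
     * @param declaredLength length from the schema; null for STRING and BYTES
     * @param userLength     value of 'fields.<name>.length'; null if not set
     */
    static int resolveLength(
            String fieldName, String typeName, Integer declaredLength, Integer userLength) {
        switch (typeName.toUpperCase(Locale.ROOT)) {
            case "CHAR":
            case "BINARY":
                // Rule 1: fixed-length types reject any user-defined length.
                if (userLength != null) {
                    throw new IllegalArgumentException(String.format(
                            "User-defined length of the fixed-length field %s is not supported.",
                            fieldName));
                }
                return declaredLength;
            case "VARCHAR":
            case "VARBINARY":
                // Rule 2: the user-defined length must not exceed the schema definition.
                if (userLength == null) {
                    return declaredLength; // assumption: fall back to the declared length
                }
                if (userLength > declaredLength) {
                    throw new IllegalArgumentException(String.format(
                            "User-defined length of the %s field %s should be shorter "
                                    + "than the schema definition.",
                            typeName.toUpperCase(Locale.ROOT), fieldName));
                }
                return userLength;
            case "STRING":
            case "BYTES":
                // Rule 3: default to 100; a configured length must stay below 2^31 - 1.
                if (userLength == null) {
                    return DEFAULT_UNBOUNDED_LENGTH;
                }
                if (userLength >= Integer.MAX_VALUE) {
                    throw new IllegalArgumentException(String.format(
                            "User-defined length of the field %s should be less than 2^31-1.",
                            fieldName));
                }
                return userLength;
            default:
                throw new IllegalArgumentException(
                        "Type not covered by this sketch: " + typeName);
        }
    }
}
```

Applied to the tables above, resolveLength("f0", "CHAR", 5, 10) and
resolveLength("f0", "CHAR", 5, 1) are rejected (foo and bar),
resolveLength("f0", "VARCHAR", 20, 10) returns 10 (meow), and
resolveLength("f0", "STRING", null, 10) returns 10 (purr). A STRING field
without the option resolves to 100.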

The updates have been pushed to the pull request [1].

WDYT?

[1] https://github.com/apache/flink/pull/23678


Best!
Yubin
