On Feb 19, 2007, at 8:45 AM, Yonik Seeley wrote:

> If I had to do it over again, I'd be tempted to further restrict the
> patterns so that they could be looked up from a Map rather than
> linearly.

Awesome.  I know exactly how I'm going to implement this now.

> This hasn't proved to be a problem so far though, as the
> number of field-types for dynamic fields normally remains small.

For KS, there will be only one abstract class dedicated to multi-dimensional data. Users will subclass it to provide their own arbitrary field definitions. The field definition itself won't be dynamic -- only the suffix on the field name will be.

For a hashmap lookup, a prefix pattern could be restricted one of two ways: fixed length, or terminal character. I'm inclined to go with a terminating underscore in the field name -- that allows the users to choose their own prefix for maximum readability, at the cost of an additional scan.
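A minimal sketch of the terminating-underscore lookup, to make the idea concrete. Everything in it is hypothetical and not part of KinoSearch: the %spec_for and %is_deep hashes, the resolve_field_spec function, and the choice to try an exact match on the whole field name before scanning for the first underscore (so that a fixed field like product_id still resolves).

```perl
use strict;
use warnings;

# Hypothetical registry: field name (or prefix) => spec class name.
my %spec_for = (
    name        => 'CNETSchema::name',
    description => 'CNETSchema::description',
    product_id  => 'CNETSchema::product_id',
    attr        => 'CNETSchema::attr',
);

# Hypothetical marker for prefixes that accept a dynamic suffix.
my %is_deep = ( attr => 1 );

sub resolve_field_spec {
    my ($field_name) = @_;

    # Fixed fields: one hash lookup on the full name.
    return $spec_for{$field_name}
        if exists $spec_for{$field_name} && !$is_deep{$field_name};

    # Dynamic fields: scan to the first underscore, then look up the prefix.
    my $pos = index( $field_name, '_' );
    return if $pos < 1;                             # no prefix at all
    return if $pos == length($field_name) - 1;      # empty suffix

    my $prefix = substr( $field_name, 0, $pos );
    return $is_deep{$prefix} ? $spec_for{$prefix} : undef;
}
```

With this arrangement, attr_weight and attr_heat_dissipation_factor both resolve to the attr spec via a single scan plus a hash lookup, while an unregistered prefix yields nothing.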

Here's how the schema for your CNET index might look.

   # ./CNETSchema.pm

   package CNETSchema::name;
   use base 'KinoSearch::Schema::FieldSpec';

   package CNETSchema::description;
   use base 'KinoSearch::Schema::FieldSpec';
   sub similarity {
       return KinoSearch::Contrib::LongFieldSim->new;
   }

   package CNETSchema::product_id;
   use base 'KinoSearch::Schema::FieldSpec';
   sub analyzed { 0 }

   package CNETSchema::attr;
   use base 'KinoSearch::Schema::DeepFieldSpec';
   sub analyzed { 0 }
   sub stored   { 0 }

   package CNETSchema;
   use base 'KinoSearch::Schema';
   use KinoSearch::Analysis::PolyAnalyzer;
   sub analyzer {
       return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
   }
   __PACKAGE__->load_fields(qw( name description product_id attr ));

   1;

Then, at index time, you'll be able to do this:

   $index_writer->add_doc({
       name                         => 'Acme LT-1 Laptop',
       description                  => 'blah blah blah...',
       product_id                   => 'acme-lt-1',
       attr_weight                  => 6.3,
       attr_heat_dissipation_factor => 20,
   });

I'll need to make a few backend tweaks, but this API pretty much solves the multi-dimensional data problem. :)

Thoughts?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


