First, let me say that I ran a few tests to determine the behavior, so it's entirely possible someone who actually understands the code will tell me I'm all wet.
The problem here is that for every scenario in which deleting on partial field matches would be good, I can create one where it would be bad. Or worse, disastrous <G>.... The issue isn't deleting multiple documents at once, it's deleting on partial term matches. In your example, you could have, say, two IDs for the attachments, one the unique ID and one the parent ID. This would allow you to remove all the attachments in one delete, and use a separate delete for the original. Or you could have a "group" id for an e-mail and it's attachments, and delete them all with a single delete on the "group" term. Or..... But I'd *much* rather have a system that required me to jump through a hoop when I wanted this kind of behavior than have a system that allowed me to shoot myself in the foot. Or blow off the whole leg. And deleting on, say, a text field with a partial term match on "the" is just too terrible to contemplate <G>.... Best Erick On 7/8/07, Nadav Har'El <[EMAIL PROTECTED]> wrote:
On Wed, Jul 04, 2007, Erick Erickson wrote about "Re: problems with deleteDocuments": > Consider what would happen otherwise. Say you have documents > with the following values for a field (call it blah). > some data > some data I put in the index > lots of data > data > > Then I don't want deleting on the term blah:data to remove all > of them. Which seems to be what you're asking. Even if > you restricted things to "phrases", then deleting on the term > 'blah:some data' would remove two documents. > > So, while UN_TOKENIZED isn't a *requirement*, exact total term > matches *is* the requirement. By that, I meant that whatever > goes into the field must not be broken into pieces by the indexing > tokenizer for deletes to work as you expect. I disagree, and frankly, am very surprised that "exact total term matches" is actually a requirement (I never tried it, so you may be absolutely right, I just hope you aren't). Let me give you just one example where id fields containing multiple words, and the ability for a delete query to match several documents, are useful. Consider an application for indexing emails with attachments. The email text, and each document attachment, is indexed as a separate document. When an email is deleted, we also need to delete its attachments. How shall we do this? One simple implementation is to have an "id" field for each document; The email text document will have a unique id, and the attachment document will have two ids: its own unique id, and the containing email's id. When we need to remove an email and all its attachments, we just remove all documents that match the email's id - and this will include the main text and the attachments. By the way, the method is called "deleteDocuments" - doesn't that imply that it's perfectly acceptable to delete many documents with one term? -- Nadav Har'El | Sunday, Jul 8 2007, 22 Tammuz 5767 IBM Haifa Research Lab |----------------------------------------- |I am not a complete idiot - some parts http://nadav.harel.org.il |are missing. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]