[CHECKER] 28 potential interrupt errors
Hi, Here are yet more results from the MC project. This checker looks for inconsistent usage of interrupt functions. For example, it notices when interrupts can be either on or off when a function exits. It tracks cli(), sti(), save_flags() and so forth. We've hand-inspected the results to ensure that the ones you see here are likely to be errors. As usual, please CC us at [EMAIL PROTECTED] if you can verify these potential errors or show that these bugs are false positives. Important: The code snippets with each bug here were automatically culled from the source, not manually selected, so they are sometimes inaccurate as to the actual location of the bug. We've included a comment before each bug to help in understanding what the checker found, but the only way to know for sure is to inspect the source. -Junfeng & Andy Where the errors are: +--++ | file | fn | +--++ | drivers/cdrom/cm206.c| receive_byte | | drivers/cdrom/cm206.c| send_command | | drivers/char/ip2/i2lib.c | i2QueueCommands| | drivers/char/n_r3964.c | add_msg| | drivers/char/rio/rioroute.c | RIOFixPhbs | | drivers/char/rio/riotable.c | RIODeleteRta | | drivers/i2o/i2o_block.c | i2ob_del_device| | drivers/ide/ide.c| ide_spin_wait_hwgroup | | drivers/isdn/hisax/config.c | checkcard | | drivers/isdn/hisax/isar.c| isar_load_firmware | | drivers/isdn/isdn_ppp.c | isdn_ppp_bind | | drivers/net/appletalk/cops.c | cops_rx| | drivers/net/hamradio/soundmodem/sm_wss.c | wss_set_codec_fmt | | drivers/net/irda/irport.c| irport_net_ioctl | | drivers/net/irda/irtty.c | irtty_net_ioctl| | drivers/net/irda/nsc-ircc.c | nsc_ircc_net_ioctl | | drivers/net/irda/toshoboe.c | toshoboe_net_ioctl | | drivers/net/irda/w83977af_ir.c | w83977af_net_ioctl | | drivers/net/pcmcia/wavelan_cs.c | wavelan_get_wireless_stats | | drivers/net/tokenring/smctr.c| smctr_open_tr | | drivers/net/wan/comx-hw-mixcom.c | MIXCOM_open| | drivers/net/wan/lmc/lmc_main.c | lmc_watchdog | | drivers/scsi/eata_dma.c | eata_queue | | 
drivers/scsi/qla1280.c | qla1280_intr_handler | | drivers/sound/ad1848.c | ad1848_resume | | drivers/sound/emu10k1/midi.c | emu10k1_midi_callback | | drivers/sound/sscape.c | sscape_pnp_upload_file | | net/irda/irttp.c | irttp_proc_read| +--++ Listing: [BUG] sleep_or_timeout will call interruptible_sleep_on, which will save disabled flags and then restore them. /u2/acc/oses/linux/2.4.1/drivers/cdrom/cm206.c:474:send_command: ERROR:INTR:462:474: Interrupts inconsistent, severity `20':474 if (!(inw(r_line_status) & ls_transmitter_buffer_empty)) { cd->command = command; Start ---> cli(); /* don't interrupt before sleep */ outw(dc_mask_sync_error | dc_no_stop_on_error | (inw(r_data_status) & 0x7f), r_data_control); /* interrupt routine sends command */ Save & Restore flags here ---> if (sleep_or_timeout(&cd->uart, UART_TIMEOUT)) { debug(("Time out on write-buffer\n")); stats(write_timeout); ... DELETED 2 lines ... } debug(("Write commmand delayed\n")); } else outw(command, r_uart_transmit); Error ---> } uch receive_byte(int timeout) { uch ret; - [BUG] sleep_or_timeout will call interruptible_sleep_on, which will save disabled flags and then restore them. /u2/acc/oses/linux/2.4.1/drivers/cdrom/cm206.c:499:receive_byte: ERROR:INTR:479:494: Interrupts inconsistent, severity `30':494 { uch ret; Start ---> cli(); debug(("cli\n")); ret = cd->ur[cd->ur_r]; if (cd->ur_r != cd->ur_w) { sti(); debug(("returning #%d: 0x%x\n", cd->ur_r, cd->ur[cd->ur_r])); cd->ur_r++; cd->ur_r %= UR_SIZE; ... DELETED 5 lines ... #ifdef STATISTICS if (timeout==UART_TIMEOUT) stats(receive_timeout) /* no `;'! */ else stats(dsb_timeout); #endif Error ---> return 0xda; } ret = cd->ur[cd->ur_r]; debug(("slept; returning #%d: 0x%x\n", cd->ur_r, cd->ur[cd->ur_r])); cd->ur_r++; cd->ur_r %= UR_SIZE; -
[CHECKER] 120 potential dereference to invalid pointers errors for linux 2.4.1
Hi, This checker warns when the pointer returned by a "plausibly" failing routine is not checked before being used. It automatically builds up the list of failing routines by examining all callsites. If a function's returned pointer is checked at more than one callsite, the checker ensures it is always checked. (Functions like strtok or hash-table lookups are culled from this list by hand.) Sometimes we are unaware of preconditions that make such checks unnecessary, so the "errors" might still have false positives. Junfeng & Dawson Where the errors are: --+-+ | file | fn | +--+-+ | arch/i386/kernel/irq.c | init_irq_proc | | arch/i386/kernel/irq.c | register_irq_proc | | arch/i386/kernel/mtrr.c | mtrr_init | | drivers/acpi/dispatcher/dswload.c| acpi_ds_load2_end_op| | drivers/acpi/interpreter/amutils.c | acpi_aml_build_copy_internal_package_object | | drivers/acpi/parser/psparse.c| acpi_ps_parse_loop | | drivers/atm/fore200e.c | fore200e_get_esi| | drivers/atm/zatm.c | zatm_detect | | drivers/block/DAC960.c | DAC960_V1_ExecuteType3 | | drivers/block/DAC960.c | DAC960_V1_ExecuteType3D | | drivers/block/DAC960.c | DAC960_V2_ControllerInfo| | drivers/block/DAC960.c | DAC960_V2_DeviceOperation | | drivers/block/DAC960.c | DAC960_V2_GeneralInfo | | drivers/block/DAC960.c | DAC960_V2_LogicalDeviceInfo | | drivers/block/DAC960.c | DAC960_V2_PhysicalDeviceInfo| | drivers/block/DAC960.c | DAC960_V2_ReadDeviceConfiguration | | drivers/block/ll_rw_blk.c| blk_init_free_list | | drivers/char/drm/context.c | drm_alloc_queue | | drivers/char/drm/fops.c | drm_open_helper | | drivers/char/drm/proc.c | drm_proc_init | | drivers/char/ip2main.c | old_ip2_init| | drivers/char/pc_keyb.c | psaux_init | | drivers/char/rio/rio_linux.c | rio_init_datastructures | | drivers/i2o/i2o_core.c | i2o_core_evt| | drivers/ide/ide-probe.c | init_gendisk| | drivers/ide/ide-probe.c | init_irq| | drivers/ide/ide-tape.c | idetape_onstream_read_back_buffer | | drivers/isdn/avmb1/avm_cs.c | avmcs_attach| | 
drivers/isdn/avmb1/capi.c| capinc_raw_write| | drivers/isdn/avmb1/capi.c| capi_write | | drivers/isdn/avmb1/capidrv.c | if_readstat | | drivers/isdn/avmb1/capidrv.c | if_sendbuf | | drivers/md/raid5.c | grow_buffers| | drivers/md/raid5.c | __check_consistency | | drivers/media/video/i2c-parport.c| i2c_parport_attach | | drivers/media/video/videodev.c | videodev_proc_create_dev| | drivers/net/3c505.c | receive_packet | | drivers/net/3c515.c | corkscrew_found_device | | drivers/net/aironet4500_card.c | awc4500_isa_probe | | drivers/net/aironet4500_card.c | awc4500_pnp_probe | | drivers/net/defxx.c | dfx_rcv_init| | drivers/net/dgrs.c | dgrs_found_device | | drivers/net/pcmcia/aironet4500_cs.c | awc_attach | | drivers/net/pcmcia/wavelan_cs.c | wavelan_attach | | drivers/net/pcmcia/xircom_tulip_cb.c | tulip_probe1| | drivers/net/skfp/ess.c | ess_raf_received_pack | | drivers/net/skfp/ess.c | ess_send_response | | drivers/net/smc9194.c| smc_rcv
Re: [CHECKER] 28 potential interrupt errors
> Your reporting is a little misleading here. Thanks for verifying these bugs ;) The interrupt checker checks for inconsistent interrupt states. For example, if a function has one exit point with interrupt disabled, and another exit point with interrupt enabled, the checker will report an error at the second exit point. The code snippets are automatically culled from the source based on the line number in the error report. So the reporting is sometimes misleading. I'll put the actuall line number in the comments. > > Yes, there's a bug in this function - the `return -EPERM' > doesn't do a `restore_flags()'. But there is no bug > in the place you've reported. > > (Personally, I think *any* C function which has more than > one `return' statement is a bug, and we see a classic > instance here of one of the problems which this practice > can cause. Religious issue. ) It may be better to have two exit points, one for normal path and one for error path. All error handling code can be put at the end of the function. > > > > ... > > > [BUG] error path > > > > /u2/acc/oses/linux/2.4.1/drivers/net/wan/comx-hw-mixcom.c:505:MIXCOM_open: >ERROR:INTR:514:562: Interrupts inconsistent, severity `20':562 > > > > } > > > > Start ---> > > save_flags(flags); cli(); > > > > if(hw->channel==1) { > > request_region(dev->base_addr, MIXCOM_IO_EXTENT, dev->name); > > } > > > > if(hw->channel==0 && !(ch->init_status & IRQ_ALLOCATED)) { > > > > ... DELETED 38 lines ... > > > > procfile->mode = S_IFREG | 0444; > > } > > } > > > > Error ---> > > return 0; > > } > > There's another problem here. We're calling request_region() > inside cli(). request_region() can sleep. > > On SMP, cli() does all sorts of bizarre things which are > quite different between different architectures. I don't > know if this practice is actually unsafe on any architectures, > but it is really bad practice. 
It's certainly the case that > schedue() will enable interrupts for you, so whatever you're > protecting won't be protected! > > So I'd add `sleep inside cli()' to your list of things to > look out for. > > Does your tool have the ability to track which functions > can and can't sleep? This is a very common source of bug. > Grab a spinlock, then call a function which calls a function > which calls a function which calls kmalloc(GFP_KERNEL). Unless > the spinlock is always protected by a semaphore, this can deadlock. > > > > > /u2/acc/oses/linux/2.4.1/drivers/scsi/eata_dma.c:490:eata_queue: >ERROR:INTR:464:506: Interrupts inconsistent, severity `20':506 > > > > save_flags(flags); > > Start ---> > > cli(); > > > > #if 0 > > for (x = 1, sh = first_HBA; x <= registered_HBAs; x++, sh = SD(sh)->next) { > > if(inb((uint)sh->base + HA_RAUXSTAT) & HA_AIRQ) { > > printk("eata_dma: scsi%d interrupt pending in eata_queue.\n" > >" Calling interrupt handler.\n", sh->host_no); > > > > ... DELETED 32 lines ... > > > > printk(KERN_CRIT "eata_queue pid %ld, HBA QUEUE FULL..., " > >"returning DID_BUS_BUSY\n", cmd->pid)); > > done(cmd); > > restore_flags(flags); > > Error ---> > > return(0); > > } > > ccb = &hd->ccb[y]; > > Something's gone a little wrong here. The bug is in fact about > 20 lines higher up. > > > Generally: yes, everything you've found needs fixing. > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
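The exit-path problem discussed in this thread (an early `return` that skips restore_flags()) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `mock_save_flags`, `mock_cli` and `mock_restore_flags` are hypothetical stand-ins for the kernel's save_flags()/cli()/restore_flags(), and `open_device` is an invented function, not code from any driver mentioned here. It shows the "one exit for the error path" idea: every return goes through a single label, so the interrupt state on exit always matches the state on entry.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical user-space stand-ins for save_flags()/cli()/restore_flags(). */
static int irq_enabled = 1;

static void mock_save_flags(int *flags) { *flags = irq_enabled; }
static void mock_cli(void) { irq_enabled = 0; }
static void mock_restore_flags(int flags) { irq_enabled = flags; }

/*
 * Invented example: 'permitted' and 'resource_ok' simulate two checks that
 * can fail while interrupts are off. Every exit goes through 'out', so the
 * interrupt state on return always matches the state on entry.
 */
static int open_device(int permitted, int resource_ok)
{
    int flags;
    int ret = 0;

    mock_save_flags(&flags);
    mock_cli();

    if (!permitted) {
        ret = -EPERM;
        goto out;
    }
    if (!resource_ok) {
        ret = -EBUSY;
        goto out;
    }
    /* ... normal work with interrupts disabled would go here ... */

out:
    mock_restore_flags(flags);
    return ret;
}
```

With this shape, a checker of the kind described would see the same interrupt state at the only `return`, whichever branch was taken.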
[CHECKER] 4 warnings in kernel/module.c
Hi, we modified the block checker and run it again on linux 2.4.1. (The block checker flags an error when blocking functions are called with either interrupts disabled or a spin lock held. ) It gave us 4 warnings in kernel/module.c. Because we are unaware of the contexts where these functions are called, we are not sure if these 4 warnings are real errors or false positives. Please help us to verify them or show that they are false positives. As usual, please CC us at [EMAIL PROTECTED] Any help will be appreciated. - [UNKNOWN] get_mod_name->__get_free_page(GFP_KERNEL). This is in the KERNEL. Definitely need to verify /u2/acc/oses/linux/2.4.1/kernel/module.c:290:sys_create_module: ERROR:BLOCK:289:290:calling blocking fn 'get_mod_name' w/ spin lock held [type=GLOBAL]:289 Start ---> lock_kernel(); Error ---> if ((namelen = get_mod_name(name_user, &name)) < 0) { error = namelen; - [UNKNOWN] get_mod_name->__get_free_page(GFP_KERNEL) This is in the KERNEL. Definitely need to verify /u2/acc/oses/linux/2.4.1/kernel/module.c:599:sys_delete_module: ERROR:BLOCK:597:599:calling blocking fn 'get_mod_name' w/ spin lock held [type=GLOBAL]:597 Start ---> lock_kernel(); if (name_user) { Error ---> if ((error = get_mod_name(name_user, &name)) < 0) goto out; - [UNKNOWN] need to verify. in the KERNEL! /u2/acc/oses/linux/2.4.1/kernel/module.c:376:sys_init_module: ERROR:BLOCK:342:376:calling blocking fn 'copy_from_user' w/ spin lock held [type=LOCAL]:342 Start ---> lock_kernel(); Error ---> if ((namelen = get_mod_name(name_user, &name)) < 0) { error = namelen; goto err0; } ... DELETED 26 lines ... goto err1; } strcpy(name_tmp, mod->name); Error ---> error = copy_from_user(mod, mod_user, mod_user_size); if (error) { - [UNKNOWN] need to verify. in the KERNEL! 
/u2/acc/oses/linux/2.4.1/kernel/module.c:888:sys_query_module: ERROR:BLOCK:881:888:calling blocking fn 'get_mod_name' w/ spin lock held [type=GLOBAL]:881

Start --->
	lock_kernel();
	if (name_user == NULL)
		mod = &kernel_module;
	else {
		long namelen;
		char *name;
Error --->
		if ((namelen = get_mod_name(name_user, &name)) < 0) {
			err = namelen;
-
A few questions:

1. Is it OK to call blocking functions in functions like init/main.c:init and init/main.c:start_kernel with a spin lock held? It seems OK because the system is booting when these functions are called.

2. Can functions like kmem_cache_create, kmem_cache_alloc, alloc_page block?
Re: [CHECKER] 4 warnings in kernel/module.c
On Fri, 23 Mar 2001, Keith Owens wrote:

> On Fri, 23 Mar 2001 02:41:40 -0800 (PST),
> Junfeng Yang <[EMAIL PROTECTED]> wrote:
> >Hi, we modified the block checker and run it again on linux 2.4.1. (The
> >block checker flags an error when blocking functions are called with
> >either interrupts disabled or a spin lock held. )
> >
> >It gave us 4 warnings in kernel/module.c. Because we are unaware of the
> >contexts where these functions are called, we are not sure if these 4
> >warnings are real errors or false positives. Please help us to verify them
> >or show that they are false positives.
>
> All false positives. The big kernel lock is a special case, you are
> allowed to sleep while holding that lock. See release_kernel_lock()
> and reacquire_kernel_lock() in sched().

Thanks for pointing this out. We'll modify the checker again and remove
"lock_kernel" from the patterns.
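The rule change discussed in this message could be modelled roughly as follows. This is a sketch only and not the MC checker's actual implementation: the `held_state` structure and the `blocking_call_is_error` function are invented for illustration. The point is simply that the big kernel lock is tracked separately and does not count as a spin lock when deciding whether a blocking call is an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented model of what a checker might track at a call site. */
struct held_state {
    bool irqs_disabled;   /* between cli() and sti()/restore_flags() */
    int  spinlocks_held;  /* ordinary spin locks currently held */
    int  bkl_depth;       /* lock_kernel() nesting depth */
};

/*
 * Returns true if calling a blocking function in this state should be
 * reported. The big kernel lock is deliberately ignored: per the reply
 * quoted in this message, sleeping while holding it is allowed.
 */
static bool blocking_call_is_error(const struct held_state *s)
{
    return s->irqs_disabled || s->spinlocks_held > 0;
}
```

Under this model the four kernel/module.c warnings, which hold only the big kernel lock, would no longer be reported.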
Re: [CHECKER] 4 warnings in kernel/module.c
On Fri, 23 Mar 2001, Alan Cox wrote:

> > Hi, we modified the block checker and run it again on linux 2.4.1. (The
> > block checker flags an error when blocking functions are called with
> > either interrupts disabled or a spin lock held. )
>
> lock_kernel() isnt a spinlock as such.

Thanks a lot. We just figured out that it is OK to block within a
lock_kernel()/unlock_kernel() scope. That will help us eliminate some
false positives.
[CHECKER] Questions about *_do_scsi & create_proc_entry
Hi,

I have a question about *_do_scsi(Scsi_Request *SRpnt, ...). If SRpnt is not NULL, *_do_scsi will not return NULL. I'm not quite sure about the precondition in the following three 'errors' flagged by the NULL checker. In these cases, can *_do_scsi return NULL?

Another question: by inspecting the NULL checker's results, I found that *_do_scsi is always used in the form "SRpnt = *_do_scsi(SRpnt, ...)", no matter whether SRpnt is NULL or not. If SRpnt is not NULL, why not just call *_do_scsi(SRpnt, ...)? The same thing happens with init_etherdev.

Last question: we found 3 potential errors in arch/i386/kernel. It seems that create_proc_entry could return NULL. Please help us verify whether they are bugs or not.

As usual, please CC us at [EMAIL PROTECTED] Any help will be appreciated.

-Junfeng

-
[UNKNOWN] osst_do_scsi will never return NULL if argument SRpnt isn't NULL. But they copy SRpnt back by *aSRpnt, implies it could be NULL
/u2/acc/oses/linux/2.4.1/drivers/scsi/osst.c:1145:osst_read_back_buffer_and_rewrite: ERROR:NULL:1042:1145: Using unknown ptr "SRpnt" illegally!
set by 'osst_do_scsi':1042 for (i = 0, p = buffer; i < frames; i++, p += OS_DATA_SIZE) { memset(cmd, 0, MAX_COMMAND_SIZE); cmd[0] = 0x3C; /* Buffer Read */ cmd[1] = 6; /* Retrieve Faulty Block */ cmd[7] = 32768 >> 8; cmd[8] = 32768 & 0xff; Start ---> SRpnt = osst_do_scsi(SRpnt, STp, cmd, OS_FRAME_SIZE, SCSI_DATA_READ, STp->timeout, MAX_RETRIES, TRUE); if ((STp->buffer)->syscall_result) { printk(KERN_ERR "osst%d: Failed to read block back from OnStream buffer\n", dev); vfree((void *)buffer); *aSRpnt = SRpnt; return (-EIO); } osst_copy_from_buffer(STp->buffer, p); // memcpy(p, STp->buffer->b_data, OS_DATA_SIZE); #if DEBUG if (debugging) printk(OSST_DEB_MSG "osst%d: Read back logical block %d, data %x %x %x %x\n", dev, logical_blk_num + i, p[0], p[1], p[2], p[3]); #endif } SRpnt is copied back through *aSRpnt here --> *aSRpnt = SRpnt; - [UNKNOWN] /u2/acc/oses/linux/2.4.1/drivers/scsi/osst.c:1145:osst_read_back_buffer_and_rewrite: ERROR:NULL::1145: Using unknown ptr "SRpnt" illegally! set by 'osst_do_scsi': - [UNKNOWN] osst_do_scsi can return NULL /u2/acc/oses/linux/2.4.1/drivers/scsi/osst.c:1243:osst_reposition_and_retry: ERROR:NULL:1237:1243: Using unknown ptr "SRpnt" illegally! set by 'osst_do_scsi':1237 #if DEBUG printk(OSST_DEB_MSG "osst%d: About to write pending lblk %d at frame %d\n", dev, STp->logical_blk_num-1, STp->first_frame_position); #endif Start---> SRpnt = osst_do_scsi(SRpnt, STp, cmd, OS_FRAME_SIZE, SCSI_DATA_WRITE, STp->timeout, MAX_WRITE_RETRIES, TRUE); Copied back here ---> *aSRpnt = SRpnt; if (STp->buffer->syscall_result) { /* additional write error */ if ((SRpnt->sr_sense_buffer[ 2] & 0x0f) == 13 && SRpnt->sr_sense_buffer[12] == 0 && SRpnt->sr_sense_buffer[13] == 2) { - [UNKNOWN] create_proc_entry /u2/acc/oses/linux/2.4.1/arch/i386/kernel/irq.c:1160:init_irq_proc: ERROR:NULL:1158:1160: Using unknown ptr "entry" illegally! 
set by 'create_proc_entry':1158 root_irq_dir = proc_mkdir("irq", 0); /* create /proc/irq/prof_cpu_mask */ Start--> entry = create_proc_entry("prof_cpu_mask", 0600, root_irq_dir); Error--> entry->nlink = 1; entry->data = (void *)&prof_cpu_mask; entry->read_proc = prof_cpu_mask_read_proc; entry->write_proc = prof_cpu_mask_write_proc; - [UNKNOWN] create_proc_entry can return NULL /u2/acc/oses/linux/2.4.1/arch/i386/kernel/irq.c:1139:register_irq_proc: ERROR:NULL:1137:1139: Using unknown ptr "entry" illegally! set by 'create_proc_entry':1137 irq_dir[irq] = proc_mkdir(name, root_irq_dir); /* create /proc/irq/1234/smp_affinity */ Start--> entry = create_proc_entry("smp_affinity", 0600, irq_dir[irq]); Error--> entry->nlink = 1; entry->data = (void *)(long)irq; entry->read_proc = irq_affinity_read_proc;
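If create_proc_entry can indeed return NULL, the usual remedy is to check the pointer before the first dereference. The following is a user-space sketch of that pattern, not a patch for irq.c: `struct mock_entry`, `mock_create_entry` and `register_entry` are invented names, and the failure is simulated with a flag rather than a real allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Invented stand-in for a proc entry; not the kernel's struct proc_dir_entry. */
struct mock_entry {
    int nlink;
    void *data;
};

/* Invented allocator that fails on request, standing in for create_proc_entry(). */
static struct mock_entry *mock_create_entry(int simulate_failure)
{
    if (simulate_failure)
        return NULL;
    return calloc(1, sizeof(struct mock_entry));
}

/* Returns 0 on success, -1 if the entry could not be created. */
static int register_entry(int simulate_failure, void *data, struct mock_entry **out)
{
    struct mock_entry *entry = mock_create_entry(simulate_failure);

    if (entry == NULL)   /* check before the first dereference */
        return -1;

    entry->nlink = 1;
    entry->data = data;
    *out = entry;
    return 0;
}
```

Whether the two irq.c call sites need such a check depends on whether the allocation can actually fail in that context, which is the question being asked above.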
Re: [CHECKER] null bugs in 2.4.4 and 2.4.4-ac8
On Thu, 24 May 2001, Willem Riede wrote: > Dawson Engler wrote: > > > > Hi All, > > > > Enclosed are 103 potential errors where code gets a pointer from a > > possibly-failing routine (kmalloc, etc) and dereferences it without > > > > [BUG] osst_do_scsi will never return NULL if argument SRpnt isn't NULL. But they >copy SRpnt back by *aSRpnt, implies it could be NULL > > No. It implies SRpnt could have changed. The functions flagged > (osst_read_back_buffer_and_rewrite and osst_reposition_and_retry) > cannot be reached with SRpnt == NULL. So these are false alarms. these are false positives if osst_read_back_buffer_and rewrite can't be reached with SRpnt == NULL. It seems that osst_do_scsi will not change SRpnt unless it is NULL though. In other words, SRpnt is changed by osst_do_scsi <=> the initial argument SRpnt == NULL. Probabaly the pointer aSRpnt is useless. > > > >/u2/engler/mc/oses/linux/2.4.4/drivers/scsi/osst.c:1163:osst_read_back_buffer_and_rewrite: > ERROR:NULL::1163: Using unknown ptr "SRpnt" illegally! set by >'osst_do_scsi':1163 [nbytes = 216] > > #if DEBUG > > if (debugging) > > printk(OSST_DEB_MSG "osst%d: About to attempt to write to >frame %d\n", dev, new_block+i); > > #endif > > SRpnt = osst_do_scsi(SRpnt, STp, cmd, OS_FRAME_SIZE, >SCSI_DATA_WRITE, > > Start ---> > > STp->timeout, MAX_WRITE_RETRIES, TRUE); > > > > ... DELETED 46 lines ... > > > > } > > } > > if (flag) { > > if ((SRpnt->sr_sense_buffer[ 2] & 0x0f) == 13 && > > SRpnt->sr_sense_buffer[12] == 0 && > > Error ---> > > SRpnt->sr_sense_buffer[13] == 2) { > > printk(KERN_ERR "osst%d: Volume overflow in write >error recovery\n", dev); > > vfree((void *)buffer); > > return (-EIO); /* hit end of tape >= fail */ > > > > Regards. Willem Riede. > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
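The convention being debated here (the returned pointer differs from the argument only when the argument was NULL) can be modelled in a few lines. This is an invented user-space model of the behaviour as described in this thread, not the code in osst.c; `mock_request` and `mock_do_scsi` are hypothetical names, and `alloc_fails` simulates an allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mock_request { int result; };

/*
 * Invented model of the convention discussed above: reuse the caller's
 * request if there is one, otherwise allocate a new one. Only the
 * allocation path can return NULL.
 */
static struct mock_request *mock_do_scsi(struct mock_request *req, int alloc_fails)
{
    if (req == NULL) {
        if (alloc_fails)
            return NULL;
        req = malloc(sizeof(*req));
        if (req == NULL)
            return NULL;
    }
    req->result = 0;
    return req;
}
```

In this model a caller that always passes a non-NULL request can never see NULL come back, which is why the flagged dereferences would be false positives if the callers are unreachable with SRpnt == NULL.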
Re: [CHECKER] crash + fsck cause file systems to contain loops (msdos and vfat, 2.6.11)
> Interesting.
>
> $ /devel/linux/works/fatfs/fatfstools/dosfstools-2.10/dosfsck/dosfsck -a bug10/crash.img
> dosfsck 2.10, 22 Sep 2003, FAT32, LFN
> /0006
>  Directory does not have any cluster ("." and "..").
>  Dropping it.
> Reclaimed 3 unused clusters (6144 bytes) in 3 chains.
> Performing changes.
> crash.img: 8 files, 3/8167 clusters
>
> My fixed dosfsck found the above corruption in bug10/crash.img (bug7
> has same corruption). And probably you can see root directory via 0006
> directory, I guess your testing tree didn't have my patches yet (seems
> old behavior).

I'm using dosfsck 2.10, 22 Sep 2003, FAT32, LFN, and yes, I do see the root directory after I run dosfsck on the crashed disk image. I'm checking 2.6.11.

By "your testing tree didn't have my patches yet", do you mean you have the patch but haven't made it public? Is this "testing tree" the Linux source tree? Can you be a little more specific?

> BTW, what mount options did you use?

I just used the default mount options:

mount -t msdos source target

with no -o.

Thanks,
-Junfeng
Re: [CHECKER] sync doesn't flush everything out (msdos and vfat, 2.6.11)
> vfat and msdos doesn't support the link(), and the truncate() which
> extends size is not supported yet.
>
> This test seems to calling abort(0) by CHECK(ret)...

I updated the test case (basically just set CHECK to be a no-op). Can you please download and re-run the test case? After reboot, run dosfsck -a on the crashed disk; you'll see output like:

dosfsck 2.10, 22 Sep 2003, FAT32, LFN
/å004 and /0005 share clusters.
  Truncating second to 0 bytes.
/0005
  File size is 4 bytes, cluster chain length is 0 bytes.
  Truncating file to 0 bytes.
Performing changes.
/dev/sbd0: 5 files, 4/8167 clusters

This causes file /0005 to be truncated to 0 bytes.

-Junfeng
Re: [CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)
> This is a known problem. Turn off the (default - grrr) subtree checking
> export option on the server, and it will all work properly. The subtree
> checking option violates the NFS standards for filehandle generation in
> so many ways, that it isn't even funny.

Thanks Trond. no_subtree_check fixes the problem.

-Junfeng
[CHECKER] NFS on JFS doesn't sync all file system operations (NFS on JFS, 2.6.11)
Hi,

FiSC found that link and unlink operations on an NFS partition on top of JFS are not sync'ed. These warnings show up with JFS but not with ext2 or ext3, so I suspect it's a JFS problem.

cat /etc/exports shows:

/mnt/sbd0-export localhost(rw,sync,no_subtree_check)
/mnt/sbd1-export localhost(rw,sync,no_subtree_check)

Test cases can be found at:

http://fisc.stanford.edu/bug17/crash.c
http://fisc.stanford.edu/bug17/crash-unlink.c

Thanks,
-Junfeng
[CHECKER] Need help on mmap on FUSE (linux user-land file system)
Does anyone know how to set up mmap on FUSE (linux user-land file system)? Or is it even possible to have mmap on FUSE?

Our file system checker could potentially check a lot more things if we had mmap working on a FUSE file system. Your help on this would be much appreciated!

-Junfeng
Re: [MC] [CHECKER] Need help on mmap on FUSE (linux user-land file system)
I forgot to mention that we are checking linux 2.6. It appears to us that mmap doesn't work for FUSE in linux 2.6.

-Junfeng

On Sat, 12 Mar 2005, Junfeng Yang wrote:

> Does anyone know how to set up mmap on FUSE (linux user-land file system)?
> Or is it even possible to have mmap on FUSE?
>
> Our file system checker can potentially check a lot more things if we can
> have mmap working on a FUSE file system. Your help on this are well
> appreciated!
>
> -Junfeng
>
> ___
> MC mailing list
> [EMAIL PROTECTED]
> http://keeda.stanford.edu/cgi-bin/mailman/listinfo/mc
[CHECKER] Does sys_sync (ext2, 2.6.x) flush metadata?
Hi,

We're working on a file system checker and have a question regarding what sys_sync actually does. It appears to us that sys_sync should sync both data and metadata, and wait until both data and metadata hit the disk before it returns. Is this true for all file systems (especially ext2) in kernel 2.6.x?

I've gotten many "error" traces for ext2 where directory entries are not flushed to disk after sys_sync. In other words, even if users do call sys_sync, a crash after the sys_sync call can still cause file loss. Is this intended?

Thanks,
-Junfeng
kernel debugger for 2.6
Hi,

I'm working on a linux file system checker right now. Can anyone recommend a good kernel debugger for 2.6? I googled a bit before posting to this mailing list. It looks like kgdb and kdb are the two choices. Which one is better? It appears to me that kgdb requires two machines.

Thanks,
-Junfeng
O_DIRECT on 2.4 ext3
Hi,

I tried to read from a regular ext3 file opened with O_DIRECT, but got an "Invalid argument" error. Running the same test program on a block device succeeded.

uname -a shows

Linux *** 2.4.27-2-686-smp #1 SMP Thu Jan 20 11:02:39 JST 2005 i686 GNU/Linux

My test case is

#define _GNU_SOURCE		/* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK (4096U)

int main(void)
{
	char buf[BLK * 2];
	/* round buf up to the next BLK-aligned address */
	char *p = (char *)((((unsigned long)buf) + (BLK - 1)) & ~(unsigned long)(BLK - 1));
	int fd, l;

	fprintf(stderr, "buf = %p, p = %p\n", (void *)buf, (void *)p);
	if ((fd = open("sbd0", O_RDONLY | O_DIRECT)) < 0) {
		perror("open");
		assert(0);
	}
	if ((l = pread(fd, p, BLK, 0)) < 0) {
		perror("pread");
		assert(0);
	}
	fprintf(stderr, "pread returns %d\n", l);
	close(fd);
	return 0;
}

Does anyone know what's going on?

Thanks,
-Junfeng
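Separately from the "Invalid argument" question, the hand-rolled pointer rounding in the test case can be replaced by asking the allocator for an aligned buffer. A minimal sketch, assuming a POSIX system that provides posix_memalign(); `alloc_aligned` and `is_aligned` are hypothetical helper names, and nothing here explains or fixes the ext3 failure itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate 'size' bytes aligned to 'align' (a power of two that is a
 * multiple of sizeof(void *)). Returns NULL on failure; release with free().
 */
static void *alloc_aligned(size_t align, size_t size)
{
    void *p = NULL;

    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
}

/* True if 'p' is a multiple of 'align' (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

In the test case this would replace the `buf`/`p` pair with a single `alloc_aligned(BLK, BLK)` call, and `is_aligned` can be used to confirm the alignment before the pread().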
[CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
Hi,

FiSC (our file system checker) emits several warnings on ext2, jfs and reiserfs, complaining that directories or files are lost while FiSC believes they should already be persistent on disk. (ext3 behaves correctly.)

All warnings boil down to a single cause: when these file systems are mounted with -o sync or -o dirsync, dirty blocks are still written out asynchronously. It appears to me that these mount options don't have any effect on these file systems. Is this the intended behavior?

man mount shows:

sync	All I/O to the file system should be done synchronously.

dirsync	All directory updates within the file system should be done synchronously. This affects the following system calls: creat, link, unlink, symlink, mkdir, rmdir, mknod and rename.

Any clarification on this would be very helpful,
-Junfeng
Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
> It may happen that FISC reads the disk before the write command even finished.
> With all the HD head movement optimization in the kernel (block layer,
> boiling down to TCQ/NCQ), this sounds possible.

FiSC "crashes" the kernel immediately after a file system operation (creat, mkdir, write, etc.) returns. Presumably, if a file system is mounted -o sync, all FS operations should be done synchronously; i.e., if creat("foo") returns, the file "foo" had better be on disk. That turns out not to be the case for ext2, jfs and reiserfs.

-Junfeng
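Independently of what -o sync ought to guarantee, a program that needs "the file is on disk when the call returns" can ask for it explicitly. A minimal sketch, assuming a POSIX system where fsync() on a directory descriptor is supported (as it is on Linux); `create_durably` is a hypothetical helper name. Whether a given file system actually honours these calls across a crash is a separate question; this only shows the sequence of calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create 'name' inside directory 'dir' and ask the kernel to flush both
 * the new file and the directory that now references it.
 * Returns 0 on success, -1 on any failure.
 */
static int create_durably(const char *dir, const char *name)
{
    int ret = -1;
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        goto out_dir;

    /* flush the file itself, then the directory entry pointing at it */
    if (fsync(fd) == 0 && fsync(dfd) == 0)
        ret = 0;

    close(fd);
out_dir:
    close(dfd);
    return ret;
}
```

A call such as `create_durably("/mnt/sbd1", "foo")` would then play the role of the creat("foo") above, without relying on the mount options.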
Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
On Thu, 3 Mar 2005, Junfeng Yang wrote:

> Hi,
>
> FiSC (our file system checker) emits several warnings on ext2, jfs and
> reiserfs, complaining that diretories or files are lost while FiSC
> believes they should already be persistent on disk. (ext3 behaves
> correctly.)

I forgot to mention that we are mainly looking for crash-recovery bugs. The warnings can be triggered this way:

1. do several file system operations
2. "crash" the test machine
3. get the crashed disk image, run fsck to recover
4. mount the recovered disk image

I'm able to reproduce the same warnings on ext2 using the following program:

#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>

int main(void)
{
	system("sudo umount /dev/hda9");
	system("/sbin/mke2fs /dev/hda9");
	system("sudo mount -t ext2 /dev/hda9 /mnt/sbd1 -o sync,dirsync");

	creat("/mnt/sbd1/0002", 0777);
	mkdir("/mnt/sbd1/0003", 0777);

	// unplug your power cord here :) then use e2fsck to recover
	return 0;
}

uname -a shows

Linux notus 2.6.8-1-686 #1 Thu Nov 25 04:34:30 UTC 2004 i686 GNU/Linux
Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
> That would be a bug. Please send the e2fsck output. Here is the trace 1. file system is made with sbin/mkfs.ext2 -F -b 1024 /dev/hda9 60 and mounted with -o sync,dirsync 1. operations FiSC did: creat(/mnt/sbd0/0001) write(/mnt/sbd0/0001) rename(/mnt/sbd0/0001, /mnt/sbd0/0002) mkdir(/mnt/sbd0/0003) 2. FiSC "crashed" the test machine after mkdir returns. Crashed disk image can be downloaded at: http://fisc.stanford.edu/bug2/crash.img.bz2 e2fsck output is: e2fsck 1.36 (05-Feb-2005) /dev/hda9 was not cleanly unmounted, check forced. Pass 1: Checking inodes, blocks, and sizes Inode 12, i_blocks is 16, should be 2. Fix? yes Pass 2: Checking directory structure Entry '0003' in / (2) has deleted/unused inode 13. Clear? yes Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Block bitmap differences: -21 Fix? yes Free blocks count wrong for group #0 (38, counted=39). Fix? yes Free blocks count wrong (38, counted=39). Fix? yes Inode bitmap differences: -13 Fix? yes Free inodes count wrong for group #0 (3, counted=4). Fix? yes Directories count wrong for group #0 (3, counted=2). Fix? yes Free inodes count wrong (3, counted=4). Fix? yes /dev/hda9: * FILE SYSTEM WAS MODIFIED * /dev/hda9: 12/16 files (0.0% non-contiguous), 21/60 blocks > > > It would be much better to test vaguely contemporary kernels. > I'm going to check 2.6.11 tonight. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [MC] [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
> From a quick parse, ext2 seems to be full of MS_SYNCHRONOUS holes, and
> there might be some O_SYNC ones there as well.

I should be able to easily add an O_SYNC check to FiSC. Several
questions:

1. Does O_SYNC apply to directories as well?

2. For the same file, if I open it twice, once with O_SYNC and once
   without, only writes through the O_SYNC fd will be synchronous, right?

3. I open a file w/o O_SYNC, issue a bunch of writes, then call
   ioctl(FIOASYNC) to set the fd sync, then issue a second set of
   writes. Is only the second set of writes synchronous?

btw, the man page shows that O_DSYNC and O_RSYNC are just O_SYNC. Is this
true for the current Linux kernel (2.6)?

> So this wild scattergun patch probably does extra work and possibly extra
> I/O all over the place, but I'd be interested if Junfeng could give it a
> quick test. It's against 2.6.11.

I checked 2.6.11 with your patch just now. It looks like the problem is
still there. If you need more information, let me know.

The image is at http://fisc.stanford.edu/bug2/crash-1.img.bz2. Below is
the output from e2fsck.

e2fsck 1.36 (05-Feb-2005)
/dev/ide/host0/bus0/target0/lun0/part9 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 13, i_blocks is 16, should be 2. Fix? yes
Inode 15 is a zero-length directory. Clear? yes
Pass 2: Checking directory structure
Entry '0005' in / (2) has deleted/unused inode 15. Clear? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 2 ref count is 4, should be 3. Fix? yes
Pass 5: Checking group summary information
Block bitmap differences: -21 Fix? yes
Free blocks count wrong for group #0 (38, counted=39). Fix? yes
Free blocks count wrong (38, counted=39). Fix? yes
Inode bitmap differences: -15 Fix? yes
Free inodes count wrong for group #0 (1, counted=2). Fix? yes
Directories count wrong for group #0 (3, counted=2). Fix? yes
Free inodes count wrong (1, counted=2). Fix? yes
/dev/ide/host0/bus0/target0/lun0/part9: * FILE SYSTEM WAS MODIFIED *
/dev/ide/host0/bus0/target0/lun0/part9: 14/16 files (0.0% non-contiguous), 21/60 blocks
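As a small illustration for question 2 above, the sketch below opens the same file twice and reads the status flags back with fcntl(F_GETFL), which shows that O_SYNC belongs to each open file description rather than to the file. The helper names are made up for this example. It only demonstrates where the flag lives; it says nothing about whether a given file system honours it. Note also that on Linux FIOASYNC toggles O_ASYNC (signal-driven I/O), so the call in question 3 would not by itself make writes synchronous.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd's open file description has O_SYNC set,
 * 0 if it does not, -1 if fcntl() fails. */
static int fd_is_osync(int fd)
{
    int fl = fcntl(fd, F_GETFL);

    if (fl == -1)
        return -1;
    return (fl & O_SYNC) == O_SYNC;
}

/* Hypothetical helper: open path twice, once with O_SYNC and once
 * without. Returns 0 on success, -1 on failure. */
static int open_pair(const char *path, int *sync_fd, int *plain_fd)
{
    assert(path && sync_fd && plain_fd);

    *sync_fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    if (*sync_fd < 0)
        return -1;

    *plain_fd = open(path, O_WRONLY);
    if (*plain_fd < 0) {
        close(*sync_fd);
        return -1;
    }
    return 0;
}
```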
[CHECKER] crash after fsync causing serious FS corruptions (ext2, 2.6.11)
Hi,

FiSC (our FS checker) issues a warning on ext2, complaining that a crash
after fsync corrupts the file system. The FS is corrupted in two
different ways:

1. a file contains illegal blocks (such as block # -2)
2. one block is owned by two different files.

I diagnosed the warning a little bit and it appears that it can be
triggered by the following steps:

1. a file is truncated, so several blocks are freed
2. a new file is created, and the blocks freed in step 1 are reused
3. fsync on the new file
4. crash and run fsck to recover.

fsync should guarantee that a specific file is persistent on disk.
Presumably, operations on other files should not interfere with the file
we just fsync'ed (true?)

However, I also understand that ext2 by default relies on e2fsck to
provide file system consistency. Do you guys consider the above warning a
bug or not? Any clarification on this will be very helpful.

To reproduce the warning, please download the test case at
http://fisc.stanford.edu/bug3/crash.tar.bz2, untar, compile and run the
executable ./crash

This test case is semi-automatically generated. It may contain more FS
operations than are needed to trigger the warning.

**NOTE**: it'll run mke2fs and reboot your machine!

e2fsck output:

e2fsck 1.36 (05-Feb-2005)
/dev/ide/host0/bus0/target0/lun0/part9 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 12 has illegal block(s). Clear? yes
Illegal block #-2 (2305145833) in inode 12. CLEARED.
Inode 12, i_blocks is 24, should be 16. Fix? yes
Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 12: 24
Duplicate/bad block(s) in inode 13: 24
Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 2 inodes containing duplicate/bad blocks.)
File ... (inode #12, mod time Mon Mar 7 01:27:12 2005)
  has 1 duplicate block(s), shared with 1 file(s):
    ... (inode #13, mod time Mon Mar 7 01:27:14 2005)
Clone duplicate/bad blocks? yes
File ... (inode #13, mod time Mon Mar 7 01:27:14 2005)
  has 1 duplicate block(s), shared with 1 file(s):
    ... (inode #12, mod time Mon Mar 7 01:27:12 2005)
Duplicated blocks already reassigned or cloned.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 12 Connect to /lost+found? yes
Inode 12 ref count is 2, should be 1. Fix? yes
Unattached inode 13 Connect to /lost+found? yes
Inode 13 ref count is 2, should be 1. Fix? yes
Pass 5: Checking group summary information
Block bitmap differences: +(21--22) +(29--31) Fix? yes
Free blocks count wrong for group #0 (37, counted=31). Fix? yes
Free blocks count wrong (37, counted=31). Fix? yes
/dev/ide/host0/bus0/target0/lun0/part9: * FILE SYSTEM WAS MODIFIED *
/dev/ide/host0/bus0/target0/lun0/part9: 13/16 files (7.7% non-contiguous), 29/60 blocks
running cmd "sudo mount -t ext2 /dev/ide/host0/bus0/target0/lun0/part9 /mnt/sbd1

-Junfeng
Re: [CHECKER] Do ext2, jfs and reiserfs respect mount -o sync/dirsync option?
FiSC can still get those warnings with hdparm -W 0, or with a simple
ramdisk that serves the disk requests whenever they are submitted.

Thanks,
-Junfeng

On Mon, 7 Mar 2005, Alan Cox wrote:

> The IDE layer default is still unfortunately broken and leaves write
> caching enabled. Turn it off with hdparm.
Re: [CHECKER] crash after fsync causing serious FS corruptions (ext2, 2.6.11)
> fsync on ext2 only really guarantees that the data has reached
> the disk, what the disk does is outside the realm of the fs.
> If the ide drive has write back caching enabled, the data just
> might only be in cache. If the power is removed right after fsync
> returns, the drive might not get a chance to actually commit the
> write to platter.

Hi Jens,

Thanks for the reply. I tried your patch, and also setting hdparm -W0.
The warning is still there. This warning and the previous ones I reported
should be unrelated to the IDE driver, as FiSC (our FS checker) doesn't
actually crash the machine but simulates a crash using a ramdisk.

It appears to me that this warning can be triggered by the following
steps:

1. create a file A with several data blocks. fsync(A) to disk

2. truncate A to a smaller size, causing a few blocks to be freed.
   However, they are only freed in memory. The corresponding changes in
   the bitmaps haven't yet hit the disk.

3. create a file B with several data blocks. ext2 will re-use the freed
   blocks from step 2.

4. fsync(B). Once fsync returns, crash.

At this moment, the truncate in step 2 hasn't reached the disk yet, so
file A on disk still contains pointers to the freed blocks. However, the
fsync(B) in step 4 flushes B's inode and other metadata to disk. Now we
end up with a file system where a block is shared by two files.

I'm not sure how the invalid block number warning is triggered.

-Junfeng
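The four steps above can be written down as a short C sequence. This is a sketch under assumptions, not the FiSC-generated test case: the file names, the 1 KiB block size and the block counts are arbitrary choices for illustration, and the crash itself has to be provided by the test harness at the marked point.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum { BLK = 1024 };   /* assumed block size, matching mkfs -b 1024 */

/* Create path, write nblocks blocks filled with `fill`, then fsync it.
 * Returns 0 on success, -1 on any failure (short writes included). */
static int create_and_fsync(const char *path, int nblocks, int fill)
{
    char buf[BLK];
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    memset(buf, fill, sizeof buf);
    for (int i = 0; i < nblocks; i++) {
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Run steps 1-4 under dir, the mount point of the file system under
 * test. Returns 0 when every call succeeded. */
static int run_sequence(const char *dir)
{
    char a[4096], b[4096];

    assert(dir != NULL);
    if (snprintf(a, sizeof a, "%s/fisc_A", dir) >= (int)sizeof a)
        return -1;
    if (snprintf(b, sizeof b, "%s/fisc_B", dir) >= (int)sizeof b)
        return -1;

    if (create_and_fsync(a, 8, 'a') != 0)   /* step 1: A is on disk */
        return -1;
    if (truncate(a, 2 * BLK) != 0)          /* step 2: shrink A, no fsync */
        return -1;
    if (create_and_fsync(b, 8, 'b') != 0)   /* steps 3 and 4: B + fsync(B) */
        return -1;
    /* crash point: the harness would cut power or snapshot the disk here */
    return 0;
}
```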
Re: [Ext2-devel] Re: [CHECKER] crash after fsync causing serious FS corruptions (ext2, 2.6.11)
Thanks a lot Andreas. Your message clarifies everything.

> In ext3 this case is handled because the filesystem won't reallocate the
> metadata blocks freed from file A before they have been committed to disk.
> Also, the operations on file A are guaranteed to complete before or with
> operations on file B so fsync(B) will also cause the changes from A to
> be flushed to disk at the same time (this is guaranteed to complete before
> fsync(B) returns).

In other words, each fsync essentially triggers a jbd commit, right?

-Junfeng
Re: [CHECKER] crash after fsync causing serious FS corruptions (ext2, 2.6.11)
> I believe the warning should go away if you mount -o sync (but then
> the filesystem will perform very slowly :-).
>

I do agree with you, Andreas and others that this is expected behavior on
ext2, and that ext3 should be chosen over ext2 when such corruption is a
concern.

However, mount -o sync won't fix the problem for ext2 either :) I sent a
report last week saying that ext2 doesn't actually sync writes even if an
ext2 partition is mounted -o sync,dirsync. Andrew confirmed that ext2 has
MS_SYNCHRONOUS holes (and possibly O_SYNC holes). Check out
http://www.uwsg.iu.edu/hypermail/linux/kernel/0503.0/1252.html or google
"Junfeng mount sync".

Thanks,
-Junfeng
--update-- Re: [CHECKER] crash after fsync causing serious FS corruptions (ext2, 2.6.11)
> the mouth.) It is *expected* behaviour, yes, and it is mitigated by
> two factors. (1) Metadata for ext2 is synced out every 5 seconds,
> while data is synced out every 60, so the max window for this race
> described above is 5 seconds, and in practice rarely shows up if you
> are not using fsync. (2) Unlike BSD's fsck, when a block is owned by
> two different files, we offer an option to clone the affected files so
> data isn't lost, while BSD's fsck shoots both files and asks questions
> later.

FiSC detects another related warning: a file's data are not what they
should be after fsync. It turns out that e2fsck -y clears invalid
indirect blocks before it clones the shared blocks, causing a file to
lose data when these invalid blocks are also shared blocks.

Consider the following case:

file A has block 100 as an indirect block on disk
file B has block 100 as a data block, and the user writes garbage to
block 100, then calls fsync(B).

Now clearing block 100 before cloning it in e2fsck would cause file B to
lose its data block.

A possible fix would be to clone duplicate blocks before clearing invalid
blocks. Any thoughts? Or does the user have to run e2fsck twice in this
case?

To reproduce the warning, get http://fisc.stanford.edu/bug5/crash.c. It
uses the fixed mount point /mnt/sbd0.

e2fsck output:

e2fsck 1.36 (05-Feb-2005)
/dev/ide/host0/bus0/target0/lun0/part9 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 12 has illegal block(s). Clear? yes
Illegal block #-2 (2517328384) in inode 12. CLEARED.
Inode 12, i_blocks is 24, should be 16. Fix? yes
Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 12: 22 25 26 27
Duplicate/bad block(s) in inode 13: 22
Duplicate/bad block(s) in inode 16: 25 26 27
Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 3 inodes containing duplicate/bad blocks.)
File ... (inode #12, mod time Tue Mar 8 21:32:38 2005)
  has 4 duplicate block(s), shared with 2 file(s):
    ... (inode #16, mod time Tue Mar 8 21:32:40 2005)
    ... (inode #13, mod time Tue Mar 8 21:32:39 2005)
Clone duplicate/bad blocks? yes
File ... (inode #13, mod time Tue Mar 8 21:32:39 2005)
  has 1 duplicate block(s), shared with 1 file(s):
    ... (inode #12, mod time Tue Mar 8 21:32:38 2005)
Duplicated blocks already reassigned or cloned.
File ... (inode #16, mod time Tue Mar 8 21:32:40 2005)
  has 3 duplicate block(s), shared with 1 file(s):
    ... (inode #12, mod time Tue Mar 8 21:32:38 2005)
Duplicated blocks already reassigned or cloned.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 12 Connect to /lost+found? yes
Inode 12 ref count is 2, should be 1. Fix? yes
Unattached inode 13 Connect to /lost+found? yes
Inode 13 ref count is 2, should be 1. Fix? yes
Unattached inode 16 Connect to /lost+found? yes
Inode 16 ref count is 2, should be 1. Fix? yes
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (28, counted=24). Fix? yes
Free blocks count wrong (28, counted=24). Fix? yes
Inode bitmap differences: -(14--15) Fix? yes
Free inodes count wrong for group #0 (0, counted=2). Fix? yes
Directories count wrong for group #0 (4, counted=2). Fix? yes
Free inodes count wrong (0, counted=2). Fix? yes

-Junfeng
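The ordering problem in the block-100 case can be illustrated with a toy model in C. Nothing here is e2fsck code: the struct and the three functions are invented for this sketch, and "clear" is modelled crudely as zeroing the whole block, whereas the real tool may only rewrite the invalid entries. The point is only that whichever step runs first decides what file B's private copy contains.

```c
#include <assert.h>
#include <string.h>

enum { BLOCK_SIZE = 8 };

/* Toy stand-in for the one disk block that file A uses as an indirect
 * block and file B uses as a data block. */
struct toy_block {
    unsigned char bytes[BLOCK_SIZE];
};

/* "Clear": models fsck wiping an invalid indirect block in place. */
static void clear_block(struct toy_block *blk)
{
    assert(blk != NULL);
    memset(blk->bytes, 0, BLOCK_SIZE);
}

/* "Clone": models fsck giving one owner a private copy of the block. */
static struct toy_block clone_block(const struct toy_block *blk)
{
    return *blk;
}

/* Order described in the report: clear first, then clone.
 * Returns what file B ends up with. */
static struct toy_block clear_then_clone(struct toy_block shared)
{
    clear_block(&shared);
    return clone_block(&shared);   /* B's copy is already wiped */
}

/* Suggested order: clone first, then clear.
 * Returns what file B ends up with. */
static struct toy_block clone_then_clear(struct toy_block shared)
{
    struct toy_block b_copy = clone_block(&shared);

    clear_block(&shared);          /* only A's view is wiped */
    return b_copy;
}
```

In this model, data that B fsync'ed into the shared block survives only in the clone-then-clear order.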
[CHECKER] fsync doesn't sync data properly (JFS, Linux 2.6.11)
Hi,

FiSC found a potential error on JFS (Linux 2.6.11) where fsync doesn't
properly flush out file data. A crash after this fsync causes data loss.

The test case can be found at http://fisc.stanford.edu/bug9/crash.c

To reproduce it, download and compile crash.c, and run it on a fresh jfs
partition. File /mnt/sbd0/0006/0010/0029/0033 should contain
-23,-69,101,-119. However, the crash-recovered version contains all 0s.

Please let me know if you need more information.

As usual, confirmations/clarifications are appreciated,
-Junfeng
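The property being checked can be sketched as a pair of C helpers: write the bytes and fsync, then (after crash and recovery) compare the file against what was written. The helper names are hypothetical and this is not the code in crash.c; the 64-byte limit is an arbitrary choice for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write len bytes to a fresh file and fsync it before closing.
 * Returns 0 on success, -1 on failure (a short write counts as failure). */
static int write_and_fsync(const char *path, const signed char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Returns 1 if the first len bytes of the file equal expect,
 * 0 if they differ or the file is shorter, -1 on error. */
static int file_has_bytes(const char *path, const signed char *expect, size_t len)
{
    signed char buf[64];
    ssize_t n;
    int fd;

    assert(len <= sizeof buf);     /* sketch only handles small files */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = read(fd, buf, len);
    close(fd);
    if (n < 0)
        return -1;
    return (size_t)n == len && memcmp(buf, expect, len) == 0;
}
```

A checker would call write_and_fsync() before the crash point and file_has_bytes() on the recovered image with the bytes -23, -69, 101, -119; a result of 0 corresponds to the data loss reported here.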
[CHECKER] XFS doesn't respect mount -o sync (XFS, 2.6.11)
Hi,

We are from the Stanford Checker team and are working on a file system
checker called FiSC. We checked XFS and found that even when an XFS
partition is mounted -o sync, file system operations are still not
sync'ed correctly. A simple test case would be something like this:

mkdir 0001
reboot -f -n

After reboot, directory 0001 is lost. Let me know if you need any more
information to reproduce the warning.

We are not sure whether this is the expected behavior on XFS, so your
input on this is much appreciated.

-Junfeng
[CHECKER] crash + fsck cause file systems to contain loops (msdos and vfat, 2.6.11)
Hi,

We are from the Stanford Checker team and are currently developing a file
system checker called FiSC. FiSC mainly focuses on finding crash-recovery
errors. We applied it to msdos and vfat and found a serious error where a
crash followed by recovery causes the file system to contain loops.

To reproduce the warning, download and run our test cases at

http://fisc.stanford.edu/bug7/crash.c (for msdos)
http://fisc.stanford.edu/bug10/crash.c (for vfat)

You can also find the crashed disk images in the corresponding
directories.

We are not sure if these are bugs or not. Your
confirmations/clarifications on this are much appreciated.

-Junfeng
[CHECKER] sync doesn't flush everything out (msdos and vfat, 2.6.11)
Hi,

This is yet another report from FiSC :) This time FiSC complains that
sync on msdos and vfat doesn't flush everything out. A crash after sync
still causes data loss.

Test cases and crashed disk images can be found at

http://fisc.stanford.edu/bug8 (msdos)
http://fisc.stanford.edu/bug11 (vfat)

Confirmations/clarifications are appreciated.

-Junfeng
Re: [CHECKER] crash + fsck cause file systems to contain loops (msdos and vfat, 2.6.11)
> Linus's current tree includes support for `mount -o sync' on the msdos and
> vfat filesystems.

Thanks Andrew. I can just do a bk clone from
http://linux.bkbits.net/linux-2.6 to get Linus's current tree, right?

The warning reported here doesn't need mount -o sync to trigger though. A
simple crash on a default-mounted FS can usually cause the FS loop.

(Also, I realized I made many typos in my report --- this implies I'm
tired and should probably get some sleep :)
[CHECKER] inconsistent NFS stat cache (NFS on ext3, 2.6.11)
Hi,

We checked NFS on top of ext3 using FiSC (our file system model checker)
and found a case where the NFS stat cache can contain inconsistent
entries. Basically, to trigger this inconsistency, just do the following
steps:

1. create a file A1, write a few bytes to it, so A1 is 4 words
2. create a hard link A2, pointing to A1
3. stat on A2. A2's size is 4 words
4. truncate A1 to a larger size, write a few bytes at the end. Now it's
   1031 words.
5. stat on A2. Its size is still 4 words, when it should be 1031 words

We have a test case to re-create this warning. You can download it at
http://fisc.stanford.edu/bug16/crash.c. It includes some sudo commands to
mount nfs partitions, which you might want to change according to your
local settings.

cat /etc/exports shows:

/mnt/sbd0-export localhost(rw,sync)
/mnt/sbd1-export localhost(rw,sync)

Let me know if you have any problems reproducing the warning. We'd
appreciate any confirmations/clarifications.

-Junfeng
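Steps 1 to 5 above can be sketched as a single C function that records the size seen through the hard link before and after the file grows. This is an illustration rather than the crash.c test case: the file names are invented, and the sizes are taken in bytes (4 and 1031) although the report counts in words. On a local file system both stat() calls should agree with the real size; pointing `dir` at an NFS mount is what would exercise the reported behaviour.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size of path as reported by stat(), or -1 on error. */
static long long size_of(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? (long long)st.st_size : -1;
}

/* Run steps 1-5 under dir and store the two sizes observed through the
 * hard link A2. Returns 0 if every call succeeded, -1 otherwise. */
static int link_size_check(const char *dir, long long *before, long long *after)
{
    char a1[4096], a2[4096];
    int fd;

    assert(dir && before && after);
    snprintf(a1, sizeof a1, "%s/fisc_A1", dir);
    snprintf(a2, sizeof a2, "%s/fisc_A2", dir);
    unlink(a1);                                   /* start from a clean slate */
    unlink(a2);

    fd = open(a1, O_WRONLY | O_CREAT | O_TRUNC, 0644);    /* step 1 */
    if (fd < 0)
        return -1;
    if (write(fd, "abcd", 4) != 4)
        goto fail;
    if (link(a1, a2) != 0)                                /* step 2 */
        goto fail;
    *before = size_of(a2);                                /* step 3 */
    if (ftruncate(fd, 1027) != 0)                         /* step 4: grow A1 */
        goto fail;
    if (pwrite(fd, "wxyz", 4, 1027) != 4)                 /* ...and append */
        goto fail;
    *after = size_of(a2);                                 /* step 5 */
    close(fd);
    return 0;

fail:
    close(fd);
    return -1;
}
```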
[CHECKER] sync, fsync and mount -o sync all not flush things out properly (hfsplus, 2.6.11)
Hi,

We developed a file system checker called FiSC and recently applied it to
hfsplus. It reports three problems with hfsplus:

1. sync on hfsplus doesn't actually flush everything out. An immediate
   crash after sync still causes data loss.
   (testcase: http://fisc.stanford.edu/bug13/crash.c)

2. fsync on hfsplus doesn't actually flush out the file.
   (http://fisc.stanford.edu/bug14/crash.c)

3. mount -o sync doesn't cause file system operations to be synchronous.
   (http://fisc.stanford.edu/bug15/crash.c)

To reproduce these warnings, download the test case and run it. You might
need to customize the test case according to your local settings.

Let me know if you need any more information to reproduce the warnings.
Any confirmations/clarifications are appreciated.

-Junfeng
[BUG] 2.6.* pktgen doesn't set ethernet header properly
Hi,

I tried to use the pktgen module from 2.6.* kernels and found that I
couldn't receive any packets generated by pktgen. I did not even see a
"packet dropped by kernel" message.

It turned out that the function setup_inject in net/core/pktgen.c doesn't
set up the ethernet header protocol field correctly. Below is a patch
that fixes the problem.

--- kernel-source-2.6.8-orig/net/core/pktgen.c 2004-08-13 22:37:26.0 -0700
+++ kernel-source-2.6.8/net/core/pktgen.c 2005-01-19 17:54:46.0 -0800
@@ -259,6 +259,9 @@
 
 	/* Set up Dest MAC */
 	memcpy(&(info->hh[0]), info->dst_mac, 6);
+
+	/* Set up protocol */
+	((struct ethhdr *)(info->hh))->h_proto = htons(ETH_P_IP);
 
 	info->saddr_min = 0;
 	info->saddr_max = 0;

-Junfeng
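What the added line does can be shown with a small user-space sketch for Linux. It is not kernel code: `fill_eth_header` is a hypothetical helper that builds the same 14-byte header layout (destination MAC, source MAC, protocol) using `struct ethhdr` and `ETH_P_IP` from <linux/if_ether.h>. Without the h_proto assignment the last two bytes would stay whatever the buffer held, which is presumably why receivers did not treat the frames as IP.

```c
#include <assert.h>
#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* struct ethhdr, ETH_ALEN, ETH_HLEN, ETH_P_IP */
#include <string.h>

/* Hypothetical helper: fill the 14-byte Ethernet header in hh with the
 * destination MAC, the source MAC and the IPv4 protocol number. */
static void fill_eth_header(unsigned char hh[ETH_HLEN],
                            const unsigned char dst[ETH_ALEN],
                            const unsigned char src[ETH_ALEN])
{
    struct ethhdr eth;

    assert(hh && dst && src);
    memcpy(eth.h_dest, dst, ETH_ALEN);
    memcpy(eth.h_source, src, ETH_ALEN);
    eth.h_proto = htons(ETH_P_IP);   /* 0x0800, stored in network byte order */
    memcpy(hh, &eth, sizeof eth);    /* struct ethhdr is exactly ETH_HLEN bytes */
}
```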