Reset errors in iostat output - Systems Maintenance(Archived)

I would like to reset the error counts in iostat output. Is it possible without rebooting the server? I am running Solaris 9 - SunOS imcfoxy 5.9 Generic_122300-14 sun4u sparc SUNW,UltraAX-i2.

Hi,
Could you please be more precise about the error message you are getting?

I assume the OP is talking about the counts like "Soft Errors" and "Hard Errors" for devices.
I don't know of any way to reset the counts short of unloading the drivers. And I don't think you will be able to unload any of your storage drivers without a reboot.
--
Darren 

Is there a method for resetting the iostat -En error counts, such as Predictive Failure Analysis, to zero without rebooting the machine?
Thanks in advance,
Rev. 

The answer is still no. There's no way except to reboot the machine.
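
Since the counters cannot be reset, one possible workaround is to save a baseline of the iostat -En counts and report only what has changed since then. The following is a minimal Python sketch of that idea, not a Sun-provided tool. It assumes the usual "Name: value" layout of iostat -En output; the sample text, the vendor and product strings, and the function names parse_iostat_en and diff_counts are made up for illustration.

```python
import re

# Counter names as they appear in `iostat -En` output (assumed layout).
COUNTERS = [
    "Soft Errors", "Hard Errors", "Transport Errors",
    "Media Error", "Device Not Ready", "No Device", "Recoverable",
    "Illegal Request", "Predictive Failure Analysis",
]
COUNTER_RE = re.compile(r"(%s): (\d+)" % "|".join(re.escape(c) for c in COUNTERS))
# A device block starts with the device name followed by "Soft Errors:".
DEVICE_RE = re.compile(r"^(\S+)\s+Soft Errors:")


def parse_iostat_en(text):
    """Return {device: {counter: count}} from `iostat -En` style text."""
    result = {}
    device = None
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            device = m.group(1)
            result[device] = {}
        if device is not None:
            for name, value in COUNTER_RE.findall(line):
                result[device][name] = int(value)
    return result


def diff_counts(baseline, current):
    """Return only the counters that grew since the baseline."""
    changes = {}
    for device, counters in current.items():
        old = baseline.get(device, {})
        delta = {k: v - old.get(k, 0) for k, v in counters.items() if v > old.get(k, 0)}
        if delta:
            changes[device] = delta
    return changes


# Illustrative sample text, not taken from a real machine.
BASELINE = """\
c0t0d0          Soft Errors: 3 Hard Errors: 0 Transport Errors: 0
Vendor: EXAMPLE  Product: EXAMPLEDISK  Revision: 0001 Serial No: 0000
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 3
Illegal Request: 0 Predictive Failure Analysis: 0
"""
CURRENT = BASELINE.replace("Soft Errors: 3", "Soft Errors: 5").replace(
    "Recoverable: 3", "Recoverable: 5")

print(diff_counts(parse_iostat_en(BASELINE), parse_iostat_en(CURRENT)))
```

In practice the baseline would be the saved output of iostat -En from an earlier run, and the script would be run wherever Python is available. The counters themselves stay untouched; only the reporting is relative to the baseline.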

Related

Strange boot problems after patch 118844-20

Hello,
I am seeing weird behaviour after installation of patch 118844-20 and its required patch 118344-05.
I am running Solaris 10 x86 3/05 release inside VMWare workstation build 5.0.0.-13124.
Here are the symptoms:
I did all patch installation in single-user mode using patchadd.
After 118344-05 is installed, the reboot comes up just fine.
After 118844-20 is installed, the reboot comes up, but very quickly a stacktrace of some sort is seen before the system auto-reboots again (and the BIOS screen is shown). The trace is shown too briefly to see exactly what it says. The system now keeps on trying to boot, then going back to BIOS init, etc.
Attempting to see the trace, I boot with flags: 'b kmdb -d -s' but to my astonishment, the system booted just fine into single-user mode. The patches show as installed.
I rebooted again, this time trying with flag 'b kmdb' only, and again I see no boot problems and I got into the Xserver just fine. Very strange.
Does anybody know what is going on? Has anybody experienced this as well? Does anybody know how to intercept the stacktrace?
Thanks a lot!
Gert-Jan (ej_bartelds#hotmail.com)
> I am running Solaris 10 x86 3/05 release inside
> VMWare workstation build 5.0.0.-13124
> After 118844-20 is installed, the reboot comes up,
> but very quickly a stacktrace of some sort is seen
> before the system auto-reboots again (and the BIOS
> screen is shown). The time the trace is shown is too
> short to see exactly what is said. The system now
> keeps on trying to boot, then going back to BIOS
> init, etc
Yep.
http://groups.yahoo.com/group/solarisx86/message/30010
http://groups.yahoo.com/group/solarisx86/message/30287
http://groups.yahoo.com/group/solarisx86/message/30406
It appears to be a known bug, 6342422:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6342422
> Attempting to see the trace, I boot with flags: 'b
> kmdb -d -s' but to my astonishment, the system
> booted just fine into single-user mode. The patches
> show as installed.
> I rebooted again, this time trying with flag 'b kmdb'
> only, and again I see no boot problems and I got into
> the Xserver just fine.
Yep.
> Very strange.
> Does anybody know what is going on?
Loading the kernel debugger changes the way memory is
allocated from boot.bin.
> Has anybody experienced this as well?
Yes.
> Does anybody know how to intercept the stacktrace?
You can try to boot with "b -asv" and take real screenshots using
a digital camera.
Juergen,
thanks for your answer. Only after I posted here did I see the messages at http://groups.yahoo.com/group/solarisx86 but thanks anyway.
That "b -k" circumvents the problem is something I found out myself when trying to intercept the stacktrace ;-)
> It appears to be a known bug, 6342422:
I didn't know it was logged as such already.
> Loading the kernel debugger changes the way memory
> is allocated from boot.bin.
Nifty.
> You can try to boot with "b -asv" and take real
> screenshots using a digital camera.
Emmmm... I may try that. For now "-k" works for me.
Once again, thank you for replying. --GJ

Error message on Solaris booting

I am having trouble and have a question.
When starting Solaris 2.5.1 (x86), the following error message is always displayed.
What should I do to stop this message from being displayed?
Also, what causes it?
Is the cause hardware or software?
-------boot message-----
Hostname : TEST01
The file system (/dev/rdsk/c1d0s0) is being checked.
/dev/rdsk/c1d0s0: INCORRECT BLOCK COUNT I=1053783 (2 should be 0) (CORRECTED)
/dev/rdsk/c1d0s0: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/rdsk/c1d0s0: 25250 files, 419799 used, 1833112 free
/dev/rdsk/c1d0s0: (3424 frags, 228711 blocks, 0.1% fragmentation)
The system is coming up. please wait.
-----end----------------
Usually, I use "init 0" (or "shutdown -y -g0 -i0") before switching off the power.
In this case, the error message (INCORRECT BLOCK COUNT ...) is displayed at the next boot.
But when I use "halt", the error message is often not displayed.
Running the "fsck" command does not repair it.
(Ex. fsck -F ufs -o b=32 /dev/rdsk/c1d0s0)
I have no idea what the cause is.
Please let me know if you have any information.
----system environment----
Solaris 2.5.1 (for Intel)
Pentium MMX 233MHz, 64MB memory, NIC:3C509B,
SCSI:AHA-1510B, Serial Multiport:aurora 8000P,
HDD: WD200BB (20GB, IDE)
--------------------------
Most likely, this is nothing to worry about. The message is fairly normal. There were some problems with sync'ing disks when such old Solaris x86 systems shut down, but it gets cleaned up when the system boots (which is what the message is telling you).
If reliability is important to you, upgrade to a modern version of Solaris and turn on file system logging.
Richard 
Hi Richard.
Thank you for your advice.
I understand now that it is nothing to worry about.
In conclusion,
I guess the cause is probably the disk sync'ing problem.
But I cannot upgrade this system,
because of the customer's requirements.
I will try some other version of Solaris
and hope that the message is no longer displayed.
Thank you, and best regards.
re-kam

Fast Data Access MMU Miss error at the {ok} prompt

Hi guys,
I have a really nasty problem with one of our SunFire V280 machines.
I'm trying to reinstall its OS, and when I try to issue most of the commands at the OK prompt I get the Fast Data Access MMU Miss error and don't know how to proceed.
I've found this topic on the forums - http://forums.sun.com/thread.jspa?forumID=841&threadID=5116563 ,
but when I try to change the auto-boot? variable to false it doesn't change and stays true. The same happens with all other variables - I can't change them.
What would you advise me to get this machine running?
Thanks 
sounds like faulty mem, we had it even worse on a 280. came up to an ok prompt, but not openboot ok... had to swap out the bad mem 
It is not a faulty memory.
Today I tried replacing all memory banks and hard disks, and starting it with every single CPU separately, with no success. When I power it on the messages are "CPU seeprom .... 0000.0000...", then "Aborted due to RSC command" or similar, and it gives me an ok prompt. Whatever I try to do, I get the MMU error message.
I suspect there is something dead on the mainboard, if not the whole board. 
do you have maint on this server? only other similar item i recall was a random system halt on a 480. turned out one of our processor boards in it was bad. used the serial port to catch the hw faults and we did have maint so sun was able to use the logs/serial output to track down the part.
can you turn up the openboot diag level or is it already set pretty high? 
I can't change any of the variables in the open boot :( 
Sounds like a hardware error to me. 
I had 2 old systems, an Ultra 10 and an Ultra 2, that had not been used for a while. Both of them had the "Fast Data Access MMU Miss" error, even when I tried to boot from the CD.
Both had bad hard drives which have been replaced. The machines run fine now. 
Hi,
I had the same problem, but in my case it turned out that the CD had been written at x48 speed, which was too high for an old CD-ROM drive.
Try writing the Solaris ISO image to CD at x16 speed.
BR

URGENT!! Fast Data Access MMU Miss

Hello there!
I have an SB 2000 running Solaris 9. One morning when I came to the datacenter, the SB 2000 server was stuck at the ok prompt showing the following.
ERROR: Last Trap: Corrected ECC Error
Error -256
ERROR: Last Trap: Fast Data Access MMU Miss
Error -256
I thought that was due to a power failure and I tried to reinstall the server with Solaris 9 and 10, but it kept showing me the above error. The server was running backend services for my Sun JES messaging server, so I would appreciate a prompt reply.
Sounds like a memory error.
Try the eeprom self tests.
Do or test-all at the eeprom prompt or boot with the key in the diag position. 
I ran the test-all command at the obdiag prompt and all devices passed the test except gpio@1,300600, which showed a failed status.
obdiag> test 9
Testing /pci@8,700000/ebus@5/gpio@1,300600
ERROR : GPIO PORT1 DATA register
...
Error: /pci@8,700000/ebus@5/gpio@1,300600 selftest failed, return code = 1
selftest at /pci@8,700000/ebus@5/gpio@1,300600 (errors=1) ..........failed
Any idea please? 
I'm guessing you don't have a support contract or you wouldn't be asking here.
I thought that test-all would test memory. But apparently it only tests the IO devices.
So I guess you need the eeprom "post" command.
But the GPIO error is disturbing anyway.
I'm not sure what the GPIO device is. Maybe it's a card you can remove. What cards do you have installed?
What does the eeprom devalias command show?
And if you have a LOM or SC, what does the showfru
command show?
I am very sorry for the late reply to your questions. I was rather busy.
The server has only a VGA card installed. Unfortunately it doesn't have an SC. Moreover, when I ran the reset-all command to get into obdiag, the machine turned off and I couldn't get to the ok prompt again. It only displays a blank screen. The problem is getting worse...
So now I can't post the output of the eeprom command....
Any more suggestions, please?
She's dead, Jim. Send it in for service.
If that's not an option, there are a couple of things you can try.
But they're clutching at straws.
Pull out the vga card and try a serial console.
Swap the ram. The system handbook should tell you the specs of ram you need, or swap ram from a known good system.

Problem booting cluster node with SATA (AHCI) boot drive

Hi,
I am pretty much new to Sun Cluster, so apologies in advance if that is something well known.
I am installing SC 3.2 on Solaris 10 08/07 x86 (IBM x3550). The machine boots off
an internal SATA drive. It works quite well, up until the point it tries to boot in cluster
mode. When doing so, everything hangs when the cluster tries to get access to the global
devices. After some timeout a Recoverable Error message is printed on the
console (pointing to a block quite far out, possibly the last, on the root disk). After that the system
just sits there and nothing happens.
It runs happily, though, when booted with the -x flag.
I am scratching my head trying to figure out what else to try.
Is anyone seeing similar behavior?
Thanks in advance,
Cyril 
It's difficult to tell what might be happening. I assume you installed a single-node cluster configuration, as it sounds like you only have one node.
Booting -v might help to see where it is failing as I don't think there is enough information to diagnose it otherwise.
Regards,
Tim
--- 
Hi,
I am not sure if this is the same problem, but when I recently tried to boot a system with a SATA boot device, S10 panic'ed. Root cause is a problem with an IOCTL in the SATA driver that is used by Sun Cluster to detect all the devices while trying to build the global device tree.
What is strange is that your system hangs.
This bug is fixed in OpenSolaris and will be fixed in S10U5. I'll set up the same system with the latest Solaris Express release later this week to see whether the fix works. You could do the same, just to find out whether it is the same problem.
As neither the system that you are using nor the system that I was using has ever been qualified for use with SC, there is currently no way to get a backport of that patch. If we get the right business case, then we can move on.
Regards
Hartmut 
I understand that I provided a less than satisfactory amount of info.
So here is more.
I am installing a two-node cluster, and during scinstall one of the nodes is rebooted
and goes through (what I suppose to be) an initial configuration. At the very end of the
boot process there is a message
obtaining access to all attached disks
At this point the boot disk activity LED is lit constantly. After some longish timeout
the following message is printed to the console
NOTICE: /pci@0,0/pci1014,2dd@1f,2: port 0: device reset
WARNING: /pci@0,0/pci1014,2dd@1f,2/disk@0,0 (sd1):
     Error for Command: read(10)          Error Level: Retryable
     Requested Block: 135323318          Error Block: 135323318
     Vendor: ATA                    Serial Number:
     Sense Key: No Additional Sense
     ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
and the disk activity LED is turned off. After that nothing more happens. The system isn't
hard hung, since the keyboard is working and it responds to ping, but other than that
nothing seems to be functioning.
I understand that diagnosing such a problem isn't easy, but I am willing to invest some
time into getting it working. I would really appreciate some help with this issue.
Regards,
Cyril 
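
To judge how far out on the disk the failing block lies, the block number from the warning above can be converted into a byte offset. This is a minimal Python sketch under one loud assumption: 512-byte sectors, which the message itself does not state. The helper names error_block and block_to_offset are hypothetical.

```python
import re

SECTOR_SIZE = 512  # assumption: 512-byte sectors; not stated in the message

# The two relevant lines of the sd driver warning quoted above.
WARNING = """\
     Error for Command: read(10)          Error Level: Retryable
     Requested Block: 135323318          Error Block: 135323318
"""


def error_block(text):
    """Extract the 'Error Block' number from an sd driver warning."""
    m = re.search(r"Error Block:\s*(\d+)", text)
    if m is None:
        raise ValueError("no 'Error Block' field found")
    return int(m.group(1))


def block_to_offset(block, sector_size=SECTOR_SIZE):
    """Return the byte offset at which the given block starts."""
    return block * sector_size


block = error_block(WARNING)
offset = block_to_offset(block)
print("block %d starts at byte %d (%.1f GB)" % (block, offset, offset / 1e9))
```

Under that assumption the block starts about 69.3 GB into the disk. Comparing this with the capacity of the boot drive would show whether the read really targets the end of the disk.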
According to Hartmut, it is quite possible that this bug is fixed in OpenSolaris. Can you install Solaris Cluster Express 7/07 and check whether the problem persists?
Looking at the error message, it is not so likely that it is the same problem, although the point in the boot process where it happens is identical. SC tries to find out about all the available disks. It uses a special IOCTL (DKIOCSTATE) to do this. You might want to debug your system and find out whether this is the last action that happened.
What confuses me even more is that I now have 3 scenarios:
- my laptop has a SATA boot disk and works fine with SC3.2
- another server with SATA disks sees a panic and
- your system sees a hang.
Giving Solaris express a try is the only other hint that I can give to you.
Hartmut
