
Friday, November 24, 2017

VMware LogInsight Agent Install

VMware LogInsight Windows Agent Install


Recently I have had an ongoing trial of VMware 6.5U1 running on my desk - literally two rack-mount servers sitting on my desk taking up space. Anyway, I noticed that the trial let me install and use LogInsight. I was at VMworld in 2013 and got to see this tool at version 1.x, and it was wonderful. So my first thought was: what has changed and improved?

Well, the only way to find out was to test. Adding the VCSA was a snap; I got data off it pretty quickly and, hey, it added all the hosts - easy enough. How to add a Windows box? Oh look, agents...

Easy enough to download, and since I have a domain admin account I can install it on my host - WRONG!

"The system administrator has set policies to prevent this installation"

What?!? I am the system admin, and in no way did I set this - or did I? Well, after trying a couple of other accounts, turning UAC off and back on, and other things, I finally figured it out.

Right-click the MSI and Run As a local admin account. This agent appears to add local security auditing, which, thanks to some recent Windows 2012 R2 and Windows 2016 patches, has changed and now requires a local admin account to install and make those changes. Thanks, Microsoft!

Off to go and test this tool, hope this helps someone else.

Thursday, November 16, 2017

VMware Converter 6.1.1 Headaches


Ever have one of those moments when everything should be easy and just nothing goes your way?  That was my experience with Converter 6.1.1.  I have used this tool many times before in other environments but at my current post, this tool would not run at all.

My plan was to convert a powered-on remote Windows machine. Here are the things I tried:

>IP or name - never really an issue
>Username - domain\user; local\localadmin;

Either account gave me the dreaded error about permissions to the ADMIN$ share. So I went to the VMware KB on common troubleshooting for the tool and tried all of the suggestions. None of them worked.

Found the log files at C:\ProgramData\VMware\VMware vCenter Converter Standalone\logs - it was here that I finally got a clue as to how to solve my issue.

During my first attempt I had told the tool to automatically remove the agent afterwards. The logs confirmed the install and gave an exit code of 0. All attempts afterwards were failing because the tool recognized that the agent was already present but could not communicate with it.

Checked the firewall on the Windows client and no issue there.  So then I manually installed the agent to confirm it was present.  Tested again, and same issue.  Did a reboot of the machine - now it works like a charm using the domain account or the local user.

This puts a bit of a kink in my environment, so now I am going to install the agent on multiple hosts and reboot them at scheduled times. At least now I can live clone the machines and schedule a switchover with the tool for after hours.


Monday, November 21, 2016

Which Disk Do I Expand Again?


Ever run into that nasty and annoying problem with a file server that screams, "I NEED DISK SPACE AND I WANT IT NOW!"

As a Systems Engineer or Admin you know that it is only a matter of time before this happens, and it is not really something you can anticipate. Sure, you can track usage, add quotas and everything else, but there is always that one project that just pops up, and now your file server is crying in its virtual environment somewhere.

So typically this is easy in VMware: you locate your drive and expand it. If you were proactive you might have two drives, but what happens if you have more? Which disk do you expand? That was exactly my issue.


As you can see the old Admin did not leave me anything helpful to go by.  So I then looked at the Properties of my drive, looking for some information I could use.


And that is where I found my help.  Finally something I could use!

The first thing to understand is that Bus Number 0 in Windows is NOT the equivalent of the bus number in VMware. The VMware bus number translates to the Windows "Location" value, so it is a bit misleading. So I did a bit of research and cross-compared against my VMware environment.

VMware Bus Number    Windows Location (older)    Windows Location (version 6)
0                    160                         160
1                    192                         224
2                    224                         256
3                    256                         161

The Unit Number in VMware corresponds to the Target ID in Windows.

For my issue the VMware disk was on controller 1, or 1:0:0, which I then easily expanded.
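
The lookup above can be sketched in a few lines of Python. This is only an illustration: the numbers are copied straight from the table in this post (my own observations, not an official VMware mapping), and the function name and the "older" / "6" version labels are made up for the example.

```python
# Windows "Location" value -> VMware bus (SCSI controller) number.
# Values come from the table in this post, not from VMware documentation.
LOCATION_TO_BUS = {
    "older": {160: 0, 192: 1, 224: 2, 256: 3},
    "6": {160: 0, 224: 1, 256: 2, 161: 3},
}


def vmware_scsi_address(location, target_id, version="6"):
    """Translate Windows disk properties into a VMware (bus, unit) pair.

    location  -- the "Location" number from the disk's Properties page
    target_id -- the "Target ID" from the same page (the VMware unit number)
    version   -- which column of the table to use: "older" or "6"
    """
    try:
        bus = LOCATION_TO_BUS[version][location]
    except KeyError:
        raise ValueError(
            f"no known bus for location {location} (version {version!r})"
        ) from None
    return bus, target_id


if __name__ == "__main__":
    bus, unit = vmware_scsi_address(224, 0, version="6")
    print(f"Expand the disk on bus {bus}, unit {unit}")
```

Note that the same Location value (224, for example) points at a different controller depending on which column applies, so the version has to be chosen deliberately.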


Wednesday, September 7, 2016

Why do my Volumes fall offline?


Had this silly thing happening over and over again for one of my VMs.  It started something like this…

Each and every time I rebooted my new Win 2012R2 host, any disk other than the OS volume would show offline. Patched the host, like a good Systems Engineer should; still no change.

Confirmed the VMware machine type was current. It was - and yes, this means I am using VMware and not my preferred Hyper-V - but that is another story altogether.

Then the light bulb went off!  This used to occur a good bit in 2008 and 2008R2 when I was working heavily on customer systems.  It could not be the same DISKPART setting could it?  Yes, yes it was.

DISKPART> SAN

SAN Policy : Offline Shared

DISKPART> SAN Policy=OnlineAll

So this solved my issue for this single host, which is not clustered. Yay me!


However, I must caution that there is a reason this is the default. Microsoft sets it this way by design to protect shared disks from being accessed by multiple servers. Basically this is required for clustering, as the Cluster Filter driver and services will "own" the disk and the volume on it. If you change this on a node of a cluster you risk corrupting data.
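
If you want to script the check across several hosts, here is a rough Python sketch of the decision above. It only parses the text that DISKPART's SAN command prints and decides whether a change is appropriate; the function names and the clustered flag are my own inventions for illustration, and you have to supply the cluster membership yourself.

```python
import re


def parse_san_policy(diskpart_output):
    """Return the policy text from DISKPART `san` output, or None if absent."""
    match = re.search(r"SAN Policy\s*:\s*(.+)", diskpart_output, re.IGNORECASE)
    return match.group(1).strip() if match else None


def diskpart_lines_for(policy, clustered):
    """Return the DISKPART lines to run, or [] when nothing should change.

    Cluster nodes are left alone on purpose: the offline policy is what
    protects shared disks from being mounted by more than one server.
    """
    if clustered or policy is None:
        return []
    if policy.replace(" ", "").lower() == "onlineall":
        return []  # already set, nothing to do
    return ["san policy=OnlineAll"]


if __name__ == "__main__":
    sample = "DISKPART> SAN\n\nSAN Policy : Offline Shared\n"
    policy = parse_san_policy(sample)
    print(policy, diskpart_lines_for(policy, clustered=False))
```

The returned lines could be saved to a text file and run with diskpart /s on the standalone host.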

Monday, January 5, 2015

Troubleshooting iSCSI Networking

So for this New Year we all made resolutions, right? For this year I have set a goal to write at least one article a week. With that in mind, let's get to it, shall we?

Troubleshooting iSCSI connectivity has been one of the most curious pieces of my job now for the last few years and it seems to be one of the most misunderstood.

So first of all let's understand the event logs and where to get some help in understanding them.

iSCSI Initiator Users Guide for Windows 7 and Windows Server 2008 R2

The Event IDs start at around page 105, each with a good general description of what the error is; however, that is it. It is generic.

Let's examine this event ID:

Event ID 20
Connection to the target was lost. The initiator will attempt to retry the connection

This event is logged when the initiator loses connection to the target when the connection was in iSCSI Full Feature Phase. This event typically happens when there are network problems, network cable is removed, network switch is shutdown, or target resets the connection. In all cases initiator will attempt to reestablish the TCP connection.

So what does this really mean?  

Where is my issue? The key is that this is NOT an iSCSI initiator issue; however, everything in the network chain between the host and your storage array is suspect. So when did the issue start? What change in this environment occurred? Nothing? OS updates? Has your utilization and general usage of the attached volume grown?

So let's start with the lowest-hanging / easiest pieces to resolve. Network drivers should be reviewed, and you should consider updating them if possible. NIC firmware and switch firmware are worth considering as well, but the biggest thing to look at is the array firmware. What updates and fixes have been released that could potentially resolve it?

In your OS, run a performance trace on the error output of the NICs. Each OS has a slightly different way to do this, and Windows makes it fairly easy. You could start a trace on the MS iSCSI drivers. It is good, but typically it will only show us that the problem is not the iSCSI service. We can then set up a Data Collector Set for the error output of the NICs to confirm whether this is a hardware error. Focus on the physical network adapter and not the interface - the following counters make it easier to focus in on the error.

Network Interface(*)\Bytes Received/sec
Network Interface(*)\Bytes Sent/sec
Network Interface(*)\Current Bandwidth
Network Interface(*)\Output Queue Length
Network Interface(*)\Packets Outbound Errors
Network Interface(*)\Packets Receive Errors
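
As a quick alternative to building a Data Collector Set by hand, the same counters can be sampled from the command line with typeperf. The Python sketch below only assembles the command; the helper name, the sampling interval, the sample count and the output file name are arbitrary choices for the example, not recommendations.

```python
# Counter paths exactly as listed above.
COUNTERS = [
    r"\Network Interface(*)\Bytes Received/sec",
    r"\Network Interface(*)\Bytes Sent/sec",
    r"\Network Interface(*)\Current Bandwidth",
    r"\Network Interface(*)\Output Queue Length",
    r"\Network Interface(*)\Packets Outbound Errors",
    r"\Network Interface(*)\Packets Receive Errors",
]


def build_typeperf_command(counters, interval_seconds=5, samples=720,
                           output_csv="nic_errors.csv"):
    """Return the argument list for a typeperf run that logs to CSV."""
    command = ["typeperf"]
    command.extend(counters)
    command += [
        "-si", str(interval_seconds),  # seconds between samples
        "-sc", str(samples),           # number of samples to collect
        "-f", "CSV",                   # output file format
        "-o", output_csv,              # output file path
    ]
    return command


if __name__ == "__main__":
    for argument in build_typeperf_command(COUNTERS):
        print(argument)
```

On a Windows host the list can be handed to subprocess.run as-is; with the example defaults that is one hour of samples at five-second intervals.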

No errors reported? Then review the switch logs. Do you have pause frames on the ports that iSCSI is using? If so, which? Typically I see them on the array side, and then I have to deep dive into our array's performance numbers. Often I find the issue is the design of the RAID Group and the number of drives in it.

So bottom line.  

Is it an OS problem? Could be, but often no. Is it a switch issue? Unless there are known issues with the firmware, the answer is - no. Is it an array issue? As with the switch, firmware fixes can resolve some issues, but often the answer here is - not really. Consider this an indicator that your data needs have grown and it is time to reconsider the design and throughput of the environment you have now. Look at the simple things first, which often reduce the issue and give you room to regroup and consider:

>Isolate iSCSI networking to its own physical switches - just make sure each has enough resources to handle the burst traffic that iSCSI is infamous for. Your array vendor should have a matrix of which switches are tested and validated. You can use those as references.

>Isolate the iSCSI network to its own NICs where possible. If the OS is a guest VM this is really important. When you present a NIC that is not using SR-IOV to the hypervisor, the host has to manage the virtual traffic, and sharing that NIC with the host's own iSCSI network can often add extra congestion.

>Consider your array's design. Whether you use Disk Pools, RAID Groups or something else - each has its limits in hardware and software. Ensure you are not overrunning these.

Monday, August 25, 2014

Awesome TechNet FAQ on Storage Spaces (part 2)

Been gone for a bit due to work schedules and personal training schedules.  I cannot believe the class I have been waiting on for a year plus finally opened up and I got the opportunity!  Whee for me!

Anyway, back to looking at this article.

The subtitle: What are the recommended configuration limits?

There is really nothing that I can add to or take away from this part of the article. The limits listed are incredibly vague and give us the sense that we can use, well, almost ANYTHING with a SAS card, a JBOD and some drives. Sadly this is not the case, as experience is teaching me. Please ensure you do your research on your hardware!

SAS HBAs

Unfortunately not all are created equal. Even though MS will say it can support a card, the hardware vendor might not. The chipset used in the HBA might not either. So which to use and why?

I am most familiar with Dell hardware. It is not that other vendors' equipment is out of spec or out of line for me to work on; Dell just seems to be what I get to work with most often at this time.

So which card to use? Right now there is the SAS6E and LSI's 9207-8e available from Dell. Either one lets us work with Storage Spaces nicely. Before making the decision here, look at the JBODs first! I will explain why in just a bit!

JBODs

This is where it gets exciting and where we have to use some insight as to how many pools and how large we want to grow to.  As always make sure you have expansion room and growth.  The most complex design is not always the best, so when possible keep it simple.

What is available?  Once again I can only speak to what I know - the MD1220, MD1200 and MD3060e.  All of which work wonderfully?!?  Yes and no!

This is where I have to say look before you purchase. The MD12xx is awesome for that simple single server running role X. It will even work with a cluster. There is even a white paper on how to deploy multiple JBODs in an HA configuration. However, let's look at the day-to-day management side.

The Enclosure Management Modules (EMMs) on the MD12xx are simple and do not have any real intelligence to them. Why is this important? Well, one of the quarterly operations you will perform will be drive firmware updates. In the MD12xx series you will HAVE to down the cluster nodes or single server and boot into the Lifecycle Controller to update drive firmware. There is no in-OS option to perform this.

The MD3060e, however, has a CLI software package that can update drive firmware from inside the OS. You will still need to down the storage pool, but if you properly design this into multiple smaller pools it should work fine. You update the drives one at a time, then online the pool again, and done. This is only possible due to the EMM firmware on these JBODs.

Besides that, a second benefit is that you get 60 drives in a single 4U enclosure (the same size as 2 MD12xx JBODs). Another benefit is that these EMMs are a bit more intelligent, and you can set up a telnet session to these modules. This allows you to monitor the backend storage, which Storage Spaces at this time seems to lack.

The only catch to the MD3060e is that you CANNOT use the SAS6E card - you must use the LSI 9207-8e for it. Yes, it is a bit more expensive, but save yourself the headache.

Drives

Last but honestly one of the most important pieces.  Select the correct drives.  Since we are using SAS technology for Storage Spaces we are really able to use ANY drive.  The cards will not alert and bug us that drive X is not supported.  So nice to have some freedom right?  Well not really.

Drives are one of the areas where we cannot skimp. Do the research here and find drives that your JBOD's vendor supports. The management modules need to have no latency in communicating with the drives. Another point is that drive firmware changes often - if you do not get drives "supported" by your hardware vendor, you will never get those updates.

Lastly and most importantly, if your drive has a recall or some quality issue and you need an RMA, your hardware vendor will often contact you about supported drives and "swap" the drive out immediately for a different one. This is much better than going directly to the drive manufacturer and waiting 6-9 weeks for the same swap.

Summary

Do the research on your hardware
Sit down and design into multiple pools. JRR Tolkien was right - one pool to rule them all is not always best =P




Tuesday, July 29, 2014

Awesome TechNet FAQ on Storage Spaces

So I was going through a few articles today. Yes, I had to find some study and work to keep myself productive. Not a bad thing, right?

In my reading I found the following "awesome" FAQ on Storage Spaces: http://social.technet.microsoft.com/wiki/contents/articles/11382.storage-spaces-frequently-asked-questions-faq.aspx#How_can_I_manage_Storage_Spaces

So here are some callouts on it, which I may spread out over a couple of articles.

The subtitle:  How can I manage Storage Spaces?

This is by far one of my most painful areas of this awesome product. While I can create a very simple and basic pool from the GUI, and I have the flexibility of advanced options in PowerShell, it really is not as easy to manage as it could be.

Maybe I am not being as "fair" as I could be. I come from a background where I used hardware arrays, and the vendors created a very elaborate GUI or used off-array tools to manage them. So yes, I am a little tainted. What if it was a bit more of a stand-alone module - say, like Failover Cluster Manager? Maybe?

The key is for MS to have this software-defined storage win over people who have been using, say, EMC, NetApp, Dell, or another vendor's solution, where most of the information about the storage is within easy grasp/view. Storage guys do not want to look too deep or too long to find why virtual disk X is offline - too much rides on it to not have an answer fast.

To offset this I have used PowerShell to output some "typical" commands that I use and make a nice table for me to review.  I am by no means the 'Shell guru that many others are but now with ISE I can make a simple .ps1 and run it locally.  Export it to HTML and it is not a bad format.  Any ideas?
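
For anyone who wants to try the same idea outside of PowerShell, here is a rough Python sketch of the "make a table and export it to HTML" step. It is not my actual .ps1; it assumes you already have the rows as plain dictionaries (for example from something like Get-VirtualDisk piped to ConvertTo-Json), and the function name and sample values below are made up.

```python
import html


def render_html_table(rows, columns):
    """Render a list of dicts as a plain HTML table, one row per dict."""
    header = "".join(f"<th>{html.escape(name)}</th>" for name in columns)
    lines = ["<table>", f"<tr>{header}</tr>"]
    for row in rows:
        # Missing keys become empty cells; values are escaped for safety.
        cells = "".join(
            f"<td>{html.escape(str(row.get(name, '')))}</td>" for name in columns
        )
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data, not output from a real system.
    disks = [
        {"FriendlyName": "VD01", "OperationalStatus": "OK", "HealthStatus": "Healthy"},
        {"FriendlyName": "VD02", "OperationalStatus": "Detached", "HealthStatus": "Unknown"},
    ]
    print(render_html_table(disks, ["FriendlyName", "OperationalStatus", "HealthStatus"]))
```

Keeping the column list explicit means the report shows only the handful of fields I actually want to see at a glance.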